Searched +hist:46 +hist:c0a8ca (Results 1 - 6 of 6) sorted by relevance
/linux-master/ipc/ | ||
H A D | util.h | diff da27f796 Fri Jan 27 11:46:51 MST 2023 Rik van Riel <riel@surriel.com> ipc,namespace: batch free ipc_namespace structures Instead of waiting for an RCU grace period between each ipc_namespace structure that is being freed, wait an RCU grace period for every batch of ipc_namespace structures. Thanks to Al Viro for the suggestion of the helper function. This speeds up the run time of the test case that allocates ipc_namespaces in a loop from 6 minutes, to a little over 1 second: real 0m1.192s user 0m0.038s sys 0m1.152s Signed-off-by: Rik van Riel <riel@surriel.com> Reported-by: Chris Mason <clm@meta.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 99db46ea Tue May 14 16:46:36 MDT 2019 Manfred Spraul <manfred@colorfullife.com> ipc: do cyclic id allocation for the ipc object. For ipcmni_extend mode, the sequence number space is only 7 bits. So the chance of id reuse is relatively high compared with the non-extended mode. To alleviate this id reuse problem, this patch enables cyclic allocation for the index to the radix tree (idx). The disadvantage is that this can cause a slight slow-down of the fast path, as the radix tree could be higher than necessary. To limit the radix tree height, I have chosen the following limits: 1) The cycling is done over in_use*1.5. 2) At least, the cycling is done over "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements "ipcmni_extended": 4096 elements Result: - for normal mode: No change for <= 42 active ipc elements. With more than 42 active ipc elements, a 2nd level would be added to the radix tree. Without cyclic allocation, a 2nd level would be added only with more than 63 active elements. - for extended mode: Cycling creates always at least a 2-level radix tree. With more than 2730 active objects, a 3rd level would be added, instead of > 4095 active objects until the 3rd level is added without cyclic allocation. For a 2-level radix tree compared to a 1-level radix tree, I have observed < 1% performance impact. Notes: 1) Normal "x=semget();y=semget();" is unaffected: Then the idx is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic() is used. 2) The -1% happens in a microbenchmark after this situation: x=semget(); for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);} y=semget(); Now perform semget calls on x and y that do not sleep. 3) The worst-case reuse cycle time is unfortunately unaffected: If you have 2^24-1 ipc objects allocated, and get/remove the last possible element in a loop, then the id is reused after 128 get/remove pairs. Performance check: A microbenchmark that performes no-op semop() randomly on two IDs, with only these two IDs allocated. The IDs were set using /proc/sys/kernel/sem_next_id. The test was run 5 times, averages are shown. 1 & 2: Base (6.22 seconds for 10.000.000 semops) 1 & 40: -0.2% 1 & 3348: - 0.8% 1 & 27348: - 1.6% 1 & 15777204: - 3.2% Or: ~12.6 cpu cycles per additional radix tree level. The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower than what I remember (spectre impact?). V2 of the patch: - use "min" and "max" - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of (2<<12). [akpm@linux-foundation.org: fix max() warning] Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Waiman Long <longman@redhat.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 3278a2c2 Tue May 14 16:46:33 MDT 2019 Manfred Spraul <manfred@colorfullife.com> ipc: conserve sequence numbers in ipcmni_extend mode Rewrite, based on the patch from Waiman Long: The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as much as possible. With ipcmni_extend mode, the number of usable sequence numbers is greatly reduced leading to higher chance of ID reuse. To address this issue, we need to conserve the sequence number space as much as possible. Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when new allocated ID is not greater than the last one allocated. It is in such case that the new ID may collide with an existing one. This is being done irrespective of the ipcmni mode. In order to avoid any races, the index is first allocated and then the pointer is replaced. Changes compared to the initial patch: - Handle failures from idr_alloc(). - Avoid that concurrent operations can see the wrong sequence number. (This is achieved by using idr_replace()). - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to ipcmni_seq_shift(). - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max(). Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 5ac893b8 Tue May 14 16:46:29 MDT 2019 Waiman Long <longman@redhat.com> ipc: allow boot time extension of IPCMNI from 32k to 16M The maximum number of unique System V IPC identifiers was limited to 32k. That limit should be big enough for most use cases. However, there are some users out there requesting for more, especially those that are migrating from Solaris which uses 24 bits for unique identifiers. To satisfy the need of those users, a new boot time kernel option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This is a 512X increase which should be big enough for users out there that need a large number of unique IPC identifier. The use of this new option will change the pattern of the IPC identifiers returned by functions like shmget(2). An application that depends on such pattern may not work properly. So it should only be used if the users really need more than 32k of unique IPC numbers. This new option does have the side effect of reducing the maximum number of unique sequence numbers from 64k down to 128. So it is a trade-off. The computation of a new IPC id is not done in the performance critical path. So a little bit of additional overhead shouldn't have any real performance impact. Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 0cfb6aee Fri Sep 08 17:17:55 MDT 2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com> ipc: optimize semget/shmget/msgget for lots of keys ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Serge Hallyn <serge@hallyn.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Cc: Marc Pardo <marc.pardo@supersonicimagine.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c1d7e01d Mon Jul 30 15:42:46 MDT 2012 Will Deacon <will@kernel.org> ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 31a985fb Wed Jun 17 17:27:46 MDT 2009 Arnd Bergmann <arnd@arndb.de> ipc: use __ARCH_WANT_IPC_PARSE_VERSION in ipc/util.h The definition of ipc_parse_version depends on __ARCH_WANT_IPC_PARSE_VERSION, but the header file declares it conditionally based on the architecture. Use the macro consistently to make it easier to add new architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
H A D | util.c | diff 99db46ea Tue May 14 16:46:36 MDT 2019 Manfred Spraul <manfred@colorfullife.com> ipc: do cyclic id allocation for the ipc object. For ipcmni_extend mode, the sequence number space is only 7 bits. So the chance of id reuse is relatively high compared with the non-extended mode. To alleviate this id reuse problem, this patch enables cyclic allocation for the index to the radix tree (idx). The disadvantage is that this can cause a slight slow-down of the fast path, as the radix tree could be higher than necessary. To limit the radix tree height, I have chosen the following limits: 1) The cycling is done over in_use*1.5. 2) At least, the cycling is done over "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements "ipcmni_extended": 4096 elements Result: - for normal mode: No change for <= 42 active ipc elements. With more than 42 active ipc elements, a 2nd level would be added to the radix tree. Without cyclic allocation, a 2nd level would be added only with more than 63 active elements. - for extended mode: Cycling creates always at least a 2-level radix tree. With more than 2730 active objects, a 3rd level would be added, instead of > 4095 active objects until the 3rd level is added without cyclic allocation. For a 2-level radix tree compared to a 1-level radix tree, I have observed < 1% performance impact. Notes: 1) Normal "x=semget();y=semget();" is unaffected: Then the idx is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic() is used. 2) The -1% happens in a microbenchmark after this situation: x=semget(); for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);} y=semget(); Now perform semget calls on x and y that do not sleep. 3) The worst-case reuse cycle time is unfortunately unaffected: If you have 2^24-1 ipc objects allocated, and get/remove the last possible element in a loop, then the id is reused after 128 get/remove pairs. Performance check: A microbenchmark that performes no-op semop() randomly on two IDs, with only these two IDs allocated. The IDs were set using /proc/sys/kernel/sem_next_id. The test was run 5 times, averages are shown. 1 & 2: Base (6.22 seconds for 10.000.000 semops) 1 & 40: -0.2% 1 & 3348: - 0.8% 1 & 27348: - 1.6% 1 & 15777204: - 3.2% Or: ~12.6 cpu cycles per additional radix tree level. The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower than what I remember (spectre impact?). V2 of the patch: - use "min" and "max" - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of (2<<12). [akpm@linux-foundation.org: fix max() warning] Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Waiman Long <longman@redhat.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 3278a2c2 Tue May 14 16:46:33 MDT 2019 Manfred Spraul <manfred@colorfullife.com> ipc: conserve sequence numbers in ipcmni_extend mode Rewrite, based on the patch from Waiman Long: The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as much as possible. With ipcmni_extend mode, the number of usable sequence numbers is greatly reduced leading to higher chance of ID reuse. To address this issue, we need to conserve the sequence number space as much as possible. Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when new allocated ID is not greater than the last one allocated. It is in such case that the new ID may collide with an existing one. This is being done irrespective of the ipcmni mode. In order to avoid any races, the index is first allocated and then the pointer is replaced. Changes compared to the initial patch: - Handle failures from idr_alloc(). - Avoid that concurrent operations can see the wrong sequence number. (This is achieved by using idr_replace()). - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to ipcmni_seq_shift(). - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max(). Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 5ac893b8 Tue May 14 16:46:29 MDT 2019 Waiman Long <longman@redhat.com> ipc: allow boot time extension of IPCMNI from 32k to 16M The maximum number of unique System V IPC identifiers was limited to 32k. That limit should be big enough for most use cases. However, there are some users out there requesting for more, especially those that are migrating from Solaris which uses 24 bits for unique identifiers. To satisfy the need of those users, a new boot time kernel option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This is a 512X increase which should be big enough for users out there that need a large number of unique IPC identifier. The use of this new option will change the pattern of the IPC identifiers returned by functions like shmget(2). An application that depends on such pattern may not work properly. So it should only be used if the users really need more than 32k of unique IPC numbers. This new option does have the side effect of reducing the maximum number of unique sequence numbers from 64k down to 128. So it is a trade-off. The computation of a new IPC id is not done in the performance critical path. So a little bit of additional overhead shouldn't have any real performance impact. Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 0cfb6aee Fri Sep 08 17:17:55 MDT 2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com> ipc: optimize semget/shmget/msgget for lots of keys ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Serge Hallyn <serge@hallyn.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Cc: Marc Pardo <marc.pardo@supersonicimagine.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 206fa940 Tue Nov 12 16:11:46 MST 2013 Xie XiuQi <xiexiuqi@huawei.com> ipc/util.c: remove unnecessary work pending test Remove unnecessary work pending test before calling schedule_work(). It has been tested in queue_work_on() already. No functional changed. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 18ccee26 Wed Oct 16 14:46:45 MDT 2013 Davidlohr Bueso <davidlohr@hp.com> ipc: update locking scheme comments The initial documentation was a bit incomplete, update accordingly. [akpm@linux-foundation.org: make it more readable in 80 columns] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c1d7e01d Mon Jul 30 15:42:46 MDT 2012 Will Deacon <will@kernel.org> ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION Rather than #define the options manually in the architecture code, add Kconfig options for them and select them there instead. This also allows us to select the compat IPC version parsing automatically for platforms using the old compat IPC interface. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c59ede7b Wed Jan 11 01:17:46 MST 2006 Randy Dunlap <rdunlap@infradead.org> [PATCH] move capable() to capability.h - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | msg.c | diff 889b3317 Mon Feb 03 18:34:46 MST 2020 Lu Shuaibing <shuaibinglu@126.com> ipc/msg.c: consolidate all xxxctl_down() functions A use of uninitialized memory in msgctl_down() because msqid64 in ksys_msgctl hasn't been initialized. The local | msqid64 | is created in ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64 is never initialized before msgctl_down() checks msqid64->msg_qbytes. KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool) reports: ================================================================== BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300 Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022 CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0x75/0xae __kumsan_report+0x17c/0x3e6 kumsan_report+0xe/0x20 msgctl_down+0x94/0x300 ksys_msgctl.constprop.14+0xef/0x260 do_syscall_64+0x7e/0x1f0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4400e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970 R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000 The buggy address belongs to the page: page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x100000000000000() raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kumsan: bad access detected ================================================================== Syzkaller reproducer: msgctl$IPC_RMID(0x0, 0x0) C reproducer: // autogenerated by syzkaller (https://github.com/google/syzkaller) int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); syscall(__NR_msgctl, 0, 0, 0); return 0; } [natechancellor@gmail.com: adjust indentation in ksys_msgctl] Link: https://github.com/ClangBuiltLinux/linux/issues/829 Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com Signed-off-by: Lu Shuaibing <shuaibinglu@126.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: NeilBrown <neilb@suse.com> From: Andrew Morton <akpm@linux-foundation.org> Subject: drivers/block/null_blk_main.c: fix layout Each line here overflows 80 cols by exactly one character. Delete one tab per line to fix. Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 0cfb6aee Fri Sep 08 17:17:55 MDT 2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com> ipc: optimize semget/shmget/msgget for lots of keys ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Serge Hallyn <serge@hallyn.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Cc: Marc Pardo <marc.pardo@supersonicimagine.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 194a6b5b Thu Nov 17 09:46:38 MST 2016 Waiman Long <longman@redhat.com> sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q Currently the wake_q data structure is defined by the WAKE_Q() macro. This macro, however, looks like a function doing something as "wake" is a verb. Even checkpatch.pl was confused as it reported warnings like WARNING: Missing a blank line after declarations #548: FILE: kernel/futex.c:3665: + int ret; + WAKE_Q(wake_q); This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies what the macro is doing and eliminates the checkpatch.pl warnings. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com [ Resolved conflict and added missing rename. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> diff 4bb6657d Fri Jun 06 15:37:46 MDT 2014 Davidlohr Bueso <davidlohr@hp.com> ipc,msg: document volatile r_msg The need for volatile is not obvious, document it. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 4f87dac3 Mon Mar 10 07:46:07 MDT 2014 Michael Kerrisk <mtk.manpages@gmail.com> ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation While testing and documenting the msgrcv() MSG_COPY flag that Stanislav Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue copy feature" => kernel 3.8), I discovered a couple of bugs in the implementation. The two bugs concern MSG_COPY interactions with other msgrcv() flags, namely: (A) MSG_COPY + MSG_EXCEPT (B) MSG_COPY + !IPC_NOWAIT The bugs are distinct (and the fix for the first one is obvious), however my fix for both is a single-line patch, which is why I'm combining them in a single mail, rather than writing two mails+patches. ===== (A) MSG_COPY + MSG_EXCEPT ===== With the addition of the MSG_COPY flag, there are now two msgrcv() flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp' argument in unrelated ways. Specifying both in the same call is a logical error that is currently permitted, with the effect that MSG_COPY has priority and MSG_EXCEPT is ignored. The call should give an error if both flags are specified. The patch below implements that behavior. ===== (B) (B) MSG_COPY + !IPC_NOWAIT ===== The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC message queue copy feature test") shows MSG_COPY being used in conjunction with IPC_NOWAIT. In other words, if there is no message at the position 'msgtyp'. return immediately with the error in ENOMSG. What was not (fully) tested is the behavior if MSG_COPY is specified *without* IPC_NOWAIT, and there is an odd behavior. If the queue contains less than 'msgtyp' messages, then the call blocks until the next message is written to the queue. At that point, the msgrcv() call returns a copy of the newly added message, regardless of whether that message is at the ordinal position 'msgtyp'. This is clearly bogus, and problematic for applications that might want to make use of the MSG_COPY flag. I considered the following possible solutions to this problem: (1) Force the call to block until a message *does* appear at the position 'msgtyp'. (2) If the MSG_COPY flag is specified, the kernel should implicitly add IPC_NOWAIT, so that the call fails with ENOMSG for this case. (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate an error (probably, EINVAL is the right one). I do not know if any application would really want to have the functionality of solution (1), especially since an application can determine in advance the number of messages in the queue using msgctl() IPC_STAT. Obviously, this solution would be the most work to implement. Solution (2) would have the effect of silently fixing any applications that tried to employ broken behavior. However, it would mean that if we later decided to implement solution (1), then user-space could not easily detect what the kernel supports (but, since I'm somewhat doubtful that solution (1) is needed, I'm not sure that this is much of a problem). Solution (3) would have the effect of informing broken applications that they are doing something broken. The downside is that this would cause a ABI breakage for any applications that are currently employing the broken behavior. However: a) Those applications are almost certainly not getting the results they expect. b) Possibly, those applications don't even exist, because MSG_COPY is currently hidden behind CONFIG_CHECKPOINT_RESTORE. The upside of solution (3) is that if we later decided to implement solution (1), user-space could determine what the kernel supports, via the error return. In my view, solution (3) is mildly preferable to solution (2), and solution (1) could still be done later if anyone really cares. The patch below implements solution (3). PS. For anyone out there still listening, it's the usual story: documenting an API (and the thinking about, and the testing of the API, that documentation entails) is the one of the single best ways of finding bugs in the API, as I've learned from a lot of experience. Best to do that documentation before releasing the API. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff dfcceb26 Thu Jun 05 23:46:38 MDT 2008 Nadia Derbey <Nadia.Derbey@bull.net> ipc: only output msgmni value at boot time When posting: [PATCH 1/8] Scaling msgmni to the amount of lowmem (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message that is output each time msgmni is recomputed. In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this message references an ipc namespace address that is useless. I first thought of using an audit_log instead of a printk, as suggested by Serge Hallyn. But unfortunately, we do not have any other information than the namespace address to provide here too. So I chose to move the message and output it only at boot time, removing the reference to the namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c59ede7b Wed Jan 11 01:17:46 MST 2006 Randy Dunlap <rdunlap@infradead.org> [PATCH] move capable() to capability.h - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | shm.c | diff 01293a62 Tue Sep 06 13:48:58 MDT 2022 Liam R. Howlett <Liam.Howlett@Oracle.com> ipc/shm: use VMA iterator instead of linked list The VMA iterator is faster than the linked llist, and it can be walked even when VMAs are being removed from the address space, so there's no need to keep track of 'next'. Link: https://lkml.kernel.org/r/20220906194824.2110408-46-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 9c21dae2 Tue Sep 04 16:46:02 MDT 2018 Davidlohr Bueso <dbueso@suse.de> ipc/shm: properly return EIDRM in shm_lock() When getting rid of the general ipc_lock(), this was missed furthermore, making the comment around the ipc object validity check bogus. Under EIDRM conditions, callers will in turn not see the error and continue with the operation. Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5 Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian Fixes: 82061c57ce9 ("ipc: drop ipc_lock()") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 0cfb6aee Fri Sep 08 17:17:55 MDT 2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com> ipc: optimize semget/shmget/msgget for lots of keys ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Serge Hallyn <serge@hallyn.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Cc: Marc Pardo <marc.pardo@supersonicimagine.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff e1832f29 Thu Aug 06 16:46:55 MDT 2015 Stephen Smalley <sds@tycho.nsa.gov> ipc: use private shmem or hugetlbfs inodes for shm segments. The shm implementation internally uses shmem or hugetlbfs inodes for shm segments. As these inodes are never directly exposed to userspace and only accessed through the shm operations which are already hooked by security modules, mark the inodes with the S_PRIVATE flag so that inode security initialization and permission checking is skipped. This was motivated by the following lockdep warning: ====================================================== [ INFO: possible circular locking dependency detected ] 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W ------------------------------------------------------- httpd/1597 is trying to acquire lock: (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130 but task is already holding lock: (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: lock_acquire+0xc7/0x270 __might_fault+0x7a/0xa0 filldir+0x9e/0x130 xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs] xfs_readdir+0x1b4/0x330 [xfs] xfs_file_readdir+0x2b/0x30 [xfs] iterate_dir+0x97/0x130 SyS_getdents+0x91/0x120 entry_SYSCALL_64_fastpath+0x12/0x76 -> #2 (&xfs_dir_ilock_class){++++.+}: lock_acquire+0xc7/0x270 down_read_nested+0x57/0xa0 xfs_ilock+0x167/0x350 [xfs] xfs_ilock_attr_map_shared+0x38/0x50 [xfs] xfs_attr_get+0xbd/0x190 [xfs] xfs_xattr_get+0x3d/0x70 [xfs] generic_getxattr+0x4f/0x70 inode_doinit_with_dentry+0x162/0x670 sb_finish_set_opts+0xd9/0x230 selinux_set_mnt_opts+0x35c/0x660 superblock_doinit+0x77/0xf0 delayed_superblock_init+0x10/0x20 iterate_supers+0xb3/0x110 selinux_complete_init+0x2f/0x40 security_load_policy+0x103/0x600 sel_write_load+0xc1/0x750 __vfs_write+0x37/0x100 vfs_write+0xa9/0x1a0 SyS_write+0x58/0xd0 entry_SYSCALL_64_fastpath+0x12/0x76 ... Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Morten Stevens <mstevens@fedoraproject.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 54cb8821 Thu Jul 19 02:46:59 MDT 2007 Nick Piggin <npiggin@suse.de> mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 516dffdc Thu Mar 01 16:46:08 MST 2007 Adam Litke <agl@us.ibm.com> [PATCH] Fix get_unmapped_area and fsync for hugetlb shm segments This patch provides the following hugetlb-related fixes to the recent stacked shm files changes: - Update is_file_hugepages() so it will reconize hugetlb shm segments. - get_unmapped_area must be called with the nested file struct to handle the sfd->file->f_ops->get_unmapped_area == NULL case. - The fsync f_op must be wrapped since it is specified in the hugetlbfs f_ops. This is based on proposed fixes from Eric Biederman that were debugged and tested by me. Without it, attempting to use hugetlb shared memory segments on powerpc (and likely ia64) will kill your box. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c59ede7b Wed Jan 11 01:17:46 MST 2006 Randy Dunlap <rdunlap@infradead.org> [PATCH] move capable() to capability.h - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | sem.c | diff 0cfb6aee Fri Sep 08 17:17:55 MDT 2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com> ipc: optimize semget/shmget/msgget for lots of keys ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Serge Hallyn <serge@hallyn.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com> Cc: Marc Pardo <marc.pardo@supersonicimagine.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff b5fa01a2 Wed Dec 14 16:06:46 MST 2016 Davidlohr Bueso <dave@stgolabs.net> ipc/sem: simplify wait-wake loop Instead of using the reverse goto, we can simplify the flow and make it more language natural by just doing do-while instead. One would hope this is the standard way (or obviously just with a while bucle) that we do wait/wakeup handling in the kernel. The exact same logic is kept, just more indented. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1478708774-28826-2-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6e224f94 Wed Oct 16 14:46:45 MDT 2013 Manfred Spraul <manfred@colorfullife.com> ipc/sem.c: synchronize semop and semctl with IPC_RMID After acquiring the semlock spinlock, operations must test that the array is still valid. - semctl() and exit_sem() would walk stale linked lists (ugly, but should be ok: all lists are empty) - semtimedop() would sleep forever - and if woken up due to a signal - access memory after free. The patch also: - standardizes the tests for .deleted, so that all tests in one function leave the function with the same approach. - unconditionally tests for .deleted immediately after every call to sem_lock - even it it means that for semctl(GETALL), .deleted will be tested twice. Both changes make the review simpler: After every sem_lock, there must be a test of .deleted, followed by a goto to the cleanup code (if the function uses "goto cleanup"). The only exception is semctl_down(): If sem_ids().rwsem is locked, then the presence in ids->ipcs_idr is equivalent to !.deleted, thus no additional test is required. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6ff37972 Tue Apr 29 02:00:46 MDT 2008 Pierre Peiffer <pierre.peiffer@bull.net> IPC/semaphores: code factorisation Trivial patch which adds some small locking functions and makes use of them to factorize some part of the code and to make it cleaner. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c59ede7b Wed Jan 11 01:17:46 MST 2006 Randy Dunlap <rdunlap@infradead.org> [PATCH] move capable() to capability.h - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/kernel/ | ||
H A D | acct.c | diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 46c0a8ca Fri Jun 06 15:37:37 MDT 2014 Paul McQuade <paulmcquad@gmail.com> ipc, kernel: clear whitespace trailing whitespace Signed-off-by: Paul McQuade <paulmcquad@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6248b1b3 Fri Jul 25 02:48:46 MDT 2008 Pavel Emelyanov <xemul@openvz.org> bsdacct: make internal code work with passed bsd_acct_struct, not global This adds the appropriate pointer to all the internal (i.e. static) functions that work with global acct instance. API calls pass a global instance to them (while we still have such). Mostly this is a s/acct_globals./acct->/ over the file. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff c59ede7b Wed Jan 11 01:17:46 MST 2006 Randy Dunlap <rdunlap@infradead.org> [PATCH] move capable() to capability.h - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
Completed in 559 milliseconds