#
cd14b018 |
|
11-Feb-2024 |
Masahiro Yamada <masahiroy@kernel.org> |
treewide: replace or remove redundant def_bool in Kconfig files 'def_bool X' is a shorthand for 'bool' plus 'default X'. 'def_bool' is redundant where 'bool' is already present, so 'def_bool X' can be replaced with 'default X', or removed if X is 'n'. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
4ff4c745 |
|
31-Jan-2024 |
Andrea Parri <parri.andrea@gmail.com> |
locking: Introduce prepare_sync_core_cmd() Introduce an architecture function that architectures can use to set up ("prepare") SYNC_CORE commands. The function will be used by RISC-V to update its "deferred icache- flush" data structures (icache_stale_mask). Architectures defining prepare_sync_core_cmd() static inline need to select ARCH_HAS_PREPARE_SYNC_CORE_CMD. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-4-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
fd0a68a2 |
|
15-Feb-2024 |
Tejun Heo <tj@kernel.org> |
workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK 2f34d7337d98 ("workqueue: Fix queue_work_on() with BH workqueues") added irq_work usage to workqueue; however, it turns out irq_work is actually optional and the change breaks build on configuration which doesn't have CONFIG_IRQ_WORK enabled. Fix build by making workqueue use irq_work only when CONFIG_SMP and enabling CONFIG_IRQ_WORK when CONFIG_SMP is set. It's reasonable to argue that it may be better to just always enable it. However, this still saves a small bit of memory for tiny UP configs and also the least amount of change, so, for now, let's keep it conditional. Verified to do the right thing for x86_64 allnoconfig and defconfig, and aarch64 allnoconfig, allnoconfig + prink disable (SMP but nothing selects IRQ_WORK) and a modified aarch64 Kconfig where !SMP and nothing selects IRQ_WORK. v2: `depends on SMP` leads to Kconfig warnings when CONFIG_IRQ_WORK is selected by something else when !CONFIG_SMP. Use `def_bool y if SMP` instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: 2f34d7337d98 ("workqueue: Fix queue_work_on() with BH workqueues") Cc: Stephen Rothwell <sfr@canb.auug.org.au>
|
#
e55dad12 |
|
04-Feb-2024 |
Masahiro Yamada <masahiroy@kernel.org> |
bpf: Merge two CONFIG_BPF entries 'config BPF' exists in both init/Kconfig and kernel/bpf/Kconfig. Commit b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf with core options") added the second one to kernel/bpf/Kconfig instead of moving the existing one. Merge them together. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240204075634.32969-1-masahiroy@kernel.org
|
#
3e00f580 |
|
23-Feb-2024 |
Kees Cook <keescook@chromium.org> |
init/Kconfig: lower GCC version check for -Warray-bounds We continue to see false positives from -Warray-bounds even in GCC 10, which is getting reported in a few places[1] still: security/security.c:811:2: warning: `memcpy' offset 32 is out of the bounds [0, 0] [-Warray-bounds] Lower the GCC version check from 11 to 10. Link: https://lkml.kernel.org/r/20240223170824.work.768-kees@kernel.org Reported-by: Lu Yao <yaolu@kylinos.cn> Closes: https://lore.kernel.org/lkml/20240117014541.8887-1-yaolu@kylinos.cn/ Link: https://lore.kernel.org/linux-next/65d84438.620a0220.7d171.81a7@mx.google.com [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Paul Moore <paul@paul-moore.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marc Aurèle La France <tsi@tuyoix.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
68fb3ca0 |
|
15-Feb-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
update workarounds for gcc "asm goto" issue In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs") I did the gcc workaround unconditionally, because the cause of the bad code generation wasn't entirely clear. In the meantime, Jakub Jelinek debugged the issue, and has come up with a fix in gcc [2], which also got backported to the still maintained branches of gcc-11, gcc-12 and gcc-13. Note that while the fix technically wasn't in the original gcc-14 branch, Jakub says: "while it is true that no GCC 14 snapshots until today (or whenever the fix will be committed) have the fix, for GCC trunk it is up to the distros to use the latest snapshot if they use it at all and would allow better testing of the kernel code without the workaround, so that if there are other issues they won't be discovered years later. Most userland code doesn't actually use asm goto with outputs..." so we will consider gcc-14 to be fixed - if somebody is using gcc snapshots of the gcc-14 before the fix, they should upgrade. Note that while the bug goes back to gcc-11, in practice other gcc changes seem to have effectively hidden it since gcc-12.1 as per a bisect by Jakub. So even a gcc-14 snapshot without the fix likely doesn't show actual problems. Also, make the default 'asm_goto_output()' macro mark the asm as volatile by hand, because of an unrelated gcc issue [1] where it doesn't match the documented behavior ("asm goto is always volatile"). Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2] Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Requested-by: Jakub Jelinek <jakub@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
02153319 |
|
01-Feb-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Kconfig: Disable -Wstringop-overflow for GCC globally It turns out it was never just gcc-11 that was broken. Apparently it just happens to work on x86-64 with other gcc versions. On arm64, I see warnings with gcc version 13.2.1, and the kernel test robot reports the same problem on s390 with gcc 13.2.0. Admittedly it seems to be just the new Xe drm driver, but this is keeping me from doing my normal arm64 build testing. So it gets reverted until somebody figures out what causes the problem (and why it doesn't show on x86-64, which is what makes me suspect it was never just about gcc-11, and more about just random happenstance). This also changes the Kconfig naming a bit - just make the "disable this for GCC" conditional be one simple Kconfig entry, and we can put the gcc version dependencies in that entry once we figure out what the correct rules are. The version dependency _may_ still end up being "gcc version larger than 11" if the issue is purely in the Xe driver, but even if that ends up the case, let's make that all part of the "GCC_NO_STRINGOP_OVERFLOW" logic. For now, we just disable it for all gcc versions while the exact cause is unknown. Link: https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/ Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a5e0ace0 |
|
30-Nov-2023 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
init: Kconfig: Disable -Wstringop-overflow for GCC-11 -Wstringop-overflow is buggy in GCC-11. Therefore, we should disable this option specifically for that compiler version. To achieve this, we introduce a new configuration option: GCC11_NO_STRINGOP_OVERFLOW. The compiler option related to string operation overflow is now managed under configuration CC_STRINGOP_OVERFLOW. This option is enabled by default for all other versions of GCC that support it. Link: https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/ Link: https://lore.kernel.org/lkml/20231128091351.2bfb38dd@canb.auug.org.au/ Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/linux-hardening/ZWj1+jkweEDWbmAR@work/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
a751ea34 |
|
07-Dec-2023 |
Randy Dunlap <rdunlap@infradead.org> |
init/Kconfig: move more items into the EXPERT menu KCMP, RSEQ, CACHESTAT_SYSCALL, and PC104 depend on EXPERT but not shown in the EXPERT menu. Move some lines around so that they are displayed in the EXPERT menu. Drop one useless comment. Change "enabled" to "enable" for DEBUG_RSEQ. Link: https://lkml.kernel.org/r/20231208045819.2922-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
cf8e8658 |
|
20-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arch: Remove Itanium (IA-64) architecture The Itanium architecture is obsolete, and an informal survey [0] reveals that any residual use of Itanium hardware in production is mostly HP-UX or OpenVMS based. The use of Linux on Itanium appears to be limited to enthusiasts that occasionally boot a fresh Linux kernel to see whether things are still working as intended, and perhaps to churn out some distro packages that are rarely used in practice. None of the original companies behind Itanium still produce or support any hardware or software for the architecture, and it is listed as 'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers that contributed on behalf of those companies (nor anyone else, for that matter) have been willing to support or maintain the architecture upstream or even be responsible for applying the odd fix. The Intel firmware team removed all IA-64 support from the Tianocore/EDK2 reference implementation of EFI in 2018. (Itanium is the original architecture for which EFI was developed, and the way Linux supports it deviates significantly from other architectures.) Some distros, such as Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have dropped support years ago. While the argument is being made [1] that there is a 'for the common good' angle to being able to build and run existing projects such as the Grid Community Toolkit [2] on Itanium for interoperability testing, the fact remains that none of those projects are known to be deployed on Linux/ia64, and very few people actually have access to such a system in the first place. Even if there were ways imaginable in which Linux/ia64 could be put to good use today, what matters is whether anyone is actually doing that, and this does not appear to be the case. There are no emulators widely available, and so boot testing Itanium is generally infeasible for ordinary contributors. GCC still supports IA-64 but its compile farm [3] no longer has any IA-64 machines. GLIBC would like to get rid of IA-64 [4] too because it would permit some overdue code cleanups. In summary, the benefits to the ecosystem of having IA-64 be part of it are mostly theoretical, whereas the maintenance overhead of keeping it supported is real. So let's rip off the band aid, and remove the IA-64 arch code entirely. This follows the timeline proposed by the Debian/ia64 maintainer [5], which removes support in a controlled manner, leaving IA-64 in a known good state in the most recent LTS release. Other projects will follow once the kernel support is removed. [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/ [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/ [2] https://gridcf.org/gct-docs/latest/index.html [3] https://cfarm.tetaneutral.net/machines/list/ [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/ [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/ Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
ef815d2c |
|
15-Aug-2023 |
Randy Dunlap <rdunlap@infradead.org> |
treewide: drop CONFIG_EMBEDDED There is only one Kconfig user of CONFIG_EMBEDDED and it can be switched to EXPERT or "if !ARCH_MULTIPLATFORM" (suggested by Arnd). Link: https://lkml.kernel.org/r/20230816055010.31534-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [RISC-V] Acked-by: Greg Ungerer <gerg@linux-m68k.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Brian Cain <bcain@quicinc.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
89cde455 |
|
11-Jul-2023 |
Eric DeVolder <eric.devolder@oracle.com> |
kexec: consolidate kexec and crash options into kernel/Kconfig.kexec Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6. The Kconfig is refactored to consolidate KEXEC and CRASH options from various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec. The Kconfig.kexec is now a submenu titled "Kexec and crash features" located under "General Setup". The following options are impacted: - KEXEC - KEXEC_FILE - KEXEC_SIG - KEXEC_SIG_FORCE - KEXEC_IMAGE_VERIFY_SIG - KEXEC_BZIMAGE_VERIFY_SIG - KEXEC_JUMP - CRASH_DUMP Over time, these options have been copied between Kconfig files and are very similar to one another, but with slight differences. The following architectures are impacted by the refactor (because of use of one or more KEXEC/CRASH options): - arm - arm64 - ia64 - loongarch - m68k - mips - parisc - powerpc - riscv - s390 - sh - x86 More information: In the patch series "crash: Kernel handling of CPU and memory hot un/plug" https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/ the new kernel feature introduces the config option CRASH_HOTPLUG. In reviewing, Thomas Gleixner requested that the new config option not be placed in x86 Kconfig. Rather the option needs a generic/common home. To Thomas' point, the KEXEC and CRASH options have largely been duplicated in the various arch/<arch>/Kconfig files, with minor differences. This kind of proliferation is to be avoid/stopped. https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/ To that end, I have refactored the arch Kconfigs so as to consolidate the various KEXEC and CRASH options. Generally speaking, this work has the following themes: - KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec - These items from arch/Kconfig: CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC - These items from arch/x86/Kconfig form the common options: KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP - These items from arch/arm64/Kconfig form the common options: KEXEC_IMAGE_VERIFY_SIG - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec - The Kconfig.kexec is now a submenu titled "Kexec and crash features" and is now listed in "General Setup" submenu from init/Kconfig. - To control the common options, each has a new ARCH_SUPPORTS_<option> option. These gateway options determine whether the common options options are valid for the architecture. - To account for the slight differences in the original architecture coding of the common options, each now has a corresponding ARCH_SELECTS_<option> which are used to elicit the same side effects as the original arch/<arch>/Kconfig files for KEXEC and CRASH options. An example, 'make menuconfig' illustrating the submenu: > General setup > Kexec and crash features [*] Enable kexec system call [*] Enable kexec file based system call [*] Verify kernel signature during kexec_file_load() syscall [ ] Require a valid signature in kexec_file_load() syscall [ ] Enable bzImage signature verification support [*] kexec jump [*] kernel crash dumps [*] Update the crash elfcorehdr on system configuration changes In the process of consolidating the common options, I encountered slight differences in the coding of these options in several of the architectures. As a result, I settled on the following solution: - Each of the common options has a 'depends on ARCH_SUPPORTS_<option>' statement. For example, the KEXEC_FILE option has a 'depends on ARCH_SUPPORTS_KEXEC_FILE' statement. This approach is needed on all common options so as to prevent options from appearing for architectures which previously did not allow/enable them. For example, arm supports KEXEC but not KEXEC_FILE. The arch/arm/Kconfig does not provide ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options are not available to arm. - The boolean ARCH_SUPPORTS_<option> in effect allows the arch to determine when the feature is allowed. Archs which don't have the feature simply do not provide the corresponding ARCH_SUPPORTS_<option>. For each arch, where there previously were KEXEC and/or CRASH options, these have been replaced with the corresponding boolean ARCH_SUPPORTS_<option>, and an appropriate def_bool statement. For example, if the arch supports KEXEC_FILE, then the ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits the KEXEC_FILE option to be available. If the arch has a 'depends on' statement in its original coding of the option, then that expression becomes part of the def_bool expression. For example, arm64 had: config KEXEC depends on PM_SLEEP_SMP and in this solution, this converts to: config ARCH_SUPPORTS_KEXEC def_bool PM_SLEEP_SMP - In order to account for the architecture differences in the coding for the common options, the ARCH_SELECTS_<option> in the arch/<arch>/Kconfig is used. This option has a 'depends on <option>' statement to couple it to the main option, and from there can insert the differences from the common option and the arch original coding of that option. For example, a few archs enable CRYPTO and CRYTPO_SHA256 for KEXEC_FILE. These require a ARCH_SELECTS_KEXEC_FILE and 'select CRYPTO' and 'select CRYPTO_SHA256' statements. Illustrating the option relationships: For each of the common KEXEC and CRASH options: ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option> <option> # in Kconfig.kexec ARCH_SUPPORTS_<option> # in arch/<arch>/Kconfig, as needed ARCH_SELECTS_<option> # in arch/<arch>/Kconfig, as needed For example, KEXEC: ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC KEXEC # in Kconfig.kexec ARCH_SUPPORTS_KEXEC # in arch/<arch>/Kconfig, as needed ARCH_SELECTS_KEXEC # in arch/<arch>/Kconfig, as needed To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be enabled, and the ARCH_SELECTS_<option> handles side effects (ie. select statements). Examples: A few examples to show the new strategy in action: ===== x86 (minus the help section) ===== Original: config KEXEC bool "kexec system call" select KEXEC_CORE config KEXEC_FILE bool "kexec file based system call" select KEXEC_CORE select HAVE_IMA_KEXEC if IMA depends on X86_64 depends on CRYPTO=y depends on CRYPTO_SHA256=y config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE config KEXEC_SIG bool "Verify kernel signature during kexec_file_load() syscall" depends on KEXEC_FILE config KEXEC_SIG_FORCE bool "Require a valid signature in kexec_file_load() syscall" depends on KEXEC_SIG config KEXEC_BZIMAGE_VERIFY_SIG bool "Enable bzImage signature verification support" depends on KEXEC_SIG depends on SIGNED_PE_FILE_VERIFICATION select SYSTEM_TRUSTED_KEYRING config CRASH_DUMP bool "kernel crash dumps" depends on X86_64 || (X86_32 && HIGHMEM) config KEXEC_JUMP bool "kexec jump" depends on KEXEC && HIBERNATION help becomes... New: config ARCH_SUPPORTS_KEXEC def_bool y config ARCH_SUPPORTS_KEXEC_FILE def_bool X86_64 && CRYPTO && CRYPTO_SHA256 config ARCH_SELECTS_KEXEC_FILE def_bool y depends on KEXEC_FILE select HAVE_IMA_KEXEC if IMA config ARCH_SUPPORTS_KEXEC_PURGATORY def_bool KEXEC_FILE config ARCH_SUPPORTS_KEXEC_SIG def_bool y config ARCH_SUPPORTS_KEXEC_SIG_FORCE def_bool y config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG def_bool y config ARCH_SUPPORTS_KEXEC_JUMP def_bool y config ARCH_SUPPORTS_CRASH_DUMP def_bool X86_64 || (X86_32 && HIGHMEM) ===== powerpc (minus the help section) ===== Original: config KEXEC bool "kexec system call" depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP) select KEXEC_CORE config KEXEC_FILE bool "kexec file based system call" select KEXEC_CORE select HAVE_IMA_KEXEC if IMA select KEXEC_ELF depends on PPC64 depends on CRYPTO=y depends on CRYPTO_SHA256=y config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE config CRASH_DUMP bool "Build a dump capture kernel" depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) select RELOCATABLE if PPC64 || 44x || PPC_85xx becomes... New: config ARCH_SUPPORTS_KEXEC def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP) config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y config ARCH_SUPPORTS_KEXEC_PURGATORY def_bool KEXEC_FILE config ARCH_SELECTS_KEXEC_FILE def_bool y depends on KEXEC_FILE select KEXEC_ELF select HAVE_IMA_KEXEC if IMA config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) config ARCH_SELECTS_CRASH_DUMP def_bool y depends on CRASH_DUMP select RELOCATABLE if PPC64 || 44x || PPC_85xx Testing Approach and Results There are 388 config files in the arch/<arch>/configs directories. For each of these config files, a .config is generated both before and after this Kconfig series, and checked for equivalence. This approach allows for a rather rapid check of all architectures and a wide variety of configs wrt/ KEXEC and CRASH, and avoids requiring compiling for all architectures and running kernels and run-time testing. For each config file, the olddefconfig, allnoconfig and allyesconfig targets are utilized. In testing the randconfig has revealed problems as well, but is not used in the before and after equivalence check since one can not generate the "same" .config for before and after, even if using the same KCONFIG_SEED since the option list is different. As such, the following script steps compare the before and after of 'make olddefconfig'. The new symbols introduced by this series are filtered out, but otherwise the config files are PASS only if they were equivalent, and FAIL otherwise. The script performs the test by doing the following: # Obtain the "golden" .config output for given config file # Reset test sandbox git checkout master git branch -D test_Kconfig git checkout -B test_Kconfig master make distclean # Write out updated config cp -f <config file> .config make ARCH=<arch> olddefconfig # Track each item in .config, LHSB is "golden" scoreboard .config # Obtain the "changed" .config output for given config file # Reset test sandbox make distclean # Apply this Kconfig series git am <this Kconfig series> # Write out updated config cp -f <config file> .config make ARCH=<arch> olddefconfig # Track each item in .config, RHSB is "changed" scoreboard .config # Determine test result # Filter-out new symbols introduced by this series # Filter-out symbol=n which not in either scoreboard # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL The script was instrumental during the refactoring of Kconfig as it continually revealed problems. The end result being that the solution presented in this series passes all configs as checked by the script, with the following exceptions: - arch/ia64/configs/zx1_config with olddefconfig This config file has: # CONFIG_KEXEC is not set CONFIG_CRASH_DUMP=y and this refactor now couples KEXEC to CRASH_DUMP, so it is not possible to enable CRASH_DUMP without KEXEC. - arch/sh/configs/* with allyesconfig The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU (which clearly is not meant to be set). This symbol is not provided but with the allyesconfig it is set to yes which enables CRASH_DUMP. But KEXEC is coded as dependent upon MMU, and is set to no in arch/sh/mm/Kconfig, so KEXEC is not enabled. This refactor now couples KEXEC to CRASH_DUMP, so it is not possible to enable CRASH_DUMP without KEXEC. While the above exceptions are not equivalent to their original, the config file produced is valid (and in fact better wrt/ CRASH_DUMP handling). This patch (of 14) The config options for kexec and crash features are consolidated into new file kernel/Kconfig.kexec. Under the "General Setup" submenu is a new submenu "Kexec and crash handling". All the kexec and crash options that were once in the arch-dependent submenu "Processor type and features" are now consolidated in the new submenu. The following options are impacted: - KEXEC - KEXEC_FILE - KEXEC_SIG - KEXEC_SIG_FORCE - KEXEC_BZIMAGE_VERIFY_SIG - KEXEC_JUMP - CRASH_DUMP The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP. Architectures specify support of certain KEXEC and CRASH features with similarly named new ARCH_SUPPORTS_<option> config options. Architectures can utilize the new ARCH_SELECTS_<option> config options to specify additional components when <option> is enabled. To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be enabled, and the ARCH_SELECTS_<option> handles side effects (ie. select statements). Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Cc. "H. Peter Anvin" <hpa@zytor.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86 Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Juerg Haefliger <juerg.haefliger@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Marc Aurèle La France <tsi@tuyoix.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Xin Li <xin3.li@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
98dfdd9e |
|
30-Jul-2023 |
Randy Dunlap <rdunlap@infradead.org> |
sched/psi: Select KERNFS as needed Users of KERNFS should select it to enforce its being built, so do this to prevent a build error. In file included from ../kernel/sched/build_utility.c:97: ../kernel/sched/psi.c: In function 'psi_trigger_poll': ../kernel/sched/psi.c:1479:17: error: implicit declaration of function 'kernfs_generic_poll' [-Werror=implicit-function-declaration] 1479 | kernfs_generic_poll(t->of, wait); Fixes: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: lore.kernel.org/r/202307310732.r65EQFY0-lkp@intel.com
|
#
cf264e13 |
|
02-May-2023 |
Nhat Pham <nphamcs@gmail.com> |
cachestat: implement cachestat syscall There is currently no good way to query the page cache state of large file sets and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really doesn not care about per-page information in that case. The user also needs to mmap and unmap each file as it goes along, which can be quite slow as well. Some use cases where this information could come in handy: * Allowing database to decide whether to perform an index scan or direct table queries based on the in-memory cache state of the index. * Visibility into the writeback algorithm, for performance issues diagnostic. * Workload-aware writeback pacing: estimating IO fulfilled by page cache (and IO to be done) within a range of a file, allowing for more frequent syncing when and where there is IO capacity, and batching when there is not. * Computing memory usage of large files/directory trees, analogous to the du tool for disk usage. More information about these use cases could be found in the following thread: https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/ This patch implements a new syscall that queries cache state of a file and summarizes the number of cached pages, number of dirty pages, number of pages marked for writeback, number of (recently) evicted pages, etc. in a given range. Currently, the syscall is only wired in for x86 architecture. NAME cachestat - query the page cache statistics of a file. SYNOPSIS #include <sys/mman.h> struct cachestat_range { __u64 off; __u64 len; }; struct cachestat { __u64 nr_cache; __u64 nr_dirty; __u64 nr_writeback; __u64 nr_evicted; __u64 nr_recently_evicted; }; int cachestat(unsigned int fd, struct cachestat_range *cstat_range, struct cachestat *cstat, unsigned int flags); DESCRIPTION cachestat() queries the number of cached pages, number of dirty pages, number of pages marked for writeback, number of evicted pages, number of recently evicted pages, in the bytes range given by `off` and `len`. An evicted page is a page that is previously in the page cache but has been evicted since. A page is recently evicted if its last eviction was recent enough that its reentry to the cache would indicate that it is actively being used by the system, and that there is memory pressure on the system. These values are returned in a cachestat struct, whose address is given by the `cstat` argument. The `off` and `len` arguments must be non-negative integers. If `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` == 0, we will query in the range from `off` to the end of the file. The `flags` argument is unused for now, but is included for future extensibility. User should pass 0 (i.e no flag specified). Currently, hugetlbfs is not supported. Because the status of a page can change after cachestat() checks it but before it returns to the application, the returned values may contain stale information. RETURN VALUE On success, cachestat returns 0. On error, -1 is returned, and errno is set to indicate the error. ERRORS EFAULT cstat or cstat_args points to an invalid address. EINVAL invalid flags. EBADF invalid file descriptor. EOPNOTSUPP file descriptor is of a hugetlbfs file [nphamcs@gmail.com: replace rounddown logic with the existing helper] Link: https://lkml.kernel.org/r/20230504022044.3675469-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Brian Foster <bfoster@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
721da5ce |
|
23-Feb-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
driver core: remove CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2 CONFIG_SYSFS_DEPRECATED was added in commit 88a22c985e35 ("CONFIG_SYSFS_DEPRECATED") in 2006 to allow systems with older versions of some tools (i.e. Fedora 3's version of udev) to boot properly. Four years later, in 2010, the option was attempted to be removed as most of userspace should have been fixed up properly by then, but some kernel developers clung to those old systems and refused to update, so we added CONFIG_SYSFS_DEPRECATED_V2 in commit e52eec13cd6b ("SYSFS: Allow boot time switching between deprecated and modern sysfs layout") to allow them to continue to boot properly, and we allowed a boot time parameter to be used to switch back to the old format if needed. Over time, the logic that was covered under these config options was slowly removed from individual driver subsystems successfully, removed, and the only thing that is now left in the kernel are some changes in the block layer's representation in sysfs where real directories are used instead of symlinks like normal. Because the original changes were done to userspace tools in 2006, and all distros that use those tools are long end-of-life, and older non-udev-based systems do not care about the block layer's sysfs representation, it is time to finally remove this old logic and the config entries from the kernel. Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-block@vger.kernel.org Cc: linux-doc@vger.kernel.org Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230223073326.2073220-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
c9929f0e |
|
27-Feb-2023 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slob: remove CONFIG_SLOB Remove SLOB from Kconfig and Makefile. Everything under #ifdef CONFIG_SLOB, and mm/slob.c is now dead code. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
#
0c705be9 |
|
20-Feb-2023 |
Marc Aurèle La France <tsi@tuyoix.net> |
Remove orphaned CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT After the commit 93d102f094be9beab2 ("printk: remove safe buffers"), CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT is no longer useful. Remove it. Signed-off-by: Marc Aurèle La France <tsi@tuyoix.net> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [pmladek@suse.cz: Cleaned up the commit message.] Signed-off-by: Petr Mladek <pmladek@suse.com> Fixes: 93d102f094be9beab ("printk: remove safe buffers") Link: https://lore.kernel.org/r/5c19e248-1b6b-330c-7c4c-a824688daefe@tuyoix.net
|
#
0da6e5fd |
|
23-Apr-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
gcc: disable '-Warray-bounds' for gcc-13 too We started disabling '-Warray-bounds' for gcc-12 originally on s390, because it resulted in some warnings that weren't realistically fixable (commit 8b202ee21839: "s390: disable -Warray-bounds"). That s390-specific issue was then found to be less common elsewhere, but generic (see f0be87c42cbd: "gcc-12: disable '-Warray-bounds' universally for now"), and then later expanded the version check was expanded to gcc-11 (5a41237ad1d4: "gcc: disable -Warray-bounds for gcc-11 too"). And it turns out that I was much too optimistic in thinking that it's all going to go away, and here we are with gcc-13 showing all the same issues. So instead of expanding this one version at a time, let's just disable it for gcc-11+, and put an end limit to it only when we actually find a solution. Yes, I'm sure some of this is because the kernel just does odd things (like our "container_of()" use, but also knowingly playing games with things like linker tables and array layouts). And yes, some of the warnings are likely signs of real bugs, but when there are hundreds of false positives, that doesn't really help. Oh well. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ec61452a |
|
19-Jan-2023 |
Masahiro Yamada <masahiroy@kernel.org> |
scripts: remove bin2c Commit 80f8be7af03f ("tomoyo: Omit use of bin2c") removed the last use of bin2c. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
|
#
6ded8a28 |
|
21-Feb-2023 |
Paul E. McKenney <paulmck@kernel.org> |
bootconfig: Default BOOT_CONFIG_FORCE to y if BOOT_CONFIG_EMBED When a kernel is built with CONFIG_BOOT_CONFIG_EMBED=y, the intention will normally be to unconditionally provide the specified kernel-boot arguments to the kernel, as opposed to requiring a separately provided bootconfig parameter. Therefore, make the BOOT_CONFIG_FORCE Kconfig option default to y in kernels built with CONFIG_BOOT_CONFIG_EMBED=y. The old semantics may be obtained by manually overriding this default. Link: https://lore.kernel.org/all/20230107162202.GA4028633@paulmck-ThinkPad-P17-Gen-1/ Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
#
b743852c |
|
21-Feb-2023 |
Paul E. McKenney <paulmck@kernel.org> |
Allow forcing unconditional bootconfig processing The BOOT_CONFIG family of Kconfig options allows a bootconfig file containing kernel boot parameters to be embedded into an initrd or into the kernel itself. This can be extremely useful when deploying kernels in cases where some of the boot parameters depend on the kernel version rather than on the server hardware, firmware, or workload. Unfortunately, the "bootconfig" kernel parameter must be specified in order to cause the kernel to look for the embedded bootconfig file, and it clearly does not help to embed this "bootconfig" kernel parameter into that file. Therefore, provide a new BOOT_CONFIG_FORCE Kconfig option that causes the kernel to act as if the "bootconfig" kernel parameter had been specified. In other words, kernels built with CONFIG_BOOT_CONFIG_FORCE=y will look for the embedded bootconfig file even when the "bootconfig" kernel parameter is omitted. This permits kernel-version-dependent kernel boot parameters to be embedded into the kernel image without the need to (for example) update large numbers of boot loaders. Link: https://lore.kernel.org/all/20230105005838.GA1772817@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <linux-doc@vger.kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
#
bc636dcb |
|
22-Nov-2022 |
Paul E. McKenney <paulmck@kernel.org> |
init: Remove "select SRCU" Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: John Ogness <john.ogness@linutronix.de>
|
#
c1177979 |
|
10-Jan-2023 |
Martin Rodriguez Reboredo <yakoyoku@gmail.com> |
btf, scripts: Exclude Rust CUs with pahole Version 1.24 of pahole has the capability to exclude compilation units (CUs) of specific languages [1] [2]. Rust, as of writing, is not currently supported by pahole and if it's used with a build that has BTF debugging enabled it results in malformed kernel and module binaries [3]. So it's better for pahole to exclude Rust CUs until support for it arrives. Co-developed-by: Eric Curtin <ecurtin@redhat.com> Signed-off-by: Eric Curtin <ecurtin@redhat.com> Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=49358dfe2aaae4e90b072332c3e324019826783f [1] Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=8ee363790b7437283c53090a85a9fec2f0b0fbc4 [2] Link: https://github.com/Rust-for-Linux/linux/issues/735 [3] Link: https://lore.kernel.org/bpf/20230111152050.559334-1-yakoyoku@gmail.com
|
#
af7f588d |
|
22-Nov-2022 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
sched: Introduce per-memory-map concurrency ID This feature allows the scheduler to expose a per-memory map concurrency ID to user-space. This concurrency ID is within the possible cpus range, and is temporarily (and uniquely) assigned while threads are actively running within a memory map. If a memory map has fewer threads than cores, or is limited to run on few cores concurrently through sched affinity or cgroup cpusets, the concurrency IDs will be values close to 0, thus allowing efficient use of user-space memory for per-cpu data structures. This feature is meant to be exposed by a new rseq thread area field. The primary purpose of this feature is to do the heavy-lifting needed by memory allocators to allow them to use per-cpu data structures efficiently in the following situations: - Single-threaded applications, - Multi-threaded applications on large systems (many cores) with limited cpu affinity mask, - Multi-threaded applications on large systems (many cores) with restricted cgroup cpuset per container. One of the key concern from scheduler maintainers is the overhead associated with additional spin locks or atomic operations in the scheduler fast-path. This is why the following optimization is implemented. On context switch between threads belonging to the same memory map, transfer the mm_cid from prev to next without any atomic ops. This takes care of use-cases involving frequent context switch between threads belonging to the same memory map. Additional optimizations can be done if the spin locks added when context switching between threads belonging to different memory maps end up being a performance bottleneck. Those are left out of this patch though. A performance impact would have to be clearly demonstrated to justify the added complexity. The credit goes to Paul Turner (Google) for the original virtual cpu id idea. This feature is implemented based on the discussions with Paul Turner and Peter Oskolkov (Google), but I took the liberty to implement scheduler fast-path optimizations and my own NUMA-awareness scheme. The rumor has it that Google have been running a rseq vcpu_id extension internally in production for a year. The tcmalloc source code indeed has comments hinting at a vcpu_id prototype extension to the rseq system call [1]. The following benchmarks do not show any significant overhead added to the scheduler context switch by this feature: * perf bench sched messaging (process) Baseline: 86.5±0.3 ms With mm_cid: 86.7±2.6 ms * perf bench sched messaging (threaded) Baseline: 84.3±3.0 ms With mm_cid: 84.7±2.6 ms * hackbench (process) Baseline: 82.9±2.7 ms With mm_cid: 82.9±2.9 ms * hackbench (threaded) Baseline: 85.2±2.6 ms With mm_cid: 84.4±2.9 ms [1] https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linux_syscall_support.h#L26 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-8-mathieu.desnoyers@efficios.com
|
#
0f9c608d |
|
11-Jan-2023 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
init/Kconfig: fix LOCALVERSION_AUTO help text It was never guaranteed to be exactly eight, but since commit 548b8b5168c9 ("scripts/setlocalversion: make git describe output more reliable"), it has been exactly 12. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
19fa92fb |
|
09-Jan-2023 |
Lizzy Fleckenstein <eliasfleckenstein@web.de> |
init/Kconfig: fix typo (usafe -> unsafe) Fix the help text for the PRINTK_SAFE_LOG_BUF_SHIFT setting. Link: https://lkml.kernel.org/r/20230109201837.23873-1-eliasfleckenstein@web.de Signed-off-by: Lizzy Fleckenstein <eliasfleckenstein@web.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
5a41237a |
|
09-Jan-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
gcc: disable -Warray-bounds for gcc-11 too We had already disabled this warning for gcc-12 due to bugs in the value range analysis, but it turns out we end up having some similar problems with gcc-11.3 too, so let's disable it there too. Older gcc versions end up being increasingly less relevant, and hopefully clang and newer version of gcc (ie gcc-13) end up working reliably enough that we still get the build coverage even when we disable this for some versions. Link: https://lore.kernel.org/all/20221227002941.GA2691687@roeck-us.net/ Link: https://lore.kernel.org/all/D8BDBF66-E44C-45D4-9758-BAAA4F0C1998@kernel.org/ Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e1789d7c |
|
25-Oct-2022 |
Xin Li <xin3.li@intel.com> |
kbuild: upgrade the orphan section warning to an error if CONFIG_WERROR is set Andrew Cooper suggested upgrading the orphan section warning to a hard link error. However Nathan Chancellor said outright turning the warning into an error with no escape hatch might be too aggressive, as we have had these warnings triggered by new compiler generated sections, and suggested turning orphan sections into an error only if CONFIG_WERROR is set. Kees Cook echoed and emphasized that the mandate from Linus is that we should avoid breaking builds. It wrecks bisection, it causes problems across compiler versions, etc. Thus upgrade the orphan section warning to a hard link error only if CONFIG_WERROR is set. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Xin Li <xin3.li@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221025073023.16137-2-xin3.li@intel.com
|
#
30f3bb09 |
|
15-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
kallsyms: Add self-test facility Added test cases for basic functions and performance of functions kallsyms_lookup_name(), kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol(). It also calculates the compression rate of the kallsyms compression algorithm for the current symbol set. The basic functions test begins by testing a set of symbols whose address values are known. Then, traverse all symbol addresses and find the corresponding symbol name based on the address. It's impossible to determine whether these addresses are correct, but we can use the above three functions along with the addresses to test each other. Due to the traversal operation of kallsyms_on_each_symbol() is too slow, only 60 symbols can be tested in one second, so let it test on average once every 128 symbols. The other two functions validate all symbols. If the basic functions test is passed, print only performance test results. If the test fails, print error information, but do not perform subsequent performance tests. Start self-test automatically after system startup if CONFIG_KALLSYMS_SELFTEST=y. Example of output content: (prefix 'kallsyms_selftest:' is omitted start --------------------------------------------------------- | nr_symbols | compressed size | original size | ratio(%) | |---------------------------------------------------------| | 107543 | 1357912 | 2407433 | 56.40 | --------------------------------------------------------- kallsyms_lookup_name() looked up 107543 symbols The time spent on each symbol is (ns): min=630, max=35295, avg=7353 kallsyms_on_each_symbol() traverse all: 11782628 ns kallsyms_on_each_match_symbol() traverse all: 9261 ns finish Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
534bd703 |
|
14-Nov-2022 |
Alexandre Belloni <alexandre.belloni@bootlin.com> |
init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash When using dash as /bin/sh, the CC_HAS_ASM_GOTO_TIED_OUTPUT test fails with a syntax error which is not the one we are looking for: <stdin>: In function ‘foo’: <stdin>:1:29: warning: missing terminating " character <stdin>:1:29: error: missing terminating " character <stdin>:2:5: error: expected ‘:’ before ‘+’ token <stdin>:2:7: warning: missing terminating " character <stdin>:2:7: error: missing terminating " character <stdin>:2:5: error: expected declaration or statement at end of input Removing '\n' solves this. Fixes: 1aa0e8b144b6 ("Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug") Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
eacf96d2 |
|
07-Oct-2022 |
Colin Ian King <colin.i.king@gmail.com> |
init: Kconfig: fix spelling mistake "satify" -> "satisfy" There is a spelling mistake in a Kconfig description. Fix it. Link: https://lkml.kernel.org/r/20221007204339.2757753-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
30341ec9 |
|
29-Sep-2022 |
Ren Zhijie <renzhijie2@huawei.com> |
init/Kconfig: fix unmet direct dependencies Commit 3c07bfce92a5 ("proc: make config PROC_CHILDREN depend on PROC_FS") make config PROC_CHILDREN depend on PROC_FS. When CONFIG_PROC_FS is not set and CONFIG_CHECKPOINT_RESTORE=y, make menuconfig screams like this: WARNING: unmet direct dependencies detected for PROC_CHILDREN Depends on [n]: PROC_FS [=n] Selected by [y]: - CHECKPOINT_RESTORE [=y] CHECKPOINT_RESTORE would select PROC_CHILDREN which depends on PROC_FS, so add depends on PROC_FS to CHECKPOINT_RESTORE to fix this. Link: https://lkml.kernel.org/r/20220929070057.59044-1-renzhijie2@huawei.com Fixes: 3c07bfce92a5 ("proc: make config PROC_CHILDREN depend on PROC_FS") Signed-off-by: Ren Zhijie <renzhijie2@huawei.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e55b9f96 |
|
26-Sep-2022 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP. Update the sites accordingly and drop the symbol. [ While touching the docs, remove two references to CONFIG_MEMCG_KMEM, which hasn't been a user-visible symbol for over half a decade. ] Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
02382aff |
|
02-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Remove PPC64 special case for cputime accounting default Distro kernels tend to be moving to VIRT_CPU_ACCOUNTING_GEN, and there is not much reason why PPC64 should be special here. Remove the special case and make the ppc64 and pseries defconfigs use GEN accounting (others will use TICK, as-per Kconfig defaults). VIRT_CPU_ACCOUNTING_NATIVE does provide scaled vtime and stolen time apportioned between system and user time, and vtime accounting is not unconditionally enabled, and possibly other things. But it would be better at this point to extend GEN to cover important missing features rather than directing users back to a less used option. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-4-npiggin@gmail.com
|
#
2f7ab126 |
|
03-Jul-2021 |
Miguel Ojeda <ojeda@kernel.org> |
Kbuild: add Rust support Having most of the new files in place, we now enable Rust support in the build system, including `Kconfig` entries related to Rust, the Rust configuration printer and a few other bits. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com> Co-developed-by: Finn Behrens <me@kloenk.de> Signed-off-by: Finn Behrens <me@kloenk.de> Co-developed-by: Adam Bratschi-Kaye <ark.email@gmail.com> Signed-off-by: Adam Bratschi-Kaye <ark.email@gmail.com> Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Co-developed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Co-developed-by: Sven Van Asbroeck <thesven73@gmail.com> Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com> Co-developed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Gary Guo <gary@garyguo.net> Co-developed-by: Boris-Chengbiao Zhou <bobo1239@web.de> Signed-off-by: Boris-Chengbiao Zhou <bobo1239@web.de> Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Co-developed-by: Douglas Su <d0u9.su@outlook.com> Signed-off-by: Douglas Su <d0u9.su@outlook.com> Co-developed-by: Dariusz Sosnowski <dsosnowski@dsosnowski.pl> Signed-off-by: Dariusz Sosnowski <dsosnowski@dsosnowski.pl> Co-developed-by: Antonio Terceiro <antonio.terceiro@linaro.org> Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org> Co-developed-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Co-developed-by: Björn Roy Baron <bjorn3_gh@protonmail.com> Signed-off-by: Björn Roy Baron <bjorn3_gh@protonmail.com> Co-developed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
#
a0a12c3e |
|
19-Aug-2022 |
Nick Desaulniers <ndesaulniers@google.com> |
asm goto: eradicate CC_HAS_ASM_GOTO GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <bp@suse.de> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bdf0fe33 |
|
06-Jul-2022 |
Baruch Siach <baruch@tkos.co.il> |
init/Kconfig: update KALLSYMS_ALL help text CONFIG_KALLSYMS_ALL is required for kernel live patching which is a common use case that is enabled in some major distros. Update the Kconfig help text to reflect that. While at it, s/e.g./i.e./ to match the text intention. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
a6036a41 |
|
28-Jun-2022 |
Nick Desaulniers <ndesaulniers@google.com> |
kbuild: drop support for CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 The difference in most compilers between `-O3` and `-O2` is mostly down to whether loops with statically determinable trip counts are fully unrolled vs unrolled to a multiple of SIMD width. This patch is effectively a revert of commit 15f5db60a137 ("kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC") without re-adding ARCH_CFLAGS Ever since commit cfdbc2e16e65 ("ARC: Build system: Makefiles, Kconfig, Linker script") ARC has been built with -O3, though the reason for doing so was not specified in inline comments or the commit message. This commit does not re-add -O3 to arch/arc/Makefile. Folks looking to experiment with `-O3` (or any compiler flag for that matter) may pass them along to the command line invocation of make: $ make KCFLAGS=-O3 Code that looks to re-add an explicit Kconfig option for `-O3` should provide: 1. A rigorous and reproducible performance profile of a reasonable userspace workload that demonstrates a hot loop in the kernel that would benefit from `-O3` over `-O2`. 2. Disassembly of said loop body before and after. 3. Provides stats on terms of increase in file size. Link: https://lore.kernel.org/linux-kbuild/CA+55aFz2sNBbZyg-_i8_Ldr2e8o9dfvdSfHHuRzVtP2VMAUWPg@mail.gmail.com/ Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
73b4fc92 |
|
11-Jul-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
module: Move module's Kconfig items in kernel/module/ In init/Kconfig, the part dedicated to modules is quite large. Move it into a dedicated Kconfig in kernel/module/ MODULES_TREE_LOOKUP was outside of the 'if MODULES', but as it is only used when MODULES are set, move it in with everything else to avoid confusion. MODULE_SIG_FORMAT is left in init/Kconfig because this configuration item is not used in kernel/modules/ but in kernel/ and can be selected independently from CONFIG_MODULES. It is for instance selected from security/integrity/ima/Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
6a010a49 |
|
23-Jul-2022 |
Tejun Heo <tj@kernel.org> |
cgroup: Make !percpu threadgroup_rwsem operations optional 3942a9bd7b58 ("locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem because the impiled synchronize_rcu() on write locking was pushing up the latencies too much for android which constantly moves processes between cgroups. This makes the hotter paths - fork and exit - slower as they're always forced into the slow path. There is no reason to force this on everyone especially given that more common static usage pattern can now completely avoid write-locking the rwsem. Write-locking is elided when turning on and off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a cgroup without grabbing the rwsem. Restore the default percpu operations and introduce the mount option "favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need lower latencies for the dynamic operations. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Michal Koutn� <mkoutny@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Oleg Nesterov <oleg@redhat.com>
|
#
ec8f7f48 |
|
09-Jul-2022 |
Eric Biggers <ebiggers@google.com> |
crypto: lib - make the sha1 library optional Since the Linux RNG no longer uses sha1_transform(), the SHA-1 library is no longer needed unconditionally. Make it possible to build the Linux kernel without the SHA-1 library by putting it behind a kconfig option, and selecting this new option from the kconfig options that gate the remaining users: CRYPTO_SHA1 for crypto/sha1_generic.c, BPF for kernel/bpf/core.c, and IPV6 for net/ipv6/addrconf.c. Unfortunately, since BPF is selected by NET, for now this can only make a difference for kernels built without networking support. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
c02b872a |
|
26-Jun-2022 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
Documentation: update watch_queue.rst references Changeset f5461124d59b ("Documentation: move watch_queue to core-api") renamed: Documentation/watch_queue.rst to: Documentation/core-api/watch_queue.rst. Update the cross-references accordingly. Fixes: f5461124d59b ("Documentation: move watch_queue to core-api") Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/1c220de9c58f35e815a3df9458ac2bea323c8bfb.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
24a9c541 |
|
08-Jun-2022 |
Frederic Weisbecker <frederic@kernel.org> |
context_tracking: Split user tracking Kconfig Context tracking is going to be used not only to track user transitions but also idle/IRQs/NMIs. The user tracking part will then become a separate feature. Prepare Kconfig for that. [ frederic: Apply Max Filippov feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|
#
f0be87c4 |
|
09-Jun-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
gcc-12: disable '-Warray-bounds' universally for now In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0cbed0ee |
|
05-Apr-2022 |
Guo Ren <guoren@kernel.org> |
arch: Add SYSVIPC_COMPAT for all architectures The existing per-arch definitions are pretty much historic cruft. Move SYSVIPC_COMPAT into init/Kconfig. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Link: https://lore.kernel.org/r/20220405071314.3225832-5-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
a2a9d67a |
|
05-Apr-2022 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Support embedding a bootconfig file in kernel This allows kernel developer to embed a default bootconfig file in the kernel instead of embedding it in the initrd. This will be good for who are using the kernel without initrd, or who needs a default bootconfigs. This needs to set two kconfigs: CONFIG_BOOT_CONFIG_EMBED=y and set the file path to CONFIG_BOOT_CONFIG_EMBED_FILE. Note that you still need 'bootconfig' command line option to load the embedded bootconfig. Also if you boot using an initrd with a different bootconfig, the kernel will use the bootconfig in the initrd, instead of the default bootconfig. Link: https://lkml.kernel.org/r/164921227943.1090670.14035119557571329218.stgit@devnote2 Cc: Padmanabha Srinivasaiah <treasure4paddy@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Linux Kbuild mailing list <linux-kbuild@vger.kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
0710d012 |
|
25-May-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm: Kconfig: reorganize misplaced mm options After commits 7b42f1041c98 ("mm: Kconfig: move swap and slab config options to the MM section") and 519bcb797907 ("mm: Kconfig: group swap, slab, hotplug and thp options into submenus") we now have nicely organized mm related config options. I have noticed some that were still misplaced, so this moves them from various places into the new structure: VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and general MM section. SLUB_STATS to mm/Kconfig and the slab submenu. DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel hacking / Memory Debugging submenu. Link: https://lkml.kernel.org/r/20220525112559.1139-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
1274aea1 |
|
09-May-2022 |
David Disseldorp <ddiss@suse.de> |
initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option initramfs cpio mtime preservation, as implemented in commit 889d51a10712 ("initramfs: add option to preserve mtime from initramfs cpio images"), uses a linked list to defer directory mtime processing until after all other items in the cpio archive have been processed. This is done to ensure that parent directory mtimes aren't overwritten via subsequent child creation. The lkml link below indicates that the mtime retention use case was for embedded devices with applications running exclusively out of initramfs, where the 32-bit mtime value provided a rough file version identifier. Linux distributions which discard an extracted initramfs immediately after the root filesystem has been mounted may want to avoid the unnecessary overhead. This change adds a new INITRAMFS_PRESERVE_MTIME Kconfig option, which can be used to disable on-by-default mtime retention and in turn speed up initramfs extraction, particularly for cpio archives with large directory counts. Benchmarks with a one million directory cpio archive extracted 20 times demonstrated: mean extraction time (s) std dev INITRAMFS_PRESERVE_MTIME=y 3.808 0.006 INITRAMFS_PRESERVE_MTIME unset 3.056 0.004 The above extraction times were measured using ftrace (initcall_finish - initcall_start) values for populate_rootfs() with initramfs_async disabled. [ddiss@suse.de: rebase atop dir_entry.name flexible array member and drop separate initramfs_mtime.h header] Link: https://lkml.org/lkml/2008/9/3/424 Link: https://lkml.kernel.org/r/20220404093429.27570-4-ddiss@suse.de Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Martin Wilck <mwilck@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
7374fa33 |
|
29-Apr-2022 |
Kees Cook <keescook@chromium.org> |
init/Kconfig: remove USELIB syscall by default The uselib syscall has been long deprecated. There's no need to keep this enabled by default under X86_32. Link: https://lkml.kernel.org/r/20220412212519.4113845-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
7b453719 |
|
13-May-2022 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS include/{linux,asm-generic}/export.h defines a weak symbol, __crc_* as a placeholder. Genksyms writes the version CRCs into the linker script, which will be used for filling the __crc_* symbols. The linker script format depends on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset to the reference of CRC. It is time to get rid of this complexity. Now that modpost parses text files (.*.cmd) to collect all the CRCs, it can generate C code that will be linked to the vmlinux or modules. Generate a new C file, .vmlinux.export.c, which contains the CRCs of symbols exported by vmlinux. It is compiled and linked to vmlinux in scripts/link-vmlinux.sh. Put the CRCs of symbols exported by modules into the existing *.mod.c files. No additional build step is needed for modules. As before, *.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal. No linker magic is used here. The new C implementation works in the same way, whether CONFIG_RELOCATABLE is enabled or not. CONFIG_MODULE_REL_CRCS is no longer needed. Previously, Kbuild invoked additional $(LD) to update the CRCs in objects, but this step is unneeded too. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
|
#
7b42f104 |
|
19-May-2022 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: Kconfig: move swap and slab config options to the MM section These are currently under General Setup. MM seems like a better fit. Link: https://lkml.kernel.org/r/20220510152847.230957-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
430529b5 |
|
12-May-2022 |
Peter Xu <peterx@redhat.com> |
mm/uffd: move USERFAULTFD configs into mm/ We used to have USERFAULTFD configs stored in init/. It makes sense as a start because that's the default place for storing syscall related configs. However userfaultfd evolved a bit in the past few years and some more config options were added. They're no longer related to syscalls and start to be not suitable to be kept in the init/ directory anymore, because they're pure mm concepts. But it's not ideal either to keep the userfaultfd configs separate from each other. Hence this patch moves the userfaultfd configs under init/ to be under mm/ so that we'll start to group all userfaultfd configs together. We do have quite a few examples of syscall related configs that are not put under init/Kconfig: FTRACE_SYSCALLS, SWAP, FILE_LOCKING, MEMFD_CREATE.. They all reside in the dir where they're more suitable for the concept. So it seems there's no restriction to keep the role of having syscall related CONFIG_* under init/ only. Link: https://lkml.kernel.org/r/20220420144823.35277-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
99bd9956 |
|
02-May-2022 |
Aaron Tomlin <atomlin@redhat.com> |
module: Introduce module unload taint tracking Currently, only the initial module that tainted the kernel is recorded e.g. when an out-of-tree module is loaded. The purpose of this patch is to allow the kernel to maintain a record of each unloaded module that taints the kernel. So, in addition to displaying a list of linked modules (see print_modules()) e.g. in the event of a detected bad page, unloaded modules that carried a taint/or taints are displayed too. A tainted module unload count is maintained. The number of tracked modules is not fixed. This feature is disabled by default. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
1aa0e8b1 |
|
01-Feb-2022 |
Sean Christopherson <seanjc@google.com> |
Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug Add a config option to guard (future) usage of asm_volatile_goto() that includes "tied outputs", i.e. "+" constraints that specify both an input and output parameter. clang-13 has a bug[1] that causes compilation of such inline asm to fail, and KVM wants to use a "+m" constraint to implement a uaccess form of CMPXCHG[2]. E.g. the test code fails with <stdin>:1:29: error: invalid operand in inline asm: '.long (${1:l}) - .' int foo(int *x) { asm goto (".long (%l[bar]) - .\n": "+m"(*x) ::: bar); return *x; bar: return 0; } ^ <stdin>:1:29: error: unknown token in expression <inline asm>:1:9: note: instantiated into assembly here .long () - . ^ 2 errors generated. on clang-13, but passes on gcc (with appropriate asm goto support). The bug is fixed in clang-14, but won't be backported to clang-13 as the changes are too invasive/risky. gcc also had a similar bug[3], fixed in gcc-11, where gcc failed to account for its behavior of assigning two numbers to tied outputs (one for input, one for output) when evaluating symbolic references. [1] https://github.com/ClangBuiltLinux/linux/issues/1512 [2] https://lore.kernel.org/all/YfMruK8%2F1izZ2VHS@google.com [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096 Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220202004945.2540433-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1c4b5ecb |
|
23-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
remove the h8300 architecture Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
5cf909c5 |
|
07-Jul-2021 |
Oliver Glitta <glittao@gmail.com> |
mm/slub: use stackdepot to save stack trace in objects Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the following patch using the stackdepot handle instead of matching stacks manually. [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ] This was initially merged as commit 788691464c29 and reverted by commit ae14c63a9f20 due to several issues, that should now be fixed. The problem of unconditional memory overhead by stackdepot has been addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()"), so the dependency on stackdepot will result in extra memory usage only when a slab cache tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds. The build failures on some architectures were also addressed, and the reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this patch. Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
|
#
f67695c9 |
|
01-Feb-2022 |
Elliot Berman <quic_eberman@quicinc.com> |
kbuild: Add environment variables for userprogs flags Allow additional arguments be passed to userprogs compilation. Reproducible clang builds need to provide a sysroot and gcc path to ensure the same toolchain is used across hosts. KCFLAGS is not currently used for any user programs compilation, so add new USERCFLAGS and USERLDFLAGS which serves similar purpose as HOSTCFLAGS/HOSTLDFLAGS. Clang might detect GCC installation on hosts which have it installed to a default location in /. With addition of these environment variables, you can specify flags such as: $ make USERCFLAGS=--sysroot=/path/to/sysroot This can also be used to specify different sysroots such as musl or bionic which may be installed on the host in paths that the compiler may not search by default. Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
613fe169 |
|
01-Feb-2022 |
Nathan Chancellor <nathan@kernel.org> |
kbuild: Add CONFIG_PAHOLE_VERSION There are a few different places where pahole's version is turned into a three digit form with the exact same command. Move this command into scripts/pahole-version.sh to reduce the amount of duplication across the tree. Create CONFIG_PAHOLE_VERSION so the version code can be used in Kconfig to enable and disable configuration options based on the pahole version, which is already done in a couple of places. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220201205624.652313-3-nathan@kernel.org
|
#
1c6f9ec0 |
|
08-Feb-2022 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
locking: Enable RT_MUTEXES by default on PREEMPT_RT. The CONFIG_RT_MUTEXES option is enabled by CONFIG_FUTEX and CONFIG_I2C. If both are disabled then a CONFIG_PREEMPT_RT build fails to compile. It is not possible to have a PREEMPT_RT kernel without RT_MUTEX support because RT_MUTEX based locking is always used. Enable CONFIG_RT_MUTEXES by default on PREEMPT_RT builds. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YgKmhjkcuqWXdUjQ@linutronix.de
|
#
4dc0759c |
|
29-Nov-2021 |
Nathan Chancellor <nathan@kernel.org> |
init/Kconfig: Drop linker version check for LD_ORPHAN_WARN The minimum supported version of LLVM has been raised to 11.0.0, meaning this check is always true, so it can be dropped. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
b1ae6dc4 |
|
05-Jan-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
module: add in-kernel support for decompressing Current scheme of having userspace decompress kernel modules before loading them into the kernel runs afoul of LoadPin security policy, as it loses link between the source of kernel module on the disk and binary blob that is being loaded into the kernel. To solve this issue let's implement decompression in kernel, so that we can pass a file descriptor of compressed module file into finit_module() which will keep LoadPin happy. To let userspace know what compression/decompression scheme kernel supports it will create /sys/module/compression attribute. kmod can read this attribute and decide if it can pass compressed file to finit_module(). New MODULE_INIT_COMPRESSED_DATA flag indicates that the kernel should attempt to decompress the data read from file descriptor prior to trying load the module. To simplify things kernel will only implement single decompression method matching compression method selected when generating modules. This patch implements gzip and xz; more can be added later, Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
2aef6f30 |
|
10-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
perf: Force architectures to opt-in to guest callbacks Introduce GUEST_PERF_EVENTS and require architectures to select it to allow registering and using guest callbacks in perf. This will hopefully make it more difficult for new architectures to add useless "support" for guest callbacks, e.g. via copy+paste. Stubbing out the helpers has the happy bonus of avoiding a load of perf_guest_cbs when GUEST_PERF_EVENTS=n on arm64/x86. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20211111020738.2512932-9-seanjc@google.com
|
#
3297481d |
|
25-Oct-2021 |
Arnd Bergmann <arnd@arndb.de> |
futex: Remove futex_cmpxchg detection Now that all architectures have a working futex implementation in any configuration, remove the runtime detection code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20211026100432.1730393-2-arnd@kernel.org
|
#
3f2bedab |
|
25-Oct-2021 |
Arnd Bergmann <arnd@arndb.de> |
futex: Ensure futex_atomic_cmpxchg_inatomic() is present The boot-time detection of futex_atomic_cmpxchg_inatomic() has a bug on some 32-bit arm builds, and Thomas Gleixner suggested that setting CONFIG_HAVE_FUTEX_CMPXCHG would avoid the problem, as it is always present anyway. Looking into which other architectures could do the same showed that almost all architectures have it, the exceptions being: - some old 32-bit MIPS uniprocessor cores without ll/sc - one xtensa variant with no SMP - 32-bit SPARC when built for SMP Fix MIPS And Xtensa by rearranging the generic code to let it be used as a fallback. For SPARC, the SMP definition just ends up turning off futex anyway, so this can be done at Kconfig time instead. Note that sparc32 glibc requires the CASA instruction for its mutexes anyway, which is only available when running on SPARCv9 or LEON CPUs, but needs to be implemented in the sparc32 kernel for those. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Rich Felker <dalias@libc.org> Link: https://lore.kernel.org/r/20211026100432.1730393-1-arnd@kernel.org
|
#
eb52c0fc |
|
24-Dec-2021 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm: Make SLAB_MERGE_DEFAULT depend on SL[AU]B SLOB always manage objects of different caches in same page regardless of SLAB_MERGE_DEFAULT. Because it has no effect on SLOB, make it depend on SLAB || SLUB. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20211225060921.13584-1-42.hyeyoo@gmail.com
|
#
7e97b3dc |
|
09-Nov-2021 |
Lukasz Luba <lukasz.luba@arm.com> |
arch_topology: Remove unused topology_set_thermal_pressure() and related There is no need of this function (and related) since code has been converted to use the new arch_update_thermal_pressure() API. The old code can be removed. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
#
158ea2d2 |
|
14-Nov-2021 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
kbuild: Fix -Wimplicit-fallthrough=5 error for GCC 5.x and 6.x -Wimplicit-fallthrough=5 was under cc-option because it was only available in GCC 7.x and newer so the build is now broken for GCC 5.x and 6.x: gcc: error: unrecognized command line option '-Wimplicit-fallthrough=5'; did you mean '-Wno-fallthrough'? Fix this by moving -Wimplicit-fallthrough=5 under cc-option. Fixes: dee2b702bcf0 ("kconfig: Add support for -Wimplicit-fallthrough") Reported-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dee2b702 |
|
13-Nov-2021 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
kconfig: Add support for -Wimplicit-fallthrough Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang. The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH, which is enabled by default. Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This bugfix only appears in Clang 14.0.0, so older versions still contain the bug and -Wimplicit-fallthrough won't be enabled for them, for now. This concludes a long journey and now we are finally getting rid of the unintentional fallthrough bug-class in the kernel, entirely. :) Link: https://github.com/llvm/llvm-project/commit/9ed4a94d6451046a51ef393cd62f00710820a7e8 [1] Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2] Link: https://github.com/KSPP/linux/issues/115 Link: https://github.com/ClangBuiltLinux/linux/issues/236 Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
252220da |
|
10-Nov-2021 |
Ingo Molnar <mingo@kernel.org> |
mm: allow only SLUB on PREEMPT_RT Memory allocators may disable interrupts or preemption as part of the allocation and freeing process. For PREEMPT_RT it is important that these sections remain deterministic and short and therefore don't depend on the size of the memory to allocate/ free or the inner state of the algorithm. Until v3.12-RT the SLAB allocator was an option but involved several changes to meet all the requirements. The SLUB design fits better with PREEMPT_RT model and so the SLAB patches were dropped in the 3.12-RT patchset. Comparing the two allocator, SLUB outperformed SLAB in both throughput (time needed to allocate and free memory) and the maximal latency of the system measured with cyclictest during hackbench. SLOB was never evaluated since it was unlikely that it preforms better than SLAB. During a quick test, the kernel crashed with SLOB enabled during boot. Disable SLAB and SLOB on PREEMPT_RT. [bigeasy@linutronix.de: commit description] Link: https://lkml.kernel.org/r/20211015210336.gen3tib33ig5q2md@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
554b0f3c |
|
05-Nov-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
mm: disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT TRANSPARENT_HUGEPAGE: There are potential non-deterministic delays to an RT thread if a critical memory region is not THP-aligned and a non-RT buffer is located in the same hugepage-aligned region. It's also possible for an unrelated thread to migrate pages belonging to an RT task incurring unexpected page faults due to memory defragmentation even if khugepaged is disabled. Regular HUGEPAGEs are not affected by this can be used. NUMA_BALANCING: There is a non-deterministic delay to mark PTEs PROT_NONE to gather NUMA fault samples, increased page faults of regions even if mlocked and non-deterministic delays when migrating pages. [Mel Gorman worded 99% of the commit description]. Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/ Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/ Link: https://lkml.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b339ec9c |
|
07-Sep-2021 |
Marco Elver <elver@google.com> |
kbuild: Only default to -Werror if COMPILE_TEST The cross-product of the kernel's supported toolchains, architectures, and configuration options is large. So large, that it's generally accepted to be infeasible to enumerate and build+test them all (many compile-testers rely on randomly generated configs). Without the possibility to enumerate all possible combinations of toolchains, architectures, and configuration options, it is inevitable that compiler warnings in this space exist. With -Werror, this means that an innumerable set of kernels are now broken, yet had been perfectly usable before (confused compilers, code with warnings unused, or luck). Distributors will necessarily pick a point in the toolchain X arch X config space, and if unlucky, will have a broken build. Granted, those will likely disable CONFIG_WERROR and move on. The kernel's default configuration is unlikely to be suitable for all users, but it's inappropriate to force many users to set CONFIG_WERROR=n. This also holds for CI systems which are focused on runtime testing, where the odd warning in some subsystem will disrupt testing of the rest of the kernel. Many of those runtime-focused CI systems run tests or fuzz the kernel using runtime debugging tools. Runtime testing of different subsystems can proceed in parallel, and potentially uncover serious bugs; halting runtime testing of the entire kernel because of the odd warning (now error) in a subsystem or driver is simply inappropriate. Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n as well. The appropriate usecase for -Werror is therefore compile-test focused builds (often done by developers or CI systems). Reflect this in the Kconfig option by making the default value of WERROR match COMPILE_TEST. Signed-off-by: Marco Elver <elver@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviwed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3fe617cc |
|
05-Sep-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Enable '-Werror' by default for all kernel builds ... but make it a config option so that broken environments can disable it when required. We really should always have a clean build, and will disable specific over-eager warnings as required, if we can't fix them. But while I fairly religiously enforce that in my own tree, it doesn't get enforced by various build robots that don't necessarily report warnings. So this just makes '-Werror' a default compiler flag, but allows people to disable it for their configuration if they have some particular issues. Occasionally, new compiler versions end up enabling new warnings, and it can take a while before we have them fixed (or the warnings disabled if that is what it takes), so the config option allows for that situation. Hopefully this will mean that I get fewer pull requests that have new warnings that were not noticed by various automation we have in place. Knock wood. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
85e3e7fb |
|
15-Jul-2021 |
John Ogness <john.ogness@linutronix.de> |
printk: remove NMI tracking All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
|
#
33701557 |
|
15-Jun-2021 |
Chris Down <chris@chrisdown.name> |
printk: Userspace format indexing support We have a number of systems industry-wide that have a subset of their functionality that works as follows: 1. Receive a message from local kmsg, serial console, or netconsole; 2. Apply a set of rules to classify the message; 3. Do something based on this classification (like scheduling a remediation for the machine), rinse, and repeat. As a couple of examples of places we have this implemented just inside Facebook, although this isn't a Facebook-specific problem, we have this inside our netconsole processing (for alarm classification), and as part of our machine health checking. We use these messages to determine fairly important metrics around production health, and it's important that we get them right. While for some kinds of issues we have counters, tracepoints, or metrics with a stable interface which can reliably indicate the issue, in order to react to production issues quickly we need to work with the interface which most kernel developers naturally use when developing: printk. Most production issues come from unexpected phenomena, and as such usually the code in question doesn't have easily usable tracepoints or other counters available for the specific problem being mitigated. We have a number of lines of monitoring defence against problems in production (host metrics, process metrics, service metrics, etc), and where it's not feasible to reliably monitor at another level, this kind of pragmatic netconsole monitoring is essential. As one would expect, monitoring using printk is rather brittle for a number of reasons -- most notably that the message might disappear entirely in a new version of the kernel, or that the message may change in some way that the regex or other classification methods start to silently fail. One factor that makes this even harder is that, under normal operation, many of these messages are never expected to be hit. For example, there may be a rare hardware bug which one wants to detect if it was to ever happen again, but its recurrence is not likely or anticipated. This precludes using something like checking whether the printk in question was printed somewhere fleetwide recently to determine whether the message in question is still present or not, since we don't anticipate that it should be printed anywhere, but still need to monitor for its future presence in the long-term. This class of issue has happened on a number of occasions, causing unhealthy machines with hardware issues to remain in production for longer than ideal. As a recent example, some monitoring around blk_update_request fell out of date and caused semi-broken machines to remain in production for longer than would be desirable. Searching through the codebase to find the message is also extremely fragile, because many of the messages are further constructed beyond their callsite (eg. btrfs_printk and other module-specific wrappers, each with their own functionality). Even if they aren't, guessing the format and formulation of the underlying message based on the aesthetics of the message emitted is not a recipe for success at scale, and our previous issues with fleetwide machine health checking demonstrate as much. This provides a solution to the issue of silently changed or deleted printks: we record pointers to all printk format strings known at compile time into a new .printk_index section, both in vmlinux and modules. At runtime, this can then be iterated by looking at <debugfs>/printk/index/<module>, which emits the following format, both readable by humans and able to be parsed by machines: $ head -1 vmlinux; shuf -n 5 vmlinux # <level[,flags]> filename:line function "format" <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n" <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n" <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n" <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n" <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n" This mitigates the majority of cases where we have a highly-specific printk which we want to match on, as we can now enumerate and check whether the format changed or the printk callsite disappeared entirely in userspace. This allows us to catch changes to printks we monitor earlier and decide what to do about it before it becomes problematic. There is no additional runtime cost for printk callers or printk itself, and the assembly generated is exactly the same. Signed-off-by: Chris Down <chris@chrisdown.name> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h} Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
|
#
ae14c63a |
|
17-Jul-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "mm/slub: use stackdepot to save stack trace in objects" This reverts commit 788691464c29455346dc613a3b43c2fb9e5757a4. It's not clear why, but it causes unexplained problems in entirely unrelated xfs code. The most likely explanation is some slab corruption, possibly triggered due to CONFIG_SLUB_DEBUG_ON. See [1]. It ends up having a few other problems too, like build errors on arch/arc, and Geert reporting it using much more memory on m68k [3] (it probably does so elsewhere too, but it is probably just more noticeable on m68k). The architecture issues (both build and memory use) are likely just because this change effectively force-enabled STACKDEPOT (along with a very bad default value for the stackdepot hash size). But together with the xfs issue, this all smells like "this commit was not ready" to me. Link: https://lore.kernel.org/linux-xfs/YPE3l82acwgI2OiV@infradead.org/ [1] Link: https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/ [2] Link: https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ [3] Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
78869146 |
|
07-Jul-2021 |
Oliver Glitta <glittao@gmail.com> |
mm/slub: use stackdepot to save stack trace in objects Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the future using the stackdepot handle instead of matching stacks manually. [rdunlap@infradead.org: rename save_stack_trace()] Link: https://lkml.kernel.org/r/20210513051920.29320-1-rdunlap@infradead.org [vbabka@suse.cz: fix lockdep splat] Link: https://lkml.kernel.org/r/20210516195150.26740-1-vbabka@suse.czLink: https://lkml.kernel.org/r/20210414163434.4376-1-glittao@gmail.com Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
51c2ee6d |
|
21-Jun-2021 |
Nick Desaulniers <ndesaulniers@google.com> |
Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR We don't want compiler instrumentation to touch noinstr functions, which are annotated with the no_profile_instrument_function function attribute. Add a Kconfig test for this and make GCOV depend on it, and in the future, PGO. If an architecture is using noinstr, it should denote that via this Kconfig value. That makes Kconfigs that depend on noinstr able to express dependencies in an architecturally agnostic way. Cc: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/lkml/YMTn9yjuemKFLbws@hirez.programming.kicks-ass.net/ Link: https://lore.kernel.org/lkml/YMcssV%2Fn5IBGv4f0@hirez.programming.kicks-ass.net/ Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210621231822.2848305-4-ndesaulniers@google.com
|
#
b24abcff |
|
11-May-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf, kconfig: Add consolidated menu entry for bpf with core options Right now, all core BPF related options are scattered in different Kconfig locations mainly due to historic reasons. Moving forward, lets add a proper subsystem entry under ... General setup ---> BPF subsystem ---> ... in order to have all knobs in a single location and thus ease BPF related configuration. Networking related bits such as sockmap are out of scope for the general setup and therefore better suited to remain in net/Kconfig. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f23f58765a4d59244ebd8037da7b6a6b2fb58446.1620765074.git.daniel@iogearbox.net
|
#
17652f42 |
|
06-May-2021 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
modules: add CONFIG_MODPROBE_PATH Allow the developer to specifiy the initial value of the modprobe_path[] string. This can be used to set it to the empty string initially, thus effectively disabling request_module() during early boot until userspace writes a new value via the /proc/sys/kernel/modprobe interface. [1] When building a custom kernel (often for an embedded target), it's normal to build everything into the kernel that is needed for booting, and indeed the initramfs often contains no modules at all, so every such request_module() done before userspace init has mounted the real rootfs is a waste of time. This is particularly useful when combined with the previous patch, which made the initramfs unpacking asynchronous - for that to work, it had to make any usermodehelper call wait for the unpacking to finish before attempting to invoke the userspace helper. By eliminating all such (known-to-be-futile) calls of usermodehelper, the initramfs unpacking and the {device,late}_initcalls can proceed in parallel for much longer. For a relatively slow ppc board I'm working on, the two patches combined lead to 0.2s faster boot - but more importantly, the fact that the initramfs unpacking proceeds completely in the background while devices get probed means I get to handle the gpio watchdog in time without getting reset. [1] __request_module() already has an early -ENOENT return when modprobe_path is the empty string. Link: https://lkml.kernel.org/r/20210313212528.2956377-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jessica Yu <jeyu@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7677f7fd |
|
04-May-2021 |
Axel Rasmussen <axelrasmussen@google.com> |
userfaultfd: add minor fault registration mode Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ [peterx@redhat.com: fix minor fault page leak] Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0165f4ca |
|
09-Apr-2021 |
Nayna Jain <nayna@linux.ibm.com> |
ima: enable signing of modules with build time generated key The kernel build process currently only signs kernel modules when MODULE_SIG is enabled. Also, sign the kernel modules at build time when IMA_APPRAISE_MODSIG is enabled. Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
#
48cac3f4 |
|
27-Apr-2021 |
Florent Revest <revest@chromium.org> |
bpf: Implement formatted output helpers with bstr_printf BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf and bpf_snprintf. Their signatures specify that all arguments are provided from the BPF world as u64s (in an array or as registers). All of these helpers are currently implemented by calling functions such as snprintf() whose signatures take a variable number of arguments, then placed in a va_list by the compiler to call vsnprintf(). "d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced a bpf_printf_prepare function that fills an array of u64 sanitized arguments with an array of "modifiers" which indicate what the "real" size of each argument should be (given by the format specifier). The BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to its real size. However, the C promotion rules implicitely cast them all back to u64s. Therefore, the arguments given to snprintf are u64s and the va_list constructed by the compiler will use 64 bits for each argument. On 64 bit machines, this happens to work well because 32 bit arguments in va_lists need to occupy 64 bits anyway, but on 32 bit architectures this breaks the layout of the va_list expected by the called function and mangles values. In "88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem had been solved for bpf_trace_printk only with a "horrid workaround" that emitted multiple calls to trace_printk where each call had different argument types and generated different va_list layouts. One of the call would be dynamically chosen at runtime. This was ok with the 3 arguments that bpf_trace_printk takes but bpf_seq_printf and bpf_snprintf accept up to 12 arguments. Because this approach scales code exponentially, it is not a viable option anymore. Because the promotion rules are part of the language and because the construction of a va_list is an arch-specific ABI, it's best to just avoid variadic arguments and va_lists altogether. Thankfully the kernel's snprintf() has an alternative in the form of bstr_printf() that accepts arguments in a "binary buffer representation". These binary buffers are currently created by vbin_printf and used in the tracing subsystem to split the cost of printing into two parts: a fast one that only dereferences and remembers values, and a slower one, called later, that does the pretty-printing. This patch refactors bpf_printf_prepare to construct binary buffers of arguments consumable by bstr_printf() instead of arrays of arguments and modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the bpf_printf_prepare usage but there are a few gotchas that change how bpf_printf_prepare needs to do things. Currently, bpf_printf_prepare uses a per cpu temporary buffer as a generic storage for strings and IP addresses. With this refactoring, the temporary buffers now holds all the arguments in a structured binary format. To comply with the format expected by bstr_printf, certain format specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6. Because vsnprintf subroutines for these specifiers are hard to expose, we pre-format these arguments with calls to snprintf(). Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
|
#
0e0345b7 |
|
15-Apr-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
kbuild: redo fake deps at include/config/*.h Make include/config/foo/bar.h fake deps files generation simpler. * delete .h suffix those aren't header files, shorten filenames, * delete tolower() Linux filesystems can deal with both upper and lowercase filenames very well, * put everything in 1 directory Presumably 'mkdir -p' split is from dark times when filesystems handled huge directories badly, disks were round adding to seek times. x86_64 allmodconfig lists 12364 files in include/config. ../obj/include/config/ ├── 104_QUAD_8 ├── 60XX_WDT ├── 64BIT ... ├── ZSWAP_DEFAULT_ON ├── ZSWAP_ZPOOL_DEFAULT └── ZSWAP_ZPOOL_DEFAULT_ZBUD 0 directories, 12364 files Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
c3d7ef37 |
|
07-Apr-2021 |
Piotr Gorski <lucjan.lucjanov@gmail.com> |
kbuild: add support for zstd compressed modules kmod 28 supports modules compressed in zstd format so let's add this possibility to kernel. Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
d4bbe942 |
|
31-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: remove CONFIG_MODULE_COMPRESS CONFIG_MODULE_COMPRESS is only used to activate the choice for module compression algorithm. It will be simpler to make the choice always visible, and add CONFIG_MODULE_COMPRESS_NONE in the choice. This is more consistent with the "Kernel compression mode" and "Built-in initramfs compression mode" choices. CONFIG_KERNEL_UNCOMPRESSED and CONFIG_INITRAMFS_COMPRESSION_NONE are available to choose no compression. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
#
ba64beb1 |
|
15-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: check the minimum assembler version in Kconfig Documentation/process/changes.rst defines the minimum assembler version (binutils version), but we have never checked it in the build time. Kbuild never invokes 'as' directly because all assembly files in the kernel tree are *.S, hence must be preprocessed. I do not expect raw assembly source files (*.s) would be added to the kernel tree. Therefore, we always use $(CC) as the assembler driver, and commit aa824e0c962b ("kbuild: remove AS variable") removed 'AS'. However, we are still interested in the version of the assembler acting behind. As usual, the --version option prints the version string. $ as --version | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 But, we do not have $(AS). So, we can add the -Wa prefix so that $(CC) passes --version down to the backing assembler. $ gcc -Wa,--version | head -n 1 gcc: fatal error: no input files compilation terminated. OK, we need to input something to satisfy gcc. $ gcc -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 The combination of Clang and GNU assembler works in the same way: $ clang -no-integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 Clang with the integrated assembler fails like this: $ clang -integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 clang: error: unsupported argument '--version' to option 'Wa,' For the last case, checking the error message is fragile. If the proposal for -Wa,--version support [1] is accepted, this may not be even an error in the future. One easy way is to check if -integrated-as is present in the passed arguments. We did not pass -integrated-as to CLANG_FLAGS before, but we can make it explicit. Nathan pointed out -integrated-as is the default for all of the architectures/targets that the kernel cares about, but it goes along with "explicit is better than implicit" policy. [2] With all this in my mind, I implemented scripts/as-version.sh to check the assembler version in Kconfig time. $ scripts/as-version.sh gcc GNU 23501 $ scripts/as-version.sh clang -no-integrated-as GNU 23501 $ scripts/as-version.sh clang -integrated-as LLVM 0 [1]: https://github.com/ClangBuiltLinux/linux/issues/1320 [2]: https://lore.kernel.org/linux-kbuild/20210307044253.v3h47ucq6ng25iay@archlinux-ax161/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
|
#
6dd85ff1 |
|
13-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kconfig: change "modules" from sub-option to first-level attribute Now "modules" is the only member of the "option" property. Remove "option", and move "modules" to the top level property. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
f8f0d064 |
|
13-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kconfig: do not use allnoconfig_y option allnoconfig_y is an ugly hack that sets a symbol to 'y' by allnoconfig. allnoconfig does not mean a minimal set of CONFIG options because a bunch of prompts are hidden by 'if EMBEDDED' or 'if EXPERT', but I do not like to hack Kconfig this way. Use the pre-existing feature, KCONFIG_ALLCONFIG, to provide a one liner config fragment. CONFIG_EMBEDDED=y is still forced when allnoconfig is invoked as a part of tinyconfig. No change in the .config file produced by 'make tinyconfig'. The output of 'make allnoconfig' will be changed; we will get CONFIG_EMBEDDED=n because allnoconfig literally sets all symbols to n. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
b75b0a81 |
|
13-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kconfig: change defconfig_list option to environment variable "defconfig_list" is a weird option that defines a static symbol that declares the list of base config files in case the .config does not exist yet. This is quite different from other normal symbols; we just abused the "string" type and the "default" properties to list out the input files. They must be fixed values since these are searched for and loaded in the parse stage. It is an ugly hack, and should not exist in the first place. Providing this feature as an environment variable is a saner approach. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
88759609 |
|
23-Feb-2021 |
Cong Wang <cong.wang@bytedance.com> |
bpf: Clean up sockmap related Kconfigs As suggested by John, clean up sockmap related Kconfigs: Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name. Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-2-xiyou.wangcong@gmail.com
|
#
cf68fffb |
|
08-Apr-2021 |
Sami Tolvanen <samitolvanen@google.com> |
add support for Clang CFI This change adds support for Clang’s forward-edge Control Flow Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler injects a runtime check before each indirect function call to ensure the target is a valid function with the correct static type. This restricts possible call targets and makes it more difficult for an attacker to exploit bugs that allow the modification of stored function pointers. For more details, see: https://clang.llvm.org/docs/ControlFlowIntegrity.html Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain visibility to possible call targets. Kernel modules are supported with Clang’s cross-DSO CFI mode, which allows checking between independently compiled components. With CFI enabled, the compiler injects a __cfi_check() function into the kernel and each module for validating local call targets. For cross-module calls that cannot be validated locally, the compiler calls the global __cfi_slowpath_diag() function, which determines the target module and calls the correct __cfi_check() function. This patch includes a slowpath implementation that uses __module_address() to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a shadow map that speeds up module look-ups by ~3x. Clang implements indirect call checking using jump tables and offers two methods of generating them. With canonical jump tables, the compiler renames each address-taken function to <function>.cfi and points the original symbol to a jump table entry, which passes __cfi_check() validation. This isn’t compatible with stand-alone assembly code, which the compiler doesn’t instrument, and would result in indirect calls to assembly code to fail. Therefore, we default to using non-canonical jump tables instead, where the compiler generates a local jump table entry <function>.cfi_jt for each address-taken function, and replaces all references to the function with the address of the jump table entry. Note that because non-canonical jump table addresses are local to each component, they break cross-module function address equality. Specifically, the address of a global function will be different in each module, as it's replaced with the address of a local jump table entry. If this address is passed to a different module, it won’t match the address of the same function taken there. This may break code that relies on comparing addresses passed from other components. CFI checking can be disabled in a function with the __nocfi attribute. Additionally, CFI can be disabled for an entire compilation unit by filtering out CC_FLAGS_CFI. By default, CFI failures result in a kernel panic to stop a potential exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the kernel prints out a rate-limited warning instead, and allows execution to continue. This option is helpful for locating type mismatches, but should only be enabled during development. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
|
#
a72232ea |
|
29-Mar-2021 |
Vipin Sharma <vipinsh@google.com> |
cgroup: Add misc cgroup controller The Miscellaneous cgroup provides the resource limiting and tracking mechanism for the scalar resources which cannot be abstracted like the other cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC config option. A resource can be added to the controller via enum misc_res_type{} in the include/linux/misc_cgroup.h file and the corresponding name via misc_res_name[] in the kernel/cgroup/misc.c file. Provider of the resource must set its capacity prior to using the resource by calling misc_cg_set_capacity(). Once a capacity is set then the resource usage can be updated using charge and uncharge APIs. All of the APIs to interact with misc controller are in include/linux/misc_cgroup.h. Miscellaneous controller provides 3 interface files. If two misc resources (res_a and res_b) are registered then: misc.capacity A read-only flat-keyed file shown only in the root cgroup. It shows miscellaneous scalar resources available on the platform along with their quantities:: $ cat misc.capacity res_a 50 res_b 10 misc.current A read-only flat-keyed file shown in the non-root cgroups. It shows the current usage of the resources in the cgroup and its children:: $ cat misc.current res_a 3 res_b 0 misc.max A read-write flat-keyed file shown in the non root cgroups. Allowed maximum usage of the resources in the cgroup and its children.:: $ cat misc.max res_a max res_b 4 Limit can be set by:: # echo res_a 1 > misc.max Limit can be set to max by:: # echo res_a max > misc.max Limits can be set more than the capacity value in the misc.capacity file. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
ea29b20a |
|
12-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM I read the commit log of the following two: - bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML") - 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390") Both are talking about HAS_IOMEM dependency missing in many drivers. So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me. This does not change the behavior of UML. UML still cannot enable COMPILE_TEST because it does not provide HAS_IOMEM. The current dependency for S390 is too strong. Under the condition of CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST. I also removed the meaningless 'default n'. Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KP Singh <kpsingh@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Perret <qperret@google.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ce6ed1c4 |
|
04-Mar-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: rebuild GCC plugins when the compiler is upgraded Linus reported a build error due to the GCC plugin incompatibility when the compiler is upgraded. [1] GCC plugins are tied to a particular GCC version. So, they must be rebuilt when the compiler is upgraded. This seems to be a long-standing flaw since the initial support of GCC plugins. Extend commit 8b59cd81dc5e ("kbuild: ensure full rebuild when the compiler is updated"), so that GCC plugins are covered by the compiler upgrade detection. [1]: https://lore.kernel.org/lkml/CAHk-=wieoN5ttOy7SnsGwZv+Fni3R6m-Ut=oxih6bbZ28G+4dw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org>
|
#
a6aaeb84 |
|
25-Feb-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTO Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols") does not work as expected if the .config file has already specified CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling CONFIG_LTO_CLANG. So, the user-supplied whitelist and LTO-specific white list must be independent of each other. I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO handle whitelists in the same way. Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
#
f9c8bc46 |
|
25-Feb-2021 |
Bhaskar Chowdhury <unixbhaskar@gmail.com> |
init/Kconfig: fix a typo in CC_VERSION_TEXT help text s/compier/compiler/ Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
02aff859 |
|
15-Feb-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: check the minimum linker version in Kconfig Unify the two scripts/ld-version.sh and scripts/lld-version.sh, and check the minimum linker version like scripts/cc-version.sh did. I tested this script for some corner cases reported in the past: - GNU ld version 2.25-15.fc23 as reported by commit 8083013fc320 ("ld-version: Fix it on Fedora") - GNU ld (GNU Binutils) 2.20.1.20100303 as reported by commit 0d61ed17dd30 ("ld-version: Drop the 4th and 5th version components") This script show an error message if the linker is too old: $ make LD=ld.lld-9 SYNC include/config/auto.conf *** *** Linker is too old. *** Your LLD version: 9.0.1 *** Minimum LLD version: 10.0.1 *** scripts/Kconfig.include:50: Sorry, this linker is not supported. make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1 make[1]: *** [Makefile:600: syncconfig] Error 2 make: *** [Makefile:708: include/config/auto.conf] Error 2 I also moved the check for gold to this script, so gold is still rejected: $ make LD=gold SYNC include/config/auto.conf gold linker is not supported as it is not capable of linking the kernel proper. scripts/Kconfig.include:50: Sorry, this linker is not supported. make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1 make[1]: *** [Makefile:600: syncconfig] Error 2 make: *** [Makefile:708: include/config/auto.conf] Error 2 Thanks to David Laight for suggesting shell script improvements. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org>
|
#
aec6c60a |
|
15-Jan-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: check the minimum compiler version in Kconfig Paul Gortmaker reported a regression in the GCC version check. [1] If you use GCC 4.8, the build breaks before showing the error message "error Sorry, your version of GCC is too old - please use 4.9 or newer." I do not want to apply his fix-up since it implies we would not be able to remove any cc-option test. Anyway, I admit checking the GCC version in <linux/compiler-gcc.h> is too late. Almost at the same time, Linus also suggested to move the compiler version error to Kconfig time. [2] I unified the two similar scripts, gcc-version.sh and clang-version.sh into cc-version.sh. The old scripts invoked the compiler multiple times (3 times for gcc-version.sh, 4 times for clang-version.sh). I refactored the code so the new one invokes the compiler just once, and also tried my best to use shell-builtin commands where possible. The new script runs faster. $ time ./scripts/clang-version.sh clang 120000 real 0m0.029s user 0m0.012s sys 0m0.021s $ time ./scripts/cc-version.sh clang Clang 120000 real 0m0.009s user 0m0.006s sys 0m0.004s cc-version.sh also shows an error message if the compiler is too old: $ make defconfig CC=clang-9 *** Default configuration is based on 'x86_64_defconfig' *** *** Compiler is too old. *** Your Clang version: 9.0.1 *** Minimum Clang version: 10.0.1 *** scripts/Kconfig.include:46: Sorry, this compiler is not supported. make[1]: *** [scripts/kconfig/Makefile:81: defconfig] Error 1 make: *** [Makefile:602: defconfig] Error 2 The new script takes care of ICC because we have <linux/compiler-intel.h> although I am not sure if building the kernel with ICC is well-supported. [1]: https://lore.kernel.org/r/20210110190807.134996-1-paul.gortmaker@windriver.com [2]: https://lore.kernel.org/r/CAHk-=wh-+TMHPTFo1qs-MYyK7tZh-OQovA=pP3=e06aCVp6_kA@mail.gmail.com Fixes: 87de84c9140e ("kbuild: remove cc-option test of -Werror=date-time") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: Miguel Ojeda <ojeda@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
fe2cce15 |
|
24-Feb-2021 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON The boot param and config determine the value of memcg_sysfs_enabled, which is unused since commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") as there are no per-memcg kmem caches anymore. Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a555bdd0 |
|
24-Feb-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Kbuild: enable TRIM_UNUSED_KSYMS again, with some guarding In commit 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option") I disabled this option because it's hugely expensive at build time, and I questioned how much use it gets. Several people piped up and convinced me it's actually useful, so instead of disabling it entirely, it now depends on EXPERT and gets disabled by COMPILE_TEST builds so that 'allmodconfig' style things don't enable it. I still hope somebody will take a look at the build time issue, because as Arnd also noted: "However, the combination of thinlto and trim indeed has a steep cost in compile time, taking almost twice as long as a normal defconfig (gc-sections makes it slightly faster)" Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Cristoph Hellwig <hch@lst.de>, Cc: Miroslav Benes <mbenes@suse.cz> Cc: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5cf0fd59 |
|
23-Feb-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Kbuild: disable TRIM_UNUSED_KSYMS option The removal of EXPORT_UNUSED_SYMBOL() in commit 367948220fce looks like (and was sold as) a no-op, but it actually had a rather serious and subtle side effect: the UNUSED_SYMBOLS option not only enabled the removed (unused) functionality, it also _disabled_ the TRIM_UNUSED_KSYMS functionality. And it turns out that TRIM_UNUSED_KSYMS is a huge time waste, and takes up a third of the kernel build time for me. For no actual upside, since no distro is likely to ever be able to enable it (because they all support external kernel modules). Rather than re-enable EXPORT_UNUSED_SYMBOL, this just disables the TRIM_UNUSED_KSYMS option by marking it broken. I'm tempted to just remove the support entirely, but maybe somebody has a use-case and can fix the behavior of it. I could have just disabled it for COMPILE_TEST, but it really smells like the TRIM_UNUSED_KSYMS option is badly done and not really useful, so this takes the more direct approach - let's see if anybody ever actually notices or complains. Cc: Miroslav Benes <mbenes@suse.cz> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jessica Yu <jeyu@kernel.org> Fixes: 367948220fce ("module: remove EXPORT_UNUSED_SYMBOL*") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
36794822 |
|
02-Feb-2021 |
Christoph Hellwig <hch@lst.de> |
module: remove EXPORT_UNUSED_SYMBOL* EXPORT_UNUSED_SYMBOL* is not actually used anywhere. Remove the unused functionality as we generally just remove unused code anyway. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
#
fbe078d3 |
|
11-Dec-2020 |
Sami Tolvanen <samitolvanen@google.com> |
kbuild: lto: add a default list of used symbols With CONFIG_LTO_CLANG, LLVM bitcode has not yet been compiled into a binary when the .mod files are generated, which means they don't yet contain references to certain symbols that will be present in the final binaries. This includes intrinsic functions, such as memcpy, memmove, and memset [1], and stack protector symbols [2]. This change adds a default symbol list to use with CONFIG_TRIM_UNUSED_KSYMS when Clang's LTO is used. [1] https://llvm.org/docs/LangRef.html#standard-c-c-library-intrinsics [2] https://llvm.org/docs/LangRef.html#llvm-stackprotector-intrinsic Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201211184633.3213045-7-samitolvanen@google.com
|
#
bfe3911a |
|
05-Feb-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category. Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store. Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp. References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk
|
#
f8408264 |
|
14-Jan-2021 |
Viresh Kumar <viresh.kumar@linaro.org> |
drivers: Remove CONFIG_OPROFILE support The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. Remove kernel's old oprofile support. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> #RCU Acked-by: William Cohen <wcohen@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de>
|
#
432900f8 |
|
26-Jan-2021 |
Yue Hu <huyue2@yulong.com> |
init/Kconfig: Correct thermal pressure help text We're using arch_scale_thermal_pressure() to retrieve per CPU thermal pressure. Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210127054451.1240-1-zbestahu@gmail.com
|
#
55b6f763 |
|
04-Feb-2021 |
Johannes Berg <johannes.berg@intel.com> |
init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov On ARCH=um, loading a module doesn't result in its constructors getting called, which breaks module gcov since the debugfs files are never registered. On the other hand, in-kernel constructors have already been called by the dynamic linker, so we can't call them again. Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be selected, but avoiding the in-kernel constructor calls. Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we really do want CONSTRUCTORS, just not kernel binary ones. Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d5750cd3 |
|
19-Nov-2020 |
Nathan Chancellor <nathan@kernel.org> |
kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1 ld.lld 10.0.1 spews a bunch of various warnings about .rela sections, along with a few others. Newer versions of ld.lld do not have these warnings. As a result, do not add '--orphan-handling=warn' to LDFLAGS_vmlinux if ld.lld's version is not new enough. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Link: https://github.com/ClangBuiltLinux/linux/issues/1193 Reported-by: Arvind Sankar <nivedita@alum.mit.edu> Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
59612b24 |
|
19-Nov-2020 |
Nathan Chancellor <nathan@kernel.org> |
kbuild: Hoist '--orphan-handling' into Kconfig Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1). To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinx) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN. To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config. Link: https://github.com/ClangBuiltLinux/linux/issues/1187 Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
334ef6ed |
|
18-Nov-2020 |
Heiko Carstens <hca@linux.ibm.com> |
init/Kconfig: make COMPILE_TEST depend on !S390 While allmodconfig and allyesconfig build for s390 there are also various bots running compile tests with randconfig, where PCI is disabled. This reveals that a lot of drivers should actually depend on HAS_IOMEM. Adding this to each device driver would be a never ending story, therefore just disable COMPILE_TEST for s390. The reasoning is more or less the same as described in commit bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML"). Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
0f7636e1 |
|
11-Aug-2020 |
Paul Menzel <pmenzel@molgen.mpg.de> |
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB, and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB. Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value to be used, as then the sum of contributions is greater than 64 KB for the first time. My guess is, that the description was written with the configuration values used in the SUSE in mind. Fixes: 23b2899f7f194f06e ("printk: allow increasing the ring buffer depending on the number of CPUs") Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
|
#
dd19d293 |
|
12-Aug-2020 |
Stephen Kitt <steve@sk2.org> |
Fix references to nommu-mmap.rst nommu-mmap.rst was moved to Documentation/admin-guide/mm; this patch updates the remaining stale references to Documentation/mm. Fixes: 800c02f5d030 ("docs: move nommu-mmap.txt to admin-guide and rename to ReST") Signed-off-by: Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200812092230.27541-1-steve@sk2.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
550c10d2 |
|
12-Aug-2020 |
John Ogness <john.ogness@linutronix.de> |
printk: reduce LOG_BUF_SHIFT range for H8300 The .bss section for the h8300 is relatively small. A value of CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static printk ringbuffer that is too large. Limit the range appropriately for the H8300. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
|
#
1e6c62a8 |
|
27-Aug-2020 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce sleepable BPF programs Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(); Updates to trampoline now has to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish. This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
|
#
d71fa5c9 |
|
18-Aug-2020 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Add kernel module with user mode driver that populates bpffs. Add kernel module with user mode driver that populates bpffs with BPF iterators. $ mount bpffs /my/bpffs/ -t bpf $ ls -la /my/bpffs/ total 4 drwxrwxrwt 2 root root 0 Jul 2 00:27 . drwxr-xr-x 19 root root 4096 Jul 2 00:09 .. -rw------- 1 root root 0 Jul 2 00:27 maps.debug -rw------- 1 root root 0 Jul 2 00:27 progs.debug The user mode driver will load BPF Type Formats, create BPF maps, populate BPF maps, load two BPF programs, attach them to BPF iterators, and finally send two bpf_link IDs back to the kernel. The kernel will pin two bpf_links into newly mounted bpffs instance under names "progs.debug" and "maps.debug". These two files become human readable. $ cat /my/bpffs/progs.debug id name attached 11 dump_bpf_map bpf_iter_bpf_map 12 dump_bpf_prog bpf_iter_bpf_prog 27 test_pkt_access 32 test_main test_pkt_access test_pkt_access 33 test_subprog1 test_pkt_access_subprog1 test_pkt_access 34 test_subprog2 test_pkt_access_subprog2 test_pkt_access 35 test_subprog3 test_pkt_access_subprog3 test_pkt_access 36 new_get_skb_len get_skb_len test_pkt_access 37 new_get_skb_ifindex get_skb_ifindex test_pkt_access 38 new_get_constant get_constant test_pkt_access The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about all BPF programs currently loaded in the system. This information is unstable and will change from kernel to kernel as ".debug" suffix conveys. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
|
#
3404be67 |
|
07-Aug-2020 |
Kees Cook <keescook@chromium.org> |
mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB" In reviewing Vlastimil Babka's latest slub debug series, I realized[1] that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being applied to SLAB. Fix this by expanding the Kconfig coverage, and adding a simple double-free test for SLAB. This patch (of 2): Include SLAB caches when performing kmem_cache pointer verification. A defense against such corruption[1] should be applied to all the allocators. With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE" LKDTM tests now pass on SLAB: lkdtm: Performing direct entry SLAB_FREE_CROSS lkdtm: Attempting cross-cache slab free ... ------------[ cut here ]------------ cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0 ... lkdtm: Performing direct entry SLAB_FREE_PAGE lkdtm: Attempting non-Slab slab free ... ------------[ cut here ]------------ virt_to_cache: Object is not a Slab page! WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0 Additionally clean up neighboring Kconfig entries for clarity, readability, and redundant option removal. [1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf Fixes: 598a0717a816 ("mm/slab: validate cache membership under freelist hardening") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
48f7ddf7 |
|
30-Jul-2020 |
Nick Terrell <terrelln@fb.com> |
init: Add support for zstd compressed kernel - Add the zstd and zstd22 cmds to scripts/Makefile.lib - Add the HAVE_KERNEL_ZSTD and KERNEL_ZSTD options Architecture specific support is still needed for decompression. Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200730190841.2071656-4-nickrterrell@gmail.com
|
#
fcd7c9c3 |
|
29-Jul-2020 |
Valentin Schneider <valentin.schneider@arm.com> |
arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE Qian reported that the current setup forgoes the Kconfig dependencies and results in warnings such as: WARNING: unmet direct dependencies detected for SCHED_THERMAL_PRESSURE Depends on [n]: SMP [=y] && CPU_FREQ_THERMAL [=n] Selected by [y]: - ARM64 [=y] Revert commit e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE") and re-implement it by making the option default to 'y' for arm64 and arm, which respects Kconfig dependencies (i.e. will remain 'n' if CPU_FREQ_THERMAL=n). Fixes: e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200729135718.1871-1-valentin.schneider@arm.com
|
#
98eb401d |
|
12-Jul-2020 |
Valentin Schneider <valentin.schneider@arm.com> |
sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry As Russell pointed out [1], this option is severely lacking in the documentation department, and figuring out if one has the required dependencies to benefit from turning it on is not straightforward. Make it non user-visible, and add a bit of help to it. While at it, make it depend on CPU_FREQ_THERMAL. [1]: https://lkml.kernel.org/r/20200603173150.GB1551@shell.armlinux.org.uk Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200712165917.9168-3-valentin.schneider@arm.com
|
#
b816b3db |
|
30-Jun-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang scripts/cc-can-link.sh tests if the compiler can link userspace programs. When $(CC) is GCC, it is checked against the target architecture because the toolchain prefix is specified as a part of $(CC). When $(CC) is Clang, it is checked against the host architecture because --target option is missing. Pass $(CLANG_FLAGS) to scripts/cc-can-link.sh to evaluate the link capability for the target architecture. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
|
#
800c02f5 |
|
23-Jun-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: move nommu-mmap.txt to admin-guide and rename to ReST The nommu-mmap.txt file provides description of user visible behaviuour. So, move it to the admin-guide. As it is already at the ReST, also rename it. Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Suggested-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/3a63d1833b513700755c85bf3bda0a6c4ab56986.1592918949.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
a7f7f624 |
|
13-Jun-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
treewide: replace '---help---' in Kconfig files with 'help' Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
587f1701 |
|
14-Feb-2020 |
Nick Desaulniers <ndesaulniers@google.com> |
Kconfig: add config option for asm goto w/ outputs This allows C code to make use of compilers with support for output variables along the fallthrough path via preprocessor define: CONFIG_CC_HAS_ASM_GOTO_OUTPUT [ This is not used anywhere yet, and currently released compilers don't support this yet, but it's coming, and I have some local experimental patches to take advantage of it when it does - Linus ] Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ada4ab7a |
|
04-Jun-2020 |
Chris Down <chris@chrisdown.name> |
init: allow distribution configuration of default init Some init systems (eg. systemd) have init at their own paths, for example, /usr/lib/systemd/systemd. A compatibility symlink to one of the hardcoded init paths is provided by another package, usually named something like systemd-sysvcompat or similar. Currently distro maintainers who are hands-off on the bootloader are more or less required to include those compatibility links as part of their base distribution, because it's hard to migrate away from them since there's a risk some users will not get the message to set init= on the kernel command line appropriately. Moreover, for distributions where the init system is something the distribution itself is opinionated about (eg. Arch, which has systemd in the required `base` package), we could usually reasonably configure this ahead of time when building the distribution kernel. However, we currently simply don't have any way to configure the kernel to do this. Here's an example discussion where removing sysvcompat was discussed by distro maintainers[0]. This patch adds a new Kconfig tunable, CONFIG_DEFAULT_INIT, which if set is tried before the hardcoded fallback list. So the order of precedence is now thus: 1. init= on command line (on failure: panic) 2. CONFIG_DEFAULT_INIT (on failure: try #3) 3. Hardcoded fallback list (on failure: panic) This new config parameter will allow distribution maintainers to move away from these compatibility links safely, without having to worry that their users might not have the right init=. There are also two other benefits of this over having the distribution maintain a symlink: 1. One of the value propositions over simply having distributions maintain a /sbin/init symlink via a package is that it also frees distributions which have a preferred default, but not mandatory, init system from having their package manager fight with their users for control of /{s,}bin/init. Instead, the distribution simply makes their preference known in CONFIG_DEFAULT_INIT, and if the user installs another init system and uninstalls the default one they can still make use of /{s,}bin/init and friends for their own uses. This makes more cases Just Work(tm) without the user having to perform extra configuration via init=. 2. Since before this we don't know which path the distribution actually _intends_ to serve init from, we don't pr_err if it is simply missing, and usually will just silently put the user in a /bin/sh shell. Now that the distribution can make a declaration of intent, we can be more vocal when this init system fails to launch for any reason, even if it's simply because no file exists at that location, speeding up the palaver of init/mount dependency/etc debugging a bit. [0]: https://lists.archlinux.org/pipermail/arch-dev-public/2019-January/029435.html Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: http://lkml.kernel.org/r/20200522160234.GA1487022@chrisdown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2d1c4980 |
|
03-Jun-2020 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: make swap tracking an integral part of memory control Without swap page tracking, users that are otherwise memory controlled can easily escape their containment and allocate significant amounts of memory that they're not being charged for. That's because swap does readahead, but without the cgroup records of who owned the page at swapout, readahead pages don't get charged until somebody actually faults them into their page table and we can identify an owner task. This can be maliciously exploited with MADV_WILLNEED, which triggers arbitrary readahead allocations without charging the pages. Make swap swap page tracking an integral part of memcg and remove the Kconfig options. In the first place, it was only made configurable to allow users to save some memory. But the overhead of tracking cgroup ownership per swap page is minimal - 2 byte per page, or 512k per 1G of swap, or 0.04%. Saving that at the expense of broken containment semantics is not something we should present as a coequal option. The swapaccount=0 boot option will continue to exist, and it will eliminate the page_counter overhead and hide the swap control files, but it won't disable swap slot ownership tracking. This patch makes sure we always have the cgroup records at swapin time; the next patch will fix the actual bug by charging readahead swap pages at swapin time rather than at fault time. v2: fix double swap charge bug in cgroup1/cgroup2 code gating [hannes@cmpxchg.org: fix crash with cgroup_disable=memory] Link: http://lkml.kernel.org/r/20200521215855.GB815153@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org Debugged-by: Hugh Dickins <hughd@google.com> Debugged-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c73be61c |
|
14-Jan-2020 |
David Howells <dhowells@redhat.com> |
pipe: Add general notification queue support Make it possible to have a general notification queue built on top of a standard pipe. Notifications are 'spliced' into the pipe and then read out. splice(), vmsplice() and sendfile() are forbidden on pipes used for notifications as post_one_notification() cannot take pipe->mutex. This means that notifications could be posted in between individual pipe buffers, making iov_iter_revert() difficult to effect. The way the notification queue is used is: (1) An application opens a pipe with a special flag and indicates the number of messages it wishes to be able to queue at once (this can only be set once): pipe2(fds, O_NOTIFICATION_PIPE); ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth); (2) The application then uses poll() and read() as normal to extract data from the pipe. read() will return multiple notifications if the buffer is big enough, but it will not split a notification across buffers - rather it will return a short read or EMSGSIZE. Notification messages include a length in the header so that the caller can split them up. Each message has a header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer and type-specific flags. Supplementary data, such as the key ID that generated an event, can be attached in additional slots. The maximum message size is 127 bytes. Messages may not be padded or aligned, so there is no guarantee, for example, that the notification type will be on a 4-byte bounary. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
b1183b6d |
|
09-May-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
bpfilter: check if $(CC) can link static libc in Kconfig On Fedora, linking static glibc requires the glibc-static RPM package, which is not part of the glibc-devel package. CONFIG_CC_CAN_LINK does not check the capability of static linking, so you can enable CONFIG_BPFILTER_UMH, then fail to build: HOSTLD net/bpfilter/bpfilter_umh /usr/bin/ld: cannot find -lc collect2: error: ld returned 1 exit status Add CONFIG_CC_CAN_LINK_STATIC, and make CONFIG_BPFILTER_UMH depend on it. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org>
|
#
9371f86e |
|
28-Apr-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
bpfilter: match bit size of bpfilter_umh to that of the kernel bpfilter_umh is built for the default machine bit of the compiler, which may not match to the bit size of the kernel. This happens in the scenario below: You can use biarch GCC that defaults to 64-bit for building the 32-bit kernel. In this case, Kbuild passes -m32 to teach the compiler to produce 32-bit kernel space objects. However, it is missing when building bpfilter_umh. It is built as a 64-bit ELF, and then embedded into the 32-bit kernel. The 32-bit kernel and 64-bit umh is a bad combination. In theory, we can have 32-bit umh running on 64-bit kernel, but we do not have a good reason to support such a usecase. The best is to match the bit size between them. Pass -m32 or -m64 to the umh build command if it is found in $(KBUILD_CFLAGS). Evaluate CC_CAN_LINK against the kernel bit-size. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
0ebeea8c |
|
14-May-2020 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: Restrict bpf_probe_read{, str}() only to archs where they work Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs with overlapping address ranges, we should really take the next step to disable them from BPF use there. To generally fix the situation, we've recently added new helper variants bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str(). For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,kernel}_str helpers"). Given bpf_probe_read{,str}() have been around for ~5 years by now, there are plenty of users at least on x86 still relying on them today, so we cannot remove them entirely w/o breaking the BPF tracing ecosystem. However, their use should be restricted to archs with non-overlapping address ranges where they are working in their current form. Therefore, move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and have x86, arm64, arm select it (other archs supporting it can follow-up on it as well). For the remaining archs, they can workaround easily by relying on the feature probe from bpftool which spills out defines that can be used out of BPF C code to implement the drop-in replacement for old/new kernels via: bpftool feature probe macro Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
|
#
b744b43f |
|
28-Apr-2020 |
Sami Tolvanen <samitolvanen@google.com> |
kbuild: add CONFIG_LD_IS_LLD Similarly to the CC_IS_CLANG config, add LD_IS_LLD to avoid GNU ld specific logic such as ld-version or ld-ifversion and gain the ability to select potential features that depend on the linker at configuration time such as LTO. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masahiro Yamada <masahiroy@kernel.org> [nc: Reword commit message] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
#
8b59cd81 |
|
23-Apr-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: ensure full rebuild when the compiler is updated Commit 21c54b774744 ("kconfig: show compiler version text in the top comment") added the environment variable, CC_VERSION_TEXT in the comment of the top Kconfig file. It can detect the compiler update, and invoke the syncconfig because all environment variables referenced in Kconfig files are recorded in include/config/auto.conf.cmd This commit makes it a CONFIG option in order to ensure the full rebuild when the compiler is updated. This works like follows: include/config/kconfig.h contains "CONFIG_CC_VERSION_TEXT" in the comment block. The top Makefile specifies "-include $(srctree)/include/linux/kconfig.h" to guarantee it is included from all kernel source files. fixdep parses every source file and all headers included from it, searching for words prefixed with "CONFIG_". Then, fixdep finds CONFIG_CC_VERSION_TEXT in include/config/kconfig.h and adds include/config/cc/version/text.h into every .*.cmd file. When the compiler is updated, syncconfig is invoked because init/Kconfig contains the reference to the environment variable CC_VERTION_TEXT. CONFIG_CC_VERSION_TEXT is updated to the new version string, and include/config/cc/version/text.h is touched. In the next rebuild, Make will rebuild every files since the timestamp of include/config/cc/version/text.h is newer than that of target. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
e33ae3ed |
|
23-Apr-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: use $(CC_VERSION_TEXT) to evaluate CC_IS_GCC and CC_IS_CLANG The result of '$(CC) --version | head -n 1' has already been computed by the top Makefile, and stored in the environment variable, CC_VERSION_TEXT. 'echo' is cheaper than the two commands $(CC) and 'head' although this optimization is not noticeable level. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com>
|
#
78a5255f |
|
09-May-2020 |
Linus Torvalds <torvalds@linux-foundation.org> |
Stop the ad-hoc games with -Wno-maybe-initialized We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5429ef62 |
|
22-Jan-2020 |
Will Deacon <will@kernel.org> |
compiler/gcc: Raise minimum GCC version for kernel builds to 4.8 It is very rare to see versions of GCC prior to 4.8 being used to build the mainline kernel. These old compilers are also know to have codegen issues which can lead to silent miscompilation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Raise the minimum GCC version for kernel build to 4.8 and remove some tautological Kconfig dependencies as a consequence. Cc: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
7baf2199 |
|
06-Apr-2020 |
Krzysztof Kozlowski <krzk@kernel.org> |
init/Kconfig: clean up ANON_INODES and old IO schedulers options CONFIG_ANON_INODES is gone since commit 5dd50aaeb185 ("Make anon_inodes unconditional"). CONFIG_CFQ_GROUP_IOSCHED was replaced with CONFIG_BFQ_GROUP_IOSCHED in commit f382fb0bcef4 ("block: remove legacy IO schedulers"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200130192419.3026-1-krzk@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5a281062 |
|
06-Apr-2020 |
Andrea Arcangeli <aarcange@redhat.com> |
userfaultfd: wp: add WP pagetable tracking to x86 Accurate userfaultfd WP tracking is possible by tracking exactly which virtual memory ranges were writeprotected by userland. We can't relay only on the RW bit of the mapped pagetable because that information is destroyed by fork() or KSM or swap. If we were to relay on that, we'd need to stay on the safe side and generate false positive wp faults for every swapped out page. [peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9553d16f |
|
30-Mar-2020 |
Amit Daniel Kachhap <amit.kachhap@arm.com> |
init/kconfig: Add LD_VERSION Kconfig This option can be used in Kconfig files to compare the ld version and enable/disable incompatible config options if required. This option is used in the subsequent patch along with GCC_VERSION to filter out an incompatible feature. Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
4edf16b7 |
|
30-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf, lsm: Make BPF_LSM depend on BPF_EVENTS LSM and tracing programs share their helpers with bpf_tracing_func_proto which is only defined (in bpf_trace.c) when BPF_EVENTS is enabled. Instead of adding __weak symbol, make BPF_LSM depend on BPF_EVENTS so that both tracing and LSM programs can actually share helpers. Fixes: fc611f47f218 ("bpf: Introduce BPF_PROG_TYPE_LSM") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200330204059.13024-1-kpsingh@chromium.org
|
#
fc611f47 |
|
28-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: Introduce BPF_PROG_TYPE_LSM Introduce types and configs for bpf programs that can be attached to LSM hooks. The programs can be enabled by the config option CONFIG_BPF_LSM. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Florent Revest <revest@google.com> Reviewed-by: Thomas Garnier <thgarnie@google.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
|
#
6546b19f |
|
25-Mar-2020 |
Namhyung Kim <namhyung@kernel.org> |
perf/core: Add PERF_SAMPLE_CGROUP feature The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in the sample. It will add a 64-bit id to identify current cgroup and it's the file handle in the cgroup file system. Userspace should use this information with PERF_RECORD_CGROUP event to match which cgroup it belongs. I put it before PERF_SAMPLE_AUX for simplicity since it just needs a 64-bit word. But if we want bigger samples, I can work on that direction too. Committer testing: $ pahole perf_sample_data | grep -w cgroup -B5 -A5 /* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */ struct perf_regs regs_intr; /* 312 16 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ u64 stack_user_size; /* 328 8 */ u64 phys_addr; /* 336 8 */ u64 cgroup; /* 344 8 */ /* size: 384, cachelines: 6, members: 22 */ /* padding: 32 */ }; $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3a7c7331 |
|
10-Mar-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
int128: fix __uint128_t compiler test in Kconfig The support for __uint128_t is dependent on the target bit size. GCC that defaults to the 32-bit can still build the 64-bit kernel with -m64 flag passed. However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the default machine bit, which may not match to the kernel it is building. Theoretically, this could be evaluated separately for 64BIT/32BIT. config CC_HAS_INT128 bool default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0) I simplified it more because the 32-bit compiler is unlikely to support __uint128_t. Fixes: c12d3362a74b ("int128: move __uint128_t compiler test to Kconfig") Reported-by: George Spelvin <lkml@sdf.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: George Spelvin <lkml@sdf.org>
|
#
76504793 |
|
21-Feb-2020 |
Thara Gopinath <thara.gopinath@linaro.org> |
sched/pelt: Add support to track thermal pressure Extrapolating on the existing framework to track rt/dl utilization using pelt signals, add a similar mechanism to track thermal pressure. The difference here from rt/dl utilization tracking is that, instead of tracking time spent by a CPU running a RT/DL task through util_avg, the average thermal pressure is tracked through load_avg. This is because thermal pressure signal is weighted time "delta" capacity unlike util_avg which is binary. "delta capacity" here means delta between the actual capacity of a CPU and the decreased capacity a CPU due to a thermal event. In order to track average thermal pressure, a new sched_avg variable avg_thermal is introduced. Function update_thermal_load_avg can be called to do the periodic bookkeeping (accumulate, decay and average) of the thermal pressure. Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org
|
#
1518c633 |
|
28-Feb-2020 |
Quentin Perret <qperret@google.com> |
kbuild: allow symbol whitelisting with TRIM_UNUSED_KSYMS CONFIG_TRIM_UNUSED_KSYMS currently removes all unused exported symbols from ksymtab. This works really well when using in-tree drivers, but cannot be used in its current form if some of them are out-of-tree. Indeed, even if the list of symbols required by out-of-tree drivers is known at compile time, the only solution today to guarantee these don't get trimmed is to set CONFIG_TRIM_UNUSED_KSYMS=n. This not only wastes space, but also makes it difficult to control the ABI usable by vendor modules in distribution kernels such as Android. Being able to control the kernel ABI surface is particularly useful to ship a unique Generic Kernel Image (GKI) for all vendors, which is a first step in the direction of getting all vendors to contribute their code upstream. As such, attempt to improve the situation by enabling users to specify a symbol 'whitelist' at compile time. Any symbol specified in this whitelist will be kept exported when CONFIG_TRIM_UNUSED_KSYMS is set, even if it has no in-tree user. The whitelist is defined as a simple text file, listing symbols, one per line. Acked-by: Jessica Yu <jeyu@kernel.org> Acked-by: Nicolas Pitre <nico@fluxnic.net> Tested-by: Matthias Maennich <maennich@google.com> Reviewed-by: Matthias Maennich <maennich@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
2a86f661 |
|
27-Feb-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
kbuild: use KBUILD_DEFCONFIG as the fallback for DEFCONFIG_LIST Most of the Kconfig commands (except defconfig and all*config) read the .config file as a base set of CONFIG options. When it does not exist, the files in DEFCONFIG_LIST are searched in this order and loaded if found. I do not see much sense in the last two lines in DEFCONFIG_LIST. [1] ARCH_DEFCONFIG The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless. arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or 64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG should be a fixed string because the base config file is loaded before the symbol evaluation stage. Using KBUILD_DEFCONFIG makes more sense because it is fixed before Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". [2] arch/$(ARCH)/defconfig This file path is no longer valid. The defconfig files are always located in the arch configs/ directories. $ find arch -name defconfig | sort arch/alpha/configs/defconfig arch/arm64/configs/defconfig arch/csky/configs/defconfig arch/nds32/configs/defconfig arch/riscv/configs/defconfig arch/s390/configs/defconfig arch/unicore32/configs/defconfig The path arch/*/configs/defconfig is already covered by "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is not necessary. I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise, the 7 architectures listed above would end up with endless loop of syncconfig. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
#
2910b5aa |
|
25-Feb-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue Since commit d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default") also changed the CONFIG_BOOTTIME_TRACING to select CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu, it introduced wrong dependencies with BLK_DEV_INITRD as below. WARNING: unmet direct dependencies detected for BOOT_CONFIG Depends on [n]: BLK_DEV_INITRD [=n] Selected by [y]: - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y] This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so that both boot-time tracing and boot configuration off but those appear on the menu list. Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2 Fixes: d8a953ddde5e ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default") Reported-by: Randy Dunlap <rdunlap@infradead.org> Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
85c46b78 |
|
20-Feb-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Add bootconfig magic word for indicating bootconfig explicitly Add bootconfig magic word to the end of bootconfig on initrd image for indicating explicitly the bootconfig is there. Also tools/bootconfig treats wrong size or wrong checksum or parse error as an error, because if there is a bootconfig magic word, there must be a bootconfig. The bootconfig magic word is "#BOOTCONFIG\n", 12 bytes word. Thus the block image of the initrd file with bootconfig is as follows. [Initrd][bootconfig][size][csum][#BOOTCONFIG\n] Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
d8a953dd |
|
20-Feb-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Set CONFIG_BOOT_CONFIG=n by default Set CONFIG_BOOT_CONFIG=n by default. This also warns user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given in the kernel command line. Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
26445f98 |
|
06-Feb-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Remove unneeded CONFIG_LIBXBC Since there is no user except CONFIG_BOOT_CONFIG and no plan to use it from other functions, CONFIG_LIBXBC can be removed and we can use CONFIG_BOOT_CONFIG directly. Link: http://lkml.kernel.org/r/158098769281.939.16293492056419481105.stgit@devnote2 Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
0947db01 |
|
19-Jan-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Fix Kconfig help message for BOOT_CONFIG Fix Kconfig help message since the bootconfig file is only available to be appended to initramfs. And also add a reference to the documentation. Link: http://lkml.kernel.org/r/157949058031.25888.18399447161895787505.stgit@devnote2 Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
87c9366e |
|
04-Dec-2019 |
Johannes Berg <johannes.berg@intel.com> |
Revert "um: Enable CONFIG_CONSTRUCTORS" This reverts commit 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS"). There are two issues with this commit, uncovered by Anton in tests on some (Debian) systems: 1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS isn't set. Don't recall now if it just wasn't needed on my system, or if I never tested this case. 2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I set CONFIG_CONSTRUCTORS, it fails again, which isn't totally unexpected since whatever wanted to run is likely to have to run before the kernel init etc. that calls the constructors in this case. Basically, some constructors that gcc emits (libc has?) need to run very early during init; the failure mode otherwise was that the ptrace fork test already failed: ---------------------- $ ./linux mem=512M Core dump limits : soft - 0 hard - NONE Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f Aborted ---------------------- Thinking more about this, it's clear that we simply cannot support CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan) involve not use of the __attribute__((constructor)), but instead some constructor code/entry generated by gcc. Therefore, we cannot distinguish between kernel constructors and system constructors. Thus, revert this commit. Cc: stable@vger.kernel.org [5.4+] Fixes: 786b2384bf1c ("um: Enable CONFIG_CONSTRUCTORS") Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
|
#
660fd04f |
|
11-Nov-2019 |
Thomas Gleixner <tglx@linutronix.de> |
lib/vdso: Prepare for time namespace support To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
|
#
769071ac |
|
11-Nov-2019 |
Andrei Vagin <avagin@openvz.org> |
ns: Introduce Time Namespace Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
|
#
7684b858 |
|
10-Jan-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Load boot config from the tail of initrd Load the extended boot config data from the tail of initrd image. If there is an SKC data there, it has [(u32)size][(u32)checksum] header (in really, this is a footer) at the end of initrd. If the checksum (simple sum of bytes) is match, this starts parsing it from there. Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
76db5a27 |
|
10-Jan-2020 |
Masami Hiramatsu <mhiramat@kernel.org> |
bootconfig: Add Extra Boot Config support Extra Boot Config (XBC) allows admin to pass a tree-structured boot configuration file when boot up the kernel. This extends the kernel command line in an efficient way. Boot config will contain some key-value commands, e.g. key.word = value1 another.key.word = value2 It can fold same keys with braces, also you can write array data. For example, key { word1 { setting1 = data setting2 } word2.array = "val1", "val2" } User can access these key-value pair and tree structure via SKC APIs. Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
10916706 |
|
03-Dec-2019 |
Shile Zhang <shile.zhang@linux.alibaba.com> |
scripts/sorttable: Rename 'sortextable' to 'sorttable' Use a more generic name for additional table sorting usecases, such as the upcoming ORC table sorting feature. This tool is not tied to exception table sorting anymore. No functional changes intended. [ mingo: Rewrote the changelog. ] Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
81c22041 |
|
09-Dec-2019 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf, x86, arm64: Enable jit by default when not built as always-on After Spectre 2 fix via 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") most major distros use BPF_JIT_ALWAYS_ON configuration these days which compiles out the BPF interpreter entirely and always enables the JIT. Also given recent fix in e1608f3fa857 ("bpf: Avoid setting bpf insns pages read-only when prog is jited"), we additionally avoid fragmenting the direct map for the BPF insns pages sitting in the general data heap since they are not used during execution. Latter is only needed when run through the interpreter. Since both x86 and arm64 JITs have seen a lot of exposure over the years, are generally most up to date and maintained, there is more downside in !BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the bpf_jit_kallsyms knob was set to 0 by default since major distros still had /proc/kallsyms addresses exposed to unprivileged user space which is not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
|
#
e8cf4e9c |
|
04-Dec-2019 |
Krzysztof Kozlowski <krzk@kernel.org> |
init/Kconfig: fix indentation Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ / /' -i */Kconfig Link: http://lkml.kernel.org/r/1574306670-30234-1-git-send-email-krzk@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Cc: Jiri Kosina <trivial@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
61a47c1a |
|
01-Oct-2019 |
Eric W. Biederman <ebiederm@xmission.com> |
sysctl: Remove the sysctl system call This system call has been deprecated almost since it was introduced, and in a survey of the linux distributions I can no longer find any of them that enable CONFIG_SYSCTL_SYSCALL. The only indication that I can find that anyone might care is that a few of the defconfigs in the kernel enable CONFIG_SYSCTL_SYSCALL. However this appears in only 31 of 414 defconfigs in the kernel, so I suspect this symbols presence is simply because it is harmless to include rather than because it is necessary. As there appear to be no users of the sysctl system call, remove the code. As this removes one of the few uses of the internal kernel mount of proc I hope this allows for even more simplifications of the proc filesystem. Cc: Alex Smith <alex.smith@imgtec.com> Cc: Anders Berg <anders.berg@lsi.com> Cc: Apelete Seketeli <apelete@seketeli.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chee Nouk Phoon <cnphoon@altera.com> Cc: Chris Zankel <chris@zankel.net> Cc: Christian Ruppert <christian.ruppert@abilis.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Harvey Hunt <harvey.hunt@imgtec.com> Cc: Helge Deller <deller@gmx.de> Cc: Hongliang Tao <taohl@lemote.com> Cc: Hua Yan <yanh@lemote.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <blogic@openwrt.org> Cc: Jonas Jensen <jonas.jensen@gmail.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jun Nie <jun.nie@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Kevin Wells <kevin.wells@nxp.com> Cc: Kumar Gala <galak@codeaurora.org> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Olof Johansson <olof@lixom.net> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Roland Stigge <stigge@antcom.de> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Scott Telford <stelford@cadence.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Tanmay Inamdar <tinamdar@apm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Wolfram Sang <w.sang@pengutronix.de> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
c12d3362 |
|
08-Nov-2019 |
Ard Biesheuvel <ardb@kernel.org> |
int128: move __uint128_t compiler test to Kconfig In order to use 128-bit integer arithmetic in C code, the architecture needs to have declared support for it by setting ARCH_SUPPORTS_INT128, and it requires a version of the toolchain that supports this at build time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test whether __SIZEOF_INT128__ is defined, since this is only the case for compilers that can support 128-bit integers. Let's fold this additional test into the Kconfig declaration of ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles, e.g., to decide whether a certain object needs to be included in the first place. Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
fcbb8461 |
|
07-Nov-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: remove header compile test There are both positive and negative options about this feature. At first, I thought it was a good idea, but actually Linus stated a negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it is ugly and annoying. The baseline I'd like to keep is the compile-test of uapi headers. (Otherwise, kernel developers have no way to ensure the correctness of the exported headers.) I will maintain a small build rule in usr/include/Makefile. Remove the other header test functionality. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
561fb04a |
|
24-Oct-2019 |
Jens Axboe <axboe@kernel.dk> |
io_uring: replace workqueue usage with io-wq Drop various work-arounds we have for workqueues: - We no longer need the async_list for tracking sequential IO. - We don't have to maintain our own mm tracking/setting. - We don't need a separate workqueue for buffered writes. This didn't even work that well to begin with, as it was suboptimal for multiple buffered writers on multiple files. - We can properly cancel pending interruptible work. This fixes deadlocks with particularly socket IO, where we cannot cancel them when the io_uring is closed. Hence the ring will wait forever for these requests to complete, which may never happen. This is different from disk IO where we know requests will complete in a finite amount of time. - Due to being able to cancel work interruptible work that is already running, we can implement file table support for work. We need that for supporting system calls that add to a process file table. - It gets us one step closer to adding async support for any system call. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
786b2384 |
|
23-Aug-2019 |
Johannes Berg <johannes.berg@intel.com> |
um: Enable CONFIG_CONSTRUCTORS We do need to call the constructors for *modules*, and at least for KASAN in the future, we must call even the kernel constructors only later when the kernel has been initialized. Instead of relying on libc to call them, emit an empty section for libc and let the kernel's CONSTRUCTORS code do the rest of the job. Tested that it indeed doesn't work in modules, and does work after the fixes in both, with a few functions with __attribute__((constructor)) in both dynamic and static builds. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
|
#
eb111869 |
|
12-Sep-2019 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
compiler-types.h: add asm_inline definition This adds an asm_inline macro which expands to "asm inline" [1] when the compiler supports it. This is currently gcc 9.1+, gcc 8.3 and (once released) gcc 7.5 [2]. It expands to just "asm" for other compilers. Using asm inline("foo") instead of asm("foo") overrules gcc's heuristic estimate of the size of the code represented by the asm() statement, and makes gcc use the minimum possible size instead. That can in turn affect gcc's inlining decisions. I wasn't sure whether to make this a function-like macro or not - this way, it can be combined with volatile as asm_inline volatile() but perhaps we'd prefer to spell that asm_inline_volatile() anyway. The Kconfig logic is taken from an RFC patch by Masahiro Yamada [3]. [1] Technically, asm __inline, since both inline and __inline__ are macros that attach various attributes, making gcc barf if one literally does "asm inline()". However, the third spelling __inline is available for referring to the bare keyword. [2] https://lore.kernel.org/lkml/20190907001411.GG9749@gate.crashing.org/ [3] https://lore.kernel.org/lkml/1544695154-15250-1-git-send-email-yamada.masahiro@socionext.com/ Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
|
#
efd9763d |
|
09-Sep-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES When CONFIG_MODULES is disabled, CONFIG_UNUSED_SYMBOLS is pointless, thus it should be invisible. Instead of adding "depends on MODULES", I moved it to the sub-menu "Enable loadable module support", which is a better fit. I put it close to TRIM_UNUSED_KSYMS because it depends on !UNUSED_SYMBOLS. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
#
d189c2a4 |
|
09-Sep-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
module: remove redundant 'depends on MODULES' These are located in the 'if MODULES' ... 'endif' block. Remove the redundant dependencies. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
#
3d52ec5e |
|
06-Sep-2019 |
Matthias Maennich <maennich@google.com> |
module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the requirement for modules to import all namespaces that are used by the module is relaxed. Enabling this option effectively allows (invalid) modules to be loaded while only a warning is emitted. Disabling this option keeps the enforcement at module loading time and loading is denied if the module's imports are not satisfactory. Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
#
15f5db60 |
|
20-Aug-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC arch/arc/Makefile overrides -O2 with -O3. This is the only user of ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS. My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC Makefile. Currently, ARC has no way to enable -Wmaybe-uninitialized because both -O3 and -Os disable it. Enabling it will be useful for compile-testing. This commit allows allmodconfig (, which defaults to -O2) to enable it. Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files in arch/arc/configs/ in order to keep the current config settings. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Vineet Gupta <vgupta@synopsys.com>
|
#
2480c093 |
|
22-Aug-2019 |
Patrick Bellasi <patrick.bellasi@arm.com> |
sched/uclamp: Extend CPU's cgroup controller The cgroup CPU bandwidth controller allows to assign a specified (maximum) bandwidth to the tasks of a group. However this bandwidth is defined and enforced only on a temporal base, without considering the actual frequency a CPU is running on. Thus, the amount of computation completed by a task within an allocated bandwidth can be very different depending on the actual frequency the CPU is running that task. The amount of computation can be affected also by the specific CPU a task is running on, especially when running on asymmetric capacity systems like Arm's big.LITTLE. With the availability of schedutil, the scheduler is now able to drive frequency selections based on actual task utilization. Moreover, the utilization clamping support provides a mechanism to bias the frequency selection operated by schedutil depending on constraints assigned to the tasks currently RUNNABLE on a CPU. Giving the mechanisms described above, it is now possible to extend the cpu controller to specify the minimum (or maximum) utilization which should be considered for tasks RUNNABLE on a cpu. This makes it possible to better defined the actual computational power assigned to task groups, thus improving the cgroup CPU bandwidth controller which is currently based just on time constraints. Extend the CPU controller with a couple of new attributes uclamp.{min,max} which allow to enforce utilization boosting and capping for all the tasks in a group. Specifically: - uclamp.min: defines the minimum utilization which should be considered i.e. the RUNNABLE tasks of this group will run at least at a minimum frequency which corresponds to the uclamp.min utilization - uclamp.max: defines the maximum utilization which should be considered i.e. the RUNNABLE tasks of this group will run up to a maximum frequency which corresponds to the uclamp.max utilization These attributes: a) are available only for non-root nodes, both on default and legacy hierarchies, while system wide clamps are defined by a generic interface which does not depends on cgroups. This system wide interface enforces constraints on tasks in the root node. b) enforce effective constraints at each level of the hierarchy which are a restriction of the group requests considering its parent's effective constraints. Root group effective constraints are defined by the system wide interface. This mechanism allows each (non-root) level of the hierarchy to: - request whatever clamp values it would like to get - effectively get only up to the maximum amount allowed by its parent c) have higher priority than task-specific clamps, defined via sched_setattr(), thus allowing to control and restrict task requests. Add two new attributes to the cpu controller to collect "requested" clamp values. Allow that at each non-root level of the hierarchy. Keep it simple by not caring now about "effective" values computation and propagation along the hierarchy. Update sysctl_sched_uclamp_handler() to use the newly introduced uclamp_mutex so that we serialize system default updates with cgroup relate updates. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Michal Koutny <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ce3b487f |
|
20-Aug-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
init/Kconfig: rework help of CONFIG_CC_OPTIMIZE_FOR_SIZE CONFIG_CC_OPTIMIZE_FOR_SIZE was originally an independent boolean option, but commit 877417e6ffb9 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE definition") turned it into a choice between _PERFORMANCE and _SIZE. The phrase "If unsure, say N." sounds like an independent option. Reword the help text to make it appropriate for the choice menu. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
2ff2b7ec |
|
18-Aug-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: add CONFIG_ASM_MODVERSIONS Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional nesting in scripts/Makefile.build. scripts/Makefile.build is run every time Kbuild descends into a sub-directory. So, I want to avoid $(wildcard ...) evaluation where possible although computing $(wildcard ...) is so cheap that it may not make measurable performance difference. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
#
2d122942 |
|
20-Aug-2019 |
Will Deacon <will@kernel.org> |
Revert "init/Kconfig: Fix infinite Kconfig recursion on PPC" This reverts commit 71c67a31f09fa8fdd1495dffd96a5f0d4cef2ede. Commit 117acf5c29dd ("powerpc/Makefile: Always pass --synthetic to nm if supported") removed the only conditional definition of $(NM), so we can revert our temporary bodge to avoid Kconfig recursion and go back to passing $(NM) through to the 'tools-support-relr.sh' when detecting support for RELR relocations. Signed-off-by: Will Deacon <will@kernel.org>
|
#
49fcf732 |
|
19-Aug-2019 |
David Howells <dhowells@redhat.com> |
lockdown: Enforce module signatures if the kernel is locked down If the kernel is locked down, require that all modules have valid signatures that we can verify. I have adjusted the errors generated: (1) If there's no signature (ENODATA) or we can't check it (ENOPKG, ENOKEY), then: (a) If signatures are enforced then EKEYREJECTED is returned. (b) If there's no signature or we can't check it, but the kernel is locked down then EPERM is returned (this is then consistent with other lockdown cases). (2) If the signature is unparseable (EBADMSG, EINVAL), the signature fails the check (EKEYREJECTED) or a system error occurs (eg. ENOMEM), we return the error we got. Note that the X.509 code doesn't check for key expiry as the RTC might not be valid or might not have been transferred to the kernel's clock yet. [Modified by Matthew Garrett to remove the IMA integration. This will be replaced with integration with the IMA architecture policy patchset.] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <matthewgarrett@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
|
#
71c67a31 |
|
06-Aug-2019 |
Will Deacon <will@kernel.org> |
init/Kconfig: Fix infinite Kconfig recursion on PPC Commit 5cf896fb6be3 ("arm64: Add support for relocating the kernel with RELR relocations") introduced CONFIG_TOOLS_SUPPORT_RELR, which checks for RELR support in the toolchain as part of the kernel configuration. During this procedure, "$(NM)" is invoked to see if it supports the new relocation format, however PowerPC conditionally overrides this variable in the architecture Makefile in order to pass '--synthetic' when targetting PPC64. This conditional override causes Kconfig to recurse forever, since CONFIG_TOOLS_SUPPORT_RELR cannot be determined without $(NM) being defined, but that in turn depends on CONFIG_PPC64: $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- scripts/kconfig/conf --syncconfig Kconfig scripts/kconfig/conf --syncconfig Kconfig scripts/kconfig/conf --syncconfig Kconfig [...] In this particular case, it looks like PowerPC may be able to pass '--synthetic' unconditionally to nm or even drop it altogether. While that is being resolved, let's just bodge the RELR check by picking up $(NM) directly from the environment in whatever state it happens to be in. Cc: Peter Collingbourne <pcc@google.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
c8424e77 |
|
04-Jul-2019 |
Thiago Jung Bauermann <bauerman@linux.ibm.com> |
MODSIGN: Export module signature definitions IMA will use the module_signature format for append signatures, so export the relevant definitions and factor out the code which verifies that the appended signature trailer is valid. Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it and be able to use mod_check_sig() without having to depend on either CONFIG_MODULE_SIG or CONFIG_MODULES. s390 duplicated the definition of struct module_signature so now they can use the new <linux/module_signature.h> header instead. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Acked-by: Jessica Yu <jeyu@kernel.org> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
#
5cf896fb |
|
31-Jul-2019 |
Peter Collingbourne <pcc@google.com> |
arm64: Add support for relocating the kernel with RELR relocations RELR is a relocation packing format for relative relocations. The format is described in a generic-abi proposal: https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion The LLD linker can be instructed to pack relocations in the RELR format by passing the flag --pack-dyn-relocs=relr. This patch adds a new config option, CONFIG_RELR. Enabling this option instructs the linker to pack vmlinux's relative relocations in the RELR format, and causes the kernel to apply the relocations at startup along with the RELA relocations. RELA relocations still need to be applied because the linker will emit RELA relative relocations if they are unrepresentable in the RELR format (i.e. address not a multiple of 2). Enabling CONFIG_RELR reduces the size of a defconfig kernel image with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5% compressed (lz4). Signed-off-by: Peter Collingbourne <pcc@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
92bae787 |
|
16-Jul-2019 |
Kees Cook <keescook@chromium.org> |
init/Kconfig: fix neighboring typos This fixes a couple typos I noticed in the slab Kconfig: sacrifies -> sacrifices accellerate -> accelerate Seeing as no other instances of these typos are found elsewhere in the kernel and that I originally added one of the two, I can only assume working on slab must have caused damage to the spelling centers of my brain. Link: http://lkml.kernel.org/r/201905292203.CD000546EB@keescook Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
da82c92f |
|
27-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: cgroup-v1: add it to the admin-guide book Those files belong to the admin guide, so add them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
c3123552 |
|
17-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: accounting: convert to ReST Rename the accounting documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
43c78d88 |
|
30-Jun-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: compile-test kernel headers to ensure they are self-contained The headers in include/ are globally used in the kernel source tree to provide common APIs. They are included from external modules, too. It will be useful to make as many headers self-contained as possible so that we do not have to rely on a specific include order. There are more than 4000 headers in include/. In my rough analysis, 70% of them are already self-contained. With efforts, most of them can be self-contained. For now, we must exclude more than 1000 headers just because they cannot be compiled as standalone units. I added them to header-test-. The blacklist was mostly generated by a script, so the reason of the breakage should be checked later. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
|
#
d6fc9fcb |
|
30-Jun-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: compile-test exported headers to ensure they are self-contained Multiple people have suggested compile-testing UAPI headers to ensure they can be really included from user-space. "make headers_check" is obviously not enough to catch bugs, and we often leak unresolved references to user-space. Use the new header-test-y syntax to implement it. Please note exported headers are compile-tested with a completely different set of compiler flags. The header search path is set to $(objtree)/usr/include since exported headers should not include unexported ones. We use -std=gnu89 for the kernel space since the kernel code highly depends on GNU extensions. On the other hand, UAPI headers should be written in more standardized C, so they are compiled with -std=c90. This will emit errors if C++ style comments, the keyword 'inline', etc. are used. Please use C style comments (/* ... */), '__inline__', etc. in UAPI headers. There is additional compiler requirement to enable this test because many of UAPI headers include <stdlib.h>, <sys/ioctl.h>, <sys/time.h>, etc. directly or indirectly. You cannot use kernel.org pre-built toolchains [1] since they lack <stdlib.h>. I reused CONFIG_CC_CAN_LINK to check the system header availability. The intention is slightly different, but a compiler that can link userspace programs provide system headers. For now, a lot of headers need to be excluded because they cannot be compiled standalone, but this is a good start point. [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/index.html Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
|
#
1a927fd3 |
|
30-Jun-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
init/Kconfig: add CONFIG_CC_CAN_LINK Currently, scripts/cc-can-link.sh is run just for BPFILTER_UMH, but defining CC_CAN_LINK will be useful in other places. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
69842cba |
|
21-Jun-2019 |
Patrick Bellasi <patrick.bellasi@arm.com> |
sched/uclamp: Add CPU's clamp buckets refcounting Utilization clamping allows to clamp the CPU's utilization within a [util_min, util_max] range, depending on the set of RUNNABLE tasks on that CPU. Each task references two "clamp buckets" defining its minimum and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp bucket is active if there is at least one RUNNABLE tasks enqueued on that CPU and refcounting that bucket. When a task is {en,de}queued {on,from} a rq, the set of active clamp buckets on that CPU can change. If the set of active clamp buckets changes for a CPU a new "aggregated" clamp value is computed for that CPU. This is because each clamp bucket enforces a different utilization clamp value. Clamp values are always MAX aggregated for both util_min and util_max. This ensures that no task can affect the performance of other co-scheduled tasks which are more boosted (i.e. with higher util_min clamp) or less capped (i.e. with higher util_max clamp). A task has: task_struct::uclamp[clamp_id]::bucket_id to track the "bucket index" of the CPU's clamp bucket it refcounts while enqueued, for each clamp index (clamp_id). A runqueue has: rq::uclamp[clamp_id]::bucket[bucket_id].tasks to track how many RUNNABLE tasks on that CPU refcount each clamp bucket (bucket_id) of a clamp index (clamp_id). It also has a: rq::uclamp[clamp_id]::bucket[bucket_id].value to track the clamp value of each clamp bucket (bucket_id) of a clamp index (clamp_id). The rq::uclamp::bucket[clamp_id][] array is scanned every time it's needed to find a new MAX aggregated clamp value for a clamp_id. This operation is required only when it's dequeued the last task of a clamp bucket tracking the current MAX aggregated clamp value. In this case, the CPU is either entering IDLE or going to schedule a less boosted or more clamped task. The expected number of different clamp values configured at build time is small enough to fit the full unordered array into a single cache line, for configurations of up to 7 buckets. Add to struct rq the basic data structures required to refcount the number of RUNNABLE tasks for each clamp bucket. Add also the max aggregation required to update the rq's clamp value at each enqueue/dequeue event. Use a simple linear mapping of clamp values into clamp buckets. Pre-compute and cache bucket_id to avoid integer divisions at enqueue/dequeue time. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8060c47b |
|
05-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
block: rename CONFIG_DEBUG_BLK_CGROUP to CONFIG_BFQ_CGROUP_DEBUG This option is entirely bfq specific, give it an appropinquate name. Also make it depend on CONFIG_BFQ_GROUP_IOSCHED in Kconfig, as all the functionality already does so anyway. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e846f0dc |
|
04-Jun-2019 |
Jani Nikula <jani.nikula@intel.com> |
kbuild: add support for ensuring headers are self-contained Sometimes it's useful to be able to explicitly ensure certain headers remain self-contained, i.e. that they are compilable as standalone units, by including and/or forward declaring everything they depend on. Add special target header-test-y where individual Makefiles can add headers to be tested if CONFIG_HEADER_TEST is enabled. This will generate a dummy C file per header that gets built as part of extra-y. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
d6a3b247 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: scheduler: convert docs to ReST and rename to *.rst In order to prepare to add them to the Kernel API book, convert the files to ReST format. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
99c8b231 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: cgroup-v1: convert docs to ReST and rename to *.rst Convert the cgroup-v1 files to ReST format, in order to allow a later addition to the admin-guide. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
f7b101d3 |
|
15-May-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
kheaders: Move from proc to sysfs The kheaders archive consisting of the kernel headers used for compiling bpf programs is in /proc. However there is concern that moving it here will make it permanent. Let us move it to /sys/kernel as discussed [1]. [1] https://lore.kernel.org/patchwork/patch/1067310/#1265969 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
ec8f24b7 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
e900a918 |
|
14-May-2019 |
Dan Williams <dan.j.williams@intel.com> |
mm: shuffle initial free memory to improve memory-side-cache utilization Patch series "mm: Randomize free memory", v10. This patch (of 3): Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1]. Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here: It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/ A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s." A replacement for zonesort was merged upstream in commit cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case. The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance. Here are some performance impact details of the patches: 1/ An Intel internal synthetic memory bandwidth measurement tool, saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived cased used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node. 2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%. 3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case. 4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3]. While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems. Outside of memory-side-cache utilization concerns there is potentially security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first out behavior for physical pages. Pages are freed in physical address order when first onlined. Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc. I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator." While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches it leaves vast bulk of memory to be predictably in order allocated. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization. Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time. Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch). The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e. 10, 4MB this trades off randomization granularity for time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side cache behavior improvements, and the expectation that the security implications of finer granularity randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work. This initial randomization can be undone over time so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask if the page free entropy is sufficient, but it is not enough due to the in-order initial freeing of pages. At the start of that process putting page1 in front or behind page0 still keeps them close together, page2 is still near page1 and has a high chance of being adjacent. As more pages are added ordering diversity improves, but there is still high page locality for the low address pages and this leads to no significant impact to the cache conflict rate. [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309 [dan.j.williams@intel.com: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bc0c6045 |
|
26-Apr-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
init/config: Do not select BUILD_BIN2C for IKCONFIG Since commit 13610aa908dc ("kernel/configs: use .incbin directive to embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent it from being selected in Kconfig. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
43d8ce9d |
|
26-Apr-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
Provide in-kernel headers to make extending kernel easier Introduce in-kernel headers which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to run eBPF and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers. A github PR is sent for the corresponding BCC patch at: https://github.com/iovisor/bcc/pull/2312 On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. This is an issue even well known to distros. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time. The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they do not want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.3MB. By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers. The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers. Other approaches were discussed such as having an in-memory mountable filesystem, but that has drawbacks such as requiring an in-kernel xz decompressor which we don't have today, and requiring usage of 42 MB of kernel memory to host the decompressed headers at anytime. Also this approach is simpler than such approaches. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
5dd50aae |
|
05-Nov-2018 |
David Howells <dhowells@redhat.com> |
Make anon_inodes unconditional Make the anon_inodes facility unconditional so that it can be used by core VFS code and pidfd code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [christian@brauner.io: adapt commit message to mention pidfds] Signed-off-by: Christian Brauner <christian@brauner.io>
|
#
dadd2299 |
|
05-Nov-2018 |
David Howells <dhowells@redhat.com> |
Make anon_inodes unconditional Make the anon_inodes facility unconditional so that it can be used by core VFS code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
041a1574 |
|
04-Mar-2019 |
Arnd Bergmann <arnd@arndb.de> |
time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but this is not always enabled when it is possible to select VIRT_CPU_ACCOUNTING_GEN: WARNING: unmet direct dependencies detected for CONTEXT_TRACKING Depends on [n]: GENERIC_CLOCKEVENTS [=n] Selected by [y]: - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y] Platforms without GENERIC_CLOCKEVENTS are rare enough so that corner case can be just ignored. Make it a dependency for VIRT_CPU_ACCOUNTING_GEN to simplify the configuration. Fixes: a4cffdad7314 ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E . McKenney" <paulmck@linux.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de
|
#
fa7295ab |
|
01-Mar-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: clean up scripts/gcc-version.sh Now that the Kconfig is the only user of this script, we can drop unneeded code. Remove the -p option, and stop prepending the output with zero, so that Kconfig can directly use the output from this script. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
2b188cc1 |
|
07-Jan-2019 |
Jens Axboe <axboe@kernel.dk> |
Add io_uring IO interface The submission queue (SQ) and completion queue (CQ) rings are shared between the application and the kernel. This eliminates the need to copy data back and forth to submit and complete IO. IO submissions use the io_uring_sqe data structure, and completions are generated in the form of io_uring_cqe data structures. The SQ ring is an index into the io_uring_sqe array, which makes it possible to submit a batch of IOs without them being contiguous in the ring. The CQ ring is always contiguous, as completion events are inherently unordered, and hence any io_uring_cqe entry can point back to an arbitrary submission. Two new system calls are added for this: io_uring_setup(entries, params) Sets up an io_uring instance for doing async IO. On success, returns a file descriptor that the application can mmap to gain access to the SQ ring, CQ ring, and io_uring_sqes. io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize) Initiates IO against the rings mapped to this fd, or waits for them to complete, or both. The behavior is controlled by the parameters passed in. If 'to_submit' is non-zero, then we'll try and submit new IO. If IORING_ENTER_GETEVENTS is set, the kernel will wait for 'min_complete' events, if they aren't already available. It's valid to set IORING_ENTER_GETEVENTS and 'min_complete' == 0 at the same time, this allows the kernel to return already completed events without waiting for them. This is useful only for polling, as for IRQ driven IO, the application can just check the CQ ring without entering the kernel. With this setup, it's possible to do async IO with a single system call. Future developments will enable polled IO with this interface, and polled submission as well. The latter will enable an application to do IO without doing ANY system calls at all. For IRQ driven IO, an application only needs to enter the kernel for completions if it wants to wait for them to occur. Each io_uring is backed by a workqueue, to support buffered async IO as well. We will only punt to an async context if the command would need to wait for IO on the device side. Any data that can be accessed directly in the page cache is done inline. This avoids the slowness issue of usual threadpools, since cached data is accessed as quickly as a sync interface. Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b303c6df |
|
20-Feb-2019 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched various false positives: - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building with -Os") turned off this option for -Os. - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for CONFIG_PROFILE_ALL_BRANCHES - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning for "make W=1"") turned off this option for GCC < 4.9 Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903 I think this looks better by shifting the logic from Makefile to Kconfig. Link: https://github.com/ClangBuiltLinux/linux/issues/350 Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
|
#
7b2489d3 |
|
01-Feb-2019 |
Johannes Weiner <hannes@cmpxchg.org> |
psi: clarify the Kconfig text for the default-disable option The current help text caused some confusion in online forums about whether or not to default-enable or default-disable psi in vendor kernels. This is because it doesn't communicate the reason for why we made this setting configurable in the first place: that the overhead is non-zero in an artificial scheduler stress test. Since this isn't representative of real workloads, and the effect was not measurable in scheduler-heavy real world applications such as the webservers and memcache installations at Facebook, it's fair to point out that this is a pretty cautious option to select. Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
98076833 |
|
01-Feb-2019 |
Jonathan Neuschäfer <j.neuschaefer@gmx.net> |
init/Kconfig: fix grammar by moving a closing parenthesis Link: http://lkml.kernel.org/r/20190129150813.15785-1-j.neuschaefer@gmx.net Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
16fd20aa |
|
11-Jan-2019 |
Paul Burton <paulburton@kernel.org> |
kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7 When building using GCC 4.7 or older, -ffunction-sections & the -pg flag used by ftrace are incompatible. This causes warnings or build failures (where -Werror applies) such as the following: arch/mips/generic/init.c: error: -ffunction-sections disabled; it makes profiling impossible This used to be taken into account by the ordering of calls to cc-option from within the top-level Makefile, which was introduced by commit 90ad4052e85c ("kbuild: avoid conflict between -ffunction-sections and -pg on gcc-4.7"). Unfortunately this was broken when the CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to Kconfig in commit e85d1d65cd8a ("kbuild: test dead code/data elimination support in Kconfig"), because the flags used by this check no longer include -pg. Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building using GCC 4.7 or older. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: e85d1d65cd8a ("kbuild: test dead code/data elimination support in Kconfig") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
e9666d10 |
|
30-Dec-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
jump_label: move 'asm goto' support test to Kconfig Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this: #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) # define HAVE_JUMP_LABEL #endif We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
#
428a1cb4 |
|
14-Dec-2018 |
Baruch Siach <baruch@tkos.co.il> |
psi: fix reference to kernel commandline enable The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED help text contradicts the documentation in kernel-parameters.txt, and the code. Fix that. Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e0c27447 |
|
30-Nov-2018 |
Johannes Weiner <hannes@cmpxchg.org> |
psi: make disabling/enabling easier for vendor kernels Mel Gorman reports a hackbench regression with psi that would prohibit shipping the suse kernel with it default-enabled, but he'd still like users to be able to opt in at little to no cost to others. With the current combination of CONFIG_PSI and the psi_disabled bool set from the commandline, this is a challenge. Do the following things to make it easier: 1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros to enable CONFIG_PSI in their kernel but leave the feature disabled unless a user requests it at boot-time. To avoid double negatives, rename psi_disabled= to psi=. 2. Make psi_disabled a static branch to eliminate any branch costs when the feature is disabled. In terms of numbers before and after this patch, Mel says: : The following is a comparision using CONFIG_PSI=n as a baseline against : your patch and a vanilla kernel : : 4.20.0-rc4 4.20.0-rc4 4.20.0-rc4 : kconfigdisable-v1r1 vanilla psidisable-v1r1 : Amean 1 1.3100 ( 0.00%) 1.3923 ( -6.28%) 1.3427 ( -2.49%) : Amean 3 3.8860 ( 0.00%) 4.1230 * -6.10%* 3.8860 ( -0.00%) : Amean 5 6.8847 ( 0.00%) 8.0390 * -16.77%* 6.7727 ( 1.63%) : Amean 7 9.9310 ( 0.00%) 10.8367 * -9.12%* 9.9910 ( -0.60%) : Amean 12 16.6577 ( 0.00%) 18.2363 * -9.48%* 17.1083 ( -2.71%) : Amean 18 26.5133 ( 0.00%) 27.8833 * -5.17%* 25.7663 ( 2.82%) : Amean 24 34.3003 ( 0.00%) 34.6830 ( -1.12%) 32.0450 ( 6.58%) : Amean 30 40.0063 ( 0.00%) 40.5800 ( -1.43%) 41.5087 ( -3.76%) : Amean 32 40.1407 ( 0.00%) 41.2273 ( -2.71%) 39.9417 ( 0.50%) : : It's showing that the vanilla kernel takes a hit (as the bisection : indicated it would) and that disabling PSI by default is reasonably : close in terms of performance for this particular workload on this : particular machine so; Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c8fc5d49 |
|
15-Nov-2018 |
Richard Guy Briggs <rgb@redhat.com> |
audit: remove WATCH and TREE config options Remove the CONFIG_AUDIT_WATCH and CONFIG_AUDIT_TREE config options since they are both dependent on CONFIG_AUDITSYSCALL and force CONFIG_FSNOTIFY. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
2ce7135a |
|
26-Oct-2018 |
Johannes Weiner <hannes@cmpxchg.org> |
psi: cgroup support On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eb414681 |
|
26-Oct-2018 |
Johannes Weiner <hannes@cmpxchg.org> |
psi: pressure stall information for CPU, memory, and IO When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and throughput on the individual job can be enormous. In order to maximize hardware utilization without sacrificing individual job health or risk complete machine lockups, this patch implements a way to quantify resource pressure in the system. A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that expose the percentage of time the system is stalled on CPU, memory, or IO, respectively. Stall states are aggregate versions of the per-task delay accounting delays: cpu: some tasks are runnable but not executing on a CPU memory: tasks are reclaiming, or waiting for swapin or thrashing cache io: tasks are waiting for io completions These percentages of walltime can be thought of as pressure percentages, and they give a general sense of system health and productivity loss incurred by resource overcommit. They can also indicate when the system is approaching lockup scenarios and OOMs. To do this, psi keeps track of the task states associated with each CPU and samples the time they spend in stall states. Every 2 seconds, the samples are averaged across CPUs - weighted by the CPUs' non-idle time to eliminate artifacts from unused CPUs - and translated into percentages of walltime. A running average of those percentages is maintained over 10s, 1m, and 5m periods (similar to the loadaverage). [hannes@cmpxchg.org: doc fixlet, per Randy] Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org [hannes@cmpxchg.org: code optimization] Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org [hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter] Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org [hannes@cmpxchg.org: fix build] Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
11d4afd4 |
|
25-Sep-2018 |
Vincent Guittot <vincent.guittot@linaro.org> |
sched/pelt: Fix warning and clean up IRQ PELT config Create a config for enabling irq load tracking in the scheduler. irq load tracking is useful only when irq or paravirtual time is accounted but it's only possible with SMP for now. Also use __maybe_unused to remove the compilation warning in update_rq_clock_task() that has been introduced by: 2e62c4743adc ("sched/fair: Remove #ifdefs from scale_rt_capacity()") Suggested-by: Ingo Molnar <mingo@redhat.com> Reported-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: dou_liyang@163.com Fixes: 2e62c4743adc ("sched/fair: Remove #ifdefs from scale_rt_capacity()") Link: http://lkml.kernel.org/r/1537867062-27285-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e85d1d65 |
|
22-Aug-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: test dead code/data elimination support in Kconfig This config option should be enabled only when both the compiler and the linker support necessary flags. Add proper dependencies to Kconfig. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
5cb366bb |
|
21-Aug-2018 |
Adrian Reber <adrian@lisas.de> |
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE The CHECKPOINT_RESTORE configuration option was introduced in 2012 and combined with EXPERT. CHECKPOINT_RESTORE is already enabled in many distribution kernels and also part of the defconfigs of various architectures. To make it easier for distributions to enable CHECKPOINT_RESTORE this removes EXPERT and moves the configuration option out of the EXPERT block. Link: http://lkml.kernel.org/r/20180712130733.11510-1-adrian@lisas.de Signed-off-by: Adrian Reber <adrian@lisas.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3903bf94 |
|
21-Aug-2018 |
Randy Dunlap <rdunlap@infradead.org> |
init/Kconfig: fix its typos Correct typos of "it's" to "its. Link: http://lkml.kernel.org/r/0ac627b6-5527-55f4-0489-1631aa34fc11@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
84c07d11 |
|
17-Aug-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB Introduce new config option, which is used to replace repeating CONFIG_MEMCG && !CONFIG_SLOB pattern. Next patches add a little more memcg+kmem related code, so let's keep the defines more clearly. Link: http://lkml.kernel.org/r/153063053670.1818.15013136946600481138.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <jbacik@fb.com> Cc: Li RongQing <lirongqing@baidu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sahitya Tummala <stummala@codeaurora.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
47f38ae0 |
|
07-Aug-2018 |
Rob Landley <rob@landley.net> |
init/Kconfig: Use short unix-style option instead of --longname Avoids warning messages with the latest release of toybox, which never bothered to implement the --longopts nothing was using. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
87a4c375 |
|
31-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
kconfig: include kernel/Kconfig.preempt from init/Kconfig Almost all architectures include it. Add a ARCH_NO_PREEMPT symbol to disable preempt support for alpha, hexagon, non-coldfire m68k and user mode Linux. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
1572497c |
|
31-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
kconfig: include common Kconfig files from top-level Kconfig Instead of duplicating the source statements in every architecture just do it once in the toplevel Kconfig file. Note that with this the inclusion of arch/$(SRCARCH/Kconfig moves out of the top-level Kconfig into arch/Kconfig so that don't violate ordering constraits while keeping a sensible menu structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
17c46a6a |
|
31-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
kconfig: remove duplicate SWAP symbol defintions microblaze and nios2 define their own always n SWAP symbols. Remove those and let the generic defintion do the right thing by adding a new symbol to disable swap entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
9afb719e |
|
05-Jul-2018 |
Laura Abbott <labbott@redhat.com> |
kbuild: Add build salt to the kernel and modules In Fedora, the debug information is packaged separately (foo-debuginfo) and can be installed separately. There's been a long standing issue where only one version of a debuginfo info package can be installed at a time. There's been an effort for Fedora for parallel debuginfo to rectify this problem. Part of the requirement to allow parallel debuginfo to work is that build ids are unique between builds. The existing upstream rpm implementation ensures this by re-calculating the build-id using the version and release as a seed. This doesn't work 100% for the kernel because of the vDSO which is its own binary and doesn't get updated when embedded. Fix this by adding some data in an ELF note for both the kernel and modules. The data is controlled via a Kconfig option so distributions can set it to an appropriate value to ensure uniqueness between builds. Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
8b9d2712 |
|
23-Jun-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION Since commit 5d20ee3192a5 ("kbuild: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selectable if enabled"), HAVE_LD_DEAD_CODE_DATA_ELIMINATION is supposed to be selected by architectures that are capable of this functionality. LD_DEAD_CODE_DATA_ELIMINATION is now users' selection. Update the help message. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
f16466af |
|
12-Jun-2018 |
Vasily Gorbik <gor@linux.ibm.com> |
init/Kconfig: add an option for uncompressed kernel Add "None" as the kernel compression mode. This option is useful for debugging the kernel in slow simulation environments, where decompressing and moving the kernel is awfully slow. Uncompressed kernel implementation might allow early boot code to skip the decompressor and jump right at uncompressed kernel image entry point. Platforms implementing that should define HAVE_KERNEL_UNCOMPRESSED. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
cf65a0f6 |
|
12-Jun-2018 |
Christoph Hellwig <hch@lst.de> |
dma-mapping: move all DMA mapping code to kernel/dma Currently the code is split over various files with dma- prefixes in the lib/ and drives/base directories, and the number of files keeps growing. Move them into a single directory to keep the code together and remove the file name prefixes. To match the irq infrastructure this directory is placed under the kernel/ directory. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
469cb737 |
|
28-May-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kconfig: add CC_IS_CLANG and CLANG_VERSION This will be useful to describe the clang version dependency. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org>
|
#
a4353898 |
|
28-May-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kconfig: add CC_IS_GCC and GCC_VERSION This will be useful to specify the required compiler version, like this: config FOO bool "Use Foo" depends on GCC_VERSION >= 40800 help This feature requires GCC 4.8 or newer. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org>
|
#
d7822b1e |
|
02-Jun-2018 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
rseq: Introduce restartable sequences system call Expose a new system call allowing each thread to register one userspace memory area to be used as an ABI between kernel and user-space for two purposes: user-space restartable sequences and quick access to read the current CPU number value from user-space. * Restartable sequences (per-cpu atomics) Restartables sequences allow user-space to perform update operations on per-cpu data without requiring heavy-weight atomic operations. The restartable critical sections (percpu atomics) work has been started by Paul Turner and Andrew Hunter. It lets the kernel handle restart of critical sections. [1] [2] The re-implementation proposed here brings a few simplifications to the ABI which facilitates porting to other architectures and speeds up the user-space fast path. Here are benchmarks of various rseq use-cases. Test hardware: arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading The following benchmarks were all performed on a single thread. * Per-CPU statistic counter increment getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 344.0 31.4 11.0 x86-64: 15.3 2.0 7.7 * LTTng-UST: write event 32-bit header, 32-bit payload into tracer per-cpu buffer getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 2502.0 2250.0 1.1 x86-64: 117.4 98.0 1.2 * liburcu percpu: lock-unlock pair, dereference, read/compare word getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 751.0 128.5 5.8 x86-64: 53.4 28.6 1.9 * jemalloc memory allocator adapted to use rseq Using rseq with per-cpu memory pools in jemalloc at Facebook (based on rseq 2016 implementation): The production workload response-time has 1-2% gain avg. latency, and the P99 overall latency drops by 2-3%. * Reading the current CPU number Speeding up reading the current CPU number on which the caller thread is running is done by keeping the current CPU number up do date within the cpu_id field of the memory area registered by the thread. This is done by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space, a notify-resume handler updates the current CPU value within the registered user-space memory area. User-space can then read the current CPU number directly from memory. Keeping the current cpu id in a memory area shared between kernel and user-space is an improvement over current mechanisms available to read the current CPU number, which has the following benefits over alternative approaches: - 35x speedup on ARM vs system call through glibc - 20x speedup on x86 compared to calling glibc, which calls vdso executing a "lsl" instruction, - 14x speedup on x86 compared to inlined "lsl" instruction, - Unlike vdso approaches, this cpu_id value can be read from an inline assembly, which makes it a useful building block for restartable sequences. - The approach of reading the cpu id through memory mapping shared between kernel and user-space is portable (e.g. ARM), which is not the case for the lsl-based x86 vdso. On x86, yet another possible approach would be to use the gs segment selector to point to user-space per-cpu data. This approach performs similarly to the cpu id cache, but it has two disadvantages: it is not portable, and it is incompatible with existing applications already using the gs segment selector for other purposes. Benchmarking various approaches for reading the current CPU number: ARMv7 Processor rev 4 (v7l) Machine model: Cubietruck - Baseline (empty loop): 8.4 ns - Read CPU from rseq cpu_id: 16.7 ns - Read CPU from rseq cpu_id (lazy register): 19.8 ns - glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns - getcpu system call: 234.9 ns x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz: - Baseline (empty loop): 0.8 ns - Read CPU from rseq cpu_id: 0.8 ns - Read CPU from rseq cpu_id (lazy register): 0.8 ns - Read using gs segment selector: 0.8 ns - "lsl" inline assembly: 13.0 ns - glibc 2.19-0ubuntu6 getcpu: 16.6 ns - getcpu system call: 53.9 ns - Speed (benchmark taken on v8 of patchset) Running 10 runs of hackbench -l 100000 seems to indicate, contrary to expectations, that enabling CONFIG_RSEQ slightly accelerates the scheduler: Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1 kernel parameter), with a Linux v4.6 defconfig+localyesconfig, restartable sequences series applied. * CONFIG_RSEQ=n avg.: 41.37 s std.dev.: 0.36 s * CONFIG_RSEQ=y avg.: 40.46 s std.dev.: 0.33 s - Size On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is 567 bytes, and the data size increase of vmlinux is 5696 bytes. [1] https://lwn.net/Articles/650333/ [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
|
#
2972666a |
|
28-May-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kconfig: replace $(UNAME_RELEASE) with function call Now that 'shell' function is supported, this can be self-contained in Kconfig. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
|
#
104daea1 |
|
28-May-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kconfig: reference environment variables directly and remove 'option env=' To get access to environment variables, Kconfig needs to define a symbol using "option env=" syntax. It is tedious to add a symbol entry for each environment variable given that we need to define much more such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability in Kconfig. Adding '$' for symbol references is grammatically inconsistent. Looking at the code, the symbols prefixed with 'S' are expanded by: - conf_expand_value() This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list' - sym_expand_string_value() This is used to expand strings in 'source' and 'mainmenu' All of them are fixed values independent of user configuration. So, they can be changed into the direct expansion instead of symbols. This change makes the code much cleaner. The bounce symbols 'SRCARCH', 'ARCH', 'SUBARCH', 'KERNELVERSION' are gone. sym_init() hard-coding 'UNAME_RELEASE' is also gone. 'UNAME_RELEASE' should be replaced with an environment variable. ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced without '$' prefix. The new syntax is addicted by Make. The variable reference needs parentheses, like $(FOO), but you can omit them for single-letter variables, like $F. Yet, in Makefiles, people tend to use the parenthetical form for consistency / clarification. At this moment, only the environment variable is supported, but I will extend the concept of 'variable' later on. The variables are expanded in the lexer so we can simplify the token handling on the parser side. For example, the following code works. [Example code] config MY_TOOLCHAIN_LIST string default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)" [Result] $ make -s alldefconfig && tail -n 1 .config CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E" Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org>
|
#
f1089c92 |
|
28-May-2018 |
Masahiro Yamada <yamada.masahiro@socionext.com> |
kbuild: remove CONFIG_CROSS_COMPILE support Kbuild provides a couple of ways to specify CROSS_COMPILE: [1] Command line [2] Environment [3] arch/*/Makefile (only some architectures) [4] CONFIG_CROSS_COMPILE [4] is problematic for the compiler capability tests in Kconfig. CONFIG_CROSS_COMPILE allows users to change the compiler prefix from 'make menuconfig', etc. It means, the compiler options would have to be all re-calculated everytime CONFIG_CROSS_COMPILE is changed. To avoid complexity and performance issues, I'd like to evaluate the shell commands statically, i.e. only parsing Kconfig files. I guess the majority is [1] or [2]. Currently, there are only 5 defconfig files that specify CONFIG_CROSS_COMPILE. arch/arm/configs/lpc18xx_defconfig arch/hexagon/configs/comet_defconfig arch/nds32/configs/defconfig arch/openrisc/configs/or1ksim_defconfig arch/openrisc/configs/simple_smp_defconfig Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org>
|
#
cd33d880 |
|
15-May-2018 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
sched/fair: Fix documentation file path The 'tip' prefix probably referred to the -tip tree and is not required, remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180515165328.24899-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5d20ee31 |
|
09-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
kbuild: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selectable if enabled Architectures that are capable can select HAVE_LD_DEAD_CODE_DATA_ELIMINATION to enable selection of that option (as an EXPERT kernel option). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
bae77c5e |
|
07-May-2018 |
Song Liu <songliubraving@fb.com> |
bpf: enable stackmap with build_id in nmi context Currently, we cannot parse build_id in nmi context because of up_read(¤t->mm->mmap_sem), this makes stackmap with build_id less useful. This patch enables parsing build_id in nmi by putting the up_read() call in irq_work. To avoid memory allocation in nmi context, we use per cpu variable for the irq_work. As a result, only one irq_work per cpu is allowed. If the irq_work is in-use, we fallback to only report ips. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
7303e30e |
|
05-Apr-2018 |
Dominik Brodowski <linux@dominikbrodowski.net> |
syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls It may be useful for an architecture to override the definitions of the COMPAT_SYSCALL_DEFINE0() and __COMPAT_SYSCALL_DEFINEx() macros in <linux/compat.h>, in particular to use a different calling convention for syscalls. This patch provides a mechanism to do so, based on the previously introduced CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled, <asm/sycall_wrapper.h> is included in <linux/compat.h> and may be used to define the macros mentioned above. Moreover, as the syscall calling convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set, the compat syscall function prototypes in <linux/compat.h> are #ifndef'd out in that case. As some of the syscalls and/or compat syscalls may not be present, the COND_SYSCALL() and COND_SYSCALL_COMPAT() macros in kernel/sys_ni.c as well as the SYS_NI() and COMPAT_SYS_NI() macros in kernel/time/posix-stubs.c can be re-defined in <asm/syscall_wrapper.h> iff CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
1bd21c6c |
|
05-Apr-2018 |
Dominik Brodowski <linux@dominikbrodowski.net> |
syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y It may be useful for an architecture to override the definitions of the SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>, in particular to use a different calling convention for syscalls. This patch provides a mechanism to do so: It introduces CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled, <asm/sycall_wrapper.h> is included in <linux/syscalls.h> and may be used to define the macros mentioned above. Moreover, as the syscall calling convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set, the syscall function prototypes in <linux/syscalls.h> are #ifndef'd out in that case. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a687a533 |
|
07-Mar-2018 |
Arnd Bergmann <arnd@arndb.de> |
treewide: simplify Kconfig dependencies for removed archs A lot of Kconfig symbols have architecture specific dependencies. In those cases that depend on architectures we have already removed, they can be omitted. Acked-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
739d875d |
|
08-Mar-2018 |
David Howells <dhowells@redhat.com> |
mn10300: Remove the architecture Remove the MN10300 arch as the hardware is defunct. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> cc: Masahiro Yamada <yamada.masahiro@socionext.com> cc: linux-am33-list@redhat.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
424529fb |
|
29-Dec-2017 |
William Breathitt Gray <vilhelm.gray@gmail.com> |
pc104: Add EXPERT dependency for PC104 Kconfig option PC/104 device driver Kconfig options previously had an implicit EXPERT dependency by way of an explicit ISA_BUS_API dependency. Now that these driver Kconfig options select ISA_BUS_API rather than depend on it, the PC104 Kconfig option should have an explicit EXPERT dependency. The PC/104 form factor and bus architecture are common in embedded and specialized systems, but uncommon in typical desktop setups. For this reason, it is best to mask these devices and configurations via the EXPERT Kconfig option because the majority of users will never need to concern themselves with PC/104. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
#
70216e18 |
|
29-Jan-2018 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
membarrier: Provide core serializing command, *_SYNC_CORE Provide core serializing membarrier command to support memory reclaim by JIT. Each architecture needs to explicitly opt into that support by documenting in their architecture code how they provide the core serializing instructions required when returning from the membarrier IPI, and after the scheduler has updated the curr->mm pointer (before going back to user-space). They should then select ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on their architecture. Architectures selecting this feature need to either document that they issue core serializing instructions when returning to user-space, or implement their architecture-specific sync_core_before_usermode(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e61938a9 |
|
29-Jan-2018 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
locking: Introduce sync_core_before_usermode() Introduce an architecture function that ensures the current CPU issues a core serializing instruction before returning to usermode. This is needed for the membarrier "sync_core" command. Architectures defining the sync_core_before_usermode() static inline need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-7-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3ccfebed |
|
29-Jan-2018 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
powerpc, membarrier: Skip memory barrier in switch_mm() Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Threads targeting the same VM but which belong to different thread groups is a tricky case. It has a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates on the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
290af866 |
|
09-Jan-2018 |
Alexei Starovoitov <ast@kernel.org> |
bpf: introduce BPF_JIT_ALWAYS_ON config The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715. A quote from goolge project zero blog: "At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets." To make attacker job harder introduce BPF_JIT_ALWAYS_ON config option that removes interpreter from the kernel in favor of JIT-only mode. So far eBPF JIT is supported by: x64, arm64, arm32, sparc64, s390, powerpc64, mips64 The start of JITed program is randomized and code page is marked as read-only. In addition "constant blinding" can be turned on with net.core.bpf_jit_harden v2->v3: - move __bpf_prog_ret0 under ifdef (Daniel) v1->v2: - fix init order, test_bpf and cBPF (Daniel's feedback) - fix offloaded bpf (Jakub's feedback) - add 'return 0' dummy in case something can invoke prog->bpf_func - retarget bpf tree. For bpf-next the patch would need one extra hunk. It will be sent when the trees are merged back to net-next Considered doing: int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT; but it seems better to land the patch as-is and in bpf-next remove bpf_jit_enable global variable from all JITs, consolidate in one place and remove this jit_init() function. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
414a2dc1 |
|
01-Jan-2018 |
Geert Uytterhoeven <geert@linux-m68k.org> |
sched/isolation: Make CONFIG_CPU_ISOLATION=y depend on SMP or COMPILE_TEST On uniprocessor systems, critical and non-critical tasks cannot be isolated, as there is only a single CPU core. Hence enabling CPU isolation by default on such systems does not make much sense. Instead of changing the default for !SMP, fix this by making the feature depend on SMP, with an override for compile-testing. Note that its sole selector (NO_HZ_FULL) already depends on SMP. This decreases kernel size for a default uniprocessor kernel by ca. 1 KiB. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Nicolas Pitre <nico@linaro.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2c43838c99d9d23f ("sched/isolation: Enable CONFIG_CPU_ISOLATION=y by default") Link: http://lkml.kernel.org/r/1514891590-20782-1-git-send-email-geert@linux-m68k.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2c43838c |
|
14-Dec-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Enable CONFIG_CPU_ISOLATION=y by default The "isolcpus=" boot parameter support was always built-in before we moved the related code under CONFIG_CPU_ISOLATION. Having it disabled by default is very confusing for people accustomed to use this parameter. So enable it by dafault to keep the previous behaviour but keep it optable for those who want to tinify their kernels. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: kernel test robot <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/1513275507-29200-3-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d1b069f5 |
|
17-Nov-2017 |
Randy Dunlap <rdunlap@infradead.org> |
EXPERT Kconfig menu: fix broken EXPERT menu Clean up the EXPERT menu (yet again). Move FHANDLE and CHECKPOINT_RESTORE into the primary EXPERT menu since they already depend on EXPERT. Move BPF_SYSCALL and USERFAULTFD out of the EXPERT Kconfig symbols menu list since they do not depend on EXPERT and were breaking the continuity of that menu list. Move all of the KALLSYMS Kconfig symbols to the end of the EXPERT menu. This separates the kernel services from the build options. This patch depends on [PATCH] pci: move PCI_QUIRKS to the PCI bus menu (https://lkml.org/lkml/2017/11/2/907). Link: http://lkml.kernel.org/r/72e4465a-a5ff-cb3c-1a90-11aa4861b161@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> [BPF] Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5b365771 |
|
15-Nov-2017 |
Yang Shi <yang.s@alibaba-inc.com> |
mm: slabinfo: remove CONFIG_SLABINFO According to discussion with Christoph (https://marc.info/?l=linux-kernel&m=150695909709711&w=2), it sounds like it is pointless to keep CONFIG_SLABINFO around. This patch removes the CONFIG_SLABINFO config option, but /proc/slabinfo is still available. [yang.s@alibaba-inc.com: v11] Link: http://lkml.kernel.org/r/1507656303-103845-3-git-send-email-yang.s@alibaba-inc.com Link: http://lkml.kernel.org/r/1507152550-46205-3-git-send-email-yang.s@alibaba-inc.com Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
03ea2263 |
|
02-Nov-2017 |
Randy Dunlap <rdunlap@infradead.org> |
PCI: Move PCI_QUIRKS to the PCI bus menu Localize PCI_QUIRKS in the PCI bus menu. Move PCI_QUIRKS to the PCI bus menu instead of the (often broken) General Setup EXPERT menu. The prompt still depends on EXPERT. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
6f1982fe |
|
26-Oct-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Handle the nohz_full= parameter We want to centralize the isolation management, done by the housekeeping subsystem. Therefore we need to handle the nohz_full= parameter from there. Since nohz_full= so far has involved unbound timers, watchdog, RCU and tilegx NAPI isolation, we keep that default behaviour. nohz_full= will be deprecated in the future. We want to control the isolation features from the isolcpus= parameter. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-10-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5c4991e2 |
|
26-Oct-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL Split the housekeeping config from CONFIG_NO_HZ_FULL. This way we finally separate the isolation code from NOHZ. Although a dependency to CONFIG_NO_HZ_FULL remains for now, while the housekeeping code still deals with NOHZ internals. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-8-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
cbdc8217 |
|
10-Sep-2017 |
Nathan Chancellor <nathan@kernel.org> |
init/Kconfig: Fix module signing document location This was moved in commit 94e980cc45f2 ("Documentation/module-signing.txt: convert to ReST markup") and was missed by commit 8c27ceff3604 ("docs: fix locations of several documents that got moved"). Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
2cc3ce24 |
|
03-Oct-2017 |
Ulf Magnusson <ulfalizer@gmail.com> |
kbuild: Fix optimization level choice default The choice containing the CC_OPTIMIZE_FOR_PERFORMANCE symbol accidentally added a "CONFIG_" prefix when trying to make it the default, selecting an undefined symbol as the default. The mistake is harmless here: Since the default symbol is not visible, the choice falls back on using the visible symbol as the default instead, which is CC_OPTIMIZE_FOR_PERFORMANCE, as intended. A patch that makes Kconfig print a warning in this case has been submitted separately: http://www.spinics.net/lists/linux-kbuild/msg15566.html Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
#
2482ddec |
|
06-Sep-2017 |
Kees Cook <keescook@chromium.org> |
mm: add SLUB free list pointer obfuscation This SLUB free list pointer obfuscation code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. This adds a per-cache random value to SLUB caches that is XORed with their freelist pointer address and value. This adds nearly zero overhead and frustrates the very common heap overflow exploitation method of overwriting freelist pointers. A recent example of the attack is written up here: http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit and there is a section dedicated to the technique the book "A Guide to Kernel Exploitation: Attacking the Core". This is based on patches by Daniel Micay, and refactored to minimize the use of #ifdef. With 200-count cycles of "hackbench -g 20 -l 1000" I saw the following run times: before: mean 10.11882499999999999995 variance .03320378329145728642 stdev .18221905304181911048 after: mean 10.12654000000000000014 variance .04700556623115577889 stdev .21680767106160192064 The difference gets lost in the noise, but if the above is to be taken literally, using CONFIG_FREELIST_HARDENED is 0.07% slower. Link: http://lkml.kernel.org/r/20170802180609.GA66807@beast Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Daniel Micay <danielmicay@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tycho Andersen <tycho@docker.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bc2eecd7 |
|
31-Jul-2017 |
Nicolas Pitre <nico@fluxnic.net> |
futex: Allow for compiling out PI support This makes it possible to preserve basic futex support and compile out the PI support when RT mutexes are not available. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr
|
#
7660a6fd |
|
06-Jul-2017 |
Kees Cook <keescook@chromium.org> |
mm: allow slab_nomerge to be set at build time Some hardened environments want to build kernels with slab_nomerge already set (so that they do not depend on remembering to set the kernel command line option). This is desired to reduce the risk of kernel heap overflows being able to overwrite objects from merged caches and changes the requirements for cache layout control, increasing the difficulty of these attacks. By keeping caches unmerged, these kinds of exploits can usually only damage objects in the same cache (though the risk to metadata exploitation is unchanged). Link: http://lkml.kernel.org/r/20170620230911.GA25238@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Mack <daniel@zonque.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e1d4eeec |
|
14-Jun-2017 |
Nicolas Pitre <nico@fluxnic.net> |
sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled Make CONFIG_CPUSETS=y depend on SMP as this feature makes no sense on UP. This allows for configuring out cpuset_cpumask_can_shrink() and task_can_attach() entirely, which shrinks the kernel a bit. Signed-off-by: Nicolas Pitre <nico@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614171926.8345-2-nicolas.pitre@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
23b0be48 |
|
13-Jun-2017 |
Waiman Long <longman@redhat.com> |
cgroup: Make Kconfig prompt of debug cgroup more accurate The Kconfig prompt and description of the debug cgroup controller more accurate by saying that it is for debug purpose only and its interfaces are unstable. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
0af92d46 |
|
17-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move RCU non-debug Kconfig options to kernel/rcu RCU's Kconfig options are scattered, and there are enough of them that it would be good for them to be more centralized. This commit therefore extracts RCU's Kconfig options from init/Kconfig into a new kernel/rcu/Kconfig file. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
44c65ff2 |
|
15-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate NOCBs CPU-state Kconfig options The CONFIG_RCU_NOCB_CPU_ALL, CONFIG_RCU_NOCB_CPU_NONE, and CONFIG_RCU_NOCB_CPU_ZERO Kconfig options are used only in testing and are redundant with the rcu_nocbs= boot parameter. This commit therefore removes these three Kconfig options and adjusts the rcutorture scripts to use the boot parameter instead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ae91aa0a |
|
15-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove debugfs tracing RCU's debugfs tracing used to be the only reasonable low-level debug information available, but ftrace and event tracing has since surpassed the RCU debugfs level of usefulness. This commit therefore removes RCU's debugfs tracing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bd8cc5a0 |
|
15-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Remove Classic SRCU Classic SRCU was only ever intended to be a fallback in case of issues with Tree/Tiny SRCU, and the latter two are doing quite well in testing. This commit therefore removes Classic SRCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
f7a10a97 |
|
10-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove the RCU_KTHREAD_PRIO Kconfig option Anything that can be done with the RCU_KTHREAD_PRIO Kconfig option can also be done with the rcutree.kthread_prio kernel boot parameter. This commit therefore removes this Kconfig option. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com>
|
#
2464dd94 |
|
04-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Apply trivial callback lists to shrink Tiny SRCU The rcu_segcblist structure provides quite a bit of functionality, and Tiny SRCU needs almost none of it. So this commit replaces Tiny SRCU's uses of rcu_segcblist with a simple singly linked list with tail pointer. This change significantly reduces Tiny SRCU's memory footprint, more than making up for the growth caused by the creation of rcu_segcblist.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
07f6e64b |
|
28-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Make SRCU be once again optional Commit d160a727c40e ("srcu: Make SRCU be built by default") in response to build errors, which were caused by code that included srcu.h despite !SRCU. However, srcutiny.o is almost 2K of code, which is not insignificant for those attempting to run the Linux kernel on IoT devices. This commit therefore makes SRCU be once again optional, and adjusts srcu.h to allow error-free inclusion in !SRCU kernel builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Nicolas Pitre <nico@linaro.org>
|
#
98059b98 |
|
02-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Separately compile large rcu_segcblist functions This commit creates a new kernel/rcu/rcu_segcblist.c file that contains non-trivial segcblist functions. Trivial functions remain as static inline functions in kernel/rcu/rcu_segcblist.h Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
d160a727 |
|
23-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Make SRCU be built by default SRCU is optional, and included only if there is a "select SRCU" in effect. However, we now have Tiny SRCU, so this commit defaults CONFIG_SRCU=y. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
677df9d4 |
|
23-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Fix Kconfig botch when SRCU not selected If the CONFIG_SRCU option is not selected, for example, when building arch/tile allnoconfig, the following build errors appear: kernel/rcu/tree.o: In function `srcu_online_cpu': tree.c:(.text+0x4248): multiple definition of `srcu_online_cpu' kernel/rcu/srcutree.o:srcutree.c:(.text+0x2120): first defined here kernel/rcu/tree.o: In function `srcu_offline_cpu': tree.c:(.text+0x4250): multiple definition of `srcu_offline_cpu' kernel/rcu/srcutree.o:srcutree.c:(.text+0x2160): first defined here The corresponding .config file shows CONFIG_TREE_SRCU=y, but no sign of CONFIG_SRCU, which fatally confuses SRCU's #ifdefs, resulting in the above errors. The reason this occurs is the folowing line in init/Kconfig's definition for TREE_SRCU: default y if !TINY_RCU && !CLASSIC_SRCU If CONFIG_CLASSIC_SRCU=n, as it will be in for allnoconfig, and if CONFIG_SMP=y, then we will get CONFIG_TREE_SRCU=y but no CONFIG_SRCU, as seen in the .config file, and which will result in the above errors. This error did not show up during rcutorture testing because rcutorture forces CONFIG_SRCU=y, as it must to prevent build errors in rcutorture.c. This commit therefore conditions TREE_SRCU (and TINY_SRCU, while it is at it) with SRCU, like this: default y if SRCU && !TINY_RCU && !CLASSIC_SRCU Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20170423162205.GP3956@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02482880 |
|
03-Feb-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU_FANOUT_LEAF help text more explicit about skew_tick If you set RCU_FANOUT_LEAF too high, you can get lock contention on the leaf rcu_node, and you should boot with the skew_tick kernel parameter set in order to avoid this lock contention. This commit therefore upgrades the RCU_FANOUT_LEAF help text to explicitly state this. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
dad81a20 |
|
25-Mar-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Introduce CLASSIC_SRCU Kconfig option The TREE_SRCU rewrite is large and a bit on the non-simple side, so this commit helps reduce risk by allowing the old v4.11 SRCU algorithm to be selected using a new CLASSIC_SRCU Kconfig option that depends on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU algorithms, in order to help get these the testing that they need. However, if your users do not require the update-side scalability that is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU to revert back to the old classic SRCU algorithm. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d8be8173 |
|
25-Mar-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Create a tiny SRCU In response to automated complaints about modifications to SRCU increasing its size, this commit creates a tiny SRCU that is used in SMP=n && PREEMPT=n builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
1663f26d |
|
22-Feb-2017 |
Tejun Heo <tj@kernel.org> |
slub: make sysfs directories for memcg sub-caches optional SLUB creates a per-cache directory under /sys/kernel/slab which hosts a bunch of debug files. Usually, there aren't that many caches on a system and this doesn't really matter; however, if memcg is in use, each cache can have per-cgroup sub-caches. SLUB creates the same directories for these sub-caches under /sys/kernel/slab/$CACHE/cgroup. Unfortunately, because there can be a lot of cgroups, active or draining, the product of the numbers of caches, cgroups and files in each directory can reach a very high number - hundreds of thousands is commonplace. Millions and beyond aren't difficult to reach either. What's under /sys/kernel/slab is primarily for debugging and the information and control on the a root cache already cover its sub-caches. While having a separate directory for each sub-cache can be helpful for development, it doesn't make much sense to pay this amount of overhead by default. This patch introduces a boot parameter slub_memcg_sysfs which determines whether to create sysfs directories for per-memcg sub-caches. It also adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's default value and defaults to 0. [akpm@linux-foundation.org: kset_unregister(NULL) is legal] Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f92bac3b |
|
27-Dec-2016 |
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> |
printk: rename nmi.c and exported api A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
|
#
56067812 |
|
03-Feb-2017 |
Ard Biesheuvel <ardb@kernel.org> |
kbuild: modversions: add infrastructure for emitting relative CRCs This add the kbuild infrastructure that will allow architectures to emit vmlinux symbol CRCs as 32-bit offsets to another location in the kernel where the actual value is stored. This works around problems with CRCs being mistaken for relocatable symbols on kernels that self relocate at runtime (i.e., powerpc with CONFIG_RELOCATABLE=y) For the kbuild side of things, this comes down to the following: - introducing a Kconfig symbol MODULE_REL_CRCS - adding a -R switch to genksyms to instruct it to emit the CRC symbols as references into the .rodata section - making modpost distinguish such references from absolute CRC symbols by the section index (SHN_ABS) - making kallsyms disregard non-absolute symbols with a __crc_ prefix Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1626c365 |
|
14-Dec-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Re-enable TASKS_RCU for User Mode Linux Now that User Mode Linux supports arch_irqs_disabled_flags(), this commit re-enables TASKS_RCU for User Mode Linux. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
ad90a3de |
|
10-Jan-2017 |
William Breathitt Gray <vilhelm.gray@gmail.com> |
pc104: Introduce the PC104 Kconfig option PC/104 form factor devices serve a specific niche of embedded system users; most Linux users will not have PC/104 form factor devices. This patch introduces the PC104 Kconfig option, which should be used to filter PC/104 specific device drivers and options, so that only those users interested in PC/104 related options are exposed to them. Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7c6094db |
|
02-Nov-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu: update: Make RCU_EXPEDITE_BOOT be the default RCU_EXPEDITE_BOOT should speed up the boot process by enforcing synchronize_rcu_expedited() instead of synchronize_rcu() during the boot process. There should be no reason why one does not want this and there is no need worry about real time latency at this point. Therefore make it default. Note that users wishing to avoid expediting entirely, for example when bringing up new hardware possibly having flaky IPIs, can use the rcu_normal boot parameter to override boot-time expediting. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Reworded commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
73b35147 |
|
10-Jan-2017 |
Arnd Bergmann <arnd@arndb.de> |
cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39d3e758 |
|
09-Jan-2017 |
Parav Pandit <pandit.parav@gmail.com> |
rdmacg: Added rdma cgroup controller Added rdma cgroup controller that does accounting, limit enforcement on rdma/IB resources. Added rdma cgroup header file which defines its APIs to perform charging/uncharging functionality. It also defined APIs for RDMA/IB stack for device registration. Devices which are registered will participate in controller functions of accounting and limit enforcements. It define rdmacg_device structure to bind IB stack and RDMA cgroup controller. RDMA resources are tracked using resource pool. Resource pool is per device, per cgroup entity which allows setting up accounting limits on per device basis. Currently resources are defined by the RDMA cgroup. Resource pool is created/destroyed dynamically whenever charging/uncharging occurs respectively and whenever user configuration is done. Its a tradeoff of memory vs little more code space that creates resource pool object whenever necessary, instead of creating them during cgroup creation and device registration time. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
483c4933 |
|
16-Dec-2016 |
Andy Lutomirski <luto@kernel.org> |
cgroup: Fix CGROUP_BPF config CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually enabled, making it rather challenging to turn CGROUP_BPF on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
faaae2a5 |
|
29-Nov-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Re-enable CONFIG_MODVERSIONS in a slightly weaker form This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC information in order to work around the issue that newer binutils versions seem to occasionally drop the CRC on the floor. binutils 2.26 seems to work fine, while binutils 2.27 seems to break MODVERSIONS of symbols that have been defined in assembler files. [ We've had random missing CRC's before - it may be an old problem that just is now reliably triggered with the weak asm symbols and a new version of binutils ] Some day I really do want to remove MODVERSIONS entirely. Sadly, today does not appear to be that day: Debian people apparently do want the option to enable MODVERSIONS to make it easier to have external modules across kernel versions, and this seems to be a fairly minimal fix for the annoying problem. Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by: Michal Marek <mmarek@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cd3caefb |
|
25-Nov-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Fix subtle CONFIG_MODVERSIONS problems CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series, and quite frankly, nobody has cared very deeply. We absolutely know how to fix it, and it's not _complicated_, but it's not exactly pretty either. This oneliner fixes it without the ugliness, and allows for further future cleanups. "We've secretly replaced their regular MODVERSIONS with nothing at all, let's see if they notice" Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30070984 |
|
23-Nov-2016 |
Daniel Mack <daniel@zonque.org> |
cgroup: add support for eBPF programs This patch adds two sets of eBPF program pointers to struct cgroup. One for such that are directly pinned to a cgroup, and one for such that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy. A - B - C \ D - E If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
baa73d9e |
|
10-Nov-2016 |
Nicolas Pitre <nico@fluxnic.net> |
posix-timers: Make them configurable Some embedded systems have no use for them. This removes about 25KB from the kernel binary size when configured out. Corresponding syscalls are routed to a stub logging the attempt to use those syscalls which should be enough of a clue if they were disabled without proper consideration. They are: timer_create, timer_gettime: timer_getoverrun, timer_settime, timer_delete, clock_adjtime, setitimer, getitimer, alarm. The clock_settime, clock_gettime, clock_getres and clock_nanosleep syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast majority of use cases with very little code. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: linux-kbuild@vger.kernel.org Cc: netdev@vger.kernel.org Cc: Michal Marek <mmarek@suse.com> Cc: Edward Cree <ecree@solarflare.com> Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8c27ceff3 |
|
18-Oct-2016 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
docs: fix locations of several documents that got moved The previous patch renamed several files that are cross-referenced along the Kernel documentation. Adjust the links to point to the right places. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
26b5679e |
|
11-Oct-2016 |
Peter Zijlstra <peterz@infradead.org> |
relay: Use irq_work instead of plain timer for deferred wakeup Relay avoids calling wake_up_interruptible() for doing the wakeup of readers/consumers, waiting for the generation of new data, from the context of a process which produced the data. This is apparently done to prevent the possibility of a deadlock in case Scheduler itself is is generating data for the relay, after acquiring rq->lock. The following patch used a timer (to be scheduled at next jiffy), for delegating the wakeup to another context. commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7 Author: Tom Zanussi <zanussi@comcast.net> Date: Wed May 9 02:34:01 2007 -0700 relay: use plain timer instead of delayed work relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Scheduling a plain timer, at next jiffies boundary, to do the wakeup causes a significant wakeup latency for the Userspace client, which makes relay less suitable for the high-frequency low-payload use cases where the data gets generated at a very high rate, like multiple sub buffers getting filled within a milli second. Moreover the timer is re-scheduled on every newly produced sub buffer so the timer keeps getting pushed out if sub buffers are filled in a very quick succession (less than a jiffy gap between filling of 2 sub buffers). As a result relay runs out of sub buffers to store the new data. By using irq_work it is ensured that wakeup of userspace client, blocked in the poll call, is done at earliest (through self IPI or next timer tick) enabling it to always consume the data in time. Also this makes relay consistent with printk & ring buffers (trace), as they too use irq_work for deferred wake up of readers. [arnd@arndb.de: select CONFIG_IRQ_WORK] Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b5d5cf2b |
|
20-Sep-2016 |
Helge Deller <deller@gmx.de> |
parisc: Drop BROKEN_RODATA config option PARISC was the only architecture which selected the BROKEN_RODATA config option. Drop it and remove the special handling from init.h as well. Signed-off-by: Helge Deller <deller@gmx.de>
|
#
c6c314a6 |
|
15-Sep-2016 |
Andy Lutomirski <luto@kernel.org> |
sched/core: Add try_get_task_stack() and put_task_stack() There are a few places in the kernel that access stack memory belonging to a different task. Before we can start freeing task stacks before the task_struct is freed, we need a way for those code paths to pin the stack. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c65eacbe |
|
13-Sep-2016 |
Andy Lutomirski <luto@kernel.org> |
sched/core: Allow putting thread_info into task_struct If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT, then thread_info is defined as a single 'u32 flags' and is the first entry of task_struct. thread_info::task is removed (it serves no purpose if thread_info is embedded in task_struct), and thread_info::cpu gets its own slot in task_struct. This is heavily based on a patch written by Linus. Originally-from: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f1cb637e |
|
02-Aug-2016 |
Valdis Kletnieks <Valdis.Kletnieks@vt.edu> |
init/Kconfig: add clarification for out-of-tree modules It doesn't trim just symbols that are totally unused in-tree - it trims the symbols unused by any in-tree modules actually built. If you've done a 'make localmodconfig' and only build a hundred or so modules, it's pretty likely that your out-of-tree module will come up lacking something... Hopefully this will save the next guy from a Homer Simpson "D'oh!" moment. Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ac3339ba |
|
02-Aug-2016 |
Alexey Dobriyan <adobriyan@gmail.com> |
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig Doing patches with allmodconfig kernel compiled and committing stuff into local tree have unfortunate consequence: kernel version changes (as it should) leading to recompiling and relinking of several files even if they weren't touched (or interesting at all). This and "git-whatever" figuring out current version slow down compilation for no good reason. But lets face it, "allmodconfig" kernels don't care about kernel version, they are simply compile check guinea pigs. Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into allmodconfig .config. Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bc083a64 |
|
02-Aug-2016 |
Richard Weinberger <richard@nod.at> |
init/Kconfig: make COMPILE_TEST depend on !UML UML is a bit special since it does not have iomem nor dma. That means a lot of drivers will not build if they miss a dependency on HAS_IOMEM. s390 used to have the same issues but since it gained PCI support UML is the only stranger. We are tired of patching dozens of new drivers after every merge window just to un-break allmod/yesconfig UML builds. One could argue that a decent driver has to know on what it depends and therefore a missing HAS_IOMEM dependency is a clear driver bug. But the dependency not obvious and not everyone does UML builds with COMPILE_TEST enabled when developing a device driver. A possible solution to make these builds succeed on UML would be providing stub functions for ioremap() and friends which fail upon runtime. Another one is simply disabling COMPILE_TEST for UML. Since it is the least hassle and does not force use to fake iomem support let's do the latter. Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9991a9c8 |
|
02-Aug-2016 |
seokhoon.yoon <iamyooon@gmail.com> |
cgroup: update cgroup's document path cgroup's document path is changed to "cgroup-v1". update it. Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com Signed-off-by: seokhoon.yoon <iamyooon@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
210e7a43 |
|
26-Jul-2016 |
Thomas Garnier <thgarnie@google.com> |
mm: SLUB freelist randomization Implements freelist randomization for the SLUB allocator. It was previous implemented for the SLAB allocator. Both use the same configuration option (CONFIG_SLAB_FREELIST_RANDOM). The list is randomized during initialization of a new set of pages. The order on different freelist sizes is pre-computed at boot for performance. Each kmem_cache has its own randomized freelist. This security feature reduces the predictability of the kernel SLUB allocator against heap overflows rendering attacks much less stable. For example these attacks exploit the predictability of the heap: - Linux Kernel CAN SLUB overflow (https://goo.gl/oMNWkU) - Exploiting Linux Kernel Heap corruptions (http://goo.gl/EXLn95) Performance results: slab_test impact is between 3% to 4% on average for 100000 attempts without smp. It is a very focused testing, kernbench show the overall impact on the system is way lower. Before: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 100000 times kmalloc(8) -> 49 cycles kfree -> 77 cycles 100000 times kmalloc(16) -> 51 cycles kfree -> 79 cycles 100000 times kmalloc(32) -> 53 cycles kfree -> 83 cycles 100000 times kmalloc(64) -> 62 cycles kfree -> 90 cycles 100000 times kmalloc(128) -> 81 cycles kfree -> 97 cycles 100000 times kmalloc(256) -> 98 cycles kfree -> 121 cycles 100000 times kmalloc(512) -> 95 cycles kfree -> 122 cycles 100000 times kmalloc(1024) -> 96 cycles kfree -> 126 cycles 100000 times kmalloc(2048) -> 115 cycles kfree -> 140 cycles 100000 times kmalloc(4096) -> 149 cycles kfree -> 171 cycles 2. Kmalloc: alloc/free test 100000 times kmalloc(8)/kfree -> 70 cycles 100000 times kmalloc(16)/kfree -> 70 cycles 100000 times kmalloc(32)/kfree -> 70 cycles 100000 times kmalloc(64)/kfree -> 70 cycles 100000 times kmalloc(128)/kfree -> 70 cycles 100000 times kmalloc(256)/kfree -> 69 cycles 100000 times kmalloc(512)/kfree -> 70 cycles 100000 times kmalloc(1024)/kfree -> 73 cycles 100000 times kmalloc(2048)/kfree -> 72 cycles 100000 times kmalloc(4096)/kfree -> 71 cycles After: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 100000 times kmalloc(8) -> 57 cycles kfree -> 78 cycles 100000 times kmalloc(16) -> 61 cycles kfree -> 81 cycles 100000 times kmalloc(32) -> 76 cycles kfree -> 93 cycles 100000 times kmalloc(64) -> 83 cycles kfree -> 94 cycles 100000 times kmalloc(128) -> 106 cycles kfree -> 107 cycles 100000 times kmalloc(256) -> 118 cycles kfree -> 117 cycles 100000 times kmalloc(512) -> 114 cycles kfree -> 116 cycles 100000 times kmalloc(1024) -> 115 cycles kfree -> 118 cycles 100000 times kmalloc(2048) -> 147 cycles kfree -> 131 cycles 100000 times kmalloc(4096) -> 214 cycles kfree -> 161 cycles 2. Kmalloc: alloc/free test 100000 times kmalloc(8)/kfree -> 66 cycles 100000 times kmalloc(16)/kfree -> 66 cycles 100000 times kmalloc(32)/kfree -> 66 cycles 100000 times kmalloc(64)/kfree -> 66 cycles 100000 times kmalloc(128)/kfree -> 65 cycles 100000 times kmalloc(256)/kfree -> 67 cycles 100000 times kmalloc(512)/kfree -> 67 cycles 100000 times kmalloc(1024)/kfree -> 64 cycles 100000 times kmalloc(2048)/kfree -> 67 cycles 100000 times kmalloc(4096)/kfree -> 67 cycles Kernbench, before: Average Optimal load -j 12 Run (std deviation): Elapsed Time 101.873 (1.16069) User Time 1045.22 (1.60447) System Time 88.969 (0.559195) Percent CPU 1112.9 (13.8279) Context Switches 189140 (2282.15) Sleeps 99008.6 (768.091) After: Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.47 (0.562732) User Time 1045.3 (1.34263) System Time 88.311 (0.342554) Percent CPU 1105.8 (6.49444) Context Switches 189081 (2355.78) Sleeps 99231.5 (800.358) Link: http://lkml.kernel.org/r/1464295031-26375-3-git-send-email-thgarnie@google.com Signed-off-by: Thomas Garnier <thgarnie@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ed18adc1 |
|
23-Jun-2016 |
Kees Cook <keescook@chromium.org> |
mm: SLUB hardened usercopy support Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the SLUB allocator to catch any copies that may span objects. Includes a redzone handling fix discovered by Michael Ellerman. Based on code from PaX and grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Reviwed-by: Laura Abbott <labbott@redhat.com>
|
#
04385fc5 |
|
23-Jun-2016 |
Kees Cook <keescook@chromium.org> |
mm: SLAB hardened usercopy support Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the SLAB allocator to catch any copies that may span objects. Based on code from PaX and grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
|
#
b58c3584 |
|
13-Jul-2016 |
Rik van Riel <riel@redhat.com> |
sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not appear to currently work right. On CPUs without nohz_full=, only tick based irq time sampling is done, which breaks down when dealing with a nohz_idle CPU. On firewalls and similar systems, no ticks may happen on a CPU for a while, and the irq time spent may never get accounted properly. This can cause issues with capacity planning and power saving, which use the CPU statistics as inputs in decision making. Remove the VTIME_GEN vtime irq time code, and replace it with the IRQ_TIME_ACCOUNTING code, when selected as a config option by the user. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
076501ff |
|
06-Jul-2016 |
Randy Dunlap <rdunlap@infradead.org> |
init/Kconfig: keep Expert users menu together The "expert" menu was broken (split) such that all entries in it after KALLSYMS were displayed in the "General setup" area instead of in the "Expert users" area. Fix this by adding one kconfig dependency. Yes, the Expert users menu is fragile. Problems like this have happened several times in the past. I will attempt to isolate the Expert users menu if there is interest in that. Fixes: 4d5d5664c900 ("x86: kallsyms: disable absolute percpu symbols on !SMP") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: stable@vger.kernel.org # 4.6 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5e0d8d59 |
|
05-Jun-2016 |
Geert Uytterhoeven <geert@linux-m68k.org> |
init: fix Kconfig text [jkosina@suse.cz: folded another fix on top on the same line as spotted by Randy Dunlap] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
570dd3c7 |
|
15-Jun-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Disable TASKS_RCU for usermode Linux Usermode Linux currently does not implement arch_irqs_disabled_flags(), which results in a build failure in TASKS_RCU. Therefore, this commit disables the TASKS_RCU Kconfig option in usermode Linux builds. The usermode Linux maintainers expect to merge arch_irqs_disabled_flags() into 4.8, at which point this commit may be reverted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Acked-by: Richard Weinberger <richard@nod.at>
|
#
427934b8 |
|
20-May-2016 |
Petr Mladek <pmladek@suse.com> |
printk/nmi: increase the size of NMI buffer and make it configurable Testing has shown that the backtrace sometimes does not fit into the 4kB temporary buffer that is used in NMI context. The warnings are gone when I double the temporary buffer size. This patch doubles the buffer size and makes it configurable. Note that this problem existed even in the x86-specific implementation that was added by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). Nobody noticed it because it did not print any warnings. Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
42a0bb3f |
|
20-May-2016 |
Petr Mladek <pmladek@suse.com> |
printk/nmi: generic solution for safe printk in NMI printk() takes some locks and could not be used a safe way in NMI context. The chance of a deadlock is real especially when printing stacks from all CPUs. This particular problem has been addressed on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost safe on all architectures (the temporary buffer is limited. We still should keep the number of messages in NMI context at minimum). Note that there already are several messages printed in NMI context: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE handlers. These are not easy to avoid. This patch reuses most of the code and makes it generic. It is useful for all messages and architectures that support NMI. The alternative printk_func is set when entering and is reseted when leaving NMI context. It queues IRQ work to copy the messages into the main ring buffer in a safe context. __printk_nmi_flush() copies all available messages and reset the buffer. Then we could use a simple cmpxchg operations to get synchronized with writers. There is also used a spinlock to get synchronized with other flushers. We do not longer use seq_buf because it depends on external lock. It would be hard to make all supported operations safe for a lockless use. It would be confusing and error prone to make only some operations safe. The code is put into separate printk/nmi.c as suggested by Steven Rostedt. It needs a per-CPU buffer and is compiled only on architectures that call nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. The are MN10300 and Xtensa architectures. We need to clean up NMI handling there first. Let's do it separately. The patch is heavily based on the draft from Peter Zijlstra, see https://lkml.org/lkml/2015/6/10/327 [arnd@arndb.de: printk-nmi: use %zu format string for size_t] [akpm@linux-foundation.org: min_t->min - all types are size_t here] Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part] Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c7ce4f60 |
|
19-May-2016 |
Thomas Garnier <thgarnie@google.com> |
mm: SLAB freelist randomization Provides an optional config (CONFIG_SLAB_FREELIST_RANDOM) to randomize the SLAB freelist. The list is randomized during initialization of a new set of pages. The order on different freelist sizes is pre-computed at boot for performance. Each kmem_cache has its own randomized freelist. Before pre-computed lists are available freelists are generated dynamically. This security feature reduces the predictability of the kernel SLAB allocator against heap overflows rendering attacks much less stable. For example this attack against SLUB (also applicable against SLAB) would be affected: https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/ Also, since v4.6 the freelist was moved at the end of the SLAB. It means a controllable heap is opened to new attacks not yet publicly discussed. A kernel heap overflow can be transformed to multiple use-after-free. This feature makes this type of attack harder too. To generate entropy, we use get_random_bytes_arch because 0 bits of entropy is available in the boot stage. In the worse case this function will fallback to the get_random_bytes sub API. We also generate a shift random number to shift pre-computed freelist for each new set of pages. The config option name is not specific to the SLAB as this approach will be extended to other allocators like SLUB. Performance results highlighted no major changes: Hackbench (running 90 10 times): Before average: 0.0698 After average: 0.0663 (-5.01%) slab_test 1 run on boot. Difference only seen on the 2048 size test being the worse case scenario covered by freelist randomization. New slab pages are constantly being created on the 10000 allocations. Variance should be mainly due to getting new pages every few allocations. Before: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles 10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles 10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles 10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles 10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles 10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles 10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles 10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles 10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles 10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles 10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles 10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles 2. Kmalloc: alloc/free test 10000 times kmalloc(8)/kfree -> 121 cycles 10000 times kmalloc(16)/kfree -> 121 cycles 10000 times kmalloc(32)/kfree -> 121 cycles 10000 times kmalloc(64)/kfree -> 121 cycles 10000 times kmalloc(128)/kfree -> 121 cycles 10000 times kmalloc(256)/kfree -> 119 cycles 10000 times kmalloc(512)/kfree -> 119 cycles 10000 times kmalloc(1024)/kfree -> 119 cycles 10000 times kmalloc(2048)/kfree -> 119 cycles 10000 times kmalloc(4096)/kfree -> 121 cycles 10000 times kmalloc(8192)/kfree -> 119 cycles 10000 times kmalloc(16384)/kfree -> 119 cycles After: Single thread testing ===================== 1. Kmalloc: Repeatedly allocate then free test 10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles 10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles 10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles 10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles 10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles 10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles 10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles 10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles 10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles 10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles 10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles 10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles 2. Kmalloc: alloc/free test 10000 times kmalloc(8)/kfree -> 121 cycles 10000 times kmalloc(16)/kfree -> 121 cycles 10000 times kmalloc(32)/kfree -> 123 cycles 10000 times kmalloc(64)/kfree -> 142 cycles 10000 times kmalloc(128)/kfree -> 121 cycles 10000 times kmalloc(256)/kfree -> 119 cycles 10000 times kmalloc(512)/kfree -> 119 cycles 10000 times kmalloc(1024)/kfree -> 119 cycles 10000 times kmalloc(2048)/kfree -> 119 cycles 10000 times kmalloc(4096)/kfree -> 119 cycles 10000 times kmalloc(8192)/kfree -> 119 cycles 10000 times kmalloc(16384)/kfree -> 119 cycles [akpm@linux-foundation.org: propagate gfp_t into cache_random_seq_create()] Signed-off-by: Thomas Garnier <thgarnie@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Greg Thelen <gthelen@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
877417e6 |
|
25-Apr-2016 |
Arnd Bergmann <arnd@arndb.de> |
Kbuild: change CC_OPTIMIZE_FOR_SIZE definition CC_OPTIMIZE_FOR_SIZE disables the often useful -Wmaybe-unused warning, because that causes a ridiculous amount of false positives when combined with -Os. This means a lot of warnings don't show up in testing by the developers that should see them with an 'allmodconfig' kernel that has CC_OPTIMIZE_FOR_SIZE enabled, but only later in randconfig builds that don't. This changes the Kconfig logic around CC_OPTIMIZE_FOR_SIZE to make it a 'choice' statement defaulting to CC_OPTIMIZE_FOR_PERFORMANCE that gets added for this purpose. The allmodconfig and allyesconfig kernels now default to -O2 with the maybe-unused warning enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michal Marek <mmarek@suse.com>
|
#
f76be617 |
|
01-Apr-2016 |
Andi Kleen <ak@linux.intel.com> |
Make CONFIG_FHANDLE default y Newer Fedora and OpenSUSE didn't boot with my standard configuration. It took me some time to figure out why, in fact I had to write a script to try different config options systematically. The problem is that something (systemd) in dracut depends on CONFIG_FHANDLE, which adds open by file handle syscalls. While it is set in defconfigs it is very easy to miss when updating older configs because it is not default y. Make it default y and also depend on EXPERT, as dracut use is likely widespread. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dbacb0ef |
|
26-Jan-2016 |
Nicolas Pitre <nico@fluxnic.net> |
kconfig option for TRIM_UNUSED_KSYMS The config option to enable it all. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
2213e9a6 |
|
15-Mar-2016 |
Ard Biesheuvel <ardb@kernel.org> |
kallsyms: add support for relative offsets in kallsyms address table Similar to how relative extables are implemented, it is possible to emit the kallsyms table in such a way that it contains offsets relative to some anchor point in the kernel image rather than absolute addresses. On 64-bit architectures, it cuts the size of the kallsyms address table in half, since offsets between kernel symbols can typically be expressed in 32 bits. This saves several hundreds of kilobytes of permanent .rodata on average. In addition, the kallsyms address table is no longer subject to dynamic relocation when CONFIG_RELOCATABLE is in effect, so the relocation work done after decompression now doesn't have to do relocation updates for all these values. This saves up to 24 bytes (i.e., the size of a ELF64 RELA relocation table entry) per value, which easily adds up to a couple of megabytes of uncompressed __init data on ppc64 or arm64. Even if these relocation entries typically compress well, the combined size reduction of 2.8 MB uncompressed for a ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500 KB space saving in the compressed image. Since it is useful for some architectures (like x86) to retain the ability to emit absolute values as well, this patch also adds support for capturing both absolute and relative values when KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu addresses as positive 32-bit values, and addresses relative to the lowest encountered relative symbol as negative values, which are subtracted from the runtime address of this base symbol to produce the actual address. Support for the above is enabled by default for all architectures except IA-64 and Tile-GX, whose symbols are too far apart to capture in this manner. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4d5d5664 |
|
15-Mar-2016 |
Ard Biesheuvel <ardb@kernel.org> |
x86: kallsyms: disable absolute percpu symbols on !SMP scripts/kallsyms.c has a special --absolute-percpu command line option which deals with the zero based per cpu offsets that are used when building for SMP on x86_64. This means that the option should only be passed in that case, so add a Kconfig symbol with the correct predicate, and use that instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6cc578df |
|
04-Mar-2016 |
Parav Pandit <pandit.parav@gmail.com> |
cgroup: Trivial correction to reflect controller. Trivial correction in menuconfig help to reflect PIDs as controller instead of subsystem to align to rest of the text and documentation. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
d43de6c7 |
|
03-Mar-2016 |
David Howells <dhowells@redhat.com> |
akcipher: Move the RSA DER encoding check to the crypto layer Move the RSA EMSA-PKCS1-v1_5 encoding from the asymmetric-key public_key subtype to the rsa crypto module's pkcs1pad template. This means that the public_key subtype no longer has any dependencies on public key type. To make this work, the following changes have been made: (1) The rsa pkcs1pad template is now used for RSA keys. This strips off the padding and returns just the message hash. (2) In a previous patch, the pkcs1pad template gained an optional second parameter that, if given, specifies the hash used. We now give this, and pkcs1pad checks the encoded message E(M) for the EMSA-PKCS1-v1_5 encoding and verifies that the correct digest OID is present. (3) The crypto driver in crypto/asymmetric_keys/rsa.c is now reduced to something that doesn't care about what the encryption actually does and and has been merged into public_key.c. (4) CONFIG_PUBLIC_KEY_ALGO_RSA is gone. Module signing must set CONFIG_CRYPTO_RSA=y instead. Thoughts: (*) Should the encoding style (eg. raw, EMSA-PKCS1-v1_5) also be passed to the padding template? Should there be multiple padding templates registered that share most of the code? Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
d886f4e4 |
|
20-Jan-2016 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: rein in the CONFIG space madness What CONFIG_INET and CONFIG_LEGACY_KMEM guard inside the memory controller code is insignificant, having these conditionals is not worth the complication and fragility that comes with them. [akpm@linux-foundation.org: rework mem_cgroup_css_free() statement ordering] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
489c2a20 |
|
20-Jan-2016 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM Let the user know that CONFIG_MEMCG_KMEM does not apply to the cgroup2 interface. This also makes legacy-only code sections stand out better. [arnd@arndb.de: mm: memcontrol: only manage socket pressure for CONFIG_INET] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b2113a41 |
|
15-Jan-2016 |
Riku Voipio <riku.voipio@linaro.org> |
uselib: default depending if libc5 was used uselib hasn't been used since libc5; glibc does not use it. Deprecate uselib a bit more, by making the default y only if libc5 was widely used on the plaform. This makes arm64 kernel built with defconfig slightly smaller bloat-o-meter: add/remove: 0/3 grow/shrink: 0/2 up/down: 0/-1390 (-1390) function old new delta kernel_config_data 18164 18162 -2 uselib_flags 20 - -20 padzero 216 192 -24 sys_uselib 380 - -380 load_elf_library 964 - -964 Signed-off-by: Riku Voipio <riku.voipio@linaro.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cb74ed27 |
|
13-Jan-2016 |
Paul Moore <pmoore@redhat.com> |
audit: always enable syscall auditing when supported and audit is enabled To the best of our knowledge, everyone who enables audit at compile time also enables syscall auditing; this patch simplifies the Kconfig menus by removing the option to disable syscall auditing when audit is selected and the target arch supports it. Signed-off-by: Paul Moore <pmoore@redhat.com>
|
#
6bf024e6 |
|
17-Dec-2015 |
Johannes Weiner <hannes@cmpxchg.org> |
cgroup: put controller Kconfig options in meaningful order To make it easier to quickly find what's needed list the basic resource controllers of cgroup2 first - io, memory, cpu - while pushing the more exotic and/or legacy controllers to the bottom. tj: Removed spurious "&& CGROUPS" from CGROUP_PERF as suggested by Li. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
a0166ec4 |
|
17-Dec-2015 |
Johannes Weiner <hannes@cmpxchg.org> |
cgroup: clean up the kernel configuration menu nomenclature The config options for the different cgroup controllers use various terms: resource controller, cgroup subsystem, etc. Simplify this to "controller", which is clear enough in the cgroup context. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
86fffe4a |
|
11-Dec-2015 |
Chris Wilson <chris@chris-wilson.co.uk> |
kernel: remove stop_machine() Kconfig dependency Currently the full stop_machine() routine is only enabled on SMP if module unloading is enabled, or if the CPUs are hotpluggable. This leads to configurations where stop_machine() is broken as it will then only run the callback on the local CPU with irqs disabled, and not stop the other CPUs or run the callback on them. For example, this breaks MTRR setup on x86 in certain configs since ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions") as the MTRR is only established on the boot CPU. This patch removes the Kconfig option for STOP_MACHINE and uses the SMP and HOTPLUG_CPU config options to compile the correct stop_machine() for the architecture, removing the false dependency on MODULE_UNLOAD in the process. Link: https://lkml.org/lkml/2014/10/8/124 References: https://bugs.freedesktop.org/show_bug.cgi?id=84794 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5b25b13a |
|
11-Sep-2015 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
sys_membarrier(): system-wide memory barrier (generic, x86) Here is an implementation of a new system call, sys_membarrier(), which executes a memory barrier on all threads running on the system. It is implemented by calling synchronize_sched(). It can be used to distribute the cost of user-space memory barriers asymmetrically by transforming pairs of memory barriers into pairs consisting of sys_membarrier() and a compiler barrier. For synchronization primitives that distinguish between read-side and write-side (e.g. userspace RCU [1], rwlocks), the read-side can be accelerated significantly by moving the bulk of the memory barrier overhead to the write-side. The existing applications of which I am aware that would be improved by this system call are as follows: * Through Userspace RCU library (http://urcu.so) - DNS server (Knot DNS) https://www.knot-dns.cz/ - Network sniffer (http://netsniff-ng.org/) - Distributed object storage (https://sheepdog.github.io/sheepdog/) - User-space tracing (http://lttng.org) - Network storage system (https://www.gluster.org/) - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf) - Financial software (https://lkml.org/lkml/2015/3/23/189) Those projects use RCU in userspace to increase read-side speed and scalability compared to locking. Especially in the case of RCU used by libraries, sys_membarrier can speed up the read-side by moving the bulk of the memory barrier cost to synchronize_rcu(). * Direct users of sys_membarrier - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198) Microsoft core dotnet GC developers are planning to use the mprotect() side-effect of issuing memory barriers through IPIs as a way to implement Windows FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their github thread, specifically stating that sys_membarrier() is what they are looking for. To explain the benefit of this scheme, let's introduce two example threads: Thread A (non-frequent, e.g. executing liburcu synchronize_rcu()) Thread B (frequent, e.g. executing liburcu rcu_read_lock()/rcu_read_unlock()) In a scheme where all smp_mb() in thread A are ordering memory accesses with respect to smp_mb() present in Thread B, we can change each smp_mb() within Thread A into calls to sys_membarrier() and each smp_mb() within Thread B into compiler barriers "barrier()". Before the change, we had, for each smp_mb() pairs: Thread A Thread B previous mem accesses previous mem accesses smp_mb() smp_mb() following mem accesses following mem accesses After the change, these pairs become: Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses As we can see, there are two possible scenarios: either Thread B memory accesses do not happen concurrently with Thread A accesses (1), or they do (2). 1) Non-concurrent Thread A vs Thread B accesses: Thread A Thread B prev mem accesses sys_membarrier() follow mem accesses prev mem accesses barrier() follow mem accesses In this case, thread B accesses will be weakly ordered. This is OK, because at that point, thread A is not particularly interested in ordering them with respect to its own accesses. 2) Concurrent Thread A vs Thread B accesses Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses In this case, thread B accesses, which are ensured to be in program order thanks to the compiler barrier, will be "upgraded" to full smp_mb() by synchronize_sched(). * Benchmarks On Intel Xeon E5405 (8 cores) (one thread is calling sys_membarrier, the other 7 threads are busy looping) 1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call. * User-space user of this system call: Userspace RCU library Both the signal-based and the sys_membarrier userspace RCU schemes permit us to remove the memory barrier from the userspace RCU rcu_read_lock() and rcu_read_unlock() primitives, thus significantly accelerating them. These memory barriers are replaced by compiler barriers on the read-side, and all matching memory barriers on the write-side are turned into an invocation of a memory barrier on all active threads in the process. By letting the kernel perform this synchronization rather than dumbly sending a signal to every process threads (as we currently do), we diminish the number of unnecessary wake ups and only issue the memory barriers on active threads. Non-running threads do not need to execute such barrier anyway, because these are implied by the scheduler context switches. Results in liburcu: Operations in 10s, 6 readers, 2 writers: memory barriers in reader: 1701557485 reads, 2202847 writes signal-based scheme: 9830061167 reads, 6700 writes sys_membarrier: 9952759104 reads, 425 writes sys_membarrier (dyn. check): 7970328887 reads, 425 writes The dynamic sys_membarrier availability check adds some overhead to the read-side compared to the signal-based scheme, but besides that, sys_membarrier slightly outperforms the signal-based scheme. However, this non-expedited sys_membarrier implementation has a much slower grace period than signal and memory barrier schemes. Besides diminishing the number of wake-ups, one major advantage of the membarrier system call over the signal-based scheme is that it does not need to reserve a signal. This plays much more nicely with libraries, and with processes injected into for tracing purposes, for which we cannot expect that signals will be unused by the application. An expedited version of this system call can be added later on to speed up the grace period. Its implementation will likely depend on reading the cpu_curr()->mm without holding each CPU's rq lock. This patch adds the system call to x86 and to asm-generic. [1] http://urcu.so membarrier(2) man page: MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2) NAME membarrier - issue memory barriers on a set of threads SYNOPSIS #include <linux/membarrier.h> int membarrier(int cmd, int flags); DESCRIPTION The cmd argument is one of the following: MEMBARRIER_CMD_QUERY Query the set of supported commands. It returns a bitmask of supported commands. MEMBARRIER_CMD_SHARED Execute a memory barrier on all threads running on the system. Upon return from system call, the caller thread is ensured that all running threads have passed through a state where all memory accesses to user-space addresses match program order between entry to and return from the system call (non-running threads are de facto in such a state). This covers threads from all pro=E2=80=90 cesses running on the system. This command returns 0. The flags argument needs to be 0. For future extensions. All memory accesses performed in program order from each targeted thread is guaranteed to be ordered with respect to sys_membarrier(). If we use the semantic "barrier()" to represent a compiler barrier forcing memory accesses to be performed in program order across the barrier, and smp_mb() to represent explicit memory barriers forcing full memory ordering across the barrier, we have the following ordering table for each pair of barrier(), sys_membarrier() and smp_mb(): The pair ordering is detailed as (O: ordered, X: not ordered): barrier() smp_mb() sys_membarrier() barrier() X X O smp_mb() X O O sys_membarrier() O O O RETURN VALUE On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately. For a given command, with flags argument set to 0, this system call is guaranteed to always return the same value until reboot. ERRORS ENOSYS System call is not implemented. EINVAL Invalid arguments. Linux 2015-04-15 MEMBARRIER(2) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nicholas Miell <nmiell@comcast.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
72b252ae |
|
04-Sep-2015 |
Mel Gorman <mgorman@suse.de> |
mm: send one IPI per CPU to TLB flush all entries after unmapping pages An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a14c151e |
|
04-Sep-2015 |
Andrea Arcangeli <aarcange@redhat.com> |
userfaultfd: buildsystem activation This allows to select the userfaultfd during configuration to build it. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bf3eac84 |
|
11-Aug-2015 |
Oleg Nesterov <oleg@redhat.com> |
percpu-rwsem: kill CONFIG_PERCPU_RWSEM Remove CONFIG_PERCPU_RWSEM, the next patch adds the unconditional user of percpu_rw_semaphore. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
#
cfc411e7 |
|
14-Aug-2015 |
David Howells <dhowells@redhat.com> |
Move certificate handling to its own directory Move certificate handling out of the kernel/ directory and into a certs/ directory to get all the weird stuff in one place and move the generated signing keys into this directory. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
|
#
228c37ff |
|
10-Aug-2015 |
David Howells <dhowells@redhat.com> |
sign-file: Document dependency on OpenSSL devel libraries The revised sign-file program is no longer a script that wraps the openssl program, but now rather a program that makes use of OpenSSL's crypto library. This means that to build the sign-file program, the kernel build process now has a dependency on the OpenSSL development packages in addition to OpenSSL itself. Document this in Kconfig and in module-signing.txt. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
|
#
99d27b1b |
|
20-Jul-2015 |
David Woodhouse <David.Woodhouse@intel.com> |
modsign: Add explicit CONFIG_SYSTEM_TRUSTED_KEYS option Let the user explicitly provide a file containing trusted keys, instead of just automatically finding files matching *.x509 in the build tree and trusting whatever we find. This really ought to be an *explicit* configuration, and the build rules for dealing with the files were fairly painful too. Fix applied from James Morris that removes an '=' from a macro definition in kernel/Makefile as this is a feature that only exists from GNU make 3.82 onwards. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
fb117949 |
|
20-Jul-2015 |
David Woodhouse <David.Woodhouse@intel.com> |
modsign: Use single PEM file for autogenerated key The current rule for generating signing_key.priv and signing_key.x509 is a classic example of a bad rule which has a tendency to break parallel make. When invoked to create *either* target, it generates the other target as a side-effect that make didn't predict. So let's switch to using a single file signing_key.pem which contains both key and certificate. That matches what we do in the case of an external key specified by CONFIG_MODULE_SIG_KEY anyway, so it's also slightly cleaner. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1329e8cc |
|
20-Jul-2015 |
David Woodhouse <David.Woodhouse@intel.com> |
modsign: Extract signing cert from CONFIG_MODULE_SIG_KEY if needed Where an external PEM file or PKCS#11 URI is given, we can get the cert from it for ourselves instead of making the user drop signing_key.x509 in place for us. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
19e91b69 |
|
20-Jul-2015 |
David Woodhouse <David.Woodhouse@intel.com> |
modsign: Allow external signing key to be specified Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
091f6e26 |
|
20-Jul-2015 |
David Howells <dhowells@redhat.com> |
MODSIGN: Extract the blob PKCS#7 signature verifier from module signing Extract the function that drives the PKCS#7 signature verification given a data blob and a PKCS#7 blob out from the module signing code and lump it with the system keyring code as it's generic. This makes it independent of module config options and opens it to use by the firmware loader. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ming Lei <ming.lei@canonical.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Kyle McMartin <kyle@kernel.org>
|
#
3f1e1bea |
|
20-Jul-2015 |
David Howells <dhowells@redhat.com> |
MODSIGN: Use PKCS#7 messages as module signatures Move to using PKCS#7 messages as module signatures because: (1) We have to be able to support the use of X.509 certificates that don't have a subjKeyId set. We're currently relying on this to look up the X.509 certificate in the trusted keyring list. (2) PKCS#7 message signed information blocks have a field that supplies the data required to match with the X.509 certificate that signed it. (3) The PKCS#7 certificate carries fields that specify the digest algorithm used to generate the signature in a standardised way and the X.509 certificates specify the public key algorithm in a standardised way - so we don't need our own methods of specifying these. (4) We now have PKCS#7 message support in the kernel for signed kexec purposes and we can make use of this. To make this work, the old sign-file script has been replaced with a program that needs compiling in a previous patch. The rules to build it are added here. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Vivek Goyal <vgoyal@redhat.com>
|
#
be55fa2a |
|
02-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT This commit prevents Kconfig from asking the user about RCU_NOCB_CPU unless the user really wants to be asked. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
49b786ea |
|
09-Jun-2015 |
Aleksa Sarai <cyphar@cyphar.com> |
cgroup: implement the PIDs subsystem Adds a new single-purpose PIDs subsystem to limit the number of tasks that can be forked inside a cgroup. Essentially this is an implementation of RLIMIT_NPROC that applies to a cgroup rather than a process tree. However, it should be noted that organisational operations (adding and removing tasks from a PIDs hierarchy) will *not* be prevented. Rather, the number of tasks in the hierarchy cannot exceed the limit through forking. This is due to the fact that, in the unified hierarchy, attach cannot fail (and it is not possible for a task to overcome its PIDs cgroup policy limit by attaching to a child cgroup -- even if migrating mid-fork it must be able to fork in the parent first). PIDs are fundamentally a global resource, and it is possible to reach PID exhaustion inside a cgroup without hitting any reasonable kmemcg policy. Once you've hit PID exhaustion, you're only in a marginally better state than OOM. This subsystem allows PID exhaustion inside a cgroup to be prevented. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
d1ec4c34 |
|
13-May-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL, so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
f6db8347 |
|
25-Jun-2015 |
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> |
sched/stat: Simplify the sched_info accounting dependency Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task sched_info, which results in ugly #if clauses. Simplify the code by introducing a synthethic CONFIG_SCHED_INFO switch, selected by both. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
fb39f98d |
|
01-Jul-2015 |
Ingo Molnar <mingo@kernel.org> |
printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25 So I tried to some kernel debugging that produced a ton of kernel messages on a big box, and wanted to save them all: but CONFIG_LOG_BUF_SHIFT maxes out at 21 (2 MB). Increase it to 25 (32 MB). This does not affect any existing config or defaults. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2e13ba54 |
|
25-Jun-2015 |
Iago López Galeiras <iago@endocode.com> |
fs, proc: introduce CONFIG_PROC_CHILDREN Commit 818411616baf ("fs, proc: introduce /proc/<pid>/task/<tid>/children entry") introduced the children entry for checkpoint restore and the file is only available on kernels configured with CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE. This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS) because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE. But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE. However, the children proc file is useful outside of checkpoint restore. I would like to use it in rkt. The rkt process exec() another program it does not control, and that other program will fork()+exec() a child process. I would like to find the pid of the child process from an external tool without iterating in /proc over all processes to find which one has a parent pid equal to rkt. This commit introduces CONFIG_PROC_CHILDREN and makes CONFIG_CHECKPOINT_RESTORE select it. This allows enabling /proc/<pid>/task/<tid>/children without needing to enable CONFIG_CHECKPOINT_RESTORE and CONFIG_EXPERT. Alban tested that /proc/<pid>/task/<tid>/children is present when the kernel is configured with CONFIG_PROC_CHILDREN=y but without CONFIG_CHECKPOINT_RESTORE Signed-off-by: Iago López Galeiras <iago@endocode.com> Tested-by: Alban Crequy <alban@endocode.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Djalal Harouni <djalal@endocode.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b6c09b51 |
|
15-Jun-2015 |
Rusty Russell <rusty@rustcorp.com.au> |
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'. Andreas turned this option on, only to find out Debian (and Ubuntu!) don't enable support in their kmod builds. Shorten the text, and suggest N at the bottom (at least for now). Reported-by: Andreas Mohr <andim2@users.sf.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
89e9b9e0 |
|
22-May-2015 |
Tejun Heo <tj@kernel.org> |
writeback: add {CONFIG|BDI_CAP|FS}_CGROUP_WRITEBACK cgroup writeback requires support from both bdi and filesystem sides. Add BDI_CAP_CGROUP_WRITEBACK and FS_CGROUP_WRITEBACK to indicate support and enable BDI_CAP_CGROUP_WRITEBACK on block based bdi's by default. Also, define CONFIG_CGROUP_WRITEBACK which is enabled if both MEMCG and BLK_CGROUP are enabled. inode_cgwb_enabled() which determines whether a given inode's both bdi and fs support cgroup writeback is added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
6c9692e2 |
|
26-May-2015 |
Peter Zijlstra <peterz@infradead.org> |
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING Andrew worried about the overhead on small systems; only use the fancy code when either perf or tracing is enabled. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Requested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e72aeafc |
|
21-Apr-2015 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Remove prompt for RCU implementation The RCU implementation is chosen based on PREEMPT and SMP config options and is not really a user-selectable choice. This commit removes the menu entry, given that there is not much point in calling something a choice when there is in fact no choice.. The TINY_RCU, TREE_RCU, and PREEMPT_RCU Kconfig options continue to be selected based solely on the values of the PREEMPT and SMP options. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
26730f55 |
|
21-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO This commit updates the initialization of the kthread_prio boot parameter so that RCU will build even when CONFIG_RCU_KTHREAD_PRIO is undefined. The kthread_prio boot parameter is set to CONFIG_RCU_KTHREAD_PRIO if that is defined, otherwise to 1 if CONFIG_RCU_BOOST is defined and to zero otherwise. This commit then makes CONFIG_RCU_KTHREAD_PRIO depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_KTHREAD_PRIO unless they want to be. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
47d631af |
|
21-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
05c5df31 |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
8739c5cb |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Break dependency of RCU_FANOUT_LEAF on RCU_FANOUT RCU_FANOUT_LEAF's range and default values depend on the value of RCU_FANOUT, which at the time seemed like a cute way to save two lines of Kconfig code. However, adding a dependency from both of these Kconfig parameters on RCU_EXPERT requires that RCU_FANOUT_LEAF operate correctly even if RCU_FANOUT is undefined. This commit therefore allows RCU_FANOUT_LEAF to take on the full range of permitted values, even in cases where RCU_FANOUT is undefined. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Eliminate redundant "default" as suggested by Pranith Kumar. ] Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
78cae10b |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Create RCU_EXPERT Kconfig and hide booleans behind it This commit creates an RCU_EXPERT Kconfig and hides the independent boolean RCU-related user-visible Kconfig parameters behind it, namely RCU_FAST_NO_HZ and RCU_BOOST. This prevents Kconfig from asking about these parameters unless the user really wants to be asked. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
7fa27001 |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and perhaps only) by rcutorture to verify that RCU works correctly in specific rcu_node combining-tree configurations. It therefore does not make much sense have this as a question to people attempting to configure their kernels. So this commit creates an rcutree.rcu_fanout_exact= boot parameter that rcutorture can use, and eliminates the original CONFIG_RCU_FANOUT_EXACT Kconfig parameter. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
7db21edf |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Directly drive RCU_USER_QS from Kconfig Currently, Kconfig will ask the user whether RCU_USER_QS should be set. This is silly because Kconfig already has all the information that it needs to set this parameter. This commit therefore directly drives the value of RCU_USER_QS via NO_HZ_FULL's "select" statement. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
82d0f4c0 |
|
20-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Directly drive TASKS_RCU from Kconfig Currently, Kconfig will ask the user whether TASKS_RCU should be set. This is silly because Kconfig already has all the information that it needs to set this parameter. This commit therefore directly drives the value of TASKS_RCU via "select" statements. Which means that as subsystems require TASKS_RCU, those subsystems will need to add "select" statements of their own. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
d59cfc09 |
|
13-May-2015 |
Tejun Heo <tj@kernel.org> |
sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem The cgroup side of threadgroup locking uses signal_struct->group_rwsem to synchronize against threadgroup changes. This per-process rwsem adds small overhead to thread creation, exit and exec paths, forces cgroup code paths to do lock-verify-unlock-retry dance in a couple places and makes it impossible to atomically perform operations across multiple processes. This patch replaces signal_struct->group_rwsem with a global percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader side and contained in cgroups proper. This patch converts one-to-one. This does make writer side heavier and lower the granularity; however, cgroup process migration is a fairly cold path, we do want to optimize thread operations over it and cgroup migration operations don't take enough time for the lower granularity to matter. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
cb307113 |
|
04-May-2015 |
Michael Ellerman <mpe@ellerman.id.au> |
perf_event: Don't allow vmalloc() backed perf on powerpc On powerpc the perf event interrupt is not masked when interrupts are disabled, allowing it to function as an NMI. This causes problems if perf is using vmalloc. If we take a page fault on the vmalloc region the fault handler will fail the page fault because it detects we are coming in from an NMI (see do_hash_page()). We don't actually need or want vmalloc backed perf so just disable it on powerpc. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <linuxppc-dev@ozlabs.org> Cc: Andrew Morton <akpm@osdl.org> Cc: Anton Blanchard <anton@samba.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@ghostprotocols.net Cc: sukadev@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1430720799-18426-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2813893f |
|
15-Apr-2015 |
Iulia Manda <iulia.manda21@gmail.com> |
kernel: conditionally support non-root users, groups and capabilities There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
19717542 |
|
01-Apr-2015 |
Vladimir Davydov <vdavydov.dev@gmail.com> |
Documentation/memcg: update memcg/kmem status Memcg/kmem reclaim support has been finally merged. Reflect this in the documentation. Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
e1abf2cc |
|
02-Apr-2015 |
Ingo Molnar <mingo@kernel.org> |
bpf: Fix the build on BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only dependency, it also depends on the tracing infrastructure and on kprobes, without which it will fail to build with: In file included from kernel/trace/bpf_trace.c:14:0: kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’: kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’ unsigned int val = current->trace_recursion; [...] It took quite some time to trigger this build failure, because right now BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL more configurable, not just under CONFIG_EXPERT. If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing gateway as well. We might want to make this an interactive option later on, although I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of an indicator that the user wants BPF support. Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ee42571f |
|
19-Feb-2015 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add Kconfig option to expedite grace periods during boot This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter that emulates a very early boot rcu_expedite_gp(). A late-boot call to rcu_end_inkernel_boot() will provide the corresponding rcu_unexpedite_gp(). The late-boot call to rcu_end_inkernel_boot() should be made just before init is spawned. According to Arjan: > To show the boot time, I'm using the timestamp of the "Write protecting" > line, that's pretty much the last thing we print prior to ring 3 execution. > > A kernel with default RCU behavior (inside KVM, only virtual devices) > looks like this: > > [ 0.038724] Write protecting the kernel read-only data: 10240k > > a kernel with expedited RCU (using the command line option, so that I > don't have to recompile between measurements and thus am completely > oranges-to-oranges) > > [ 0.031768] Write protecting the kernel read-only data: 10240k > > which, in percentage, is an 18% improvement. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Arjan van de Ven <arjan@linux.intel.com>
|
#
5125991c |
|
13-Feb-2015 |
Andy Lutomirski <luto@amacapital.net> |
init: remove CONFIG_INIT_FALLBACK CONFIG_INIT_FALLBACK adds config bloat without an obvious use case that makes it worth keeping around. Delete it. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frank Rowand <frowand.list@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Landley <rob@landley.net> Cc: Shuah Khan <shuah.kh@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a94844b2 |
|
12-Dec-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Optionally run grace-period kthreads at real-time priority Recent testing has shown that under heavy load, running RCU's grace-period kthreads at real-time priority can improve performance (according to 0day test robot) and reduce the incidence of RCU CPU stall warnings. However, most systems do just fine with the default non-realtime priorities for these kthreads, and it does not make sense to expose the entire user base to any risk stemming from this change, given that this change is of use only to a few users running extremely heavy workloads. Therefore, this commit allows users to specify realtime priorities for the grace-period kthreads, but leaves them running SCHED_OTHER by default. The realtime priority may be specified at build time via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the rcutree.kthread_prio parameter. Either way, 0 says to continue the default SCHED_OTHER behavior and values from 1-99 specify that priority of SCHED_FIFO behavior. Note that a value of 0 is not permitted when the RCU_BOOST Kconfig parameter is specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
31a4af7f |
|
04-Aug-2014 |
Masahiro Yamada <yamada.m@jp.panasonic.com> |
kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE Other than GCC, we have another choice, Clang for building the kernel these days. It seems better to say "compiler" rather than "gcc". Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
|
#
6341e62b |
|
20-Dec-2014 |
Christoph Jaeger <cj@linux.com> |
kconfig: use bool instead of boolean for type definition attributes Support for keyword 'boolean' will be dropped later on. No functional change. Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com Signed-off-by: Christoph Jaeger <cj@linux.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
|
#
83fe27ea |
|
05-Dec-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Make SRCU optional by using CONFIG_SRCU SRCU is not necessary to be compiled by default in all cases. For tinification efforts not compiling SRCU unless necessary is desirable. The current patch tries to make compiling SRCU optional by introducing a new Kconfig option CONFIG_SRCU which is selected when any of the components making use of SRCU are selected. If we do not select CONFIG_SRCU, srcu.o will not be compiled at all. text data bss dec hex filename 2007 0 0 2007 7d7 kernel/rcu/srcu.o Size of arch/powerpc/boot/zImage changes from text data bss dec hex filename 831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before 829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after so the savings are about ~2000 bytes. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
|
#
5a43b88e |
|
24-Dec-2014 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
rcu: Remove "select IRQ_WORK" from config TREE_RCU The 48a7639ce80c ("rcu: Make callers awaken grace-period kthread") removed the irq_work_queue(), so the TREE_RCU doesn't need irq work any more. This commit therefore updates RCU's Kconfig and Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
6ef4536e |
|
10-Dec-2014 |
Andy Lutomirski <luto@amacapital.net> |
init: allow CONFIG_INIT_FALLBACK=n to disable defaults if init= fails If a user puts init=/whatever on the command line and /whatever can't be run, then the kernel will try a few default options before giving up. If init=/whatever came from a bootloader prompt, then this is unexpected but probably harmless. On the other hand, if it comes from a script (e.g. a tool like virtme or perhaps a future kselftest script), then the fallbacks are likely to exist, but they'll do the wrong thing. For example, they might unexpectedly invoke systemd. This adds a config option CONFIG_INIT_FALLBACK. If unset, then a failure to run the specified init= process be fatal. The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rob Landley <rob@landley.net> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9edad6ea |
|
10-Dec-2014 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: move page->mem_cgroup bad page handling into generic code Now that the external page_cgroup data structure and its lookup is gone, let the generic bad_page() check for page->mem_cgroup sanity. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6f7c97e8 |
|
10-Dec-2014 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
mm/numa balancing: rearrange Kconfig entry Add the default enable config option after the NUMA_BALANCING option so that it appears related in the nconfig interface. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5b1efc02 |
|
10-Dec-2014 |
Johannes Weiner <hannes@cmpxchg.org> |
kernel: res_counter: remove the unused API All memory accounting and limiting has been switched over to the lockless page counters. Bye, res_counter! [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt] [mhocko@suse.cz: ditch the last remainings of res_counter] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
71f87bee |
|
10-Dec-2014 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: hugetlb_cgroup: convert to lockless page counters Abandon the spinlock-protected byte counters in favor of the unlocked page counters in the hugetlb controller as well. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3e32cb2e |
|
10-Dec-2014 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: lockless page counters Memory is internally accounted in bytes, using spinlock-protected 64-bit counters, even though the smallest accounting delta is a page. The counter interface is also convoluted and does too many things. Introduce a new lockless word-sized page counter API, then change all memory accounting over to it. The translation from and to bytes then only happens when interfacing with userspace. The removed locking overhead is noticable when scaling beyond the per-cpu charge caches - on a 4-socket machine with 144-threads, the following test shows the performance differences of 288 memcgs concurrently running a page fault benchmark: vanilla: 18631648.500498 task-clock (msec) # 140.643 CPUs utilized ( +- 0.33% ) 1,380,638 context-switches # 0.074 K/sec ( +- 0.75% ) 24,390 cpu-migrations # 0.001 K/sec ( +- 8.44% ) 1,843,305,768 page-faults # 0.099 M/sec ( +- 0.00% ) 50,134,994,088,218 cycles # 2.691 GHz ( +- 0.33% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 8,049,712,224,651 instructions # 0.16 insns per cycle ( +- 0.04% ) 1,586,970,584,979 branches # 85.176 M/sec ( +- 0.05% ) 1,724,989,949 branch-misses # 0.11% of all branches ( +- 0.48% ) 132.474343877 seconds time elapsed ( +- 0.21% ) lockless: 12195979.037525 task-clock (msec) # 133.480 CPUs utilized ( +- 0.18% ) 832,850 context-switches # 0.068 K/sec ( +- 0.54% ) 15,624 cpu-migrations # 0.001 K/sec ( +- 10.17% ) 1,843,304,774 page-faults # 0.151 M/sec ( +- 0.00% ) 32,811,216,801,141 cycles # 2.690 GHz ( +- 0.18% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 9,999,265,091,727 instructions # 0.30 insns per cycle ( +- 0.10% ) 2,076,759,325,203 branches # 170.282 M/sec ( +- 0.12% ) 1,656,917,214 branch-misses # 0.08% of all branches ( +- 0.55% ) 91.369330729 seconds time elapsed ( +- 0.45% ) On top of improved scalability, this also gets rid of the icky long long types in the very heart of memcg, which is great for 32 bit and also makes the code a lot more readable. Notable differences between the old and new API: - res_counter_charge() and res_counter_charge_nofail() become page_counter_try_charge() and page_counter_charge() resp. to match the more common kernel naming scheme of try_do()/do() - res_counter_uncharge_until() is only ever used to cancel a local counter and never to uncharge bigger segments of a hierarchy, so it's replaced by the simpler page_counter_cancel() - res_counter_set_limit() is replaced by page_counter_limit(), which expects its callers to serialize against themselves - res_counter_memparse_write_strategy() is replaced by page_counter_limit(), which rounds down to the nearest page size - rather than up. This is more reasonable for explicitely requested hard upper limits. - to keep charging light-weight, page_counter_try_charge() charges speculatively, only to roll back if the result exceeds the limit. Because of this, a failing bigger charge can temporarily lock out smaller charges that would otherwise succeed. The error is bounded to the difference between the smallest and the biggest possible charge size, so for memcg, this means that a failing THP charge can send base page charges into reclaim upto 2MB (4MB) before the limit would have been reached. This should be acceptable. [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse] [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
28f6569a |
|
22-Sep-2014 |
Pranith Kumar <bobby.prani@gmail.com> |
rcu: Remove redundant TREE_PREEMPT_RCU config option PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU and uses PREEMPT_RCU config option in its place. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
21871d7e |
|
12-Sep-2014 |
Clark Williams <clark.williams@gmail.com> |
rcu: Unify boost and kthread priorities Rename CONFIG_RCU_BOOST_PRIO to CONFIG_RCU_KTHREAD_PRIO and use this value for both the per-CPU kthreads (rcuc/N) and the rcu boosting threads (rcub/n). Also, create the module_parameter rcutree.kthread_prio to be used on the kernel command line at boot to set a new value (rcutree.kthread_prio=N). Signed-off-by: Clark Williams <clark.williams@gmail.com> [ paulmck: Ported to rcu/dev, applied Paul Bolle and Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4568779f |
|
02-Sep-2014 |
Stefan Hengelein <stefan.hengelein@fau.de> |
init/Kconfig: move RCU_NOCB_CPU dependencies to choice Every choice item of the "Build-forced no-CBs CPUs" choice had a dependency to RCU_NOCB_CPU. It's more comprehensible if the choice itself has the dependency instead of every choice item. The choice itself doesn't need to be visible if there are no items selectable (i.e. on arch/frv) or RCU_NOCB_CPU is not defined. Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de> Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
f89b7755 |
|
23-Oct-2014 |
Alexei Starovoitov <ast@kernel.org> |
bpf: split eBPF out of NET introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2240a31d |
|
13-Oct-2014 |
Geert Uytterhoeven <geert@linux-m68k.org> |
printk: don't bother using LOG_CPU_MAX_BUF_SHIFT on !SMP When configuring a uniprocessor kernel, don't bother the user with an irrelevant LOG_CPU_MAX_BUF_SHIFT question, and don't build the unused code. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6a33979d |
|
09-Oct-2014 |
Mel Gorman <mgorman@suse.de> |
mm: remove misleading ARCH_USES_NUMA_PROT_NONE ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting fault scanner. This was found to be conceptually confusing with a lot of implicit assumptions and it was asked that an alternative be found. Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE bits and shrunk the maximum possible swap size but it did not go far enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA but the relics still exist. This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary duplication in powerpc vs the generic implementation by defining the types the core NUMA helpers expected to exist from x86 with their ppc64 equivalent. This necessitated that a PTE bit mask be created that identified the bits that distinguish present from NUMA pte entries but it is expected this will only differ between arches based on _PAGE_PROTNONE. The naming for the generic helpers was taken from x86 originally but ppc64 has types that are equivalent for the purposes of the helper so they are mapped instead of duplicating code. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
62b4d204 |
|
03-Oct-2014 |
Josh Triplett <josh@joshtriplett.org> |
init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") added the HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in the middle of the options for the EXPERT menu. However, HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops placing items in the EXPERT menu, and displays the remaining several EXPERT items (starting with EPOLL) directly in the General Setup menu. Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the subsequent items display as part of the EXPERT menu again; the EMBEDDED menu now appears as the next top-level item in the General Setup menu, which makes General Setup much shorter and more usable. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
|
#
361e9dfb |
|
03-Oct-2014 |
Josh Triplett <josh@joshtriplett.org> |
init/Kconfig: Hide printk log config if CONFIG_PRINTK=n The buffers sized by CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't ask about their size at all. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
|
#
f4579fc5 |
|
25-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix attempt to avoid unsolicited offloading of callbacks Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
8315f422 |
|
27-Jun-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add call_rcu_tasks() This commit adds a new RCU-tasks flavor of RCU, which provides call_rcu_tasks(). This RCU flavor's quiescent states are voluntary context switch (not preemption!) and userspace execution (not the idle loop -- use some sort of schedule_on_each_cpu() if you need to handle the idle tasks. Note that unlike other RCU flavors, these quiescent states occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt. This RCU flavor is assumed to have very infrequent latency-tolerant updaters. This assumption permits significant simplifications, including a single global callback list protected by a single global lock, along with a single task-private linked list containing all tasks that have not yet passed through a quiescent state. If experience shows this assumption to be incorrect, the required additional complexity will be added. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
beb50df3 |
|
27-Aug-2014 |
Bertrand Jacquin <beber@meleeweb.net> |
kbuild: handle module compression while running 'make modules_install'. Since module-init-tools (gzip) and kmod (gzip and xz) support compressed modules, it could be useful to include a support for compressing modules right after having them installed. Doing this in kbuild instead of per distro can permit to make this kind of usage more generic. This patch add a Kconfig entry to "Enable loadable module support" menu and let you choose to compress using gzip (default) or xz. Both gzip and xz does not used any extra -[1-9] option since Andi Kleen and Rusty Russell prove no gain is made using them. gzip is called with -n argument to avoid storing original filename inside compressed file, that way we can save some more bytes. On a v3.16 kernel, 'make allmodconfig' generated 4680 modules for a total of 378MB (no strip, no sign, no compress), the following table shows observed disk space gain based on the allmodconfig .config : | time | +-------------+-----------------+ | manual .ko | make | size | percent | compression | modules_install | | gain +-------------+-----------------+------+-------- - | | 18.61s | 378M | GZIP | 3m16s | 3m37s | 102M | 73.41% XZ | 5m22s | 5m39s | 77M | 79.83% The gain for restricted environnement seems to be interesting while uncompress can be time consuming but happens only while loading a module, that is generally done only once. This is fully compatible with signed modules while the signed module is compressed. module-init-tools or kmod handles decompression and provide to other layer the uncompressed but signed payload. Reviewed-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Bertrand Jacquin <beber@meleeweb.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
51c0ff6d |
|
08-Aug-2014 |
Geert Uytterhoeven <geert@linux-m68k.org> |
mm: Fix CROSS_MEMORY_ATTACH help text grammar Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
d3ac21ca |
|
17-Aug-2014 |
Josh Triplett <josh@joshtriplett.org> |
mm: Support compiling out madvise and fadvise Many embedded systems will not need these syscalls, and omitting them saves space. Add a new EXPERT config option CONFIG_ADVISE_SYSCALLS (default y) to support compiling them out. bloat-o-meter: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-2250 (-2250) function old new delta sys_fadvise64 57 - -57 sys_fadvise64_64 691 - -691 sys_madvise 1502 - -1502 Signed-off-by: Josh Triplett <josh@joshtriplett.org>
|
#
a2a368d9 |
|
12-Aug-2014 |
Geert Uytterhoeven <geert@linux-m68k.org> |
mm: fix CROSS_MEMORY_ATTACH help text grammar Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
de5b56ba |
|
08-Aug-2014 |
Vivek Goyal <vgoyal@redhat.com> |
kernel: build bin2c based on config option CONFIG_BUILD_BIN2C currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be used by kexec too. So make it compilation dependent on CONFIG_BUILD_BIN2C and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
23b2899f |
|
06-Aug-2014 |
Luis R. Rodriguez <mcgrof@suse.com> |
printk: allow increasing the ring buffer depending on the number of CPUs The default size of the ring buffer is too small for machines with a large amount of CPUs under heavy load. What ends up happening when debugging is the ring buffer overlaps and chews up old messages making debugging impossible unless the size is passed as a kernel parameter. An idle system upon boot up will on average spew out only about one or two extra lines but where this really matters is on heavy load and that will vary widely depending on the system and environment. There are mechanisms to help increase the kernel ring buffer for tracing through debugfs, and those interfaces even allow growing the kernel ring buffer per CPU. We also have a static value which can be passed upon boot. Relying on debugfs however is not ideal for production, and relying on the value passed upon bootup is can only used *after* an issue has creeped up. Instead of being reactive this adds a proactive measure which lets you scale the amount of contributions you'd expect to the kernel ring buffer under load by each CPU in the worst case scenario. We use num_possible_cpus() to avoid complexities which could be introduced by dynamically changing the ring buffer size at run time, num_possible_cpus() lets us use the upper limit on possible number of CPUs therefore avoiding having to deal with hotplugging CPUs on and off. This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT which is used to specify the maximum amount of contributions to the kernel ring buffer in the worst case before the kernel ring buffer flips over, the size is specified as a power of 2. The total amount of contributions made by each CPU must be greater than half of the default kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon bootup. The kernel ring buffer is increased to the next power of two that would fit the required minimum kernel ring buffer size plus the additional CPU contribution. For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs in order to trigger an increase of the kernel ring buffer. With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything over > 64 possible CPUs to trigger an increase. If you had 128 possible CPUs the amount of minimum required kernel ring buffer bumps to: ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB Since we require the ring buffer to be a power of two the new required size would be 1024 KB. This CPU contributions are ignored when the "log_buf_len" kernel parameter is used as it forces the exact size of the ring buffer to an expected power of two value. [pmladek@suse.cz: fix build] Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ab74fdfd |
|
04-May-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Handle obsolete references to TINY_PREEMPT_RCU Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
|
#
b58cc46c |
|
02-Jul-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Don't offload callbacks unless specifically requested Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
f6187769 |
|
04-Jun-2014 |
Fabian Frederick <fabf@skynet.be> |
sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL sys_sgetmask and sys_ssetmask are obsolete system calls no longer supported in libc. This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert mode configuration.That option is enabled by default for those architectures. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Steven Miao <realmz6@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
226b4ccd |
|
04-Jun-2014 |
Konstantin Khlebnikov <koct9i@gmail.com> |
mm/process_vm_access: move config option into init/Kconfig CONFIG_CROSS_MEMORY_ATTACH adds couple syscalls: process_vm_readv and process_vm_writev, it's a kind of IPC for copying data between processes. Currently this option is placed inside "Processor type and features". This patch moves it into "General setup" (where all other arch-independed syscalls and ipc features are placed) and changes prompt string to less cryptic. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f98bafa0 |
|
04-Jun-2014 |
Oleg Nesterov <oleg@redhat.com> |
memcg: kill CONFIG_MM_OWNER CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only selected by CONFIG_MEMCG automatically. So we can kill this option in init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2ee06468 |
|
04-Jun-2014 |
Vladimir Davydov <vdavydov.dev@gmail.com> |
Documentation/memcg: warn about incomplete kmemcg state Kmemcg is currently under development and lacks some important features. In particular, it does not have support of kmem reclaim on memory pressure inside cgroup, which practically makes it unusable in real life. Let's warn about it in both Kconfig and Documentation to prevent complaints arising. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
82c04ff8 |
|
18-Apr-2014 |
Peter Foley <pefoley2@pefoley.com> |
init/Kconfig: move the trusted keyring config option to general setup The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it to show up in the toplevel of the kernel configuration. Fix this by moving it under the General Setup menu. Signed-off-by: Peter Foley <pefoley2@pefoley.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5d2acfc7 |
|
07-Apr-2014 |
Josh Triplett <josh@joshtriplett.org> |
kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT "make allnoconfig" exists to ease testing of minimal configurations. Documentation/SubmitChecklist includes a note to test with allnoconfig. This helps catch missing dependencies on common-but-not-required functionality, which might otherwise go unnoticed. However, allnoconfig still leaves many symbols enabled, because they're hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't typically get build-tested with those disabled. To address this, introduce a new Kconfig option "allnoconfig_y", used on symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED (which then selects CONFIG_EXPERT). allnoconfig will then disable all the symbols hidden behind those. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
69369a70 |
|
03-Apr-2014 |
Josh Triplett <josh@joshtriplett.org> |
fs, kernel: permit disabling the uselib syscall uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes. bloat-o-meter: add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785) function old new delta padzero 39 36 -3 uselib_flags 20 - -20 sys_uselib 168 - -168 SyS_uselib 168 - -168 load_elf_library 426 - -426 The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6af9f7bf |
|
03-Apr-2014 |
Fabian Frederick <fabf@skynet.be> |
sys_sysfs: Add CONFIG_SYSFS_SYSCALL sys_sysfs is an obsolete system call no longer supported by libc. - This patch adds a default CONFIG_SYSFS_SYSCALL=y - Option can be turned off in expert mode. - cond_syscall added to kernel/sys_ni.c [akpm@linux-foundation.org: tweak Kconfig help text] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7a017721 |
|
25-Feb-2014 |
AKASHI Takahiro <takahiro.akashi@linaro.org> |
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL Currently AUDITSYSCALL has a long list of architecture depencency: depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML || SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA) The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL for simplicity. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> (arm) Acked-by: Richard Guy Briggs <rgb@redhat.com> (audit) Acked-by: Matt Turner <mattst88@gmail.com> (alpha) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
015d991f |
|
19-Dec-2013 |
蔡正龙 <zhenglong.cai@cs2c.com.cn> |
alpha: Enable system-call auditing support. Signed-off-by: Zhenglong.cai <zhenglong.cai@cs2c.com.cn> Signed-off-by: Matt Turner <mattst88@gmail.com>
|
#
03b8c7b6 |
|
02-Mar-2014 |
Heiko Carstens <hca@linux.ibm.com> |
futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there is no runtime check necessary, allow to skip the test within futex_init(). This allows to get rid of some code which would always give the same result, and also allows the compiler to optimize a couple of if statements away. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2bd59d48 |
|
11-Feb-2014 |
Tejun Heo <tj@kernel.org> |
cgroup: convert to kernfs cgroup filesystem code was derived from the original sysfs implementation which was heavily intertwined with vfs objects and locking with the goal of re-using the existing vfs infrastructure. That experiment turned out rather disastrous and sysfs switched, a long time ago, to distributed filesystem model where a separate representation is maintained which is queried by vfs. Unfortunately, cgroup stuck with the failed experiment all these years and accumulated even more problems over time. Locking and object lifetime management being entangled with vfs is probably the most egregious. vfs is never designed to be misused like this and cgroup ends up jumping through various convoluted dancing to make things work. Even then, operations across multiple cgroups can't be done safely as it'll deadlock with rename locking. Recently, kernfs is separated out from sysfs so that it can be used by users other than sysfs. This patch converts cgroup to use kernfs, which will bring the following benefits. * Separation from vfs internals. Locking and object lifetime management is contained in cgroup proper making things a lot simpler. This removes significant amount of locking convolutions, hairy object lifetime rules and the restriction on multi-cgroup operations. * Can drop a lot of code to implement filesystem interface as most are provided by kernfs. * Proper "severing" semantics, which allows controllers to not worry about lingering file accesses after offline. While the preceding patches did as much as possible to make the transition less painful, large part of the conversion has to be one discrete step making this patch rather large. The rest of the commit message lists notable changes in different areas. Overall ------- * vfs constructs replaced with kernfs ones. cgroup->dentry w/ ->kn, cgroupfs_root->sb w/ ->kf_root. * All dentry accessors are removed. Helpers to map from kernfs constructs are added. * All vfs plumbing around dentry, inode and bdi removed. * cgroup_mount() now directly looks for matching root and then proceeds to create a new one if not found. Synchronization and object lifetime ----------------------------------- * vfs inode locking removed. Among other things, this removes the need for the convolution in cgroup_cfts_commit(). Future patches will further simplify it. * vfs refcnting replaced with cgroup internal ones. cgroup->refcnt, cgroupfs_root->refcnt added. cgroup_put_root() now directly puts root->refcnt and when it reaches zero proceeds to destroy it thus merging cgroup_put_root() and the former cgroup_kill_sb(). Simliarly, cgroup_put() now directly schedules cgroup_free_rcu() when refcnt reaches zero. * Unlike before, kernfs objects don't hold onto cgroup objects. When cgroup destroys a kernfs node, all existing operations are drained and the association is broken immediately. The same for cgroupfs_roots and mounts. * All operations which come through kernfs guarantee that the associated cgroup is and stays valid for the duration of operation; however, there are two paths which need to find out the associated cgroup from dentry without going through kernfs - css_tryget_from_dir() and cgroupstats_build(). For these two, kernfs_node->priv is RCU managed so that they can dereference it under RCU read lock. File and directory handling --------------------------- * File and directory operations converted to kernfs_ops and kernfs_syscall_ops. * xattrs is implicitly supported by kernfs. No need to worry about it from cgroup. This means that "xattr" mount option is no longer necessary. A future patch will add a deprecated warning message when sane_behavior. * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a private copy of one of the kernfs_ops to set its atomic_write_len. cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated to handle it. * cftype->lockdep_key added so that kernfs lockdep annotation can be per cftype. * Inidividual file entries and open states are now managed by kernfs. No need to worry about them from cgroup. cfent, cgroup_open_file and their friends are removed. * kernfs_nodes are created deactivated and kernfs_activate() invocations added to places where creation of new nodes are committed. * cgroup_rmdir() uses kernfs_[un]break_active_protection() for self-removal. v2: - Li pointed out in an earlier patch that specifying "name=" during mount without subsystem specification should succeed if there's an existing hierarchy with a matching name although it should fail with -EINVAL if a new hierarchy should be created. Prior to the conversion, this used by handled by deferring failure from NULL return from cgroup_root_from_opts(), which was necessary because root was being created before checking for existing ones. Note that cgroup_root_from_opts() returned an ERR_PTR() value for error conditions which require immediate mount failure. As we now have separate search and creation steps, deferring failure from cgroup_root_from_opts() is no longer necessary. cgroup_root_from_opts() is updated to always return ERR_PTR() value on failure. - The logic to match existing roots is updated so that a mount attempt with a matching name but different subsys_mask are rejected. This was handled by a separate matching loop under the comment "Check for name clashes with existing mounts" but got lost during conversion. Merge the check into the main search loop. - Add __rcu __force casting in RCU_INIT_POINTER() in cgroup_destroy_locked() to avoid the sparse address space warning reported by kbuild test bot. Maybe we want an explicit interface to use kn->priv as RCU protected pointer? v3: Make CONFIG_CGROUPS select CONFIG_KERNFS. v4: Rebased on top of 0ab02ca8f887 ("cgroup: protect modifications to cgroup_idr with cgroup_mutex"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: kbuild test robot fengguang.wu@intel.com>
|
#
a9302e84 |
|
19-Dec-2013 |
蔡正龙 <zhenglong.cai@cs2c.com.cn> |
alpha: Enable system-call auditing support. Signed-off-by: Zhenglong.cai <zhenglong.cai@cs2c.com.cn> Signed-off-by: Matt Turner <mattst88@gmail.com>
|
#
be5e610c |
|
18-Nov-2013 |
Peter Zijlstra <peterz@infradead.org> |
math64: Add mul_u64_u32_shr() Introduce mul_u64_u32_shr() as proposed by Andy a while back; it allows using 64x64->128 muls on 64bit archs and recent GCC which defines __SIZEOF_INT128__ and __int128. (This new method will be used by the scheduler.) Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
99c8b1ea |
|
24-Oct-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
trivial: fix spelling in CONTEXT_TRACKING_FORCE help text Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org>
|
#
67f502dc |
|
24-Oct-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
Kconfig: fix spelling in CONTEXT_TRACKING_FORCE help text Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
261000a5 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: userns: Remove UIDGID_STRICT_TYPE_CHECKS Removing UIDGID_STRICT_TYPE_CHECKS simplifies the code and always generates a compile error if the uids and kuids or gids and kgids are mixed by accident. Now that the appropriate conversions have been placed throughout the kernel there is no longer a need for a mode where we don't detect them as compile errors. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
79bd9814 |
|
22-Nov-2013 |
Tejun Heo <tj@kernel.org> |
cgroup, memcg: move cgroup_event implementation to memcg cgroup_event is way over-designed and tries to build a generic flexible event mechanism into cgroup - fully customizable event specification for each user of the interface. This is utterly unnecessary and overboard especially in the light of the planned unified hierarchy as there's gonna be single agent. Simply generating events at fixed points, or if that's too restrictive, configureable cadence or single set of configureable points should be enough. Thankfully, memcg is the only user and gets to keep it. Replacing it with something simpler on sane_behavior is strongly recommended. This patch moves cgroup_event and "cgroup.event_control" implementation to mm/memcontrol.c. Clearing of events on cgroup destruction is moved from cgroup_destroy_locked() to mem_cgroup_css_offline(), which shouldn't make any noticeable difference. cgroup_css() and __file_cft() are exported to enable the move; however, this will soon be reverted once the event code is updated to be memcg specific. Note that "cgroup.event_control" will now exist only on the hierarchy with memcg attached to it. While this change is visible to userland, it is unlikely to be noticeable as the file has never been meaningful outside memcg. Aside from the above change, this is pure code relocation. v2: Per Li Zefan's comments, init/Kconfig updated accordingly and poll.h inclusion moved from cgroup.c to memcontrol.c. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
|
#
2d3c6275 |
|
14-Nov-2013 |
H. Peter Anvin <hpa@zytor.com> |
Revert "init/Kconfig: add option to disable kernel compression" This reverts commit 69f0554ec261fd686ac7fa1c598cc9eb27b83a80. This patch breaks randconfig on at least the x86-64 architecture, and most likely on others. There is work underway to support uncompressed kernels in a generic way, but it looks like it will amount to rewriting the support from scratch; see the LKML thread in the Link: for info. Therefore, revert this change and wait for the fix. Reported-by: Pavel Roskin <proski@gnu.org> Cc: Christian Ruppert <christian.ruppert@abilis.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131113113418.167b8ffd@IRBT4585 Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
69f0554e |
|
12-Nov-2013 |
Christian Ruppert <christian.ruppert@abilis.com> |
init/Kconfig: add option to disable kernel compression Some ARC users say they can boot faster with without kernel compression. This probably depends on things like the FLASH chip they use etc. Until now, kernel compression can only be disabled by removing "select HAVE_<compression>" lines from the architecture Kconfig. So add the Kconfig logic to permit disabling of kernel compression. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
527973c8 |
|
15-Oct-2013 |
Helge Deller <deller@gmx.de> |
parisc: add kernel audit feature Implement missing functions for parisc to provide kernel audit feature. Signed-off-by: Helge Deller <deller@gmx.de>
|
#
83fa6bbe |
|
24-May-2013 |
Eric Paris <eparis@redhat.com> |
audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE After trying to use this feature in Fedora we found the hard coding policy like this into the kernel was a bad idea. Surprise surprise. We ran into these problems because it was impossible to launch a container as a logged in user and run a login daemon inside that container. This reverts back to the old behavior before this option was added. The option will be re-added in a userspace selectable manor such that userspace can choose when it is and when it is not appropriate. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
6d56a410 |
|
13-Aug-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
NUMA: fix typos in Kconfig help text Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
ff3fb254 |
|
16-Sep-2013 |
Kevin Hilman <khilman@linaro.org> |
nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT The CONFIG_64BIT requirement on vtime can finally be removed since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which already takes care of the arch ability to handle nsecs based cputime_t safely. Signed-off-by: Kevin Hilman <khilman@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Arm Linux <linux-arm-kernel@lists.infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
554b0004 |
|
16-Sep-2013 |
Kevin Hilman <khilman@linaro.org> |
vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN Kconfig With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order to use that feature, arch code should be audited to ensure there are no races in concurrent read/write of cputime_t. For example, reading/writing 64-bit cputime_t on some 32-bit arches may require multiple accesses for low and high value parts, so proper locking is needed to protect against concurrent accesses. Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can enable after they've been audited for potential races. This option is automatically enabled on 64-bit platforms. Feature requested by Frederic Weisbecker. Signed-off-by: Kevin Hilman <khilman@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Arm Linux <linux-arm-kernel@lists.infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
b56e5a17 |
|
30-Aug-2013 |
David Howells <dhowells@redhat.com> |
KEYS: Separate the kernel signature checking keyring from module signing Separate the kernel signature checking keyring from module signing so that it can be used by code other than the module-signing code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
07555ac1 |
|
22-Aug-2013 |
Michal Hocko <mhocko@suse.cz> |
memcg: get rid of swapaccount leftovers The swapaccount kernel parameter without any values has been removed by commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but it seems that we didn't get rid of all the left overs. Make sure that menuconfig help text and kernel-parameters.txt are clear about value for the paramter and remove the stalled comment which is not very much useful on its own. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Gergely Risko <gergely@risko.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
53614714 |
|
25-Jul-2013 |
James Hogan <jhogan@kernel.org> |
rcu: Select IRQ_WORK from TREE_PREEMPT_RCU TREE_RCU and TREE_PREEMPT_RCU both cause kernel/rcutree.c to be built, but only TREE_RCU selects IRQ_WORK, which can result in an undefined reference to irq_work_queue for some (random) configs: kernel/built-in.o In function `rcu_start_gp_advanced': kernel/rcutree.c:1564: undefined reference to `irq_work_queue' Select IRQ_WORK from TREE_PREEMPT_RCU too to fix this. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
11097a03 |
|
11-Aug-2013 |
Yann E. MORIN <yann.morin.1998@free.fr> |
modules: do not depend on kconfig to set 'modules' option to symbol MODULES Currently, the MODULES symbol is special-cased in different places in the kconfig language. For example, if no symbol is defined to enable tristates, then kconfig looks up for a symbol named 'MODULES', and forces the 'modules' option onto that symbol. This causes problems as such: - since MODULES is special-cased, reading the configuration with KCONFIG_ALLCONFIG set will forcibly set MODULES to be 'valid' (ie. it has a valid value), when no such value was previously set. So MODULES defaults to 'n' unless it is present in KCONFIG_ALLCONFIG - other third-party projects may decide that 'MODULES' plays a different role for them This has been exposed by cset #cfa98f2e: kconfig: do not override symbols already set and reported by Stephen in: http://marc.info/?l=linux-next&m=137592137915234&w=2 As suggested by Sam, we explicitly define the MODULES symbol to be the tristate-enabler. This will allow us to drop special-casing of MODULES in the kconfig language, later. (Note: this patch is not a fix to Stephen's issue, just a first step). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: yann.morin.1998@free.fr Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Kevin Hilman <khilman@linaro.org> Cc: sedat.dilek@gmail.com Cc: Theodore Ts'o <tytso@mit.edu>
|
#
d6970d4b |
|
15-Aug-2013 |
Dwight Engen <dwight.engen@oracle.com> |
enable building user namespace with xfs Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
b39ffbf8 |
|
17-Jul-2013 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
slub: don't use cpu partial pages on UP cpu partial pages are used to avoid contention which does not exist in the UP case. So let SLUB_CPU_PARTIAL depend on SMP. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
#
d84d27a4 |
|
24-Jul-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
context_tracking: Remove full dynticks' hacky dependency on wide context tracking Now that the full dynticks subsystem only enables the context tracking on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE This dependency was a hack to enable the context tracking widely for the full dynticks susbsystem until the latter becomes able to enable it in a more CPU-finegrained fashion. Now CONTEXT_TRACKING_FORCE only stands for testing on archs that work on support for the context tracking while full dynticks can't be used yet due to unmet dependencies. It simulates a system where all CPUs are full dynticks so that RCU user extended quiescent states and dynticks cputime accounting can be tested on the given arch. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org>
|
#
e76e1fdf |
|
08-Jul-2013 |
Kyungsik Lee <kyungsik.lee@lge.com> |
lib: add support for LZ4-compressed kernel Add support for extracting LZ4-compressed kernel images, as well as LZ4-compressed ramdisk images in the kernel boot process. Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Florian Fainelli <florian@openwrt.org> Cc: Yann Collet <yann.collet.73@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
345c905d |
|
18-Jun-2013 |
Joonsoo Kim <iamjoonsoo.kim@lge.com> |
slub: Make cpu partial slab support configurable CPU partial support can introduce level of indeterminism that is not wanted in certain context (like a realtime kernel). Make it configurable. This patch is based on Christoph Lameter's "slub: Make cpu partial slab support configurable V2". Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
#
f60e2a96 |
|
03-Jul-2013 |
Sergey Dyasly <dserrg@gmail.com> |
memcg: Kconfig info update Now there are only 2 members in struct page_cgroup. Update config MEMCG description accordingly. Signed-off-by: Sergey Dyasly <dserrg@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4bb16672 |
|
22-May-2013 |
Jiri Slaby <jirislaby@kernel.org> |
build some drivers only when compile-testing Some drivers can be built on more platforms than they run on. This is a burden for users and distributors who package a kernel. They have to manually deselect some (for them useless) drivers when updating their configs via oldconfig. And yet, sometimes it is even impossible to disable the drivers without patching the kernel. Introduce a new config option COMPILE_TEST and make all those drivers to depend on the platform they run on, or on the COMPILE_TEST option. Now, when users/distributors choose COMPILE_TEST=n they will not have the drivers in their allmodconfig setups, but developers still can compile-test them with COMPILE_TEST=y. Now the drivers where we use this new option: * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom processors so it should depend on x86. * FB_GEODE: Geode is 32-bit only so only enable it for X86_32. * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc systems -- which do not actually support the hardware via that method. * INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom device. [v2] * remove EXPERT dependency [gregkh - remove chipidea portion, as it's incorrect, and also doesn't apply to my driver-core tree] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: linux-usb@vger.kernel.org Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: linux-geode@lists.infradead.org Cc: linux-fbdev@vger.kernel.org Cc: Richard Cochran <richardcochran@gmail.com> Cc: netdev@vger.kernel.org Cc: Ben Hutchings <ben@decadent.org.uk> Cc: "Keller, Jacob E" <jacob.e.keller@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
38ff87f7 |
|
02-Jun-2013 |
Stephen Boyd <sboyd@codeaurora.org> |
sched_clock: Make ARM's sched_clock generic for all architectures Nothing about the sched_clock implementation in the ARM port is specific to the architecture. Generalize the code so that other architectures can use it by selecting GENERIC_SCHED_CLOCK. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> [jstultz: Merge minor collisions with other patches in my tree] Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
127781d1 |
|
27-Mar-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove TINY_PREEMPT_RCU TINY_PREEMPT_RCU adds significant code and complexity, but does not offer commensurate benefits. People currently using TINY_PREEMPT_RCU can get much better memory footprint with TINY_RCU, or, if they really need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively minor degradation in memory footprint. Please note that this move has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545) and on LWN (http://lwn.net/Articles/541037/). This commit therefore removes TINY_PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
676c3dc2 |
|
30-Apr-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Apply Dave Jones's NOCB Kconfig help feedback The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO, and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit adds a bit more detail. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9a5739d7 |
|
28-Mar-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove "Experimental" flags After a release or two, features are no longer experimental. Therefore, this commit removes the "Experimental" tag from them. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
016a8d5b |
|
28-May-2013 |
Steven Rostedt <rostedt@goodmis.org> |
rcu: Don't call wakeup() with rcu_node structure ->lock held This commit fixes a lockdep-detected deadlock by moving a wake_up() call out from a rnp->lock critical section. Please see below for the long version of this story. On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote: > [12572.705832] ====================================================== > [12572.750317] [ INFO: possible circular locking dependency detected ] > [12572.796978] 3.10.0-rc3+ #39 Not tainted > [12572.833381] ------------------------------------------------------- > [12572.862233] trinity-child17/31341 is trying to acquire lock: > [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12572.878859] > but task is already holding lock: > [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12572.903381] > which lock already depends on the new lock. > > [12572.927541] > the existing dependency chain (in reverse order) is: > [12572.943736] > -> #4 (&ctx->lock){-.-...}: > [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0 > [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0 > [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0 > [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70 > [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e > [12573.018321] > -> #3 (&rq->lock){-.-.-.}: > [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260 > [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470 > [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30 > [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140 > [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe > [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c > [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf > [12573.109528] > -> #2 (&p->pi_lock){-.-.-.}: > [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320 > [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20 > [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40 > [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90 > [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50 > [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 > [12573.221379] > -> #1 (&rsp->gp_wq){..-.-.}: > [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50 > [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 Notice the above call chain. rcu_start_future_gp() is called with the rnp->lock held. Then it calls rcu_start_gp_advance, which does a wakeup. You can't do wakeups while holding the rnp->lock, as that would mean that you could not do a rcu_read_unlock() while holding the rq lock, or any lock that was taken while holding the rq lock. This is because... (See below). > [12573.295067] > -> #0 (rcu_node_0){..-.-.}: > [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 Notice the above trace. perf took its own ctx->lock, which can be taken while holding the rq lock. While holding this lock, it did a rcu_read_unlock(). The perf_lock_task_context() basically looks like: rcu_read_lock(); raw_spin_lock(ctx->lock); rcu_read_unlock(); Now, what looks to have happened, is that we scheduled after taking that first rcu_read_lock() but before taking the spin lock. When we scheduled back in and took the ctx->lock, the following rcu_read_unlock() triggered the "special" code. The rcu_read_unlock_special() takes the rnp->lock, which gives us a possible deadlock scenario. CPU0 CPU1 CPU2 ---- ---- ---- rcu_nocb_kthread() lock(rq->lock); lock(ctx->lock); lock(rnp->lock); wake_up(); lock(rq->lock); rcu_read_unlock(); rcu_read_unlock_special(); lock(rnp->lock); lock(ctx->lock); **** DEADLOCK **** > [12573.382068] > other info that might help us debug this: > > [12573.403229] Chain exists of: > rcu_node_0 --> &rq->lock --> &ctx->lock > > [12573.424471] Possible unsafe locking scenario: > > [12573.438499] CPU0 CPU1 > [12573.445599] ---- ---- > [12573.452691] lock(&ctx->lock); > [12573.459799] lock(&rq->lock); > [12573.467010] lock(&ctx->lock); > [12573.474192] lock(rcu_node_0); > [12573.481262] > *** DEADLOCK *** > > [12573.501931] 1 lock held by trinity-child17/31341: > [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12573.516475] > stack backtrace: > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39 > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00 > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40 > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0 > [12573.567856] Call Trace: > [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b > [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f > [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100 > [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0 > [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40 > [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0 > [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370 > [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50 > [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 This commit delays the wakeup via irq_work(), which is what perf and ftrace use to perform wakeups in critical sections. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
40b31360 |
|
20-May-2013 |
Stephen Rothwell <sfr@canb.auug.org.au> |
Finally eradicate CONFIG_HOTPLUG Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
73c30828 |
|
02-May-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
rcu: Fix full dynticks' dependency on wide RCU nocb mode Commit 0637e029392386e6996f5d6574aadccee8315efa ("nohz: Select wide RCU nocb for full dynticks") intended to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is enabled. However this option is part of a choice menu and Kconfig's "select" instruction has no effect on such targets. Fix this by using reverse dependencies on the targets we don't want instead. Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
657a5209 |
|
30-Apr-2013 |
Mike Frysinger <vapier@gentoo.org> |
init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display The kconfig language requires that dependent options all follow the menuconfig symbol in order to be collapsed below it. Recently some hidden options were added below the EXPERT menuconfig, but did not depend on EXPERT (because hidden options can't). This broke the display. So re-order all these options, and while we're here stick the PCI quirks under the EXPERT menu (since it isn't sitting with any related options). Before this commit, we get: [*] Configure standard kernel features (expert users) ---> [ ] Sysctl syscall support [*] Load all symbols for debugging/ksymoops ... [ ] Embedded system Now we get the older (and correct) behavior: [*] Configure standard kernel features (expert users) ---> [ ] Embedded system And if you go into the expert menu you get the expert options: [ ] Sysctl syscall support [*] Load all symbols for debugging/ksymoops ... Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c58b0df1 |
|
26-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN to an active one. The full dynticks Kconfig is currently hidden behind the full dynticks cputime accounting, which is an awkward and counter-intuitive layout: the user first has to select the dynticks cputime accounting in order to make the full dynticks feature to be visible. We definetly want it the other way around. The usual way to perform this kind of active dependency is use "select" on the depended target. Now we can't use the Kconfig "select" instruction when the target is a "choice". So this patch inspires on how the RCU subsystem Kconfig interact with its dependencies on SMP and PREEMPT: we make sure that cputime accounting can't propose another option than VIRT_CPU_ACCOUNTING_GEN when NO_HZ_FULL is selected by using the right "depends on" instruction for each cputime accounting choices. v2: Keep full dynticks cputime accounting available even without full dynticks, as per Paul McKenney's suggestion. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
3451d024 |
|
10-Aug-2011 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON We are planning to convert the dynticks Kconfig options layout into a choice menu. The user must be able to easily pick any of the following implementations: constant periodic tick, idle dynticks, full dynticks. As this implies a mutual exclusion, the two dynticks implementions need to converge on the selection of a common Kconfig option in order to ease the sharing of a common infrastructure. It would thus seem pretty natural to reuse CONFIG_NO_HZ to that end. It already implements all the idle dynticks code and the full dynticks depends on all that code for now. So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ. On the other hand we want to stay backward compatible: if CONFIG_NO_HZ is set in an older config file, we want to enable CONFIG_NO_HZ_IDLE by default. But we can't afford both at the same time or we run into a circular dependency: 1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select CONFIG_NO_HZ 2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE We might be able to support that from Kconfig/Kbuild but it may not be wise to introduce such a confusing behaviour. So to solve this, create a new CONFIG_NO_HZ_COMMON option which gathers the common code between idle and full dynticks (that common code for now is simply the idle dynticks code) and select it from their referring Kconfig. Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ to it for backward compatibility. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
c0f4dfd4 |
|
28-Dec-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks Because RCU callbacks are now associated with the number of the grace period that they must wait for, CPUs can now take advance callbacks corresponding to grace periods that ended while a given CPU was in dyntick-idle mode. This eliminates the need to try forcing the RCU state machine while entering idle, thus reducing the CPU intensiveness of RCU_FAST_NO_HZ, which should increase its energy efficiency. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a4889858 |
|
03-Dec-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Distinguish "rcuo" kthreads by RCU flavor Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by the CPU number, for example, "rcuo". This is problematic given that there are either two or three RCU flavors, each of which gets a per-CPU kthread with exactly the same name. This commit therefore introduces a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh, 'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have "rcuob/0", "rcuop/0", and "rcuos/0". Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
|
#
911af505 |
|
11-Feb-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide compile-time control for no-CBs CPUs Currently, the only way to specify no-CBs CPUs is via the rcu_nocbs kernel command-line parameter. This is inconvenient in some cases, particularly for randconfig testing, so this commit adds a new set of kernel configuration parameters. CONFIG_RCU_NOCB_CPU_NONE (the default) retains the old behavior, CONFIG_RCU_NOCB_CPU_ZERO offloads callback processing from CPU 0 (along with any other CPUs specified by the rcu_nocbs boot-time parameter), and CONFIG_RCU_NOCB_CPU_ALL offloads callback processing from all CPUs. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3d374d09 |
|
06-Mar-2013 |
Kees Cook <keescook@chromium.org> |
final removal of CONFIG_EXPERIMENTAL Remove "config EXPERIMENTAL" itself, now that every "depends on" it has been removed from the tree. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
34ed6246 |
|
07-Jan-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove restrictions on no-CBs CPUs Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore at least one no-CBs CPU must remain online at any given time. These restrictions are problematic in some situations, such as cases where all CPUs must run a real-time workload that needs to be insulated from OS jitter and latencies due to RCU callback invocation. This commit therefore provides no-CBs CPUs a (very crude and energy-inefficient) way to start and to wait for grace periods independently of the normal RCU callback mechanisms. This approach allows any or all of the CPUs to be designated as no-CBs CPUs, and allows any proper subset of the CPUs (whether no-CBs CPUs or not) to be offlined. This commit also provides a fix for a locking bug spotted by Xie ChanglongX <changlongx.xie@intel.com>. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8b438766 |
|
26-Feb-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
context_tracking: Enable probes by default for selftesting Until we provide the nohz_mask boot parameter, keeping the context tracking probes disabled by default is pointless since what we want is to runtime test this code anyway. It's furthermore confusing for the users which don't expect the probes to be off when they select RCU user mode or full dynticks cputime accounting. Let's enable these probes selftests by default for now. Suggested: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
bf14e3b9 |
|
18-Jan-2013 |
Vineet Gupta <vgupta@synopsys.com> |
sysctl: Enable PARISC "unaligned-trap" to be used cross-arch PARISC defines /proc/sys/kernel/unaligned-trap to runtime toggle unaligned access emulation. The exact mechanics of enablig/disabling are still arch specific, we can make the sysctl usable by other arches. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com>
|
#
139321c6 |
|
06-Feb-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
cifs: Enable building with user namespaces enabled. Cc: Steve French <smfrench@gmail.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
c9617a44 |
|
02-Feb-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
nfsd: Enable building with user namespaces enabled. Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfsd/ it is safe to enable building nfsd when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
4277bbf7 |
|
02-Feb-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
nfs: Enable building with user namespaces enabled. Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfs/ it is safe to enable building nfs when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
1ac7fd81 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
ncpfs: Support interacting with multiple user namespaces ncpfs does not natively support uids and gids so this conversion was simply a matter of updating the the type of the mounteduid, the uid and the gid on the superblock. Fixing the ioctls that read them, updating the mount option parser and the mount option printer. Cc: Petr Vandrovec <petr@vandrovec.name> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
0f07bd37 |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
gfs2: Enable building with user namespaces enabled Now that all of the necessary work has been done to push kuids and kgids throughout gfs2 and to convert between kuids and kgids when reading and writing the on disk structures it is safe to enable gfs2 when multiple user namespaces are enabled. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
ecb528e3 |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
ocfs2: Enable building with user namespaces enabled Now that ocfs2 has been converted to store uids and gids in kuid_t and kgid_t and all of the conversions have been added to the appropriate places it is safe to allow building and using ocfs2 with user namespace support enabled. Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
515ee7bd |
|
30-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
coda: Allow coda to be built when user namespace support is enabled Now that the coda kernel to userspace has been modified to convert between kuids and kgids and uids and gids, and all internal coda structures have be modified to store uids and gids as kuids and kgids it is safe to allow code to be built with user namespace support enabled. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
a0a5386a |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
afs: Support interacting with multiple user namespaces Modify struct afs_file_status to store owner as a kuid_t and group as a kgid_t. In xdr_decode_AFSFetchStatus as owner is now a kuid_t and group is now a kgid_t don't use the EXTRACT macro. Instead perform the work of the extract macro explicitly. Read the value with ntohl and convert it to the appropriate type with make_kuid or make_kgid. Test if the value is different from what is stored in status and update changed. Update the value in status. In xdr_encode_AFS_StoreStatus call from_kuid or from_kgid as we are computing the on the wire encoding. Initialize uids with GLOBAL_ROOT_UID instead of 0. Initialize gids with GLOBAL_ROOT_GID instead of 0. Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
4fa814be |
|
30-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
9p: Allow building 9p with user namespaces enabled. Now that the uid_t -> kuid_t, gid_t -> kgid_t conversion has been completed in 9p allow 9p to be built when user namespaces are enabled. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
d5ea055f |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
ceph: Enable building when user namespaces are enabled. Now that conversions happen from kuids and kgids when generating ceph messages and conversion happen to kuids and kgids after receiving celph messages, and all intermediate data structures store uids and gids as type kuid_t and kgid_t it is safe to enable ceph with user namespace support enabled. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
02fc8d37 |
|
07-Feb-2013 |
Stephen Rothwell <sfr@canb.auug.org.au> |
cputime: Restore CPU_ACCOUNTING config defaults for PPC64 Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime accounting") inadvertantly changed the default CPU_ACCOUNTING config for PPC64. Repair that. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9fc52d83 |
|
08-Jan-2013 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow TREE_PREEMPT_RCU on UP systems The TINY_PREEMPT_RCU is complex, does not provide that much memory savings, and therefore TREE_PREEMPT_RCU should be used instead. The systems where the difference between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU are quite small compared to the memory footprint of CONFIG_PREEMPT. This commit therefore takes a first step towards eliminating TINY_PREEMPT_RCU by allowing TREE_PREEMPT_RCU to be configured on !SMP systems. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
6bfc09e2 |
|
19-Oct-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide RCU CPU stall warnings for tiny RCU Tiny RCU has historically omitted RCU CPU stall warnings in order to reduce memory requirements, however, lack of these warnings caused Thomas Gleixner some debugging pain recently. Therefore, this commit adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps the memory footprint small, while still enabling CPU stall warnings in kernels built to enable them. Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON config variable to simplify #if expressions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
abf917cd |
|
24-Jul-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Generic on-demand virtual cputime accounting If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
e11f0ae3 |
|
25-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Recommend use of memory control groups. In the help text describing user namespaces recommend use of memory control groups. In many cases memory control groups are the only mechanism there is to limit how much memory a user who can create user namespaces can use. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d9d8d7ed |
|
24-Jan-2013 |
Michal Marek <mmarek@suse.cz> |
MODSIGN: Add option to not sign modules during modules_install To allow the builder to sign only a subset of modules, or to sign the modules using a key that is not available on the build machine, add CONFIG_MODULE_SIG_ALL. If this option is unset, no modules will be signed during build. The default is 'y', to preserve the current behavior. Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
22753674 |
|
24-Jan-2013 |
Michal Marek <mmarek@suse.cz> |
MODSIGN: Simplify Makefile with a Kconfig helper Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3a55fb0d |
|
02-Nov-2012 |
Kirill Smelkov <kirr@mns.spb.ru> |
Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZE In commit 281dc5c5ec0f ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we already changed the actual default value, but the help-text still suggested 'y'. Fix the help text too, for all the same reasons. Sadly, -Os keeps on generating some very suboptimal code for certain cases, to the point where any I$ miss upside is swamped by the downside. The main ones are: - using "rep movsb" for memcpy, even on CPU's where that is horrendously bad for performance. - not honoring branch prediction information, so any I$ footprint you win from smaller code, you lose from less code density in the I$. - using divide instructions when that is very expensive. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
19c92399 |
|
02-Oct-2012 |
Kees Cook <keescook@chromium.org> |
init: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
|
#
5a958db3 |
|
02-Oct-2012 |
Kees Cook <keescook@chromium.org> |
make CONFIG_EXPERIMENTAL invisible and default This config item has not carried much meaning for a while now and is almost always enabled by default (especially in distro builds). As agreed during the Linux kernel summit, it should be removed. As a first step, remove it from being listed, and default it to on. Once it has been removed from all subsystem Kconfigs, it will be dropped entirely. For items that really are experimental, maintainers should use "default n", optionally include "(EXPERIMENTAL)" in the title, and add language to the help text indicating why the item should be considered experimental. For items that are dangerously experimental, the maintainer is encouraged to follow the above title recommendation, add stronger language to the help text, and optionally use (depending on the extent of the danger, from least to most dangerous): printk(), add_taint(TAINT_WARN), add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN. CC: Greg KH <gregkh@linuxfoundation.org> CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
b6fca725 |
|
09-Jan-2013 |
Vineet Gupta <Vineet.Gupta1@synopsys.com> |
sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch IA64 defines /proc/sys/kernel/ignore-unaligned-usertrap to control verbose warnings on unaligned access emulation. Although the exact mechanics of what to do with sysctl (ignore/shout) are arch specific, this change enables the sysctl to be usable cross-arch. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
d7f25f8a |
|
18-Dec-2012 |
Glauber Costa <glommer@parallels.com> |
memcg: infrastructure to match an allocation to the right cache The page allocator is able to bind a page to a memcg when it is allocated. But for the caches, we'd like to have as many objects as possible in a page belonging to the same cache. This is done in this patch by calling memcg_kmem_get_cache in the beginning of every allocation function. This function is patched out by static branches when kernel memory controller is not being used. It assumes that the task allocating, which determines the memcg in the page allocator, belongs to the same cgroup throughout the whole process. Misaccounting can happen if the task calls memcg_kmem_get_cache() while belonging to a cgroup, and later on changes. This is considered acceptable, and should only happen upon task migration. Before the cache is created by the memcg core, there is also a possible imbalance: the task belongs to a memcg, but the cache being allocated from is the global cache, since the child cache is not yet guaranteed to be ready. This case is also fine, since in this case the GFP_KMEMCG will not be passed and the page allocator will not attempt any cgroup accounting. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
510fc4e1 |
|
18-Dec-2012 |
Glauber Costa <glommer@parallels.com> |
memcg: kmem accounting basic infrastructure Add the basic infrastructure for the accounting of kernel memory. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt * memory.kmem.max_usage_in_bytes They have the same meaning of their user memory counterparts. They reflect the state of the "kmem" res_counter. Per cgroup kmem memory accounting is not enabled until a limit is set for the group. Once the limit is set the accounting cannot be disabled for that group. This means that after the patch is applied, no behavioral changes exists for whoever is still using memcg to control their memory usage, until memory.kmem.limit_in_bytes is set for the first time. We always account to both user and kernel resource_counters. This effectively means that an independent kernel limit is in place when the limit is set to a lower value than the user memory. A equal or higher value means that the user limit will always hit first, meaning that kmem is effectively unlimited. People who want to track kernel memory but not limit it, can set this limit to a very high number (like RESOURCE_MAX - 1page - that no one will ever hit, or equal to the user memory) [akpm@linux-foundation.org: MEMCG_MMEM only works with slab and slub] Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1a687c2e |
|
22-Nov-2012 |
Mel Gorman <mgorman@suse.de> |
mm: sched: numa: Control enabling and disabling of NUMA balancing This patch adds Kconfig options and kernel parameters to allow the enabling and disabling of automatic NUMA balancing. The existance of such a switch was and is very important when debugging problems related to transparent hugepages and we should have the same for automatic NUMA placement. Signed-off-by: Mel Gorman <mgorman@suse.de>
|
#
be3a7284 |
|
03-Oct-2012 |
Andrea Arcangeli <aarcange@redhat.com> |
mm: numa: pte_numa() and pmd_numa() Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>
|
#
91d1aa43 |
|
27-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
context_tracking: New context tracking susbsystem Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
74876a98 |
|
12-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
printk: Wake up klogd using irq_work klogd is woken up asynchronously from the tick in order to do it safely. However if printk is called when the tick is stopped, the reader won't be woken up until the next interrupt, which might not fire for a while. As a result, the user may miss some message. To fix this, lets implement the printk tick using a lazy irq work. This subsystem takes care of the timer tick state and can fix up accordingly. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
6147a9d8 |
|
19-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
irq_work: Remove CONFIG_HAVE_IRQ_WORK irq work can run on any arch even without IPI support because of the hook on update_process_times(). So lets remove HAVE_IRQ_WORK because it doesn't reflect any backend requirement. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
3fbfbf7a |
|
19-Aug-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add callback-free CPUs RCU callback execution can add significant OS jitter and also can degrade both scheduling latency and, in asymmetric multiprocessors, energy efficiency. This commit therefore adds the ability for selected CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded to kthreads. If the "rcu_nocb_poll" boot parameter is also specified, these kthreads will do polling, removing the need for the offloaded CPUs to do wakeups. At least one CPU must be doing normal callback processing: currently CPU 0 cannot be selected as a no-CBs CPU. In addition, attempts to offline the last normal-CBs CPU will fail. This feature was inspired by Jim Houston's and Joe Korty's JRCU, and this commit includes fixes to problems located by Fengguang Wu's kbuild test robot. [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ] Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
499dcf20 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Support fuse interacting with multiple user namespaces Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
45634cd8 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Support autofs4 interacing with multiple user namespaces Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requestor in user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling processes user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
af71befa |
|
24-Oct-2012 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
rcu: Wordsmith help text for RCU_USER_QS kernel parameter This commit adds a "try" missing from the end of the first paragraph of the RCU_USER_QS help text. [ paulmck: Also fix up the last paragraph a bit. ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ba49df47 |
|
07-Oct-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Update RCU_FAST_NO_HZ help text The RCU_FAST_NO_HZ help text included a warning about overhead on large systems, but that issue has since been resolved. The main remaining issue with RCU_FAST_NO_HZ is increased real-time latency. This commit therefore updates the help text accordingly. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d677124b |
|
10-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
rcu: Advise most users not to enable RCU user mode Discourage distros from enabling CONFIG_RCU_USER_QS because it brings overhead for no benefits yet. It's not a useful feature on its own until we can fully run an adaptive tickless kernel. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
48ba2462 |
|
26-Sep-2012 |
David Howells <dhowells@redhat.com> |
MODSIGN: Implement module signature checking Check the signature on the module against the keys compiled into the kernel or available in a hardware key store. Currently, only RSA keys are supported - though that's easy enough to change, and the signature is expected to contain raw components (so not a PGP or PKCS#7 formatted blob). The signature blob is expected to consist of the following pieces in order: (1) The binary identifier for the key. This is expected to match the SubjectKeyIdentifier from an X.509 certificate. Only X.509 type identifiers are currently supported. (2) The signature data, consisting of a series of MPIs in which each is in the format of a 2-byte BE word sizes followed by the content data. (3) A 12 byte information block of the form: struct module_signature { enum pkey_algo algo : 8; enum pkey_hash_algo hash : 8; enum pkey_id_type id_type : 8; u8 __pad; __be32 id_length; __be32 sig_length; }; The three enums are defined in crypto/public_key.h. 'algo' contains the public-key algorithm identifier (0->DSA, 1->RSA). 'hash' contains the digest algorithm identifier (0->MD4, 1->MD5, 2->SHA1, etc.). 'id_type' contains the public-key identifier type (0->PGP, 1->X.509). '__pad' should be 0. 'id_length' should contain in the binary identifier length in BE form. 'sig_length' should contain in the signature data length in BE form. The lengths are in BE order rather than CPU order to make dealing with cross-compilation easier. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor Kconfig fix)
|
#
ea0b6dcf |
|
26-Sep-2012 |
David Howells <dhowells@redhat.com> |
MODSIGN: Provide Kconfig options Provide kernel configuration options for module signing. The following configuration options are added: CONFIG_MODULE_SIG_SHA1 CONFIG_MODULE_SIG_SHA224 CONFIG_MODULE_SIG_SHA256 CONFIG_MODULE_SIG_SHA384 CONFIG_MODULE_SIG_SHA512 These select the cryptographic hash used to digest the data prior to signing. Additionally, the crypto module selected will be built into the kernel as it won't be possible to load it as a module without incurring a circular dependency when the kernel tries to check its signature. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
106a4ee2 |
|
26-Sep-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
module: signature checking hook We do a very simple search for a particular string appended to the module (which is cache-hot and about to be SHA'd anyway). There's both a config option and a boot parameter which control whether we accept or fail with unsigned modules and modules that are signed with an unknown key. If module signing is enabled, the kernel will be tainted if a module is loaded that is unsigned or has a signature for which we don't have the key. (Useful feedback and tweaks by David Howells <dhowells@redhat.com>) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
7ac57a89 |
|
08-Oct-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry Introduce SYSCTL_EXCEPTION_TRACE config option and selec it in the architectures requiring support for the "exception-trace" debug_table entry in kernel/sysctl.c. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
af1839eb |
|
08-Oct-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
Kconfig: clean up the long arch list for the UID16 config option Introduce HAVE_UID16 config option and select it in corresponding architecture Kconfig files. UID16 now only depends on HAVE_UID16. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4520c6a4 |
|
21-Sep-2012 |
David Howells <dhowells@redhat.com> |
X.509: Add simple ASN.1 grammar compiler Add a simple ASN.1 grammar compiler. This produces a bytecode output that can be fed to a decoder to inform the decoder how to interpret the ASN.1 stream it is trying to parse. Action functions can be specified in the grammar by interpolating: ({ foo }) after a type, for example: SubjectPublicKeyInfo ::= SEQUENCE { algorithm AlgorithmIdentifier, subjectPublicKey BIT STRING ({ do_key_data }) } The decoder is expected to call these after matching this type and parsing the contents if it is a constructed type. The grammar compiler does not currently support the SET type (though it does support SET OF) as I can't see a good way of tracking which members have been encountered yet without using up extra stack space. Currently, the grammar compiler will fail if more than 256 bytes of bytecode would be produced or more than 256 actions have been specified as it uses 8-bit jump values and action indices to keep space usage down. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
046d662f |
|
04-Oct-2012 |
Alex Kelly <alex.page.kelly@gmail.com> |
coredump: make core dump functionality optional Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of core dump. This saves approximately 2.6k in the compiled kernel, and complements CONFIG_ELF_CORE, which now depends on it. CONFIG_COREDUMP also disables coredump-related sysctls, except for suid_dumpable and related functions, which are necessary for ptrace. [akpm@linux-foundation.org: fix binfmt_aout.c build] Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
754b7b63 |
|
04-Oct-2012 |
Andi Kleen <ak@linux.intel.com> |
sections: disable const sections for PA-RISC v2 The PA-RISC tool chain seems to have some problem with correct read/write attributes on sections. This causes problems when the const sections are fixed up for other architecture to only contain truly read-only data. Disable const sections for PA-RISC This can cause a bit of noise with modpost. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1fd2b442 |
|
11-Jul-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
rcu: Userspace RCU extended QS selftest Provide a config option that enables the userspace RCU extended quiescent state on every CPUs by default. This is for testing purpose. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
2b1d5024 |
|
11-Jul-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
rcu: Settle config for userspace extended quiescent state Create a new config option under the RCU menu that put CPUs under RCU extended quiescent state (as in dynticks idle mode) when they run in userspace. This require some contribution from architectures to hook into kernel and userspace boundaries. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
fdf9c356 |
|
09-Sep-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Make finegrained irqtime accounting generally available There is no known reason for this option to be unavailable on other archs than x86. They just need to call enable_sched_clock_irqtime() if they have a sufficiently finegrained clock to make it working. Move it to the general option and let the user choose between it and pure tick based or virtual cputime accounting. Note that virtual cputime accounting already performs a finegrained irqtime accounting. CONFIG_IRQ_TIME_ACCOUNTING is a kind of middle ground between tick and virtual based accounting. So CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_VIRT_CPU_ACCOUNTING are mutually exclusive choices. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
391dc69c |
|
09-Sep-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Gather time/stats accounting config options into a single menu This debloats a bit the general config menu and make these config options easier to find. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
72235465 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the ufs filesystem to use kuid/kgid where appropriate Cc: Evgeniy Dushistov <dushistov@mail.ru> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
c2ba138a |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the udf filesystem to use kuid/kgid where appropriate Cc: Jan Kara <jack@suse.cz> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
39241beb |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ubifs to use kuid/kgid Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
61293ee2 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert squashfs to use kuid/kgid where appropriate Cc: Phillip Lougher <phillip@squashfs.org.uk> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
df814654 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert reiserfs to use kuid and kgid where appropriate Cc: reiserfs-devel@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
c18cdc1a |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert jfs to use kuid/kgid where appropriate Cc: Dave Kleikamp <shaggy@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
0cfe53d3 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert jffs2 to use kuid and kgid where appropriate - General routine uid/gid conversion work - When storing posix acls treat ACL_USER and ACL_GROUP separately so I can call from_kuid or from_kgid as appropriate. - When reading posix acls treat ACL_USER and ACL_GROUP separately so I can call make_kuid or make_kgid as appropriate. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
0e1a43c7 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert hpfs to use kuid and kgid where appropriate Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
2f2f43d3 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert btrfs to use kuid/kgid where appropriate Cc: Chris Mason <chris.mason@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
7f5b82b8 |
|
25-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert bfs to use kuid/kgid where appropriate Cc: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
8fed10be |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert affs to use kuid/kgid wherwe appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
4a2ebb93 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert binder ipc to use kuids Cc: Arve Hjønnevåg <arve@android.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
8b94eea4 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Add user namespace support to IMA Use kuid's in the IMA rules. When reporting the current uid in audit logs use from_kuid to get a usable value. Cc: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
cf9c9352 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert EVM to deal with kuids and kgids in it's hmac computation Cc: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
29f82ae5 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert hostfs to use kuid and kgid where appropriate Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
609fcd1b |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert tomoyo to use kuid and kgid where appropriate Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
2db81452 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert apparmor to use kuid and kgid where appropriate Cc: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
e4849737 |
|
11-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert loop to use kuid_t instead of uid_t Cc: Jens Axboe <jaxboe@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d03ca582 |
|
25-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ipathfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
2f83ffa8 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert freevxfs to use kuid/kgid where appropriate Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
a726ecce |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the sysv filesystem to use kuid/kgid where appropriate Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
85a03d1b |
|
07-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the qnx6 filesystem to use kuid/kgid where appropriate Cc: Kai Bankett <chaosman@ontika.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
511728d7 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the qnx4 filesystem to use kuid/kgid where appropriate Acked-by: Anders Larsen <al@alarsen.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
80fcbe75 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert omfs to use kuid and kgid where appropriate Acked-by: Bob Copeland <me@bobcopeland.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
b29f7751 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ntfs to use kuid and kgid where appropriate Cc: Anton Altaparmakov <anton@tuxera.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
305d3d0d |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert nillfs2 to use kuid/kgid where appropriate Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
f303bdc5 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert minix to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
1a0a994e |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert logfs to use kuid/kgid where appropriate Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
ba64e2b9 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert isofs to use kuid/kgid where appropriate Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
16525e3f |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert hfsplus to use kuid and kgid where appropriate Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
43b5e4cc |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert hfs to use kuid and kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d001b053 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert exofs to use kuid/kgid where appropriate Cc: Benny Halevy <bhalevy@tonian.com> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
5d4ea4da |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert efs to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
cdf8c58a |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ecryptfs to use kuid/kgid where appropriate Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
a7d9cfe9 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert cramfs to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
31aba059 |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert befs to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
c010d1ff |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert adfs to use kuid and kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
9a11f451 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert xenfs to use kuid and kgid where appropriate Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
a0eb3a05 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert hugetlbfs to use kuid/kgid where appropriate Note sysctl_hugetlb_shm_group can only be written in the root user in the initial user namespace, so we can assume sysctl_hugetlb_shm_group is in the initial user namespace. Cc: William Irwin <wli@holomorphy.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
91fa2cca |
|
25-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert devtmpfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
b9b73f7c |
|
14-Jun-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert usb functionfs to use kuid/kgid where appropriate Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Felipe Balbi <balbi@ti.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
32d639c6 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert gadgetfs to use kuid and kgid where appropriate Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Felipe Balbi <balbi@ti.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
170782eb |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert fat to use kuid/kgid where appropriate Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
1a06d420 |
|
16-Sep-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert quota Now that the type changes are done, here is the final set of changes to make the quota code work when user namespaces are enabled. Small cleanups and fixes to make the code build when user namespaces are enabled. Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
431f1974 |
|
16-Sep-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert quota netlink aka quota_send_warning Modify quota_send_warning to take struct kqid instead a type and identifier pair. When sending netlink broadcasts always convert uids and quota identifiers into the intial user namespace. There is as yet no way to send a netlink broadcast message with different contents to receivers in different namespaces, so for the time being just map all of the identifiers into the initial user namespace which preserves the current behavior. Change the callers of quota_send_warning in gfs2, xfs and dquot to generate a struct kqid to pass to quota send warning. When all of the user namespaces convesions are complete a struct kqid values will be availbe without need for conversion, but a conversion is needed now to avoid needing to convert everything at once. Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
74a8a103 |
|
16-Sep-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert qutoactl Update the quotactl user space interface to successfull compile with user namespaces support enabled and to hand off quota identifiers to lower layers of the kernel in struct kqid instead of type and qid pairs. The quota on function is not converted because while it takes a quota type and an id. The id is the on disk quota format to use, which is something completely different. The signature of two struct quotactl_ops methods were changed to take struct kqid argumetns get_dqblk and set_dqblk. The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk are minimally changed so that the code continues to work with the change in parameter type. This is the first in a series of changes to always store quota identifiers in the kernel in struct kqid and only use raw type and qid values when interacting with on disk structures or userspace. Always using struct kqid internally makes it hard to miss places that need conversion to or from the kernel internal values. Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
69552c0c |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert configfs to use kuid and kgid where appropriate Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
af84df93 |
|
10-Sep-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert extN to support kuids and kgids in posix acls Convert ext2, ext3, and ext4 to fully support the posix acl changes, using e_uid e_gid instead e_id. Enabled building with posix acls enabled, all filesystems supporting user namespaces, now also support posix acls when user namespaces are enabled. Cc: Theodore Tso <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d20b92ab |
|
13-Mar-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Teach trace to use from_kuid - When tracing capture the kuid. - When displaying the data to user space convert the kuid into the user namespace of the process that opened the report file. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
f8f3d4de |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert bsd process accounting to use kuid and kgid where appropriate BSD process accounting conveniently passes the file the accounting records will be written into to do_acct_process. The file credentials captured the user namespace of the opener of the file. Use the file credentials to format the uid and the gid of the current process into the user namespace of the user that started the bsd process accounting. Cc: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
4bd6e32a |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert taskstats to handle the user and pid namespaces. - Explicitly limit exit task stat broadcast to the initial user and pid namespaces, as it is already limited to the initial network namespace. - For broadcast task stats explicitly generate all of the idenitiers in terms of the initial user namespace and the initial pid namespace. - For request stats report them in terms of the current user namespace and the current pid namespace. Netlink messages are delivered syncrhonously to the kernel allowing us to get the user namespace and the pid namespace from the current task. - Pass the namespaces for representing pids and uids and gids into bacct_add_task. Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
cca080d9 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert audit to work with user namespaces enabled - Explicitly format uids gids in audit messges in the initial user namespace. This is safe because auditd is restrected to be in the initial user namespace. - Convert audit_sig_uid into a kuid_t. - Enable building the audit code and user namespaces at the same time. The net result is that the audit subsystem now uses kuid_t and kgid_t whenever possible making it almost impossible to confuse a raw uid_t with a kuid_t preventing bugs. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
8c2c3df3 |
|
20-Apr-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Build infrastructure This patch adds Makefile and Kconfig files required for building an AArch64 kernel. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
|
#
c6089735 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: net: Call key_alloc with GLOBAL_ROOT_UID, GLOBAL_ROOT_GID instead of 0, 0 In net/dns_resolver/dns_key.c and net/rxrpc/ar-key.c make them work with user namespaces enabled where key_alloc takes kuids and kgids. Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID instead of bare 0's. Cc: Sage Weil <sage@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: linux-afs@lists.infradead.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
9a56c2db |
|
08-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert security/keys to the new userns infrastructure - Replace key_user ->user_ns equality checks with kuid_has_mapping checks. - Use from_kuid to generate key descriptions - Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t - Avoid potential problems with file descriptor passing by displaying keys in the user namespace of the opener of key status proc files. Cc: linux-security-module@vger.kernel.org Cc: keyrings@linux-nfs.org Cc: David Howells <dhowells@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
5fce5e0b |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert drm to use kuid and kgid and struct pid where appropriate Blink Blink this had not been converted to use struct pid ages ago? - On drm open capture the openers kuid and struct pid. - On drm close release the kuid and struct pid - When reporting the uid and pid convert the kuid and struct pid into values in the appropriate namespace. Cc: dri-devel@lists.freedesktop.org Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
1efdb69b |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ipc to use kuid and kgid where appropriate - Store the ipc owner and creator with a kuid - Store the ipc group and the crators group with a kgid. - Add error handling to ipc_update_perms, allowing it to fail if the uids and gids can not be converted to kuids or kgids. - Modify the proc files to display the ipc creator and owner in the user namespace of the opener of the proc file. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
9582d901 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert process event connector to handle kuids and kgids - Only allow asking for events from the initial user and pid namespace, where we generate the events in. - Convert kuids and kgids into the initial user namespace to report them via the process event connector. Cc: David Miller <davem@davemloft.net> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
7dc05881 |
|
03-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert debugfs to use kuid/kgid where appropriate. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
45f035ab |
|
04-Sep-2012 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
CONFIG_HOTPLUG should be always on CONFIG_HOTPLUG is a very old option, back when we had static systems and it was odd that any type of device would be removed or added after the system had started up. It is quite hard to disable it these days, and even if you do, it only saves you about 200 bytes. However, if it is disabled, lots of bugs show up because it is almost never tested if the option is disabled. This is a step to eventually just remove the option entirely, which will clean up all of the devinit* variable and function pointer options, that everyone (myself include) ends up getting wrong eventually, causing real problems when memory segments are removed yet we don't expect them to be. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bhelgaas@google.com>
|
#
c9235f48 |
|
23-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Make credential debugging user namespace safe. Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
bc45dae3 |
|
15-Aug-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Enable building of pf_key sockets when user namespace support is enabled. Enable building of pf_key sockets and user namespace support at the same time. This combination builds successfully so there is no reason to forbid it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
b952741c |
|
16-Jun-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING S390, ia64 and powerpc all define their own version of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config and its description to a single place to avoid duplication. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
0625c883 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert tun/tap to use kuid and kgid where appropriate Cc: Maxim Krasnyansky <maxk@qualcomm.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
1efa29cd |
|
10-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Make the airo wireless driver use kuids for proc uids and gids Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: John W. Linville <linville@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
26711a79 |
|
02-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: xt_owner: Add basic user namespace support. - Only allow adding matches from the initial user namespace - Add the appropriate conversion functions to handle matches against sockets in other user namespaces. Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
da742808 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns xt_recent: Specify the owner/group of ip_list_perms in the initial user namespace xt_recent creates a bunch of proc files and initializes their uid and gids to the values of ip_list_uid and ip_list_gid. When initialize those proc files convert those values to kuids so they can continue to reside on the /proc inode. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jan Engelhardt <jengelh@medozas.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
8c6e2a94 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert xt_LOG to print socket kuids and kgids as uids and gids xt_LOG always writes messages via sb_add via printk. Therefore when xt_LOG logs the uid and gid of a socket a packet came from the values should be converted to be in the initial user namespace. Thus making xt_LOG as user namespace safe as possible. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
a6c6796c |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert cls_flow to work with user namespaces enabled The flow classifier can use uids and gids of the sockets that are transmitting packets and do insert those uids and gids into the packet classification calcuation. I don't fully understand the details but it appears that we can depend on specific uids and gids when making traffic classification decisions. To work with user namespaces enabled map from kuids and kgids into uids and gids in the initial user namespace giving raw integer values the code can play with and depend on. To avoid issues of userspace depending on uids and gids in packet classifiers installed from other user namespaces and getting confused deny all packet classifiers that use uids or gids that are not comming from a netlink socket in the initial user namespace. Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Changli Gao <xiaosuo@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
9eea9515 |
|
25-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: nfnetlink_log: Report socket uids in the log sockets user namespace At logging instance creation capture the peer netlink socket's user namespace. Use the captured peer user namespace when reporting socket uids to the peer. The peer socket's user namespace is guaranateed to be valid until the user closes the netlink socket. nfnetlink_log removes instances during the final close of a socket. __build_packet_message does not get called after an instance is destroyed. Therefore it is safe to let the peer netlink socket take care of the user namespace reference counting for us. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d06ca956 |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Teach inet_diag to work with user namespaces Compute the user namespace of the socket that we are replying to and translate the kuids of reported sockets into that user namespace. Cc: Andrew Vagin <avagin@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d13fda85 |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert net/ax25 to use kuid_t where appropriate Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
4f82f457 |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net ip6 flowlabel: Make owner a union of struct pid * and kuid_t Correct a long standing omission and use struct pid in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS. This guarantees we don't have issues when pid wraparound occurs. Use a kuid_t in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_USER to add user namespace support. In /proc/net/ip6_flowlabel capture the current pid namespace when opening the file and release the pid namespace when the file is closed ensuring we print the pid owner value that is meaning to the reader of the file. Similarly use from_kuid_munged to print uid values that are meaningful to the reader of the file. This requires exporting pid_nr_ns so that ipv6 can continue to built as a module. Yoiks what silliness Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
7064d16e |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Use kgids for sysctl_ping_group_range - Store sysctl_ping_group_range as a paire of kgid_t values instead of a pair of gid_t values. - Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range - For invalid cases reset to the default disabled state. With the kgid_t conversion made part of the original value sanitation from userspace understand how the code will react becomes clearer and it becomes possible to set the sysctl ping group range from something other than the initial user namespace. Cc: Vasiliy Kulikov <segoon@openwall.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
a7cb5a49 |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Print out socket uids in a user namespace aware fashion. Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
fc5795c8 |
|
23-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Allow USER_NS and NET simultaneously in Kconfig Now that the networking core is user namespace safe allow networking and user namespaces to be built at the same time. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d7555860 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Allow the usernamespace support to build after the removal of usbfs The user namespace code has an explicit "depends on USB_DEVICEFS = n" dependency to prevent building code that is not yet user namespace safe. With the removal of usbfs from the kernel it is now impossible to satisfy the USB_DEFICEFS = n dependency and thus it is impossible to enable user namespace support in 3.5-rc1. So remove the now useless depedency. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
c255a458 |
|
31-Jul-2012 |
Andrew Morton <akpm@linux-foundation.org> |
memcg: rename config variables Sanity: CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM [mhocko@suse.cz: fix missed bits] Cc: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2bc64a20 |
|
31-Jul-2012 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
mm/hugetlb: add new HugeTLB cgroup Implement a new controller that allows us to control HugeTLB allocations. The extension allows to limit the HugeTLB usage per control group and enforces the controller limit during page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit at page fault time implies that, the application will get SIGBUS signal if it tries to access HugeTLB pages beyond its limit. This requires the application to know beforehand how much HugeTLB pages it would require for its use. The charge/uncharge calls will be added to HugeTLB code in later patch. Support for cgroup removal will be added in later patches. [akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g] [akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g] Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Hillf Danton <dhillf@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8f827a14 |
|
06-Jul-2012 |
Will Deacon <will@kernel.org> |
ARM: 7453/1: audit: only allow syscall auditing for pure EABI userspace The audit tools support only EABI userspace and, since there are no AUDIT_ARCH_* defines for the ARM OABI, it makes sense to allow syscall auditing on ARM only for EABI at the moment. Cc: Eric Paris <eparis@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
#
0a4dd35c |
|
31-May-2012 |
Randy Dunlap <rdunlap@infradead.org> |
kconfig: update compression algorithm info There have been new compression algorithms added without updating nearby relevant descriptive text that refers to (a) the number of compression algorithms and (b) the most recent one. Fix these inconsistencies. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: <qasdfgtyuiop@gmail.com> Cc: Lasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
764e0da1 |
|
21-May-2012 |
Thomas Gleixner <tglx@linutronix.de> |
timers: Fixup the Kconfig consolidation fallout Sigh, I missed to check which architecture Kconfig files actually include the core Kconfig file. There are a few which did not. So we broke them. Instead of adding the includes to those, we are better off to move the include to init/Kconfig like we did already with irqs and others. This does not change anything for the architectures using the old style periodic timer mode. It just solves the build wreckage there. For those architectures which use the clock events infrastructure it moves the include of the core Kconfig file to "General setup" which is a way more logical place than having it at random locations specified by the architecture specific Kconfigs. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: Anna-Maria Gleixner <anna-maria@glx-um.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b38a86eb |
|
12-Mar-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
14a590c3 |
|
12-Mar-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert cgroup permission checks to use uid_eq Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
8751e039 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert tmpfs to use kuid and kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
ab27b91b |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert sysfs to use kgid/kuid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
091bd3ea |
|
13-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert sysctl permission checks to use kuid and kgids. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
dcb0f222 |
|
09-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert proc to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
08cefc7a |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ext4 to user kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
1523299d |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ext3 to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
b8a9f9e1 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert ext2 to use kuid/kgid where appropriate. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
f04c6ce2 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert devpts to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
ebc887b2 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert binary formats to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
e1c972b6 |
|
21-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Add negative depends on entries to avoid building code that is userns unsafe Add a new internal Kconfig option UIDGID_CONVERTED that is true when the selected Kconfig options have been converted to be user namespace safe, and guard USER_NS and guard the UIDGID_STRICT_TYPE_CHECK options with it. This keeps innocent kernel users from having the choice to enable the user namespace in the cases where it is known not to work. Most of the rest of the conversions are simple and straight forward but their sheer number means it is good not to count on having them all done and reviwed before thinking of merging this code. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
392d65a9 |
|
05-Apr-2012 |
Robert Richter <robert.richter@amd.com> |
perf: Remove PERF_COUNTERS config option Renaming remaining PERF_COUNTERS options into PERF_EVENTS. Think we can get rid of PERF_COUNTERS now. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1333643084-26776-5-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8932a63d |
|
19-Apr-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Reduce cache-miss initialization latencies for large systems Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper limit of 16 on the leaf-level fanout for the rcu_node tree. This was needed to reduce lock contention that was induced by the synchronization of scheduling-clock interrupts, which was in turn needed to improve energy efficiency for moderate-sized lightly loaded servers. However, reducing the leaf-level fanout means that there are more leaf-level rcu_node structures in the tree, which in turn means that RCU's grace-period initialization incurs more cache misses. This is not a problem on moderate-sized servers with only a few tens of CPUs, but becomes a major source of real-time latency spikes on systems with many hundreds of CPUs. In addition, the workloads running on these large systems tend to be CPU-bound, which eliminates the energy-efficiency advantages of synchronizing scheduling-clock interrupts. Therefore, these systems need maximal values for the rcu_node leaf-level fanout. This commit addresses this problem by introducing a new kernel parameter named RCU_FANOUT_LEAF that directly controls the leaf-level fanout. This parameter defaults to 16 to handle the common case of a moderate sized lightly loaded servers, but may be set higher on larger systems. Reported-by: Mike Galbraith <efault@gmx.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c9336643 |
|
18-Apr-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Clarify help text for RCU_BOOST_PRIO The old text confused real-time applications with real-time threads, so that you pretty much needed to understand how this kernel configuration parameter worked to understand the help text. This commit therefore attempts to make the help text human-readable. Reported-by: Jörn Engel <joern@purestorage.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
1dbdc6f1 |
|
19-Apr-2012 |
David Daney <david.daney@cavium.com> |
kbuild/extable: Hook up sortextable into the build system. Define a config variable BUILDTIME_EXTABLE_SORT to control build time sorting of the kernel's exception table. Patch Makefile to do the sorting when BUILDTIME_EXTABLE_SORT is selected. Signed-off-by: David Daney <david.daney@cavium.com> Link: http://lkml.kernel.org/r/1334872799-14589-4-git-send-email-ddaney.cavm@gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
5673a94c |
|
17-Nov-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Add a Kconfig option to enforce strict kuid and kgid type checks Make it possible to easily switch between strong mandatory type checks and relaxed type checks so that the code can easily be tested with the type checks and then built with the strong type checks disabled so the resulting code can be used. Require strong mandatory type checks when enabling the user namespace. It is very simple to make a typo and use the wrong type allowing conversions to/from userspace values to be bypassed by accident, the strong type checks prevent this. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
5f054e31 |
|
28-Mar-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
documentation: remove references to cpu_*_map. This has been obsolescent for a while, fix documentation and misc comments. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
32e380ae |
|
05-Mar-2012 |
Tejun Heo <tj@kernel.org> |
blkcg: make CONFIG_BLK_CGROUP bool Block cgroup core can be built as module; however, it isn't too useful as blk-throttle can only be built-in and cfq-iosched is usually the default built-in scheduler. Scheduled blkcg cleanup requires calling into blkcg from block core. To simplify that, disallow building blkcg as module by making CONFIG_BLK_CGROUP bool. If building blkcg core as module really matters, which I doubt, we can revisit it after blkcg API cleanup. -v2: Vivek pointed out that IOSCHED_CFQ was incorrectly updated to depend on BLK_CGROUP. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5c8806a0 |
|
06-Jan-2012 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Move RCU_TRACE to lib/Kconfig.debug The RCU_TRACE kernel parameter has always been intended for debugging, not for production use. Formalize this by moving RCU_TRACE from init/Kconfig to lib/Kconfig.debug. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
29ef73b7 |
|
03-Jan-2012 |
Nathaniel Husted <nhusted@gmail.com> |
Kernel: Audit Support For The ARM Platform This patch provides functionality to audit system call events on the ARM platform. The implementation was based off the structure of the MIPS platform and information in this (http://lists.fedoraproject.org/pipermail/arm/2009-October/000382.html) mailing list thread. The required audit_syscall_exit and audit_syscall_entry checks were added to ptrace using the standard registers for system call values (r0 through r3). A thread information flag was added for auditing (TIF_SYSCALL_AUDIT) and a meta-flag was added (_TIF_SYSCALL_WORK) to simplify modifications to the syscall entry/exit. Now, if either the TRACE flag is set or the AUDIT flag is set, the syscall_trace function will be executed. The prober changes were made to Kconfig to allow CONFIG_AUDITSYSCALL to be enabled. Due to platform availability limitations, this patch was only tested on the Android platform running the modified "android-goldfish-2.6.29" kernel. A test compile was performed using Code Sourcery's cross-compilation toolset and the current linux-3.0 stable kernel. The changes compile without error. I'm hoping, due to the simple modifications, the patch is "obviously correct". Signed-off-by: Nathaniel Husted <nhusted@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
633b4545 |
|
03-Jan-2012 |
Eric Paris <eparis@redhat.com> |
audit: only allow tasks to set their loginuid if it is -1 At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in and it be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP) because on update and admin might have to restart sshd. Now sshd would get his loginuid and the next user which logged in using ssh would not be able to set his loginuid. Systemd has changed how userspace works and allowed us to make the kernel work the way it should. With systemd users (even admins) are not supposed to restart services directly. The system will restart the service for them. Thus since systemd is going to loginuid==-1, sshd would get -1, and sshd would be allowed to set a new loginuid without special permissions. If an admin in this system were to manually start an sshd he is inserting himself into the system chain of trust and thus, logically, it's his loginuid that should be used! Since we have old systems I make this a Kconfig option. Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
067bce1a |
|
12-Jan-2012 |
Cyrill Gorcunov <gorcunov@openvz.org> |
c/r: introduce CHECKPOINT_RESTORE symbol For checkpoint/restore we need auxilary features being compiled into the kernel, such as additional prctl codes, /proc/<pid>/map_files and etc... but same time these features are not mandatory for a regular kernel so CHECKPOINT_RESTORE config symbol should bring a way to disable them all at once if one wish to get rid of additional functionality. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6db9dc15 |
|
10-Jan-2012 |
Fabio Estevam <festevam@gmail.com> |
sched: Fix CONFIG_CGROUP_SCHED dependency The dependency bug was pointed out by this build warning: warning: (SCHED_AUTOGROUP) selects CGROUP_SCHED which has unmet direct dependencies (CGROUPS && EXPERIMENTAL) Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: a.p.zijlstra@chello.nl Cc: Fabio Estevam <festevam@gmail.com> Link: http://lkml.kernel.org/r/1326192383-5113-1-git-send-email-festevam@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e5671dfa |
|
11-Dec-2011 |
Glauber Costa <glommer@parallels.com> |
Basic kernel memory functionality for the Memory Controller This patch lays down the foundation for the kernel memory component of the Memory Controller. As of today, I am only laying down the following files: * memory.independent_kmem_limit * memory.kmem.limit_in_bytes (currently ignored) * memory.kmem.usage_in_bytes (always zero) Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: Michal Hocko <mhocko@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b807fbff |
|
03-Nov-2011 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Permit RCU_FAST_NO_HZ to be used by TREE_PREEMPT_RCU The new implementation of RCU_FAST_NO_HZ is compatible with preemptible RCU, so this commit removes the Kconfig restriction that previously prohibited this. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
c736de60 |
|
02-Nov-2011 |
WANG Cong <amwang@redhat.com> |
sysctl: make CONFIG_SYSCTL_SYSCALL default to n When I tried to send a patch to remove it, Andi told me we still need to keep compabitlies for old libc, so we can't remove this completely. Then just make it default to n and remove the doc from feature-removal-schedule.txt. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8008e129 |
|
08-Jun-2011 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Drive configuration directly from SMP and PREEMPT This commit eliminates the possibility of running TREE_PREEMPT_RCU when SMP=n and of running TINY_RCU when PREEMPT=y. People who really want these combinations can hand-edit init/Kconfig, but eliminating them as choices for production systems reduces the amount of testing required. It will also allow cutting out a few #ifdefs. Note that running TREE_RCU and TINY_RCU on single-CPU systems using SMP-built kernels is still supported. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ab84d31e |
|
21-Jul-2011 |
Paul Turner <pjt@google.com> |
sched: Introduce primitives to account for CFS bandwidth tracking In this patch we introduce the notion of CFS bandwidth, partitioned into globally unassigned bandwidth, and locally claimed bandwidth. - The global bandwidth is per task_group, it represents a pool of unclaimed bandwidth that cfs_rqs can allocate from. - The local bandwidth is tracked per-cfs_rq, this represents allotments from the global pool bandwidth assigned to a specific cpu. Bandwidth is managed via cgroupfs, adding two new interfaces to the cpu subsystem: - cpu.cfs_period_us : the bandwidth period in usecs - cpu.cfs_quota_us : the cpu bandwidth (in usecs) that this tg will be allowed to consume over period above. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Nikhil Rao <ncrao@google.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110721184756.972636699@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
00a66d29 |
|
25-Jul-2011 |
WANG Cong <xiyou.wangcong@gmail.com> |
mm: remove the leftovers of noswapaccount In commit a2c8990aed5ab ("memsw: remove noswapaccount kernel parameter"), Michal forgot to remove some left pieces of noswapaccount in the tree, this patch removes them all. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d2c32258 |
|
15-Jun-2011 |
Josh Triplett <josh@joshtriplett.org> |
gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL CONFIG_CONSTRUCTORS controls support for running constructor functions at kernel init time. According to commit b99b87f70c7785ab ("kernel: constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However, CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it, and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have CONFIG_GCOV_KERNEL select it, so that the normal case of CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n. Observed in the short list of =y values in a minimal kernel configuration. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bd5dc17b |
|
15-Jun-2011 |
Josh Triplett <josh@joshtriplett.org> |
uts: make default hostname configurable, rather than always using "(none)" The "hostname" tool falls back to setting the hostname to "localhost" if /etc/hostname does not exist. Distribution init scripts have the same fallback. However, if userspace never calls sethostname, such as when booting with init=/bin/sh, or otherwise booting a minimal system without the usual init scripts, the default hostname of "(none)" remains, unhelpfully appearing in various places such as prompts ("root@(none):~#") and logs. Furthermore, "(none)" doesn't typically resolve to anything useful. Make the default hostname configurable. This removes the need for the standard fallback, provides a useful default for systems that never call sethostname, and makes minimal systems that much more useful with less configuration. Distributions could choose to use "localhost" here to avoid the fallback, while embedded systems may wish to use a specific target hostname. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Miller <davem@davemloft.net> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Kel Modderman <kel@otaku42.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8761f1ab |
|
01-Jun-2011 |
Ralf Baechle <ralf@linux-mips.org> |
pcspkr: Cleanup Kconfig dependencies Lenghty lists of the kind "depends on ARCH1 || ARCH2 ... || ARCH123" are usually either wrong or too coarse grained. Or plain an ugly sin. [ tglx: Fixed up amigaone ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Gerhard Pircher <gerhard_pircher@gmx.net> Link: http://lkml.kernel.org/r/20110601180610.984881988@duck.linux-mips.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
15f304b6 |
|
01-Jun-2011 |
Ralf Baechle <ralf@linux-mips.org> |
i8253: Consolidate all kernel definitions of i8253_lock Move them to drivers/clocksource/i8253.c and remove the implementations in arch/ [ tglx: Avoid the extra file in lib - folded arch patches in. The export will become conditional in a later step ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Link: http://lkml.kernel.org/r/20110601180610.221426078@duck.linux-mips.net Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f505c553 |
|
05-Jun-2011 |
Josh Triplett <josh@joshtriplett.org> |
debug: Make CONFIG_EXPERT select CONFIG_DEBUG_KERNEL to unhide debug options Several debugging options currently default to y, such as CONFIG_DEBUG_BUGVERBOSE and CONFIG_DEBUG_RODATA. Embedded users might want to turn those options off to save space; however, turning them off requires turning on CONFIG_DEBUG_KERNEL to unhide them. Since CONFIG_DEBUG_KERNEL exists specifically to unhide debugging options, and CONFIG_EXPERT exists specifically to unhide options potentially needed by experts and/or embedded users, make CONFIG_EXPERT automatically imply CONFIG_DEBUG_KERNEL. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110606012358.GA1909@leaf Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a77aea92 |
|
26-May-2011 |
Daniel Lezcano <daniel.lezcano@free.fr> |
cgroup: remove the ns_cgroup The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and leads to some problems: * cgroup creation is out-of-control * cgroup name can conflict when pids are looping * it is not possible to have a single process handling a lot of namespaces without falling in a exponential creation time * we may want to create a namespace without creating a cgroup The ns_cgroup was replaced by a compatibility flag 'clone_children', where a newly created cgroup will copy the parent cgroup values. The userspace has to manually create a cgroup and add a task to the 'tasks' file. This patch removes the ns_cgroup as suggested in the following thread: https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html The 'cgroup_clone' function is removed because it is no longer used. This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a printk warning users that the feature is planned for removal. Since that time we have heard from XXX users who were affected by this. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jamal Hadi Salim <hadi@cyberus.ca> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Acked-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
281dc5c5 |
|
22-May-2011 |
Linus Torvalds <torvalds@linux-foundation.org> |
Give up on pushing CC_OPTIMIZE_FOR_SIZE I still happen to believe that I$ miss costs are a major thing, but sadly, -Os doesn't seem to be the solution. With or without it, gcc will miss some obvious code size improvements, and with it enabled gcc will sometimes make choices that aren't good even with high I$ miss ratios. For example, with -Os, gcc on x86 will turn a 20-byte constant memcpy into a "rep movsl". While I sincerely hope that x86 CPU's will some day do a good job at that, they certainly don't do it yet, and the cost is higher than a L1 I$ miss would be. Some day I hope we can re-enable this. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
17d9f311 |
|
19-May-2011 |
Daniel Hellstrom <daniel@gaisler.com> |
SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI Signed-off-by: Daniel Hellstrom <daniel@gaisler.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21a43e39 |
|
10-May-2011 |
David Rientjes <rientjes@google.com> |
slub: Revert "[PARISC] slub: fix panic with DISCONTIGMEM" This reverts commit 4a5fa3590f09, which did not allow SLUB to be used on architectures that use DISCONTIGMEM without compiling NUMA support without CONFIG_BROKEN also set. The slub panic that it was intended to prevent is addressed by d9b41e0b54fd ("[PARISC] set memory ranges in N_NORMAL_MEMORY when onlined") on parisc so there is no further slub issues with such a configuration. The reverts allows SLUB now to be used on such architectures since there haven't been any reports of additional errors. Cc: James Bottomley <James.Bottomley@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
27f4d280 |
|
07-Feb-2011 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: priority boosting for TREE_PREEMPT_RCU Add priority boosting for TREE_PREEMPT_RCU, similar to that for TINY_PREEMPT_RCU. This is enabled by the default-off RCU_BOOST kernel parameter. The priority to which to boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter (defaulting to real-time priority 1) and the time to wait before boosting the readers who are blocking a given grace period is controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
6befe5f6 |
|
26-Apr-2011 |
Randy Dunlap <randy.dunlap@oracle.com> |
init/Kconfig: fix EXPERT menu list The EXPERT menu list was recently broken by the insertion of a kconfig symbol (EMBEDDED) at the beginning of the EXPERT list of kconfig items. Broken by: commit 6a108a14fa356ef607be308b68337939e56ea94e Author: David Rientjes <rientjes@google.com> Date: Thu Jan 20 14:44:16 2011 -0800 kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT Restore the EXPERT menu list -- don't inject a symbol (EMBEDDED) that does not depend on EXPERT into the list. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Peter Foley <pefoley2@verizon.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4a5fa359 |
|
19-Apr-2011 |
James Bottomley <James.Bottomley@HansenPartnership.com> |
[PARISC] slub: fix panic with DISCONTIGMEM Slub makes assumptions about page_to_nid() which are violated by DISCONTIGMEM and !NUMA. This violation results in a panic because page_to_nid() can be non-zero for pages in the discontiguous ranges and this leads to a null return by get_node(). The assertion by the maintainer is that DISCONTIGMEM should only be allowed when NUMA is also defined. However, at least six architectures: alpha, ia64, m32r, m68k, mips, parisc violate this. The panic is a regression against slab, so just mark slub broken in the problem configuration to prevent users reporting these panics. Cc: stable@kernel.org Acked-by: David Rientjes <rientjes@google.com> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
|
#
1e2795a1 |
|
05-Apr-2011 |
Artem Bityutskiy <Artem.Bityutskiy@nokia.com> |
kbuild: move KALLSYMS_EXTRA_PASS from Kconfig to Makefile At the moment we have the CONFIG_KALLSYMS_EXTRA_PASS Kconfig switch, which users can enable or disable while configuring the kernel. This option is then used by 'make' to determine whether an extra kallsyms pass is needed or not. However, this approach is not nice and confusing, and this patch moves CONFIG_KALLSYMS_EXTRA_PASS from Kconfig to Makefile instead. The rationale is below. 1. CONFIG_KALLSYMS_EXTRA_PASS is really about the build time, not run-time. There is no real need for it to be in Kconfig. It is just an additional work-around which should be used only in rare cases, when someone breaks kallsyms, so Kbuild/Makefile is much better place for this option. 2. Grepping CONFIG_KALLSYMS_EXTRA_PASS shows that many defconfigs have it enabled, probably not because they try to work-around a kallsyms bug, but just because the Kconfig help text is confusing and does not really make it clear that this option should not be used unless except when kallsyms is broken. 3. And since many people have CONFIG_KALLSYMS_EXTRA_PASS enabled in their Kconfig, we do might fail to notice kallsyms bugs in time. E.g., many testers use "make allyesconfig" to test builds, which will enable CONFIG_KALLSYMS_EXTRA_PASS and kallsyms breakage will not be noticed. To address that, this patch: 1. Kills CONFIG_KALLSYMS_EXTRA_PASS 2. Changes Makefile so that people can use "make KALLSYMS_EXTRA_PASS=1" to enable the extra pass if needed. Additionally, they may define KALLSYMS_EXTRA_PASS as an environment variable. 3. By default KALLSYMS_EXTRA_PASS is disabled and if kallsyms has issues, "make" should print a warning and suggest using KALLSYMS_EXTRA_PASS Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> [mmarek: Removed make help text, is not necessary] Signed-off-by: Michal Marek <mmarek@suse.cz>
|
#
71a83ec7 |
|
05-Apr-2011 |
Artem Bityutskiy <Artem.Bityutskiy@nokia.com> |
Kconfig: improve KALLSYMS_ALL documentation Dumb users like myself are not able to grasp from the existing KALLSYMS_ALL documentation that this option is not what they need. Improve the help message and make it clearer that KALLSYMS is enough in the majority of use cases, and KALLSYMS_ALL should really be used very rarely. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
|
#
317f3941 |
|
05-Apr-2011 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
sched: Move the second half of ttwu() to the remote cpu Now that we've removed the rq->lock requirement from the first part of ttwu() and can compute placement without holding any rq->lock, ensure we execute the second half of ttwu() on the actual cpu we want the task to run on. This avoids having to take rq->lock and doing the task enqueue remotely, saving lots on cacheline transfers. As measured using: http://oss.oracle.com/~mason/sembench.c $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done $ echo 4096 32000 64 128 > /proc/sys/kernel/sem $ ./sembench -t 2048 -w 1900 -o 0 unpatched: run time 30 seconds 647278 worker burns per second patched: run time 30 seconds 816715 worker burns per second Reviewed-by: Frank Rowand <frank.rowand@am.sony.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110405152729.515897185@chello.nl
|
#
990d6c2d |
|
29-Jan-2011 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc/<pid>/mountinfo Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4ba8216c |
|
25-Jan-2011 |
Arnd Bergmann <arnd@arndb.de> |
BKL: That's all, folks This removes the implementation of the big kernel lock, at last. A lot of people have worked on this in the past, I so the credit for this patch should be with everyone who participated in the hunt. The names on the Cc list are the people that were the most active in this, according to the recorded git history, in alphabetical order. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alan Cox <alan@linux.intel.com> Cc: Alessio Igor Bogani <abogani@texware.it> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jan Blunck <jblunck@infradead.org> Cc: John Kacur <jkacur@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Oliver Neukum <oliver@neukum.org> Cc: Paul Menage <menage@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2d0f2520 |
|
02-Mar-2011 |
Li Zefan <lizf@cn.fujitsu.com> |
perf cgroup: Fix a typo in kernel config s/specificied/specified Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4D6F348C.2050804@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e5d1367f |
|
14-Feb-2011 |
Stephane Eranian <eranian@google.com> |
perf: Add cgroup support This kernel patch adds the ability to filter monitoring based on container groups (cgroups). This is for use in per-cpu mode only. The cgroup to monitor is passed as a file descriptor in the pid argument to the syscall. The file descriptor must be opened to the cgroup name in the cgroup filesystem. For instance, if the cgroup name is foo and cgroupfs is mounted in /cgroup, then the file descriptor is opened to /cgroup/foo. Cgroup mode is activated by passing PERF_FLAG_PID_CGROUP in the flags argument to the syscall. For instance to measure in cgroup foo on CPU1 assuming cgroupfs is mounted under /cgroup: struct perf_event_attr attr; int cgroup_fd, fd; cgroup_fd = open("/cgroup/foo", O_RDONLY); fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP); close(cgroup_fd); Signed-off-by: Stephane Eranian <eranian@google.com> [ added perf_cgroup_{exit,attach} ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5d6a4ea5 |
|
10-Jan-2011 |
Ferenc Wagner <wferi@niif.hu> |
sysfs: Capitalize description of SYSFS_DEPRECATED{_V2} options Signed-off-by: Ferenc Wagner <wferi@niif.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
6a108a14 |
|
20-Jan-2011 |
David Rientjes <rientjes@google.com> |
kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c5e0591a |
|
16-Jan-2011 |
Michael Witten <mfwitten@gmail.com> |
Kconfig: BLK_THROTTLE -> BLK_DEV_THROTTLING It would seem that `CONFIG_BLK_THROTTLE' doesn't exist, as it is only referenced in the documentation for `CONFIG_BLK_CGROUP'. The only other choice is `CONFIG_BLK_DEV_THROTTLING': $ git grep --cached THROTTL -- \*Kconfig block/Kconfig:config BLK_DEV_THROTTLING init/Kconfig: CONFIG_BLK_THROTTLE=y. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
79e2e759 |
|
16-Jan-2011 |
Michael Witten <mfwitten@gmail.com> |
Kconfig: Typo: seti -> set Also, I introduced some punctuation to facilitate reading. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
c072a388 |
|
07-Jan-2011 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status Because the adaptive synchronize_srcu_expedited() approach has worked very well in testing, remove the kernel parameter and replace it by a C-preprocessor macro. If someone finds problems with this approach, a more complex and aggressively adaptive approach might be required. Longer term, SRCU will be merged with the other RCU implementations, at which point synchronize_srcu_expedited() will be event driven, just as synchronize_sched_expedited() currently is. At that point, there will be no need for this adaptive approach. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
3ebe1243 |
|
12-Jan-2011 |
Lasse Collin <lasse.collin@tukaani.org> |
decompressors: add boot-time XZ support This implements the API defined in <linux/decompress/generic.h> which is used for kernel, initramfs, and initrd decompression. This patch together with the first patch is enough for XZ-compressed initramfs and initrd; XZ-compressed kernel will need arch-specific changes. The buffering requirements described in decompress_unxz.c are stricter than with gzip, so the relevant changes should be done to the arch-specific code when adding support for XZ-compressed kernel. Similarly, the heap size in arch-specific pre-boot code may need to be increased (30 KiB is enough). The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and memzero() (memset(ptr, 0, size)), which aren't available in all arch-specific pre-boot environments. I'm including simple versions in decompress_unxz.c, but a cleaner solution would naturally be nicer. Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alain Knaff <alain@knaff.lu> Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
43d547f9 |
|
17-Dec-2010 |
Jim Cromie <jim.cromie@gmail.com> |
init/Kconfig: fix typo Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
5091faa4 |
|
30-Nov-2010 |
Mike Galbraith <efault@gmx.de> |
sched: Add 'autogroup' scheduling feature: automated per session task groups A recurring complaint from CFS users is that parallel kbuild has a negative impact on desktop interactivity. This patch implements an idea from Linus, to automatically create task groups. Currently, only per session autogroups are implemented, but the patch leaves the way open for enhancement. Implementation: each task's signal struct contains an inherited pointer to a refcounted autogroup struct containing a task group pointer, the default for all tasks pointing to the init_task_group. When a task calls setsid(), a new task group is created, the process is moved into the new task group, and a reference to the preveious task group is dropped. Child processes inherit this task group thereafter, and increase it's refcount. When the last thread of a process exits, the process's reference is dropped, such that when the last process referencing an autogroup exits, the autogroup is destroyed. At runqueue selection time, IFF a task has no cgroup assignment, its current autogroup is used. Autogroup bandwidth is controllable via setting it's nice level through the proc filesystem: cat /proc/<pid>/autogroup Displays the task's group and the group's nice level. echo <nice level> > /proc/<pid>/autogroup Sets the task group's shares to the weight of nice <level> task. Setting nice level is rate limited for !admin users due to the abuse risk of task group locking. The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the boot option noautogroup, and can also be turned on/off on the fly via: echo [01] > /proc/sys/kernel/sched_autogroup_enabled ... which will automatically move tasks to/from the root task group. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
46fdb093 |
|
26-Oct-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Make synchronize_srcu_expedited() fast if running readers The synchronize_srcu_expedited() function is currently quick if there are no active readers, but will delay a full jiffy if there are any. If these readers leave their SRCU read-side critical sections quickly, this is way too long to wait. So this commit first waits ten microseconds, and only then falls back to jiffy-at-a-time waiting. Reported-by: Avi Kivity <avi@redhat.com> Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
9e571a82 |
|
30-Sep-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU Add tracing for the tiny RCU implementations, including statistics on boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
24278d14 |
|
27-Sep-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: priority boosting for TINY_PREEMPT_RCU Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled by the default-off RCU_BOOST kernel parameter. The priority to which to boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter (defaulting to real-time priority 1) and the time to wait before boosting the readers blocking a given grace period is controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
a42c390c |
|
24-Nov-2010 |
Michal Hocko <mhocko@suse.cz> |
cgroups: make swap accounting default behavior configurable Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP configuration option and then it is turned on by default. There is a boot option (noswapaccount) which can disable this feature. This makes it hard for distributors to enable the configuration option as this feature leads to a bigger memory consumption and this is a no-go for general purpose distribution kernel. On the other hand swap accounting may be very usuful for some workloads. This patch adds a new configuration option which controls the default behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED). If the option is selected then the feature is turned on by default. It also adds a new boot parameter swapaccount[=1|0] which enhances the original noswapaccount parameter semantic by means of enable/disable logic (defaults to 1 if no value is provided to be still consistent with noswapaccount). The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well) Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7af37bec |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces Kconfig: move namespace menu location after the cgroup We have the namespaces as a menuconfig like the cgroup. The cgroup and the namespace are two base bricks for the containers. It is more logical to put the namespace menu right after the cgroup menu. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eef691b3 |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces Kconfig: remove the cgroup device whitelist experimental tag This subsystem is merged since a long time now, I think we can consider it mature enough. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
79ae9c29 |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces Kconfig: remove pointless cgroup dependency The different cgroup subsystems are under the cgroup submenu. The dependency between the cgroups and the menu subsystems is pointless. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8dd2a82c |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces Kconfig: make namespace a submenu Make the namespaces config option a submenu. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
17a6d441 |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces: default all the namespaces to 'yes' when CONFIG_NAMESPACES is selected As the different namespaces depend on 'CONFIG_NAMESPACES', it is logical to enable all the namespaces when we enable NAMESPACES. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Acked-By: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9bd38c2c |
|
27-Oct-2010 |
Daniel Lezcano <daniel.lezcano@free.fr> |
namespaces: remove pid_ns and net_ns experimental status The pid namespace is in the kernel since 2.6.27 and the net_ns since 2.6.29. They are enabled in the distro by default and used by userspace component. They are mature enough to remove the 'experimental' label. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e52eec13 |
|
08-Sep-2010 |
Andi Kleen <ak@linux.intel.com> |
SYSFS: Allow boot time switching between deprecated and modern sysfs layout I have some systems which need legacy sysfs due to old tools that are making assumptions that a directory can never be a symlink to another directory, and it's a big hazzle to compile separate kernels for them. This patch turns CONFIG_SYSFS_DEPRECATED into a run time option that can be switched on/off the kernel command line. This way the same binary can be used in both cases with just a option on the command line. The old CONFIG_SYSFS_DEPRECATED_V2 option is still there to set the default. I kept the weird name to not break existing config files. Also the compat code can be still completely disabled by undefining CONFIG_SYSFS_DEPRECATED_SWITCH -- just the optimizer takes care of this now instead of lots of ifdefs. This makes the code look nicer. v2: This is an updated version on top of Kay's patch to only handle the block devices. I tested it on my old systems and that seems to work. Cc: axboe@kernel.dk Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
39aba963 |
|
04-Sep-2010 |
Kay Sievers <kay.sievers@vrfy.org> |
driver core: remove CONFIG_SYSFS_DEPRECATED_V2 but keep it for block devices This patch removes the old CONFIG_SYSFS_DEPRECATED_V2 config option, but it keeps the logic around to handle block devices in the old manner as some people like to run new kernel versions on old (pre 2007/2008) distros. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: "James E.J. Bottomley" <James.Bottomley@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
6de5bd12 |
|
11-Sep-2010 |
Arnd Bergmann <arnd@arndb.de> |
BKL: introduce CONFIG_BKL. With all the patches we have queued in the BKL removal tree, only a few dozen modules are left that actually rely on the BKL, and even there are lots of low-hanging fruit. We need to decide what to do about them, this patch illustrates one of the options: Every user of the BKL is marked as 'depends on BKL' in Kconfig, and the CONFIG_BKL becomes a user-visible option. If it gets disabled, no BKL using module can be built any more and the BKL code itself is compiled out. The one exception is file locking, which is practically always enabled and does a 'select BKL' instead. This effectively forces CONFIG_BKL to be enabled until we have solved the fs/lockd mess and can apply the patch that removes the BKL from fs/locks.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
e360adbe |
|
14-Oct-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
irq_work: Add generic hardirq context callbacks Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d9817ebe |
|
26-Sep-2010 |
Thomas Gleixner <tglx@linutronix.de> |
genirq: Provide Kconfig The generic irq Kconfig options are copied around all archs. Provide a generic Kconfig file which can be included. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100927121843.217333624@linutronix.de> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Ingo Molnar <mingo@elte.hu>
|
#
e43473b7 |
|
15-Sep-2010 |
Vivek Goyal <vgoyal@redhat.com> |
blkio: Core implementation of throttle policy o Actual implementation of throttling policy in block layer. Currently it implements READ and WRITE bytes per second throttling logic. IOPS throttling comes in later patches. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
681b3049 |
|
14-Jul-2010 |
Stephan Sperber <sperberstephan@googlemail.com> |
Kconfig: delete duplicate word Deleted a word which apeared twice. Signed-off-by: Stephan Sperber <sperberstephan@googlemail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
a57eb940 |
|
29-Jun-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add a TINY_PREEMPT_RCU Implement a small-memory-footprint uniprocessor-only implementation of preemptible RCU. This implementation uses but a single blocked-tasks list rather than the combinatorial number used per leaf rcu_node by TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies processing. This version also takes advantage of uniprocessor execution to accelerate grace periods in the case where there are no readers. The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU. This implementation is a step towards having RCU implementation driven off of the SMP and PREEMPT kernel configuration variables, which can happen once this implementation has accumulated sufficient experience. Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as suggested by Steve Rostedt in order to avoid the compiler-reordering issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183). As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time workloads, CONFIG_TINY_RCU is even better. CONFIG_TREE_PREEMPT_RCU text data bss dec filename 13 0 0 13 kernel/rcupdate.o 6170 825 28 7023 kernel/rcutree.o ---- 7026 Total CONFIG_TINY_PREEMPT_RCU text data bss dec filename 13 0 0 13 kernel/rcupdate.o 2081 81 8 2170 kernel/rcutiny.o ---- 2183 Total CONFIG_TINY_RCU (non-preemptible) text data bss dec filename 13 0 0 13 kernel/rcupdate.o 719 25 0 744 kernel/rcutiny.o --- 757 Total Requested-by: Loïc Minier <loic.minier@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
4d87ffad |
|
04-Aug-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Fix RCU_FANOUT help message Commit cf244dc01bf68 added a fourth level to the TREE_RCU hierarchy, but the RCU_FANOUT help message still said "cube root". This commit fixes this to "fourth root" and also emphasizes that production systems are well-served by the default. (Stress-testing RCU itself uses small RCU_FANOUT values in order to test large-system code paths on small(er) systems.) Located-by: John Kacur <jkacur@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
687d7a96 |
|
21-Jul-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: restrict TREE_RCU to SMP builds with !PREEMPT Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for several releases, it is time to restrict the use of TREE_RCU to SMP non-preemptible systems. This reduces testing/validation effort. This commit is a first step towards driving the selection of RCU implementation directly off of the SMP and PREEMPT configuration parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
65e0e811 |
|
10-Aug-2010 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
memcg: remove experimental from swap account config It's 11 months since we changed swap_map[] to indicates SWAP_HAS_CACHE. Since that, memcg's swap accounting has been very stable and it seems it can be maintained. So, I'd like to remove EXPERIMENTAL from the config. Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
939a67fc |
|
17-Dec-2009 |
Eric Paris <eparis@redhat.com> |
Audit: split audit watch Kconfig Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select FSNOTIFY. This splits the spagetti like mixing of audit_watch and audit_filter code so they can be configured seperately. Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
67640b60 |
|
17-Dec-2009 |
Eric Paris <eparis@redhat.com> |
Audit: audit watches depend on fsnotify CONFIG_AUDIT builds audit_watches which depend on fsnotify. Make CONFIG_AUDIT select fsnotify. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
28a3a7eb |
|
17-Dec-2009 |
Eric Paris <eparis@redhat.com> |
audit: reimplement audit_trees using fsnotify rather than inotify Simply switch audit_trees from using inotify to using fsnotify for it's inode pinning and disappearing act information. Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
181a51f6 |
|
20-Jul-2010 |
Tejun Heo <tj@kernel.org> |
slow-work: kill it slow-work doesn't have any user left. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David Howells <dhowells@redhat.com>
|
#
23637d47 |
|
15-May-2010 |
Frederic Weisbecker <fweisbec@gmail.com> |
lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR This new config is deemed to simplify even more the lockup detector dependencies and can make it easier to bring a smooth sorting between archs that support the new generic lockup detector and those that still have their own, especially for those that are in the middle of this migration. Instead of checking whether we have CONFIG_LOCKUP_DETECTOR + CONFIG_PERF_EVENTS_NMI each time an arch wants to know if it needs to build its own lockup detector, take a shortcut with this new config. It is enabled only if the hardlockup detection part of the whole lockup detector is on. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
|
#
c01d4323 |
|
15-May-2010 |
Frederic Weisbecker <fweisbec@gmail.com> |
lockup_detector: Adapt CONFIG_PERF_EVENT_NMI to other archs CONFIG_PERF_EVENT_NMI is something that need to be enabled from the arch. This is fine on x86 as PERF_EVENTS is builtin but if other archs select it, they will need to handle the PERF_EVENTS dependency. Instead, handle the dependency in the generic layer: - archs need to tell what they support through HAVE_PERF_EVENTS_NMI - Enable magically PERF_EVENTS_NMI if we have PERF_EVENTS and HAVE_PERF_EVENTS_NMI. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
|
#
58687acb |
|
07-May-2010 |
Don Zickus <dzickus@redhat.com> |
lockup_detector: Combine nmi_watchdog and softlockup detector The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
afc24d49 |
|
26-Apr-2010 |
Vivek Goyal <vgoyal@redhat.com> |
blk-cgroup: config options re-arrangement This patch fixes few usability and configurability issues. o All the cgroup based controller options are configurable from "Genral Setup/Control Group Support/" menu. blkio is the only exception. Hence make this option visible in above menu and make it configurable from there to bring it inline with rest of the cgroup based controllers. o Get rid of CONFIG_DEBUG_CFQ_IOSCHED. This option currently does two things. - Enable printing of cgroup paths in blktrace - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat files in cgroup. If we are using group scheduling, blktrace data is of not really much use if cgroup information is not present. To get this data, currently one has to also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of all the additional debug stat files which is not desired. Hence, this patch moves printing of cgroup paths under CONFIG_CFQ_GROUP_IOSCHED. This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which can be enabled through config menu. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Divyesh Shah <dpshah@google.com> Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
32bd7eb5 |
|
23-Mar-2010 |
Li Zefan <lizf@cn.fujitsu.com> |
sched: Remove remaining USER_SCHED code This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"") Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Dhaval Giani <dhaval.giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Howells <dhowells@redhat.com> LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0dea1168 |
|
10-Mar-2010 |
Kirill A. Shutemov <kirill@shutemov.name> |
cgroup: implement eventfd-based generic API for notifications This patchset introduces eventfd-based API for notifications in cgroups and implements memory notifications on top of it. It uses statistics in memory controler to track memory usage. Output of time(1) on building kernel on tmpfs: Root cgroup before changes: make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total Non-root cgroup before changes: make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total Root cgroup after changes (0 thresholds): make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total Non-root cgroup after changes (0 thresholds): make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total Root cgroup after changes (1 thresholds, never crossed): make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total Non-root cgroup after changes (1 thresholds, never crossed): make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total This patch: Introduce the write-only file "cgroup.event_control" in every cgroup. To register new notification handler you need: - create an eventfd; - open a control file to be monitored. Callbacks register_event() and unregister_event() must be defined for the control file; - write "<event_fd> <control_fd> <args>" to cgroup.event_control. Interpretation of args is defined by control file implementation; eventfd will be woken up by control file implementation or when the cgroup is removed. To unregister notification handler just close eventfd. If you need notification functionality for a control file you have to implement callbacks register_event() and unregister_event() in the struct cftype. [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix] Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dan Malek <dan@embeddedalley.com> Cc: Vladislav Buzov <vbuzov@embeddedalley.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Alexander Shishkin <virtuoso@slind.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b309a294 |
|
26-Feb-2010 |
Robert Richter <robert.richter@amd.com> |
oprofile: remove EXPERIMENTAL from the config option description OProfile is already used for a long time and no longer experimental. Signed-off-by: Robert Richter <robert.richter@amd.com>
|
#
8bd93a2c |
|
22-Feb-2010 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Accelerate grace period if last non-dynticked CPU Currently, rcu_needs_cpu() simply checks whether the current CPU has an outstanding RCU callback, which means that the last CPU to go into dyntick-idle mode might wait a few ticks for the relevant grace periods to complete. However, if all the other CPUs are in dyntick-idle mode, and if this CPU is in a quiescent state (which it is for RCU-bh and RCU-sched any time that we are considering going into dyntick-idle mode), then the grace period is instantly complete. This patch therefore repeatedly invokes the RCU grace-period machinery in order to force any needed grace periods to complete quickly. It does so a limited number of times in order to prevent starvation by an RCU callback function that might pass itself to call_rcu(). However, if any CPU other than the current one is not in dyntick-idle mode, fall back to simply checking (with fix to bug noted by Lai Jiangshan). Also, take advantage of last grace-period forcing, the opportunity to do so noted by Steve Rostedt. And apply simplified #ifdef condition suggested by Frederic Weisbecker. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c3128fb6 |
|
12-Feb-2010 |
Don Zickus <dzickus@redhat.com> |
nmi_watchdog: Use a boolean config flag for compiling Determines if an arch has setup arch specific perf_events and nmi_watchdog code. This should restrict compiles to only those arches ready. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: aris@redhat.com LKML-Reference: <1266013161-31197-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
84336466 |
|
21-Dec-2009 |
Roland McGrath <roland@redhat.com> |
kconfig CROSS_COMPILE option This adds CROSS_COMPILE as a kconfig string so you can store it in .config. Then you can use plain "make" in the configured kernel build directory to do the right cross compilation without setting the command-line or environment variable every time. With this, you can set up different build directories for different kernel configurations, whether native or cross-builds, and then use the simple: make -C /build/dir M=module-source-dir idiom to build modules for any given target kernel, indicating which one by nothing but the build directory chosen. I tried a version that defaults the string with env="CROSS_COMPILE" so that in a "make oldconfig" with CROSS_COMPILE in the environment you can just hit return to store the way you're building it. But the kconfig prompt for strings doesn't give you any way to say you want an empty string instead of the default, so I punted that. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Anibal Monsalve Salazar <anibal@debian.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
|
#
7c941438 |
|
20-Jan-2010 |
Dhaval Giani <dhaval.giani@gmail.com> |
sched: Remove USER_SCHED Remove the USER_SCHED feature. It has been scheduled to be removed in 2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2 Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1263990378.24844.3.camel@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
7dd65feb |
|
08-Jan-2010 |
Albin Tonnerre <albin.tonnerre@free-electrons.com> |
lib: add support for LZO-compressed kernels This patch series adds generic support for creating and extracting LZO-compressed kernel images, as well as support for using such images on the x86 and ARM architectures, and support for creating and using LZO-compressed initrd and initramfs images. Russell King said: : Testing on a Cortex A9 model: : - lzo decompressor is 65% of the time gzip takes to decompress a kernel : - lzo kernel is 9% larger than a gzip kernel : : which I'm happy to say confirms your figures when comparing the two. : : However, when comparing your new gzip code to the old gzip code: : - new is 99% of the size of the old code : - new takes 42% of the time to decompress than the old code : : What this means is that for a proper comparison, the results get even better: : - lzo is 7.5% larger than the old gzip'd kernel image : - lzo takes 28% of the time that the old gzip code took : : So the expense seems definitely worth the effort. The only reason I : can think of ever using gzip would be if you needed the additional : compression (eg, because you have limited flash to store the image.) : : I would argue that the default for ARM should therefore be LZO. This patch: The lzo compressor is worse than gzip at compression, but faster at extraction. Here are some figures for an ARM board I'm working on: Uncompressed size: 3.24Mo gzip 1.61Mo 0.72s lzo 1.75Mo 0.48s So for a compression ratio that is still relatively close to gzip, it's much faster to extract, at least in that case. This part contains: - Makefile routine to support lzo compression - Fixes to the existing lzo compressor so that it can be used in compressed kernels - wrapper around the existing lzo1x_decompress, as it only extracts one block at a time, while we need to extract a whole file here - config dialog for kernel compression [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: cleanup] Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com> Tested-by: Wu Zhangjin <wuzhangjin@gmail.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Tested-by: Russell King <rmk@arm.linux.org.uk> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
16295bec |
|
06-Jan-2010 |
Steffen Klassert <steffen.klassert@secunet.com> |
padata: Generic parallelization/serialization interface This patch introduces an interface to process data objects in parallel. The parallelized objects return after serialization in the same order as they were before the parallelization. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
07b139c8 |
|
20-Dec-2009 |
Li Zefan <lizf@cn.fujitsu.com> |
perf events: Remove CONFIG_EVENT_PROFILE Quoted from Ingo: | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE - | it's an unnecessary Kconfig complication. If both PERF_EVENTS and | EVENT_TRACING is enabled we should expose generic tracepoints. | | Nor is it limited to event 'profiling', so it has become a misnomer as | well. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ea637639 |
|
14-Dec-2009 |
Jie Zhang <jie.zhang@analog.com> |
nommu: fix malloc performance by adding uninitialized flag The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9e9868a7 |
|
03-Dec-2009 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
sysfs: deprecated features are to help old tools not to confuse them Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Greg KH <gregkh@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e9438e31 |
|
01-Dec-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
sysfs: fix SYSFS_DEPRECATED_V2 prompt The SYSFS_DEPRECATED_V2 says "remove" older, deprecated features, but it actually enables them, so correct this confusing, backwards text. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f13a48bd |
|
01-Dec-2009 |
David Howells <dhowells@redhat.com> |
SLOW_WORK: Move slow_work's proc file to debugfs Move slow_work's debugging proc file to debugfs. Signed-off-by: David Howells <dhowells@redhat.com> Requested-and-acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8fba10a4 |
|
19-Nov-2009 |
David Howells <dhowells@redhat.com> |
SLOW_WORK: Allow the work items to be viewed through a /proc file Allow the executing and queued work items to be viewed through a /proc file for debugging purposes. The contents look something like the following: THR PID ITEM ADDR FL MARK DESC === ===== ================ == ===== ========== 0 3005 ffff880023f52348 a 952ms FSC: OBJ17d3: LOOK 1 3006 ffff880024e33668 2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2 2 3165 ffff8800296dd180 a 424ms FSC: OBJ17e4: LOOK 3 4089 ffff8800262c8d78 a 212ms FSC: OBJ17ea: CRTN 4 4090 ffff88002792bed8 2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2 5 4092 ffff88002a0ef308 2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2 6 4094 ffff88002abaf4b8 2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2 7 4095 ffff88002bb188e0 a 388ms FSC: OBJ17e9: CRTN vsq - ffff880023d99668 1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2 vsq - ffff8800295d1740 1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2 vsq - ffff880025ba3308 1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2 vsq - ffff880024ec83e0 1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2 vsq - ffff880026618e00 1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2 vsq - ffff880025a2a4b8 1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2 vsq - ffff880023cbe6d8 9 212ms FSC: OBJ17eb: LOOK vsq - ffff880024d37590 9 212ms FSC: OBJ17ec: LOOK vsq - ffff880027746cb0 9 212ms FSC: OBJ17ed: LOOK vsq - ffff880024d37ae8 9 212ms FSC: OBJ17ee: LOOK vsq - ffff880024d37cb0 9 212ms FSC: OBJ17ef: LOOK vsq - ffff880025036550 9 212ms FSC: OBJ17f0: LOOK vsq - ffff8800250368e0 9 212ms FSC: OBJ17f1: LOOK vsq - ffff880025036aa8 9 212ms FSC: OBJ17f2: LOOK In the 'THR' column, executing items show the thread they're occupying and queued threads indicate which queue they're on. 'PID' shows the process ID of a slow-work thread that's executing something. 'FL' shows the work item flags. 'MARK' indicates how long since an item was queued or began executing. Lastly, the 'DESC' column permits the owner of an item to give some information. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
6beb0009 |
|
09-Nov-2009 |
Thomas Gleixner <tglx@linutronix.de> |
locking: Make inlining decision Kconfig based commit 892a7c67 (locking: Allow arch-inlined spinlocks) implements the selection of which lock functions are inlined based on defines in arch/.../spinlock.h: #define __always_inline__LOCK_FUNCTION Despite of the name __always_inline__* the lock functions can be built out of line depending on config options. Also if the arch does not set some inline defines the generic code might set them; again depending on config options. This makes it unnecessary hard to figure out when and which lock functions are inlined. Aside of that it makes it way harder and messier for -rt to manipulate the lock functions. Convert the inlining decision to CONFIG switches. Each lock function is inlined depending on CONFIG_INLINE_*. The configs implement the existing dependencies. The architecture code can select ARCH_INLINE_* to signal that it wants the corresponding lock function inlined. ARCH_INLINE_* is necessary as Kconfig ignores "depends on" restrictions when a config element is selected. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20091109151428.504477141@linutronix.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
#
26a7034b |
|
05-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
sysctl: Reduce sys_sysctl to a compatibility wrapper around /proc/sys To simply maintenance and to be able to remove all of the binary sysctl support from various subsystems I have rewritten the binary sysctl code as a compatibility wrapper around proc/sys. The code is built around a hard coded table based on the table in sysctl_check.c that lists all of our current binary sysctls and provides enough information to convert from the sysctl binary input into into ascii and back again. New in this patch is the realization that the only dynamic entries that need to be handled have ifname as the asscii string and ifindex as their ctl_name. When a sys_sysctl is called the code now looks in the translation table converting the binary name to the path under /proc where the value is to be found. Opens that file, and calls into a format conversion wrapper that calls fop->read and then fop->write as appropriate. Since in practice the practically no one uses or tests sys_sysctl rewritting the code to be beautiful is a little silly. The redeeming merit of this work is it allows us to rip out all of the binary sysctl syscall support from everywhere else in the tree. Allowing us to remove a lot of dead (after this patch) and barely maintained code. In addition it becomes much easier to optimize the sysctl implementation for being the backing store of /proc/sys, without having to worry about sys_sysctl. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
dd77038d |
|
30-Oct-2009 |
Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> |
perf_events: Fix some typo in the perf events config description Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: trivial@kernel.org LKML-Reference: <1256938346-8230-1-git-send-email-cascardo@holoscopio.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
022382a5 |
|
16-Oct-2009 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Minor cleanup to init/Kconfig We dont need to depend on PPC64 explicitly as all powerpc platforms (32-bit and 64-bit) define PPC now. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9b1d82fa |
|
25-Oct-2009 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: "Tiny RCU", The Bloatwatch Edition This patch is a version of RCU designed for !SMP provided for a small-footprint RCU implementation. In particular, the implementation of synchronize_rcu() is extremely lightweight and high performance. It passes rcutorture testing in each of the four relevant configurations (combinations of NO_HZ and PREEMPT) on x86. This saves about 1K bytes compared to old Classic RCU (which is no longer in mainline), and more than three kilobytes compared to Hierarchical RCU (updated to 2.6.30): CONFIG_TREE_RCU: text data bss dec filename 183 4 0 187 kernel/rcupdate.o 2783 520 36 3339 kernel/rcutree.o 3526 Total (vs 4565 for v7) CONFIG_TREE_PREEMPT_RCU: text data bss dec filename 263 4 0 267 kernel/rcupdate.o 4594 776 52 5422 kernel/rcutree.o 5689 Total (6155 for v7) CONFIG_TINY_RCU: text data bss dec filename 96 4 0 100 kernel/rcupdate.o 734 24 0 758 kernel/rcutiny.o 858 Total (vs 848 for v7) The above is for x86. Your mileage may vary on other platforms. Further compression is possible, but is being procrastinated. Changes from v7 (http://lkml.org/lkml/2009/10/9/388) o Apply Lai Jiangshan's review comments (aside from might_sleep() in synchronize_sched(), which is covered by SMP builds). o Fix up expedited primitives. Changes from v6 (http://lkml.org/lkml/2009/9/23/293). o Forward ported to put it into the 2.6.33 stream. o Added lockdep support. o Make lightweight rcu_barrier. Changes from v5 (http://lkml.org/lkml/2009/6/23/12). o Ported to latest pre-2.6.32 merge window kernel. - Renamed rcu_qsctr_inc() to rcu_sched_qs(). - Renamed rcu_bh_qsctr_inc() to rcu_bh_qs(). - Provided trivial rcu_cpu_notify(). - Provided trivial exit_rcu(). - Provided trivial rcu_needs_cpu(). - Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h. o Removed the dependence on EMBEDDED, with a view to making TINY_RCU default for !SMP at some time in the future. o Added (trivial) support for expedited grace periods. Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include: o Squeeze the size down a bit further by removing the ->completed field from struct rcu_ctrlblk. o This permits synchronize_rcu() to become the empty function. Previous concerns about rcutorture were unfounded, as rcutorture correctly handles a constant value from rcu_batches_completed() and rcu_batches_completed_bh(). Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include: o Changed rcu_batches_completed(), rcu_batches_completed_bh() rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and rcu_nmi_exit(), to be static inlines, as suggested by David Howells. Doing this saves about 100 bytes from rcutiny.o. (The numbers between v3 and this v4 of the patch are not directly comparable, since they are against different versions of Linux.) Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include: o Fix whitespace issues. o Change short-circuit "||" operator to instead be "+" in order to fix performance bug noted by "kraai" on LWN. (http://lwn.net/Articles/324348/) Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include: o This version depends on EMBEDDED as well as !SMP, as suggested by Ingo. o Updated rcu_needs_cpu() to unconditionally return zero, permitting the CPU to enter dynticks-idle mode at any time. This works because callbacks can be invoked upon entry to dynticks-idle mode. o Paul is now OK with this being included, based on a poll at the Kernel Miniconf at linux.conf.au, where about ten people said that they cared about saving 900 bytes on single-CPU systems. o Applies to both mainline and tip/core/rcu. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: avi@redhat.com Cc: mtosatti@redhat.com LKML-Reference: <12565226351355-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
906010b2 |
|
21-Sep-2009 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
perf_event: Provide vmalloc() based mmap() backing Some architectures such as Sparc, ARM and MIPS (basically everything with flush_dcache_page()) need to deal with dcache aliases by carefully placing pages in both kernel and user maps. These architectures typically have to use vmalloc_user() for this. However, on other architectures, vmalloc() is not needed and has the downsides of being more restricted and slower than regular allocations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: David Miller <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1254830228.21044.272.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
57c0c15b |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Tidy up after the big rename - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's - provide courtesy copy of old perf_counter.h, for user-space projects - small indentation fixups - fix up MAINTAINERS - fix small x86 printout fallout - fix up small PowerPC comment fallout (use 'counter' as in register) Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cdd6c482 |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
99657c78 |
|
18-Sep-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
kernel hacking: move STRIP_ASM_SYMS from General Sam suggested moving STRIP_ASM_SYMS into the Kernel hacking menu from the General Setup menu. It makes more sense there. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
fc537766 |
|
17-Sep-2009 |
Christoph Hellwig <hch@lst.de> |
tracing: Remove markers Now that the last users of markers have migrated to the event tracer we can kill off the (now orphan) support code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090917173527.GA1699@lst.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
bbe3eae8 |
|
13-Sep-2009 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down To quote Valdis: This leaves somebody who has a laptop wondering which choice is best for a system with only one or two cores that has CONFIG_PREEMPT defined. One choice says it scales down nicely, the other explicitly has a 'depends on PREEMPT' attached to it... So add "scales down nicely" to TREE_PREEMPT_RCU to match that of TREE_RCU. Suggested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12528585112362-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6b3ef48a |
|
22-Aug-2009 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove CONFIG_PREEMPT_RCU Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no further need for CONFIG_PREEMPT_RCU. Remove it, along with whatever subtle bugs it may (or may not) contain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <125097461396-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f41d911f |
|
22-Aug-2009 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Merge preemptable-RCU functionality into hierarchical RCU Create a kernel/rcutree_plugin.h file that contains definitions for preemptable RCU (or, under the #else branch of the #ifdef, empty definitions for the classic non-preemptable semantics). These definitions fit into plugins defined in kernel/rcutree.c for this purpose. This variant of preemptable RCU uses a new algorithm whose read-side expense is roughly that of classic hierarchical RCU under CONFIG_PREEMPT. This new algorithm's update-side expense is similar to that of classic hierarchical RCU, and, in absence of read-side preemption or blocking, is exactly that of classic hierarchical RCU. Perhaps more important, this new algorithm has a much simpler implementation, saving well over 1,000 lines of code compared to mainline's implementation of preemptable RCU, which will hopefully be retired in favor of this new algorithm. The simplifications are obtained by maintaining per-task nesting state for running tasks, and using a simple lock-protected algorithm to handle accounting when tasks block within RCU read-side critical sections, making use of lessons learned while creating numerous user-level RCU implementations over the past 18 months. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12509746134003-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f2654260 |
|
29-Jun-2009 |
Ingo Molnar <mingo@elte.hu> |
perf_counter: Set the CONFIG_PERF_COUNTERS default to y if CONFIG_PROFILING=y If user has already enabled profiling support in the kernel (for oprofile, old-style profiling of ftrace) then offer up perfcounters with a y default in interactive kconfig sessions. Still keep it off by default otherwise. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
470a1396 |
|
29-Jul-2009 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
tracing, perf_counter: Add help text to CONFIG_EVENT_PROFILE Explain what tracepoint profiling sources are about. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jeff Garzik <jeff@garzik.org> LKML-Reference: <1248856508.6987.3041.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d4d7d0b9 |
|
06-Jul-2009 |
Chris Wilson <chris@chris-wilson.co.uk> |
perf_counter: Fix the tracepoint channel to perfcounters Fix a missed rename in EVENT_PROFILE support so that it gets built and allows tracepoint tracing from the 'perf' tool. Fix a typo in the (never before built & enabled) portion in perf_counter.c as well, and update that code to the attr.config changes as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Gamari <bgamari.foss@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c17ef453 |
|
23-Jun-2009 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove Classic RCU Remove Classic RCU, given that the combination of Tree RCU and the proposed Bloatwatch RCU do everything that Classic RCU can with fewer bugs. Tree RCU has been default in x86 builds for almost six months, and seems to be quite reliable, so there does not seem to be much justification for keeping the Classic RCU code and config complexity around anymore. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: akpm@linux-foundation.org Cc: niv@us.ibm.com Cc: dvhltc@us.ibm.com Cc: dipankar@in.ibm.com Cc: dhowells@redhat.com Cc: lethal@linux-sh.org Cc: kernel@wantstofly.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b99b87f7 |
|
17-Jun-2009 |
Peter Oberparleiter <oberpar@linux.vnet.ibm.com> |
kernel: constructor support Call constructors (gcc-generated initcall-like functions) during kernel start and module load. Constructors are e.g. used for gcov data initialization. Disable constructor support for usermode Linux to prevent conflicts with host glibc. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f6ee649f |
|
16-Apr-2009 |
Kay Sievers <kay.sievers@vrfy.org> |
driver core: set default SYSFS_DEPRECATED=n All recent distros depend on the non-deprecated sysfs layout, so change the default value of the option to reflect that. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
018df72d |
|
12-Jun-2009 |
Mike Frysinger <vapier@gentoo.org> |
perf_counter: Start documenting HAVE_PERF_COUNTERS requirements Help out arch porters who want to support perf counters by listing some basic requirements. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1244827063-24046-1-git-send-email-vapier@gentoo.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
63c882a0 |
|
21-May-2009 |
Eric Paris <eparis@redhat.com> |
inotify: reimplement inotify using fsnotify Reimplement inotify_user using fsnotify. This should be feature for feature exactly the same as the original inotify_user. This does not make any changes to the in kernel inotify feature used by audit. Those patches (and the eventual removal of in kernel inotify) will come after the new inotify_user proves to be working correctly. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
|
#
8dc8e5e8 |
|
11-Jun-2009 |
Ingo Molnar <mingo@elte.hu> |
perf_counter: Turn off by default Perfcounters were enabled by default to help testing - but now that we are submitting it upstream, make it default-disabled. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a9eb5223 |
|
05-Jun-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
menu: fix embedded menu presentation The STRIP_ASM_SYMS kconfig symbol mucks up the embedded menu because STRIP_ASM_SYMS is in the middle of the embedded menu items but it does not depend on EMBEDDED. Move it to beyond the end of the embedded menu so that the menu is presented correctly. Or if STRIP_ASM_SYMS should depend on EMBEDDED, that can also be fixed. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
5d7d18f5 |
|
04-Mar-2009 |
David Howells <dhowells@redhat.com> |
kbuild: make it possible for the linker to discard local symbols from vmlinux Make it possible for the linker to discard local symbols from vmlinux as they cause vmlinux to balloon when CONFIG_KALLSYMS=y and they cause dump_stack() and get_wchan() to produce useless information under some circumstances. With this we add a config option (CONFIG_STRIP_ASM_SYMS) that will cause the build to supply -X to the linker to tell it to strip temporary local symbols. This doesn't seem to cause gdb any problems. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
bdc8e5f8 |
|
06-Apr-2009 |
Serge E. Hallyn <serue@us.ibm.com> |
namespaces: mqueue namespace: adapt sysctl Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue sysctl stuff in its own file. [akpm@linux-foundation.org: build fix] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
614b84cf |
|
06-Apr-2009 |
Serge E. Hallyn <serue@us.ibm.com> |
namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace Move mqueue vfsmount plus a few tunables into the ipc_namespace struct. The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the posix message queue namespaces and the SYSV ipc namespaces. The sysctl code will be fixed separately in patch 3. After just this patch, making a change to posix mqueue tunables always changes the values in the initial ipc namespace. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1c2d008c |
|
06-Apr-2009 |
David Howells <dhowells@redhat.com> |
Make CONFIG_SLOW_WORK an automatic rather than manual config option Make CONFIG_SLOW_WORK an automatic rather than manual config option so that people configuring their kernels don't have to make the choice. It can be selected automatically by those things that require it (such as FS-Cache). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e077df4f |
|
19-Mar-2009 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
perf_counter: hook up the tracepoint events Impact: new perfcounters feature Enable usage of tracepoints as perf counter events. tracepoint event ids can be found in /debug/tracing/event/*/*/id and (for now) are represented as -65536+id in the type field. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Orig-LKML-Reference: <20090319194233.744044174@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
31c9a24e |
|
02-Apr-2009 |
Paul E. McKenney <paulmck@kernel.org> |
RCU: make treercu be default Impact: switch default config from CLASSIC_RCU to TREE_RCU Given that I have not gotten any complaints or bug reports on treercu recently, this patch makes it be the default. There are a number of other defconfig files that explicitly call out CLASSIC_RCU, but which have comment headers saying not to edit them. Probably holdovers from one of the flavors of "make config", but... Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: akpm@linux-foundation.org Cc: dipankar@in.ibm.com Cc: niv@us.ibm.com Cc: manfred@colorfullife.com Cc: peterz@infradead.org LKML-Reference: <20090403040625.GA9473@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
07fe7cb7 |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
Create a dynamically sized pool of threads for doing very slow work items Create a dynamically sized pool of threads for doing very slow work items, such as invoking mkdir() or rmdir() - things that may take a long time and may sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable for workqueues. The number of threads is always at least a settable minimum, but more are started when there's more work to do, up to a limit. Because of the nature of the load, it's not suitable for a 1-thread-per-CPU type pool. A system with one CPU may well want several threads. This is used by FS-Cache to do slow caching operations in the background, such as looking up, creating or deleting cache objects. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
db7f47cf |
|
02-Apr-2009 |
Paul Menage <menage@google.com> |
cpusets: allow cpusets to be configured/built on non-SMP systems Allow cpusets to be configured/built on non-SMP systems Currently it's impossible to build cpusets under UML on x86-64, since cpusets depends on SMP and x86-64 UML doesn't support SMP. There's code in cpusets that doesn't depend on SMP. This patch surrounds the minimum amount of cpusets code with #ifdef CONFIG_SMP in order to allow cpusets to build/run on UP systems (for testing purposes under UML). Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
627991a2 |
|
02-Apr-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
memcg: remove redundant message at swapon It's pointed out that swap_cgroup's message at swapon() is nonsense. Because * It can be calculated very easily if all necessary information is written in Kconfig. * It's not necessary to annoying people at every swapon(). In other view, now, memory usage per swp_entry is reduced to 2bytes from 8bytes(64bit) and I think it's reasonably small. Reported-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
21acb9ca |
|
04-Feb-2009 |
Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> |
trivial: fix where cgroup documentation is not correctly referred to cgroup documentation was moved to Documentation/cgroups/. There are some places that still refer to Documentation/controllers/, Documentation/cgroups.txt and Documentation/cpusets.txt. Fix those. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
692105b8 |
|
26-Jan-2009 |
Matt LaPlante <kernel1@cyberdogtech.com> |
trivial: fix typos/grammar errors in Kconfig texts Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
b943c460 |
|
10-Mar-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
menu: fix embedded menu snafu The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED menu. Fix by moving it to just after all of the EMBEDDED menu symbols. Also, ANON_INODES has a similar problem, so move it to just above the EMBEDDED menu items since it is used in the EMBEDDED menu. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2450cf51 |
|
02-Mar-2009 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "menu: fix embedded menu snafu" This reverts commit 155b25bcc28631a5b5230191aa3f56c40dfffa3f, which was totally wrong - the "embedded" options still exists (very much so) even on non-embedded platforms. It's just that we don't bother with actually asking about them when we're not embedded, we just take their default values (which is usually 'y' - the options add features that may not be worth it in a constrained environment). Noticed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
155b25bc |
|
02-Mar-2009 |
Randy Dunlap <randy.dunlap@oracle.com> |
menu: fix embedded menu snafu The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED menu. Fix by moving it to just after all of the EMBEDDED menu symbols. Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so that this EMBEDDED block is clearer. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
91f73f90 |
|
20-Feb-2009 |
Frederic Weisbecker <fweisbec@gmail.com> |
tracing/markers: make markers select tracepoints Sometimes it happens that KConfig dependencies are not handled like in the following scenario: - config A bool - config B bool depends on A - config C bool select B If one selects C, then it will select B without checking its dependency to A, if A hasn't been selected elsewhere, it will result in a build failure. This is what happens on the following build error: kernel/built-in.o: In function `marker_update_probe_range': (.text+0x52f64): undefined reference to `tracepoint_probe_register_noupdate' kernel/built-in.o: In function `marker_update_probe_range': (.text+0x52f74): undefined reference to `tracepoint_probe_unregister_noupdate' kernel/built-in.o: In function `marker_update_probe_range': (.text+0x52fb9): undefined reference to `tracepoint_probe_unregister_noupdate' kernel/built-in.o: In function `marker_update_probes': marker.c:(.text+0x530ba): undefined reference to `tracepoint_probe_update_all' CONFIG_KVM_TRACE will select CONFIG_MARKER, but the latter depends on CONFIG_TRACEPOINTS which will not be selected. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d6eb633f |
|
26-Jan-2009 |
Matt Helsley <matthltc@us.ibm.com> |
net: Move config NET_NS to from net/Kconfig to init/Kconfig Make NET_NS available underneath the generic Namespaces config option since all of the other namespace options are there. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ceacc2c1 |
|
16-Jan-2009 |
Peter Zijlstra <peterz@infradead.org> |
sched: make plist a library facility Ingo Molnar wrote: > here's a new build failure with tip/sched/rt: > > LD .tmp_vmlinux1 > kernel/built-in.o: In function `set_curr_task_rt': > sched.c:(.text+0x3675): undefined reference to `plist_del' > kernel/built-in.o: In function `pick_next_task_rt': > sched.c:(.text+0x37ce): undefined reference to `plist_del' > kernel/built-in.o: In function `enqueue_pushable_task': > sched.c:(.text+0x381c): undefined reference to `plist_del' Eliminate the plist library kconfig and make it available unconditionally. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
45ce80fb |
|
15-Jan-2009 |
Li Zefan <lizf@cn.fujitsu.com> |
cgroups: consolidate cgroup documents Move Documentation/cpusets.txt and Documentation/controllers/* to Documentation/cgroups/ Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
23964d2d |
|
15-Jan-2009 |
Li Zefan <lizf@cn.fujitsu.com> |
cgroups: clean up Kconfig - move CONFIG_PROC_PID_CPUSET into cgroup menu - move MM_OWNER to the bottom for better menu indent - fix typos - use tabs not spaces Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c903ff83 |
|
15-Jan-2009 |
Mike Travis <travis@sgi.com> |
rcu: move Kconfig menu Move RCU Kconfig options from top-level menu to an "RCU Subsystem" menu under the "General Setup" menu. Signed-off-by: Mike Travis <travis@sgi.com> Tested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
2ea03891 |
|
14-Jan-2009 |
Sam Ravnborg <sam@ravnborg.org> |
Revert "kbuild: strip generated symbols from *.ko" This reverts commit ad7a953c522ceb496611d127e51e278bfe0ff483. And commit: ("allow stripping of generated symbols under CONFIG_KALLSYMS_ALL") 9bb482476c6c9d1ae033306440c51ceac93ea80c These stripping patches has caused a set of issues: 1) People have reported compatibility issues with binutils due to lack of support for `--strip-unneeded-symbols' with objcopy 2.15.92.0.2 Reported by: Wenji 2) ccache and distcc no longer works as expeced Reported by: Ted, Roland, + others 3) The installed modules increased a lot in size Reported by: Ted, Davej + others Reported-by: Wenji Huang <wenji.huang@oracle.com> Reported-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Roland McGrath <roland@redhat.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
c077719b |
|
07-Jan-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
memcg: mem+swap controller Kconfig Config and control variable for mem+swap controller. This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP (memory resource controller swap extension.) For accounting swap, it's obvious that we have to use additional memory to remember "who uses swap". This adds more overhead. So, it's better to offer "choice" to users. This patch adds 2 choices. This patch adds 2 parameters to enable swap extension or not. - CONFIG - boot option Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c9d5409f |
|
07-Jan-2009 |
Li Zefan <lizf@cn.fujitsu.com> |
memcg: fix a typo in Kconfig s/contoller/controller/ Signed-of-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5cdc38f9 |
|
07-Jan-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
cgroups: make cgroup config a submenu Making CGROUP related configs be a sub-menu. This patch make CGROUP related configs be a sub-menu and makes 1st level configs of "General Setup" shorter. including following additional changes - add help comment about CGROUPS and GROUP_SCHED. - moved MM_OWNER config to the bottom. (for good indent in menuconfig) Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
853ac43a |
|
06-Jan-2009 |
Matt Mackall <mpm@selenic.com> |
shmem: unify regular and tiny shmem tiny-shmem shares most of its 130 lines of code with shmem and tends to break when particular bits of shmem get modified. Unifying saves code and makes keeping these two in sync much easier. before: 14367 392 24 14783 39bf mm/shmem.o 396 72 8 476 1dc mm/tiny-shmem.o after: 14367 392 24 14783 39bf mm/shmem.o 412 72 8 492 1ec mm/shmem.o tiny Signed-off-by: Matt Mackall <mpm@selenic.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fce3e804 |
|
01-Nov-2008 |
Kay Sievers <kay.sievers@vrfy.org> |
sysfs: clarify SYSFS_DEPRECATED help text This should make the help text of SYSFS_DEPRECATED more clear, that this is _not_ about (what some people think it is) suppressing a few symlinks and variables, but a different sysfs _layout_ with new features. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
2e9f3bdd |
|
04-Jan-2009 |
H. Peter Anvin <hpa@zytor.com> |
bzip2/lzma: make config machinery an arch configurable Impact: Bug fix (we should not show this menu on irrelevant architectures) Make the config machinery to drive the gzip/bzip2/lzma selection dependent on the architecture advertising HAVE_KERNEL_* so that we don't display this for architectures where it doesn't matter. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
30d65dbf |
|
04-Jan-2009 |
Alain Knaff <alain@knaff.lu> |
bzip2/lzma: config and initramfs support for bzip2/lzma decompression Impact: New code for initramfs decompression, new features This is the second part of the bzip2/lzma patch The bzip patch is based on an idea by Christian Ludwig, includes support for compressing the kernel with bzip2 or lzma rather than gzip. Both compressors give smaller sizes than gzip. Lzma's decompresses faster than bzip2. It also supports ramdisks and initramfs' compressed using these two compressors. The functionality has been successfully used for a couple of years by the udpcast project This version applies to "tip" kernel 2.6.28 This part contains: - support for new compressions (bzip2 and lzma) in initramfs and old-style ramdisk - config dialog for kernel compression (but new kernel compressions not yet supported) Signed-off-by: Alain Knaff <alain@knaff.lu> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
a327ca2c |
|
08-Jul-2008 |
Johannes Berg <johannes@sipsolutions.net> |
remove CONFIG_KMOD Now that nothing depends on it any more, remove CONFIG_KMOD. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
12d79baf |
|
25-Dec-2008 |
Ingo Molnar <mingo@elte.hu> |
rcu: provide RCU options on non-preempt architectures too Impact: build fix Some old architectures still do not use kernel/Kconfig.preempt, so the moving of the RCU options there broke their build: In file included from /home/mingo/tip/include/linux/sem.h:81, from /home/mingo/tip/include/linux/sched.h:69, from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9: /home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration" Move these options back to init/Kconfig, which every architecture includes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9bb48247 |
|
16-Dec-2008 |
Jan Beulich <jbeulich@novell.com> |
allow stripping of generated symbols under CONFIG_KALLSYMS_ALL Building upon parts of the module stripping patch, this patch introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y. Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64) kernels I tested with. The patch also does away with the need to special case the kallsyms- internal symbols by making them available even in the first linking stage. While it is a generated file, the patch includes the changes to scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure here is. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
64db4cff |
|
18-Dec-2008 |
Paul E. McKenney <paulmck@kernel.org> |
"Tree RCU": scalable classic RCU implementation This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
98a79d6a |
|
13-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: centralize cpu_online_map and cpu_possible_map Impact: cleanup Each SMP arch defines these themselves. Move them to a central location. Twists: 1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a CONFIG_INIT_ALL_POSSIBLE for this rather than break them. 2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'. Those archs simply have phys_cpu_present_map replaced everywhere. 3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky so I just manipulate them both in sync. 4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map' declarations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Travis <travis@sgi.com> Cc: ink@jurassic.park.msu.ru Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: tony.luck@intel.com Cc: takata@linux-m32r.org Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: paulus@samba.org Cc: schwidefsky@de.ibm.com Cc: lethal@linux-sh.org Cc: wli@holomorphy.com Cc: davem@davemloft.net Cc: jdike@addtoit.com Cc: mingo@redhat.com
|
#
4c59e467 |
|
08-Dec-2008 |
Ingo Molnar <mingo@elte.hu> |
perfcounters: select ANON_INODES The perfcounters subsystem depends on CONFIG_ANON_INODES facilities, so make sure it's selected. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0793a61d |
|
04-Dec-2008 |
Thomas Gleixner <tglx@linutronix.de> |
performance counters: core code Implement the core kernel bits of Performance Counters subsystem. The Linux Performance Counter subsystem provides an abstraction of performance counter hardware capabilities. It provides per task and per CPU counters, and it provides event capabilities on top of those. Performance counters are accessed via special file descriptors. There's one file descriptor per virtual counter used. The special file descriptor is opened via the perf_counter_open() system call: int perf_counter_open(u32 hw_event_type, u32 hw_event_period, u32 record_type, pid_t pid, int cpu); The syscall returns the new fd. The fd can be used via the normal VFS system calls: read() can be used to read the counter, fcntl() can be used to set the blocking mode, etc. Multiple counters can be kept open at a time, and the counters can be poll()ed. See more details in Documentation/perf-counters.txt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c1df1bd2 |
|
14-Nov-2008 |
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> |
markers: auto enable tracepoints (new API : trace_mark_tp()) Impact: new API Add a new API trace_mark_tp(), which declares a marker within a tracepoint probe. When the marker is activated, the tracepoint is automatically enabled. No branch test is used at the marker site, because it would be a duplicate of the branch already present in the tracepoint. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
02f56210 |
|
05-Nov-2008 |
Simon Arlott <simon@octiron.net> |
Kconfig: SLUB is the default slab allocator In 2007, a0acd820807680d2ccc4ef3448387fcdbf152c73 changed the default slab allocator to SLUB, but the SLAB help text still says SLAB is the default. This change fixes that. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
#
2fe401e3 |
|
12-Nov-2008 |
Adrian Knoth <adi@drcomp.erfurt.thur.de> |
sched: correct sched-rt-group.txt pathname in init/Kconfig init/Kconfig directs the user to Documentation/sched-rt-group.txt, but the file is actually in Documentation/scheduler/sched-rt-group.txt. This patch corrects the pathname mentioned in init/Kconfig. Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
84ad6d70 |
|
29-Oct-2008 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
memcg: update menuconfig help text page_cgroup is now allocated at boot and memmap doesn't includes pointer for page_cgroup. Fix the menu help text. Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
61cfc7e4 |
|
22-Oct-2008 |
Geert Uytterhoeven <geert@linux-m68k.org> |
PCI: PCI_QUIRKS depends on PCI commit 3d137310245e4cdc3e8c8ba1bea2e145a87ae8e3 ("PCI: allow quirks to be compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each and every .config. Fix this by making it depend on PCI. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
#
3d137310 |
|
19-Aug-2008 |
Thomas Petazzoni <thomas.petazzoni@enix.org> |
PCI: allow quirks to be compiled out This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all the PCI quirks, which are not necessarily used on embedded systems when PCI is working properly. As this is a size-reduction option, it depends on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel code: text data bss dec hex filename 1287806 123596 212992 1624394 18c94a vmlinux.old 1275854 123596 212992 1612442 189a9a vmlinux -11952 0 0 -11952 -2EB0 +/- This patch has originally been written by Zwane Mwaikambo <zwane@arm.linux.org.uk> and is part of the Linux Tiny project. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
#
dc52ddc0 |
|
18-Oct-2008 |
Matt Helsley <matthltc@us.ibm.com> |
container freezer: implement freezer cgroup subsystem This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem. The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This is the basic mechanism which should do the right thing for user space task in a simple scenario. It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens: 1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file 2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO) 3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: export thaw_process] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ebf3f09c |
|
15-Oct-2008 |
Thomas Petazzoni <thomas.petazzoni@free-electrons.com> |
Configure out AIO support This patchs adds the CONFIG_AIO option which allows to remove support for asynchronous I/O operations, that are not necessarly used by applications, particularly on embedded devices. As this is a size-reduction option, it depends on CONFIG_EMBEDDED. It allows to save ~7 kilobytes of kernel code/data: text data bss dec hex filename 1115067 119180 217088 1451335 162547 vmlinux 1108025 119048 217088 1444161 160941 vmlinux.new -7042 -132 0 -7174 -1C06 +/- This patch has been originally written by Matt Mackall <mpm@selenic.com>, and is part of the Linux Tiny project. [randy.dunlap@oracle.com: build fix] Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5f87f112 |
|
23-Jul-2008 |
Ingo Molnar <mingo@elte.hu> |
tracing: clean up tracepoints kconfig structure do not expose users to CONFIG_TRACEPOINTS - tracers can select it just fine. update ftrace to select CONFIG_TRACEPOINTS. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
fa340d9c |
|
23-Jul-2008 |
Ingo Molnar <mingo@elte.hu> |
tracing: disable tracepoints by default while it's arguably low overhead, we dont enable new features by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
97e1c18e |
|
17-Jul-2008 |
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> |
tracing: Kernel Tracepoints Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers. Allows complete typing verification by declaring both tracing statement inline functions and probe registration/unregistration static inline functions within the same macro "DEFINE_TRACE". No format string is required. See the tracepoint Documentation and Samples patches for usage examples. Taken from the documentation patch : "A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or "off" (no probe is attached). When a tracepoint is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a tracepoint is "on", the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site). You can put tracepoints at important locations in the code. They are lightweight hooks that can pass an arbitrary number of parameters, which prototypes are described in a tracepoint declaration placed in a header file." Addition and removal of tracepoints is synchronized by RCU using the scheduler (and preempt_disable) as guarantees to find a quiescent state (this is really RCU "classic"). The update side uses rcu_barrier_sched() with call_rcu_sched() and the read/execute side uses "preempt_disable()/preempt_enable()". We make sure the previous array containing probes, which has been scheduled for deletion by the rcu callback, is indeed freed before we proceed to the next update. It therefore limits the rate of modification of a single tracepoint to one update per RCU period. The objective here is to permit fast batch add/removal of probes on _different_ tracepoints. Changelog : - Use #name ":" #proto as string to identify the tracepoint in the tracepoint table. This will make sure not type mismatch happens due to connexion of a probe with the wrong type to a tracepoint declared with the same name in a different header. - Add tracepoint_entry_free_old. - Change __TO_TRACE to get rid of the 'i' iterator. Masami Hiramatsu <mhiramat@redhat.com> : Tested on x86-64. Performance impact of a tracepoint : same as markers, except that it adds about 70 bytes of instructions in an unlikely branch of each instrumented function (the for loop, the stack setup and the function call). It currently adds a memory read, a test and a conditional branch at the instrumentation site (in the hot path). Immediate values will eventually change this into a load immediate, test and branch, which removes the memory read which will make the i-cache impact smaller (changing the memory read for a load immediate removes 3-4 bytes per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also saves the d-cache hit). About the performance impact of tracepoints (which is comparable to markers), even without immediate values optimizations, tests done by Hideo Aoki on ia64 show no regression. His test case was using hackbench on a kernel where scheduler instrumentation (about 5 events in code scheduler code) was added. Quoting Hideo Aoki about Markers : I evaluated overhead of kernel marker using linux-2.6-sched-fixes git tree, which includes several markers for LTTng, using an ia64 server. While the immediate trace mark feature isn't implemented on ia64, there is no major performance regression. So, I think that we don't have any issues to propose merging marker point patches into Linus's tree from the viewpoint of performance impact. I prepared two kernels to evaluate. The first one was compiled without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS. I downloaded the original hackbench from the following URL: http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c I ran hackbench 5 times in each condition and calculated the average and difference between the kernels. The parameter of hackbench: every 50 from 50 to 800 The number of CPUs of the server: 2, 4, and 8 Below is the results. As you can see, major performance regression wasn't found in any case. Even if number of processes increases, differences between marker-enabled kernel and marker- disabled kernel doesn't increase. Moreover, if number of CPUs increases, the differences doesn't increase either. Curiously, marker-enabled kernel is better than marker-disabled kernel in more than half cases, although I guess it comes from the difference of memory access pattern. * 2 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 4.811 | 4.872 | +0.061 | +1.27 | 100 | 9.854 | 10.309 | +0.454 | +4.61 | 150 | 15.602 | 15.040 | -0.562 | -3.6 | 200 | 20.489 | 20.380 | -0.109 | -0.53 | 250 | 25.798 | 25.652 | -0.146 | -0.56 | 300 | 31.260 | 30.797 | -0.463 | -1.48 | 350 | 36.121 | 35.770 | -0.351 | -0.97 | 400 | 42.288 | 42.102 | -0.186 | -0.44 | 450 | 47.778 | 47.253 | -0.526 | -1.1 | 500 | 51.953 | 52.278 | +0.325 | +0.63 | 550 | 58.401 | 57.700 | -0.701 | -1.2 | 600 | 63.334 | 63.222 | -0.112 | -0.18 | 650 | 68.816 | 68.511 | -0.306 | -0.44 | 700 | 74.667 | 74.088 | -0.579 | -0.78 | 750 | 78.612 | 79.582 | +0.970 | +1.23 | 800 | 85.431 | 85.263 | -0.168 | -0.2 | -------------------------------------------------------------- * 4 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.586 | 2.584 | -0.003 | -0.1 | 100 | 5.254 | 5.283 | +0.030 | +0.56 | 150 | 8.012 | 8.074 | +0.061 | +0.76 | 200 | 11.172 | 11.000 | -0.172 | -1.54 | 250 | 13.917 | 14.036 | +0.119 | +0.86 | 300 | 16.905 | 16.543 | -0.362 | -2.14 | 350 | 19.901 | 20.036 | +0.135 | +0.68 | 400 | 22.908 | 23.094 | +0.186 | +0.81 | 450 | 26.273 | 26.101 | -0.172 | -0.66 | 500 | 29.554 | 29.092 | -0.461 | -1.56 | 550 | 32.377 | 32.274 | -0.103 | -0.32 | 600 | 35.855 | 35.322 | -0.533 | -1.49 | 650 | 39.192 | 38.388 | -0.804 | -2.05 | 700 | 41.744 | 41.719 | -0.025 | -0.06 | 750 | 45.016 | 44.496 | -0.520 | -1.16 | 800 | 48.212 | 47.603 | -0.609 | -1.26 | -------------------------------------------------------------- * 8 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.094 | 2.072 | -0.022 | -1.07 | 100 | 4.162 | 4.273 | +0.111 | +2.66 | 150 | 6.485 | 6.540 | +0.055 | +0.84 | 200 | 8.556 | 8.478 | -0.078 | -0.91 | 250 | 10.458 | 10.258 | -0.200 | -1.91 | 300 | 12.425 | 12.750 | +0.325 | +2.62 | 350 | 14.807 | 14.839 | +0.032 | +0.22 | 400 | 16.801 | 16.959 | +0.158 | +0.94 | 450 | 19.478 | 19.009 | -0.470 | -2.41 | 500 | 21.296 | 21.504 | +0.208 | +0.98 | 550 | 23.842 | 23.979 | +0.137 | +0.57 | 600 | 26.309 | 26.111 | -0.198 | -0.75 | 650 | 28.705 | 28.446 | -0.259 | -0.9 | 700 | 31.233 | 31.394 | +0.161 | +0.52 | 750 | 34.064 | 33.720 | -0.344 | -1.01 | 800 | 36.320 | 36.114 | -0.206 | -0.57 | -------------------------------------------------------------- Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: 'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
53167a3e |
|
02-Oct-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
9e94cd32 |
|
15-Aug-2008 |
Andi Kleen <ak@linux.intel.com> |
Move sysctl check into debugging section and don't make it default y I noticed that sysctl_check.o was the largest object file in a allnoconfig build in kernel/*. 36243 0 0 36243 8d93 kernel/sysctl_check.o This is because it was default y and && EMBEDDED. But I don't really see a need for a non kernel developer to have their sysctls checked all the time. So move the Kconfig into the kernel debugging section and also drop the default y and the EMBEDDED check. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0b0de144 |
|
04-Aug-2008 |
Robert P. J. Day <rpjday@crashcourse.ca> |
Kconfig: Extend "menuconfig" for modules to simplify Kconfig file Given that the init/Kconfig file uses a "menuconfig" directive for modules already, might as well wrap all the submenu entries in an "if" to toss all those dependencies. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
775a7229 |
|
15-Jul-2008 |
jkacur <jkacur@gmail.com> |
Kconfig/init: change help text to match default value Change the "If unsure" message to match the default value. Signed-off-by: John Kacur <jkacur at gmail dot com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
12d2b8f9 |
|
06-Jul-2008 |
Heikki Orsila <heikki.orsila@iki.fi> |
kconfig: fix typos: "Suport" -> "Support" Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
37a4c940 |
|
18-Jun-2008 |
S.Çağlar Onur <caglar@pardus.org.tr> |
init: fix URL of "The GNU Accounting Utilities" Following patch corrects URL of "The GNU Accounting Utilities" in init/Kconfig. Noticed by: Bart Van Assche" <bart.vanassche@gmail.com> Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
baabaae9 |
|
08-Jul-2008 |
Johannes Berg <johannes@sipsolutions.net> |
make CONFIG_KMOD invisible ... as preparation for removing it completely, make it an invisible bool defaulting to yes. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
f7f5b675 |
|
22-Jul-2008 |
Denys Vlasenko <vda.linux@googlemail.com> |
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs module.c and module.h conatains code for finding exported symbols which are declared with EXPORT_UNUSED_SYMBOL, and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway (because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then). This patch adds required #ifdefs. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
ee7e5516 |
|
29-Jun-2008 |
Dmitry Baryshkov <dbaryshkov@gmail.com> |
generic: per-device coherent dma allocator Currently x86_32, sh and cris-v32 provide per-device coherent dma memory allocator. However their implementation is nearly identical. Refactor out common code to be reused by them. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
73531905 |
|
25-May-2008 |
Sam Ravnborg <sam@ravnborg.org> |
Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST init/Kconfig contains a list of configs that are searched for if 'make *config' are used with no .config present. Extend this list to look at the config identified by ARCH_DEFCONFIG. With this change we now try the defconfig targets last. This fixes a regression reported by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com>
|
#
91e37a79 |
|
09-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
module: don't ignore vermagic string if module doesn't have modversions Linus found a logic bug: we ignore the version number in a module's vermagic string if we have CONFIG_MODVERSIONS set, but modversions also lets through a module with no __versions section for modprobe --force (with tainting, but still). We should only ignore the start of the vermagic string if the module actually *has* crcs to check. Rather than (say) having an entertaining hissy fit and creating a config option to work around the buggy code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e5e1d3cb |
|
06-May-2008 |
Stas Sergeev <stsp@aknet.ru> |
pcspkr: fix dependancies fix pcspkr dependancies: make the pcspkr platform drivers to depend on a platform device, and not the other way around. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> CC: Vojtech Pavlik <vojtech@suse.cz> CC: Michael Opdenacker <michael-lists@free-electrons.com> [fixed for 2.6.26-rc1 by tiwai] Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
#
aac6abca |
|
03-May-2008 |
Parag Warudkar <parag.warudkar@gmail.com> |
sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED GROUP_SCHED is confirmed to cause unacceptable latencies, see: http://lkml.org/lkml/2008/5/2/370. Mark it EXPERIMENTAL and default to no for now. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a5574cf6 |
|
05-May-2008 |
Ingo Molnar <mingo@elte.hu> |
sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select. the next change utilizes it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
826e4506 |
|
04-May-2008 |
Linus Torvalds <torvalds@linux-foundation.org> |
Make forced module loading optional The kernel module loader used to be much too happy to allow loading of modules for the wrong kernel version by default. For example, if you had MODVERSIONS enabled, but tried to load a module with no version info, it would happily load it and taint the kernel - whether it was likely to actually work or not! Generally, such forced module loading should be considered a really really bad idea, so make it conditional on a new config option (MODULE_FORCE_LOAD), and make it default to off. If somebody really wants to force module loads, that's their problem, but we should not encourage it. Especially as it happened to me by mistake (ie regular unversioned Fedora modules getting loaded) causing lots of strange behavior. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f6acb635 |
|
29-Apr-2008 |
Christoph Lameter <clameter@sgi.com> |
slub: #ifdef simplification If we make SLUB_DEBUG depend on SYSFS then we can simplify some #ifdefs and avoid others. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
#
88f458e4 |
|
29-Apr-2008 |
Holger Schurig <hs4233@mail.mn-solutions.de> |
sysctl: allow embedded targets to disable sysctl_check.c Disable sysctl_check.c for embedded targets. This saves about about 11 kB in .text and another 11 kB in .data on a PXA255 embedded platform. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cf475ad2 |
|
29-Apr-2008 |
Balbir Singh <balbir@linux.vnet.ibm.com> |
cgroups: add an owner to the mm_struct Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com>, Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
08ce5f16 |
|
29-Apr-2008 |
Serge E. Hallyn <serue@us.ibm.com> |
cgroups: implement device whitelist Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a *:* mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
418d7d87 |
|
29-Apr-2008 |
Paul Menage <menage@google.com> |
CGroup API files: make CGROUP_DEBUG default to off The cgroup debug subsystem isn't generally useful for users. It should default to "n". Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f17a32e9 |
|
29-Apr-2008 |
Adrian Bunk <bunk@kernel.org> |
let LOG_BUF_SHIFT default to 17 16 kB is often no longer enough for a normal boot of an UP system. And even less when people e.g. use suspend. 17 seems to be a more reasonable default for current kernels on current hardware (it's just the default, anyone who is memory limited can still lower it). Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
96fffeb4 |
|
27-Apr-2008 |
Ingo Molnar <mingo@elte.hu> |
make CC_OPTIMIZE_FOR_SIZE non-experimental this option has been the default on a wide range of distributions for a long time - time to make it non-experimental. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
09337f50 |
|
26-Apr-2008 |
David S. Miller <davem@davemloft.net> |
sparc64: Kill CONFIG_SPARC32_COMPAT It's completely superfluous, CONFIG_COMPAT is sufficient. What this used to be is an umbrella for enabling code shared by all 32-bit compat binary support types. But with the removal of SunOS and Solaris support, the only one left is Linux 32-bit ELF. Update defconfig. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9b158fe |
|
19-Apr-2008 |
Viktor Radnai <viktor.radnai@gmail.com> |
sched: better rt-group documentation Viktor was nice enough to enhance the document based on my replies to his questions on the subject. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0f389ec6 |
|
14-Apr-2008 |
Christoph Lameter <clameter@sgi.com> |
slub: No need for per node slab counters if !SLUB_DEBUG The per node counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable counters for the number of slabs and the number of total slabs if !SLUB_DEBUG. Incrementing the per node counters is also accessing a potentially contended cacheline so this could actually be a performance benefit to embedded systems. SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which is on by default). Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab() if the system is not compiled with NUMA support. [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
#
21bbb39c |
|
10-Mar-2008 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: move PREEMPT_RCU config option back under PREEMPT The original preemptible-RCU patch put the choice between classic and preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to init/Kconfig, which worked, but placed the choice between classic and preemptible RCU at the top level, a very obtuse choice indeed. This patch changes from the Kconfig "choice" mechanism to a pair of booleans, only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in kernel/Kconfig.preempt, where one would expect it to be. The other (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for suggesting this approach. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
00f0b825 |
|
04-Mar-2008 |
Balbir Singh <balbir@linux.vnet.ibm.com> |
Memory controller: rename to Memory Resource Controller Rename Memory Controller to Memory Resource Controller. Reflect the same changes in the CONFIG definition for the Memory Resource Controller. Group together the config options for Resource Counters and Memory Resource Controller. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d47846c5 |
|
04-Mar-2008 |
Ingo Molnar <mingo@elte.hu> |
sysfs: CONFIG_SYSFS_DEPRECATED fix CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes regressions in working setups that had SYSFS_DEPRECATED disabled. so rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new default via 'make oldconfig', even if their old .config's disabled CONFIG_SYSFS_DEPRECATED ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
024440d2 |
|
03-Mar-2008 |
Greg Kroah-Hartman <gregkh@suse.de> |
driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED As things get moved into this config option, the hard date of 2006 does not work anymore, so update the text to be more descriptive. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
0835ab53 |
|
23-Feb-2008 |
Andi Kleen <andi@firstfloor.org> |
cgroup memory controller: document huge memory/cache overhead in Kconfig Document huge memory/cache overhead of memory controller in Kconfig I was a little surprised that 2.6.25-rc* increased struct page for the memory controller. At least on many x86-64 machines it will not fit into a single cache line now anymore and also costs considerable amounts of RAM. At earlier review I remembered asking for a external data structure for this. It's also quite unobvious that a innocent looking Kconfig option with a single line Kconfig description has such a negative effect. This patch attempts to document these disadvantages at least so that users configuring their kernel can make a informed decision. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
052f1dc7 |
|
13-Feb-2008 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
sched: rt-group: make rt groups scheduling configurable Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
166124fd |
|
09-Feb-2008 |
Ingo Molnar <mingo@elte.hu> |
brk: help text typo fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
74bd59bb |
|
08-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
namespaces: cleanup the code managed with PID_NS option Just like with the user namespaces, move the namespace management code into the separate .c file and mark the (already existing) PID_NS option as "depend on NAMESPACES" [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
aee16ce7 |
|
08-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
namespaces: cleanup the code managed with the USER_NS option Make the user_namespace.o compilation depend on this option and move the init_user_ns into user.c file to make the kernel compile and work without the namespaces support. This make the user namespace code be organized similar to other namespaces'. Also mask the USER_NS option as "depend on NAMESPACES". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ae5e1b22 |
|
08-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
namespaces: move the IPC namespace under IPC_NS option Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
58bfdd6d |
|
08-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
namespaces: move the UTS namespace under UTS_NS option Currently all the namespace management code is in the kernel/utsname.c file, so just compile it out and make stubs in the appropriate header. The init namespace itself is in init/version.c and is in the kernel all the time. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c5289a69 |
|
08-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
namespaces: add the NAMESPACES config option The option is selectable if EMBEDDED is chosen only. When the EMBEDDED is off namespaces will be on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8cdea7c0 |
|
07-Feb-2008 |
Balbir Singh <balbir@linux.vnet.ibm.com> |
Memory controller: cgroups setup Setup the memory cgroup and add basic hooks and controls to integrate and work with the cgroup. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e552b661 |
|
07-Feb-2008 |
Pavel Emelianov <xemul@openvz.org> |
Memory controller: resource counters With fixes from David Rientjes <rientjes@google.com> Introduce generic structures and routines for resource accounting. Each resource accounting cgroup is supposed to aggregate it, cgroup_subsystem_state and its resource-specific members within. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
32a93233 |
|
06-Feb-2008 |
Ingo Molnar <mingo@elte.hu> |
brk randomization: introduce CONFIG_COMPAT_BRK based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
37291458 |
|
04-Feb-2008 |
Matt Mackall <mpm@selenic.com> |
slob: correct Kconfig description Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1e883281 |
|
04-Feb-2008 |
Matt Mackall <mpm@selenic.com> |
maps4: make page monitoring /proc file optional Make /proc/ page monitoring configurable This puts the following files under an embedded config option: /proc/pid/clear_refs /proc/pid/smaps /proc/pid/pagemap /proc/kpagecount /proc/kpageflags [akpm@linux-foundation.org: Kconfig fix] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f79c343e |
|
04-Feb-2008 |
Davide Libenzi <davidel@xmailserver.org> |
timerfd: un-break CONFIG_TIMERFD Remove the broken status to CONFIG_TIMERFD. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
125e5645 |
|
02-Feb-2008 |
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> |
Move Kconfig.instrumentation to arch/Kconfig and init/Kconfig Move the instrumentation Kconfig to arch/Kconfig for architecture dependent options - oprofile - kprobes and init/Kconfig for architecture independent options - profiling - markers Remove the "Instrumentation Support" menu. Everything moves to "General setup". Delete the kernel/Kconfig.instrumentation file. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
fb32e03f |
|
02-Feb-2008 |
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> |
Create arch/Kconfig Puts the content of arch/Kconfig in the "General setup" menu. Linus: > Should it come with a re-duplication of it's content into each > architecture, which was the case previously ? The oprofile and kprobes > menu entries were litteraly cut and pasted from one architecture to > another. Should we put its content in init/Kconfig then ? I don't think it's a good idea to go back to making it per-architecture, although that extensive "depends on <list-of-archiectures-here>" might indicate that there certainly is room for cleanup there. And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I just think it's wrong to (a) lump the code together when it really doesn't necessarily need to and (b) show it to users as some kind of choice that is tied together (whether it then has common code or not). On the per-architecture side, I do think it would be better to *not* have internal architecture knowledge in a generic file, and as such a line like depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32 really shouldn't exist in a file like kernel/Kconfig.instrumentation. It would be much better to do depends on ARCH_SUPPORTS_KPROBES in that generic file, and then architectures that do support it would just have a bool ARCH_SUPPORTS_KPROBES default y in *their* architecture files. That would seem to be much more logical, and is readable both for arch maintainers *and* for people who have no clue - and don't care - about which architecture is supposed to support which interface... Sam Ravnborg: Stuff it into a new file: arch/Kconfig We can then extend this file to include all the 'trailing' Kconfig things that are anyway equal for all ARCHs. But it should be kept clean - so if we introduce such a file then we should use ARCH_HAS_whatever in the arch specific Kconfig files to enable stuff that is not shared. [...] The above suggestion is actually not exactly the best way to do it... First the naming.. A quick grep shows following usage today (in Kconfig files) ARCH_HAS 51 ARCH_SUPPORTS 4 HAVE_ARCH 7 ARCH_HAS is the clear winner. In the common Kconfig file do: config FOO depends on ARCH_HAS_FOO bool "bla bla" config ARCH_HAS_FOO def_bool n In the arch specific Kconfig file in a suitable place do: config SUITABLE_OPTION select ARCH_HAS_FOO The naming of ARCH_HAS_ is fixed and shall be: ARCH_HAS_<config option it will enable> Only a single line added pr. architecture. And we will end up with a (maybe even commented) list of trivial selects. - Yet another update : Moving to HAVE_* now. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
09503105 |
|
31-Jan-2008 |
Paul E. McKenney <paulmck@kernel.org> |
RCU: add help text for "RCU implementation type" This patch supplies help text for the "RCU implementation type" kernel configuration choice. Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
80daa560 |
|
13-Jan-2008 |
Roman Zippel <zippel@linux-m68k.org> |
kconfig: use environment option Use the environment option to provide the ARCH symbol and the KERNELVERSION symbol. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
1322b9de |
|
10-Nov-2007 |
Yuichi Nakamura <ynakam@hitachisoft.jp> |
sh: syscall audit support. Support syscall auditing.. Signed-off-by: Yuichi Nakamura <ynakam@hitachisoft.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
#
e260be67 |
|
25-Jan-2008 |
Paul E. McKenney <paulmck@kernel.org> |
Preempt-RCU: implementation This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9148fe87 |
|
31-Dec-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
sysfs: make SYSFS_DEPRECATED depend on SYSFS Make SYSFS_DEPRECATED depend on SYSFS since files that check CONFIG_SYSFS_DEPRECATED don't check for CONFIG_SYSFS first. Also don't prompt user about SYSFS_DEPRECATED if SYSFS=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
158a9624 |
|
02-Jan-2008 |
Linus Torvalds <torvalds@woody.linux-foundation.org> |
Unify /proc/slabinfo configuration Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d842de87 |
|
02-Dec-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
sched: cpu accounting controller (V2) Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
529a73fb |
|
22-Nov-2007 |
Mike Frysinger <michael.frysinger@analog.com> |
Blackfin arch: punt CONFIG_BFIN -- we already have CONFIG_BLACKFIN Signed-off-by: Mike Frysinger <michael.frysinger@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
|
#
57d5f66b |
|
14-Nov-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
pidns: Place under CONFIG_EXPERIMENTAL This is my trivial patch to swat innumerable little bugs with a single blow. After some intensive review (my apologies for not having gotten to this sooner) what we have looks like a good base to build on with the current pid namespace code but it is not complete, and it is still much to simple to find issues where the kernel does the wrong thing outside of the initial pid namespace. Until the dust settles and we are certain we have the ABI and the implementation is as correct as humanly possible let's keep process ID namespaces behind CONFIG_EXPERIMENTAL. Allowing us the option of fixing any ABI or other bugs we find as long as they are minor. Allowing users of the kernel to avoid those bugs simply by ensuring their kernel does not have support for multiple pid namespaces. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Adrian Bunk <bunk@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Kir Kolyshkin <kir@swsoft.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cfb52856 |
|
14-Nov-2007 |
Andrew Morton <akpm@linux-foundation.org> |
revert "Task Control Groups: example CPU accounting subsystem" Revert 62d0df64065e7c135d0002f069444fbdfc64768f. This was originally intended as a simple initial example of how to create a control groups subsystem; it wasn't intended for mainline, but I didn't make this clear enough to Andrew. The CFS cgroup subsystem now has better functionality for the per-cgroup usage accounting (based directly on CFS stats) than the "usage" status file in this patch, and the "load" status file is rather simplistic - although having a per-cgroup load average report would be a useful feature, I don't believe this patch actually provides it. If it gets into the final 2.6.24 we'd probably have to support this interface for ever. Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8ef93cf1 |
|
24-Oct-2007 |
Ingo Molnar <mingo@elte.hu> |
sched: mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL. All bugs have been fixed and it's perfect ;-) Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
74c3cbe3 |
|
22-Jul-2007 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] audit: watching subtrees New kind of audit rule predicates: "object is visible in given subtree". The part that can be sanely implemented, that is. Limitations: * if you have hardlink from outside of tree, you'd better watch it too (or just watch the object itself, obviously) * if you mount something under a watched tree, tell audit that new chunk should be added to watched subtrees * if you umount something in a watched tree and it's still mounted elsewhere, you will get matches on events happening there. New command tells audit to recalculate the trees, trimming such sources of false positives. Note that it's _not_ about path - if something mounted in several places (multiple mount, bindings, different namespaces, etc.), the match does _not_ depend on which one we are using for access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
68318b8e |
|
19-Oct-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
Hook up group scheduler with control groups Enable "cgroup" (formerly containers) based fair group scheduling. This will let administrator create arbitrary groups of tasks (using "cgroup" pseudo filesystem) and control their cpu bandwidth usage. [akpm@linux-foundation.org: fix cpp condition] Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
858d72ea |
|
19-Oct-2007 |
Serge E. Hallyn <serue@us.ibm.com> |
cgroups: implement namespace tracking subsystem When a task enters a new namespace via a clone() or unshare(), a new cgroup is created and the task moves into it. This version names cgroups which are automatically created using cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or cloned process. (Thanks Pavel for the idea) This is safe because if the process unshares again, it will create /cgroups/(...)/node_<pid>/node_<pid> The only possibilities (AFAICT) for a -EEXIST on unshare are 1. pid wraparound 2. a process fails an unshare, then tries again. Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare() succeed. Changelog: Name cloned cgroups as "node_<pid>". [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
006cb992 |
|
19-Oct-2007 |
Paul Menage <menage@google.com> |
Task Control Groups: simple task cgroup debug info subsystem This example subsystem exports debugging information as an aid to diagnosing refcount leaks, etc, in the cgroup framework. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
62d0df64 |
|
19-Oct-2007 |
Paul Menage <menage@google.com> |
Task Control Groups: example CPU accounting subsystem This example demonstrates how to use the generic cgroup subsystem for a simple resource tracker that counts, for the processes in a cgroup, the total CPU time used and the %CPU used in the last complete 10 second interval. Portions contributed by Balbir Singh <balbir@in.ibm.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8793d854 |
|
19-Oct-2007 |
Paul Menage <menage@google.com> |
Task Control Groups: make cpusets a client of cgroups Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ddbcc7e8 |
|
19-Oct-2007 |
Paul Menage <menage@google.com> |
Task Control Groups: basic task cgroup framework Generic Process Control Groups -------------------------- There have recently been various proposals floating around for resource management/accounting and other task grouping subsystems in the kernel, including ResGroups, User BeanCounters, NSProxy cgroups, and others. These all need the basic abstraction of being able to group together multiple processes in an aggregate, in order to track/limit the resources permitted to those processes, or control other behaviour of the processes, and all implement this grouping in different ways. This patchset provides a framework for tracking and grouping processes into arbitrary "cgroups" and assigning arbitrary state to those groupings, in order to control the behaviour of the cgroup as an aggregate. The intention is that the various resource management and virtualization/cgroup efforts can also become task cgroup clients, with the result that: - the userspace APIs are (somewhat) normalised - it's easier to test e.g. the ResGroups CPU controller in conjunction with the BeanCounters memory controller, or use either of them as the resource-control portion of a virtual server system. - the additional kernel footprint of any of the competing resource management systems is substantially reduced, since it doesn't need to provide process grouping/containment, hence improving their chances of getting into the kernel This patch: Add the main task cgroups framework - the cgroup filesystem, and the basic structures for tracking membership and associating subsystem state objects to tasks. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e98c3202 |
|
17-Oct-2007 |
Avi Kivity <avi@qumranet.com> |
Move PREEMPT_NOTIFIERS into an always-included Kconfig Kconfig.preempt is not included on some archs (for example, m68k). On those archs, the Kconfig machinery complains that KVM selects an undefined symbol PREEMPT_NOTIFIERS (which lives in Kconfig.preempt). So move the offending symbol into a Kconfig file which is included by everyone. Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ac0e5b7a |
|
16-Oct-2007 |
Mel Gorman <mel@csn.ul.ie> |
remove PAGE_GROUP_BY_MOBILITY Grouping pages by mobility can be disabled at compile-time. This was considered undesirable by a number of people. However, in the current stack of patches, it is not a simple case of just dropping the configurable patch as it would cause merge conflicts. This patch backs out the configuration option. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b92a6edd |
|
16-Oct-2007 |
Mel Gorman <mel@csn.ul.ie> |
Add a configure option to group pages by mobility The grouping mechanism has some memory overhead and a more complex allocation path. This patch allows the strategy to be disabled for small memory systems or if it is known the workload is suffering because of the strategy. It also acts to show where the page groupings strategy interacts with the standard buddy allocator. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fb615581 |
|
15-Oct-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
sched: group scheduler, fix coding style issues Fix coding style issues reported by Randy Dunlap and others Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
de8d585a |
|
15-Oct-2007 |
Ingo Molnar <mingo@elte.hu> |
sched: enable CONFIG_FAIR_GROUP_SCHED=y by default enable CONFIG_FAIR_GROUP_SCHED=y by default. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7ed2be45 |
|
15-Oct-2007 |
Ingo Molnar <mingo@elte.hu> |
sched: fair-group sched, cleanups fair-group sched, cleanups. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
24e377a8 |
|
15-Oct-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
sched: add fair-user scheduler Enable user-id based fair group scheduling. This is useful for anyone who wants to test the group scheduler w/o having to enable CONFIG_CGROUPS. A separate scheduling group (i.e struct task_grp) is automatically created for every new user added to the system. Upon uid change for a task, it is made to move to the corresponding scheduling group. A /proc tunable (/proc/root_user_share) is also provided to tune root user's quota of cpu bandwidth. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9b5b7751 |
|
15-Oct-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
sched: clean up code under CONFIG_FAIR_GROUP_SCHED With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
29f59db3 |
|
15-Oct-2007 |
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> |
sched: group-scheduler core Add interface to control cpu bandwidth allocation to task-groups. (not yet configurable, due to missing CONFIG_CONTAINERS) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
#
e4260197 |
|
18-Sep-2007 |
Andrew Morton <akpm@linux-foundation.org> |
disable sys_timerfd() for 2.6.23 There is still some confusion and disagreement over what this interface should actually do. So it is best that we disable it in 2.6.23 until we get that fully sorted out. (sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we assume that nobody is using it yet). Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Davide Libenzi <davidel@xmailserver.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ff0cfc66 |
|
31-Jul-2007 |
Al Boldi <a1426z@gawab.com> |
Kconfig: remove top level menu "Code maturity level options" Remove the top level menu "Code maturity level options", and moves its options into menu "General setup". This makes Kconfig less cluttered and easier to setup. Signed-off-by: Al Boldi <a1426z@gawab.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
448e3cee |
|
31-Jul-2007 |
Adrian Bunk <bunk@stusta.de> |
ANON_INODES shouldn't be user visible There doesn't seem to be a good reason for ANON_INODES being an user visible option. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
32582fa4 |
|
24-Jul-2007 |
Paul Mundt <lethal@linux-sh.org> |
sh: Add sh to the CC_OPTIMIZE_FOR_SIZE dependencies. Presently we only use this with CONFIG_EXPERIMENTAL, but it is something that can be supported commonly. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
#
a0acd820 |
|
17-Jul-2007 |
Christoph Lameter <clameter@sgi.com> |
Make SLUB the default allocator There are some reports that 2.6.22 has SLUB as the default. Not true! This will make SLUB the default for 2.6.23. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
acce292c |
|
16-Jul-2007 |
Cedric Le Goater <clg@fr.ibm.com> |
user namespace: add the framework Basically, it will allow a process to unshare its user_struct table, resetting at the same time its own user_struct and all the associated accounting. A new root user (uid == 0) is added to the user namespace upon creation. Such root users have full privileges and it seems that theses privileges should be controlled through some means (process capabilities ?) The unshare is not included in this patch. Changes since [try #4]: - Updated get_user_ns and put_user_ns to accept NULL, and get_user_ns to return the namespace. Changes since [try #3]: - moved struct user_namespace to files user_namespace.{c,h} Changes since [try #2]: - removed struct user_namespace* argument from find_user() Changes since [try #1]: - removed struct user_namespace* argument from find_user() - added a root_user per user namespace Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Andrew Morgan <agm@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7d69a1f4 |
|
16-Jul-2007 |
Cedric Le Goater <clg@fr.ibm.com> |
remove CONFIG_UTS_NS and CONFIG_IPC_NS CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only deactivate the unshare of the uts and ipc namespaces and do not improve performance. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
66da5733 |
|
16-Jul-2007 |
Jan Engelhardt <jengelh@linux01.gwdg.de> |
Use menuconfig objects II - module menu Change menuconfig objects from "menu, config" into "menuconfig" so that the user can disable the whole feature without entering its menu first. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
84a01c2f |
|
16-Jul-2007 |
Paul Mundt <lethal@linux-sh.org> |
slob: sparsemem support Currently slob is disabled if we're using sparsemem, due to an earlier patch from Goto-san. Slob and static sparsemem work without any trouble as it is, and the only hiccup is a missing slab_is_available() in the case of sparsemem extreme. With this, we're rid of the last set of restrictions for slob usage. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5f5c926e |
|
09-Jul-2007 |
Jan Engelhardt <jengelh@linux01.gwdg.de> |
block/Kconfig already has its own "menuconfig" so remove these "menu, endmenu" that did not get cleaned up in the block patch [ http://lkml.org/lkml/2007/4/10/251 ] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
9fbf09a0 |
|
16-May-2007 |
Christoph Lameter <clameter@sgi.com> |
SLUB: Remove depends on EXPERIMENTAL and !ARCH_USES_SLAB_PAGE_STRUCT No arch sets ARCH_USES_SLAB_PAGE_STRUCT anymore. Remove the experimental dependency as well since we want to have it as a real alternative to SLAB. It all comes down to killing a single line from init/Kconfig. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
afc0cedb |
|
16-May-2007 |
Nick Piggin <nickpiggin@yahoo.com.au> |
slob: implement RCU freeing The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because even on UP, RCU freeing semantics are not equivalent to simply freeing immediately. This also allows SLOB to be used on SMP. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e1ad7468 |
|
10-May-2007 |
Davide Libenzi <davidel@xmailserver.org> |
signal/timer/event: eventfd core This is a very simple and light file descriptor, that can be used as event wait/dispatch by userspace (both wait and dispatch) and by the kernel (dispatch only). It can be used instead of pipe(2) in all cases where those would simply be used to signal events. Their kernel overhead is much lower than pipes, and they do not consume two fds. When used in the kernel, it can offer an fd-bridge to enable, for example, functionalities like KAIO or syslets/threadlets to signal to an fd the completion of certain operations. But more in general, an eventfd can be used by the kernel to signal readiness, in a POSIX poll/select way, of interfaces that would otherwise be incompatible with it. The API is: int eventfd(unsigned int count); The eventfd API accepts an initial "count" parameter, and returns an eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2). The POLLIN flag is raised when the internal counter is greater than zero. The POLLOUT flag is raised when at least a value of "1" can be written to the internal counter. The POLLERR flag is raised when an overflow in the counter value is detected. The write(2) operation can never overflow the counter, since it blocks (unless O_NONBLOCK is set, in which case -EAGAIN is returned). But the eventfd_signal() function can do it, since it's supposed to not sleep during its operation. The read(2) function reads the __u64 counter value, and reset the internal value to zero. If the value read is equal to (__u64) -1, an overflow happened on the internal counter (due to 2^64 eventfd_signal() posts that has never been retired - unlickely, but possible). The write(2) call writes an __u64 count value, and adds it to the current counter. The eventfd fd supports O_NONBLOCK also. On the kernel side, we have: struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, unsigned int n); The eventfd_fget() should be called to get a struct file* from an eventfd fd (this is an fget() + check of f_op being an eventfd fops pointer). The kernel can then call eventfd_signal() every time it wants to post an event to userspace. The eventfd_signal() function can be called from any context. An eventfd() simple test and bench is available here: http://www.xmailserver.org/eventfd-bench.c This is the eventfd-based version of pipetest-4 (pipe(2) based): http://www.xmailserver.org/pipetest-4.c Not that performance matters much in the eventfd case, but eventfd-bench shows almost as double as performance than pipetest-4. [akpm@linux-foundation.org: fix i386 build] [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b215e283 |
|
10-May-2007 |
Davide Libenzi <davidel@xmailserver.org> |
signal/timer/event: timerfd core This patch introduces a new system call for timers events delivered though file descriptors. This allows timer event to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too. The system call is defined as: int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr); The "ufd" parameter allows for re-use (re-programming) of an existing timerfd w/out going through the close/open cycle (same as signalfd). If "ufd" is -1, s new file descriptor will be created, otherwise the existing "ufd" will be re-programmed. The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time specified in the "utmr->it_value" parameter is the expiry time for the timer. If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time, otherwise it's a relative time. If the time specified in the "utmr->it_interval" is not zero (.tv_sec == 0, tv_nsec == 0), this is the period at which the following ticks should be generated. The "utmr->it_interval" should be set to zero if only one tick is requested. Setting the "utmr->it_value" to zero will disable the timer, or will create a timerfd without the timer enabled. The function returns the new (or same, in case "ufd" is a valid timerfd descriptor) file, or -1 in case of error. As stated before, the timerfd file descriptor supports poll(2), select(2) and epoll(2). When a timer event happened on the timerfd, a POLLIN mask will be returned. The read(2) call can be used, and it will return a u32 variable holding the number of "ticks" that happened on the interface since the last call to read(2). The read(2) call supportes the O_NONBLOCK flag too, and EAGAIN will be returned if no ticks happened. A quick test program, shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fba2afaa |
|
10-May-2007 |
Davide Libenzi <davidel@xmailserver.org> |
signal/timer/event: signalfd core This patch series implements the new signalfd() system call. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;) Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor if created with the following API: int signalfd(int ufd, const sigset_t *mask, size_t masksize); The "ufd" parameter allows to change an existing signalfd sigmask, w/out going to close/create cycle (Linus idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" allows to specify the signal mask of signals that we are interested in. The "masksize" parameter is the size of "mask". The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can use used together with epoll(2) too. The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied in the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached, has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available. If the size of the buffer passed to read(2) is lower than sizeof(struct signalfd_siginfo), -EINVAL is returned. A read from the signalfd can also return -ERESTARTSYS in case a signal hits the process. The format of the struct signalfd_siginfo is, and the valid fields depends of the (->code & __SI_MASK) value, in the same way a struct siginfo would: struct signalfd_siginfo { __u32 signo; /* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_fd */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint; /* si_int */ __u64 svptr; /* si_ptr */ __u64 utime; /* si_utime */ __u64 stime; /* si_stime */ __u64 addr; /* si_addr */ }; [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5dc8bf81 |
|
10-May-2007 |
Davide Libenzi <davidel@xmailserver.org> |
signal/timer/event fds: anonymous inode source This patch add an anonymous inode source, to be used for files that need and inode only in order to create a file*. We do not care of having an inode for each file, and we do not even care of having different names in the associated dentries (dentry names will be same for classes of file*). This allow code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d4751a27 |
|
10-May-2007 |
Christoph Lameter <clameter@sgi.com> |
SLUB: SLUB_DEBUG must depend on SLUB Otherwise people get asked about SLUB_DEBUG even if they have another slab allocator enabled. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
34013886 |
|
09-May-2007 |
Christoph Lameter <clameter@sgi.com> |
Fix spellings of slab allocator section in init/Kconfig Fix some of the spelling issues. Fix sentences. Discourage SLOB use since SLUB can pack objects denser. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
41ecc55b |
|
09-May-2007 |
Christoph Lameter <clameter@sgi.com> |
SLUB: add CONFIG_SLUB_DEBUG CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components of SLUB. Thus SLUB will be able to replace SLOB. SLUB can arrange objects in a denser way than SLOB and the code size should be minimal without debugging and sysfs support. Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG. CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB. SLUB enables debugging via a boot parameter. SLUB debug code should always be present. CONFIG_SLUB_DEBUG can be modified in the embedded config section. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b0e37650 |
|
08-May-2007 |
Robert P. J. Day <rpjday@mindspring.com> |
Kconfig: Remove reference to external mqueue library Remove the reference to an external mqueue library since that was merged into glibc in 2004. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
3dde6ad8 |
|
08-May-2007 |
David Sterba <dave@jikos.cz> |
Fix trivial typos in Kconfig* files Fix several typos in help text in Kconfig* files. Signed-off-by: David Sterba <dave@jikos.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
794543a2 |
|
08-May-2007 |
Alistair John Strachan <s0348365@sms.ed.ac.uk> |
Move LOG_BUF_SHIFT to a more sensible place Several people have observed that perhaps LOG_BUF_SHIFT should be in a more obvious place than under DEBUG_KERNEL. Under some circumstances (such as the PARISC architecture), DEBUG_KERNEL can increase kernel size, which is an undesirable trade off for something as trivial as increasing the kernel log buffer size. Instead, move LOG_BUF_SHIFT into "General Setup", so that people are more likely to be able to change it such a circumstance that the default buffer size is insufficient. Signed-off-by: Alistair John Strachan <s0348365@sms.ed.ac.uk> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1394f032 |
|
06-May-2007 |
Bryan Wu <bryan.wu@analog.com> |
blackfin architecture This adds support for the Analog Devices Blackfin processor architecture, and currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561 (Dual Core) devices, with a variety of development platforms including those avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT), and Bluetechnix! Tinyboards. The Blackfin architecture was jointly developed by Intel and Analog Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in December of 2000. Since then ADI has put this core into its Blackfin processor family of devices. The Blackfin core has the advantages of a clean, orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art signal processing engine and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction-set architecture. The Blackfin architecture, including the instruction set, is described by the ADSP-BF53x/BF56x Blackfin Processor Programming Reference http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf The Blackfin processor is already supported by major releases of gcc, and there are binary and source rpms/tarballs for many architectures at: http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete documentation, including "getting started" guides available at: http://docs.blackfin.uclinux.org/ which provides links to the sources and patches you will need in order to set up a cross-compiling environment for bfin-linux-uclibc This patch, as well as the other patches (toolchain, distribution, uClibc) are actively supported by Analog Devices Inc, at: http://blackfin.uclinux.org/ We have tested this on LTP, and our test plan (including pass/fails) can be found at: http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files] Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Aubrey Li <aubrey.li@analog.com> Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
81819f0f |
|
06-May-2007 |
Christoph Lameter <clameter@sgi.com> |
SLUB core This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation. A. Management of object queues A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up. B. Storage overhead of object queues SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contain a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues. C. SLAB meta data overhead SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this. D. SLAB has a complex cache reaper SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs. E. SLAB has complex NUMA policy layer support SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLABs application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next. F. Reduction of the size of partial slab lists SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocator occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation. G. Tunables SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the uses of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order the less motions of pages between per CPU and partial lists occur and the better SLUB will be scaling. G. Slab merging We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partial allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up. Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those. H. Diagnostics The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slab_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced. I. Resiliency If basic sanity checks are on then SLUB is capable of detecting common error conditions and recover as best as possible to allow the system to continue. J. Tracing Tracing can be enabled via the slab_debug=T,<slabcache> option during boot. SLUB will then protocol all actions on that slabcache and dump the object contents on free. K. On demand DMA cache creation. Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated. L. Performance increase Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases. Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator SLUB Boot options slub_nomerge Disable merging of slabs slub_min_order=x Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead. slub_min_objects=x Mininum objects per slab. Default is 8. slub_max_order=x Avoid generating slabs larger than order specified. slub_debug Enable all diagnostics for all caches slub_debug=<options> Enable selective options for all caches slub_debug=<o>,<cache> Enable selective options for a certain set of caches Available Debug options F Double Free checking, sanity and resiliency R Red zoning P Object / padding poisoning U Track last free / alloc T Trace all allocs / frees (only use for individual slabs). To use SLUB: Apply this patch and then select SLUB as the default slab allocator. [hugh@veritas.com: fix an oops-causing locking error] [akpm@linux-foundation.org: various stupid cleanups and small fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6e5a5420 |
|
01-May-2007 |
Robert P. J. Day <rpjday@mindspring.com> |
kbuild: clarify the creation of the LOCALVERSION_AUTO string. Clarify the creation of the LOCALVERSION_AUTO string during kernel configuration, and fix a couple typoes while we're there. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
f991633d |
|
06-Mar-2007 |
Dimitri Gorokhovik <dimitri.gorokhovik@free.fr> |
[PATCH] initramfs should not depend on CONFIG_BLOCK initramfs ended up depending on BLOCK: INITRAMFS_SOURCE <-- BLK_DEV_INITRD <-- BLOCK This inhibits use of customized-initramfs-over-ramfs without block layer (ramfs would still be enabled), useful in embedded applications. Move BLK_DEV_INITRD out of 'drivers/block/Kconfig' and into 'init/Kconfig', make it unconditional. Signed-off-by: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a5494dcd |
|
14-Feb-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: move SYSV IPC sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the ipc logic to together it makes the code a little more readable. [gcoady.lk@gmail.com: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Grant Coady <gcoady.lk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
18f705f4 |
|
10-Feb-2007 |
Alexey Dobriyan <adobriyan@gmail.com> |
[PATCH] Move TASK_XACCT, TASK_IO_ACCOUNTING up in menus Since they depends on TASKSTATS, it would be nice to move them closer to another options depending on TASKSTATS. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c33df4ea |
|
10-Feb-2007 |
Jean-Paul Saman <jean-paul.saman@nxp.com> |
[PATCH] disable init/initramfs.c The file init/initramfs.c is always compiled and linked in the kernel vmlinux even when BLK_DEV_RAM and BLK_DEV_INITRD are disabled and the system isn't using any form of an initramfs or initrd. In this situation the code is only used to unpack a (static) default initial rootfilesystem. The current init/initramfs.c code. usr/initramfs_data.o compiles to a size of ~15 kbytes. Disabling BLK_DEV_RAM and BLK_DEV_INTRD shrinks the kernel code size with ~60 Kbytes. This patch avoids compiling in the code and data for initramfs support if CONFIG_BLK_DEV_INITRD is not defined. Instead of the initramfs code and data it uses a small routine in init/noinitramfs.c to setup an initial static default environment for mounting a rootfilesystem later on in the kernel initialisation process. The new code is: 164 bytes of size. The patch is separated in two parts: 1) doesn't compile initramfs code when CONFIG_BLK_DEV_INITRD is not set 2) changing all plaforms vmlinux.lds.S files to not reserve an area of PAGE_SIZE when CONFIG_BLK_DEV_INITRD is not set. [deweerdt@free.fr: warning fix] Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
561ccd3a |
|
22-Dec-2006 |
Yasunori Goto <y-goto@jp.fujitsu.com> |
[PATCH] handle SLOB with sparsemen This is to disallow to make SLOB with SMP or SPARSEMEM. This avoids latent troubles of SLOB with SLAB_DESTROY_BY_RCU. And fix compile error. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2aea4fb6 |
|
22-Dec-2006 |
Paul Jackson <pj@sgi.com> |
[PATCH] CONFIG_VM_EVENT_COUNTER comment decrustify The VM event counters, enabled by CONFIG_VM_EVENT_COUNTERS, which provides VM event counters in /proc/vmstat, has become more essential to non-EMBEDDED kernel configurations than they were in the past. Comments in the code and the Kconfig configuration explanation were stale, downplaying their role excessively. Refresh those comments to correctly reflect the current role of VM event counters. Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
979c6a1e |
|
12-Dec-2006 |
Jesper Juhl <jesper.juhl@gmail.com> |
Kconfig: fix spelling error in config KALLSYMS help text Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-By: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
7c3ab738 |
|
10-Dec-2006 |
Andrew Morton <akpm@osdl.org> |
[PATCH] io-accounting: core statistics The present per-task IO accounting isn't very useful. It simply counts the number of bytes passed into read() and write(). So if a process reads 1MB from an already-cached file, it is accused of having performed 1MB of I/O, which is wrong. (David Wright had some comments on the applicability of the present logical IO accounting: For billing purposes it is useless but for workload analysis it is very useful read_bytes/read_calls average read request size write_bytes/write_calls average write request size read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing write_bytes/write_blocks ie logical/physical guess since pdflush writes can be missed I often look for logical larger than physical to see filesystem cache problems. And the bytes/cpusec can help find applications that are dominating the cache and causing slow interactive response from page cache contention. I want to find the IO intensive applications and make sure they are doing efficient IO. Thus the acctcms(sysV) or csacms command would give the high IO commands). This patchset adds new accounting which tries to be more accurate. We account for three things: reads: attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems. I also attempt to wire up NFS and CIFS. writes: attempt to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time. The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been accounted as having caused 1MB of write. So... cancelled_writes: account the number of bytes which this process caused to not happen, by truncating pagecache. We _could_ just subtract this from the process's `write' accounting. But that means that some processes would be reported to have done negative amounts of write IO, which is silly. So we just report the raw number and punt this decision up to userspace. Now, we _could_ account for writes at the physical I/O level. But - This would require that we track memory-dirtying tasks at the per-page level (would require a new pointer in struct page). - It would mean that IO statistics for a process are usually only available long after that process has exitted. Which means that we probably cannot communicate this info via taskstats. This patch: Wire up the kernel-private data structures and the accessor functions to manipulate them. Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
88a22c98 |
|
14-Sep-2006 |
Kay Sievers <kay.sievers@vrfy.org> |
CONFIG_SYSFS_DEPRECATED Provide a way to support older versions of udev that are shipped in older distros. If this option is disabled, it will also turn off the compatible symlinks in sysfs that older programs might rely on. When in doubt, or if running a distro older than 2006, say Yes here. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
13bb7e37 |
|
08-Nov-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: Undeprecate sys_sysctl The basic issue is that despite have been deprecated and warned about as a very bad thing in the man pages since its inception there are a few real users of sys_sysctl. It was my assumption that because sysctl had been deprecated for all of 2.6 there would be no user space users by this point, so I initially gave sys_sysctl a very short deprecation period. Now that I know there are a few real users the only sane way to proceed with deprecation is to push the time limit out to a year or two work and work with distributions that have big testing pools like fedora core to find these last remaining users. Which means that the sys_sysctl interface needs to be maintained in the meantime. Since I have provided a technical measure that allows us to add new sysctl entries without reserving more binary numbers I believe that is enough to fix the sys_sysctl binary interface maintenance problems, because there is no longer a need to change the binary interface at all. Since the sys_sysctl implementation needs to stay around for a while and the worst of the maintenance issues that caused us to occasionally break the ABI have been addressed I don't see any advantage in continuing with the removal of sys_sysctl. So instead of merely increasing the deprecation period this patch removes the deprecation of sys_sysctl and modifies the kernel to compile the code in by default. With committing to maintain sys_sysctl we get all of the advantages of a fast interface for anything that needs it. Currently sys_sysctl is about 5x faster than /proc/sys, for the same string data. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b2670eac |
|
20-Oct-2006 |
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> |
[PATCH] uml: use DEFCONFIG_LIST to avoid reading host's config This should make sure that, for UML, host's configuration files are not considered, which avoids various pains to the user. Our dependency are such that the obtained Kconfig will be valid and will lead to successful compilation - however they cannot prevent an user from disabling any boot device, and if an option is not set in the read .config (say /boot/config-XXX), with make menuconfig ARCH=um, it is not set. This always disables UBD and all console I/O channels, which leads to non-working UML kernels, so this bothers users - especially now, since it will happen on almost every machine (/boot/config-`uname -r` exists almost on every machine). It can be workarounded with make defconfig ARCH=um, but it is non-obvious and can be avoided, so please _do_ merge this patch. Given the existence of options, it could be interesting to implement (additionally) "option required" - with it, Kconfig will refuse reading a .config file (from wherever it comes) if the given option is not set. With this, one could mark with it the option characteristic of the given architecture (it was an old proposal of Roman Zippel, when I pointed out our problem): config UML option required default y However this should be further discussed: *) for x86, it must support constructs like: ==arch/i386/Kconfig== config 64BIT option required default n where Kconfig must require that CONFIG_64BIT is disabled or not present in the read .config. *) do we want to do such checks only for the starting defconfig or also for .config? Which leads to: *) I may want to port a x86_64 .config to x86 and viceversa, or even among more different archs. Should that be allowed, and in which measure (the user may force skipping the check for a .config or it is only given a warning by default)? Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <kbuild-devel@lists.sourceforge.net> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
25b21cb2 |
|
02-Oct-2006 |
Kirill Korotaev <dev@openvz.org> |
[PATCH] IPC namespace core This patch set allows to unshare IPCs and have a private set of IPC objects (sem, shm, msg) inside namespace. Basically, it is another building block of containers functionality. This patch implements core IPC namespace changes: - ipc_namespace structure - new config option CONFIG_IPC_NS - adds CLONE_NEWIPC flag - unshare support [clg@fr.ibm.com: small fix for unshare of ipc namespace] [akpm@osdl.org: build fix] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4865ecf1 |
|
02-Oct-2006 |
Serge E. Hallyn <serue@us.ibm.com> |
[PATCH] namespaces: utsname: implement utsname namespaces This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9acc1853 |
|
01-Oct-2006 |
Jay Lan <jlan@engr.sgi.com> |
[PATCH] csa: Extended system accounting over taskstats Add extended system accounting handling over taskstats interface. A CONFIG_TASK_XACCT flag is created to enable the extended accounting code. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0847062a |
|
01-Oct-2006 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] fix EMBEDDED + SYSCTL menu SYSCTL should still depend on EMBEDDED. This unbreaks the EMBEDDED menu (from the recent SYSCTL_SYCALL menu option patch). Fix typos in new SYSCTL_SYSCALL menu. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f2443ab6 |
|
01-Oct-2006 |
Ross Biro <rossb@google.com> |
[PATCH] allow /proc/config.gz to be built as a module The driver for /proc/config.gz consumes rather a lot of memory and it is in fact possible to build it as a module. In some ways this is a bit risky, because the .config which is used for compiling kernel/configs.c isn't necessarily the same as the .config which was used to build vmlinux. But OTOH the potential memory savings are decent, and it'd be fairly dumb to build your configs.o with a different .config. Signed-off-by: Andrew Morton <akpm@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9361401e |
|
30-Sep-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] BLOCK: Make it possible to disable the block layer [try #6] Make it possible to disable the block layer. Not all embedded devices require it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require the block layer to be present. This patch does the following: (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev support. (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls an item that uses the block layer. This includes: (*) Block I/O tracing. (*) Disk partition code. (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS. (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the block layer to do scheduling. Some drivers that use SCSI facilities - such as USB storage - end up disabled indirectly from this. (*) Various block-based device drivers, such as IDE and the old CDROM drivers. (*) MTD blockdev handling and FTL. (*) JFFS - which uses set_bdev_super(), something it could avoid doing by taking a leaf out of JFFS2's book. (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is, however, still used in places, and so is still available. (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and parts of linux/fs.h. (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK. (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK. (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK is not enabled. (*) fs/no-block.c is created to hold out-of-line stubs and things that are required when CONFIG_BLOCK is not set: (*) Default blockdev file operations (to give error ENODEV on opening). (*) Makes some /proc changes: (*) /proc/devices does not list any blockdevs. (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK. (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK. (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if given command other than Q_SYNC or if a special device is specified. (*) In init/do_mounts.c, no reference is made to the blockdev routines if CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2. (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return error ENOSYS by way of cond_syscall if so). (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if CONFIG_BLOCK is not set, since they can't then happen. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b89a8171 |
|
27-Sep-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: Allow /proc/sys without sys_sysctl Since sys_sysctl is deprecated start allow it to be compiled out. This should catch any remaining user space code that cares, and paves the way for further sysctl cleanups. [akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
ae81f9e3 |
|
16-Sep-2006 |
Chuck Ebbert <76306.1226@compuserve.com> |
[PATCH] Kconfig: move CONFIG_EMBEDDED options to submenu Fix two problems with the CONFIG_EMBEDDED submenu: (1) The menu was split in two by the rt_mutex patch, which moved half the items into the "General setup" menu. (2) CONFIG_SYSCTL and CONFIG_UID16 were added to the main menu instead of the submenu. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6f44993f |
|
14-Jul-2006 |
Shailabh Nagar <nagar@watson.ibm.com> |
[PATCH] per-task-delay-accounting: delay accounting usage of taskstats interface Usage of taskstats interface by delay accounting. Signed-off-by: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c757249a |
|
14-Jul-2006 |
Shailabh Nagar <nagar@watson.ibm.com> |
[PATCH] per-task-delay-accounting: taskstats interface Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC family), for getting statistics of tasks and thread groups during their lifetime and when they exit. The interface is intended for use by multiple accounting packages though it is being created in the context of delay accounting. This patch creates the interface without populating the fields of the data that is sent to the user in response to a command or upon the exit of a task. Each accounting package interested in using taskstats has to provide an additional patch to add its stats to the common structure. [akpm@osdl.org: cleanups, Kconfig fix] Signed-off-by: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
ca74e92b |
|
14-Jul-2006 |
Shailabh Nagar <nagar@watson.ibm.com> |
[PATCH] per-task-delay-accounting: setup Initialization code related to collection of per-task "delay" statistics which measure how long it had to wait for cpu, sync block io, swapping etc. The collection of statistics and the interface are in other patches. This patch sets up the data structures and allows the statistics collection to be disabled through a kernel boot parameter. Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dd673bca |
|
30-Jun-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] UML: fix the INIT_ENV_ARG_LIMIT dependencies Fix the INIT_ENV_ARG_LIMIT dependencies to what seems to have been intended. Spotted by Jean-Luc Leger. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f8891e5e |
|
30-Jun-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] Light weight event counters The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only loose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows a building of linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single instruction increment instruction being emitted (i386, x86_64) or in the increment being hidden though instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use. References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
23f78d4a |
|
27-Jun-2006 |
Ingo Molnar <mingo@elte.hu> |
[PATCH] pi-futex: rt mutex core Core functions for the rt-mutex subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c38bfdc8 |
|
26-Jun-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] x86_64: Move VM86 config into arch/i386/Kconfig Architecture specific configs like this have no business at all in init/Kconfig. This prevents it from being set on x86-64 Pointed out by H.Peter Anvin Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
eab98702 |
|
25-Jun-2006 |
H. Peter Anvin <hpa@zytor.com> |
[PATCH] Make sysctl obligatory except under CONFIG_EMBEDDED Make makes sysctl non-optional unless EMBEDDED is set. There are a number of interfaces exposed via sysctl, enough that it has to be considered core kernel functionality at this point. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f368c07d |
|
07-Apr-2006 |
Amy Griffis <amy.griffis@hp.com> |
[PATCH] audit: path-based rules In this implementation, audit registers inotify watches on the parent directories of paths specified in audit rules. When audit's inotify event handler is called, it updates any affected rules based on the filesystem event. If the parent directory is renamed, removed, or its filesystem is unmounted, audit removes all rules referencing that inotify watch. To keep things simple, this implementation limits location-based auditing to the directory entries in an existing directory. Given a path-based rule for /foo/bar/passwd, the following table applies: passwd modified -- audit event logged passwd replaced -- audit event logged, rules list updated bar renamed -- rule removed foo renamed -- untracked, meaning that the rule now applies to the new location Audit users typically want to have many rules referencing filesystem objects, which can significantly impact filtering performance. This patch also adds an inode-number-based rule hash to mitigate this situation. The patch is relative to the audit git tree: http://kernel.org/git/?p=linux/kernel/git/viro/audit-current.git;a=summary and uses the inotify kernel API: http://lkml.org/lkml/2006/6/1/145 Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
face4374 |
|
08-Jun-2006 |
Roman Zippel <zippel@linux-m68k.org> |
kconfig: add defconfig_list/module option This makes it possible to change two options which were hardcoded sofar. 1. Any symbol can now take the role of CONFIG_MODULES 2. The more useful option is to change the list of default file names, which kconfig uses to load the base configuration if .config isn't available. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
6f18a022 |
|
08-May-2006 |
David Woodhouse <dwmw2@infradead.org> |
Finally remove the obnoxious inter_module_xxx() This was already a bad plan when I argued against adding it in the first place. Good riddance. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
#
102e41fd |
|
17-Apr-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] i386: Move CONFIG_DOUBLEFAULT into arch/i386 where it belongs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e39632fa |
|
10-Apr-2006 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] menu: relocate DOUBLEFAULT option Move the DOUBLEFAULT option from the top-level menu to the EMBEDDED menu. Only applicable to X86_32. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8d3b33f6 |
|
25-Mar-2006 |
Rusty Russell <rusty@rustcorp.com.au> |
[PATCH] Remove MODULE_PARM MODULE_PARM was actually breaking: recent gcc version optimize them out as unused. It's time to replace the last users, which are generally in the most unloved drivers anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b86ff981 |
|
23-Mar-2006 |
Jens Axboe <axboe@suse.de> |
[PATCH] relay: migrate from relayfs to a generic relay API Original patch from Paul Mundt, sysfs parts removed by me since they were broken. Signed-off-by: Jens Axboe <axboe@suse.de>
|
#
8cab77a2 |
|
08-Mar-2006 |
Adrian Bunk <bunk@stusta.de> |
Kconfig: remove the CONFIG_CC_ALIGN_* options I don't see any use case for the CONFIG_CC_ALIGN_* options: - they are only available if EMBEDDED - people using EMBEDDED will most likely also enable CC_OPTIMIZE_FOR_SIZE - the default for -Os is to disable alignment In case someone is doing performance comparisons and discovers that the default settings gcc chooses aren't good, the only sane thing is to discuss whether it makes sense to change this, not through offering options to change this locally. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
99f6d61b |
|
07-Feb-2006 |
Stephen Smalley <sds@tycho.nsa.gov> |
[PATCH] selinux: require AUDIT Make SELinux depend on AUDIT as it requires the basic audit support to log permission denials at all. Note that AUDITSYSCALL remains optional for SELinux, although it can be useful in providing further information upon denials. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3636641b |
|
03-Feb-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] don't allow users to set CONFIG_BROKEN=y Do not allow people to create configurations with CONFIG_BROKEN=y. The sole reason for CONFIG_BROKEN=y would be if you are working on fixing a broken driver, but in this case editing the Kconfig file is trivial. Never ever should a user enable CONFIG_BROKEN. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
fd279197 |
|
16-Jan-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] build kernel/intermodule.c only when required Build kernel/intermodule.c only when required. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
92c3504e |
|
14-Jan-2006 |
Jesper Juhl <juhl-lkml@dif.dk> |
Spelling fix in init/Kconfig for the help of CONFIG_SWAP Trivial spelling fix s/socalled/so called/ Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
4092bdeb |
|
11-Jan-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] i386: Move DOUBLEFAULT config to arch/i386/Kconfig It has no business being elsewhere and x86-64 doesn't need/want it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2308acca |
|
09-Jan-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] "tiny-make-id16-support-optional" fixes It seems the "make UID16 support optional" patch was checked when it edited the -tiny tree some time ago, but it wasn't checked whether it still matches the current situation when it was submitted for inclusion in -mm. This patch fixes the following bugs: - ARCH_S390X does no longer exist, nowadays this has to be expressed through (S390 && 64BIT) - in five architecture specific Kconfig files the UID16 options weren't removed Additionally, it changes the fragile negative dependencies of UID16 to positive dependencies (new architectures are more likely to not require UID16 support). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
64ca9004 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] Make vm86 support optional This adds an option to remove vm86 support under CONFIG_EMBEDDED. Saves about 5k. This version eliminates most of the #ifdefs of the previous version and instead uses function stubs in vm86.h. Also, release_vm86_irqs is moved from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs can live together. $ size vmlinux-baseline vmlinux-novm86 text data bss dec hex filename 2920821 523232 190652 3634705 377611 vmlinux-baseline 2916268 523100 190492 3629860 376324 vmlinux-novm86 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
708e9a79 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] tiny: Configure ELF core dump support configurable support for ELF core dumps text data bss dec hex filename 3330172 529036 190556 4049764 3dcb64 vmlinux-baseline 3325552 528912 190556 4045020 3db8dc vmlinux-no-elf add/remove: 0/8 grow/shrink: 0/0 up/down: 0/-4424 (-4424) function old new delta fill_note 32 - -32 maydump 58 - -58 dump_seek 67 - -67 writenote 180 - -180 elf_dump_thread_status 274 - -274 fill_psinfo 308 - -308 fill_prstatus 466 - -466 elf_core_dump 3039 - -3039 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e585e470 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] tiny: Make *[ug]id16 support optional Configurable 16-bit UID and friends support This allows turning off the legacy 16 bit UID interfaces on embedded platforms. text data bss dec hex filename 3330172 529036 190556 4049764 3dcb64 vmlinux-baseline 3328268 529040 190556 4047864 3dc3f8 vmlinux From: Adrian Bunk <bunk@stusta.de> UID16 was accidentially disabled for !EMBEDDED. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
22c4e308 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] tiny: Make x86 doublefault handling optional This adds configurable support for doublefault reporting on x86 add/remove: 0/3 grow/shrink: 0/1 up/down: 0/-13048 (-13048) function old new delta cpu_init 846 786 -60 doublefault_fn 188 - -188 doublefault_stack 4096 - -4096 doublefault_tss 8704 - -8704 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
10cef602 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] slob: introduce the SLOB allocator configurable replacement for slab allocator This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is disabled, the kernel falls back to using the 'SLOB' allocator. SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer, similar to the original Linux kmalloc allocator that SLAB replaced. It's signicantly smaller code and is more memory efficient. But like all similar allocators, it scales poorly and suffers from fragmentation more than SLAB, so it's only appropriate for small systems. It's been tested extensively in the Linux-tiny tree. I've also stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not recommended). Here's a comparison for otherwise identical builds, showing SLOB saving nearly half a megabyte of RAM: $ size vmlinux* text data bss dec hex filename 3336372 529360 190812 4056544 3de5e0 vmlinux-slab 3323208 527948 190684 4041840 3dac70 vmlinux-slob $ size mm/{slab,slob}.o text data bss dec hex filename 13221 752 48 14021 36c5 mm/slab.o 1896 52 8 1956 7a4 mm/slob.o /proc/meminfo: SLAB SLOB delta MemTotal: 27964 kB 27980 kB +16 kB MemFree: 24596 kB 25092 kB +496 kB Buffers: 36 kB 36 kB 0 kB Cached: 1188 kB 1188 kB 0 kB SwapCached: 0 kB 0 kB 0 kB Active: 608 kB 600 kB -8 kB Inactive: 808 kB 812 kB +4 kB HighTotal: 0 kB 0 kB 0 kB HighFree: 0 kB 0 kB 0 kB LowTotal: 27964 kB 27980 kB +16 kB LowFree: 24596 kB 25092 kB +496 kB SwapTotal: 0 kB 0 kB 0 kB SwapFree: 0 kB 0 kB 0 kB Dirty: 4 kB 12 kB +8 kB Writeback: 0 kB 0 kB 0 kB Mapped: 560 kB 556 kB -4 kB Slab: 1756 kB 0 kB -1756 kB CommitLimit: 13980 kB 13988 kB +8 kB Committed_AS: 4208 kB 4208 kB 0 kB PageTables: 28 kB 28 kB 0 kB VmallocTotal: 1007312 kB 1007312 kB 0 kB VmallocUsed: 48 kB 48 kB 0 kB VmallocChunk: 1007264 kB 1007264 kB 0 kB (this work has been sponsored in part by CELF) From: Ingo Molnar <mingo@elte.hu> Fix 32-bitness bugs in mm/slob.c. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
347a8dc3 |
|
06-Jan-2006 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] s390: cleanup Kconfig Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X, ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by S390, 64BIT and COMPAT. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b0e15190 |
|
06-Jan-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU The attached patch makes the SYSV IPC shared memory facilities use the new ramfs facilities on a no-MMU kernel. The following changes are made: (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to allow the IPC SHM facilities to commune with the tiny-shmem and shmem code. (2) ramfs files now need resizing using do_truncate() rather than by modifying the inode size directly (see shmem_file_setup()). This causes ramfs to attempt to bind a block of pages of sufficient size to the inode. (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
712f47ce |
|
16-Nov-2005 |
Greg Kroah-Hartman <gregkh@suse.de> |
[PATCH] HOTPLUG: always enable the .config option, unless EMBEDDED With modules, dynamic /dev, and uevents, people really want CONFIG_HOTPLUG to be enabled in their kernels. If not, they can still disable it, but it is discouraged. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
0296b228 |
|
10-Nov-2005 |
Kay Sievers <kay.sievers@suse.de> |
[PATCH] remove CONFIG_KOBJECT_UEVENT option It makes zero sense to have hotplug, but not the netlink events enabled today. Remove this option and merge the kobject_uevent.h header into the kobject.h header file. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
0d541643 |
|
26-Dec-2005 |
Sam Ravnborg <sam@mars.ravnborg.org> |
kbuild: remove EXPERIMENTAL tag from Module versioning Module versioning support has been stable for a loong time so let's get rid of the EXPERIMENTAL tag. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
a9c9dff1 |
|
20-Dec-2005 |
David S. Miller <davem@sunset.davemloft.net> |
[SPARC64]: Stop putting -finline-limit=XXX into CFLAGS It was a stupid workaround for the "static inline" vs. "extern inline" issues of long ago, and it is what causes schedule() to be inlined like crazy into kernel/sched.c when -Os is specified. MIPS and S390 should probably do the same. Now CC_OPTIMIZE_FOR_SIZE can be safely used on sparc64 once more. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c45b4f1f |
|
14-Dec-2005 |
Linus Torvalds <torvalds@g5.osdl.org> |
Move size optimization option outside of EMBEDDED menu, mark it EXPERIMENTAL Also, disable on sparc64 - a number of people report breakage. Probably a compiler bug, but it's quite possible that it tickles some latent kernel problem too. It still defaults to 'y' everywhere else (when enabled through EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has been building with size optimizations for a long time on x86, x86-64, ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately experimental, but the sparc64 breakage certainly shows that it can trigger "issues". Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0910b444 |
|
13-Dec-2005 |
Linus Torvalds <torvalds@g5.osdl.org> |
Expose "Optimize for size" option for everybody Let's put my money where my mouth is. Smaller code is almost always faster, if only because a single I$ miss ends up leaving a lot of cycles to make up for. And system software - kernels in particular - are known for taking more cache misses than most other kinds. On my random config, this made the kernel about 10% smaller, and lmbench seems to say that it's pretty uniformly faster too. Your milage may vary. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
83bab9a4 |
|
12-Dec-2005 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] allow KOBJECT_UEVENT=n only if EMBEDDED KOBJECT_UEVENT=n seems to be a common pitfall for udev users in 2.6.14 . -mm already contains a bigger patch removing this option that is IMHO too big for being applied now to 2.6.15-rc. This patch simply allows KOBJECT_UEVENT=n only if EMBEDDED. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3a65dfe8 |
|
04-Nov-2005 |
Jens Axboe <axboe@suse.de> |
[BLOCK] Move all core block layer code to new block/ directory drivers/block/ is right now a mix of core and driver parts. Lets move the core parts to a new top level directory. Al will move the fs/ related block parts to block/ next. Signed-off-by: Jens Axboe <axboe@suse.de>
|
#
34ad92c2 |
|
30-Oct-2005 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] clarify help text for INIT_ENV_ARG_LIMIT Try to make the INIT_ENV_ARG_LIMIT help text more readable and understandable. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
aaebf433 |
|
31-Jul-2005 |
Ryan Anderson <ryan@michonline.com> |
[PATCH] kbuild: automatically append a short string to the version based upon the git commit If CONFIG_AUTO_LOCALVERSION is set, the user is using a git-based tree, and the current HEAD is not referred to by any tags in .git/refs/tags/, append -g and the first 8 characters of the commit to the version string. This makes it easier to use git-bisect, and/or to do a daily build, without trampling on your older, working builds, or accidentally setting up conflicting sets of modules. Signed-off-by: Ryan Anderson <ryan@michonline.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
dbec4866 |
|
10-Aug-2005 |
Sam Ravnborg <sam@mars.(none)> |
kconfig: move initramfs options to General Setup Move initramfs options from Device Drivers | Block Drivers to General Setup This is a more natural place for this option. Furthermore separate out intramfs options to usr/Kconfig Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
#
d9fd8a6d |
|
27-Jul-2005 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] kernel/cpuset.c: add kerneldoc, fix typos Add kerneldoc to kernel/cpuset.c Fix cpuset typos in init/Kconfig Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f9f97bc0 |
|
19-Jul-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] kallsyms: clarify KALLSYMS_ALL help text Clarify the KALLSYMS_ALL help text slightly. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
|
#
f7ceba36 |
|
10-Jul-2005 |
David S. Miller <davem@davemloft.net> |
[SPARC64]: Add syscall auditing support. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34a1a63e |
|
28-May-2005 |
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> |
[PATCH] uml: add modversions support Actually, the real support was added by some earlier patches. Now we simply re-enable the config. option. I've actually tested it and it works well. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
804a6a49 |
|
11-May-2005 |
Chris Wright <chrisw@osdl.org> |
Audit requires CONFIG_NET Audit now actually requires netlink. So make it depend on CONFIG_NET, and remove the inline dependencies on CONFIG_NET. Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
#
ea9c102c |
|
08-May-2005 |
David Woodhouse <dwmw2@shinybook.infradead.org> |
Add CONFIG_AUDITSC and CONFIG_SECCOMP support for ppc32 Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
#
79d20b14 |
|
03-May-2005 |
Jeff Dike <jdike@addtoit.com> |
[AUDIT] Update UML audit-syscall-{entry,exit} calls to new prototypes This patch is for -mm only. It should probably be included in git-audit, and should be forwarded to Linus iff git-audit is. It updates the audit-syscall-{entry,exit} calls to current -mm. Signed-off-by: Jeff Dike <jdike@addtoit.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
#
d59745ce |
|
01-May-2005 |
Matt Mackall <mpm@selenic.com> |
[PATCH] clean up kernel messages Arrange for all kernel printks to be no-ops. Only available if CONFIG_EMBEDDED. This patch saves about 375k on my laptop config and nearly 100k on minimal configs. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c8538a7a |
|
01-May-2005 |
Matt Mackall <mpm@selenic.com> |
[PATCH] remove all kernel BUGs This patch eliminates all kernel BUGs, trims about 35k off the typical kernel, and makes the system slightly faster. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|