#
36d4fe14 |
|
10-Apr-2024 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto Unlike most other mitigations' "auto" options, spectre_bhi=auto only mitigates newer systems, which is confusing and not particularly useful. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
|
#
5f882f3b |
|
10-Apr-2024 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
|
#
dfe64890 |
|
10-Apr-2024 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/bugs: Fix BHI documentation Fix up some inaccuracies in the BHI documentation. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
|
#
95a6ccbd |
|
11-Mar-2024 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/bhi: Mitigate KVM by default BHI mitigation mode spectre_bhi=auto does not deploy the software mitigation by default. In a cloud environment, it is a likely scenario where userspace is trusted but the guests are not trusted. Deploying system wide mitigation in such cases is not desirable. Update the auto mode to unconditionally mitigate against malicious guests. Deploy the software sequence at VMexit in auto mode also, when hardware mitigation is not available. Unlike the force =on mode, software sequence is not deployed at syscalls in auto mode. Suggested-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
ec9404e4 |
|
11-Mar-2024 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/bhi: Add BHI mitigation knob Branch history clearing software sequences and hardware control BHI_DIS_S were defined to mitigate Branch History Injection (BHI). Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation: auto - Deploy the hardware mitigation BHI_DIS_S, if available. on - Deploy the hardware mitigation BHI_DIS_S, if available, otherwise deploy the software sequence at syscall entry and VMexit. off - Turn off BHI mitigation. The default is auto mode which does not deploy the software sequence mitigation. This is because of the hardening done in the syscall dispatch path, which is the likely target of BHI. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
b75d8521 |
|
23-Mar-2024 |
Vitaly Chikunov <vt@altlinux.org> |
tracing: Fix documentation on tp_printk cmdline option kernel-parameters.txt incorrectly states that workings of kernel.tracepoint_printk sysctl depends on "tracepoint_printk kernel cmdline option", this is a bit misleading for new users since the actual cmdline option name is tp_printk. Fixes: 0daa2302968c ("tracing: Add tp_printk cmdline to have tracepoints go to printk()") Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240323231704.1217926-1-vt@altlinux.org
|
#
19f0423f |
|
23-Feb-2024 |
Huang Yiwei <quic_hyiwei@quicinc.com> |
tracing: Support to dump instance traces by ftrace_dump_on_oops Currently ftrace only dumps the global trace buffer on an OOPs. For debugging a production usecase, instance trace will be helpful to check specific problems since global trace buffer may be used for other purposes. This patch extend the ftrace_dump_on_oops parameter to dump a specific or multiple trace instances: - ftrace_dump_on_oops=0: as before -- don't dump - ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer on all CPUs - ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global trace buffer on CPU that triggered the oops - ftrace_dump_on_oops=<instance_name>: new behavior -- dump the tracing instance matching <instance_name> - ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu], <instrance2_name>[=2/orig_cpu]: new behavior -- dump the global trace buffer and multiple instance buffer on all CPUs, or only dump on CPU that triggered the oops if =2 or =orig_cpu is given Also, the sysctl node can handle the input accordingly. Link: https://lore.kernel.org/linux-trace-kernel/20240223083126.1817731-1-quic_hyiwei@quicinc.com Cc: Ross Zwisler <zwisler@google.com> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mcgrof@kernel.org> Cc: <keescook@chromium.org> Cc: <j.granados@samsung.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <corbet@lwn.net> Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
89ac522d |
|
21-Feb-2024 |
Maxime Ripard <mripard@kernel.org> |
drm/edid/firmware: Remove built-in EDIDs The EDID firmware loading mechanism introduced a few built-in EDIDs that could be forced on any connector, bypassing the EDIDs it exposes. While convenient, this limited set of EDIDs doesn't take into account the connector type, and we can end up with an EDID that is completely invalid for a given connector. For example, the edid/800x600.bin file matches the following EDID: edid-decode (hex): 00 ff ff ff ff ff ff 00 31 d8 00 00 00 00 00 00 05 16 01 03 6d 1b 14 78 ea 5e c0 a4 59 4a 98 25 20 50 54 01 00 00 45 40 01 01 01 01 01 01 01 01 01 01 01 01 01 01 a0 0f 20 00 31 58 1c 20 28 80 14 00 15 d0 10 00 00 1e 00 00 00 ff 00 4c 69 6e 75 78 20 23 30 0a 20 20 20 20 00 00 00 fd 00 3b 3d 24 26 05 00 0a 20 20 20 20 20 20 00 00 00 fc 00 4c 69 6e 75 78 20 53 56 47 41 0a 20 20 00 c2 ---------------- Block 0, Base EDID: EDID Structure Version & Revision: 1.3 Vendor & Product Identification: Manufacturer: LNX Model: 0 Made in: week 5 of 2012 Basic Display Parameters & Features: Analog display Signal Level Standard: 0.700 : 0.000 : 0.700 V p-p Blank level equals black level Sync: Separate Composite Serration Maximum image size: 27 cm x 20 cm Gamma: 2.20 DPMS levels: Standby Suspend Off RGB color display First detailed timing is the preferred timing Color Characteristics: Red : 0.6416, 0.3486 Green: 0.2919, 0.5957 Blue : 0.1474, 0.1250 White: 0.3125, 0.3281 Established Timings I & II: DMT 0x09: 800x600 60.316541 Hz 4:3 37.879 kHz 40.000000 MHz Standard Timings: DMT 0x09: 800x600 60.316541 Hz 4:3 37.879 kHz 40.000000 MHz Detailed Timing Descriptors: DTD 1: 800x600 60.316541 Hz 4:3 37.879 kHz 40.000000 MHz (277 mm x 208 mm) Hfront 40 Hsync 128 Hback 88 Hpol P Vfront 1 Vsync 4 Vback 23 Vpol P Display Product Serial Number: 'Linux #0' Display Range Limits: Monitor ranges (GTF): 59-61 Hz V, 36-38 kHz H, max dotclock 50 MHz Display Product Name: 'Linux SVGA' Checksum: 0xc2 So, an analog monitor EDID. However, if the connector was an HDMI monitor for example, it breaks the HDMI specification that requires, among other things, a digital display, the VIC 1 mode and an HDMI Forum Vendor Specific Data Block in an CTA-861 extension. We thus end up with a completely invalid EDID, which thus might confuse HDMI-related code that could parse it. After some discussions on IRC, we identified mainly two ways to fix this: - We can either create more EDIDs for each connector type to provide a built-in EDID that matches the resolution passed in the name, and still be a sensible EDID for that connector type; - Or we can just prevent the EDID to be exposed to userspace if it's built-in. Or possibly both. However, the conclusion was that maybe we just don't need the built-in EDIDs at all and we should just get rid of them. So here we are. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240221092636.691701-1-mripard@kernel.org
|
#
2e3fc6ca |
|
02-Feb-2024 |
Feng Tang <feng.tang@intel.com> |
panic: add option to dump blocked tasks in panic_print For debugging kernel panics and other bugs, there is already an option of panic_print to dump all tasks' call stacks. On today's large servers running many containers, there could be thousands of tasks or more, and this will print out huge amount of call stacks, taking a lot of time (for serial console which is main target user case of panic_print). And in many cases, only those several tasks being blocked are key for the panic, so add an option to only dump blocked tasks' call stacks. [akpm@linux-foundation.org: clarify documentation a little] Link: https://lkml.kernel.org/r/20240202132042.3609657-1-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
3fec6e59 |
|
14-Feb-2024 |
Nikhil V <quic_nprakash@quicinc.com> |
PM: hibernate: Support to select compression algorithm Currently the default compression algorithm is selected based on compile time options. Introduce a module parameter "hibernate.compressor" to override this behaviour. Different compression algorithms have different characteristics and hibernation may benefit when it uses any of these algorithms, especially when a secondary algorithm(LZ4) offers better decompression speeds over a default algorithm(LZO), which in turn reduces hibernation image restore time. Users can override the default algorithm in two ways: 1) Passing "hibernate.compressor" as kernel command line parameter. Usage: LZO: hibernate.compressor=lzo LZ4: hibernate.compressor=lz4 2) Specifying the algorithm at runtime. Usage: LZO: echo lzo > /sys/module/hibernate/parameters/compressor LZ4: echo lz4 > /sys/module/hibernate/parameters/compressor Currently LZO and LZ4 are the supported algorithms. LZO is the default compression algorithm used with hibernation. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
dfddf34a |
|
19-Jan-2024 |
Meng Li <li.meng@amd.com> |
Documentation: introduce amd-pstate preferrd core mode kernel command line options amd-pstate driver support enable/disable preferred core. Default enabled on platforms supporting amd-pstate preferred core. Disable amd-pstate preferred core with "amd_prefcore=disable" added to the kernel command line. Signed-off-by: Meng Li <li.meng@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Perry Yuan <perry.yuan@amd.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
8076fcde |
|
11-Mar-2024 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/rfds: Mitigate Register File Data Sampling (RFDS) RFDS is a CPU vulnerability that may allow userspace to infer kernel stale data previously used in floating point registers, vector registers and integer registers. RFDS only affects certain Intel Atom processors. Intel released a microcode update that uses VERW instruction to clear the affected CPU buffers. Unlike MDS, none of the affected cores support SMT. Add RFDS bug infrastructure and enable the VERW based mitigation by default, that clears the affected buffers just before exiting to userspace. Also add sysfs reporting and cmdline parameter "reg_file_data_sampling" to control the mitigation. For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
ccdec9219 |
|
22-Feb-2024 |
Xuewen Yan <xuewen.yan@unisoc.com> |
workqueue: Control intensive warning threshold through cmdline When CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel will report the work functions which violate the intensive_threshold_us repeatedly. And now, only when the violate times exceed 4 and is a power of 2, the kernel warning could be triggered. However, sometimes, even if a long work execution time occurs only once, it may cause other work to be delayed for a long time. This may also cause some problems sometimes. In order to freely control the threshold of warninging, a boot argument is added so that the user can control the warning threshold to be printed. At the same time, keep the exponential backoff to prevent reporting too much. By default, the warning threshold is 4. tj: Updated kernel-parameters.txt description. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
2ed08e4b |
|
20-Feb-2024 |
Feng Tang <feng.tang@intel.com> |
clocksource: Scale the watchdog read retries automatically On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled during boot time on about one out of 120 boot attempts: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is that for platform with a large number of CPUs, there are sporadic big or huge read latencies while reading the watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the number of online CPUs. The cCurrent code already has logic to detect and filter such high latency case by reading the watchdog twice and checking the two deltas. Due to the randomness of the latency, there is a low probabilty that the first delta (latency) is big, but the second delta is small and looks valid. The watchdog code retries the readouts by default twice, which is not necessarily sufficient for systems with a large number of CPUs. There is a command line parameter 'max_cswd_read_retries' which allows to increase the number of retries, but that's not user friendly as it needs to be tweaked per system. As the number of required retries is proportional to the number of online CPUs, this parameter can be calculated at runtime. Scale and enlarge the number of retries according to the number of online CPUs and remove the command line parameter completely. [ tglx: Massaged change log and comments ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
|
#
5c5682b9 |
|
13-Feb-2024 |
Thomas Gleixner <tglx@linutronix.de> |
x86/cpu: Detect real BSP on crash kernels When a kdump kernel is started from a crashing CPU then there is no guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel tries to online the BSP then the INIT sequence will reset the machine. There is a command line option to prevent this, but in case of nested kdump kernels this is wrong. But that command line option is not required at all because the real BSP is enumerated as the first CPU by firmware. Support for the only known system which was different (Voyager) got removed long ago. Detect whether the boot CPU APIC ID is the first APIC ID enumerated by the firmware. If the first APIC ID enumerated is not matching the boot CPU APIC ID then skip registering it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210252.348542071@linutronix.de
|
#
7f66f099 |
|
02-Dec-2023 |
Qais Yousef <qyousef@layalina.io> |
rcu: Provide a boot time parameter to control lazy RCU To allow more flexible arrangements while still provide a single kernel for distros, provide a boot time parameter to enable/disable lazy RCU. Specify: rcutree.enable_rcu_lazy=[y|1|n|0] Which also requires rcu_nocbs=all at boot time to enable/disable lazy RCU. To disable it by default at build time when CONFIG_RCU_LAZY=y, the new CONFIG_RCU_LAZY_DEFAULT_OFF can be used. Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Tested-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
60071659 |
|
26-Nov-2023 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Add EARLY flag to early-parsed kernel boot parameters Kernel boot parameters declared with early_param() are parsed before embedded parameters are extracted from initrd, and early_param() parameters are not helpful when embedded in initrd. Therefore, mark early_param() kernel boot parameters with "EARLY" in kernel-parameters.txt. The following early_param() calls declare kernel boot parameters that are undocumented: early_param("atmel.pm_modes", at91_pm_modes_select); early_param("mem_fclk_21285", early_fclk); early_param("ecc", early_ecc); early_param("cachepolicy", early_cachepolicy); early_param("nodebugmon", early_debug_disable); early_param("kfence.sample_interval", parse_kfence_early_init); early_param("additional_cpus", setup_additional_cpus); early_param("stram_pool", atari_stram_setup); early_param("disable_octeon_edac", disable_octeon_edac); early_param("rd_start", rd_start_early); early_param("rd_size", rd_size_early); early_param("coherentio", setcoherentio); early_param("nocoherentio", setnocoherentio); early_param("fadump", early_fadump_param); early_param("fadump_reserve_mem", early_fadump_reserve_mem); early_param("no_stf_barrier", handle_no_stf_barrier); early_param("no_rfi_flush", handle_no_rfi_flush); early_param("smt-enabled", early_smt_enabled); early_param("ppc_pci_reset_phbs", pci_reset_phbs_setup); early_param("ps3fb", early_parse_ps3fb); early_param("ps3flash", early_parse_ps3flash); early_param("novx", disable_vector_extension); early_param("nobp", nobp_setup_early); early_param("nospec", nospec_setup_early); early_param("possible_cpus", _setup_possible_cpus); early_param("stp", early_parse_stp); early_param("nopfault", nopfault); early_param("nmi_mode", nmi_mode_setup); early_param("sh_mv", early_parse_mv); early_param("pmb", early_pmb); early_param("hvirq", early_hvirq_major); early_param("cfi", cfi_parse_cmdline); early_param("disableapic", setup_disableapic); early_param("noapictimer", parse_disable_apic_timer); early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid); early_param("uv_memblksize", parse_mem_block_size); early_param("retbleed", retbleed_parse_cmdline); early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall); early_param("update_mptable", update_mptable_setup); early_param("alloc_mptable", parse_alloc_mptable_opt); early_param("possible_cpus", _setup_possible_cpus); early_param("lsmsi", early_parse_ls_scfg_msi); early_param("nokgdbroundup", opt_nokgdbroundup); early_param("kgdbcon", opt_kgdb_con); early_param("kasan", early_kasan_flag); early_param("kasan.mode", early_kasan_mode); early_param("kasan.vmalloc", early_kasan_flag_vmalloc); early_param("kasan.page_alloc.sample", early_kasan_flag_page_alloc_sample); early_param("kasan.page_alloc.sample.order", early_kasan_flag_page_alloc_sample_order); early_param("kasan.fault", early_kasan_fault); early_param("kasan.stacktrace", early_kasan_flag_stacktrace); early_param("kasan.stack_ring_size", early_kasan_flag_stack_ring_size); early_param("accept_memory", accept_memory_parse); early_param("page_table_check", early_page_table_check_param); sh_early_platform_init("earlytimer", &sh_cmt_device_driver); early_param_on_off("gbpages", "nogbpages", direct_gbpages, CONFIG_X86_DIRECT_GBPAGES); These are not necessarily bugs, given that some kernel boot parameters are intended for deep debugging rather than general use. This work does not cover all of the kernel boot parameters declared using cmdline_find_option() and cmdline_find_option_bool(). If these are in fact guaranteed to be early (which appears to be the case), they can be added in a later version of this patch. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Petr Malat <oss@malat.biz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <linux-doc@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
#
3e3ede49 |
|
02-Feb-2024 |
Guilherme G. Piccoli <gpiccoli@igalia.com> |
docs: Document possible_cpus parameter The number of possible CPUs is set be kernel in early boot time through some discovery mechanisms, like ACPI in x86. We have a parameter both in x86 and S390 to override that - there are some cases of BIOSes exposing more possible CPUs than the available ones, so this parameter is a good testing mechanism, but for some reason wasn't mentioned so far in the kernel parameters guide - let's fix that. Cc: Changwoo Min <changwoo@igalia.com> Signed-off-by: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240203152208.1461293-1-gpiccoli@igalia.com
|
#
29956748 |
|
02-Feb-2024 |
Borislav Petkov (AMD) <bp@alien8.de> |
x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT It was meant well at the time but nothing's using it so get rid of it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240202163510.GDZb0Zvj8qOndvFOiZ@fat_crate.local
|
#
3810da12 |
|
05-Dec-2023 |
Xin Li <xin3.li@intel.com> |
x86/fred: Add a fred= cmdline param Let command line option "fred" accept multiple options to make it easier to tweak its behavior. Currently, two options 'on' and 'off' are allowed, and the default behavior is to disable FRED. To enable FRED, append "fred=on" to the kernel command line. [ bp: Use cpu_feature_enabled(), touch ups. ] Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-9-xin3.li@intel.com
|
#
49527ca2 |
|
18-Jan-2024 |
Borislav Petkov (AMD) <bp@alien8.de> |
Documentation/kernel-parameters: Add spec_rstack_overflow to mitigations=off mitigations=off disables the SRSO mitigation too. Add it to the list. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240118163600.17857-1-bp@alien8.de
|
#
671776b3 |
|
14-Dec-2023 |
Xiongwei Song <xiongwei.song@windriver.com> |
mm/slub: unify all sl[au]b parameters with "slab_$param" Since the SLAB allocator has been removed, so we can clean up the sl[au]b_$params. With only one slab allocator left, it's better to use the generic "slab" term instead of "slub" which is an implementation detail, which is pointed out by Vlastimil Babka. For more information please see [1]. Hence, we are going to use "slab_$param" as the primary prefix. This patch is changing the following slab parameters - slub_max_order - slub_min_order - slub_min_objects - slub_debug to - slab_max_order - slab_min_order - slab_min_objects - slab_debug as the primary slab parameters for all references of them in docs and comments. But this patch won't change variables and functions inside slub as we will have wider slub/slab change. Meanwhile, "slub_$params" can also be passed by command line, which is to keep backward compatibility. Also mark all "slub_$params" as legacy. Remove the separate descriptions for slub_[no]merge, append legacy tip for them at the end of descriptions of slab_[no]merge. [1] https://lore.kernel.org/linux-mm/7512b350-4317-21a0-fab3-4101bc4d8f7a@suse.cz/ Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
f1868165 |
|
14-Dec-2023 |
Xiongwei Song <xiongwei.song@windriver.com> |
Documentation: kernel-parameters: remove noaliencache Since slab allocator has already been removed. There is no users of the noaliencache parameter, so let's remove it. Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
aefb2f2e |
|
21-Nov-2023 |
Breno Leitao <leitao@debian.org> |
x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE Step 5/10 of the namespace unification of CPU mitigations related Kconfig options. [ mingo: Converted a few more uses in comments/messages as well. ] Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ariel Miculas <amiculas@cisco.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231121160740.1249350-6-leitao@debian.org
|
#
78de91b4 |
|
16-Jan-2024 |
Youling Tang <tangyouling@kylinos.cn> |
LoongArch: Use generic interface to support crashkernel=X,[high,low] LoongArch already supports two crashkernel regions in kexec-tools, so we can directly use the common interface to support crashkernel=X,[high,low] after commit 0ab97169aa0517079b ("crash_core: add generic function to do reservation"). With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file <asm/crash_core.h>, then define CRASH_ALIGN, CRASH_ADDR_LOW_MAX and CRASH_ADDR_HIGH_MAX and in <asm/crash_core.h>; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/loongarch/Kconfig. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Besides, there are few rules need to take notice: 1) "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is specified. 2) "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed and there is enough memory to be allocated under 4G. 3) When allocating crashkernel above 4G and no "crashkernel=X,low" is specified, a 128M low memory will be allocated automatically for swiotlb bounce buffer. See Documentation/admin-guide/kernel-parameters.txt for more information. Following test cases have been performed as expected: 1) crashkernel=256M //low=256M 2) crashkernel=1G //low=1G 3) crashkernel=4G //high=4G, low=128M(default) 4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored 5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored 6) crashkernel=4G,high //high=4G, low=128M(default) 7) crashkernel=256M,low //low=0M, invalid 8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M 9) crashkernel=4G,high crashkernel=4G,low //high=0M, low=0M, invalid 10) crashkernel=512M@2560M //low=512M 11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M Recommended usage in general: 1) In the case of small memory: crashkernel=512M 2) In the case of large memory: crashkernel=1024M,high crashkernel=128M,low Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
#
323925ed |
|
20-Dec-2023 |
Andrew Jones <ajones@ventanamicro.com> |
RISC-V: paravirt: Add skeleton for pv-time support Add the files and functions needed to support paravirt time on RISC-V. Also include the common code needed for the first application of pv-time, which is steal-time. In the next patches we'll complete the functions to fully enable steal-time support. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
|
#
2678fd2f |
|
07-Dec-2023 |
Alexander Graf <graf@amazon.com> |
initramfs: Expose retained initrd as sysfs file When the kernel command line option "retain_initrd" is set, we do not free the initrd memory. However, we also don't expose it to anyone for consumption. That leaves us in a weird situation where the only user of this feature is ppc64 and arm64 specific kexec tooling. To make it more generally useful, this patch adds a kobject to the firmware object that contains the initrd context when "retain_initrd" is set. That way, we can access the initrd any time after boot from user space and for example hand it into kexec as --initrd parameter if we want to reboot the same initrd. Or inspect it directly locally. With this patch applied, there is a new /sys/firmware/initrd file when the kernel was booted with an initrd and "retain_initrd" command line option is set. Signed-off-by: Alexander Graf <graf@amazon.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20231207235654.16622-1-graf@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
5a1ccf0c |
|
27-Oct-2023 |
Hardik Gajjar <hgajjar@de.adit-jv.com> |
usb: new quirk to reduce the SET_ADDRESS request timeout This patch introduces a new USB quirk, USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT, which modifies the timeout value for the SET_ADDRESS request. The standard timeout for USB request/command is 5000 ms, as recommended in the USB 3.2 specification (section 9.2.6.1). However, certain scenarios, such as connecting devices through an APTIV hub, can lead to timeout errors when the device enumerates as full speed initially and later switches to high speed during chirp negotiation. In such cases, USB analyzer logs reveal that the bus suspends for 5 seconds due to incorrect chirp parsing and resumes only after two consecutive timeout errors trigger a hub driver reset. Packet(54) Dir(?) Full Speed J(997.100 us) Idle( 2.850 us) _______| Time Stamp(28 . 105 910 682) _______|_____________________________________________________________Ch0 Packet(55) Dir(?) Full Speed J(997.118 us) Idle( 2.850 us) _______| Time Stamp(28 . 106 910 632) _______|_____________________________________________________________Ch0 Packet(56) Dir(?) Full Speed J(399.650 us) Idle(222.582 us) _______| Time Stamp(28 . 107 910 600) _______|_____________________________________________________________Ch0 Packet(57) Dir Chirp J( 23.955 ms) Idle(115.169 ms) _______| Time Stamp(28 . 108 532 832) _______|_____________________________________________________________Ch0 Packet(58) Dir(?) Full Speed J (Suspend)( 5.347 sec) Idle( 5.366 us) _______| Time Stamp(28 . 247 657 600) _______|_____________________________________________________________Ch0 This 5-second delay in device enumeration is undesirable, particularly in automotive applications where quick enumeration is crucial (ideally within 3 seconds). The newly introduced quirks provide the flexibility to align with a 3-second time limit, as required in specific contexts like automotive applications. By reducing the SET_ADDRESS request timeout to 500 ms, the system can respond more swiftly to errors, initiate rapid recovery, and ensure efficient device enumeration. This change is vital for scenarios where rapid smartphone enumeration and screen projection are essential. To use the quirk, please write "vendor_id:product_id:p" to /sys/bus/usb/drivers/hub/module/parameter/quirks For example, echo "0x2c48:0x0132:p" > /sys/bus/usb/drivers/hub/module/parameters/quirks" Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20231027152029.104363-2-hgajjar@de.adit-jv.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
a3a27827 |
|
14-Dec-2023 |
Vlastimil Babka <vbabka@suse.cz> |
Documentation, mm/unaccepted: document accept_memory kernel parameter The accept_memory kernel parameter was added in commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") but not listed in the kernel-parameters doc. Add it there. Acked-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20231214-accept_memory_param-v2-1-f38cd20a0247@suse.cz
|
#
4e58aaee |
|
01-Nov-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Restrict access to RCU CPU stall notifiers Although the RCU CPU stall notifiers can be useful for dumping state when tracking down delicate forward-progress bugs where NUMA effects cause cache lines to be delivered to a given CPU regularly, but always in a state that prevents that CPU from making forward progress. These bugs can be detected by the RCU CPU stall-warning mechanism, but in some cases, the stall-warnings printk()s disrupt the forward-progress bug before any useful state can be obtained. Unfortunately, the notifier mechanism added by commit 5b404fdabacf ("rcu: Add RCU CPU stall notifier") can make matters worse if used at all carelessly. For example, if the stall warning was caused by a lock not being released, then any attempt to acquire that lock in the notifier will hang. This will prevent not only the notifier from producing any useful output, but it will also prevent the stall-warning message from ever appearing. This commit therefore hides this new RCU CPU stall notifier mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text contains a warning and explains the dangers of careless use, recommending lockless notifier code. In addition, a WARN() is triggered each time that an attempt is made to register a stall-warning notifier in kernels built with CONFIG_RCU_CPU_STALL_NOTIFIER=y. This combination of measures will keep use of this mechanism confined to debug kernels and away from routine deployments. [ paulmck: Apply Dan Carpenter feedback. ] Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
|
#
5e0a760b |
|
28-Dec-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
c986968f |
|
07-Nov-2023 |
Javier Martinez Canillas <javierm@redhat.com> |
regulator: core: Add option to prevent disabling unused regulators This may be useful for debugging and develompent purposes, when there are drivers that depend on regulators to be enabled but do not request them. It is inspired from the clk_ignore_unused and pd_ignore_unused parameters, that are used to keep firmware-enabled clocks and power domains on even if these are not used by drivers. The parameter is not expected to be used in normal cases and should not be needed on a platform with proper driver support. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Brian Masney <bmasney@redhat.com> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20231107190926.1185326-1-javierm@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org>
|
#
c76c067e |
|
28-Sep-2023 |
Niklas Schnelle <schnelle@linux.ibm.com> |
s390/pci: Use dma-iommu layer While s390 already has a standard IOMMU driver and previous changes have added I/O TLB flushing operations this driver is currently only used for user-space PCI access such as vfio-pci. For the DMA API s390 instead utilizes its own implementation in arch/s390/pci/pci_dma.c which drives the same hardware and shares some code but requires a complex and fragile hand over between DMA API and IOMMU API use of a device and despite code sharing still leads to significant duplication and maintenance effort. Let's utilize the common code DMAP API implementation from drivers/iommu/dma-iommu.c instead allowing us to get rid of arch/s390/pci/pci_dma.c. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-3-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
5b9d31ae |
|
08-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY When using a 'softerr' mount, the NFSv4 client can get stuck waiting forever while the server just returns NFS4ERR_DELAY. Among other things, this causes the knfsd server threads to busy wait. Add a parameter that tells the NFSv4 client how many times to retry before giving up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
9407bda8 |
|
17-Oct-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/microcode: Prepare for minimal revision check Applying microcode late can be fatal for the running kernel when the update changes functionality which is in use already in a non-compatible way, e.g. by removing a CPUID bit. There is no way for admins which do not have access to the vendors deep technical support to decide whether late loading of such a microcode is safe or not. Intel has added a new field to the microcode header which tells the minimal microcode revision which is required to be active in the CPU in order to be safe. Provide infrastructure for handling this in the core code and a command line switch which allows to enforce it. If the update is considered safe the kernel is not tainted and the annoying warning message not emitted. If it's enforced and the currently loaded microcode revision is not safe for late loading then the load is aborted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de
|
#
78a96c86 |
|
19-Oct-2023 |
Geert Uytterhoeven <geert+renesas@glider.be> |
Documentation: kernel-parameters: Add earlyprintk=bios on SH Document the use of the "earlyprintk=bios[,keep]" kernel parameter option on SuperH systems with a SuperH standard BIOS. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/febc920964f2f0919d21775132e84c5cc270177e.1697708489.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
#
94483490 |
|
13-Jan-2023 |
Ard Biesheuvel <ardb@kernel.org> |
Documentation: Drop or replace remaining mentions of IA64 Drop or update mentions of IA64, as appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
94b3f0b5 |
|
21-Aug-2023 |
Rik van Riel <riel@surriel.com> |
smp,csd: Throw an error if a CSD lock is stuck for too long The CSD lock seems to get stuck in 2 "modes". When it gets stuck temporarily, it usually gets released in a few seconds, and sometimes up to one or two minutes. If the CSD lock stays stuck for more than several minutes, it never seems to get unstuck, and gradually more and more things in the system end up also getting stuck. In the latter case, we should just give up, so the system can dump out a little more information about what went wrong, and, with panic_on_oops and a kdump kernel loaded, dump a whole bunch more information about what might have gone wrong. In addition, there is an smp.panic_on_ipistall kernel boot parameter that by default retains the old behavior, but when set enables the panic after the CSD lock has been stuck for more than the specified number of milliseconds, as in 300,000 for five minutes. [ paulmck: Apply Imran Khan feedback. ] [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/ Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org>
|
#
9b81d3a5 |
|
27-Sep-2023 |
Luiz Capitulino <luizcap@amazon.com> |
cgroup: add cgroup_favordynmods= command-line option We have a need of using favordynmods with cgroup v1, which doesn't support changing mount flags during remount. Enabling CONFIG_CGROUP_FAVOR_DYNMODS at build-time is not an option because we want to be able to selectively enable it for certain systems. This commit addresses this by introducing the cgroup_favordynmods= command-line option. This option works for both cgroup v1 and v2 and also allows for disabling favorynmods when the kernel built with CONFIG_CGROUP_FAVOR_DYNMODS=y. Also, note that when cgroup_favordynmods=true favordynmods is never disabled in cgroup_destroy_root(). Signed-off-by: Luiz Capitulino <luizcap@amazon.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
2273799c2 |
|
22-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers This commit renames the readers_bind and writers_bind module parameters to bind_readers and bind_writers, respectively. This provides added clarity via the imperative mode and better organizes the documentation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
b1326d76 |
|
21-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Catch-up update for locktorture module parameters This commit documents recently added locktorture module parameters. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
7f993623 |
|
21-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
locktorture: Add call_rcu_chains module parameter When running locktorture on large systems, there will normally be enough RCU activity to ensure that there is a grace period in flight at all times. However, on smaller systems, RCU might well be idle the majority of the time. This situation can be inconvenient in cases where the RCU CPU stall warning is part of the debugging process. This commit therefore adds an call_rcu_chains module parameter to locktorture, allowing the user to specify the desired number of self-propagating call_rcu() chains. For good measure, immediately before invoking call_rcu(), the self-propagating RCU callback invokes start_poll_synchronize_rcu() to force the immediate start of a grace period, with the call_rcu() forcing another to start shortly thereafter. Booting with locktorture.call_rcu_chains=2 increases the probability of a stuck locking primitive resulting in an RCU CPU stall warning from about 25% to nearly 100%. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
a11e0975 |
|
23-Jun-2023 |
Nikolay Borisov <nik.borisov@suse.com> |
x86: Make IA32_EMULATION boot time configurable Distributions would like to reduce their attack surface as much as possible but at the same time they'd want to retain flexibility to cater to a variety of legacy software. This stems from the conjecture that compat layer is likely rarely tested and could have latent security bugs. Ideally distributions will set their default policy and also give users the ability to override it as appropriate. To enable this use case, introduce CONFIG_IA32_EMULATION_DEFAULT_DISABLED compile time option, which controls whether 32bit processes/syscalls should be allowed or not. This option is aimed mainly at distributions to set their preferred default behavior in their kernels. To allow users to override the distro's policy, introduce the 'ia32_emulation' parameter which allows overriding CONFIG_IA32_EMULATION_DEFAULT_DISABLED state at boot time. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230623111409.3047467-7-nik.borisov@suse.com
|
#
16128b1f |
|
01-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add sysfs to provide throttled access to rcu_barrier() When running a series of stress tests all making heavy use of RCU, it is all too possible to OOM the system when the prior test's RCU callbacks don't get invoked until after the subsequent test starts. One way of handling this is just a timed wait, but this fails when a given CPU has so many callbacks queued that they take longer to invoke than allowed for by that timed wait. This commit therefore adds an rcutree.do_rcu_barrier module parameter that is accessible from sysfs. Writing one of the many synonyms for boolean "true" will cause an rcu_barrier() to be invoked, but will guarantee that no more than one rcu_barrier() will be invoked per sixteenth of a second via this mechanism. The flip side is that a given request might wait a second or three longer than absolutely necessary, but only when there are multiple uses of rcutree.do_rcu_barrier within a one-second time interval. This commit unnecessarily serializes the rcu_barrier() machinery, given that serialization is already provided by procfs. This has the advantage of allowing throttled rcu_barrier() from other sources within the kernel. Reported-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
8a4c0c90 |
|
16-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Add refscale.lookup_instances to kernel-parameters.txt This commit adds refscale.lookup_instances to kernel-parameters.txt. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
0ceef6e9 |
|
17-Aug-2023 |
Vaibhav Jain <vaibhav@linux.ibm.com> |
powerpc/idle: Add support for nohlt This patch enables config option GENERIC_IDLE_POLL_SETUP for arch powerpc. This adds support for kernel param 'nohlt'. Powerpc kernel also supports another kernel boot-time param called 'powersave' which can also be used to disable all cpu idle-states and forces CPU to an idle-loop similar to what cpu_idle_poll() does. This patch however makes powerpc kernel-parameters better aligned to the generic boot-time parameters. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230818050739.827851-1-vaibhav@linux.ibm.com
|
#
33f0dd97 |
|
26-Jul-2023 |
Chen Jiahao <chenjiahao16@huawei.com> |
docs: kdump: Update the crashkernel description for riscv Now "crashkernel=" parameter on riscv has been updated to support crashkernel=X,[high,low]. Through which we can reserve memory region above/within 32bit addressible DMA zone. Here update the parameter description accordingly. Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20230726175000.2536220-3-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
f176638a |
|
08-Aug-2023 |
Alan Stern <stern@rowland.harvard.edu> |
USB: Remove Wireless USB and UWB documentation Support for Wireless USB and Ultra WideBand was removed in 2020 by commit caa6772db4c1 ("Staging: remove wusbcore and UWB from the kernel tree."). But the documentation files were left behind. Let's get rid of that out-of-date documentation. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/015d4310-bcd3-4ba4-9a0e-3664f281a9be@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
523a301e |
|
07-Aug-2023 |
Tejun Heo <tj@kernel.org> |
workqueue: Make default affinity_scope dynamically updatable While workqueue.default_affinity_scope is writable, it only affects workqueues which are created afterwards and isn't very useful. Instead, let's introduce explicit "default" scope and update the effective scope dynamically when workqueue.default_affinity_scope is changed. Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
63c5484e |
|
07-Aug-2023 |
Tejun Heo <tj@kernel.org> |
workqueue: Add multiple affinity scopes and interface to select them Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial. Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentations are updated accordingly. This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries. Many modern machines have multiple L3 caches often while being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unncessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups. While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches. Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
fcecfa8f |
|
07-Aug-2023 |
Tejun Heo <tj@kernel.org> |
workqueue: Remove module param disable_numa and sysfs knobs pool_ids and numa Unbound workqueue CPU affinity is going to receive an overhaul and the NUMA specific knobs won't make sense anymore. Remove them. Also, the pool_ids knob was used for debugging and not really meaningful given that there is no visibility into the pools associated with those IDs. Remove it too. A future patch will improve overall visibility. Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
73c58e7e |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Add HOTPLUG_SMT support Add support for HOTPLUG_SMT, which enables the generic sysfs SMT support files in /sys/devices/system/cpu/smt, as well as the "nosmt" boot parameter. Implement the recently added hooks to allow partial SMT states, allow any number of threads per core. Tie the config symbol to HOTPLUG_CPU, which enables it on the major platforms that support SMT. If there are other platforms that want the SMT support that can be tweaked in future. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ldufour: remove topology_smt_supported] [ldufour: remove topology_smt_threads_supported] [ldufour: select CONFIG_SMT_NUM_THREADS_DYNAMIC] [ldufour: update kernel-parameters.txt] Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Link: https://msgid.link/20230705145143.40545-10-ldufour@linux.ibm.com
|
#
496ea826 |
|
13-Jul-2023 |
Conor Dooley <conor.dooley@microchip.com> |
RISC-V: provide Kconfig & commandline options to control parsing "riscv,isa" As it says on the tin, provide Kconfig option to control parsing the "riscv,isa" devicetree property. If either option is used, the kernel will fall back to parsing "riscv,isa", where "riscv,isa-base" and "riscv,isa-extensions" are not present. The Kconfig options are set up so that the default kernel configuration will enable the fallback path, without needing the commandline option. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230713-aviator-plausibly-a35662485c2c@wendy Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
ace3c549 |
|
28-Jun-2023 |
tiozhang <tiozhang@didiglobal.com> |
workqueue: add cmdline parameter `workqueue.unbound_cpus` to further constrain wq_unbound_cpumask at boot time Motivation of doing this is to better improve boot times for devices when we want to prevent our workqueue works from running on some specific CPUs, e,g, some CPUs are busy with interrupts. Signed-off-by: tiozhang <tiozhang@didiglobal.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
d56b699d |
|
14-Aug-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
Documentation: Fix typos Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
5797f5f9 |
|
02-Aug-2023 |
Ma Wupeng <mawupeng1@huawei.com> |
doc: update params of memhp_default_state= Commit 5f47adf762b7 ("mm/memory_hotplug: allow to specify a default online_type") allows to specify a default online_type which make online memory to kernel or movable zone possible but fail to update to doc. Update doc to fit this change. Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230802074312.2111074-1-mawupeng1@huawei.com
|
#
bff24699 |
|
14-Aug-2023 |
Jarkko Sakkinen <jarkko@kernel.org> |
tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s" Since for MMIO driver using FIFO registers, also known as tpm_tis, the default (and tbh recommended) behaviour is now the polling mode, the "tristate" workaround is no longer for benefit. If someone wants to explicitly enable IRQs for a TPM chip that should be without question allowed. It could very well be a piece hardware in the existing deny list because of e.g. firmware update or something similar. While at it, document the module parameter, as this was not done in 2006 when it first appeared in the mainline. Link: https://lore.kernel.org/linux-integrity/20201015214430.17937-1-jsnitsel@redhat.com/ Link: https://lore.kernel.org/all/1145393776.4829.19.camel@localhost.localdomain/ Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
#
bf29bfaa |
|
12-Jul-2023 |
Yajun Deng <yajun.deng@linux.dev> |
dma-contiguous: support numa CMA for specified node The kernel parameter 'cma_pernuma=' only supports reserving the same size of CMA area for each node. We need to reserve different sizes of CMA area for specified nodes if these devices belong to different nodes. Adding another kernel parameter 'numa_cma=' to reserve CMA area for the specified node. If we want to use one of these parameters, we need to enable DMA_NUMA_CMA. At the same time, print the node id in cma_declare_contiguous_nid() if CONFIG_NUMA is enabled. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
22e4a348 |
|
12-May-2023 |
Yajun Deng <yajun.deng@linux.dev> |
dma-contiguous: support per-numa CMA for all architectures In the commit b7176c261cdb ("dma-contiguous: provide the ability to reserve per-numa CMA"), Barry adds DMA_PERNUMA_CMA for ARM64. But this feature is architecture independent, so support per-numa CMA for all architectures, and enable it by default if NUMA. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b4047e53 |
|
14-Jul-2023 |
Randy Dunlap <rdunlap@infradead.org> |
docs: panic: cleanups for panic params Move 'panic_print' to its correct place in alphabetical order. Add parameter format for 'pause_on_oops'. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230715034811.9665-1-rdunlap@infradead.org
|
#
eb38cc80 |
|
15-Jul-2023 |
Randy Dunlap <rdunlap@infradead.org> |
Docs: kernel-parameters: sort arm64 entries Put the arm64 kernel-parameters entries into alphabetical order. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230715235105.17966-1-rdunlap@infradead.org
|
#
45071e1c |
|
13-Aug-2023 |
Loic Poulain <loic.poulain@linaro.org> |
init: Add support for rootwait timeout parameter Add an optional timeout arg to 'rootwait' as the maximum time in seconds to wait for the root device to show up before attempting forced mount of the root filesystem. Use case: In case of device mapper usage for the rootfs (e.g. root=/dev/dm-0), if the mapper is not able to create the virtual block for any reason (wrong arguments, bad dm-verity signature, etc), the `rootwait` param causes the kernel to wait forever. It may however be desirable to only wait for a given time and then panic (force mount) to cause device reset. This gives the bootloader a chance to detect the problem and to take some measures, such as marking the booted partition as bad (for A/B case) or entering a recovery mode. In success case, mounting happens as soon as the root device is ready, unlike the existing 'rootdelay' parameter which performs an unconditional pause. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Message-Id: <20230813082349.513386-1-loic.poulain@linaro.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
5d248bb3 |
|
02-Jun-2023 |
Dietmar Eggemann <dietmar.eggemann@arm.com> |
torture: Add lock_torture writer_fifo module parameter This commit adds a module parameter that causes the locktorture writer to run at real-time priority. To use it: insmod /lib/modules/torture.ko random_shuffle=1 insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1 ^^^^^^^^^^^^^ A predecessor to this patch has been helpful to uncover issues with the proxy-execution series. [ paulmck: Remove locktorture-specific code from kernel/torture.c. ] Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: kernel-team@android.com Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [jstultz: Include header change to build, reword commit message] Signed-off-by: John Stultz <jstultz@google.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
37002bc6 |
|
17-Jul-2023 |
Costa Shulyupin <costa.shul@redhat.com> |
docs: move s390 under arch and fix all in-tree references. Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
#
5f641174 |
|
11-Jul-2023 |
Mario Limonciello <mario.limonciello@amd.com> |
ACPI: thermal: Drop nocrt parameter The `nocrt` module parameter has no code associated with it and does nothing. As `crt=-1` has same functionality as what nocrt should be doing drop `nocrt` and associated documentation. This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and thus didn't function properly. Fixes: 8c99fdce3078 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
2d7b2b34 |
|
01-Jun-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Add kfree_by_call_rcu and kfree_mult to documentation This commit adds the kfree_by_call_rcu and kfree_mult rcuscale module parameters to kernel-parameters.txt. While in the area, it updates to rcuscale.scale_type. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
#
7221f493 |
|
16-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Add minruntime module parameter By default, rcuscale collects only 100 points of data per writer, but arranging for all kthreads to be actively collecting (if not recording) data during the time that any kthread might be recording. This works well, but does not allow much time to bring external performance tools to bear. This commit therefore adds a minruntime module parameter that specifies a minimum data-collection interval in seconds. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2226f3dc |
|
16-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcuscale: Permit blocking delays between writers Some workloads do isolated RCU work, and this can affect operation latencies. This commit therefore adds a writer_holdoff_jiffies module parameter that causes writers to block for the specified number of jiffies between each pair of consecutive write-side operations. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
db13710a |
|
26-Jun-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Cancel callback laziness if too many callbacks The various RCU Tasks flavors now do lazy grace periods when there are only asynchronous grace period requests. By default, the system will let 250 milliseconds elapse after the first call_rcu_tasks*() callbacki is queued before starting a grace period. In contrast, synchronous grace period requests such as synchronize_rcu_tasks*() will start a grace period immediately. However, invoking one of the call_rcu_tasks*() functions in a too-tight loop can result in a callback flood, which in turn can exhaust memory if grace periods are delayed for too long. This commit therefore sets a limit so that the grace-period kthread will be awakened when any CPU's callback list expands to contain rcupdate.rcu_task_lazy_lim callbacks elements (defaulting to 32, set to -1 to disable), the grace-period kthread will be awakened, thus cancelling any ongoing laziness and getting out in front of the potential callback flood. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
450d461a |
|
17-May-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add kernel boot parameters for callback laziness This commit adds kernel boot parameters for callback laziness, allowing the RCU Tasks flavors to be individually adjusted. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fb3bd914 |
|
28-Jun-2023 |
Borislav Petkov (AMD) <bp@alien8.de> |
x86/srso: Add a Speculative RAS Overflow mitigation Add a mitigation for the speculative return address stack overflow vulnerability found on AMD processors. The mitigation works by ensuring all RET instructions speculate to a controlled location, similar to how speculation is controlled in the retpoline sequence. To accomplish this, the __x86_return_thunk forces the CPU to mispredict every function return using a 'safe return' sequence. To ensure the safety of this mitigation, the kernel must ensure that the safe return sequence is itself free from attacker interference. In Zen3 and Zen4, this is accomplished by creating a BTB alias between the untraining function srso_untrain_ret_alias() and the safe return function srso_safe_ret_alias() which results in evicting a potentially poisoned BTB entry and using that safe one for all function returns. In older Zen1 and Zen2, this is accomplished using a reinterpretation technique similar to Retbleed one: srso_untrain_ret() and srso_safe_ret(). Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
#
553a5c03 |
|
12-Jul-2023 |
Daniel Sneddon <daniel.sneddon@linux.intel.com> |
x86/speculation: Add force option to GDS mitigation The Gather Data Sampling (GDS) vulnerability allows malicious software to infer stale data previously stored in vector registers. This may include sensitive data such as cryptographic keys. GDS is mitigated in microcode, and systems with up-to-date microcode are protected by default. However, any affected system that is running with older microcode will still be vulnerable to GDS attacks. Since the gather instructions used by the attacker are part of the AVX2 and AVX512 extensions, disabling these extensions prevents gather instructions from being executed, thereby mitigating the system from GDS. Disabling AVX2 is sufficient, but we don't have the granularity to do this. The XCR0[2] disables AVX, with no option to just disable AVX2. Add a kernel parameter gather_data_sampling=force that will enable the microcode mitigation if available, otherwise it will disable AVX on affected systems. This option will be ignored if cmdline mitigations=off. This is a *big* hammer. It is known to break buggy userspace that uses incomplete, buggy AVX enumeration. Unfortunately, such userspace does exist in the wild: https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html [ dhansen: add some more ominous warnings about disabling AVX ] Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
8974eb58 |
|
12-Jul-2023 |
Daniel Sneddon <daniel.sneddon@linux.intel.com> |
x86/speculation: Add Gather Data Sampling mitigation Gather Data Sampling (GDS) is a hardware vulnerability which allows unprivileged speculative access to data which was previously stored in vector registers. Intel processors that support AVX2 and AVX512 have gather instructions that fetch non-contiguous data elements from memory. On vulnerable hardware, when a gather instruction is transiently executed and encounters a fault, stale data from architectural or internal vector registers may get transiently stored to the destination vector register allowing an attacker to infer the stale data using typical side channel techniques like cache timing attacks. This mitigation is different from many earlier ones for two reasons. First, it is enabled by default and a bit must be set to *DISABLE* it. This is the opposite of normal mitigation polarity. This means GDS can be mitigated simply by updating microcode and leaving the new control bit alone. Second, GDS has a "lock" bit. This lock bit is there because the mitigation affects the hardware security features KeyLocker and SGX. It needs to be enabled and *STAY* enabled for these features to be mitigated against GDS. The mitigation is enabled in the microcode by default. Disable it by setting gather_data_sampling=off or by disabling all mitigations with mitigations=off. The mitigation status can be checked by reading: /sys/devices/system/cpu/vulnerabilities/gather_data_sampling Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
57ada235 |
|
30-Jun-2023 |
Olaf Hering <olaf@aepfle.de> |
Fix documentation of panic_on_warn The kernel cmdline option panic_on_warn expects an integer, it is not a plain option as documented. A number of uses in the tree figured this already, and use panic_on_warn=1 for their purpose. Adjust a comment which otherwise may mislead people in the future. Fixes: 9e3961a09798 ("kernel: add panic_on_warn") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
724f4c0d |
|
14-May-2023 |
Sunil V L <sunilvl@ventanamicro.com> |
RISC-V: Add ACPI initialization in setup_arch() Initialize the ACPI core for RISC-V during boot. ACPI tables and interpreter are initialized based on the information passed from the firmware and the value of the kernel parameter 'acpi'. With ACPI support added for RISC-V, the kernel parameter 'acpi' is also supported on RISC-V. Hence, update the documentation. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230515054928.2079268-9-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
e4624435 |
|
12-Jun-2023 |
Jonathan Corbet <corbet@lwn.net> |
docs: arm64: Move arm64 documentation under Documentation/arch/ Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Move Documentation/arm64 into arch/ (along with the Chinese equvalent translations) and fix up documentation references. Cc: Will Deacon <will@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <src.res@email.cn> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Yantengsi <siyanteng@loongson.cn> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
66419036 |
|
30-May-2023 |
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> |
iommu/amd: Introduce Disable IRTE Caching Support An Interrupt Remapping Table (IRT) stores interrupt remapping configuration for each device. In a normal operation, the AMD IOMMU caches the table to optimize subsequent data accesses. This requires the IOMMU driver to invalidate IRT whenever it updates the table. The invalidation process includes issuing an INVALIDATE_INTERRUPT_TABLE command following by a COMPLETION_WAIT command. However, there are cases in which the IRT is updated at a high rate. For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large amount of vcpus and VFIO PCI pass-through devices, the invalidation process could potentially become a performance bottleneck. Introducing a new kernel boot option: amd_iommu=irtcachedis which disables IRTE caching by setting the IRTCachedis bit in each IOMMU Control register, and bypass the IRT invalidation process. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
975fd3c2 |
|
21-May-2023 |
Jiaxun Yang <jiaxun.yang@flygoat.com> |
MIPS: Select CONFIG_GENERIC_IDLE_POLL_SETUP hlt,nohlt paramaters are useful when debugging cpuidle related issues. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
#
96cb8ae2 |
|
21-May-2023 |
Jiaxun Yang <jiaxun.yang@flygoat.com> |
MIPS: Rework smt cmdline parameters Provide a generic smt parameters interface aligned with s390 to allow users to limit smt usage and threads per core. It replaced previous undocumented "nothreads" parameter for smp-cps which is ambiguous and does not cover smp-mt. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
#
702f3189 |
|
31-May-2023 |
Christoph Hellwig <hch@lst.de> |
block: move the code to do early boot lookup of block devices to block/ Create a new block/early-lookup.c to keep the early block device lookup code instead of having this code sit with the early mount code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
cf056a43 |
|
31-May-2023 |
Christoph Hellwig <hch@lst.de> |
init: improve the name_to_dev_t interface name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c0c1a7dc |
|
31-May-2023 |
Christoph Hellwig <hch@lst.de> |
init: move the nfs/cifs/ram special cases out of name_to_dev_t The nfs/cifs/ram special case only needs to be parsed once, and only in the boot code. Move them out of name_to_dev_t and into prepare_namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3e1dedb2 |
|
09-May-2023 |
Kristina Martsenko <kristina.martsenko@arm.com> |
arm64: mops: allow disabling MOPS from the kernel command line Make it possible to disable the MOPS extension at runtime using the kernel command line. This can be useful for testing or working around hardware issues. For example it could be used to test new memory copy routines that do not use MOPS instructions (e.g. from Arm Optimized Routines). Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
a4316603 |
|
02-May-2023 |
Juergen Gross <jgross@suse.com> |
x86/mtrr: Add mtrr=debug command line option Add a new command line option "mtrr=debug" for getting debug output after building the new cache mode map. The output will include MTRR register values and the resulting map. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20230502120931.20719-13-jgross@suse.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
#
f41dd67d |
|
03-May-2023 |
Yan-Jie Wang <yanjiewtw@gmail.com> |
docs: clarify KVM related kernel parameters' descriptions The descriptions of certain KVM related kernel parameters can be confusing. They state "Disable ...," which may make people think that setting them to 1 will disable the associated feature when in fact the opposite is true. This commit addresses this issue by revising the descriptions of these parameters by using "Control..." rather than "Enable/Disable...". 1==enabled or 0==disabled can be communicated by the description of default value such as "1 (enabled)" or "0 (disabled)". Also update the description of KVM's default value for kvm-intel.nested as it is enabled by default. Signed-off-by: Yan-Jie Wang <yanjiewtw@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230503081530.19956-1-yanjiewtw@gmail.com
|
#
f02c20d9 |
|
27-Apr-2023 |
Natesh Sharma <nsharma@redhat.com> |
docs: admin-guide: Add information about intel_pstate active mode Information about intel_pstate active mode is added in the doc. This operation mode could be used to set on the hardware when it's not activated. Status of the mode could be checked from sysfs file i.e., /sys/devices/system/cpu/intel_pstate/status. The information is already available in cpu-freq/intel-pstate.txt documentation. Signed-off-by: Natesh Sharma <nsharma@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> [jc: reformatted for width ] Link: https://lore.kernel.org/r/20230427083706.49882-1-nsharma@redhat.com
|
#
63638450 |
|
17-May-2023 |
Tejun Heo <tj@kernel.org> |
workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism Workqueue now automatically marks per-cpu work items that hog CPU for too long as CPU_INTENSIVE, which excludes them from concurrency management and prevents stalling other concurrency-managed work items. If a work function keeps running over the thershold, it likely needs to be switched to use an unbound workqueue. This patch adds a debug mechanism which tracks the work functions which trigger the automatic CPU_INTENSIVE mechanism and report them using pr_warn() with exponential backoff. v3: Documentation update. v2: Drop bouncing to kthread_worker for printing messages. It was to avoid introducing circular locking dependency through printk but not effective as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's just print directly using printk_deferred(). Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org>
|
#
616db877 |
|
17-May-2023 |
Tejun Heo <tj@kernel.org> |
workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE If a per-cpu work item hogs the CPU, it can prevent other work items from starting through concurrency management. A per-cpu workqueue which intends to host such CPU-hogging work items can choose to not participate in concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be error-prone and difficult to debug when missed. This patch adds an automatic CPU usage based detection. If a concurrency-managed work item consumes more CPU time than the threshold (10ms by default) continuously without intervening sleeps, wq_worker_tick() which is called from scheduler_tick() will detect the condition and automatically mark it CPU_INTENSIVE. The mechanism isn't foolproof: * Detection depends on tick hitting the work item. Getting preempted at the right timings may allow a violating work item to evade detection at least temporarily. * nohz_full CPUs may not be running ticks and thus can fail detection. * Even when detection is working, the 10ms detection delays can add up if many CPU-hogging work items are queued at the same time. However, in vast majority of cases, this should be able to detect violations reliably and provide reasonable protection with a small increase in code complexity. If some work items trigger this condition repeatedly, the bigger problem likely is the CPU being saturated with such per-cpu work items and the solution would be making them UNBOUND. The next patch will add a debug mechanism to help spot such cases. v4: Documentation for workqueue.cpu_intensive_thresh_us added to kernel-parameters.txt. v3: Switch to use wq_worker_tick() instead of hooking into preemptions as suggested by Peter. v2: Lai pointed out that wq_worker_stopping() also needs to be called from preemption and rtlock paths and an earlier patch was updated accordingly. This patch adds a comment describing the risk of infinte recursions and how they're avoided. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
|
#
89da5a69 |
|
12-Apr-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/unwind/orc: Add 'unwind_debug' cmdline option Sometimes the one-line ORC unwinder warnings aren't very helpful. Add a new 'unwind_debug' cmdline option which will dump the full stack contents of the current task when an error condition is encountered. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/6afb9e48a05fd2046bfad47e69b061b43dfd0e0e.1681331449.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
#
9e5d61c0 |
|
20-Mar-2023 |
Zqiang <qiang1.zhang@intel.com> |
doc/rcutorture: Add description of rcutorture.stall_cpu_block If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y, then run the rcutorture tests specifying stalls as follows: runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \ bootparams="console=ttyS0 rcutorture.stall_cpu=30 \ rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d The tests will produce the following splat: [ 10.841071] rcu-torture: rcu_torture_stall begin CPU stall [ 10.841073] rcu_torture_stall start on CPU 3. [ 10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000 .... [ 10.841108] Call Trace: [ 10.841110] <TASK> [ 10.841112] dump_stack_lvl+0x64/0xb0 [ 10.841118] dump_stack+0x10/0x20 [ 10.841121] __schedule_bug+0x8b/0xb0 [ 10.841126] __schedule+0x2172/0x2940 [ 10.841157] schedule+0x9b/0x150 [ 10.841160] schedule_timeout+0x2e8/0x4f0 [ 10.841192] schedule_timeout_uninterruptible+0x47/0x50 [ 10.841195] rcu_torture_stall+0x2e8/0x300 [ 10.841199] kthread+0x175/0x1a0 [ 10.841206] ret_from_fork+0x2c/0x50 This is because the rcutorture.stall_cpu_block=1 module parameter causes rcu_torture_stall() to invoke schedule_timeout_uninterruptible() within an RCU read-side critical section. This in turn results in a quiescent state (which prevents the stall) and a sleep in an atomic context (which produces the above splat). Although this code is operating as designed, the design has proven to be counterintuitive to many. This commit therefore updates the description in kernel-parameters.txt accordingly. [ paulmck: Apply Joel Fernandes feedback. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
18415f33 |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and then waiting for it to respond before moving on to the next. Allow a platform to enable parallel setup which brings all to be onlined CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the control CPU (BP) is single-threaded the important part is the last state CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up. This allows the CPUs to run up to the first sychronization point cpuhp_ap_sync_alive() where they wait for the control CPU to release them one by one for the full onlining procedure. This parallelism depends on the CPU hotplug core sync mechanism which ensures that the parallel brought up CPUs wait for release before touching any state which would make the CPU visible to anything outside the hotplug control mechanism. To handle the SMT constraints of X86 correctly the bringup happens in two iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up the primary SMT threads of each core first, which can load the microcode without the need to rendevouz with the thread siblings. Once that's completed it brings up the secondary SMT threads. Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
|
#
e59e74dc |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/topology: Remove CPU0 hotplug option This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support physical hotplug of CPU0: "We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available." 11 years later this has not happened and physical hotplug is not officially supported. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de
|
#
fb611249 |
|
21-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Document the rcutree.rcu_resched_ns module parameter This commit adds kernel-parameters.txt documentation for the rcutree.rcu_resched_ns module parameter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
51823ca6 |
|
21-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Get rcutree module parameters back into alpha order This commit puts the rcutree module parameters back into proper alphabetical order. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
26e7aacb |
|
24-Apr-2023 |
Alexandre Ghiti <alexghiti@rivosinc.com> |
riscv: Allow to downgrade paging mode from the command line Add 2 early command line parameters that allow to downgrade satp mode (using the same naming as x86): - "no5lvl": use a 4-level page table (down from sv57 to sv48) - "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39) Note that going through the device tree to get the kernel command line works with ACPI too since the efi stub creates a device tree anyway with the command line. In KASAN kernels, we can't use the libfdt that early in the boot process since we are not ready to execute instrumented functions. So instead of using the "generic" libfdt, we compile our own versions of those functions that are not instrumented and that are prefixed so that they do not conflict with the generic ones. We also need the non-instrumented versions of the string functions and the prefixed versions of memcpy/memmove. This is largely inspired by commit aacd149b6238 ("arm64: head: avoid relocating the kernel twice for KASLR") from which I removed compilation flags that were not relevant to RISC-V at the moment (LTO, SCS). Also note that we have to link with -z norelro to avoid ld.lld to throw a warning with the new .got sections, like in commit 311bea3cb9ee ("arm64: link with -z norelro for LLD or aarch64-elf"). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
#
8660484e |
|
13-Apr-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add debugging auto-load duplicate module support The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module() system calls are non fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with larger number of CPUs. To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, ie, the kernel's *request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module() there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and this not seeing the module yet loaded. This adds debugging support to the kernel module auto-loader (*request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicates requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests. Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their wores. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. Current evidence seems to point to only a few duplicates for module auto-loading. And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should becoming from another place (likely udev). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
23baf831 |
|
15-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, treewide: redefine MAX_ORDER sanely MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
203e4358 |
|
20-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
kernel/smp: Make csdlock_debug= resettable It is currently possible to set the csdlock_debug_enabled static branch, but not to reset it. This is an issue when several different entities supply kernel boot parameters and also for kernels built with CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y. Therefore, make the csdlock_debug=0 kernel boot parameter turn off debugging. Last one wins! Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org
|
#
1771257c |
|
20-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
locking/csd_lock: Remove added data from CSD lock debugging The diagnostics added by this commit were extremely useful in one instance: a5aabace5fb8 ("locking/csd_lock: Add more data to CSD lock debugging") However, they have not seen much action since, and there have been some concerns expressed that the complexity is not worth the benefit. Therefore, manually revert this commit, but leave a comment telling people where to find these diagnostics. [ paulmck: Apply Juergen Gross feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
|
#
c5219860 |
|
20-Mar-2023 |
Paul E. McKenney <paulmck@kernel.org> |
locking/csd_lock: Add Kconfig option for csd_debug default The csd_debug kernel parameter works well, but is inconvenient in cases where it is more closely associated with boot loaders or automation than with a particular kernel version or release. Thererfore, provide a new CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to 1 when selected and 0 otherwise, with this latter being the default. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
|
#
ffbe08a8 |
|
03-Mar-2023 |
Saravana Kannan <saravanak@google.com> |
driver core: Add fw_devlink.sync_state command line param When all devices that could probe have finished probing (based on deferred_probe_timeout configuration or late_initcall() when !CONFIG_MODULES), this parameter controls what to do with devices that haven't yet received their sync_state() calls. fw_devlink.sync_state=strict is the default and the driver core will continue waiting on all consumers of a device to probe successfully before sync_state() is called for the device. This is the default behavior since calling sync_state() on a device when all its consumers haven't probed could make some systems unusable/unstable. When this option is selected, we also print the list of devices that haven't had sync_state() called on them by the time all devices the could probe have finished probing. fw_devlink.sync_state=timeout will cause the driver core to give up waiting on consumers and call sync_state() on any devices that haven't yet received their sync_state() calls. This option is provided for systems that won't become unusable/unstable as they might be able to save power (depends on state of hardware before kernel starts) if all devices get their sync_state(). Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20230304005355.746421-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
721da5ce |
|
23-Feb-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
driver core: remove CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2 CONFIG_SYSFS_DEPRECATED was added in commit 88a22c985e35 ("CONFIG_SYSFS_DEPRECATED") in 2006 to allow systems with older versions of some tools (i.e. Fedora 3's version of udev) to boot properly. Four years later, in 2010, the option was attempted to be removed as most of userspace should have been fixed up properly by then, but some kernel developers clung to those old systems and refused to update, so we added CONFIG_SYSFS_DEPRECATED_V2 in commit e52eec13cd6b ("SYSFS: Allow boot time switching between deprecated and modern sysfs layout") to allow them to continue to boot properly, and we allowed a boot time parameter to be used to switch back to the old format if needed. Over time, the logic that was covered under these config options was slowly removed from individual driver subsystems successfully, removed, and the only thing that is now left in the kernel are some changes in the block layer's representation in sysfs where real directories are used instead of symlinks like normal. Because the original changes were done to userspace tools in 2006, and all distros that use those tools are long end-of-life, and older non-udev-based systems do not care about the block layer's sysfs representation, it is time to finally remove this old logic and the config entries from the kernel. Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-block@vger.kernel.org Cc: linux-doc@vger.kernel.org Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230223073326.2073220-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
2dd6d0eb |
|
07-Mar-2023 |
Wyes Karny <wyes.karny@amd.com> |
cpufreq: amd-pstate: Add guided autonomous mode From ACPI spec below 3 modes for CPPC can be defined: 1. Non autonomous: OS scaling governor specifies operating frequency/ performance level through `Desired Performance` register and platform follows that. 2. Guided autonomous: OS scaling governor specifies min and max frequencies/ performance levels through `Minimum Performance` and `Maximum Performance` register, and platform can autonomously select an operating frequency in this range. 3. Fully autonomous: OS only hints (via EPP) to platform for the required energy performance preference for the workload and platform autonomously scales the frequency. Currently (1) is supported by amd_pstate as passive mode, and (3) is implemented by EPP support. This change is to support (2). In guided autonomous mode the min_perf is based on the input from the scaling governor. For example, in case of schedutil this value depends on the current utilization. And max_perf is set to max capacity. To activate guided auto mode ``amd_pstate=guided`` command line parameter has to be passed in the kernel. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
3e6e0780 |
|
07-Mar-2023 |
Wyes Karny <wyes.karny@amd.com> |
Documentation: cpufreq: amd-pstate: Move amd_pstate param to alphabetical order Move amd_pstate command line param description to correct alphabetical order. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
ff61f079 |
|
14-Mar-2023 |
Jonathan Corbet <corbet@lwn.net> |
docs: move x86 documentation into Documentation/arch/ Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes. All in-kernel references to the old paths have been updated. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/ Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
f030c8fd |
|
15-Mar-2023 |
Bagas Sanjaya <bagasdotme@gmail.com> |
Documentation: kernel-parameters: Remove meye entry Commit ba47652ba65523 ("media: meye: remove this deprecated driver") removes meye driver but forgets to purge its kernel-parameters.txt entry, hence broken reference. Remove the entry. Link: https://lore.kernel.org/all/202302070341.OVqstpMM-lkp@intel.com/ Fixes: ba47652ba655 ("media: meye: remove this deprecated driver") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230315100246.62324-1-bagasdotme@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
a894a8a5 |
|
16-Mar-2023 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: kernel-parameters: sort all "no..." parameters Sort all of the "no..." kernel parameters into the correct order as specified in kernel-parameters.rst: "into English Dictionary order (defined as ignoring all punctuation and sorting digits before letters in a case insensitive manner)". No other changes here, just movement of lines. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230317002635.16540-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
c500488f |
|
26-Feb-2023 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: kernel-parameters: sort NFS parameters Sort the NFS kernel command line parameters. This is done in 4 groups so as to not have them intermingled: 'nfs' module parameters, 'nfs4' module parameters, 'nfsd' module parameters, and nfs "global" (__setup, no module) parameters. There were 5 parameters which were listed with a space between the parameter name and the following '=' sign. The space has been removed since module parameters expect 'parameter=' with no intervening space. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: linux-nfs@vger.kernel.org Link: https://lore.kernel.org/r/20230227025816.1083-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
16c52e50 |
|
18-Apr-2023 |
Huacai Chen <chenhuacai@kernel.org> |
LoongArch: Make WriteCombine configurable for ioremap() LoongArch maintains cache coherency in hardware, but when paired with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar to WriteCombine) is out of the scope of cache coherency machanism for PCIe devices (this is a PCIe protocol violation, which may be fixed in newer chipsets). This means WUC can only used for write-only memory regions now, so this option is disabled by default, making WUC silently fallback to SUC for ioremap(). You can enable this option if the kernel is ensured to run on hardware without this bug. Kernel parameter writecombine=on/off can be used to override the Kconfig option. Cc: stable@vger.kernel.org Suggested-by: WANG Xuerui <kernel@xen0n.name> Reviewed-by: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
#
675cabc8 |
|
09-Feb-2023 |
Jintack Lim <jintack.lim@linaro.org> |
arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the CPU has the ARMv8.3 nested virtualization capability, together with the 'kvm-arm.mode=nested' command line option. This will be used to support nested virtualization in KVM. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [maz: moved the command-line option to kvm-arm.mode] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
9c1c251d |
|
06-Feb-2023 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
tracing: Allow boot instances to have snapshot buffers Add to ftrace_boot_snapshot, "=<instance>" name, where the instance will get a snapshot buffer, and will take a snapshot at the end of boot (which will save the boot traces). Link: https://lkml.kernel.org/r/20230207173026.792774721@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
c4846480 |
|
06-Feb-2023 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
tracing: Add enabling of events to boot instances Add the format of: trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall That will create the "foo" instance and enable the sched_switch event (here were the "sched" system is explicitly specified), the irq_handler_entry event, and all events under the system initcall. Link: https://lkml.kernel.org/r/20230207173026.386114535@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
cb1f98c5 |
|
06-Feb-2023 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
tracing: Add creation of instances at boot command line Add kernel command line to add tracing instances. This only creates instances at boot but still does not enable any events to them. Later changes will extend this command line to add enabling of events, filters, and triggers. As well as possibly redirecting trace_printk()! Link: https://lkml.kernel.org/r/20230207173026.186210158@goodmis.org Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
374b30f2 |
|
24-Nov-2022 |
Ricardo Ribalda <ribalda@chromium.org> |
earlycon: Let users set the clock frequency Some platforms, namely AMD Picasso, use non standard uart clocks (48M), witch makes it impossible to use with earlycon. Let the user select its own frequency. Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20221123-serial-clk-v3-1-49c516980ae0@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
bba50659 |
|
11-Jan-2023 |
Vidya Sagar <vidyas@nvidia.com> |
PCI/AER: Configure ECRC only if AER is native As the ECRC configuration bits are part of AER registers, configure ECRC only if AER is natively owned by the kernel. Link: https://lore.kernel.org/r/20230112072111.20063-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
a568375b |
|
26-Jan-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
printk: Document that CONFIG_BOOT_PRINTK_DELAY required for boot_delay= Document the fact that CONFIG_BOOT_PRINTK_DELAY must be enabled for the "boot_delay" kernel parameter to work. Also mention that "lpj=" may be necessary. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230126225420.1320276-1-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
dbeb56fe |
|
29-Jan-2023 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: admin-guide: correct spelling Correct spelling problems for Documentation/admin-guide/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: dm-devel@redhat.com Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-media@vger.kernel.org Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230129231053.20863-2-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
2abfcd29 |
|
25-Jan-2023 |
Ross Zwisler <zwisler@chromium.org> |
docs: ftrace: always use canonical ftrace path The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many parts of Documentation still reference this older debugfs path, so let's update them to avoid confusion. Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230125213251.2013791-1-zwisler@google.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
7750d8b5 |
|
30-Jan-2023 |
Ondrej Zary <linux@zary.sk> |
drivers/block: Remove PARIDE core and high-level protocols Remove PARIDE core and high level protocols, taking care not to break low-level drivers (used by pata_parport). Also update documentation. Signed-off-by: Ondrej Zary <linux@zary.sk> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
|
#
17f0669c |
|
11-Jan-2023 |
Sohil Mehta <sohil.mehta@intel.com> |
x86/vsyscall: Fix documentation to reflect the default mode The default vsyscall mode has been updated from emulate to xonly for a while. Update the kernel-parameters doc to reflect that. Fixes: 625b7b7f79c6 ("x86/vsyscall: Change the default vsyscall mode to xonly") Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230111193211.1987047-1-sohil.mehta@intel.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
4d2e4980 |
|
14-Oct-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
ata: libata: cleanup fua support detection Move the detection of a device FUA support from ata_scsiop_mode_sense()/ata_dev_supports_fua() to device scan time in ata_dev_configure(). The function ata_dev_config_fua() is introduced to detect if a device supports FUA and this support is indicated using the new device flag ATA_DFLAG_FUA. In order to blacklist known buggy devices, the horkage flag ATA_HORKAGE_NO_FUA is introduced. Similarly to other horkage flags, the libata.force= arguments "fua" and "nofua" are also introduced to allow a user to control this horkage flag through the "force" libata module parameter. The ATA_DFLAG_FUA device flag is set only and only if all the following conditions are met: * libata.fua module parameter is set to 1 * The device supports the WRITE DMA FUA EXT command, * The device is not marked with the ATA_HORKAGE_NO_FUA flag, either from the blacklist or set by the user with libata.force=nofua * The device supports NCQ (while this is not mandated by the standards, this restriction is introduced to avoid problems with older non-NCQ devices). Enabling or diabling libata FUA support for all devices can now also be done using the "force=[no]fua" module parameter when libata.fua is set to 1. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
|
#
42551b8d |
|
03-Dec-2022 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: admin: move OOO entries in kernel-parameters.txt Fix the most blatant out-of-order entries in kernel-parameters.txt. No changes other than modifying the order of the entries. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20221204013050.11496-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
b6c1a8af |
|
10-Feb-2023 |
Yafang Shao <laoar.shao@gmail.com> |
mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf memory accounting. This is a preparation for the followup patch. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
0051293c |
|
01-Feb-2023 |
Paul E. McKenney <paulmck@kernel.org> |
clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested Unconditionally enabling TSC watchdog checking of the HPET and PMTMR clocksources can degrade latency and performance. Therefore, provide a new "watchdog" option to the tsc= boot parameter that opts into such checking. Note that tsc=watchdog is overridden by a tsc=nowatchdog regardless of their relative positions in the list of boot parameters. Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Waiman Long <longman@redhat.com>
|
#
5014603e |
|
31-Jan-2023 |
Perry Yuan <perry.yuan@amd.com> |
Documentation: introduce amd pstate active mode kernel command line options AMD Pstate driver support another firmware based autonomous mode with "amd_pstate=active" added to the kernel command line. In autonomous mode SMU firmware decides frequencies at runtime based on workload utilization, usage in other IPs, infrastructure limits such as power, thermals and so on. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Wyes Karny <wyes.karny@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
a7ec817d |
|
04-Jan-2023 |
Feng Tang <feng.tang@intel.com> |
x86/tsc: Add option to force frequency recalibration with HW timer The kernel assumes that the TSC frequency which is provided by the hardware / firmware via MSRs or CPUID(0x15) is correct after applying a few basic consistency checks. This disables the TSC recalibration against HPET or PM timer. As a result there is no mechanism to validate that frequency in cases where a firmware or hardware defect is suspected. And there was case that some user used atomic clock to measure the TSC frequency and reported an inaccuracy issue, which was later fixed in firmware. Add an option 'recalibrate' for 'tsc' kernel parameter to force the tsc freq recalibration with HPET or PM timer, and warn if the deviation from previous value is more than about 500 PPM, which provides a way to verify the data from hardware / firmware. There is no functional change to existing work flow. Recently there was a real-world case: "The 40ms/s divergence between TSC and HPET was observed on hardware that is quite recent" [1], on that platform the TSC frequence 1896 MHz was got from CPUID(0x15), and the force-reclibration with HPET/PMTIMER both calibrated out value of 1975 MHz, which also matched with check from software 'chronyd', indicating it's a problem of BIOS or firmware. [Thanks tglx for helping improving the commit log] [ paulmck: Wordsmith Kconfig help text. ] [1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/ Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: <x86@kernel.org> Cc: <linux-doc@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e7862eda |
|
24-Jan-2023 |
Kim Phillips <kim.phillips@amd.com> |
x86/cpu: Support AMD Automatic IBRS The AMD Zen4 core supports a new feature called Automatic IBRS. It is a "set-and-forget" feature that means that, like Intel's Enhanced IBRS, h/w manages its IBRS mitigation resources automatically across CPL transitions. The feature is advertised by CPUID_Fn80000021_EAX bit 8 and is enabled by setting MSR C000_0080 (EFER) bit 21. Enable Automatic IBRS by default if the CPU feature is present. It typically provides greater performance over the incumbent generic retpolines mitigation. Reuse the SPECTRE_V2_EIBRS spectre_v2_mitigation enum. AMD Automatic IBRS and Intel Enhanced IBRS have similar enablement. Add NO_EIBRS_PBRSB to cpu_vuln_whitelist, since AMD Automatic IBRS isn't affected by PBRSB-eIBRS. The kernel command line option spectre_v2=eibrs is used to select AMD Automatic IBRS, if available. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20230124163319.2277355-8-kim.phillips@amd.com
|
#
a76f65c8 |
|
13-Jan-2023 |
Babu Moger <babu.moger@amd.com> |
x86/resctrl: Include new features in command line options Add the command line options to enable or disable the new resctrl features: smba: Slow Memory Bandwidth Allocation bmec: Bandwidth Monitor Event Configuration. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-6-babu.moger@amd.com
|
#
be42f00b |
|
19-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Add RCU stall diagnosis information Because RCU CPU stall warnings are driven from the scheduling-clock interrupt handler, a workload consisting of a very large number of short-duration hardware interrupts can result in misleading stall-warning messages. On systems supporting only a single level of interrupts, that is, where interrupts handlers cannot be interrupted, this can produce misleading diagnostics. The stack traces will show the innocent-bystander interrupted task, not the interrupts that are at the very least exacerbating the stall. This situation can be improved by displaying the number of interrupts and the CPU time that they have consumed. Diagnosing other types of stalls can be eased by also providing the count of softirqs and the CPU time that they consumed as well as the number of context switches and the task-level CPU time consumed. Consider the following output given this change: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (1250 ticks this GP) <omitted> rcu: hardirqs softirqs csw/system rcu: number: 624 45 0 rcu: cputime: 69 1 2425 ==> 2500(ms) This output shows that the number of hard and soft interrupts is small, there are no context switches, and the system takes up a lot of time. This indicates that the current task is looping with preemption disabled. The impact on system performance is negligible because snapshot is recorded only once for all continuous RCU stalls. This added debugging information is suppressed by default and can be enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or by booting with rcupdate.rcu_cpu_stall_cputime=1. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
92987fe8 |
|
19-Dec-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow expedited RCU CPU stall warnings to dump task stacks This commit introduces the rcupdate.rcu_exp_stall_task_details kernel boot parameter, which cause expedited RCU CPU stall warnings to dump the stacks of any tasks blocking the current expedited grace period. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
6b34a099 |
|
23-Oct-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults This option increases the number of hash misses by limiting the number of kernel HPT entries, by keeping a per-CPU record of the last kernel HPTEs installed, and removing that from the hash table on the next hash insertion. A timer round-robins CPUs removing remaining kernel HPTEs and clearing the TLB (in the case of bare metal) to increase and slightly randomise kernel fault activity. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comment about NR_CPUS usage, fixup whitespace] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221024030150.852517-1-npiggin@gmail.com
|
#
1198d231 |
|
19-Sep-2022 |
Kim Phillips <kim.phillips@amd.com> |
iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options Currently, these options cause the following libkmod error: libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \ Ignoring bad option on kernel command line while parsing module \ name: 'ivrs_xxxx[XX:XX' Fix by introducing a new parameter format for these options and throw a warning for the deprecated format. Users are still allowed to omit the PCI Segment if zero. Adding a Link: to the reason why we're modding the syntax parsing in the driver and not in libkmod. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/ Reported-by: Kim Phillips <kim.phillips@amd.com> Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
a01fdc89 |
|
20-Oct-2022 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
tracing: Add trace_trigger kernel command line option Allow triggers to be enabled at kernel boot up. For example: trace_trigger="sched_switch.stacktrace if prev_state == 2" The above will enable the stacktrace trigger on top of the sched_switch event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then at boot up, a stacktrace will trigger and be recorded in the tracing ring buffer every time the sched_switch happens where the previous state is TASK_INTERRUPTIBLE. Another useful trigger would be "traceoff" which can stop tracing on an event if a field of the event matches a certain value defined by the filter ("if" statement). Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
a9ae89df |
|
16-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones For crashkernel=X without '@offset', select a region within DMA zones first, and fall back to reserve region above DMA zones. This allows users to use the same configuration on multiple platforms. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116121044.1690-3-thunder.leizhen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
a149cf00 |
|
16-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
arm64: kdump: Provide default size when crashkernel=Y,low is not specified Try to allocate at least 128 MiB low memory automatically for the case that crashkernel=,high is explicitly specified, while crashkenrel=,low is omitted. This allows users to focus more on the high memory requirements of their business rather than the low memory requirements of the crash kernel booting. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116121044.1690-2-thunder.leizhen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
b9b01a56 |
|
01-Nov-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random: use random.trust_{bootloader,cpu} command line option only It's very unusual to have both a command line option and a compile time option, and apparently that's confusing to people. Also, basically everybody enables the compile time option now, which means people who want to disable this wind up having to use the command line option to ensure that anyway. So just reduce the number of moving pieces and nix the compile time option in favor of the more versatile command line option. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
9a758d87 |
|
11-Nov-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
drm: Move nomodeset kernel parameter to drivers/video Move the nomodeset kernel parameter to drivers/video to make it available to non-DRM drivers. Adapt the interface, but keep the DRM interface drm_firmware_drivers_only() to avoid churn within DRM. The function should later be inlined into callers. The parameter disables any DRM graphics driver that would replace a driver for firmware-provided scanout buffers. It is an option to easily fallback to basic graphics output if the hardware's native driver is broken. Moving it to a more prominent location wil make it available to fbdev as well. v2: * clarify the meaning of the nomodeset parameter (Javier) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221111133024.9897-2-tzimmermann@suse.de
|
#
1f3307cf |
|
20-Sep-2022 |
Thomas Richter <tmricht@linux.ibm.com> |
s390/con3215: Drop console data printout when buffer full Using z/VM the 3270 terminal emulator also emulates an IBM 3215 console which outputs line by line. When the screen is full, the console enters the MORE... state and waits for the operator to confirm the data on the screen by pressing a clear key. If this does not happen in the default time frame (currently 50 seconds) the console enters the HOLDING state. It then waits another time frame (currently 10 seconds) before the output continues on the next screen. When the operator presses the clear key during these wait times, the output continues immediately. This may lead to a very long boot time when the console has to print many messages, also the system may hang because of the console's limited buffer space and the system waits for the console output to drain and finally to finish. This problem can only occur when a terminal emulator is actually connected to the 3215 console driver. If not z/VM simply drops console output. Remedy this rare situation and add a kernel boot command line parameter con3215_drop. It can be set to 0 (do not drop) or 1 (do drop) which is the default. This instructs the kernel drop console data when the console buffer is full. This speeds up the boot time considerable and also does not hang the system anymore. Add a sysfs attribute file for console IBM 3215 named con_drop. This allows for changing the behavior after the boot, for example when during interactive debugging a panic/crash is expected. Here is a test of the new behavior using the following test program: #/bin/bash declare -i cnt=4 mode=$(cat /sys/bus/ccw/drivers/3215/con_drop) [ $mode = yes ] && cnt=25 echo "cons_drop $(cat /sys/bus/ccw/drivers/3215/con_drop)" echo "vmcp term more 5 2" vmcp term more 5 2 echo "Run $cnt iterations of "'echo t > /proc/sysrq-trigger' for i in $(seq $cnt) do echo "$i. command 'echo t > /proc/sysrq-trigger' at $(date +%F,%T)" echo t > /proc/sysrq-trigger sleep 1 done echo "droptest done" > /dev/kmsg # Output with sysfs attribute con_drop set to 1: # ./droptest.sh cons_drop yes vmcp term more 5 2 Run 25 iterations of echo t > /proc/sysrq-trigger 1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:09 2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:10 3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:11 4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:12 5. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:13 6. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:14 7. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:15 8. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:16 9. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:17 10. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:18 11. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:19 12. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:20 13. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:21 14. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:22 15. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:23 16. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:24 17. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:25 18. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:26 19. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:27 20. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:28 21. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:29 22. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:30 23. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:31 24. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:32 25. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:15:33 # There are no hangs anymore. Output with sysfs attribute con_drop set to 0 and identical setting for z/VM console 'term more 5 2'. Sometimes hitting the clear key at the x3270 console to progress output. # ./droptest.sh cons_drop no vmcp term more 5 2 Run 4 iterations of echo t > /proc/sysrq-trigger 1. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:20:58 2. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:24:32 3. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:28:04 4. command 'echo t > /proc/sysrq-trigger' at 2022-09-02,10:31:37 # Details: Enable function raw3215_write() to handle tab expansion and newlines and feed it with input not larger than the console buffer of 65536 bytes. Function raw3125_putchar() just forwards its character for output to raw3215_write(). This moves tab to blank conversion to one function raw3215_write() which also does call raw3215_make_room() to wait for enough free buffer space. Function handle_write() loops over all its input and segments input into chunks of console buffer size (should the input be larger). Rework tab expansion handling logic to avoid code duplication. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
1056d314 |
|
17-Nov-2022 |
Perry Yuan <Perry.Yuan@amd.com> |
Documentation: add amd-pstate kernel command line options Add a new amd pstate driver command line option to enable driver passive working mode via MSR and shared memory interface to request desired performance within abstract scale and the power management firmware (SMU) convert the perf requests into actual hardware pstates. Also the `disable` parameter can disable the pstate driver loading by adding `amd_pstate=disable` to kernel command line. Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Tested-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
3fac3734 |
|
26-Sep-2022 |
Juergen Gross <jgross@suse.com> |
xen/pv: support selecting safe/unsafe msr accesses Instead of always doing the safe variants for reading and writing MSRs in Xen PV guests, make the behavior controllable via Kconfig option and a boot parameter. The default will be the current behavior, which is to always use the safe variant. Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
b25806dc |
|
26-Sep-2022 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: memcontrol: deprecate swapaccounting=0 mode The swapaccounting= commandline option already does very little today. To close a trivial containment failure case, the swap ownership tracking part of the swap controller has recently become mandatory (see commit 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") for details), which makes up the majority of the work during swapout, swapin, and the swap slot map. The only thing left under this flag is the page_counter operations and the visibility of the swap control files in the first place, which are rather meager savings. There also aren't many scenarios, if any, where controlling the memory of a cgroup while allowing it unlimited access to a global swap space is a workable resource isolation strategy. On the other hand, there have been several bugs and confusion around the many possible swap controller states (cgroup1 vs cgroup2 behavior, memory accounting without swap accounting, memcg runtime disabled). This puts the maintenance overhead of retaining the toggle above its practical benefits. Deprecate it. Link: https://lkml.kernel.org/r/20220926135704.400818-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
404a5e72 |
|
19-Sep-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
Documentation: Rename PPC_FSL_BOOK3E to PPC_E500 CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500. Rename it so that CONFIG_PPC_FSL_BOOK3E can be removed later. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d3d42b395c09e66b0705fda1e51779f33e13ac38.1663606876.git.christophe.leroy@csgroup.eu
|
#
c4f20f14 |
|
25-Aug-2022 |
Li Zhe <lizhe.67@bytedance.com> |
page_ext: introduce boot parameter 'early_page_ext' In commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init""), we call page_ext_init() after page_alloc_init_late() to avoid some panic problem. It seems that we cannot track early page allocations in current kernel even if page structure has been initialized early. This patch introduces a new boot parameter 'early_page_ext' to resolve this problem. If we pass it to the kernel, page_ext_init() will be moved up and the feature 'deferred initialization of struct pages' will be disabled to initialize the page allocator early and prevent the panic problem above. It can help us to catch early page allocations. This is useful especially when we find that the free memory value is not the same right after different kernel booting. [akpm@linux-foundation.org: fix section issue by removing __meminitdata] Link: https://lkml.kernel.org/r/20220825102714.669-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d799a183 |
|
25-Aug-2022 |
Vasant Hegde <vasant.hegde@amd.com> |
iommu/amd: Add command-line option to enable different page table Enhance amd_iommu command line option to specify v1 or v2 page table. By default system will boot in V1 page table mode. Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20220825063939.8360-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
0e8a6313 |
|
02-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled. Implement this using the VPA accumulated wait counters. Note this will not work on current KVM hosts because KVM does not implement the VPA dispatch counters (yet). It could be implemented with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE, but that is not necessary for the more limited accounting provided by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and has downsides like potential log wrap. From Shrikanth: [...] it was tested on Power10 [PowerVM] Shared LPAR. system has two LPAR. we will call first one LPAR1 and second one as LPAR2. Test was carried out in SMT=1. Similar observation was seen in SMT=8 as well. LPAR config header from each LPAR is below. LPAR1 is twice as big as LPAR2. Since Both are sharing the same underlying hardware, work stealing will happen when both the LPAR's are contending for the same resource. LPAR1: type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00 LPAR2: type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00 mpstat was used to check for the utilization. stress-ng has been used as the workload. Few cases are tested. when the both LPAR are idle there is no steal time. when LPAR1 starts running at 100% which consumes all of the physical resource, steal time starts to get accounted. With LPAR1 running at 100% and LPAR2 starts running, steal time starts increasing. This is as expected. When the LPAR2 Load is increased further, steal time increases further. Case 1: 0% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 99.95 Case 2: 100% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 97.68 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00 Case 3: 100% LPAR1; 50% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 86.34 0.00 0.10 0.00 0.00 0.03 13.54 0.00 0.00 0.00 Case 4: 100% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 78.54 0.00 0.07 0.00 0.00 0.02 21.36 0.00 0.00 0.00 Case 5: 50% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 49.37 0.00 0.00 0.00 0.00 0.00 1.17 0.00 0.00 49.47 Patch is accounting for the steal time and basic tests are holding good. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> [mpe: Add SPDX tag to new paravirt_api_clock.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
|
#
d20a6ba5 |
|
23-Aug-2022 |
Joe Fradley <joefradley@google.com> |
kunit: add kunit.enable to enable/disable KUnit test This patch adds the kunit.enable module parameter that will need to be set to true in addition to KUNIT being enabled for KUnit tests to run. The default value is true giving backwards compatibility. However, for the production+testing use case the new config option KUNIT_DEFAULT_ENABLED can be set to N requiring the tester to opt-in by passing kunit.enable=1 to the kernel. Signed-off-by: Joe Fradley <joefradley@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
#
e9207223 |
|
10-Sep-2022 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
arm64: support huge vmalloc mappings As commit 559089e0a93d ("vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an opt-in strategy, so it is saftly to support huge vmalloc mappings on arm64, for now, it is used in kvmalloc() and alloc_large_system_hash(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220911044423.139229-1-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
877ace9e |
|
26-Aug-2022 |
Liu Song <liusong@linux.alibaba.com> |
arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually In our environment, it was found that the mitigation BHB has a great impact on the benchmark performance. For example, in the lmbench test, the "process fork && exit" test performance drops by 20%. So it is necessary to have the ability to turn off the mitigation individually through cmdline, thus avoiding having to compile the kernel by adjusting the config. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1661514050-22263-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
b8d1d163 |
|
16-Aug-2022 |
Daniel Sneddon <daniel.sneddon@linux.intel.com> |
x86/apic: Don't disable x2APIC if locked The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC (or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but it disables the memory-mapped APIC interface in favor of one that uses MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR. The MMIO/xAPIC interface has some problems, most notably the APIC LEAK [1]. This bug allows an attacker to use the APIC MMIO interface to extract data from the SGX enclave. Introduce support for a new feature that will allow the BIOS to lock the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the kernel tries to disable the APIC or revert to legacy APIC mode a GP fault will occur. Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by preventing the kernel from trying to disable the x2APIC. On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If legacy APIC is required, then it SGX and TDX need to be disabled in the BIOS. [1]: https://aepicleak.com/aepicleak.pdf Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com
|
#
1202cdd6 |
|
17-Aug-2022 |
Stephen Hemminger <stephen@networkplumber.org> |
Remove DECnet support from kernel DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e8cff0a |
|
17-Aug-2022 |
Mark Rutland <mark.rutland@arm.com> |
arm64: fix rodata=full On arm64, "rodata=full" has been suppored (but not documented) since commit: c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well") As it's necessary to determine the rodata configuration early during boot, arm64 has an early_param() handler for this, whereas init/main.c has a __setup() handler which is run later. Unfortunately, this split meant that since commit: f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions") ... passing "rodata=full" would result in a spurious warning from the __setup() handler (though RO permissions would be configured appropriately). Further, "rodata=full" has been broken since commit: 0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") ... which caused strtobool() to parse "full" as false (in addition to many other values not documented for the "rodata=" kernel parameter. This patch fixes this breakage by: * Moving the core parameter parser to an __early_param(), such that it is available early. * Adding an (optional) arch hook which arm64 can use to parse "full". * Updating the documentation to mention that "full" is valid for arm64. * Having the core parameter parser handle "on" and "off" explicitly, such that any undocumented values (e.g. typos such as "ful") are reported as errors rather than being silently accepted. Note that __setup() and early_param() have opposite conventions for their return values, where __setup() uses 1 to indicate a parameter was handled and early_param() uses 0 to indicate a parameter was handled. Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions") Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jagdish Gediya <jvgediya@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
e6cfcdda |
|
08-Aug-2022 |
Kim Phillips <kim.phillips@amd.com> |
x86/bugs: Enable STIBP for IBPB mitigated RETBleed AMD's "Technical Guidance for Mitigating Branch Type Confusion, Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On Privileged Mode Entry / SMT Safety" says: Similar to the Jmp2Ret mitigation, if the code on the sibling thread cannot be trusted, software should set STIBP to 1 or disable SMT to ensure SMT safety when using this mitigation. So, like already being done for retbleed=unret, and now also for retbleed=ibpb, force STIBP on machines that have it, and report its SMT vulnerability status accordingly. [ bp: Remove the "we" and remove "[AMD]" applicability parameter which doesn't work here. ] Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
|
#
dff03381 |
|
28-Jun-2022 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb_vmemmap: introduce the name HVO It it inconvenient to mention the feature of optimizing vmemmap pages associated with HugeTLB pages when communicating with others since there is no specific or abbreviated name for it when it is first introduced. Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now. This commit also updates the document about "hugetlb_free_vmemmap" by the way discussed in thread [1]. Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
ae39e9ed |
|
03-Jun-2022 |
Saravana Kannan <saravanak@google.com> |
module: Add support for default value for module async_probe Add a module.async_probe kernel command line option that allows enabling async probing for all modules. When this command line option is used, there might still be some modules for which we want to explicitly force synchronous probing, so extend <modulename>.async_probe to take an optional bool input so that async probing can be disabled for a specific module. Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
5a704629 |
|
17-Jul-2022 |
Dan Moulding <dmoulding@me.com> |
init: add "hostname" kernel parameter The gethostname system call returns the hostname for the current machine. However, the kernel has no mechanism to initially set the current machine's name in such a way as to guarantee that the first userspace process to call gethostname will receive a meaningful result. It relies on some unspecified userspace process to first call sethostname before gethostname can produce a meaningful name. Traditionally the machine's hostname is set from userspace by the init system. The init system, in turn, often relies on a configuration file (say, /etc/hostname) to provide the value that it will supply in the call to sethostname. Consequently, the file system containing /etc/hostname usually must be available before the hostname will be set. There may, however, be earlier userspace processes that could call gethostname before the file system containing /etc/hostname is mounted. Such a process will get some other, likely meaningless, name from gethostname (such as "(none)", "localhost", or "darkstar"). A real-world example where this can happen, and lead to undesirable results, is with mdadm. When assembling arrays, mdadm distinguishes between "local" arrays and "foreign" arrays. A local array is one that properly belongs to the current machine, and a foreign array is one that is (possibly temporarily) attached to the current machine, but properly belongs to some other machine. To determine if an array is local or foreign, mdadm may compare the "homehost" recorded on the array with the current hostname. If mdadm is run before the root file system is mounted, perhaps because the root file system itself resides on an md-raid array, then /etc/hostname isn't yet available and the init system will not yet have called sethostname, causing mdadm to incorrectly conclude that all of the local arrays are foreign. Solving this problem *could* be delegated to the init system. It could be left up to the init system (including any init system that starts within an initramfs, if one is in use) to ensure that sethostname is called before any other userspace process could possibly call gethostname. However, it may not always be obvious which processes could call gethostname (for example, udev itself might not call gethostname, but it could via udev rules invoke processes that do). Additionally, the init system has to ensure that the hostname configuration value is stored in some place where it will be readily accessible during early boot. Unfortunately, every init system will attempt to (or has already attempted to) solve this problem in a different, possibly incorrect, way. This makes getting consistently working configurations harder for users. I believe it is better for the kernel to provide the means by which the hostname may be set early, rather than making this a problem for the init system to solve. The option to set the hostname during early startup, via a kernel parameter, provides a simple, reliable way to solve this problem. It also could make system configuration easier for some embedded systems. [dmoulding@me.com: v2] Link: https://lkml.kernel.org/r/20220506060310.7495-2-dmoulding@me.com Link: https://lkml.kernel.org/r/20220505180651.22849-2-dmoulding@me.com Signed-off-by: Dan Moulding <dmoulding@me.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
72311809 |
|
21-Jul-2022 |
Tianyu Lan <tiala@microsoft.com> |
swiotlb: clean up some coding style and minor issues - Fix the used field of struct io_tlb_area wasn't initialized - Set area number to be 0 if input area number parameter is 0 - Use array_size() to calculate io_tlb_area array size - Make parameters of swiotlb_do_find_slots() more reasonable Fixes: 26ffb91fa5e0 ("swiotlb: split up the global swiotlb lock") Signed-off-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
20347fca |
|
07-Jul-2022 |
Tianyu Lan <Tianyu.Lan@microsoft.com> |
swiotlb: split up the global swiotlb lock Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significat lock contention on the swiotlb lock. This patch splits the swiotlb bounce buffer pool into individual areas which have their own lock. Each CPU tries to allocate in its own area first. Only if that fails does it search other areas. On freeing the allocation is freed into the area where the memory was originally allocated from. Area number can be set via swiotlb kernel parameter and is default to be possible cpu number. If possible cpu number is not power of 2, area number will be round up to the next power of 2. This idea from Andi Kleen patch(https://github.com/intel/tdx/commit/ 4529b5784c141782c72ec9bd9a92df2b68cb7d45). Based-on-idea-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
bbe3a106 |
|
06-Jul-2022 |
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> |
iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands By default, PCI segment is zero and can be omitted. To support system with non-zero PCI segment ID, modify the parsing functions to allow PCI segment ID. Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20220706113825.25582-33-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
56e54b4e |
|
13-Jun-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/32: Remove 'noltlbs' kernel parameter Mapping without large TLBs has no added value on the 8xx. Mapping without large TLBs is still necessary on 40x when selecting CONFIG_KFENCE or CONFIG_DEBUG_PAGEALLOC or CONFIG_STRICT_KERNEL_RWX, but this is done automatically and doesn't require user selection. Remove 'noltlbs' kernel parameter, the user has no reason to use it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80ca17bd39cf608a8ebd0764d7064a498e131199.1655202721.git.christophe.leroy@csgroup.eu
|
#
1ce84497 |
|
13-Jun-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/32: Remove the 'nobats' kernel parameter Mapping without BATs doesn't bring any added value to the user. Remove that option. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6977314c823cfb728bc0273cea634b41807bfb64.1655202721.git.christophe.leroy@csgroup.eu
|
#
66361095 |
|
17-Jun-2022 |
Muchun Song <songmuchun@bytedance.com> |
mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory For now, the feature of hugetlb_free_vmemmap is not compatible with the feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes precedence over memory_hotplug.memmap_on_memory. However, someone wants to make memory_hotplug.memmap_on_memory takes precedence over hugetlb_free_vmemmap since memmap_on_memory makes it more likely to succeed memory hotplug in close-to-OOM situations. So the decision of making hugetlb_free_vmemmap take precedence is not wise and elegant. The proper approach is to have hugetlb_vmemmap.c do the check whether the section which the HugeTLB pages belong to can be optimized. If the section's vmemmap pages are allocated from the added memory block itself, hugetlb_free_vmemmap should refuse to optimize the vmemmap, otherwise, do the optimization. Then both kernel parameters are compatible. So this patch introduces VmemmapSelfHosted to mask any non-optimizable vmemmap pages. The hugetlb_vmemmap can use this flag to detect if a vmemmap page can be optimized. [songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive] Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Co-developed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
ee65728e |
|
27-Jun-2022 |
Mike Rapoport <rppt@kernel.org> |
docs: rename Documentation/vm to Documentation/mm so it will be consistent with code mm directory and with Documentation/admin-guide/mm and won't be confused with virtual machines. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Tested-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Wu XiangCheng <bobwxc@email.cn>
|
#
ada51a9d |
|
22-Jun-2022 |
David Matlack <dmatlack@google.com> |
KVM: x86/mmu: Extend Eager Page Splitting to nested MMUs Add support for Eager Page Splitting pages that are mapped by nested MMUs. Walk through the rmap first splitting all 1GiB pages to 2MiB pages, and then splitting all 2MiB pages to 4KiB pages. Note, Eager Page Splitting is limited to nested MMUs as a policy rather than due to any technical reason (the sp->role.guest_mode check could just be deleted and Eager Page Splitting would work correctly for all shadow MMU pages). There is really no reason to support Eager Page Splitting for tdp_mmu=N, since such support will eventually be phased out, and there is no current use case supporting Eager Page Splitting on hosts where TDP is either disabled or unavailable in hardware. Furthermore, future improvements to nested MMU scalability may diverge the code from the legacy shadow paging implementation. These improvements will be simpler to make if Eager Page Splitting does not have to worry about legacy shadow paging. Splitting huge pages mapped by nested MMUs requires dealing with some extra complexity beyond that of the TDP MMU: (1) The shadow MMU has a limit on the number of shadow pages that are allowed to be allocated. So, as a policy, Eager Page Splitting refuses to split if there are KVM_MIN_FREE_MMU_PAGES or fewer pages available. (2) Splitting a huge page may end up re-using an existing lower level shadow page tables. This is unlike the TDP MMU which always allocates new shadow page tables when splitting. (3) When installing the lower level SPTEs, they must be added to the rmap which may require allocating additional pte_list_desc structs. Case (2) is especially interesting since it may require a TLB flush, unlike the TDP MMU which can fully split huge pages without any TLB flushes. Specifically, an existing lower level page table may point to even lower level page tables that are not fully populated, effectively unmapping a portion of the huge page, which requires a flush. As of this commit, a flush is always done always after dropping the huge page and before installing the lower level page table. This TLB flush could instead be delayed until the MMU lock is about to be dropped, which would batch flushes for multiple splits. However these flushes should be rare in practice (a huge page must be aliased in multiple SPTEs and have been split for NX Huge Pages in only some of them). Flushing immediately is simpler to plumb and also reduces the chances of tripping over a CPU bug (e.g. see iTLB multihit). [ This commit is based off of the original implementation of Eager Page Splitting from Peter in Google's kernel from 2016. ] Suggested-by: Peter Feiner <pfeiner@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220516232138.1783324-23-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b37a667c |
|
22-Apr-2022 |
Joel Fernandes <joel@joelfernandes.org> |
rcu/nocb: Add an option to offload all CPUs on boot Systems built with CONFIG_RCU_NOCB_CPU=y but booted without either the rcu_nocbs= or rcu_nohz_full= kernel-boot parameters will not have callback offloading on any of the CPUs, nor can any of the CPUs be switched to enable callback offloading at runtime. Although this is intentional, it would be nice to have a way to offload all the CPUs without having to make random bootloaders specify either the rcu_nocbs= or the rcu_nohz_full= kernel-boot parameters. This commit therefore provides a new CONFIG_RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that switches the default so as to offload callback processing on all of the CPUs. This default can still be overridden using the rcu_nocbs= and rcu_nohz_full= kernel-boot parameters. Reviewed-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Uladzislau Rezki <urezki@gmail.com> (In v4.1, fixed issues with CONFIG maze reported by kernel test robot). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
049f9ae9 |
|
08-Jul-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu" The decision of whether or not to trust RDRAND is controlled by the "random.trust_cpu" boot time parameter or the CONFIG_RANDOM_TRUST_CPU compile time default. The "nordrand" flag was added during the early days of RDRAND, when there were worries that merely using its values could compromise the RNG. However, these days, RDRAND values are not used directly but always go through the RNG's hash function, making "nordrand" no longer useful. Rather, the correct switch is "random.trust_cpu", which not only handles the relevant trust issue directly, but also is general to multiple CPU types, not just x86. However, x86 RDRAND does have a history of being occasionally problematic. Prior, when the kernel would notice something strange, it'd warn in dmesg and suggest enabling "nordrand". We can improve on that by making the test a little bit better and then taking the step of automatically disabling RDRAND if we detect it's problematic. Also disable RDSEED if the RDRAND test fails. Cc: x86@kernel.org Cc: Theodore Ts'o <tytso@mit.edu> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
7ac3945d |
|
26-Jun-2022 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
Documentation: KVM: update amd-memory-encryption.rst references Changeset daec8d408308 ("Documentation: KVM: add separate directories for architecture-specific documentation") renamed: Documentation/virt/kvm/amd-memory-encryption.rst to: Documentation/virt/kvm/x86/amd-memory-encryption.rst. Update the cross-references accordingly. Fixes: daec8d408308 ("Documentation: KVM: add separate directories for architecture-specific documentation") Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/fd80db889e34aae87a4ca88cad94f650723668f4.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
89f7f291 |
|
27-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Document rcutree.nocb_nobypass_lim_per_jiffy kernel parameter This commit provides documentation for the kernel parameter controlling RCU's handling of callback floods on offloaded (rcu_nocbs) CPUs. This parameter might be obscure, but it is always there when you need it. Reported-by: Frederic Weisbecker <frederic@kernel.org> Reported-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
71de1e34 |
|
20-Apr-2022 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Document the rcutree.rcu_divisor kernel boot parameter This commit adds kernel-parameters.txt documentation for the rcutree.rcu_divisor kernel boot parameter, which controls the softirq callback-invocation batch limit. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
|
#
8d6d51ed |
|
28-Feb-2022 |
Randy Dunlap <rdunlap@infradead.org> |
docs: selinux: add '=' signs to kernel boot options Provide the full kernel boot option string (with ending '=' sign). They won't work without that and that is how other boot options are listed. If used without an '=' sign (as listed here), they cause an "Unknown parameters" message and are added to init's argument strings, polluting them. Unknown kernel command line parameters "enforcing checkreqprot BOOT_IMAGE=/boot/bzImage-517rc6", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init enforcing checkreqprot with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc6 Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Eric Paris <eparis@parisplace.org> Cc: selinux@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> [PM: removed bogus 'Fixes' line] Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
504ee236 |
|
30-Jun-2022 |
Marc Zyngier <maz@kernel.org> |
arm64: Add the arm64.nosve command line option In order to be able to completely disable SVE even if the HW seems to support it (most likely because the FW is broken), move the SVE setup into the EL2 finalisation block, and use a new idreg override to deal with it. Note that we also nuke id_aa64zfr0_el1 as a byproduct, and that SME also gets disabled, due to the dependency between the two features. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220630160500.1536744-9-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
b3000e21 |
|
30-Jun-2022 |
Marc Zyngier <maz@kernel.org> |
arm64: Add the arm64.nosme command line option In order to be able to completely disable SME even if the HW seems to support it (most likely because the FW is broken), move the SME setup into the EL2 finalisation block, and use a new idreg override to deal with it. Note that we also nuke id_aa64smfr0_el1 as a byproduct. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220630160500.1536744-8-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
e92b2573 |
|
23-Jun-2022 |
Liu Song <liusong@linux.alibaba.com> |
arm64: correct the effect of mitigations off on kpti If KASLR is enabled, then kpti will be forced to be enabled even if mitigations off, so we need to adjust the description of this parameter. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/1656033648-84181-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
ea304a8b |
|
27-Jul-2022 |
Eiichi Tsukata <eiichi.tsukata@nutanix.com> |
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt" with the respective retbleed= settings. Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: corbet@lwn.net Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
|
#
4f2bfd94 |
|
30-Jun-2022 |
Neeraj Upadhyay <quic_neeraju@quicinc.com> |
srcu: Make expedited RCU grace periods block even less frequently The purpose of commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") was to prevent a long series of never-blocking expedited SRCU grace periods from blocking kernel-live-patching (KLP) progress. Although it was successful, it also resulted in excessive boot times on certain embedded workloads running under qemu with the "-bios QEMU_EFI.fd" command line. Here "excessive" means increasing the boot time up into the three-to-four minute range. This increase in boot time was due to the more than 6000 back-to-back invocations of synchronize_rcu_expedited() within the KVM host OS, which in turn resulted from qemu's emulation of a long series of MMIO accesses. Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods") did not significantly help this particular use case. Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values of non-sleeping per phase counts on a system with preemption enabled, and observed the following boot times: +──────────────────────────+────────────────+ | SRCU_MAX_NODELAY_PHASE | Boot time (s) | +──────────────────────────+────────────────+ | 100 | 30.053 | | 150 | 25.151 | | 200 | 20.704 | | 250 | 15.748 | | 500 | 11.401 | | 1000 | 11.443 | | 10000 | 11.258 | | 1000000 | 11.154 | +──────────────────────────+────────────────+ Analysis on the experiment results show additional improvements with CPU-bound delays approaching one jiffy in duration. This improvement was also seen when number of per-phase iterations were scaled to one jiffy. This commit therefore scales per-grace-period phase number of non-sleeping polls so that non-sleeping polls extend for about one jiffy. In addition, the delay-calculation call to srcu_get_delay() in srcu_gp_end() is replaced with a simple check for an expedited grace period. This change schedules callback invocation immediately after expedited grace periods complete, which results in greatly improved boot times. Testing done by Marc and Zhangfei confirms that this change recovers most of the performance degradation in boottime; for CONFIG_HZ_250 configuration, specifically, boot times improve from 3m50s to 41s on Marc's setup; and from 2m40s to ~9.7s on Zhangfei's setup. In addition to the changes to default per phase delays, this change adds 3 new kernel parameters - srcutree.srcu_max_nodelay, srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay. This allows users to configure the srcu grace period scanning delays in order to more quickly react to additional use cases. Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods") Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU") Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reported-by: yueluck <yueluck@163.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3ebc1700 |
|
14-Jun-2022 |
Peter Zijlstra <peterz@infradead.org> |
x86/bugs: Add retbleed=ibpb jmp2ret mitigates the easy-to-attack case at relatively low overhead. It mitigates the long speculation windows after a mispredicted RET, but it does not mitigate the short speculation window from arbitrary instruction boundaries. On Zen2, there is a chicken bit which needs setting, which mitigates "arbitrary instruction boundaries" down to just "basic block boundaries". But there is no fix for the short speculation window on basic block boundaries, other than to flush the entire BTB to evict all attacker predictions. On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP or no-SMT): 1) Nothing System wide open 2) jmp2ret May stop a script kiddy 3) jmp2ret+chickenbit Raises the bar rather further 4) IBPB Only thing which can count as "safe". Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit on Zen1 according to lmbench. [ bp: Fixup feature bit comments, document option, 32-bit build fix. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
7c693f54 |
|
14-Jun-2022 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS Extend spectre_v2= boot option with Kernel IBRS. [jpoimboe: no STIBP with IBRS] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
e8ec1b6e |
|
14-Jun-2022 |
Kim Phillips <kim.phillips@amd.com> |
x86/bugs: Enable STIBP for JMP2RET For untrained return thunks to be fully effective, STIBP must be enabled or SMT disabled. Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
7fbf47c7 |
|
14-Jun-2022 |
Alexandre Chartre <alexandre.chartre@oracle.com> |
x86/bugs: Add AMD retbleed= boot parameter Add the "retbleed=<value>" boot parameter to select a mitigation for RETBleed. Possible values are "off", "auto" and "unret" (JMP2RET mitigation). The default value is "auto". Currently, "retbleed=auto" will select the unret mitigation on AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on Intel). [peterz: rebase; add hygon] [jpoimboe: cleanups] Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
cde5042a |
|
09-Jun-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode() only returns KVM_MODE_PROTECTED on systems where the feature is available. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
|
#
8cb861e9 |
|
19-May-2022 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst. These vulnerabilities are broadly categorized as: Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction. Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer. Shared Buffers Data Read (SBDR): It is similar to Shared Buffer Data Sampling (SBDS) except that the data is directly read into the architectural software-visible state. An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest. On CPUs not affected by MDS and TAA, user application cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with VERW instruction to clear fill buffers before VMENTER for MMIO capable guests. Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
2b28a1a8 |
|
29-Apr-2022 |
Saravana Kannan <saravanak@google.com> |
driver core: Extend deferred probe timeout on driver registration The deferred probe timer that's used for this currently starts at late_initcall and runs for driver_deferred_probe_timeout seconds. The assumption being that all available drivers would be loaded and registered before the timer expires. This means, the driver_deferred_probe_timeout has to be pretty large for it to cover the worst case. But if we set the default value for it to cover the worst case, it would significantly slow down the average case. For this reason, the default value is set to 0. Also, with CONFIG_MODULES=y and the current default values of driver_deferred_probe_timeout=0 and fw_devlink=on, devices with missing drivers will cause their consumer devices to always defer their probes. This is because device links created by fw_devlink defer the probe even before the consumer driver's probe() is called. Instead of a fixed timeout, if we extend an unexpired deferred probe timer on every successful driver registration, with the expectation more modules would be loaded in the near future, then the default value of driver_deferred_probe_timeout only needs to be as long as the worst case time difference between two consecutive module loads. So let's implement that and set the default value to 10 seconds when CONFIG_MODULES=y. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Rob Herring <robh@kernel.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kevin Hilman <khilman@kernel.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Cc: linux-gpio@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: iommu@lists.linux-foundation.org Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20220429220933.1350374-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f79f662e |
|
03-May-2022 |
Saravana Kannan <saravanak@google.com> |
driver core: Add "*" wildcard support to driver_async_probe cmdline param There's currently no way to use driver_async_probe kernel cmdline param to enable default async probe for all drivers. So, add support for "*" to match with all driver names. When "*" is used, all other drivers listed in driver_async_probe are drivers that will NOT match the "*". For example: * driver_async_probe=drvA,drvB,drvC drvA, drvB and drvC do asynchronous probing. * driver_async_probe=* All drivers do asynchronous probing except those that have set PROBE_FORCE_SYNCHRONOUS flag. * driver_async_probe=*,drvA,drvB,drvC All drivers do asynchronous probing except drvA, drvB, drvC and those that have set PROBE_FORCE_SYNCHRONOUS flag. Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20220504005344.117803-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
fa6dae5d |
|
19-May-2022 |
Hans de Goede <hdegoede@redhat.com> |
x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions Some firmware supplies PCI host bridge _CRS that includes address space unusable by PCI devices, e.g., space occupied by host bridge registers or used by hidden PCI devices. To avoid this unusable space, Linux currently excludes E820 reserved regions from _CRS windows; see 4dc2287c1805 ("x86: avoid E820 regions when allocating address space"). However, this use of E820 reserved regions to clip things out of _CRS is not supported by ACPI, UEFI, or PCI Firmware specs, and some systems have E820 reserved regions that cover the entire memory window from _CRS. 4dc2287c1805 clips the entire window, leaving no space for hot-added or uninitialized PCI devices. For example, from a Lenovo IdeaPad 3 15IIL 81WE: BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window] pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit] pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit] Future patches will add quirks to enable/disable E820 clipping automatically. Add a "pci=no_e820" kernel command line option to disable clipping with E820 reserved regions. Also add a matching "pci=use_e820" option to enable clipping with E820 reserved regions if that has been disabled by default by further patches in this patch-set. Both options taint the kernel because they are intended for debugging and workaround purposes until a quirk can set them automatically. [bhelgaas: commit log, add printk] Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3 Link: https://lore.kernel.org/r/20220519152150.6135-2-hdegoede@redhat.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Benoit Grégoire <benoitg@coeus.ca> Cc: Hui Wang <hui.wang@canonical.com>
|
#
9c54c522 |
|
13-May-2022 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb_vmemmap: use kstrtobool for hugetlb_vmemmap param parsing Use kstrtobool rather than open coding "on" and "off" parsing in mm/hugetlb_vmemmap.c, which is more powerful to handle all kinds of parameters like 'Yy1Nn0' or [oO][NnFf] for "on" and "off". Link: https://lkml.kernel.org/r/20220512041142.39501-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
47010c04 |
|
29-Apr-2022 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e9c5048c |
|
13-May-2022 |
Ahmad Fatoum <a.fatoum@pengutronix.de> |
KEYS: trusted: Introduce support for NXP CAAM-based trusted keys The Cryptographic Acceleration and Assurance Module (CAAM) is an IP core built into many newer i.MX and QorIQ SoCs by NXP. The CAAM does crypto acceleration, hardware number generation and has a blob mechanism for encapsulation/decapsulation of sensitive material. This blob mechanism depends on a device specific random 256-bit One Time Programmable Master Key that is fused in each SoC at manufacturing time. This key is unreadable and can only be used by the CAAM for AES encryption/decryption of user data. This makes it a suitable backend (source) for kernel trusted keys. Previous commits generalized trusted keys to support multiple backends and added an API to access the CAAM blob mechanism. Based on these, provide the necessary glue to use the CAAM for trusted keys. Reviewed-by: David Gstir <david@sigma-star.at> Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Tim Harvey <tharvey@gateworks.com> Tested-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E) Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
#
fcd7c269 |
|
13-May-2022 |
Ahmad Fatoum <a.fatoum@pengutronix.de> |
KEYS: trusted: allow use of kernel RNG for key material The two existing trusted key sources don't make use of the kernel RNG, but instead let the hardware doing the sealing/unsealing also generate the random key material. However, both users and future backends may want to place less trust into the quality of the trust source's random number generator and instead reuse the kernel entropy pool, which can be seeded from multiple entropy sources. Make this possible by adding a new trusted.rng parameter, that will force use of the kernel RNG. In its absence, it's up to the trust source to decide, which random numbers to use, maintaining the existing behavior. Suggested-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Acked-by: Pankaj Gupta <pankaj.gupta@nxp.com> Reviewed-by: David Gstir <david@sigma-star.at> Reviewed-by: Pankaj Gupta <pankaj.gupta@nxp.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Pankaj Gupta <pankaj.gupta@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on ls1028a (non-E and E) Tested-by: John Ernberg <john.ernberg@actia.se> # iMX8QXP Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
#
4a840d5f |
|
23-Apr-2022 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: drop more IDE boot options and ide-cd.rst Drop ide-* command line options. Drop cdrom/ide-cd.rst documentation. Fixes: 898ee22c32be ("Drop Documentation/ide/") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20220424033701.7988-1-rdunlap@infradead.org [jc: also deleted reference from cdrom/index.rst] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
989dc725 |
|
22-Dec-2021 |
Mimi Zohar <zohar@linux.ibm.com> |
ima: define a new template field named 'd-ngv2' and templates In preparation to differentiate between unsigned regular IMA file hashes and fs-verity's file digests in the IMA measurement list, define a new template field named 'd-ngv2'. Also define two new templates named 'ima-ngv2' and 'ima-sigv2', which include the new 'd-ngv2' field. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
#
389cfd96 |
|
02-Apr-2022 |
Randy Dunlap <rdunlap@infradead.org> |
docs/admin: alphabetize parts of kernel-parameters.txt (part 2) Alphabetize several of the kernel boot parameters in kernel-parameters.txt. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
d2fc83c1 |
|
02-Apr-2022 |
Randy Dunlap <rdunlap@infradead.org> |
Docs/admin: alphabetize some kernel-parameters (part 1) Move some out-of-place kernel parameters into their correct locations. Move one out-of-order keyword/legend in kernel-parameters.rst. Add some missing keyword legends in kernel-parameters.rst: HIBERNATION HYPER_V and drop some obsolete/removed keyword legends: EIDE IOSCHED OSS TS XT Correct the location of the setup.h file. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
59bdbbd5 |
|
02-Apr-2022 |
Randy Dunlap <rdunlap@infradead.org> |
Docs: admin/kernel-parameters: edit a few boot options Clean up some of admin-guide/kernel-parameters.txt: a. "smt" should be "smt=" (S390) b. (dropped) c. Sparc supports the vdso= boot option d. make the tp_printk options (2) formatting similar to other options by adding spacing e. add "trace_clock=" with a reference to Documentation/trace/ftrace.rst f. use [IA-64] as documented instead of [ia64] g. fix formatting and text for test_suspend= h. fix formatting for swapaccount= i. fix formatting and grammar for video.brightness_switch_enabled= Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-ia64@vger.kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: linux-pm@vger.kernel.org Cc: linux-acpi@vger.kernel.org Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
82850028 |
|
20-Mar-2022 |
Akihiko Odaki <akihiko.odaki@gmail.com> |
x86/efi: Remove references of EFI earlyprintk from documentation x86 EFI earlyprink was removed with commit 69c1f396f25b ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation"). Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8f0f104e |
|
10-May-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
arm64: kdump: Do not allocate crash low memory if not needed When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low" memory is not required in the following corner cases: 1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means that the devices can access any memory. 2. If the system memory is small, the crash high memory may be allocated from the DMA zones. If that happens, there's no need to allocate another crash low memory because there's already one. Add condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether the 'high' memory is allocated above DMA zones. Note: when both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical memory is DMA accessible, CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
553b0cb3 |
|
13-May-2022 |
Xiao Yang <yangx.jy@fujitsu.com> |
x86/speculation: Add missing srbds=off to the mitigations= help text The mitigations= cmdline option help text misses the srbds=off option. Add it. [ bp: Add a commit message. ] Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220513101637.216487-1-yangx.jy@fujitsu.com
|
#
28b3ae42 |
|
16-Feb-2022 |
Uladzislau Rezki <uladzislau.rezki@sony.com> |
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT Currently both expedited and regular grace period stall warnings use a single timeout value that with units of seconds. However, recent Android use cases problem require a sub-100-millisecond expedited RCU CPU stall warning. Given that expedited RCU grace periods normally complete in far less than a single millisecond, especially for small systems, this is not unreasonable. Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel configuration that defaults to 20 msec on Android and remains the same as that of the non-expedited stall warnings otherwise. It also can be changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout. [ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ] Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fa82cabb |
|
18-Mar-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
doc: admin-guide: Update libata kernel parameters Cleanup the text text describing the libata.force boot parameter and update the list of the values to include all supported horkage and link flag that can be forced. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
|
#
5832f1ae |
|
06-May-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
docs: kdump: Update the crashkernel description for arm64 Now arm64 has added support for "crashkernel=X,high" and "crashkernel=Y,low". Unlike x86, crash low memory is not allocated if "crashkernel=Y,low" is not specified. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20220506114402.365-7-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
a57ffb3c |
|
31-Jan-2022 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Automatically determine size-transition strategy at boot This commit adds a srcutree.convert_to_big option of zero that causes SRCU to decide at boot whether to wait for contention (small systems) or immediately expand to large (large systems). A new srcutree.big_cpu_lim (defaulting to 128) defines how many CPUs constitute a large system. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3791a223 |
|
28-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
kernel/smp: Provide boot-time timeout for CSD lock diagnostics Debugging of problems involving insanely long-running SMI handlers proceeds better if the CSD-lock timeout can be adjusted. This commit therefore provides a new smp.csd_lock_timeout kernel boot parameter that specifies the timeout in milliseconds. The default remains at the previously hard-coded value of five seconds. [ paulmck: Apply feedback from Juergen Gross. ] Cc: Rik van Riel <riel@surriel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f2539003 |
|
25-Feb-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Print pre-stall-warning informational messages RCU-tasks stall-warning messages are printed after the grace period is ten minutes old. Unfortunately, most of us will have rebooted the system in response to an apparently-hung command long before the ten minutes is up, and will thus see what looks to be a silent hang. This commit therefore adds pr_info() messages that are printed earlier. These should avoid being classified as errors, but should give impatient users a hint. These are controlled by new rcupdate.rcu_task_stall_info and rcupdate.rcu_task_stall_info_mult kernel-boot parameters. The former defines the initial delay in jiffies (defaulting to 10 seconds) and the latter defines the multiplier (defaulting to 3). Thus, by default, the first message will appear 10 seconds into the RCU-tasks grace period, the second 40 seconds in, and the third 160 seconds in. There would be a fourth at 640 seconds in, but the stall warning message appears 600 seconds in, and once a stall warning is printed for a given grace period, no further informational messages are printed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
9f2e91d9 |
|
27-Jan-2022 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Add contention-triggered addition of srcu_node tree This commit instruments the acquisitions of the srcu_struct structure's ->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG when sufficient contention is experienced. The instrumentation counts the number of trylock failures within the confines of a single jiffy. If that number exceeds the value specified by the srcutree.small_contention_lim kernel boot parameter (which defaults to 100), and if the value specified by the srcutree.convert_to_big kernel boot parameter has the 0x10 bit set (defaults to 0), then a transition will be automatically initiated. By default, there will never be any transitions, so that none of the srcu_struct structures ever gains an srcu_node array. The useful values for srcutree.convert_to_big are: 0x00: Never convert. 0x01: Always convert at init_srcu_struct() time. 0x02: Convert when rcutorture prints its first round of statistics. 0x03: Decide conversion approach at boot given system size. 0x10: Convert if contention is encountered. 0x12: Convert if contention is encountered or when rcutorture prints its first round of statistics, whichever comes first. The value 0x11 acts the same as 0x01 because the conversion happens before there is any chance of contention. [ paulmck: Apply "static" feedback from kernel test robot. ] Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c69a00a1 |
|
25-Jan-2022 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Add boot-time control over srcu_node array allocation This commit adds an srcu_tree.convert_to_big kernel parameter that either refuses to convert at all (0), converts immediately at init_srcu_struct() time (1), or lets rcutorture convert it (2). An addition contention-based dynamic conversion choice will be added, along with documentation. [ paulmck: Apply callback-scanning feedback from Neeraj Upadhyay. ] Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ba37a143 |
|
07-Mar-2022 |
Michael Roth <michael.roth@amd.com> |
x86/sev: Add a sev= cmdline option For debugging purposes it is very useful to have a way to see the full contents of the SNP CPUID table provided to a guest. Add an sev=debug kernel command-line option to do so. Also introduce some infrastructure so that additional options can be specified via sev=option1[,option2] over time in a consistent manner. [ bp: Massage, simplify string parsing. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-41-brijesh.singh@amd.com
|
#
f8858b5e |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Remove "noclflush" Not really needed anymore and there's clearcpuid=. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-7-bp@alien8.de
|
#
76ea0025 |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Remove "noexec" It doesn't make any sense to disable non-executable mappings - security-wise or else. So rip out that switch and move the remaining code into setup.c and delete setup_nx.c Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-6-bp@alien8.de
|
#
385d2ae0 |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Remove "nosmep" There should be no need to disable SMEP anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-5-bp@alien8.de
|
#
dbae0a93 |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Remove CONFIG_X86_SMAP and "nosmap" Those were added as part of the SMAP enablement but SMAP is currently an integral part of kernel proper and there's no need to disable it anymore. Rip out that functionality. Leave --uaccess default on for objtool as this is what objtool should do by default anyway. If still needed - clearcpuid=smap. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-4-bp@alien8.de
|
#
c949110e |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Remove "nosep" That chicken bit was added by 4f88651125e2 ("[PATCH] i386: allow disabling X86_FEATURE_SEP at boot") but measuring int80 vsyscall performance on 32-bit doesn't matter anymore. If still needed, one can boot with clearcpuid=sep to disable that feature for testing. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-3-bp@alien8.de
|
#
1625c833 |
|
26-Jan-2022 |
Borislav Petkov <bp@suse.de> |
x86/cpu: Allow feature bit names from /proc/cpuinfo in clearcpuid= Having to give the X86_FEATURE array indices in order to disable a feature bit for testing is not really user-friendly. So accept the feature bit names too. Some feature bits don't have names so there the array indices are still accepted, of course. Clearing CPUID flags is not something which should be done in production so taint the kernel too. An exemplary cmdline would then be something like: clearcpuid=de,440,smca,succory,bmi1,3dnow ("succory" is wrong on purpose). And it says: [ ... ] Clearing CPUID bits: de 13:24 smca (unknown: succory) bmi1 3dnow [ Fix CONFIG_X86_FEATURE_NAMES=n build error as reported by the 0day robot: https://lore.kernel.org/r/202203292206.ICsY2RKX-lkp@intel.com ] Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127115626.14179-2-bp@alien8.de
|
#
d97c68d1 |
|
22-Mar-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random: treat bootloader trust toggle the same way as cpu trust toggle If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND. But, the user can disable (or enable) this behavior by setting `random.trust_cpu=0/1` on the kernel command line. This allows system builders to do reasonable things while avoiding howls from tinfoil hatters. (Or vice versa.) CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards the seed passed via EFI or device tree, which might come from RDRAND or a TPM or somewhere else. In order to allow distros to more easily enable this while avoiding those same howls (or vice versa), this commit adds the corresponding `random.trust_bootloader=0/1` toggle. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Graham Christensen <graham@grahamc.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Link: https://github.com/NixOS/nixpkgs/pull/165355 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
f953f140 |
|
23-Mar-2022 |
Guilherme G. Piccoli <gpiccoli@igalia.com> |
panic: move panic_print before kmsg dumpers The panic_print setting allows users to collect more information in a panic event, like memory stats, tasks, CPUs backtraces, etc. This is an interesting debug mechanism, but currently the print event happens *after* kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with the panic_print extra information. This patch changes that in 2 steps: (a) The panic_print setting allows to replay the existing kernel log buffer to the console (bit 5), besides the extra information dump. This functionality makes sense only at the end of the panic() function. So, we hereby allow to distinguish the two situations by a new boolean parameter in the function panic_print_sys_info(). (b) With the above change, we can safely call panic_print_sys_info() before kmsg_dump(), allowing to dump the extra information when using pstore or other kmsg dumpers. The additional messages from panic_print could overwrite the oldest messages when the buffer is full. The only reasonable solution is to use a large enough log buffer, hence we added an advice into the kernel parameters documentation about that. Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8d470a45 |
|
23-Mar-2022 |
Guilherme G. Piccoli <gpiccoli@igalia.com> |
panic: add option to dump all CPUs backtraces in panic_print Currently the "panic_print" parameter/sysctl allows some interesting debug information to be printed during a panic event. This is useful for example in cases the user cannot kdump due to resource limits, or if the user collects panic logs in a serial output (or pstore) and prefers a fast reboot instead of a kdump. Happens that currently there's no way to see all CPUs backtraces in a panic using "panic_print" on architectures that support that. We do have "oops_all_cpu_backtrace" sysctl, but although partially overlapping in the functionality, they are orthogonal in nature: "panic_print" is a panic tuning (and we have panics without oopses, like direct calls to panic() or maybe other paths that don't go through oops_enter() function), and the original purpose of "oops_all_cpu_backtrace" is to provide more information on oopses for cases in which the users desire to continue running the kernel even after an oops, i.e., used in non-panic scenarios. So, we hereby introduce an additional bit for "panic_print" to allow dumping the CPUs backtraces during a panic event. Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1bbc60d0 |
|
18-Feb-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/mmu: Remove MMU auditing Remove mmu_audit.c and all its collateral, the auditing code has suffered severe bitrot, ironically partly due to shadow paging being more stable and thus not benefiting as much from auditing, but mostly due to TDP supplanting shadow paging for non-nested guests and shadowing of nested TDP not heavily stressing the logic that is being audited. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cb00a70b |
|
19-Jan-2022 |
David Matlack <dmatlack@google.com> |
KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not write-protected when dirty logging is enabled on the memslot. Instead they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for the first time and only for the specific sub-region being cleared. Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to write-protecting to avoid causing write-protection faults on vCPU threads. This also allows userspace to smear the cost of huge page splitting across multiple ioctls, rather than splitting the entire memslot as is the case when initially-all-set is not used. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-17-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a3fe5dbd |
|
19-Jan-2022 |
David Matlack <dmatlack@google.com> |
KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled When dirty logging is enabled without initially-all-set, try to split all huge pages in the memslot down to 4KB pages so that vCPUs do not have to take expensive write-protection faults to split huge pages. Eager page splitting is best-effort only. This commit only adds the support for the TDP MMU, and even there splitting may fail due to out of memory conditions. Failures to split a huge page is fine from a correctness standpoint because KVM will always follow up splitting by write-protecting any remaining huge pages. Eager page splitting moves the cost of splitting huge pages off of the vCPU threads and onto the thread enabling dirty logging on the memslot. This is useful because: 1. Splitting on the vCPU thread interrupts vCPUs execution and is disruptive to customers whereas splitting on VM ioctl threads can run in parallel with vCPU execution. 2. Splitting all huge pages at once is more efficient because it does not require performing VM-exit handling or walking the page table for every 4KiB page in the memslot, and greatly reduces the amount of contention on the mmu_lock. For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to all of their memory after dirty logging is enabled decreased by 95% from 2.94s to 0.14s. Eager Page Splitting is over 100x more efficient than the current implementation of splitting on fault under the read lock. For example, taking the same workload as above, Eager Page Splitting reduced the CPU required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s) * 96 vCPU threads) to only 1.55 CPU-seconds. Eager page splitting does increase the amount of time it takes to enable dirty logging since it has split all huge pages. For example, the time it took to enable dirty logging in the 96GiB region of the aforementioned test increased from 0.001s to 1.55s. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220119230739.2234394-16-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
380af29b |
|
10-Mar-2022 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
tracing: Add snapshot at end of kernel boot up Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the end of boot up just before switching over to user space (it happens during the kernel freeing of init memory). This is useful when there's interesting data that can be collected from kernel start up, but gets overridden by user space start up code. With this option, the ring buffer content from the boot up traces gets saved in the snapshot at the end of boot up. This trace can be read from: /sys/kernel/tracing/snapshot Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
84842911 |
|
17-Feb-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
vsprintf: Fix %pK with kptr_restrict == 0 Although kptr_restrict is set to 0 and the kernel is booted with no_hash_pointers parameter, the content of /proc/vmallocinfo is lacking the real addresses. / # cat /proc/vmallocinfo 0x(ptrval)-0x(ptrval) 8192 load_module+0xc0c/0x2c0c pages=1 vmalloc 0x(ptrval)-0x(ptrval) 12288 start_kernel+0x4e0/0x690 pages=2 vmalloc 0x(ptrval)-0x(ptrval) 12288 start_kernel+0x4e0/0x690 pages=2 vmalloc 0x(ptrval)-0x(ptrval) 8192 _mpic_map_mmio.constprop.0+0x20/0x44 phys=0x80041000 ioremap 0x(ptrval)-0x(ptrval) 12288 _mpic_map_mmio.constprop.0+0x20/0x44 phys=0x80041000 ioremap ... According to the documentation for /proc/sys/kernel/, %pK is equivalent to %p when kptr_restrict is set to 0. Fixes: 5ead723a20e0 ("lib/vsprintf: no_hash_pointers prints all addresses as unhashed") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/107476128e59bff11a309b5bf7579a1753a41aca.1645087605.git.christophe.leroy@csgroup.eu
|
#
96b02f2f |
|
16-Feb-2022 |
Randy Dunlap <rdunlap@infradead.org> |
Docs: printk: add 'console=null|""' to admin/kernel-parameters Tell about 'console=null|""' and how to use it. It can be helpful to set (enable) CONFIG_NULL_TTY so that the ttynull driver is available. This avoids problems with stdin/stdout/stderr of the init process. Howevere, CONFIG_NULL_TTY cannot be enabled by default because it can be used by mistake, see the commit a91bd6223ecd ("Revert "init/console: Use ttynull as a fallback when there is no console""). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org [pmladek@suse.com: Slightly update wording.] Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220216203745.980-1-rdunlap@infradead.org
|
#
e7d32485 |
|
22-Mar-2022 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7. This series can minimize the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks. The main implementation and details can refer to the commit log of patch 1. In this series, I have changed the following four helpers, the following table shows the impact of the overhead of those helpers. +------------------+-----------------------+ | APIs | head page | tail page | +------------------+-----------+-----------+ | PageHead() | Y | N | +------------------+-----------+-----------+ | PageTail() | Y | N | +------------------+-----------+-----------+ | PageCompound() | N | N | +------------------+-----------+-----------+ | compound_head() | Y | N | +------------------+-----------+-----------+ Y: Overhead is increased. N: Overhead is _NOT_ increased. It shows that the overhead of those helpers on a tail page don't change between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the overhead on a head page will be increased when "hugetlb_free_vmemmap=on" (except PageCompound()). So I believe that Matthew Wilcox's folio series will help with this. The users of PageHead() and PageTail() are much less than compound_head() and most users of PageTail() are VM_BUG_ON(), so I have done some tests about the overhead of compound_head() on head pages. I have tested the overhead of calling compound_head() on a head page, which is 2.11ns (Measure the call time of 10 million times compound_head(), and then average). For a head page whose address is not aligned with PAGE_SIZE or a non-compound page, the overhead of compound_head() is 2.54ns which is increased by 20%. For a head page whose address is aligned with PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by 40%. Most pages are the former. I do not think the overhead is significant since the overhead of compound_head() itself is low. This patch (of 5): This patch minimizes the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB HugeTLB (2MB type). After the feature of "Free sonme vmemmap pages of HugeTLB page" is enabled, the mapping of the vmemmap addresses associated with a 2MB HugeTLB page becomes the figure below. HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | -------------> | 1 | | | +-----------+ +-----------+ | | | 2 | ----------------^ ^ ^ ^ ^ ^ | | +-----------+ | | | | | | | | 3 | ------------------+ | | | | | | +-----------+ | | | | | | | 4 | --------------------+ | | | | 2MB | +-----------+ | | | | | | 5 | ----------------------+ | | | | +-----------+ | | | | | 6 | ------------------------+ | | | +-----------+ | | | | 7 | --------------------------+ | | +-----------+ | | | | | | +-----------+ As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and remaped. However, the 2nd vmemmap page frame is also can be freed to the buddy allocator, then we can change the mapping from the figure above to the figure below. HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^ | | +-----------+ | | | | | | | | | 2 | -----------------+ | | | | | | | +-----------+ | | | | | | | | 3 | -------------------+ | | | | | | +-----------+ | | | | | | | 4 | ---------------------+ | | | | 2MB | +-----------+ | | | | | | 5 | -----------------------+ | | | | +-----------+ | | | | | 6 | -------------------------+ | | | +-----------+ | | | | 7 | ---------------------------+ | | +-----------+ | | | | | | +-----------+ After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap page frame (0). In other words, there are more than one page struct with PG_head associated with each HugeTLB page. We __know__ that there is only one head page struct, the tail page structs with PG_head are fake head page structs. We need an approach to distinguish between those two different types of page structs so that compound_head(), PageHead() and PageTail() can work properly if the parameter is the tail page struct but with PG_head. The following code snippet describes how to distinguish between real and fake head page struct. if (test_bit(PG_head, &page->flags)) { unsigned long head = READ_ONCE(page[1].compound_head); if (head & 1) { if (head == (unsigned long)page + 1) ==> head page struct else ==> tail page struct } else ==> head page struct } We can safely access the field of the @page[1] with PG_head because the @page is a compound page composed with at least two contiguous pages. [songmuchun@bytedance.com: restore lost comment changes] Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Chen Huang <chenhuang5@huawei.com> Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
75c05fab |
|
10-Mar-2022 |
Mike Rapoport <rppt@kernel.org> |
docs/kernel-parameters: update description of mem= The existing description of mem= does not cover all the cases and differences between how architectures treat it. Extend the description to match the code. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20220310082736.1346366-1-rppt@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
99fdc587 |
|
09-Jan-2022 |
Armin Wolf <W_Armin@gmx.de> |
Documentation: admin-guide: Add Documentation for undocumented dell_smm_hwmon parameters Add documentation for fan_mult and fan_max. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20220109214248.61759-3-W_Armin@gmx.de Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
#
1b089084 |
|
09-Jan-2022 |
Armin Wolf <W_Armin@gmx.de> |
Documentation: admin-guide: Update i8k driver name The driver should be called dell_smm_hwmon, i8k is only an alias now. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20220109214248.61759-2-W_Armin@gmx.de Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
#
a469948b |
|
11-Jan-2022 |
Alison Chaiken <achaiken@aurora.tech> |
rcu: Update documentation regarding kthread_prio cmdline parameter Inform readers that the priority of RCU no-callback threads will also be boosted. Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
5ad3eb11 |
|
16-Feb-2022 |
Peter Zijlstra <peterz@infradead.org> |
Documentation/hw-vuln: Update spectre doc Update the doc with the new fun. [ bp: Massage commit message. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
67548622 |
|
19-Oct-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/kuep: Remove 'nosmep' boot time parameter except for book3s/64 Deactivating KUEP at boot time is unrelevant for PPC32 and BOOK3E/64. Remove it. It allows to refactor setup_kuep() via a __weak function that only PPC64s will overide for now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix CONFIG_PPC_BOOKS_64 -> CONFIG_PPC_BOOK3S_64 typo] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4c36df18b41c988c4512f45d96220486adbe4c99.1634627931.git.christophe.leroy@csgroup.eu
|
#
c21ee04f |
|
05-Nov-2021 |
Cédric Le Goater <clg@kaod.org> |
powerpc/xive: Add a kernel parameter for StoreEOI StoreEOI is activated by default on platforms supporting the feature (POWER10) and will be used as soon as firmware advertises its availability. The kernel parameter provides a way to deactivate its use. It can be still be reactivated through debugfs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211105102636.1016378-10-clg@kaod.org
|
#
0a4b4327 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Implement PMU override command line option It can be useful in simulators (with very constrained environments) to allow some PMCs to run from boot so they can be sampled directly by a test harness, rather than having to run perf. A previous change freezes counters at boot by default, so provide a boot time option to un-freeze (plus a bit more flexibility). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
|
#
1a562067 |
|
18-Nov-2021 |
Waiman Long <longman@redhat.com> |
clocksource: Reduce the default clocksource_watchdog() retries to 2 With the previous patch, there is an extra watchdog read in each retry. Now the total number of clocksource reads is increased to 4 per iteration. In order to avoid increasing the clock skew check overhead, the default maximum number of retries is reduced from 3 to 2 to maintain the same 12 clocksource reads in the worst case. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d2cf0854 |
|
22-Nov-2021 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/nocb: Allow empty "rcu_nocbs" kernel parameter Allow the rcu_nocbs kernel parameter to be specified just by itself, without specifying any CPUs. This allows systems administrators to use "rcu_nocbs" to specify that none of the CPUs are to be offloaded at boot time, but than any of them may be offloaded at runtime via cpusets. In contrast, if the "rcu_nocbs" or "nohz_full" kernel parameters are not specified at all, then not only are none of the CPUs offloaded at boot, none of them can be offloaded at runtime, either. While in the area, modernize the description of the "rcuo" kthreads' naming scheme. Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
fd796e41 |
|
29-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use fewer callbacks queues if callback flood ends By default, when lock contention is encountered, the RCU Tasks flavors of RCU switch to using per-CPU queueing. However, if the callback flood ends, per-CPU queueing continues to be used, which introduces significant additional overhead, especially for callback invocation, which fans out a series of workqueue handlers. This commit therefore switches back to single-queue operation if at the beginning of a grace period there are very few callbacks. The definition of "very few" is set by the rcupdate.rcu_task_collapse_lim module parameter, which defaults to 10. This switch happens in two phases, with the first phase causing future callbacks to be enqueued on CPU 0's queue, but with all queues continuing to be checked for grace periods and callback invocation. The second phase checks to see if an RCU grace period has elapsed and if all remaining RCU-Tasks callbacks are queued on CPU 0. If so, only CPU 0 is checked for future grace periods and callback operation. Of course, the return of contention anywhere during this process will result in returning to per-CPU callback queueing. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
ab97152f |
|
24-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Use more callback queues if contention encountered The rcupdate.rcu_task_enqueue_lim module parameter allows system administrators to tune the number of callback queues used by the RCU Tasks flavors. However if callback storms are infrequent, it would be better to operate with a single queue on a given system unless and until that system actually needed more queues. Systems not needing more queues can then avoid the overhead of checking the extra queues and especially avoid the overhead of fanning workqueue handlers out to all CPUs to invoke callbacks. This commit therefore switches to using all the CPUs' callback queues if call_rcu_tasks_generic() encounters too much lock contention. The amount of lock contention to tolerate defaults to 100 contended lock acquisitions per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim module parameter. Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim module parameter is negative, which is its default value (-1). This allows savvy systems administrators to set the number of queues to some known good value and to not have to worry about the kernel doing any second guessing. [ paulmck: Apply feedback from Guillaume Tucker and kernelci. ] Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8610b656 |
|
12-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing This commit adds a rcupdate.rcu_task_enqueue_lim module parameter that sets the initial number of callback queues to use for the RCU Tasks family of RCU implementations. This parameter allows testing of various fanout values. Reported-by: Martin Lau <kafai@fb.com> Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
82e31003 |
|
22-Nov-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Enable multiple concurrent callback-flood kthreads This commit converts the rcutorture.fwd_progress module parameter from bool to int, so that it specifies the number of callback-flood kthreads. Values less than zero specify one kthread per CPU, however, the number of kthreads executing concurrently is limited to the number of online CPUs. This commit also reverse the order of the need-resched and callback-flood operations to cause the callback flooding to happen more nearly at the same time. Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e2c73a68 |
|
27-Sep-2021 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove the RCU_FAST_NO_HZ Kconfig option All of the uses of CONFIG_RCU_FAST_NO_HZ=y that I have seen involve systems with RCU callbacks offloaded. In this situation, all that this Kconfig option does is slow down idle entry/exit with an additional allways-taken early exit. If this is the only use case, then this Kconfig option nothing but an attractive nuisance that needs to go away. This commit therefore removes the RCU_FAST_NO_HZ Kconfig option. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
74d95555 |
|
08-Nov-2021 |
David Woodhouse <dwmw@amazon.co.uk> |
PM: hibernate: Allow ACPI hardware signature to be honoured Theoretically, when the hardware signature in FACS changes, the OS is supposed to gracefully decline to attempt to resume from S4: "If the signature has changed, OSPM will not restore the system context and can boot from scratch" In practice, Windows doesn't do this and many laptop vendors do allow the signature to change especially when docking/undocking, so it would be a bad idea to simply comply with the specification by default in the general case. However, there are use cases where we do want the compliant behaviour and we know it's safe. Specifically, when resuming virtual machines where we know the hypervisor has changed sufficiently that resume will fail. We really want to be able to *tell* the guest kernel not to try, so it boots cleanly and doesn't just crash. This patch provides a way to opt in to the spec-compliant behaviour on the command line. A follow-up patch may do this automatically for certain "known good" machines based on a DMI match, or perhaps just for all hypervisor guests since there's no good reason a hypervisor would change the hardware_signature that it exposes to guests *unless* it wants them to obey the ACPI specification. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
376e3fde |
|
18-Nov-2021 |
Finn Thain <fthain@linux-m68k.org> |
m68k: Enable memtest functionality Enable the memtest functionality and rearrange some code to prevent it from clobbering the initrd. The code to implement CONFIG_BLK_DEV_INITRD was conditional on !defined(CONFIG_SUN3). For simplicity, remove that test on the basis that m68k_ramdisk.size == 0 on Sun 3. The SLIME source code at http://sammy.net/sun3/ftp/pub/m68k/sun3/slime/slime-2.0.tar.gz indicates that no BI_RAMDISK entry is ever passed to the kernel due to #ifdef 0 around the relevant code. Cc: Mike Rapoport <rppt@kernel.org> Cc: Sam Creasey <sammy@sammy.net> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/8170fe1d1c62426d82275d36ba409ecc18754292.1637274578.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
#
b22a15a5 |
|
12-Nov-2021 |
Javier Martinez Canillas <javierm@redhat.com> |
Documentation/admin-guide: Document nomodeset kernel parameter The nomodeset kernel command line parameter is not documented. Its name is quite vague and is not intuitive what's the behaviour when it is set. Document in kernel-parameters.txt what actually happens when nomodeset is used. That way, users could know if they want to enable this option. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211112133230.1595307-6-javierm@redhat.com
|
#
9222ba68 |
|
29-Nov-2021 |
Takashi Iwai <tiwai@suse.de> |
Input: i8042 - add deferred probe support We've got a bug report about the non-working keyboard on ASUS ZenBook UX425UA. It seems that the PS/2 device isn't ready immediately at boot but takes some seconds to get ready. Until now, the only workaround is to defer the probe, but it's available only when the driver is a module. However, many distros, including openSUSE as in the original report, build the PS/2 input drivers into kernel, hence it won't work easily. This patch adds the support for the deferred probe for i8042 stuff as a workaround of the problem above. When the deferred probe mode is enabled and the device couldn't be probed, it'll be repeated with the standard deferred probe mechanism. The deferred probe mode is enabled either via the new option i8042.probe_defer or via the quirk table entry. As of this patch, the quirk table contains only ASUS ZenBook UX425UA. The deferred probe part is based on Fabio's initial work. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1190256 Signed-off-by: Takashi Iwai <tiwai@suse.de> Tested-by: Samuel Čavoj <samuel@cavoj.net> Link: https://lore.kernel.org/r/20211117063757.11380-1-tiwai@suse.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
#
0ff29701 |
|
07-Dec-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state Update the documentation for kvm-intel's emulate_invalid_guest_state to rectify the description of KVM's default behavior, and to document that the behavior and thus parameter only applies to L1. Fixes: a27685c33acc ("KVM: VMX: Emulate invalid guest state by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-4-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
40fdea02 |
|
02-Nov-2021 |
Juergen Gross <jgross@suse.com> |
xen/balloon: add late_initcall_sync() for initial ballooning done When running as PVH or HVM guest with actual memory < max memory the hypervisor is using "populate on demand" in order to allow the guest to balloon down from its maximum memory size. For this to work correctly the guest must not touch more memory pages than its target memory size as otherwise the PoD cache will be exhausted and the guest is crashed as a result of that. In extreme cases ballooning down might not be finished today before the init process is started, which can consume lots of memory. In order to avoid random boot crashes in such cases, add a late init call to wait for ballooning down having finished for PVH/HVM guests. Warn on console if initial ballooning fails, panic() after stalling for more than 3 minutes per default. Add a module parameter for changing this timeout. [boris: replaced pr_info() with pr_notice()] Cc: <stable@vger.kernel.org> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211102091944.17487-1-jgross@suse.com Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
6d91929a |
|
01-Nov-2021 |
J. Bruce Fields <bfields@redhat.com> |
nfsd: document server-to-server-copy parameters Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
b5389086 |
|
05-Nov-2021 |
Zhenguo Yao <yaozhenguo1@gmail.com> |
hugetlbfs: extend the definition of hugepages parameter to support node allocation We can specify the number of hugepages to allocate at boot. But the hugepages is balanced in all nodes at present. In some scenarios, we only need hugepages in one node. For example: DPDK needs hugepages which are in the same node as NIC. If DPDK needs four hugepages of 1G size in node1 and system has 16 numa nodes we must reserve 64 hugepages on the kernel cmdline. But only four hugepages are used. The others should be free after boot. If the system memory is low(for example: 64G), it will be an impossible task. So extend the hugepages parameter to support specifying hugepages on a specific node. For example add following parameter: hugepagesz=1G hugepages=0:1,1:3 It will allocate 1 hugepage in node0 and 3 hugepages in node1. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zhenguo Yao <yaozhenguo1@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
38e719ab |
|
05-Nov-2021 |
Baolin Wang <baolin.wang@linux.alibaba.com> |
hugetlb: support node specified when using cma for gigantic hugepages Now the size of CMA area for gigantic hugepages runtime allocation is balanced for all online nodes, but we also want to specify the size of CMA per-node, or only one node in some cases, which are similar with patch [1]. For example, on some multi-nodes systems, each node's memory can be different, allocating the same size of CMA for each node is not suitable for the low-memory nodes. Meanwhile some workloads like DPDK mentioned by Zhenguo in patch [1] only need hugepages in one node. On the other hand, we have some machines with multiple types of memory, like DRAM and PMEM (persistent memory). On this system, we may want to specify all the hugepages only on DRAM node, or specify the proportion of DRAM node and PMEM node, to tuning the performance of the workloads. Thus this patch adds node format for 'hugetlb_cma' parameter to support specifying the size of CMA per-node. An example is as follows: hugetlb_cma=0:5G,2:5G which means allocating 5G size of CMA area on node 0 and node 2 respectively. And the users should use the node specific sysfs file to allocate the gigantic hugepages if specified the CMA size on that node. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com [1] Link: https://lkml.kernel.org/r/bb790775ca60bb8f4b26956bb3f6988f74e075c7.1634261144.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6aefbf1c |
|
14-Sep-2021 |
Niklas Schnelle <schnelle@linux.ibm.com> |
s390/pci: add s390_iommu_aperture kernel parameter Some applications map the same memory area for DMA multiple times while also mapping significant amounts of memory. With our current DMA code these applications will run out of DMA addresses after mapping half of the available memory because the number of DMA mappings is constrained by the number of concurrently active DMA addresses we support which in turn is limited by the minimum of hardware constraints and high_memory. Limiting the number of active DMA addresses to high_memory is only a heuristic to save memory used by the iommu_bitmap and DMA page tables however. This was added under the assumption that it rarely makes sense to DMA map more than system memory. To accommodate special applications which insist on double mapping, which works on other platforms, allow specifying a factor of how many times installed memory is available as DMA address space. Use 0 as a special value to apply no constraints beyond what hardware dictates at the expense of significantly more memory use. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
9c40e1aa |
|
13-Oct-2021 |
Andrew Halaney <ahalaney@redhat.com> |
dyndbg: Remove support for ddebug_query param This param has been deprecated for a very long time now, let's rip it out. Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Link: https://lore.kernel.org/r/1634139622-20667-3-git-send-email-jbaron@akamai.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
4dfe4f40 |
|
19-Oct-2021 |
Junaid Shahid <junaids@google.com> |
kvm: x86: mmu: Make NX huge page recovery period configurable Currently, the NX huge page recovery thread wakes up every minute and zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX huge pages at a time. This is intended to ensure that only a relatively small number of pages get zapped at a time. But for very large VMs (or more specifically, VMs with a large number of executable pages), a period of 1 minute could still result in this number being too high (unless the ratio is changed significantly, but that can result in split pages lingering on for too long). This change makes the period configurable instead of fixing it at 1 minute. Users of large VMs can then adjust the period and/or the ratio to reduce the number of pages zapped at one time while still maintaining the same overall duration for cycling through the entire list. By default, KVM derives a period from the ratio such that a page will remain on the list for 1 hour on average. Signed-off-by: Junaid Shahid <junaids@google.com> Message-Id: <20211020010627.305925-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
53e8ce13 |
|
11-Oct-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
Documentation: admin-guide: Document side effects when pKVM is enabled Recent changes to KVM for arm64 has made it impossible for the host to hibernate or use kexec when protected mode is enabled via the kernel command line. There are people who rely on kexec (for example, developers who use kexec as a quick way to test a new kernel), let's document this change in behaviour, so it doesn't catch them by surprise and we have a place to point people to if it does. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211011153835.291147-1-alexandru.elisei@arm.com
|
#
b6a68b97 |
|
01-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Allow KVM to be disabled from the command line Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Reviewed-by: David Brazdil <dbrazdil@google.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Scull <ascull@google.com> Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org
|
#
3aac3ebe |
|
21-Oct-2021 |
Thomas Gleixner <tglx@linutronix.de> |
x86/signal: Implement sigaltstack size validation For historical reasons MINSIGSTKSZ is a constant which became already too small with AVX512 support. Add a mechanism to enforce strict checking of the sigaltstack size against the real size of the FPU frame. The strict check can be enabled via a config option and can also be controlled via the kernel command line option 'strict_sas_size' independent of the config switch. Enabling it might break existing applications which allocate a too small sigaltstack but 'work' because they never get a signal delivered. Though it can be handy to filter out binaries which are not yet aware of AT_MINSIGSTKSZ. Also the upcoming support for dynamically enabled FPU features requires a strict sanity check to ensure that: - Enabling of a dynamic feature, which changes the sigframe size fits into an enabled sigaltstack - Installing a too small sigaltstack after a dynamic feature has been added is not possible. Implement the base check which is controlled by config and command line options. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com
|
#
2f46993d |
|
04-Nov-2020 |
Andrea Arcangeli <aarcange@redhat.com> |
x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl Switch the kernel default of SSBD and STIBP to the ones with CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl spectre_v2_user=prctl) even if CONFIG_SECCOMP=y. Several motivations listed below: - If SMT is enabled the seccomp jail can still attack the rest of the system even with spectre_v2_user=seccomp by using MDS-HT (except on XEON PHI where MDS can be tamed with SMT left enabled, but that's a special case). Setting STIBP become a very expensive window dressing after MDS-HT was discovered. - The seccomp jail cannot attack the kernel with spectre-v2-HT regardless (even if STIBP is not set), but with MDS-HT the seccomp jail can attack the kernel too. - With spec_store_bypass_disable=prctl the seccomp jail can attack the other userland (guest or host mode) using spectre-v2-HT, but the userland attack is already mitigated by both ASLR and pid namespaces for host userland and through virt isolation with libkrun or kata. (if something if somebody is worried about spectre-v2-HT it's best to mount proc with hidepid=2,gid=proc on workstations where not all apps may run under container runtimes, rather than slowing down all seccomp jails, but the best is to add pid namespaces to the seccomp jail). As opposed MDS-HT is not mitigated and the seccomp jail can still attack all other host and guest userland if SMT is enabled even with spec_store_bypass_disable=seccomp. - If full security is required then MDS-HT must also be mitigated with nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp would become identical. - Setting spectre_v2_user=seccomp is overall lower priority than to setting javascript.options.wasm false in about:config to protect against remote wasm MDS-HT, instead of worrying about Spectre-v2-HT and STIBP which again is already statistically well mitigated by other means in userland and it's fully mitigated in kernel with retpolines (unlike the wasm assist call with MDS-HT). - SSBD is needed to prevent reading the JIT memory and the primary user being the OpenJDK. However the primary user of SSBD wouldn't be covered by spec_store_bypass_disable=seccomp because it doesn't use seccomp and the primary user also explicitly declined to set PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS despite it easily could. In fact it would need to set it only when the sandboxing mechanism is enabled for javaws applets, but it still declined it by declaring security within the same user address space as an untenable objective for their JIT, even in the sandboxing case where performance would be a lesser concern (for the record: I kind of disagree in not setting PR_SPEC_STORE_BYPASS in the sandbox case and I prefer to run javaws through a wrapper that sets PR_SPEC_STORE_BYPASS if I need). In turn it can be inferred that even if the primary user of SSBD would use seccomp, they would invoke it with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now. - runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, k8s and podman have a default json seccomp allowlist that cannot be slowed down, so for the #1 seccomp user this change is already a noop. - systemd/sshd or other apps that use seccomp, if they really need STIBP or SSBD, they need to explicitly set the PR_SET_SPECULATION_CTRL by now. The stibp/ssbd seccomp blind catch-all approach was done probably initially with a wishful thinking objective to pretend to have a peace of mind that it could magically fix it all. That was wishful thinking before MDS-HT was discovered, but after MDS-HT has been discovered it become just window dressing. - For qemu "-sandbox" seccomp jail it wouldn't make sense to set STIBP or SSBD. SSBD doesn't help with KVM because there's no JIT (if it's needed with TCG it should be an opt-in with PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS and it shouldn't slowdown KVM for nothing). For qemu+KVM STIBP would be even more window dressing than it is for all other apps, because in the qemu+KVM case there's not only the MDS attack to worry about with SMT enabled. Even after disabling SMT, there's still a theoretical spectre-v2 attack possible within the same thread context from guest mode to host ring3 that the host kernel retpoline mitigation has no theoretical chance to mitigate. On some kernels a ibrs-always/ibrs-retpoline opt-in model is provided that will enabled IBRS in the qemu host ring3 userland which fixes this theoretical concern. Only after enabling IBRS in the host userland it would then make sense to proceed and worry about STIBP and an attack on the other host userland, but then again SMT would need to be disabled for full security anyway, so that would render STIBP again a noop. - last but not the least: the lack of "spec_store_bypass_disable=prctl spectre_v2_user=prctl" means the moment a guest boots and sshd/systemd runs, the guest kernel will write to SPEC_CTRL MSR which will make the guest vmexit forever slower, forcing KVM to issue a very slow rdmsr instruction at every vmexit. So the end result is that SPEC_CTRL MSR is only available in GCE. Most other public cloud providers don't expose SPEC_CTRL, which means that not only STIBP/SSBD isn't available, but IBPB isn't available either (which would cause no overhead to the guest or the hypervisor because it's write only and requires no reading during vmexit). So the current default already net loss in security (missing IBPB) which means most public cloud providers cannot achieve a fully secure guest with nosmt (and nosmt is enough to fully mitigate MDS-HT). It also means GCE and is unfairly penalized in performance because it provides the option to enable full security in the guest as an opt-in (i.e. nosmt and IBPB). So this change will allow all cloud providers to expose SPEC_CTRL without incurring into any hypervisor slowdown and at the same time it will remove the unfair penalization of GCE performance for doing the right thing and it'll allow to get full security with nosmt with IBPB being available (and STIBP becoming meaningless). Example to put things in prospective: the STIBP enabled in seccomp has never been about protecting apps using seccomp like sshd from an attack from a malicious userland, but to the contrary it has always been about protecting the system from an attack from sshd, after a successful remote network exploit against sshd. In fact initially it wasn't obvious STIBP would work both ways (STIBP was about preventing the task that runs with STIBP to be attacked with spectre-v2-HT, but accidentally in the STIBP case it also prevents the attack in the other direction). In the hypothetical case that sshd has been remotely exploited the last concern should be STIBP being set, because it'll be still possible to obtain info even from the kernel by using MDS if nosmt wasn't set (and if it was set, STIBP is a noop in the first place). As opposed kernel cannot leak anything with spectre-v2 HT because of retpolines and the userland is mitigated by ASLR already and ideally PID namespaces too. If something it'd be worth checking if sshd run the seccomp thread under pid namespaces too if available in the running kernel. SSBD also would be a noop for sshd, since sshd uses no JIT. If sshd prefers to keep doing the STIBP window dressing exercise, it still can even after this change of defaults by opting-in with PR_SPEC_INDIRECT_BRANCH. Ultimately setting SSBD and STIBP by default for all seccomp jails is a bad sweet spot and bad default with more cons than pros that end up reducing security in the public cloud (by giving an huge incentive to not expose SPEC_CTRL which would be needed to get full security with IBPB after setting nosmt in the guest) and by excessively hurting performance to more secure apps using seccomp that end up having to opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW. The following is the verified result of the new default with SMT enabled: (gdb) print spectre_v2_user_stibp $1 = SPECTRE_V2_USER_PRCTL (gdb) print spectre_v2_user_ibpb $2 = SPECTRE_V2_USER_PRCTL (gdb) print ssb_mode $3 = SPEC_STORE_BYPASS_PRCTL Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201104235054.5678-1-aarcange@redhat.com Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/lkml/AAA2EF2C-293D-4D5B-BFA6-FF655105CD84@redhat.com Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/lkml/c0722838-06f7-da6b-138f-e0f26362f16a@redhat.com
|
#
42bc9716 |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU xenboot_write_console() is dealing with these quite fine so I don't see why xenboot_console_setup() would return -ENOENT in this case. Adjust documentation accordingly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/3d212583-700e-8b2d-727a-845ef33ac265@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
ade8a86b |
|
20-Jul-2021 |
Dave Jiang <dave.jiang@intel.com> |
dmaengine: idxd: Set defaults for GRPCFG traffic class Set GRPCFG traffic class to value of 1 for best performance on current generation of accelerators. Also add override option to allow experimentation. Sysfs knobs are disabled for DSA/IAX gen1 devices. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/162681373005.1968485.3761065664382799202.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
#
792fb43c |
|
18-Aug-2021 |
Lu Baolu <baolu.lu@linux.intel.com> |
iommu/vt-d: Enable Intel IOMMU scalable mode by default The commit 8950dcd83ae7d ("iommu/vt-d: Leave scalable mode default off") leaves the scalable mode default off and end users could turn it on with "intel_iommu=sm_on". Using the Intel IOMMU scalable mode for kernel DMA, user-level device access and Shared Virtual Address have been enabled. This enables the scalable mode by default if the hardware advertises the support and adds kernel options of "intel_iommu=sm_on/sm_off" for end users to configure it through the kernel parameters. Suggested-by: Ashok Raj <ashok.raj@intel.com> Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210818134852.1847070-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
e96763ec |
|
11-Aug-2021 |
Robin Murphy <robin.murphy@arm.com> |
iommu: Merge strictness and domain type configs To parallel the sysfs behaviour, merge the new build-time option for DMA domain strictness into the default domain type choice. Suggested-by: Joerg Roedel <joro@8bytes.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/d04af35b9c0f2a1d39605d7a9b451f5e1f0c7736.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
712d8f20 |
|
12-Jul-2021 |
Zhen Lei <thunder.leizhen@huawei.com> |
iommu: Enhance IOMMU default DMA mode build options First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the opportunity to set {lazy|strict} mode as default at build time. Then put the two config options in an choice, as they are mutually exclusive. [jpg: Make choice between strict and lazy only (and not passthrough)] Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-4-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
1d479f16 |
|
12-Jul-2021 |
John Garry <john.garry@huawei.com> |
iommu: Deprecate Intel and AMD cmdline methods to enable strict mode Now that the x86 drivers support iommu.strict, deprecate the custom methods. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/1626088340-5838-2-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
702f4387 |
|
29-Jul-2021 |
Will Deacon <will@kernel.org> |
Documentation: arm64: describe asymmetric 32-bit support Document support for running 32-bit tasks on asymmetric 32-bit systems and its impact on the user ABI when enabled. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210730112443.23245-17-will@kernel.org
|
#
ead7de46 |
|
29-Jul-2021 |
Will Deacon <will@kernel.org> |
arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0 Allow systems with mismatched 32-bit support at EL0 to run 32-bit applications based on a new kernel parameter. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210730112443.23245-15-will@kernel.org
|
#
7a062ce3 |
|
03-Aug-2021 |
Yee Lee <yee.lee@mediatek.com> |
arm64/cpufeature: Optionally disable MTE via command-line MTE support needs to be optionally disabled in runtime for HW issue workaround, FW development and some evaluation works on system resource and performance. This patch makes two changes: (1) moves init of tag-allocation bits(ATA/ATA0) to cpu_enable_mte() as not cached in TLB. (2) allows ID_AA64PFR1_EL1.MTE to be overridden on its shadow value by giving "arm64.nomte" on cmdline. When the feature value is off, ATA and TCF will not set and the related functionalities are accordingly suppressed. Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Yee Lee <yee.lee@mediatek.com> Link: https://lore.kernel.org/r/20210803070824.7586-2-yee.lee@mediatek.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
10102a89 |
|
27-Jul-2021 |
Dmitry Safonov <0x7f454c46@gmail.com> |
printk: Add printk.console_no_auto_verbose boot parameter console_verbose() increases console loglevel to CONSOLE_LOGLEVEL_MOTORMOUTH, which provides more information to debug a panic/oops. Unfortunately, in Arista we maintain some DUTs (Device Under Test) that are configured to have 9600 baud rate. While verbose console messages have their value to post-analyze crashes, on such setup they: - may prevent panic/oops messages being printed - take too long to flush on console resulting in watchdog reboot In all our setups we use kdump which saves dmesg buffer after panic, so in reality those extra messages on console provide no additional value, but rather add risk of not getting to __crash_kexec(). Provide printk.console_no_auto_verbose boot parameter, which allows to switch off printk being verbose on oops/panic/lockdep. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210727130635.675184-3-dima@arista.com
|
#
72bcad53 |
|
03-Aug-2021 |
Arnd Bergmann <arnd@arndb.de> |
wan: remove sbni/granch driver The driver was merged in 1999 and has only ever seen treewide cleanups since then, with no indication whatsoever that anyone has actually had access to hardware for testing the patches. >From the information in the link below, it appears that the hardware is for some leased line system in Russia that has since been discontinued, and useless without any remote end to connect to. As the driver still feels like a Linux-2.2 era artifact today, it appears that the best way forward is to just delete it. Link: https://www.tms.ru/%D0%90%D0%B4%D0%B0%D0%BF%D1%82%D0%B5%D1%80_%D0%B4%D0%BB%D1%8F_%D0%B2%D1%8B%D0%B4%D0%B5%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D1%85_%D0%BB%D0%B8%D0%BD%D0%B8%D0%B9_Granch_SBNI12-10 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12febc18 |
|
29-May-2021 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
x86/reboot: Document how to override DMI platform quirks commit 5955633e91bf ("x86/reboot: Skip DMI checks if reboot set by user") made it so that it's not required to recompile the kernel in order to bypass broken reboot quirks compiled into an image: * This variable is used privately to keep track of whether or not * reboot_type is still set to its default value (i.e., reboot= hasn't * been set on the command line). This is needed so that we can * suppress DMI scanning for reboot quirks. Without it, it's * impossible to override a faulty reboot quirk without recompiling. However, at the time it was not eally documented outside the source code, and so this information isn't really available to the average user out there. The change is a little white lie and invented "reboot=default" since it is easy to remember, and documents well. The truth is that any random string that is *not* a currently accepted string will work. Since that doesn't document well for non-coders, and since it's unknown what the future additions might be, lay claim on "default" since that is exactly what it achieves. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210530162447.996461-3-paul.gortmaker@windriver.com
|
#
b7fe54f6 |
|
08-Jan-2021 |
Balbir Singh <sblbir@amazon.com> |
Documentation: Add L1D flushing Documentation Add documentation of l1d flushing, explain the need for the feature and how it can be used. Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210108121056.21940-6-sblbir@amazon.com
|
#
4bc2bd5a |
|
17-May-2021 |
Stafford Horne <shorne@gmail.com> |
serial: liteuart: Add support for earlycon Most litex boards using RISC-V soft cores us the sbi earlycon, however this is not available for non RISC-V litex SoC's. This patch enables earlycon for liteuart which is available on all Litex SoC's making support for earycon debugging more widely available. Cc: Florent Kermarrec <florent@enjoy-digital.fr> Cc: Mateusz Holenko <mholenko@antmicro.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Gabriel L. Somlo <gsomlo@gmail.com> Reviewed-and-tested-by: Gabriel Somlo <gsomlo@gmail.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stafford Horne <shorne@gmail.com> Link: https://lore.kernel.org/r/20210517115453.24365-1-shorne@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d0bfa8b3 |
|
15-Apr-2021 |
Zhang Qiang <qiang.zhang@windriver.com> |
kvfree_rcu: Release a page cache under memory pressure Add a drain_page_cache() function to drain a per-cpu page cache. The reason behind of it is a system can run into a low memory condition, in that case a page shrinker can ask for its users to free their caches in order to get extra memory available for other needs in a system. When a system hits such condition, a page cache is drained for all CPUs in a system. By default a page cache work is delayed with 5 seconds interval until a memory pressure disappears, if needed it can be changed. See a rcu_delay_page_cache_fill_msec module parameter. Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
f3860136 |
|
17-Jun-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Add tp_printk_stop_on_boot option Add a kernel command line option that disables printing of events to console at late_initcall_sync(). This is useful when needing to see specific events written to console on boot up, but not wanting it when user space starts, as user space may make the console so noisy that the system becomes inoperable. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
e6d41f12 |
|
30-Jun-2021 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing unused vmemmap pages associated with each HugeTLB page is default off. Now the vmemmap is PMD mapped. So there is no side effect when this feature is enabled with no HugeTLB pages in the system. Someone may want to enable this feature in the compiler time instead of using boot command line. So add a config to make it default on when someone do not want to enable it via command line. Link: https://lkml.kernel.org/r/20210616094915.34432-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Chen Huang <chenhuang5@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2d7a2171 |
|
30-Jun-2021 |
Muchun Song <songmuchun@bytedance.com> |
mm: sparsemem: use huge PMD mapping for vmemmap pages The preparation of splitting huge PMD mapping of vmemmap pages is ready, so switch the mapping from PTE to PMD. Link: https://lkml.kernel.org/r/20210616094915.34432-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Chen Huang <chenhuang5@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4bab4964 |
|
30-Jun-2021 |
Muchun Song <songmuchun@bytedance.com> |
mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled The parameter of memory_hotplug.memmap_on_memory is not compatible with hugetlb_free_vmemmap. So disable it when hugetlb_free_vmemmap is enabled. [akpm@linux-foundation.org: remove unneeded include, per Oscar] Link: https://lkml.kernel.org/r/20210510030027.56044-9-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chen Huang <chenhuang5@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e9fdff87 |
|
30-Jun-2021 |
Muchun Song <songmuchun@bytedance.com> |
mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Add a kernel parameter hugetlb_free_vmemmap to enable the feature of freeing unused vmemmap pages associated with each hugetlb page on boot. We disable PMD mapping of vmemmap pages for x86-64 arch when this feature is enabled. Because vmemmap_remap_free() depends on vmemmap being base page mapped. Link: https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Chen Huang <chenhuang5@huawei.com> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
531353e6 |
|
14-Jun-2021 |
Robin Murphy <robin.murphy@arm.com> |
iommu: Update "iommu.strict" documentation Consolidating the flush queue logic also meant that the "iommu.strict" option started taking effect on x86 as well. Make sure we document that. Fixes: a250c23f15c2 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/2c8c06e1b449d6b060c5bf9ad3b403cd142f405d.1623682646.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
b1e650db |
|
03-Jun-2021 |
Joerg Roedel <jroedel@suse.de> |
iommu/amd: Add amd_iommu=force_enable option Add this option to enable the IOMMU on platforms like AMD Stoney, where the kernel usually disables it because it may cause problems in some scenarios. Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20210603130203.29016-1-joro@8bytes.org
|
#
3958e2d0 |
|
24-May-2021 |
Suren Baghdasaryan <surenb@google.com> |
cgroup: make per-cgroup pressure stall tracking configurable PSI accounts stalls for each cgroup separately and aggregates it at each level of the hierarchy. This causes additional overhead with psi_avgs_work being called for each cgroup in the hierarchy. psi_avgs_work has been highly optimized, however on systems with large number of cgroups the overhead becomes noticeable. Systems which use PSI only at the system level could avoid this overhead if PSI can be configured to skip per-cgroup stall accounting. Add "cgroup_disable=pressure" kernel command-line option to allow requesting system-wide only pressure stall accounting. When set, it keeps system-wide accounting under /proc/pressure/ but skips accounting for individual cgroups and does not expose PSI nodes in cgroup hierarchy. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
f58780a8 |
|
28-Jun-2021 |
Gavin Shan <gshan@redhat.com> |
mm/page_reporting: export reporting order as module parameter The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting threshold. It can't be adjusted at runtime. This introduces a variable (@page_reporting_order) to replace the marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially, meaning the page reporting is disabled. It will be specified by driver if valid one is provided. Otherwise, it will fall back to @pageblock_order. It's also exported so that the page reporting order can be adjusted at runtime. Link: https://lkml.kernel.org/r/20210625014710.42954-3-gshan@redhat.com Signed-off-by: Gavin Shan <gshan@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1253b9b8 |
|
27-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
clocksource: Provide kernel module to test clocksource watchdog When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. It would be good to have a way of testing the clocksource watchdog's ability to distinguish between these two causes of clock skew and instability. Therefore, provide a new clocksource-wdtest module selected by a new TEST_CLOCKSOURCE_WATCHDOG Kconfig option. This module has a single module parameter named "holdoff" that provides the number of seconds of delay before testing should start, which defaults to zero when built as a module and to 10 seconds when built directly into the kernel. Very large systems that boot slowly may need to increase the value of this module parameter. This module uses hand-crafted clocksource structures to do its testing, thus avoiding messing up timing for the rest of the kernel and for user applications. This module first verifies that the ->uncertainty_margin field of the clocksource structures are set sanely. It then tests the delay-detection capability of the clocksource watchdog, increasing the number of consecutive delays injected, first provoking console messages complaining about the delays and finally forcing a clock-skew event. Unexpected test results cause at least one WARN_ON_ONCE() console splat. If there are no splats, the test has passed. Finally, it fuzzes the value returned from a clocksource to test the clocksource watchdog's ability to detect time skew. This module checks the state of its clocksource after each test, and uses WARN_ON_ONCE() to emit a console splat if there are any failures. This should enable all types of test frameworks to detect any such failures. This facility is intended for diagnostic use only, and should be avoided on production systems. Reported-by: Chris Mason <clm@fb.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
|
#
fa218f1c |
|
27-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
clocksource: Limit number of CPUs checked for clock synchronization Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU, that clock is checked on all CPUs. This is thorough, but might not be what you want on a system with a few tens of CPUs, let alone a few hundred of them. Therefore, by default check only up to eight randomly chosen CPUs. Also provide a new clocksource.verify_n_cpus kernel boot parameter. A value of -1 says to check all of the CPUs, and a non-negative value says to randomly select that number of CPUs, without concern about selecting the same CPU multiple times. However, make use of a cpumask so that a given CPU will be checked at most once. Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org
|
#
db3a34e1 |
|
27-May-2021 |
Paul E. McKenney <paulmck@kernel.org> |
clocksource: Retry clock read if long delays detected When the clocksource watchdog marks a clock as unstable, this might be due to that clock being unstable or it might be due to delays that happen to occur between the reads of the two clocks. Yes, interrupts are disabled across those two reads, but there are no shortage of things that can delay interrupts-disabled regions of code ranging from SMI handlers to vCPU preemption. It would be good to have some indication as to why the clock was marked unstable. Therefore, re-read the watchdog clock on either side of the read from the clock under test. If the watchdog clock shows an excessive time delta between its pair of reads, the reads are retried. The maximum number of retries is specified by a new kernel boot parameter clocksource.max_cswd_read_retries, which defaults to three, that is, up to four reads, one initial and up to three retries. If more than one retry was required, a message is printed on the console (the occasional single retry is expected behavior, especially in guest OSes). If the maximum number of retries is exceeded, the clock under test will be marked unstable. However, the probability of this happening due to various sorts of delays is quite small. In addition, the reason (clock-read delays) for the unstable marking will be apparent. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
|
#
d3121e64 |
|
16-Jun-2021 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe Currently we need to use as many acpi_mask_gpe options as we want to have GPEs to be masked. Even with two it already becomes inconveniently large the kernel command line. Instead, allow acpi_mask_gpe to represent bitmap list. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
1a6a9044 |
|
01-Jun-2021 |
Mike Rapoport <rppt@kernel.org> |
x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options The CONFIG_X86_RESERVE_LOW build time and reservelow= command line option allowed to control the amount of memory under 1M that would be reserved at boot to avoid using memory that can be potentially clobbered by BIOS. Since the entire range under 1M is always reserved there is no need for these options anymore and they can be removed. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210601075354.5149-3-rppt@kernel.org
|
#
544ef682 |
|
23-May-2021 |
Barry Song <song.bao.hua@hisilicon.com> |
docs: kernel-parameters: mark numa=off is supported by a bundle of architectures risc-v and arm64 support numa=off by common arch_numa_init() in drivers/base/arch_numa.c. x86, ppc, mips, sparc support it by arch-level early_param. numa=off is widely used in linux distributions. it is better to document it. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Link: https://lore.kernel.org/r/20210524051715.13604-1-song.bao.hua@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
9d839c28 |
|
19-Apr-2021 |
Fenghua Yu <fenghua.yu@intel.com> |
Documentation/admin-guide: Add bus lock ratelimit Since bus lock rate limit changes the split_lock_detect parameter, update the documentation for the change. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210419214958.4035512-4-fenghua.yu@intel.com
|
#
e4042ad4 |
|
04-May-2021 |
Peter Zijlstra <peterz@infradead.org> |
delayacct: Default disabled Assuming this stuff isn't actually used much; disable it by default and avoid allocating and tracking the task_delay_info structure. taskstats is changed to still report the regular sched and sched_info and only skip the missing task_delay_info fields instead of not reporting anything. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210505111525.308018373@infradead.org
|
#
8abddd96 |
|
03-May-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/radix: Enable huge vmalloc mappings This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due to vfs hashes being allocated with 2MB pages. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503091755.613393-1-npiggin@gmail.com
|
#
e7cb072e |
|
06-May-2021 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
init/initramfs.c: do unpacking asynchronously Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3. These two patches are independent, but better-together. The second is a rather trivial patch that simply allows the developer to change "/sbin/modprobe" to something else - e.g. the empty string, so that all request_module() during early boot return -ENOENT early, without even spawning a usermode helper, needlessly synchronizing with the initramfs unpacking. The first patch delegates decompressing the initramfs to a worker thread, allowing do_initcalls() in main.c to proceed to the device_ and late_ initcalls without waiting for that decompression (and populating of rootfs) to finish. Obviously, some of those later calls may rely on the initramfs being available, so I've added synchronization points in the firmware loader and usermodehelper paths - there might be other places that would need this, but so far no one has been able to think of any places I have missed. There's not much to win if most of the functionality needed during boot is only available as modules. But systems with a custom-made .config and initramfs can boot faster, partly due to utilizing more than one cpu earlier, partly by avoiding known-futile modprobe calls (which would still trigger synchronization with the initramfs unpacking, thus eliminating most of the first benefit). This patch (of 2): Most of the boot process doesn't actually need anything from the initramfs, until of course PID1 is to be executed. So instead of doing the decompressing and populating of the initramfs synchronously in populate_rootfs() itself, push that off to a worker thread. This is primarily motivated by an embedded ppc target, where unpacking even the rather modest sized initramfs takes 0.6 seconds, which is long enough that the external watchdog becomes unhappy that it doesn't get attention soon enough. By doing the initramfs decompression in a worker thread, we get to do the device_initcalls and hence start petting the watchdog much sooner. Normal desktops might benefit as well. On my mostly stock Ubuntu kernel, my initramfs is a 26M xz-compressed blob, decompressing to around 126M. That takes almost two seconds: [ 0.201454] Trying to unpack rootfs image as initramfs... [ 1.976633] Freeing initrd memory: 29416K Before this patch, these lines occur consecutively in dmesg. With this patch, the timestamps on these two lines is roughly the same as above, but with 172 lines inbetween - so more than one cpu has been kept busy doing work that would otherwise only happen after the populate_rootfs() finished. Should one of the initcalls done after rootfs_initcall time (i.e., device_ and late_ initcalls) need something from the initramfs (say, a kernel module or a firmware blob), it will simply wait for the initramfs unpacking to be done before proceeding, which should in theory make this completely safe. But if some driver pokes around in the filesystem directly and not via one of the official kernel interfaces (i.e. request_firmware*(), call_usermodehelper*) that theory may not hold - also, I certainly might have missed a spot when sprinkling wait_for_initramfs(). So there is an escape hatch in the form of an initramfs_async= command line parameter. Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f6e5aedf |
|
24-Feb-2021 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
riscv: Add support for memtest The riscv [rv32_]defconfig enabled CONFIG_MEMTEST, but memtest feature is not supported in RISCV. Add early_memtest() to support for memtest. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
#
e3a9d9fc |
|
04-May-2021 |
Oscar Salvador <osalvador@suse.de> |
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory Self stored memmap leads to a sparse memory situation which is unsuitable for workloads that requires large contiguous memory chunks, so make this an opt-in which needs to be explicitly enabled. To control this, let memory_hotplug have its own memory space, as suggested by David, so we can add memory_hotplug.memmap_on_memory parameter. Link: https://lkml.kernel.org/r/20210421102701.25051-7-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6984a320 |
|
29-Mar-2021 |
Alexander Dahl <ada@thorsis.com> |
docs: kernel-parameters: Add gpio_mockup_named_lines Missing since introduced in the driver. Fixes: 8a68ea00a62e ("gpio: mockup: implement naming the lines") Signed-off-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
#
3eb52226 |
|
29-Mar-2021 |
Alexander Dahl <ada@thorsis.com> |
docs: kernel-parameters: Move gpio-mockup for alphabetic order All other sections are ordered alphabetically so do the same for gpio-mockup. Fixes: 0f98dd1b27d2 ("gpio/mockup: add virtual gpio device") Signed-off-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
#
3542dcb1 |
|
05-Mar-2021 |
Robin Murphy <robin.murphy@arm.com> |
iommu/dma: Resurrect the "forcedac" option In converting intel-iommu over to the common IOMMU DMA ops, it quietly lost the functionality of its "forcedac" option. Since this is a handy thing both for testing and for performance optimisation on certain platforms, reimplement it under the common IOMMU parameter namespace. For the sake of fixing the inadvertent breakage of the Intel-specific parameter, remove the dmar_forcedac remnants and hook it up as an alias while documenting the transition to the new common parameter. Fixes: c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/7eece8e0ea7bfbe2cd0e30789e0d46df573af9b0.1614961776.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
82edd9d5 |
|
29-Apr-2021 |
Rafael Aquini <aquini@redhat.com> |
mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds This is a minor addition to the allocator setup options to provide a simple way to on demand enable back cache merging for builds that by default run with CONFIG_SLAB_MERGE_DEFAULT not set. Link: https://lkml.kernel.org/r/20210319194506.200159-1-aquini@redhat.com Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9406415f |
|
15-Apr-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched/debug: Rename the sched_debug parameter to sched_verbose CONFIG_SCHED_DEBUG is the build-time Kconfig knob, the boot param sched_debug and the /debug/sched/debug_enabled knobs control the sched_debug_enabled variable, but what they really do is make SCHED_DEBUG more verbose, so rename the lot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
#
7d33004d |
|
21-Mar-2021 |
Maciej W. Rozycki <macro@orcam.me.uk> |
pata_legacy: Add `probe_mask' parameter like with ide-generic Carry the `probe_mask' parameter over from ide-generic to pata_legacy so that there is a way to prevent random poking at ISA port I/O locations in attempt to discover adapter option cards with libata like with the old IDE driver. By default all enabled locations are tried, however it may interfere with a different kind of hardware responding there. For example with a plain (E)ISA system the driver tries all the six possible locations: scsi host0: pata_legacy ata1: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14 ata1.00: ATA-4: ST310211A, 3.54, max UDMA/100 ata1.00: 19541088 sectors, multi 16: LBA ata1.00: configured for PIO scsi 0:0:0:0: Direct-Access ATA ST310211A 3.54 PQ: 0 ANSI: 5 scsi 0:0:0:0: Attached scsi generic sg0 type 0 sd 0:0:0:0: [sda] 19541088 512-byte logical blocks: (10.0 GB/9.32 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sd 0:0:0:0: [sda] Attached SCSI disk scsi host1: pata_legacy ata2: PATA max PIO4 cmd 0x170 ctl 0x376 irq 15 scsi host1: pata_legacy ata3: PATA max PIO4 cmd 0x1e8 ctl 0x3ee irq 11 scsi host1: pata_legacy ata4: PATA max PIO4 cmd 0x168 ctl 0x36e irq 10 scsi host1: pata_legacy ata5: PATA max PIO4 cmd 0x1e0 ctl 0x3e6 irq 8 scsi host1: pata_legacy ata6: PATA max PIO4 cmd 0x160 ctl 0x366 irq 12 however giving the kernel "pata_legacy.probe_mask=21" makes it try every other location only: scsi host0: pata_legacy ata1: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14 ata1.00: ATA-4: ST310211A, 3.54, max UDMA/100 ata1.00: 19541088 sectors, multi 16: LBA ata1.00: configured for PIO scsi 0:0:0:0: Direct-Access ATA ST310211A 3.54 PQ: 0 ANSI: 5 scsi 0:0:0:0: Attached scsi generic sg0 type 0 sd 0:0:0:0: [sda] 19541088 512-byte logical blocks: (10.0 GB/9.32 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sda3 sd 0:0:0:0: [sda] Attached SCSI disk scsi host1: pata_legacy ata2: PATA max PIO4 cmd 0x1e8 ctl 0x3ee irq 11 scsi host1: pata_legacy ata3: PATA max PIO4 cmd 0x1e0 ctl 0x3e6 irq 8 Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103211800110.21463@angie.orcam.me.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6ddcec95 |
|
21-Mar-2021 |
Maciej W. Rozycki <macro@orcam.me.uk> |
pata_platform: Document `pio_mask' module parameter Add MODULE_PARM_DESC documentation and a kernel-parameters.txt entry. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103212023190.21463@angie.orcam.me.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
426e2c6a |
|
21-Mar-2021 |
Maciej W. Rozycki <macro@orcam.me.uk> |
pata_legacy: Properly document module parameters Most pata_legacy module parameters lack MODULE_PARM_DESC documentation and none is described in kernel-parameters.txt. Also several comments are inaccurate or wrong. Add the missing documentation pieces then and reorder parameters into a consistent block. Remove inaccuracies as follows: - `all' affects primary and secondary port ranges only rather than all, - `probe_all' affects tertiary and further port ranges rather than all, - `ht6560b' is for HT 6560B rather than HT 6560A. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2103211909560.21463@angie.orcam.me.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
686fe1bf |
|
17-Feb-2021 |
Uladzislau Rezki (Sony) <urezki@gmail.com> |
rcuscale: Add kfree_rcu() single-argument scale test The single-argument variant of kfree_rcu() is currently not tested by any member of the rcutoture test suite. This commit therefore adds rcuscale code to test it. This testing is controlled by two new boolean module parameters, kfree_rcu_test_single and kfree_rcu_test_double. If one is set and the other not, only the corresponding variant is tested, otherwise both are tested, with the variant to be tested determined randomly on each invocation. Both of these module parameters are initialized to false, so setting either to true will test only that variant. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3e70df91 |
|
21-Feb-2021 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
rcu: deprecate "all" option to rcu_nocbs= With the core bitmap support now accepting "N" as a placeholder for the end of the bitmap, "all" can be represented as "0-N" and has the advantage of not being specific to RCU (or any other subsystem). So deprecate the use of "all" by removing documentation references to it. The support itself needs to remain for now, since we don't know how many people out there are using it currently, but since it is in an __init area anyway, it isn't worth losing sleep over. Cc: Yury Norov <yury.norov@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
a5aabace |
|
01-Mar-2021 |
Juergen Gross <jgross@suse.com> |
locking/csd_lock: Add more data to CSD lock debugging In order to help identifying problems with IPI handling and remote function execution add some more data to IPI debugging code. There have been multiple reports of CPUs looping long times (many seconds) in smp_call_function_many() waiting for another CPU executing a function like tlb flushing. Most of these reports have been for cases where the kernel was running as a guest on top of KVM or Xen (there are rumours of that happening under VMWare, too, and even on bare metal). Finding the root cause hasn't been successful yet, even after more than 2 years of chasing this bug by different developers. Commit: 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics") tried to address this by adding some debug code and by issuing another IPI when a hang was detected. This helped mitigating the problem (the repeated IPI unlocks the hang), but the root cause is still unknown. Current available data suggests that either an IPI wasn't sent when it should have been, or that the IPI didn't result in the target CPU executing the queued function (due to the IPI not reaching the CPU, the IPI handler not being called, or the handler not seeing the queued request). Try to add more diagnostic data by introducing a global atomic counter which is being incremented when doing critical operations (before and after queueing a new request, when sending an IPI, and when dequeueing a request). The counter value is stored in percpu variables which can be printed out when a hang is detected. The data of the last event (consisting of sequence counter, source CPU, target CPU, and event type) is stored in a global variable. When a new event is to be traced, the data of the last event is stored in the event related percpu location and the global data is updated with the new event's data. This allows to track two events in one data location: one by the value of the event data (the event before the current one), and one by the location itself (the current event). A typical printout with a detected hang will look like this: csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410). csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request. csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty) csd: cnt(00008cd): ffff->0006 idle csd: cnt(0003668): 0001->0006 queue csd: cnt(0003669): 0001->0006 ipi csd: cnt(0003e0f): 0007->000a queue csd: cnt(0003e10): 0001->ffff ping csd: cnt(0003e71): 0003->0000 ping csd: cnt(0003e72): ffff->0006 gotipi csd: cnt(0003e73): ffff->0006 handle csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty) csd: cnt(0003e7f): 0004->0006 ping csd: cnt(0003e80): 0001->ffff pinged csd: cnt(0003eb2): 0005->0001 noipi csd: cnt(0003eb3): 0001->0006 queue csd: cnt(0003eb4): 0001->0006 noipi csd: cnt now: 0003f00 The idea is to print only relevant entries. Those are all events which are associated with the hang (so sender side events for the source CPU of the hanging request, and receiver side events for the target CPU), and the related events just before those (for adding data needed to identify a possible race). Printing all available data would be possible, but this would add large amounts of data printed on larger configurations. Signed-off-by: Juergen Gross <jgross@suse.com> [ Minor readability edits. Breaks col80 but is far more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20210301101336.7797-4-jgross@suse.com
|
#
8d0968cc |
|
01-Mar-2021 |
Juergen Gross <jgross@suse.com> |
locking/csd_lock: Add boot parameter for controlling CSD lock debugging Currently CSD lock debugging can be switched on and off via a kernel config option only. Unfortunately there is at least one problem with CSD lock handling pending for about 2 years now, which has been seen in different environments (mostly when running virtualized under KVM or Xen, at least once on bare metal). Multiple attempts to catch this issue have finally led to introduction of CSD lock debug code, but this code is not in use in most distros as it has some impact on performance. In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG enabled even for production use, add a boot parameter for switching the debug functionality on. This will reduce any performance impact of the debug coding to a bare minimum when not being used. Signed-off-by: Juergen Gross <jgross@suse.com> [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210301101336.7797-2-jgross@suse.com
|
#
5d0682be |
|
01-Mar-2021 |
Sumit Garg <sumit.garg@linaro.org> |
KEYS: trusted: Add generic trusted keys framework Current trusted keys framework is tightly coupled to use TPM device as an underlying implementation which makes it difficult for implementations like Trusted Execution Environment (TEE) etc. to provide trusted keys support in case platform doesn't posses a TPM device. Add a generic trusted keys framework where underlying implementations can be easily plugged in. Create struct trusted_key_ops to achieve this, which contains necessary functions of a backend. Also, define a module parameter in order to select a particular trust source in case a platform support multiple trust sources. In case its not specified then implementation itetrates through trust sources list starting with TPM and assign the first trust source as a backend which has initiazed successfully during iteration. Note that current implementation only supports a single trust source at runtime which is either selectable at compile time or during boot via aforementioned module parameter. Suggested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
#
2d726d0d |
|
08-Apr-2021 |
Marc Zyngier <maz@kernel.org> |
arm64: Get rid of CONFIG_ARM64_VHE CONFIG_ARM64_VHE was introduced with ARMv8.1 (some 7 years ago), and has been enabled by default for almost all that time. Given that newer systems that are VHE capable are finally becoming available, and that some systems are even incapable of not running VHE, drop the configuration altogether. Anyone willing to stick to non-VHE on VHE hardware for obscure reasons should use the 'kvm-arm.mode=nvhe' command-line option. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210408131010.1109027-4-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
39218ff4 |
|
01-Apr-2021 |
Kees Cook <keescook@chromium.org> |
stack: Optionally randomize kernel stack offset each syscall This provides the ability for architectures to enable kernel stack base address offset randomization. This feature is controlled by the boot param "randomize_kstack_offset=on/off", with its default value set by CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. This feature is based on the original idea from the last public release of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt All the credit for the original idea goes to the PaX team. Note that the design and implementation of this upstream randomize_kstack_offset feature differs greatly from the RANDKSTACK feature (see below). Reasoning for the feature: This feature aims to make harder the various stack-based attacks that rely on deterministic stack structure. We have had many such attacks in past (just to name few): https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html As Linux kernel stack protections have been constantly improving (vmap-based stack allocation with guard pages, removal of thread_info, STACKLEAK), attackers have had to find new ways for their exploits to work. They have done so, continuing to rely on the kernel's stack determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT were not relevant. For example, the following recent attacks would have been hampered if the stack offset was non-deterministic between syscalls: https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf (page 70: targeting the pt_regs copy with linear stack overflow) https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html (leaked stack address from one syscall as a target during next syscall) The main idea is that since the stack offset is randomized on each system call, it is harder for an attack to reliably land in any particular place on the thread stack, even with address exposures, as the stack base will change on the next syscall. Also, since randomization is performed after placing pt_regs, the ptrace-based approach[1] to discover the randomized offset during a long-running syscall should not be possible. Design description: During most of the kernel's execution, it runs on the "thread stack", which is pretty deterministic in its structure: it is fixed in size, and on every entry from userspace to kernel on a syscall the thread stack starts construction from an address fetched from the per-cpu cpu_current_top_of_stack variable. The first element to be pushed to the thread stack is the pt_regs struct that stores all required CPU registers and syscall parameters. Finally the specific syscall function is called, with the stack being used as the kernel executes the resulting request. The goal of randomize_kstack_offset feature is to add a random offset after the pt_regs has been pushed to the stack and before the rest of the thread stack is used during the syscall processing, and to change it every time a process issues a syscall. The source of randomness is currently architecture-defined (but x86 is using the low byte of rdtsc()). Future improvements for different entropy sources is possible, but out of scope for this patch. Further more, to add more unpredictability, new offsets are chosen at the end of syscalls (the timing of which should be less easy to measure from userspace than at syscall entry time), and stored in a per-CPU variable, so that the life of the value does not stay explicitly tied to a single task. As suggested by Andy Lutomirski, the offset is added using alloca() and an empty asm() statement with an output constraint, since it avoids changes to assembly syscall entry code, to the unwinder, and provides correct stack alignment as defined by the compiler. In order to make this available by default with zero performance impact for those that don't want it, it is boot-time selectable with static branches. This way, if the overhead is not wanted, it can just be left turned off with no performance impact. The generated assembly for x86_64 with GCC looks like this: ... ffffffff81003977: 65 8b 05 02 ea 00 7f mov %gs:0x7f00ea02(%rip),%eax # 12380 <kstack_offset> ffffffff8100397e: 25 ff 03 00 00 and $0x3ff,%eax ffffffff81003983: 48 83 c0 0f add $0xf,%rax ffffffff81003987: 25 f8 07 00 00 and $0x7f8,%eax ffffffff8100398c: 48 29 c4 sub %rax,%rsp ffffffff8100398f: 48 8d 44 24 0f lea 0xf(%rsp),%rax ffffffff81003994: 48 83 e0 f0 and $0xfffffffffffffff0,%rax ... As a result of the above stack alignment, this patch introduces about 5 bits of randomness after pt_regs is spilled to the thread stack on x86_64, and 6 bits on x86_32 (since its has 1 fewer bit required for stack alignment). The amount of entropy could be adjusted based on how much of the stack space we wish to trade for security. My measure of syscall performance overhead (on x86_64): lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null randomize_kstack_offset=y Simple syscall: 0.7082 microseconds randomize_kstack_offset=n Simple syscall: 0.7016 microseconds So, roughly 0.9% overhead growth for a no-op syscall, which is very manageable. And for people that don't want this, it's off by default. There are two gotchas with using the alloca() trick. First, compilers that have Stack Clash protection (-fstack-clash-protection) enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to any dynamic stack allocations. While the randomization offset is always less than a page, the resulting assembly would still contain (unreachable!) probing routines, bloating the resulting assembly. To avoid this, -fno-stack-clash-protection is unconditionally added to the kernel Makefile since this is the only dynamic stack allocation in the kernel (now that VLAs have been removed) and it is provably safe from Stack Clash style attacks. The second gotcha with alloca() is a negative interaction with -fstack-protector*, in that it sees the alloca() as an array allocation, which triggers the unconditional addition of the stack canary function pre/post-amble which slows down syscalls regardless of the static branch. In order to avoid adding this unneeded check and its associated performance impact, architectures need to carefully remove uses of -fstack-protector-strong (or -fstack-protector) in the compilation units that use the add_random_kstack() macro and to audit the resulting stack mitigation coverage (to make sure no desired coverage disappears). No change is visible for this on x86 because the stack protector is already unconditionally disabled for the compilation unit, but the change is required on arm64. There is, unfortunately, no attribute that can be used to disable stack protector for specific functions. Comparison to PaX RANDKSTACK feature: The RANDKSTACK feature randomizes the location of the stack start (cpu_current_top_of_stack), i.e. including the location of pt_regs structure itself on the stack. Initially this patch followed the same approach, but during the recent discussions[2], it has been determined to be of a little value since, if ptrace functionality is available for an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write different offsets in the pt_regs struct, observe the cache behavior of the pt_regs accesses, and figure out the random stack offset. Another difference is that the random offset is stored in a per-cpu variable, rather than having it be per-thread. As a result, these implementations differ a fair bit in their implementation details and results, though obviously the intent is similar. [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/ [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/ [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
|
#
ebca1770 |
|
22-Mar-2021 |
Fenghua Yu <fenghua.yu@intel.com> |
Documentation/admin-guide: Change doc for split_lock_detect parameter Since #DB for bus lock detect changes the split_lock_detect parameter, update the documentation for the changes. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210322135325.682257-4-fenghua.yu@intel.com
|
#
00b072c0 |
|
02-Mar-2021 |
Barry Song <song.bao.hua@hisilicon.com> |
Documentation/admin-guide: kernel-parameters: correct the architectures for numa_balancing X86 isn't the only architecture supporting NUMA_BALANCING. ARM64, PPC, S390 and RISCV also support it: arch$ git grep NUMA_BALANCING arm64/Kconfig: select ARCH_SUPPORTS_NUMA_BALANCING arm64/configs/defconfig:CONFIG_NUMA_BALANCING=y arm64/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING powerpc/configs/powernv_defconfig:CONFIG_NUMA_BALANCING=y powerpc/configs/ppc64_defconfig:CONFIG_NUMA_BALANCING=y powerpc/configs/pseries_defconfig:CONFIG_NUMA_BALANCING=y powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING powerpc/include/asm/book3s/64/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */ powerpc/include/asm/book3s/64/pgtable.h:#ifdef CONFIG_NUMA_BALANCING powerpc/include/asm/book3s/64/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */ powerpc/include/asm/nohash/pgtable.h:#ifdef CONFIG_NUMA_BALANCING powerpc/include/asm/nohash/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */ powerpc/platforms/Kconfig.cputype: select ARCH_SUPPORTS_NUMA_BALANCING riscv/Kconfig: select ARCH_SUPPORTS_NUMA_BALANCING riscv/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING s390/Kconfig: select ARCH_SUPPORTS_NUMA_BALANCING s390/configs/debug_defconfig:CONFIG_NUMA_BALANCING=y s390/configs/defconfig:CONFIG_NUMA_BALANCING=y s390/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING x86/Kconfig: select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 x86/include/asm/pgtable.h:#ifdef CONFIG_NUMA_BALANCING x86/include/asm/pgtable.h:#endif /* CONFIG_NUMA_BALANCING */ On the other hand, setup_numabalancing() is implemented in mm/mempolicy.c which doesn't depend on architectures. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210302084159.33688-1-song.bao.hua@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
866d6cdf |
|
19-Feb-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
ACPI: PCI: Drop ACPI_PCI_COMPONENT that is not used any more After dropping all of the code using ACPI_PCI_COMPONENT drop the definition of it too and update the documentation to remove all ACPI_PCI_COMPONENT references from it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
|
#
e1fdc403 |
|
25-Feb-2021 |
Vijayanand Jitta <vjitta@codeaurora.org> |
lib: stackdepot: add support to disable stack depot Add a kernel parameter stack_depot_disable to disable stack depot. So that stack hash table doesn't consume any memory when stack depot is disabled. The use case is CONFIG_PAGE_OWNER without page_owner=on. Without this patch, stackdepot will consume the memory for the hashtable. By default, it's 8M which is never trivial. With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off, stack_depot_disable in kernel command line, we could save the wasted memory for the hashtable. [akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build] Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Alexander Potapenko <glider@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yogesh Lal <ylal@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fe2cce15 |
|
24-Feb-2021 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON The boot param and config determine the value of memcg_sysfs_enabled, which is unused since commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") as there are no per-memcg kmem caches anymore. Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1b79fc4f |
|
25-Jan-2021 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
x86/apb_timer: Remove driver for deprecated platform Intel Moorestown and Medfield are quite old Intel Atom based 32-bit platforms, which were in limited use in some Android phones, tablets and consumer electronics more than eight years ago. There are no bugs or problems ever reported outside from Intel for breaking any of that platforms for years. It seems no real users exists who run more or less fresh kernel on it. Commit 05f4434bc130 ("ASoC: Intel: remove mfld_machine") is also in align with this theory. Due to above and to reduce a burden of supporting outdated drivers, remove the support for outdated platforms completely. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
19d0f5f6 |
|
05-Feb-2021 |
Saravana Kannan <saravanak@google.com> |
driver core: Add fw_devlink.strict kernel param This param allows forcing all dependencies to be treated as mandatory. This will be useful for boards in which all optional dependencies like IOMMUs and DMAs need to be treated as mandatory dependencies. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210205222644.2357303-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
03d939c7 |
|
22-Jan-2021 |
Dave Jiang <dave.jiang@intel.com> |
dmaengine: idxd: add module parameter to force disable of SVA Add a module parameter that overrides the SVA feature enabling. This keeps the driver in legacy mode even when intel_iommu=sm_on is set. In this mode, the descriptor fields must be programmed with dma_addr_t from the Linux DMA API for source, destination, and completion descriptors. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/161134110457.4005461.13171197785259115852.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
#
5ead723a |
|
14-Feb-2021 |
Timur Tabi <timur@kernel.org> |
lib/vsprintf: no_hash_pointers prints all addresses as unhashed If the no_hash_pointers command line parameter is set, then printk("%p") will print pointers as unhashed, which is useful for debugging purposes. This change applies to any function that uses vsprintf, such as print_hex_dump() and seq_buf_printf(). A large warning message is displayed if this option is enabled. Unhashed pointers expose kernel addresses, which can be a security risk. Also update test_printf to skip the hashed pointer tests if the command-line option is set. Signed-off-by: Timur Tabi <timur@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210214161348.369023-4-timur@kernel.org
|
#
3cae85f5 |
|
09-Feb-2021 |
Florian Fainelli <f.fainelli@gmail.com> |
Documentation/admin-guide: kernel-parameters: Update nohlt section Update the documentation regarding "nohlt" and indicate that it is not only for bugs, but can be useful to disable the architecture specific sleep instructions. ARM, ARM64, SuperH and Microblaze all use CONFIG_GENERIC_IDLE_POLL_SETUP which takes care of honoring the "hlt"/"nohlt" parameters. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210209172349.2249596-1-f.fainelli@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
bc47190d |
|
24-Jan-2021 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: kernel-parameters: update CMA entries Add qualifying build option legend [CMA] to kernel boot options that requirce CMA support to be enabled for them to be usable. Also capitalize 'CMA' when it is used as an acronym. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Link: https://lore.kernel.org/r/20210125043202.22399-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
6ef869e0 |
|
18-Jan-2021 |
Michal Hocko <mhocko@kernel.org> |
preempt: Introduce CONFIG_PREEMPT_DYNAMIC Preemption mode selection is currently hardcoded on Kconfig choices. Introduce a dedicated option to tune preemption flavour at boot time, This will be only available on architectures efficiently supporting static calls in order not to tempt with the feature against additional overhead that might be prohibitive or undesirable. CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE, CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() / __preempt_schedule_notrace_function()). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [peterz: relax requirement to HAVE_STATIC_CALL] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
|
#
f8da5752 |
|
08-Feb-2021 |
Marc Zyngier <maz@kernel.org> |
arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line In order to be able to disable Pointer Authentication at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA} fields. This is further mapped on the arm64.nopauth command-line alias. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Brazdil <dbrazdil@google.com> Tested-by: Srinivas Ramana <sramana@codeaurora.org> Link: https://lore.kernel.org/r/20210208095732.3267263-23-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
93ad55b7 |
|
08-Feb-2021 |
Marc Zyngier <maz@kernel.org> |
arm64: cpufeatures: Allow disabling of BTI from the command-line In order to be able to disable BTI at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64PFR1_EL1.BTI field. This is further mapped on the arm64.nobti command-line alias. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Brazdil <dbrazdil@google.com> Tested-by: Srinivas Ramana <sramana@codeaurora.org> Link: https://lore.kernel.org/r/20210208095732.3267263-21-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
1945a067 |
|
08-Feb-2021 |
Marc Zyngier <maz@kernel.org> |
arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0 Admitedly, passing id_aa64mmfr1.vh=0 on the command-line isn't that easy to understand, and it is likely that users would much prefer write "kvm-arm.mode=nvhe", or "...=protected". So here you go. This has the added advantage that we can now always honor the "kvm-arm.mode=protected" option, even when booting on a VHE system. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Brazdil <dbrazdil@google.com> Link: https://lore.kernel.org/r/20210208095732.3267263-18-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
f8408264 |
|
14-Jan-2021 |
Viresh Kumar <viresh.kumar@linaro.org> |
drivers: Remove CONFIG_OPROFILE support The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. Remove kernel's old oprofile support. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> #RCU Acked-by: William Cohen <wcohen@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3daa96d6 |
|
10-Nov-2020 |
Peter Zijlstra <peterz@infradead.org> |
perf/intel: Remove Perfmon-v4 counter_freezing support Perfmon-v4 counter freezing is fundamentally broken; remove this default disabled code to make sure nobody uses it. The feature is called Freeze-on-PMI in the SDM, and if it would do that, there wouldn't actually be a problem, *however* it does something subtly different. It globally disables the whole PMU when it raises the PMI, not when the PMI hits. This means there's a window between the PMI getting raised and the PMI actually getting served where we loose events and this violates the perf counter independence. That is, a counting event should not result in a different event count when there is a sampling event co-scheduled. This is known to break existing software (RR). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
#
03cee168 |
|
07-Jan-2021 |
Lakshmi Ramasubramanian <nramas@linux.microsoft.com> |
IMA: define a builtin critical data measurement policy Define a new critical data builtin policy to allow measuring early kernel integrity critical data before a custom IMA policy is loaded. Update the documentation on kernel parameters to document the new critical data builtin policy. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
#
5831c0f7 |
|
09-Dec-2020 |
Peter Zijlstra <peterz@infradead.org> |
locking/selftests: More granular debug_locks_verbose Showing all tests all the time is tiresome. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
#
8a67a20b |
|
25-Nov-2020 |
Paul E. McKenney <paulmck@kernel.org> |
torture: Throttle VERBOSE_TOROUT_*() output This commit adds kernel boot parameters torture.verbose_sleep_frequency and torture.verbose_sleep_duration, which allow VERBOSE_TOROUT_*() output to be throttled with periodic sleeps on large systems. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e76506f0 |
|
15-Nov-2020 |
Paul E. McKenney <paulmck@kernel.org> |
refscale: Allow summarization of verbose output The refscale test prints enough per-kthread console output to provoke RCU CPU stall warnings on large systems. This commit therefore allows this output to be summarized. For example, the refscale.verbose_batched=32 boot parameter would causes only every 32nd line of output to be logged. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2c4319bd |
|
23-Sep-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Test runtime toggling of CPUs' callback offloading Frederic Weisbecker is adding the ability to change the rcu_nocbs state of CPUs at runtime, that is, to offload and deoffload their RCU callback processing without the need to reboot. As the old saying goes, "if it ain't tested, it don't work", so this commit therefore adds prototype rcutorture testing for this capability. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>
|
#
36221e10 |
|
15-Dec-2020 |
Julia Cartwright <julia@ni.com> |
rcu: Enable rcu_normal_after_boot unconditionally for RT Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can disrupt time-critical code in real-time applications. However, there is a portion of boot-time processing (presumably before any real-time applications have started) where expedited RCU grace periods are the only option. And so it is that experience with the -rt patchset indicates that PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot kernel boot parameter. This commit therefore makes the post-boot application environment safe for real-time applications by making PREEMPT_RT systems disable the rcupdate.rcu_normal_after_boot kernel boot parameter and acting as if this parameter had been set. This means that post-boot calls to synchronize_rcu_expedited() will be treated as if they were instead calls to synchronize_rcu(), thus preventing the IPIs, and thus avoiding disrupting real-time applications. Suggested-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Update kernel-parameters.txt accordingly. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8b9a0ecc |
|
15-Dec-2020 |
Scott Wood <swood@redhat.com> |
rcu: Unconditionally use rcuc threads on PREEMPT_RT PREEMPT_RT systems have long used the rcutree.use_softirq kernel boot parameter to avoid use of RCU_SOFTIRQ handlers, which can disrupt real-time applications by invoking callbacks during return from interrupts that arrived while executing time-critical code. This kernel boot parameter instead runs RCU core processing in an 'rcuc' kthread, thus allowing the scheduler to do its job of avoiding disrupting time-critical code. This commit therefore disables the rcutree.use_softirq kernel boot parameter on PREEMPT_RT systems, thus forcing such systems to do RCU core processing in 'rcuc' kthreads. This approach has long been in use by users of the -rt patchset, and there have been no complaints. There is therefore no way for the system administrator to override this choice, at least without modifying and rebuilding the kernel. Signed-off-by: Scott Wood <swood@redhat.com> [bigeasy: Reword commit message] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Update kernel-parameters.txt accordingly. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
2252ec14 |
|
10-Dec-2020 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Remove obsolete rcutree.rcu_idle_lazy_gp_delay boot parameter This commit removes documentation for the rcutree.rcu_idle_lazy_gp_delay kernel boot parameter given that this parameter no longer exists. Fixes: 77a40f97030b ("rcu: Remove kfree_rcu() special casing and lazy-callback handling") Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b36b0fe9 |
|
06-Jan-2021 |
David Woodhouse <dwmw@amazon.co.uk> |
x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery It's useful to be able to test non-vector event channel delivery, to make sure Linux will work properly on older Xen which doesn't have it. It's also useful for those working on Xen and Xen-compatible hypervisors, because there are guest kernels still in active use which use PCI INTX even when vector delivery is available. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210106153958.584169-4-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
25942e5e |
|
31-Dec-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: kernel-parameters: hyphenate comma-separated Hyphenate "comma separated" when it is used as a compound adjective. hyphenated. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210101040831.4148-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8010622c |
|
09-Dec-2020 |
Oliver Neukum <oneukum@suse.com> |
USB: UAS: introduce a quirk to set no_write_same UAS does not share the pessimistic assumption storage is making that devices cannot deal with WRITE_SAME. A few devices supported by UAS, are reported to not deal well with WRITE_SAME. Those need a quirk. Add it to the device that needs it. Reported-by: David C. Partridge <david.partridge@perdrix.co.uk> Signed-off-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201209152639.9195-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d8b369c4 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Add kvm-arm.mode early kernel parameter Add an early parameter that allows users to select the mode of operation for KVM/arm64. For now, the only supported value is "protected". By passing this flag users opt into the hypervisor placing additional restrictions on the host kernel. These allow the hypervisor to spawn guests whose state is kept private from the host. Restrictions will include stage-2 address translation to prevent host from accessing guest memory, filtering its SMC calls, etc. Without this parameter, the default behaviour remains selecting VHE/nVHE based on hardware support and CONFIG_ARM64_VHE. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-2-dbrazdil@google.com
|
#
4c8e3de4 |
|
28-Nov-2020 |
Barry Song <song.bao.hua@hisilicon.com> |
Documentation/admin-guide: mark memmap parameter is supported by a few architectures early_param memmap is only implemented on X86, MIPS and XTENSA. To avoid wasting users’ time on trying this on platform like ARM, mark it clearly. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20201128195121.2556-1-song.bao.hua@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
58a8bb39 |
|
24-Nov-2020 |
Lu Baolu <baolu.lu@linux.intel.com> |
iommu/vt-d: Cleanup after converting to dma-iommu ops Some cleanups after converting the driver to use dma-iommu ops. - Remove nobounce option; - Cleanup and simplify the path in domain mapping. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Link: https://lore.kernel.org/r/20201124082057.2614359-8-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
9a32a7e7 |
|
16-Nov-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: flush L1D after user accesses IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f7964378 |
|
16-Nov-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: flush L1D on kernel entry IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
38853a30 |
|
12-Nov-2020 |
Jarkko Sakkinen <jarkko@kernel.org> |
x86/cpu/intel: Add a nosgx kernel parameter Add a kernel parameter to disable SGX kernel support and document it. [ bp: Massage. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Jethro Beekman <jethro@fortanix.com> Tested-by: Sean Christopherson <sean.j.christopherson@intel.com> Link: https://lkml.kernel.org/r/20201112220135.165028-9-jarkko@kernel.org
|
#
0f12999e |
|
11-Sep-2020 |
Krzysztof Kozlowski <krzk@kernel.org> |
Documentation: Update paths of Samsung S3C machine files Documentation references Samsung S3C24xx and S3C64xx machine files in multiple places but the files were traveling around the kernel multiple times. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20200911143343.498-1-krzk@kernel.org
|
#
1a89c1dc |
|
22-Oct-2020 |
Juergen Gross <jgross@suse.com> |
Documentation: add xen.fifo_events kernel parameter description The kernel boot parameter xen.fifo_events isn't listed in Documentation/admin-guide/kernel-parameters.txt. Add it. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Link: https://lore.kernel.org/r/20201022094907.28560-6-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
e99502f7 |
|
07-Sep-2020 |
Juergen Gross <jgross@suse.com> |
xen/events: defer eoi in case of excessive number of events In case rogue guests are sending events at high frequency it might happen that xen_evtchn_do_upcall() won't stop processing events in dom0. As this is done in irq handling a crash might be the result. In order to avoid that, delay further inter-domain events after some time in xen_evtchn_do_upcall() by forcing eoi processing into a worker on the same cpu, thus inhibiting new events coming in. The time after which eoi processing is to be delayed is configurable via a new module parameter "event_loop_timeout" which specifies the maximum event loop time in jiffies (default: 2, the value was chosen after some tests showing that a value of 2 was the lowest with an only slight drop of dom0 network throughput while multiple guests performed an event storm). How long eoi processing will be delayed can be specified via another parameter "event_eoi_delay" (again in jiffies, default 10, again the value was chosen after testing with different delay values). This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Wei Liu <wl@xen.org>
|
#
2c739ced |
|
15-Oct-2020 |
Albert van der Linde <alinde@google.com> |
lib, include/linux: add usercopy failure capability Patch series "add fault injection to user memory access", v3. The goal of this series is to improve testing of fault-tolerance in usages of user memory access functions, by adding support for fault injection. syzkaller/syzbot are using the existing fault injection modes and will use this particular feature also. The first patch adds failure injection capability for usercopy functions. The second changes usercopy functions to use this new failure capability (copy_from_user, ...). The third patch adds get/put/clear_user failures to x86. This patch (of 3): Add a failure injection capability to improve testing of fault-tolerance in usages of user memory access functions. Add CONFIG_FAULT_INJECTION_USERCOPY to enable faults in usercopy functions. The should_fail_usercopy function is to be called by these functions (copy_from_user, get_user, ...) in order to fail or not. Signed-off-by: Albert van der Linde <alinde@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20200831171733.955393-1-alinde@google.com Link: http://lkml.kernel.org/r/20200831171733.955393-2-alinde@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0b1abd1f |
|
11-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h> Merge dma-contiguous.h into dma-map-ops.h, after removing the comment describing the contiguous allocator into kernel/dma/contigous.c. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
307e3ee9 |
|
15-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: kernel-parameters: clarify "module." parameters The command-line parameters "dyndbg" and "async_probe" are not parameters for kernel/module.c but instead they are for the module that is being loaded. Try to make that distinction in the help text. OTOH, "module.sig_enforce" is handled as a parameter of kernel/module.c so "module." is correct for it. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/67d40b6d-c073-a3bf-cbb6-6cad941cceeb@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
6b99e6e6 |
|
17-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev" Remove use of "rdev" from blockdev/ramdisk.rst and update admin-guide/kernel-parameters.txt. "rdev" is considered antiquated, ancient, archaic, obsolete, deprecated {choose any or all}. "rdev" was removed from util-linux in 2010: https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=a3e40c14651fccf18e7954f081e601389baefe3f Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Karel Zak <kzak@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Martin Mares <mj@ucw.cz> Cc: linux-video@atrey.karlin.mff.cuni.cz Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20200918015640.8439-3-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
497de97e |
|
17-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: kernel-parameters: capitalize Korina Fix typo, capitalize Korina proper noun. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20200918054722.28713-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
622381e6 |
|
17-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: admin-guide: kernel-parameters: reformat "lapic=" boot option Reformat "lapic=" to try to make it more understandable and similar to the style that is mostly used in this file. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20200918054739.2523-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
7c42376e |
|
17-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: kernel-parameters: fix "io7" parameter description Fix punctuation and capitalization for the "io7" boot parameter. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20200918054751.6538-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
255bf90f |
|
17-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation/admin-guide: kernel-parameters: fix "disable_ddw" wording Drop and extraneous word (if) in a sentence. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20200918054803.6588-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
c372e741 |
|
18-Sep-2020 |
Tian Tao <tiantao6@hisilicon.com> |
Documentation: Remove CMA's dependency on architecture CMA only depends on MMU. It doesn't depend on arch too much. such as ARM, ARM64, X86, MIPS etc. so We remove the dependency of cma about the architecture in kernel-parameters.txt. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/1600412758-60545-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
42769488 |
|
18-Sep-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: kernel-parameters: fix formatting of MIPS "machtype" For the "machtype" boot parameter, fix word spacing, line wrap, and plural of "laptops". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Link: https://lore.kernel.org/r/c9059e35-188d-a749-1907-767b53479328@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
aed26eeb |
|
21-Sep-2020 |
Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> |
Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param Add document entry for kvm_cma_resv_ratio kernel param which is used to alter the KVM contiguous memory allocation percentage for hash pagetable allocation used by hash mode PowerPC KVM guests. Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20200921090220.14981-1-sathnaga@linux.vnet.ibm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
5b280ed4 |
|
10-Sep-2020 |
Tian Tao <tiantao6@hisilicon.com> |
Documentation: arm64 also supports disable hugeiomap arm64 also supports disable hugeiomap,updated documentation. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1599740386-47210-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
0a4bb5e5 |
|
07-Sep-2020 |
Arvind Sankar <nivedita@alum.mit.edu> |
x86/fpu: Allow multiple bits in clearcpuid= parameter Commit 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") changed clearcpuid parsing from __setup() to cmdline_find_option(). While the __setup() function would have been called for each clearcpuid= parameter on the command line, cmdline_find_option() will only return the last one, so the change effectively made it impossible to disable more than one bit. Allow a comma-separated list of bit numbers as the argument for clearcpuid to allow multiple bits to be disabled again. Log the bits being disabled for informational purposes. Also fix the check on the return value of cmdline_find_option(). It returns -1 when the option is not found, so testing as a boolean is incorrect. Fixes: 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907213919.2423441-1-nivedita@alum.mit.edu
|
#
b7176c26 |
|
23-Aug-2020 |
Barry Song <song.bao.hua@hisilicon.com> |
dma-contiguous: provide the ability to reserve per-numa CMA Right now, drivers like ARM SMMU are using dma_alloc_coherent() to get coherent DMA buffers to save their command queues and page tables. As there is only one default CMA in the whole system, SMMUs on nodes other than node0 will get remote memory. This leads to significant latency. This patch provides per-numa CMA so that drivers like SMMU can get local memory. Tests show localizing CMA can decrease dma_unmap latency much. For instance, before this patch, SMMU on node2 has to wait for more than 560ns for the completion of CMD_SYNC in an empty command queue; with this patch, it needs 240ns only. A positive side effect of this patch would be improving performance even further for those users who are worried about performance more than DMA security and use iommu.passthrough=1 to skip IOMMU. With local CMA, all drivers can get local coherent DMA buffers. Also, this patch changes the default CONFIG_CMA_AREAS to 19 in NUMA. As 1+CONFIG_CMA_AREAS should be quite enough for most servers on the market even they enable both hugetlb_cma and pernuma_cma. 2 numa nodes: 2(hugetlb) + 2(pernuma) + 1(default global cma) = 5 4 numa nodes: 4(hugetlb) + 4(pernuma) + 1(default global cma) = 9 8 numa nodes: 8(hugetlb) + 8(pernuma) + 1(default global cma) = 17 Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
316cdaa1 |
|
26-Aug-2020 |
Mahesh Bandewar <maheshb@google.com> |
net: add option to not create fall-back tunnels in root-ns as well The sysctl that was added earlier by commit 79134e6ce2c ("net: do not create fallback tunnels for non-default namespaces") to create fall-back only in root-ns. This patch enhances that behavior to provide option not to create fallback tunnels in root-ns as well. Since modules that create fallback tunnels could be built-in and setting the sysctl value after booting is pointless, so added a kernel cmdline options to change this default. The default setting is preseved for backward compatibility. The kernel command line option of fb_tunnels=initns will set the sysctl value to 1 and will create fallback tunnels only in initns while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and fallback tunnels are skipped in every netns. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Maciej Zenczykowski <maze@google.com> Cc: Jian Yang <jianyang@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6855142 |
|
11-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Allow pointer leaks to test diagnostic code This commit adds an rcutorture.leakpointer module parameter that intentionally leaks an RCU-protected pointer out of the RCU read-side critical section and checks to see if the corresponding grace period has elapsed, emitting a WARN_ON_ONCE() if so. This module parameter can be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end grace periods quickly. While in the area, also document rcutorture.irqreader, which was previously left out. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3d29aaf1 |
|
07-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide optional RCU-reader exit delay for strict GPs The goal of this series is to increase the probability of tools like KASAN detecting that an RCU-protected pointer was used outside of its RCU read-side critical section. Thus far, the approach has been to make grace periods and callback processing happen faster. Another approach is to delay the pointer leaker. This commit therefore allows a delay to be applied to exit from RCU read-side critical sections. This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot parameter that specifies this delay in microseconds, defaulting to zero. Reported-by Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4e88ec4a |
|
11-Aug-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcuperf: Change rcuperf to rcuscale This commit further avoids conflation of rcuperf with the kernel's perf feature by renaming kernel/rcu/rcuperf.c to kernel/rcu/rcuscale.c, and also by similarly renaming the functions and variables inside this file. This has the side effect of changing the names of the kernel boot parameters, so kernel-parameters.txt and ver_functions.sh are also updated. The rcutorture --torture type was also updated from rcuperf to rcuscale. [ paulmck: Fix bugs located by Stephen Rothwell. ] Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e9d338a0 |
|
24-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
scftorture: Add smp_call_function() torture test This commit adds an smp_call_function() torture test that repeatedly invokes this function and complains if things go badly awry. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
160c7ba3 |
|
08-Jul-2020 |
Paul E. McKenney <paulmck@kernel.org> |
lib: Add backtrace_idle parameter to force backtrace of idle CPUs Currently, the nmi_cpu_backtrace() declines to produce backtraces for idle CPUs. This is a good choice in the common case in which problems are caused only by non-idle CPUs. However, there are occasionally situations in which idle CPUs are helping to cause problems. This commit therefore adds an nmi_backtrace.backtrace_idle kernel boot parameter that causes nmi_cpu_backtrace() to dump stacks even of idle CPUs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <linux-doc@vger.kernel.org>
|
#
fb1201ae |
|
16-Aug-2020 |
Ard Biesheuvel <ardb@kernel.org> |
Documentation: efi: remove description of efi=old_map The old EFI runtime region mapping logic that was kept around for some time has finally been removed entirely, along with the SGI UV1 support code that was its last remaining user. So remove any mention of the efi=old_map command line parameter from the docs. Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
be3a5b0e |
|
09-Aug-2020 |
Randy Dunlap <rdunlap@infradead.org> |
Doc: admin-guide: use correct legends in kernel-parameters.txt Documentation/admin-guide/kernel-parameters.rst includes a legend telling us what configurations or hardware platforms are relevant for certain boot options. For X86, it is spelled "X86" and for x86_64, it is spelled "X86-64", so make corrections for those. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20200810024941.30231-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
e17f1dfb |
|
07-Aug-2020 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slub: extend slub_debug syntax for multiple blocks Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org [weiyongjun1@huawei.com: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bf6b7661 |
|
27-Jul-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE This adds a kernel command line option that can be used to disable GTSE support. Disabling GTSE implies kernel will make hcalls to invalidate TLB entries. This was done so that we can do VM migration between configs that enable/disable GTSE support via hypervisor. To migrate a VM from a system that supports GTSE to a system that doesn't, we can boot the guest with radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB invalidates. The check for hcall availability is done in pSeries_setup_arch so that the panic message appears on the console. This should only happen on a hypervisor that doesn't force the guest to hash translation even though it can't handle the radix GTSE=0 request via CAS. With radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate hcall it should force the LPAR to hash translation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
|
#
a24c6f7b |
|
16-Jul-2020 |
Peter Enderborg <peter.enderborg@sony.com> |
debugfs: Add access restriction option Since debugfs include sensitive information it need to be treated carefully. But it also has many very useful debug functions for userspace. With this option we can have same configuration for system with need of debugfs and a way to turn it off. This gives a extra protection for exposure on systems where user-space services with system access are attacked. It is controlled by a configurable default value that can be override with a kernel command line parameter. (debugfs=) It can be on or off, but also internally on but not seen from user-space. This no-mount mode do not register a debugfs as filesystem, but client can register their parts in the internal structures. This data can be readed with a debugger or saved with a crashkernel. When it is off clients get EPERM error when accessing the functions for registering their components. Signed-off-by: Peter Enderborg <peter.enderborg@sony.com> Link: https://lore.kernel.org/r/20200716071511.26864-3-peter.enderborg@sony.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9a3c05e6 |
|
23-Oct-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
xen: Mark "xen_nopvspin" parameter obsolete Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin" as we use qspinlock now. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
05eee619 |
|
23-Oct-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
x86/kvm: Add "nopvspin" parameter to disable PV spinlocks There are cases where a guest tries to switch spinlocks to bare metal behavior (e.g. by setting "xen_nopvspin" on XEN platform and "hv_nopvspin" on HYPER_V). That feature is missed on KVM, add a new parameter "nopvspin" to disable PV spinlocks for KVM guest. The new 'nopvspin' parameter will also replace Xen and Hyper-V specific parameters in future patches. Define variable nopvsin as global because it will be used in future patches as above. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6b2484e1 |
|
27-Jun-2020 |
Alexander A. Klimov <grandmaster@al2klimov.de> |
Replace HTTP links with HTTPS ones: Documentation/admin-guide Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Link: https://lore.kernel.org/r/20200627072935.62652-1-grandmaster@al2klimov.de Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8412b456 |
|
29-Jun-2020 |
Quentin Perret <qperret@google.com> |
cpufreq: Specify default governor on command line Currently, the only way to specify the default CPUfreq governor is via Kconfig options, which suits users who can build the kernel themselves perfectly. However, for those who use a distro-like kernel (such as Android, with the Generic Kernel Image project), the only way to use a non-default governor is to boot to userspace, and to then switch using the sysfs interface. Being able to specify the default governor on the command line, like is the case for cpuidle, would allow those users to specify their governor of choice earlier on, and to simplify the userspace boot procedure slighlty. To support this use-case, add a kernel command line parameter allowing the default governor for CPUfreq to be specified, which takes precedence over the built-in default. This implementation has one notable limitation: the default governor must be registered before the driver. This is solved for builtin governors and drivers using appropriate *_initcall() functions. And in the modular case, this must be reflected as a constraint on the module loading order. Signed-off-by: Quentin Perret <qperret@google.com> [ Viresh: Converted 'default_governor' to a string and parsing it only at initcall level, and several updates to cpufreq_init_policy(). ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
2102ad29 |
|
16-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
torture: Dump ftrace at shutdown only if requested If there is a large number of torture tests running concurrently, all of which are dumping large ftrace buffers at shutdown time, the resulting dumping can take a very long time, particularly on systems with rotating-rust storage. This commit therefore adds a default-off torture.ftrace_dump_at_shutdown module parameter that enables shutdown-time ftrace-buffer dumping. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
4a5f133c |
|
24-Apr-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Add races with task-exit processing Several variants of Linux-kernel RCU interact with task-exit processing, including preemptible RCU, Tasks RCU, and Tasks Trace RCU. This commit therefore adds testing of this interaction to rcutorture by adding rcutorture.read_exit_burst and rcutorture.read_exit_delay kernel-boot parameters. These kernel parameters control the frequency and spacing of special read-then-exit kthreads that are spawned. [ paulmck: Apply feedback from Dan Carpenter's static checker. ] [ paulmck: Reduce latency to avoid false-positive shutdown hangs. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
1fbeb3a8 |
|
17-Jun-2020 |
Paul E. McKenney <paulmck@kernel.org> |
refperf: Rename refperf.c to refscale.c and change internal names This commit further avoids conflation of refperf with the kernel's perf feature by renaming kernel/rcu/refperf.c to kernel/rcu/refscale.c, and also by similarly renaming the functions and variables inside this file. This has the side effect of changing the names of the kernel boot parameters, so kernel-parameters.txt and ver_functions.sh are also updated. The rcutorture --torture type remains refperf, and this will be addressed in a separate commit. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
847dd70a |
|
29-May-2020 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Document rcuperf's module parameters This commit adds documentation for the rcuperf module parameters. Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
53c72b59 |
|
25-May-2020 |
Uladzislau Rezki (Sony) <urezki@gmail.com> |
rcu/tree: cache specified number of objects In order to reduce the dynamic need for pages in kfree_rcu(), pre-allocate a configurable number of pages per CPU and link them in a list. When kfree_rcu() reclaims objects, the object's container page is cached into a list instead of being released to the low-level page allocator. Such an approach provides O(1) access to free pages while also reducing the number of requests to the page allocator. It also makes the kfree_rcu() code to have free pages available during a low memory condition. A read-only sysfs parameter (rcu_min_cached_objs) reflects the minimum number of allowed cached pages per CPU. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c03f739f |
|
15-Jun-2020 |
Heinrich Schuchardt <xypron.glpk@gmx.de> |
doc: add novamap to efi kernel command line parameters Document the efi=novamap kernel command line parameter. Put the efi parameters into alphabetic order. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Link: https://lore.kernel.org/r/20200616104012.4780-1-xypron.glpk@gmx.de Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
b745cfba |
|
28-May-2020 |
Andy Lutomirski <luto@kernel.org> |
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com Link: https://lkml.kernel.org/r/20200528201402.1708239-13-sashal@kernel.org
|
#
dd649bd0 |
|
28-May-2020 |
Andy Lutomirski <luto@kernel.org> |
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com Link: https://lkml.kernel.org/r/20200528201402.1708239-3-sashal@kernel.org
|
#
f117955a |
|
07-Jun-2020 |
Guilherme G. Piccoli <gpiccoli@canonical.com> |
kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases After a recent change introduced by Vlastimil's series [0], kernel is able now to handle sysctl parameters on kernel command line; also, the series introduced a simple infrastructure to convert legacy boot parameters (that duplicate sysctls) into sysctl aliases. This patch converts the watchdog parameters softlockup_panic and {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure. It fixes the documentation too, since the alias only accepts values 0 or 1, not the full range of integers. We also took the opportunity here to improve the documentation of the previously converted hung_task_panic (see the patch series [0]) and put the alias table in alphabetical order. [0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b467f3ef |
|
07-Jun-2020 |
Vlastimil Babka <vbabka@suse.cz> |
kernel/hung_task convert hung_task_panic boot parameter to sysctl We can now handle sysctl parameters on kernel command line and have infrastructure to convert legacy command line options that duplicate sysctl to become a sysctl alias. This patch converts the hung_task_panic parameter. Note that the sysctl handler is more strict and allows only 0 and 1, while the legacy parameter allowed any non-zero value. But there is little reason anyone would not be using 1. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3db978d4 |
|
07-Jun-2020 |
Vlastimil Babka <vbabka@suse.cz> |
kernel/sysctl: support setting sysctl parameters from kernel command line Patch series "support setting sysctl parameters from kernel command line", v3. This series adds support for something that seems like many people always wanted but nobody added it yet, so here's the ability to set sysctl parameters via kernel command line options in the form of sysctl.vm.something=1 The important part is Patch 1. The second, not so important part is an attempt to clean up legacy one-off parameters that do the same thing as a sysctl. I don't want to remove them completely for compatibility reasons, but with generic sysctl support the idea is to remove the one-off param handlers and treat the parameters as aliases for the sysctl variants. I have identified several parameters that mention sysctl counterparts in Documentation/admin-guide/kernel-parameters.txt but there might be more. The conversion also has varying level of success: - numa_zonelist_order is converted in Patch 2 together with adding the necessary infrastructure. It's easy as it doesn't really do anything but warn on deprecated value these days. - hung_task_panic is converted in Patch 3, but there's a downside that now it only accepts 0 and 1, while previously it was any integer value - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so there's no straighforward conversion possible - traceoff_on_warning is a flag without value and it would be required to handle that somehow in the conversion infractructure, which seems pointless for a single flag This patch (of 5): A recently proposed patch to add vm_swappiness command line parameter in addition to existing sysctl [1] made me wonder why we don't have a general support for passing sysctl parameters via command line. Googling found only somebody else wondering the same [2], but I haven't found any prior discussion with reasons why not to do this. Settings the vm_swappiness issue aside (the underlying issue might be solved in a different way), quick search of kernel-parameters.txt shows there are already some that exist as both sysctl and kernel parameter - hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism would remove the need to add more of those one-offs and might be handy in situations where configuration by e.g. /etc/sysctl.d/ is impractical. Hence, this patch adds a new parse_args() pass that looks for parameters prefixed by 'sysctl.' and tries to interpret them as writes to the corresponding sys/ files using an temporary in-kernel procfs mount. This mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically registered sysctl tables, even though we don't handle modular sysctls. Errors due to e.g. invalid parameter name or value are reported in the kernel log. The processing is hooked right before the init process is loaded, as some handlers might be more complicated than simple setters and might need some subsystems to be initialized. At the moment the init process can be started and eventually execute a process writing to /proc/sys/ then it should be also fine to do that from the kernel. Sysctls registered later on module load time are not set by this mechanism - it's expected that in such scenarios, setting sysctl values from userspace is practical enough. [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/ [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter [3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db38d5c1 |
|
07-Jun-2020 |
Rafael Aquini <aquini@redhat.com> |
kernel: add panic_on_taint Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. [akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
282f4214 |
|
03-Jun-2020 |
Mike Kravetz <mike.kravetz@oracle.com> |
hugetlbfs: clean up command line processing With all hugetlb page processing done in a single file clean up code. - Make code match desired semantics - Update documentation with semantics - Make all warnings and errors messages start with 'HugeTLB:'. - Consistently name command line parsing routines. - Warn if !hugepages_supported() and command line parameters have been specified. - Add comments to code - Describe some of the subtle interactions - Describe semantics of command line arguments This patch also fixes issues with implicitly setting the number of gigantic huge pages to preallocate. Previously on X86 command line, hugepages=2 default_hugepagesz=1G would result in zero 1G pages being preallocated and, # grep HugePages_Total /proc/meminfo HugePages_Total: 0 # sysctl -a | grep nr_hugepages vm.nr_hugepages = 2 vm.nr_hugepages_mempolicy = 2 # cat /proc/sys/vm/nr_hugepages 2 After this patch 2 gigantic pages will be preallocated and all the proc, sysfs, sysctl and meminfo files will accurately reflect this. To address the issue with gigantic pages, a small change in behavior was made to command line processing. Previously the command line, hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256 would result in the allocation of 256 2M huge pages. The value 128 would be ignored without any warning. After this patch, 128 2M pages will be allocated and a warning message will be displayed indicating the value of 256 is ignored. This change in behavior is required because allocation of implicitly specified gigantic pages must be done when the default_hugepagesz= is encountered for gigantic pages. Previously the code waited until later in the boot process (hugetlb_init), to allocate pages of default size. However the bootmem allocator required for gigantic allocations is not available at this time. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200417185049.275845-5-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f71fc3bc |
|
07-May-2020 |
Douglas Anderson <dianders@chromium.org> |
Documentation: kgdboc: Document new kgdboc_earlycon parameter The recent patch ("kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles") adds a new kernel command line parameter. Document it. Note that the patch adding the feature does some comparing/contrasting of "kgdboc_earlycon" vs. the existing "ekgdboc". See that patch for more details, but briefly "ekgdboc" can be used _instead_ of "kgdboc" and just makes "kgdboc" do its normal initialization early (only works if your tty driver is already ready). The new "kgdboc_earlycon" works in combination with "kgdboc" and is backed by boot consoles. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200507130644.v4.9.I7d5eb42c6180c831d47aef1af44d0b8be3fac559@changeid Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
|
#
bd35c77e |
|
23-Jan-2020 |
Krzysztof Piecuch <piecuch@protonmail.com> |
x86/tsc: Add tsc_early_khz command line parameter Changing base clock frequency directly impacts TSC Hz but not CPUID.16h value. An overclocked CPU supporting CPUID.16h and with partial CPUID.15h support will set TSC KHZ according to "best guess" given by CPUID.16h relying on tsc_refine_calibration_work to give better numbers later. tsc_refine_calibration_work will refuse to do its work when the outcome is off the early TSC KHZ value by more than 1% which is certain to happen on an overclocked system. Fix this by adding a tsc_early_khz command line parameter that makes the kernel skip early TSC calibration and use the given value instead. This allows the user to provide the expected TSC frequency that is closer to reality than the one reported by the hardware, enabling tsc_refine_calibration_work to do meaningful error checking. [ tglx: Made the variable __initdata as it's only used on init and removed the error checking in the argument parser because kstrto*() only stores to the variable if the string is valid ] Signed-off-by: Krzysztof Piecuch <piecuch@protonmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/O2CpIOrqLZHgNRkfjRpz_LGqnc1ix_seNIiOCvHY4RHoulOVRo6kMXKuLOfBVTi0SMMevg6Go1uZ_cL9fLYtYdTRNH78ChaFaZyG3VAyYz8=@protonmail.com
|
#
82a1b8ed |
|
11-May-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Add stress_slb kernel boot option to increase SLB faults This option increases the number of SLB misses by limiting the number of kernel SLB entries, and increased flushing of cached lookaside information. This helps stress test difficult to hit paths in the kernel. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Relocate the code into arch/powerpc/mm, s/torture/stress/] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200511125825.3081305-1-mpe@ellerman.id.au
|
#
a74e2a22 |
|
01-May-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: debugging-via-ohci1394.txt: add it to the core-api book There is an special chapter inside the core-api book about some debug infrastructure like tracepoints and debug objects. It sounded to me that this is the best place to add a chapter explaining how to use a FireWire controller to do remote kernel debugging, as explained on this document. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/9b489d36d08ad89d3ad5aefef1f52a0715b29716.1588345503.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
55b2dcf5 |
|
01-Apr-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow rcutorture to starve grace-period kthread This commit provides an rcutorture.stall_gp_kthread module parameter to allow rcutorture to starve the grace-period kthread. This allows testing the code that detects such starvation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
19a8ff95 |
|
11-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Add flag to produce non-busy-wait task stalls This commit aids testing of RCU task stall warning messages by adding an rcutorture.stall_cpu_block module parameter that results in the induced stall sleeping within the RCU read-side critical section. Spinning with interrupts disabled is still available via the rcutorture.stall_cpu_irqsoff module parameter, and specifying neither of these two module parameters will spin with preemption disabled. Note that sleeping (as opposed to preemption) results in additional complaints from RCU at context-switch time, so yet more testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d9d6ef25 |
|
30-Apr-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: networking: convert netconsole.txt to ReST - add SPDX header; - add a document title; - mark code blocks and literals as such; - mark tables as such; - add notes markups; - adjust identation, whitespaces and blank lines; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
19093313 |
|
27-Apr-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: networking: convert ipv6.txt to ReST Not much to be done here: - add SPDX header; - add a document title; - mark a literal as such, in order to avoid a warning; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1cec2cac |
|
27-Apr-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: networking: convert ip-sysctl.txt to ReST - add SPDX header; - adjust titles and chapters, adding proper markups; - mark code blocks and literals as such; - mark lists as such; - mark tables as such; - use footnote markup; - adjust identation, whitespaces and blank lines; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a69fb9c |
|
27-Apr-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
docs: networking: convert decnet.txt to ReST - add SPDX header; - adjust titles and chapters, adding proper markups; - mark lists as such; - mark code blocks and literals as such; - adjust identation, whitespaces and blank lines; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de267a7c |
|
01-Apr-2020 |
Pierre Morel <pmorel@linux.ibm.com> |
s390/pci: Documentation for zPCI There are changes in the usage of PCI for the user: - new kernel parameter - modification of the way functions are enumerated Let's document these. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
#
b0afa0f0 |
|
17-Mar-2020 |
Paul E. McKenney <paulmck@kernel.org> |
rcu-tasks: Provide boot parameter to delay IPIs until late in grace period This commit provides a rcupdate.rcu_task_ipi_delay kernel boot parameter that specifies how old the RCU tasks trace grace period must be before the grace-period kthread starts sending IPIs. This delay allows more tasks to pass through rcu_tasks_qs() quiescent states, thus reducing (or even eliminating) the number of IPIs that must be sent. On a short rcutorture test setting this kernel boot parameter to HZ/2 resulted in zero IPIs for all 877 RCU-tasks trace grace periods that elapsed during that test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
694cfd87 |
|
25-Apr-2020 |
Ronald G. Minnich <rminnich@gmail.com> |
x86/setup: Add an initrdmem= option to specify initrd physical address Add the initrdmem option: initrdmem=ss[KMG],nn[KMG] which is used to specify the physical address of the initrd, almost always an address in FLASH. Also add code for x86 to use the existing phys_init_start and phys_init_size variables in the kernel. This is useful in cases where a kernel and an initrd is placed in FLASH, but there is no firmware file system structure in the FLASH. One such situation occurs when unused FLASH space on UEFI systems has been reclaimed by, e.g., taking it from the Management Engine. For example, on many systems, the ME is given half the FLASH part; not only is 2.75M of an 8M part unused; but 10.75M of a 16M part is unused. This space can be used to contain an initrd, but need to tell Linux where it is. This space is "raw": due to, e.g., UEFI limitations: it can not be added to UEFI firmware volumes without rebuilding UEFI from source or writing a UEFI device driver. It can be referenced only as a physical address and size. At the same time, if a kernel can be "netbooted" or loaded from GRUB or syslinux, the option of not using the physical address specification should be available. Then, it is easy to boot the kernel and provide an initrd; or boot the the kernel and let it use the initrd in FLASH. In practice, this has proven to be very helpful when integrating Linux into FLASH on x86. Hence, the most flexible and convenient path is to enable the initrdmem command line option in a way that it is the last choice tried. For example, on the DigitalLoggers Atomic Pi, an image into FLASH can be burnt in with a built-in command line which includes: initrdmem=0xff968000,0x200000 which specifies a location and size. [ bp: Massage commit message, make it passive. ] [akpm@linux-foundation.org: coding style fixes] Signed-off-by: Ronald G. Minnich <rminnich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: http://lkml.kernel.org/r/CAP6exYLK11rhreX=6QPyDQmW7wPHsKNEFtXE47pjx41xS6O7-A@mail.gmail.com Link: https://lkml.kernel.org/r/20200426011021.1cskg0AGd%akpm@linux-foundation.org
|
#
3155f4f4 |
|
22-Apr-2020 |
Alan Stern <stern@rowland.harvard.edu> |
USB: hub: Revert commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") Commit bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") changed the way the hub driver enumerates high-speed devices. Instead of using the "new" enumeration scheme first and switching to the "old" scheme if that doesn't work, we start with the "old" scheme. In theory this is better because the "old" scheme is slightly faster -- it involves resetting the device only once instead of twice. However, for a long time Windows used only the "new" scheme. Zeng Tao said that Windows 8 and later use the "old" scheme for high-speed devices, but apparently there are some devices that don't like it. William Bader reports that the Ricoh webcam built into his Sony Vaio laptop not only doesn't enumerate under the "old" scheme, it gets hung up so badly that it won't then enumerate under the "new" scheme! Only a cold reset will fix it. Therefore we will revert the commit and go back to trying the "new" scheme first for high-speed devices. Reported-and-tested-by: William Bader <williambader@hotmail.com> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=207219 Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Fixes: bd0e6c9614b9 ("usb: hub: try old enumeration scheme first for high speed devices") CC: Zeng Tao <prime.zeng@hisilicon.com> CC: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.2004221611230.11262-100000@iolanthe.rowland.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7e5b3c26 |
|
16-Apr-2020 |
Mark Gross <mgross@linux.intel.com> |
x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse. While it is present on all affected CPU models, the microcode mitigation is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the cases where TSX is not supported or has been disabled with TSX_CTRL. The mitigation is activated by default on affected processors and it increases latency for RDRAND and RDSEED instructions. Among other effects this will reduce throughput from /dev/urandom. * Enable administrator to configure the mitigation off when desired using either mitigations=off or srbds=off. * Export vulnerability status via sysfs * Rename file-scoped macros to apply for non-whitelist table initializations. [ bp: Massage, - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g, - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in, - flip check in cpu_set_bug_bits() to save an indentation level, - reflow comments. jpoimboe: s/Mitigated/Mitigation/ in user-visible strings tglx: Dropped the fused off magic for now ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
|
#
32e2eae2 |
|
04-Mar-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
media: docs: move user-facing docs to the admin guide Most of the driver-specific documentation is meant to help users of the media subsystem. Move them to the admin-guide. It should be noticed, however, that several of those files are outdated and will require further work in order to make them useful again. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
#
cf11e85f |
|
10-Apr-2020 |
Roman Gushchin <guro@fb.com> |
mm: hugetlb: optionally allocate gigantic hugepages using cma Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") has added the run-time allocation of gigantic pages. However it actually works only at early stages of the system loading, when the majority of memory is free. After some time the memory gets fragmented by non-movable pages, so the chances to find a contiguous 1GB block are getting close to zero. Even dropping caches manually doesn't help a lot. At large scale rebooting servers in order to allocate gigantic hugepages is quite expensive and complex. At the same time keeping some constant percentage of memory in reserved hugepages even if the workload isn't using it is a big waste: not all workloads can benefit from using 1 GB pages. The following solution can solve the problem: 1) On boot time a dedicated cma area* is reserved. The size is passed as a kernel argument. 2) Run-time allocations of gigantic hugepages are performed using the cma allocator and the dedicated cma area In this case gigantic hugepages can be allocated successfully with a high probability, however the memory isn't completely wasted if nobody is using 1GB hugepages: it can be used for pagecache, anon memory, THPs, etc. * On a multi-node machine a per-node cma area is allocated on each node. Following gigantic hugetlb allocation are using the first available numa node if the mask isn't specified by a user. Usage: 1) configure the kernel to allocate a cma area for hugetlb allocations: pass hugetlb_cma=10G as a kernel argument 2) allocate hugetlb pages as usual, e.g. echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages If the option isn't enabled or the allocation of the cma area failed, the current behavior of the system is preserved. x86 and arm-64 are covered by this patch, other architectures can be trivially added later. The patch contains clean-ups and fixes proposed and implemented by Aslan Bakirov and Randy Dunlap. It also contains ideas and suggestions proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks! Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@kernel.org> Cc: Aslan Bakirov <aslan@fb.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Joonsoo Kim <js1304@gmail.com> Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cd4ca341 |
|
02-Apr-2020 |
Jimmy Assarsson <jimmyassarsson@gmail.com> |
docs: kernel-parameters.txt: Fix broken references Fix remaining broken references in kernel-parameters.txt. Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Link: https://lore.kernel.org/r/20200402172614.3020-2-jimmyassarsson@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
ed01b030 |
|
02-Apr-2020 |
Jimmy Assarsson <jimmyassarsson@gmail.com> |
docs: kernel-parameters.txt: Remove nompx x86/mpx was removed in commit 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86"), this removes the documentation of parameter nompx. Fixes: 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86") Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20200402172614.3020-1-jimmyassarsson@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
f3cd4c86 |
|
06-Apr-2020 |
Baoquan He <bhe@redhat.com> |
mm/memory_hotplug.c: only respect mem= parameter during boot stage In commit 357b4da50a62 ("x86: respect memory size limiting via mem= parameter") a global varialbe max_mem_size is added to store the value parsed from 'mem= ', then checked when memory region is added. This truly stops those DIMMs from being added into system memory during boot-time. However, it also limits the later memory hotplug functionality. Any DIMM can't be hotplugged any more if its region is beyond the max_mem_size. We will get errors like: [ 216.387164] acpi PNP0C80:02: add_memory failed [ 216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error [ 216.392187] acpi PNP0C80:02: Enumeration failure This will cause issue in a known use case where 'mem=' is added to the hypervisor. The memory that lies after 'mem=' boundary will be assigned to KVM guests. After commit 357b4da50a62 merged, memory can't be extended dynamically if system memory on hypervisor is not sufficient. So fix it by also checking if it's during boot-time restricting to add memory. Otherwise, skip the restriction. And also add this use case to document of 'mem=' kernel parameter. Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter") Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200204050643.20925-1-bhe@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db96a759 |
|
02-Apr-2020 |
Chen Yu <yu.c.chen@intel.com> |
PM: sleep: Add pm_debug_messages kernel command line option Debug messages from the system suspend/hibernation infrastructure are disabled by default, and can only be enabled after the system has boot up via /sys/power/pm_debug_messages. This makes the hibernation resume hard to track as it involves system boot up across hibernation. There's no chance for software_resume() to track the resume process, for example. Add a kernel command line option to set pm_debug_messages during boot up. Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
5fd769c2 |
|
30-Mar-2020 |
Randy Dunlap <rdunlap@infradead.org> |
ACPI: video: Docs update for "acpi_backlight" kernel parameter options Update the Documentation for "acpi_backlight" by adding 2 new options (native and none). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
e73a8f38 |
|
23-Mar-2020 |
Alexey Makhalov <amakhalov@vmware.com> |
x86/vmware: Enable steal time accounting Set paravirt_steal_rq_enabled if steal clock present. paravirt_steal_rq_enabled is used in sched/core.c to adjust task progress by offsetting stolen time. Use 'no-steal-acc' off switch (share same name with KVM) to disable steal time accounting. Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323195707.31242-5-amakhalov@vmware.com
|
#
1ffb8d03 |
|
04-Mar-2020 |
Alex Hung <alex.hung@canonical.com> |
acpi/x86: add a kernel parameter to disable ACPI BGRT BGRT is for displaying seamless OEM logo from booting to login screen; however, this mechanism does not always work well on all configurations and the OEM logo can be displayed multiple times. This looks worse than without BGRT enabled. This patch adds a kernel parameter to disable BGRT in boot time. This is easier than re-compiling a kernel with CONFIG_ACPI_BGRT disabled. Signed-off-by: Alex Hung <alex.hung@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
0a07bef6 |
|
10-Mar-2020 |
Guilherme G. Piccoli <gpiccoli@canonical.com> |
Documentation: Better document the softlockup_panic sysctl Commit 9c44bc03fff4 ("softlockup: allow panic on lockup") added the softlockup_panic sysctl, but didn't add information about it to the file Documentation/admin-guide/sysctl/kernel.rst (which in that time certainly wasn't rst and had other name!). This patch just adds the respective documentation and references it from the corresponding entry in Documentation/admin-guide/kernel-parameters.txt. This patch was strongly based on Scott Wood's commit d22881dc13b6 ("Documentation: Better document the hardlockup_panic sysctl"). Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Link: https://lore.kernel.org/r/20200310183649.23163-1-gpiccoli@canonical.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
05289b90 |
|
21-Feb-2020 |
Thara Gopinath <thara.gopinath@linaro.org> |
sched/fair: Enable tuning of decay period Thermal pressure follows pelt signals which means the decay period for thermal pressure is the default pelt decay period. Depending on SoC characteristics and thermal activity, it might be beneficial to decay thermal pressure slower, but still in-tune with the pelt signals. One way to achieve this is to provide a command line parameter to set a decay shift parameter to an integer between 0 and 10. Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200222005213.3873-10-thara.gopinath@linaro.org
|
#
e94f62b7 |
|
21-Feb-2020 |
Saravana Kannan <saravanak@google.com> |
of: property: Delete of_devlink kernel commandline option With the addition of fw_devlink kernel commandline option, of_devlink is redundant and not useful anymore. So, delete it. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200222014038.180923-6-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
8375e74f |
|
21-Feb-2020 |
Saravana Kannan <saravanak@google.com> |
driver core: Add fw_devlink kernel commandline option fwnode_operations.add_links allows creating device links from information provided by firmware. fwnode_operations.add_links is currently implemented only by OF/devicetree code and a specific case of efi. However, there's nothing preventing ACPI or other firmware types from implementing it. The OF implementation is currently controlled by a kernel commandline parameter called of_devlink. Since this feature is generic isn't limited to OF, add a generic fw_devlink kernel commandline parameter to control this feature across firmware types. Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200222014038.180923-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
3eb30c51 |
|
12-Feb-2020 |
Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> |
Documentation: nfsroot.rst: Fix references to nfsroot.rst When converting and moving nfsroot.txt to nfsroot.rst the references to the old text file was not updated to match the change, fix this. Fixes: f9a9349846f92b2d ("Documentation: nfsroot.txt: convert to ReST") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200212181332.520545-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
7fe068db |
|
29-Feb-2020 |
Jonathan Neuschäfer <j.neuschaefer@gmx.net> |
docs: admin-guide: kernel-parameters: Document earlycon options for i.MX UARTs drivers/tty/serial/imx.c implements these earlycon options. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20200229132750.2783-1-j.neuschaefer@gmx.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
ecdc5d84 |
|
23-Oct-2019 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/protvirt: introduce host side setup Add "prot_virt" command line option which controls if the kernel protected VMs support is enabled at early boot time. This has to be done early, because it needs large amounts of memory and will disable some features like STP time sync for the lpar. Extend ultravisor info definitions and expose it via uv_info struct filled in during startup. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
bf347b9d |
|
19-Feb-2020 |
Alex Hung <alex.hung@canonical.com> |
Documentation: fix a typo for intel_iommu=nobounce "untrusted" was mis-spelled as "unstrusted" Signed-off-by: Alex Hung <alex.hung@canonical.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8171d3e0 |
|
06-Dec-2019 |
Paul E. McKenney <paulmck@kernel.org> |
torture: Allow disabling of boottime CPU-hotplug torture operations In theory, RCU-hotplug operations are supposed to work as soon as there is more than one CPU online. However, in practice, in normal production there is no way to make them happen until userspace is up and running. Besides which, on smaller systems, rcutorture doesn't start doing hotplug operations until 30 seconds after the start of boot, which on most systems also means the better part of 30 seconds after the end of boot. This commit therefore provides a new torture.disable_onoff_at_boot kernel boot parameter that suppresses CPU-hotplug torture operations until about the time that init is spawned. Of course, if you know of a need for boottime CPU-hotplug operations, then you should avoid passing this argument to any of the torture tests. You might also want to look at the splats linked to below. Link: https://lore.kernel.org/lkml/20191206185208.GA25636@paulmck-ThinkPad-P72/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
58c53360 |
|
05-Dec-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Allow boottime stall warnings to be suppressed In normal production, an RCU CPU stall warning at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long-term stall at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate boottime RCU CPU stalls as a matter of course. This commit therefore provides a kernel boot parameter that suppresses reporting of boottime RCU CPU stall warnings and similarly of rcutorture writer stalls. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b2b00ddf |
|
30-Oct-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: React to callback overload by aggressively seeking quiescent states In default configutions, RCU currently waits at least 100 milliseconds before asking cond_resched() and/or resched_rcu() for help seeking quiescent states to end a grace period. But 100 milliseconds can be one good long time during an RCU callback flood, for example, as can happen when user processes repeatedly open and close files in a tight loop. These 100-millisecond gaps in successive grace periods during a callback flood can result in excessive numbers of callbacks piling up, unnecessarily increasing memory footprint. This commit therefore asks cond_resched() and/or resched_rcu() for help as early as the first FQS scan when at least one of the CPUs has more than 20,000 callbacks queued, a number that can be changed using the new rcutree.qovld kernel boot parameter. An auxiliary qovld_calc variable is used to avoid acquisition of locks that have not yet been initialized. Early tests indicate that this reduces the RCU-callback memory footprint during rcutorture floods by from 50% to 4x, depending on configuration. Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by: Tejun Heo <tj@kernel.org> [ paulmck: Fix bug located by Qian Cai. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Dexuan Cui <decui@microsoft.com> Tested-by: Qian Cai <cai@lca.pw>
|
#
6650cdd9 |
|
26-Jan-2020 |
Peter Zijlstra (Intel) <peterz@infradead.org> |
x86/split_lock: Enable split lock detection by kernel A split-lock occurs when an atomic instruction operates on data that spans two cache lines. In order to maintain atomicity the core takes a global bus lock. This is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). For real-time systems this may mean missing deadlines. For other systems it may just be very annoying. Some CPUs have the capability to raise an #AC trap when a split lock is attempted. Provide a command line option to give the user choices on how to handle this: split_lock_detect= off - not enabled (no traps for split locks) warn - warn once when an application does a split lock, but allow it to continue running. fatal - Send SIGBUS to applications that cause split lock On systems that support split lock detection the default is "warn". Note that if the kernel hits a split lock in any mode other than "off" it will OOPs. One implementation wrinkle is that the MSR to control the split lock detection is per-core, not per thread. This might result in some short lived races on HT systems in "warn" mode if Linux tries to enable on one thread while disabling on the other. Race analysis by Sean Christopherson: - Toggling of split-lock is only done in "warn" mode. Worst case scenario of a race is that a misbehaving task will generate multiple #AC exceptions on the same instruction. And this race will only occur if both siblings are running tasks that generate split-lock #ACs, e.g. a race where sibling threads are writing different values will only occur if CPUx is disabling split-lock after an #AC and CPUy is re-enabling split-lock after *its* previous task generated an #AC. - Transitioning between off/warn/fatal modes at runtime isn't supported and disabling is tracked per task, so hardware will always reach a steady state that matches the configured mode. I.e. split-lock is guaranteed to be enabled in hardware once all _TIF_SLD threads have been scheduled out. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
|
#
fb251124 |
|
18-Feb-2020 |
Jonathan Neuschäfer <j.neuschaefer@gmx.net> |
docs: Fix path to MTD command line partition parser cmdlinepart.c has been moved to drivers/mtd/parsers/. Fixes: a3f12a35c91d ("mtd: parsers: Move CMDLINE parser") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
e9c38f9f |
|
08-Jan-2020 |
Stephen Smalley <sds@tycho.nsa.gov> |
Documentation,selinux: deprecate setting checkreqprot to 1 Deprecate setting the SELinux checkreqprot tunable to 1 via kernel parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left intact for compatibility since Android and some Linux distributions do so for security and treat an inability to set it as a fatal error. Eventually setting it to 0 will become a no-op and the kernel will stop using checkreqprot's value internally altogether. checkreqprot was originally introduced as a compatibility mechanism for legacy userspace and the READ_IMPLIES_EXEC personality flag. However, if set to 1, it weakens security by allowing mappings to be made executable without authorization by policy. The default value for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed from 1 to 0 in commit 2a35d196c160e3 ("selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android and Linux distributions began explicitly setting /sys/fs/selinux/checkreqprot to 0 some time ago. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
3f9e12e0 |
|
06-Feb-2020 |
Jean Delvare <jdelvare@suse.de> |
ACPI: watchdog: Allow disabling WDAT at boot In case the WDAT interface is broken, give the user an option to ignore it to let a native driver bind to the watchdog device instead. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
7495e092 |
|
04-Feb-2020 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline As the bootconfig is appended to the initrd it is not as easy to modify as the kernel command line. If there's some issue with the kernel, and the developer wants to boot a pristine kernel, it should not be needed to modify the initrd to remove the bootconfig for a single boot. As bootconfig is silently added (if the admin does not know where to look they may not know it's being loaded). It should be explicitly added to the kernel cmdline. The loading of the bootconfig is only done if "bootconfig" is on the kernel command line. This will let admins know that the kernel command line is extended. Note, after adding printk()s for when the size is too great or the checksum is wrong, exposed that the current method always looked for the boot config, and if this size and checksum matched, it would parse it (as if either is wrong a printk has been added to show this). It's better to only check this if the boot config is asked to be looked for. Link: https://lore.kernel.org/r/CAHk-=wjfjO+h6bQzrTf=YCZA53Y3EDyAs3Z4gEsT7icA3u_Psw@mail.gmail.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
c65e6815 |
|
30-Jan-2020 |
Mikhail Zaslonko <zaslonko@linux.ibm.com> |
s390/boot: add dfltcc= kernel command line parameter Add the new kernel command line parameter 'dfltcc=' to configure s390 zlib hardware support. Format: { on | off | def_only | inf_only | always } on: s390 zlib hardware support for compression on level 1 and decompression (default) off: No s390 zlib hardware support def_only: s390 zlib hardware support for deflate only (compression on level 1) inf_only: s390 zlib hardware support for inflate only (decompression) always: Same as 'on' but ignores the selected compression level always using hardware support (used for debugging) Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Sterba <dsterba@suse.com> Cc: Eduard Shishkin <edward6@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
189a6883 |
|
29-Aug-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
rcu: Remove kfree_call_rcu_nobatch() Now that the kfree_rcu() special-casing has been removed from tree RCU, this commit removes kfree_call_rcu_nobatch() since it is no longer needed. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
e6e78b00 |
|
29-Aug-2019 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
rcuperf: Add kfree_rcu() performance Tests This test runs kfree_rcu() in a loop to measure performance of the new kfree_rcu() batching functionality. The following table shows results when booting with arguments: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_no_batch=X rcuperf.kfree_no_batch=X # Grace Periods Test Duration (s) X=1 (old behavior) 9133 11.5 X=0 (new behavior) 1732 12.5 On a 16 CPU system with the above boot parameters, we see that the total number of grace periods that elapse during the test drops from 9133 when not batching to 1732 when batching (a 5X improvement). The kfree_rcu() flood itself slows down a bit when batching, though, as shown. Note that the active memory consumption during the kfree_rcu() flood does increase to around 200-250MB due to the batching (from around 50MB without batching). However, this memory consumption is relatively constant. In other words, the system is able to keep up with the kfree_rcu() load. The memory consumption comes down considerably if KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80. A later patch will reduce memory consumption further by using multiple lists. Also, when running the test, please disable CONFIG_DEBUG_PREEMPT and CONFIG_PROVE_RCU for realistic comparisons with/without batching. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
11ea68f5 |
|
20-Jan-2020 |
Ming Lei <ming.lei@redhat.com> |
genirq, sched/isolation: Isolate from handling managed interrupts The affinity of managed interrupts is completely handled in the kernel and cannot be changed via the /proc/irq/* interfaces from user space. As the kernel tries to spread out interrupts evenly accross CPUs on x86 to prevent vector exhaustion, it can happen that a managed interrupt whose affinity mask contains both isolated and housekeeping CPUs is routed to an isolated CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts on the isolated CPU. Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding logic in the interrupt affinity selection code. The subparameter indicates to the interrupt affinity selection logic that it should try to avoid the above scenario. This isolation is best effort and only effective if the automatically assigned interrupt mask of a device queue contains isolated and housekeeping CPUs. If housekeeping CPUs are online then such interrupts are directed to the housekeeping CPU so that IO submitted on the housekeeping CPU cannot disturb the isolated CPU. If a queue's affinity mask contains only isolated CPUs then this parameter has no effect on the interrupt routing decision, though interrupts are only happening when tasks running on those isolated CPUs submit IO. IO submitted on housekeeping CPUs has no influence on those queues. If the affinity mask contains both housekeeping and isolated CPUs, but none of the contained housekeeping CPUs is online, then the interrupt is also routed to an isolated CPU. Interrupts are only delivered when one of the isolated CPUs in the affinity mask submits IO. If one of the contained housekeeping CPUs comes online, the CPU hotplug logic migrates the interrupt automatically back to the upcoming housekeeping CPU. Depending on the type of interrupt controller, this can require that at least one interrupt is delivered to the isolated CPU in order to complete the migration. [ tglx: Removed unused parameter, added and edited comments/documentation and rephrased the changelog so it contains more details. ] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com
|
#
1f299fad |
|
13-Jan-2020 |
Ard Biesheuvel <ardb@kernel.org> |
efi/x86: Limit EFI old memory map to SGI UV machines We carry a quirk in the x86 EFI code to switch back to an older method of mapping the EFI runtime services memory regions, because it was deemed risky at the time to implement a new method without providing a fallback to the old method in case problems arose. Such problems did arise, but they appear to be limited to SGI UV1 machines, and so these are the only ones for which the fallback gets enabled automatically (via a DMI quirk). The fallback can be enabled manually as well, by passing efi=old_map, but there is very little evidence that suggests that this is something that is being relied upon in the field. Given that UV1 support is not enabled by default by the distros (Ubuntu, Fedora), there is no point in carrying this fallback code all the time if there are no other users. So let's move it into the UV support code, and document that efi=old_map now requires this support code to be enabled. Note that efi=old_map has been used in the past on other SGI UV machines to work around kernel regressions in production, so we keep the option to enable it by hand, but only if the kernel was built with UV support. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
|
#
4444f854 |
|
02-Jan-2020 |
Matthew Garrett <matthewgarrett@google.com> |
efi: Allow disabling PCI busmastering on bridges during boot Add an option to disable the busmaster bit in the control register on all PCI bridges before calling ExitBootServices() and passing control to the runtime kernel. System firmware may configure the IOMMU to prevent malicious PCI devices from being able to attack the OS via DMA. However, since firmware can't guarantee that the OS is IOMMU-aware, it will tear down IOMMU configuration when ExitBootServices() is called. This leaves a window between where a hostile device could still cause damage before Linux configures the IOMMU again. If CONFIG_EFI_DISABLE_PCI_DMA is enabled or "efi=disable_early_pci_dma" is passed on the command line, the EFI stub will clear the busmaster bit on all PCI bridges before ExitBootServices() is called. This will prevent any malicious PCI devices from being able to perform DMA until the kernel reenables busmastering after configuring the IOMMU. This option may cause failures with some poorly behaved hardware and should not be enabled without testing. The kernel commandline options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be used to override the default. Note that PCI devices downstream from PCI bridges are disconnected from their drivers first, using the UEFI driver model API, so that DMA can be disabled safely at the bridge level. [ardb: disconnect PCI I/O handles first, as suggested by Arvind] Co-developed-by: Matthew Garrett <mjg59@google.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Matthew Garrett <matthewgarrett@google.com> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20200103113953.9571-18-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d41415eb |
|
07-Jan-2020 |
Stephen Smalley <sds@tycho.nsa.gov> |
Documentation,selinux: fix references to old selinuxfs mount point selinuxfs was originally mounted on /selinux, and various docs and kconfig help texts referred to nodes under it. In Linux 3.0, /sys/fs/selinux was introduced as the preferred mount point for selinuxfs. Fix all the old references to /selinux/ to /sys/fs/selinux/. While we are there, update the description of the selinux boot parameter to reflect the fact that the default value is always 1 since commit be6ec88f41ba94 ("selinux: Remove SECURITY_SELINUX_BOOTPARAM_VALUE") and drop discussion of runtime disable since it is deprecated. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
e3fedd57 |
|
22-Nov-2019 |
Masami Hiramatsu <mhiramat@kernel.org> |
Documentation: Remove bootmem_debug from kernel-parameters.txt Remove bootmem_debug kernel paramenter because it has been replaced by memblock=debug. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/157443061745.20995.9432492850513217966.stgit@devnote2 Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
a7583e72 |
|
14-Nov-2019 |
Yunfeng Ye <yeyunfeng@huawei.com> |
ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100 The commit 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs") says: "Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256 GPEs can be masked" But the masking of GPE 0xFF it not supported and the check condition "gpe > ACPI_MASKABLE_GPE_MAX" is not valid because the type of gpe is u8. So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100, and drop the "gpe > ACPI_MASKABLE_GPE_MAX" check. In addition, update the docs "Format" for acpi_mask_gpe parameter. Fixes: 0f27cff8597d ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs") Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> [ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
65cc8bf9 |
|
13-Nov-2019 |
Oliver Neukum <oneukum@suse.com> |
USB: documentation: flags on usb-storage versus UAS Document which flags work storage, UAS or both Signed-off-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191114112758.32747-4-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
64870ed1 |
|
15-Nov-2019 |
Waiman Long <longman@redhat.com> |
x86/speculation: Fix incorrect MDS/TAA mitigation status For MDS vulnerable processors with TSX support, enabling either MDS or TAA mitigations will enable the use of VERW to flush internal processor buffers at the right code path. IOW, they are either both mitigated or both not. However, if the command line options are inconsistent, the vulnerabilites sysfs files may not report the mitigation status correctly. For example, with only the "mds=off" option: vulnerabilities/mds:Vulnerable; SMT vulnerable vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable The mds vulnerabilities file has wrong status in this case. Similarly, the taa vulnerability file will be wrong with mds mitigation on, but taa off. Change taa_select_mitigation() to sync up the two mitigation status and have them turned off if both "mds=off" and "tsx_async_abort=off" are present. Update documentation to emphasize the fact that both "mds=off" and "tsx_async_abort=off" have to be specified together for processors that are affected by both TAA and MDS to be effective. [ bp: Massage and add kernel-parameters.txt change too. ] Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-doc@vger.kernel.org Cc: Mark Gross <mgross@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
|
#
14d3fe42 |
|
12-Nov-2019 |
Jonathan Corbet <corbet@lwn.net> |
Revert "Documentation: admin-guide: add earlycon documentation for RISC-V" This reverts commit 7f70ae564b807f81263326d641514c6dca88e5ef. Christoph H. notes that the information is redundant, and Paul W. agrees with reverting. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
199c8471 |
|
06-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP Given that EFI_MEMORY_SP is platform BIOS policy decision for marking memory ranges as "reserved for a specific purpose" there will inevitably be scenarios where the BIOS omits the attribute in situations where it is desired. Unlike other attributes if the OS wants to reserve this memory from the kernel the reservation needs to happen early in init. So early, in fact, that it needs to happen before e820__memblock_setup() which is a pre-requisite for efi_fake_memmap() that wants to allocate memory for the updated table. Introduce an x86 specific efi_fake_memmap_early() that can search for attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table accordingly. The KASLR code that scans the command line looking for user-directed memory reservations also needs to be updated to consider "efi_fake_mem=nn@ss:0x40000" requests. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
b617c526 |
|
06-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
efi: Common enable/disable infrastructure for EFI soft reservation UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. As for this patch, define the common helpers to determine if the EFI_MEMORY_SP attribute should be honored. The determination needs to be made early to prevent the kernel from being loaded into soft-reserved memory, or otherwise allowing early allocations to land there. Follow-on changes are needed per architecture to leverage these helpers in their respective mem-init paths. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
1aa9b957 |
|
04-Nov-2019 |
Junaid Shahid <junaids@google.com> |
kvm: x86: mmu: Recovery of shattered NX large pages The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b8e8c830 |
|
03-Nov-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
kvm: mmu: ITLB_MULTIHIT mitigation With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check resulting in a CPU lockup. Unfortunately when EPT page tables use huge pages, it is possible for a malicious guest to cause this situation. Add a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable. This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT, again the nested guest can cause problems shadow and direct EPT is treated in the same way. [ tglx: Fixup default to auto and massage wording a bit ] Originally-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a7a248c5 |
|
22-Oct-2019 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/speculation/taa: Add documentation for TSX Async Abort Add the documenation for TSX Async Abort. Include the description of the issue, how to check the mitigation state, control the mitigation, guidance for system administrators. [ bp: Add proper SPDX tags, touch ups by Josh and me. ] Co-developed-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
#
7531a359 |
|
22-Oct-2019 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/tsx: Add "auto" option to the tsx= cmdline parameter Platforms which are not affected by X86_BUG_TAA may want the TSX feature enabled. Add "auto" option to the TSX cmdline parameter. When tsx=auto disable TSX when X86_BUG_TAA is present, otherwise enable TSX. More details on X86_BUG_TAA can be found here: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html [ bp: Extend the arg buffer to accommodate "auto\0". ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
#
95c5824f |
|
23-Oct-2019 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default Add a kernel cmdline parameter "tsx" to control the Transactional Synchronization Extensions (TSX) feature. On CPUs that support TSX control, use "tsx=on|off" to enable or disable TSX. Not specifying this option is equivalent to "tsx=off". This is because on certain processors TSX may be used as a part of a speculative side channel attack. Carve out the TSX controlling functionality into a separate compilation unit because TSX is a CPU feature while the TSX async abort control machinery will go to cpu/bugs.c. [ bp: - Massage, shorten and clear the arg buffer. - Clarifications of the tsx= possible options - Josh. - Expand on TSX_CTRL availability - Pawan. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
#
35a0b237 |
|
23-Oct-2019 |
Olof Johansson <olof@lixom.net> |
PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control Prior to eed85ff4c0da7 ("PCI/DPC: Enable DPC only if AER is available"), Linux handled DPC events regardless of whether firmware had granted it ownership of AER or DPC, e.g., via _OSC. PCIe r5.0, sec 6.2.10, recommends that the OS link control of DPC to control of AER, so after eed85ff4c0da7, Linux handles DPC events only if it has control of AER. On platforms that do not grant OS control of AER via _OSC, Linux DPC handling worked before eed85ff4c0da7 but not after. To make Linux DPC handling work on those platforms the same way they did before, add a "pcie_ports=dpc-native" kernel parameter that makes Linux handle DPC events regardless of whether it has control of AER. [bhelgaas: commit log, move pcie_ports_dpc_native to drivers/pci/] Link: https://lore.kernel.org/r/20191023192205.97024-1-olof@lixom.net Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
d7b8a217 |
|
22-Oct-2019 |
Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> |
PCI: Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters The existing "pci=hpmemsize=nn[KMG]" kernel parameter overrides the default size of both the non-prefetchable and the prefetchable MMIO windows for hotplug bridges. Add "pci=hpmmiosize=nn[KMG]" to override the default size of only the non-prefetchable MMIO window. Add "pci=hpmmioprefsize=nn[KMG]" to override the default size of only the prefetchable MMIO window. Link: https://lore.kernel.org/r/SL2P216MB0187E4D0055791957B7E2660806B0@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
#
e0685fa2 |
|
21-Oct-2019 |
Steven Price <steven.price@arm.com> |
arm64: Retrieve stolen time as paravirtualized guest Enable paravirtualization features when running under a hypervisor supporting the PV_TIME_ST hypercall. For each (v)CPU, we ask the hypervisor for the location of a shared page which the hypervisor will use to report stolen time to us. We set pv_time_ops to the stolen time function which simply reads the stolen value from the shared page for a VCPU. We guarantee single-copy atomicity using READ_ONCE which means we can also read the stolen time for another VCPU than the currently running one while it is potentially being updated by the hypervisor. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
9905f32a |
|
16-Oct-2019 |
Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com> |
serial: fsl_linflexuart: Be consistent with the name For consistency reasons, spell the controller name as "LINFlexD" in comments and documentation. Signed-off-by: Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com> Link: https://lore.kernel.org/r/1571230107-8493-4-git-send-email-stefan-gabriel.mirea@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7f70ae56 |
|
09-Oct-2019 |
Paul Walmsley <paul.walmsley@sifive.com> |
Documentation: admin-guide: add earlycon documentation for RISC-V Kernels booting on RISC-V can specify "earlycon" with no options on the Linux command line, and the generic DT earlycon support will query the "chosen/stdout-path" property (if present) to determine which early console device to use. Document this appropriately in the admin-guide. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Christoph Hellwig <hch@lst.de> Cc: Andreas Schwab <schwab@suse.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
c6875f3a |
|
30-Sep-2019 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Return from panic notifier Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later as well as significant part of panic() code (such as pstore writes from kmsg_dump()) is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart() from where we can perform our hypercall. Nevertheless, we will provide xen_legacy_crash boot option that will preserve original behavior during crash. This option could be used, for example, if running kernel dumper (which happens after panic notifiers) is undesirable. Reported-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
a3e1d1a7 |
|
04-Sep-2019 |
Saravana Kannan <saravanak@google.com> |
of: property: Add functional dependency link from DT bindings Add device links after the devices are created (but before they are probed) by looking at common DT bindings like clocks and interconnects. Automatically adding device links for functional dependencies at the framework level provides the following benefits: - Optimizes device probe order and avoids the useless work of attempting probes of devices that will not probe successfully (because their suppliers aren't present or haven't probed yet). For example, in a commonly available mobile SoC, registering just one consumer device's driver at an initcall level earlier than the supplier device's driver causes 11 failed probe attempts before the consumer device probes successfully. This was with a kernel with all the drivers statically compiled in. This problem gets a lot worse if all the drivers are loaded as modules without direct symbol dependencies. - Supplier devices like clock providers, interconnect providers, etc need to keep the resources they provide active and at a particular state(s) during boot up even if their current set of consumers don't request the resource to be active. This is because the rest of the consumers might not have probed yet and turning off the resource before all the consumers have probed could lead to a hang or undesired user experience. Some frameworks (Eg: regulator) handle this today by turning off "unused" resources at late_initcall_sync and hoping all the devices have probed by then. This is not a valid assumption for systems with loadable modules. Other frameworks (Eg: clock) just don't handle this due to the lack of a clear signal for when they can turn off resources. This leads to downstream hacks to handle cases like this that can easily be solved in the upstream kernel. By linking devices before they are probed, we give suppliers a clear count of the number of dependent consumers. Once all of the consumers are active, the suppliers can turn off the unused resources without making assumptions about the number of consumers. By default we just add device-links to track "driver presence" (probe succeeded) of the supplier device. If any other functionality provided by device-links are needed, it is left to the consumer/supplier devices to change the link when they probe. kbuild test robot reported clang error about missing const Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20190904211126.47518-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
e18409c0 |
|
17-Sep-2019 |
Christoph Hellwig <hch@lst.de> |
Documentation: document earlycon without options for more platforms The earlycon options without arguments is supposed to work on all device tree platforms, not just arm64. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8974558f |
|
23-Sep-2019 |
Vlastimil Babka <vbabka@suse.cz> |
mm, page_owner, debug_pagealloc: save and dump freeing stack trace The debug_pagealloc functionality is useful to catch buggy page allocator users that cause e.g. use after free or double free. When page inconsistency is detected, debugging is often simpler by knowing the call stack of process that last allocated and freed the page. When page_owner is also enabled, we record the allocation stack trace, but not freeing. This patch therefore adds recording of freeing process stack trace to page owner info, if both page_owner and debug_pagealloc are configured and enabled. With only page_owner enabled, this info is not useful for the memory leak debugging use case. dump_page() is adjusted to print the info. An example result of calling __free_pages() twice may look like this (note the page last free stack trace): BUG: Bad page state in process bash pfn:13d8f8 page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x1affff800000000() raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000 page dumped because: nonzero _refcount page_owner tracks the page as freed page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL) prep_new_page+0x143/0x150 get_page_from_freelist+0x289/0x380 __alloc_pages_nodemask+0x13c/0x2d0 khugepaged+0x6e/0xc10 kthread+0xf9/0x130 ret_from_fork+0x3a/0x50 page last free stack trace: free_pcp_prepare+0x134/0x1e0 free_unref_page+0x18/0x90 khugepaged+0x7b/0xc10 kthread+0xf9/0x130 ret_from_fork+0x3a/0x50 Modules linked in: CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x85/0xc0 bad_page.cold+0xba/0xbf rmqueue_pcplist.isra.0+0x6c5/0x6d0 rmqueue+0x2d/0x810 get_page_from_freelist+0x191/0x380 __alloc_pages_nodemask+0x13c/0x2d0 __get_free_pages+0xd/0x30 __pud_alloc+0x2c/0x110 copy_page_range+0x4f9/0x630 dup_mmap+0x362/0x480 dup_mm+0x68/0x110 copy_process+0x19e1/0x1b40 _do_fork+0x73/0x310 __x64_sys_clone+0x75/0x80 do_syscall_64+0x6e/0x1e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f10af854a10 ... Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
82f12ab3 |
|
13-Sep-2019 |
Palmer Dabbelt <palmer@sifive.com> |
Documentation: Add "earlycon=sbi" to the admin guide This argument is supported on RISC-V systems and widely used, but was not documented here. Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
e5e04d05 |
|
06-Sep-2019 |
Lu Baolu <baolu.lu@linux.intel.com> |
iommu/vt-d: Check whether device requires bounce buffer This adds a helper to check whether a device needs to use bounce buffer. It also provides a boot time option to disable the bounce buffer. Users can use this to prevent the iommu driver from using the bounce buffer for performance gain. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Xu Pengfei <pengfei.xu@intel.com> Tested-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
2275d7b5 |
|
02-Sep-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/radix: introduce options to disable use of the tlbie instruction Introduce two options to control the use of the tlbie instruction. A boot time option which completely disables the kernel using the instruction, this is currently incompatible with HASH MMU, KVM, and coherent accelerators. And a debugfs option can be switched at runtime and avoids using tlbie for invalidating CPU TLBs for normal process and kernel address mappings. Coherent accelerators are still managed with tlbie, as will KVM partition scope translations. Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a basic implementation which does not attempt to make any optimisation beyond the tlbie implementation. This is useful for performance testing among other things. For example in certain situations on large systems, using IPIs may be faster than tlbie as they can be directed rather than broadcast. Later we may also take advantage of the IPIs to do more interesting things such as trim the mm cpumask more aggressively. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-7-npiggin@gmail.com
|
#
09864c1c |
|
09-Aug-2019 |
Stefan-gabriel Mirea <stefan-gabriel.mirea@nxp.com> |
tty: serial: Add linflexuart driver for S32V234 Introduce support for LINFlex driver, based on: - the version of Freescale LPUART driver after commit b3e3bf2ef2c7 ("Merge 4.0-rc7 into tty-next"); - commit abf1e0a98083 ("tty: serial: fsl_lpuart: lock port on console write"). In this basic version, the driver can be tested using initramfs and relies on the clocks and pin muxing set up by U-Boot. Remarks concerning the earlycon support: - LinFlexD does not allow character transmissions in the INIT mode (see section 47.4.2.1 in the reference manual[1]). Therefore, a mutual exclusion between the first linflex_setup_watermark/linflex_set_termios executions and linflex_earlycon_putchar was employed and the characters normally sent to earlycon during initialization are kept in a buffer and sent afterwards. - Empirically, character transmission is also forbidden within the last 1-2 ms before entering the INIT mode, so we use an explicit timeout (PREINIT_DELAY) between linflex_earlycon_putchar and the first call to linflex_setup_watermark. - U-Boot currently uses the UART FIFO mode, while this driver makes the transition to the buffer mode. Therefore, the earlycon putchar function matches the U-Boot behavior before initializations and the Linux behavior after. [1] https://www.nxp.com/webapp/Download?colCode=S32V234RM Signed-off-by: Stoica Cosmin-Stefan <cosmin.stoica@nxp.com> Signed-off-by: Adrian.Nitu <adrian.nitu@freescale.com> Signed-off-by: Larisa Grigore <Larisa.Grigore@nxp.com> Signed-off-by: Ana Nedelcu <B56683@freescale.com> Signed-off-by: Mihaela Martinas <Mihaela.Martinas@freescale.com> Signed-off-by: Matthew Nunez <matthew.nunez@nxp.com> [stefan-gabriel.mirea@nxp.com: Reduced for upstreaming and implemented earlycon support] Signed-off-by: Stefan-Gabriel Mirea <stefan-gabriel.mirea@nxp.com> Link: https://lore.kernel.org/r/20190809112853.15846-6-stefan-gabriel.mirea@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
85c0a037 |
|
27-Aug-2019 |
Marcos Paulo de Souza <marcos.souza.org@gmail.com> |
block: elevator.c: Remove now unused elevator= argument Since the inclusion of blk-mq, elevator argument was not being considered anymore, and it's utility died long with the legacy IO path, now removed too. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Fold with doc removal patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6a9c930b |
|
19-Aug-2019 |
Ram Pai <linuxram@us.ibm.com> |
powerpc/prom_init: Add the ESM call to prom_init Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Pass kernel base address and FDT address so that the Ultravisor is able to verify the integrity of the VM using information from the ESM blob. Add "svm=" command line option to turn on switching to secure mode. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> [ bauerman: Cleaned up the code a bit. ] Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-5-bauerman@linux.ibm.com
|
#
d77b3f07 |
|
27-Aug-2019 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Revert "of/platform: Add functional dependency link from DT bindings" This reverts commit 690ff7881b2681afca1c3c063f4a5cb7c71d9d8b. Based on a lot of email and in-person discussions, this patch series is being reworked to address a number of issues that were pointed out that needed to be taken care of before it should be merged. It will be resubmitted with those changes hopefully soon. Cc: Frank Rowand <frowand.list@gmail.com> Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
c8fb436b |
|
19-Aug-2019 |
Joerg Roedel <jroedel@suse.de> |
Documentation: Update Documentation for iommu.passthrough This kernel parameter now takes also effect on X86. Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
6278f55b |
|
14-Aug-2019 |
Gustavo Romero <gromero@linux.ibm.com> |
powerpc: Document xmon options Document all options currently supported by xmon debugger. Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190814205638.25322-1-gromero@linux.ibm.com
|
#
000d388e |
|
19-Aug-2019 |
Matthew Garrett <matthewgarrett@google.com> |
security: Add a static lockdown policy LSM While existing LSMs can be extended to handle lockdown policy, distributions generally want to be able to apply a straightforward static policy. This patch adds a simple LSM that can be configured to reject either integrity or all lockdown queries, and can be configured at runtime (through securityfs), boot time (via a kernel parameter) or build time (via a kconfig option). Based on initial code by David Howells. Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
c49a0a80 |
|
19-Aug-2019 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly. RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit. Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend. Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chen Yu <yu.c.chen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org> Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
|
#
df43acac |
|
13-Aug-2019 |
Christoph Hellwig <hch@lst.de> |
ia64: remove the zx1 swiotlb machvec The aim of this machvec is to support devices with < 32-bit dma masks. But given that ia64 only has a ZONE_DMA32 and not a ZONE_DMA that isn't supported by swiotlb either. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/20190813072514.23299-21-hch@lst.de Signed-off-by: Tony Luck <tony.luck@intel.com>
|
#
f7c612b0 |
|
02-Apr-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu/nocb: Rename rcu_nocb_leader_stride kernel boot parameter This commit changes the name of the rcu_nocb_leader_stride kernel boot parameter to rcu_nocb_gp_stride in order to account for the new distinction between callback and grace-period no-CBs kthreads. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
3b1b1ce3 |
|
05-Jun-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
PCI: Correct pci=resource_alignment parameter example The "pci=resource_alignment" parameter is described as requiring an order (not a size) and the code in pci_specified_resource_alignment() expects an order. But the example wrongly shows a size. Convert the example to an order. Fixes: 8b078c603249 ("PCI: Update "pci=resource_alignment" documentation") Link: https://lore.kernel.org/r/20190606032557.107542-1-aik@ozlabs.ru Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
cdc694b2 |
|
13-Jun-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add kernel parameter to dump trace after RCU CPU stall warning This commit adds a rcu_cpu_stall_ftrace_dump kernel boot parameter, that, when set, causes the trace buffer to be dumped after an RCU CPU stall warning is printed. This kernel boot parameter is disabled by default, maintaining compatibility with previous behavior. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
690ff788 |
|
31-Jul-2019 |
Saravana Kannan <saravanak@google.com> |
of/platform: Add functional dependency link from DT bindings Add device-links after the devices are created (but before they are probed) by looking at common DT bindings like clocks and interconnects. Automatically adding device-links for functional dependencies at the framework level provides the following benefits: - Optimizes device probe order and avoids the useless work of attempting probes of devices that will not probe successfully (because their suppliers aren't present or haven't probed yet). For example, in a commonly available mobile SoC, registering just one consumer device's driver at an initcall level earlier than the supplier device's driver causes 11 failed probe attempts before the consumer device probes successfully. This was with a kernel with all the drivers statically compiled in. This problem gets a lot worse if all the drivers are loaded as modules without direct symbol dependencies. - Supplier devices like clock providers, interconnect providers, etc need to keep the resources they provide active and at a particular state(s) during boot up even if their current set of consumers don't request the resource to be active. This is because the rest of the consumers might not have probed yet and turning off the resource before all the consumers have probed could lead to a hang or undesired user experience. Some frameworks (Eg: regulator) handle this today by turning off "unused" resources at late_initcall_sync and hoping all the devices have probed by then. This is not a valid assumption for systems with loadable modules. Other frameworks (Eg: clock) just don't handle this due to the lack of a clear signal for when they can turn off resources. This leads to downstream hacks to handle cases like this that can easily be solved in the upstream kernel. By linking devices before they are probed, we give suppliers a clear count of the number of dependent consumers. Once all of the consumers are active, the suppliers can turn off the unused resources without making assumptions about the number of consumers. By default we just add device-links to track "driver presence" (probe succeeded) of the supplier device. If any other functionality provided by device-links are needed, it is left to the consumer/supplier devices to change the link when they probe. kbuild test robot reported clang error about missing const Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20190731221721.187713-4-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
2f5947df |
|
24-Jul-2019 |
Christoph Hellwig <hch@lst.de> |
Documentation: move Documentation/virtual to Documentation/virt Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b39b0497 |
|
11-Jul-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete Clean up unnecessory code after that operation. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
30978346 |
|
11-Jul-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
x86: Add "nopv" parameter to disable PV extensions In virtualization environment, PV extensions (drivers, interrupts, timers, etc) are enabled in the majority of use cases which is the best option. However, in some cases (kexec not fully working, benchmarking) we want to disable PV extensions. We have "xen_nopv" for that purpose but only for XEN. For a consistent admin experience a common command line parameter "nopv" set across all PV guest implementations is a better choice. There are guest types which just won't work without PV extensions, like Xen PV, Xen PVH and jailhouse. add a "ignore_nopv" member to struct hypervisor_x86 set to true for those guest types and call the detect functions only if nopv is false or ignore_nopv is true. Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
814bbf49 |
|
14-Jul-2019 |
Juergen Gross <jgross@suse.com> |
xen: remove tmem driver The Xen tmem (transcendent memory) driver can be removed, as the related Xen hypervisor feature never made it past the "experimental" state and will be removed in future Xen versions (>= 4.13). The xen-selfballoon driver depends on tmem, so it can be removed, too. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
c6c40533 |
|
16-Jul-2019 |
Kairui Song <kasong@redhat.com> |
vmcore: add a kernel parameter novmcoredd Since commit 2724273e8fd0 ("vmcore: add API to collect hardware dump in second kernel"), drivers are allowed to add device related dump data to vmcore as they want by using the device dump API. This has a potential issue, the data is stored in memory, drivers may append too much data and use too much memory. The vmcore is typically used in a kdump kernel which runs in a pre-reserved small chunk of memory. So as a result it will make kdump unusable at all due to OOM issues. So introduce new 'novmcoredd' command line option. User can disable device dump to reduce memory usage. This is helpful if device dump is using too much memory, disabling device dump could make sure a regular vmcore without device dump data is still available. [akpm@linux-foundation.org: tweak documentation] [akpm@linux-foundation.org: vmcore.c needs moduleparam.h] Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.com Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
baa293e9 |
|
27-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: driver-api: add a series of orphaned documents There are lots of documents under Documentation/*.txt and a few other orphan documents elsehwere that belong to the driver-API book. Move them to their right place. Reviewed-by: Cornelia Huck <cohuck@redhat.com> # vfio-related parts Acked-by: Logan Gunthorpe <logang@deltatee.com> # switchtec Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
4f4cfa6c |
|
27-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: admin-guide: add a series of orphaned documents There are lots of documents that belong to the admin-guide but are on random places (most under Documentation root dir). Move them to the admin guide. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
|
#
da82c92f |
|
27-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: cgroup-v1: add it to the admin-guide book Those files belong to the admin guide, so add them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
e7751617 |
|
18-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: blockdev: add it to the admin-guide The blockdev book basically contains user-faced documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
330d4810 |
|
13-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: admin-guide: add kdump documentation into it The Kdump documentation describes procedures with admins use in order to solve issues on their systems. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
9e1cbede |
|
13-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: admin-guide: add laptops documentation The docs under Documentation/laptops contain users specific information. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
|
#
57043247 |
|
22-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: admin-guide: move sysctl directory to it The stuff under sysctl describes /sys interface from userspace point of view. So, add it to the admin-guide and remove the :orphan: from its index file. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
898bd37a |
|
18-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: block: convert to ReST Rename the block documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
53b95375 |
|
18-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: sysctl: convert to ReST Rename the /proc/sys/ documentation files to ReST, using the README file as a template for an index.rst, adding the other files there via TOC markup. Despite being written on different times with different styles, try to make them somewhat coherent with a similar look and feel, ensuring that they'll look nice as both raw text file and as via the html output produced by the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
39443104 |
|
18-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: blockdev: convert to ReST Rename the blockdev documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. The drbd sub-directory contains some graphs and data flows. Add those too to the documentation. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
#
b02f1651 |
|
18-Apr-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: laptops: convert to ReST Rename the laptops documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
|
#
6471384a |
|
11-Jul-2019 |
Alexander Potapenko <glider@google.com> |
mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options Patch series "add init_on_alloc/init_on_free boot options", v10. Provide init_on_alloc and init_on_free boot options. These are aimed at preventing possible information leaks and making the control-flow bugs that depend on uninitialized values more deterministic. Enabling either of the options guarantees that the memory returned by the page allocator and SL[AU]B is initialized with zeroes. SLOB allocator isn't supported at the moment, as its emulation of kmem caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly. Enabling init_on_free also guarantees that pages and heap objects are initialized right after they're freed, so it won't be possible to access stale data by using a dangling pointer. As suggested by Michal Hocko, right now we don't let the heap users to disable initialization for certain allocations. There's not enough evidence that doing so can speed up real-life cases, and introducing ways to opt-out may result in things going out of control. This patch (of 2): The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic. This is expected to be on-by-default on Android and Chrome OS. And it gives the opportunity for anyone else to use it under distros too via the boot args. (The init_on_free feature is regularly requested by folks where memory forensics is included in their threat models.) init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed. init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses. Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. The two exceptions are slab caches with constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics. Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON. If either SLUB poisoning or page poisoning is enabled, those options take precedence over init_on_alloc and init_on_free: initialization is only applied to unpoisoned allocations. Slowdown for the new features compared to init_on_free=0, init_on_alloc=0: hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%) hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%) Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%) Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%) Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%) Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%) The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error. The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization. Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructure for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series. [glider@google.com: v8] Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com [glider@google.com: v9] Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com [glider@google.com: v10] Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts Acked-by: James Morris <jamorris@linux.microsoft.com>] Cc: Christoph Lameter <cl@linux.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Sandeep Patil <sspatil@android.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3972f6bb |
|
11-Jul-2019 |
Vlastimil Babka <vbabka@suse.cz> |
mm, debug_pagealloc: use a page type instead of page_ext flag When debug_pagealloc is enabled, we currently allocate the page_ext array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag. Now that we have the page_type field in struct page, we can use that instead, as guard pages are neither PageSlab nor mapped to userspace. This reduces memory overhead when debug_pagealloc is enabled and there are no other features requiring the page_ext array. Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a2059825 |
|
08-Jul-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/speculation: Enable Spectre v1 swapgs mitigations The previous commit added macro calls in the entry code which mitigate the Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are enabled. Enable those features where applicable. The mitigations may be disabled with "nospectre_v1" or "mitigations=off". There are different features which can affect the risk of attack: - When FSGSBASE is enabled, unprivileged users are able to place any value in GS, using the wrgsbase instruction. This means they can write a GS value which points to any value in kernel space, which can be useful with the following gadget in an interrupt/exception/NMI handler: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 // dependent load or store based on the value of %reg // for example: mov %(reg1), %reg2 If an interrupt is coming from user space, and the entry code speculatively skips the swapgs (due to user branch mistraining), it may speculatively execute the GS-based load and a subsequent dependent load or store, exposing the kernel data to an L1 side channel leak. Note that, on Intel, a similar attack exists in the above gadget when coming from kernel space, if the swapgs gets speculatively executed to switch back to the user GS. On AMD, this variant isn't possible because swapgs is serializing with respect to future GS-based accesses. NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case doesn't exist quite yet. - When FSGSBASE is disabled, the issue is mitigated somewhat because unprivileged users must use prctl(ARCH_SET_GS) to set GS, which restricts GS values to user space addresses only. That means the gadget would need an additional step, since the target kernel address needs to be read from user space first. Something like: if (coming from user space) swapgs mov %gs:<percpu_offset>, %reg1 mov (%reg1), %reg2 // dependent load or store based on the value of %reg2 // for example: mov %(reg2), %reg3 It's difficult to audit for this gadget in all the handlers, so while there are no known instances of it, it's entirely possible that it exists somewhere (or could be introduced in the future). Without tooling to analyze all such code paths, consider it vulnerable. Effects of SMAP on the !FSGSBASE case: - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not susceptible to Meltdown), the kernel is prevented from speculatively reading user space memory, even L1 cached values. This effectively disables the !FSGSBASE attack vector. - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP still prevents the kernel from speculatively reading user space memory. But it does *not* prevent the kernel from reading the user value from L1, if it has already been cached. This is probably only a small hurdle for an attacker to overcome. Thanks to Dave Hansen for contributing the speculative_smap() function. Thanks to Andrew Cooper for providing the inside scoop on whether swapgs is serializing on AMD. [ tglx: Fixed the USER fence decision and polished the comment as suggested by Dave Hansen ] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
|
#
74665686 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: watchdog: convert docs to ReST and rename to *.rst Convert those documents and prepare them to be part of the kernel API book, as most of the stuff there are related to the Kernel interfaces. Still, in the future, it would make sense to split the docs, as some of the stuff is clearly focused on sysadmin tasks. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
#
049331f2 |
|
03-Jul-2019 |
Thomas Gleixner <tglx@linutronix.de> |
x86/fsgsbase: Revert FSGSBASE support The FSGSBASE series turned out to have serious bugs and there is still an open issue which is not fully understood yet. The confidence in those changes has become close to zero especially as the test cases which have been shipped with that series were obviously never run before sending the final series out to LKML. ./fsgsbase_64 >/dev/null Segmentation fault As the merge window is close, the only sane decision is to revert FSGSBASE support. The revert is necessary as this branch has been merged into perf/core already and rebasing all of that a few days before the merge window is not the most brilliant idea. I could definitely slap myself for not noticing the test case fail when merging that series, but TBH my expectations weren't that low back then. Won't happen again. Revert the following commits: 539bca535dec ("x86/entry/64: Fix and clean up paranoid_exit") 2c7b5ac5d5a9 ("Documentation/x86/64: Add documentation for GS/FS addressing mode") f987c955c745 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2") 2032f1f96ee0 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit") 5bf0cab60ee2 ("x86/entry/64: Document GSBASE handling in the paranoid path") 708078f65721 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit") 79e1932fa3ce ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") 1d07316b1363 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") f60a83df4593 ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace") 1ab5f3f7fe3d ("x86/process/64: Use FSBSBASE in switch_to() if available") a86b4625138d ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions") 8b71340d702e ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions") b64ed19b93c3 ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com>
|
#
ba45cff6 |
|
12-May-2019 |
Michael Neuling <mikey@neuling.org> |
powerpc: Document xive=off option commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") added an option to turn off Linux native XIVE usage via the xive=off kernel command line option. This documents this option. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bd49e16e |
|
26-Jun-2019 |
Andy Lutomirski <luto@kernel.org> |
x86/vsyscall: Add a new vsyscall=xonly mode With vsyscall emulation on, a readable vsyscall page is still exposed that contains syscall instructions that validly implement the vsyscalls. This is required because certain dynamic binary instrumentation tools attempt to read the call targets of call instructions in the instrumented code. If the instrumented code uses vsyscalls, then the vsyscall page needs to contain readable code. Unfortunately, leaving readable memory at a deterministic address can be used to help various ASLR bypasses, so some hardening value can be gained by disallowing vsyscall reads. Given how rarely the vsyscall page needs to be readable, add a mechanism to make the vsyscall page be execute only. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/d17655777c21bc09a7af1bbcf74e6f2b69a51152.1561610354.git.luto@kernel.org
|
#
d974ffcf |
|
26-Jun-2019 |
Andy Lutomirski <luto@kernel.org> |
Documentation/admin: Remove the vsyscall=native documentation The vsyscall=native feature is gone -- remove the docs. Fixes: 076ca272a14c ("x86/vsyscall/64: Drop "native" vsyscalls") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/d77c7105eb4c57c1a95a95b6a5b8ba194a18e764.1561610354.git.luto@kernel.org
|
#
2032f1f9 |
|
08-May-2019 |
Andy Lutomirski <luto@kernel.org> |
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
|
#
b64ed19b |
|
08-May-2019 |
Andy Lutomirski <luto@kernel.org> |
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE This is temporary. It will allow the next few patches to be tested incrementally. Setting unsafe_fsgsbase is a root hole. Don't do it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
|
#
d909f910 |
|
09-Jun-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required page table functions. This enables huge (2MB and 1GB) ioremap mappings. I don't have a benchmark for this change, but huge vmap will be used by a later core kernel change to enable huge vmalloc memory mappings. This improves cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB size dentry cache hash. Profiling git diff dTLB misses with a vanilla kernel: 81.75% git [kernel.vmlinux] [k] __d_lookup_rcu 7.21% git [kernel.vmlinux] [k] strncpy_from_user 1.77% git [kernel.vmlinux] [k] find_get_entry 1.59% git [kernel.vmlinux] [k] kmem_cache_free 40,168 dTLB-miss 0.100342754 seconds time elapsed With powerpc huge vmalloc: 2,987 dTLB-miss 0.095933138 seconds time elapsed Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
151f4e2b |
|
13-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: power: convert docs to ReST and rename to *.rst Convert the PM documents to ReST, in order to allow them to build with Sphinx. The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
|
#
a2f405a5 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: EDID/HOWTO.txt: convert it and rename to howto.rst Sphinx need to know when a paragraph ends. So, do some adjustments at the file for it to be properly parsed. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. that's said, I believe that this file should be moved to the GPU/DRM documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
cc2a2d19 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: watchdog: convert docs to ReST and rename to *.rst Convert those documents and prepare them to be part of the kernel API book, as most of the stuff there are related to the Kernel interfaces. Still, in the future, it would make sense to split the docs, as some of the stuff is clearly focused on sysadmin tasks. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
99c8b231 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: cgroup-v1: convert docs to ReST and rename to *.rst Convert the cgroup-v1 files to ReST format, in order to allow a later addition to the admin-guide. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
d67297ad |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: kdump: convert docs to ReST and rename to *.rst Convert kdump documentation to ReST and add it to the user faced manual, as the documents are mainly focused on sysadmins that would be enabling kdump. Note: the vmcoreinfo.rst has one very long title on one of its sub-sections: PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline) I opted to break this one, into two entries with the same content, in order to make it easier to display after being parsed in html and PDF. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
d7b461c5 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: ide: convert docs to ReST and rename to *.rst The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
ab42b818 |
|
12-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: fb: convert docs to ReST and rename to *.rst The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Also, removed the Maintained by, as requested by Geert. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
8b4a503d |
|
08-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: s390: convert docs to ReST and rename to *.rst Convert all text files with s390 documentation to ReST format. Tried to preserve as much as possible the original document format. Still, some of the files required some work in order for it to be visible on both plain text and after converted to html. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
#
9915ec28 |
|
07-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: isdn: remove hisax references from kernel-parameters.txt The hisax driver got removed on 85993b8c9786 ("isdn: remove hisax driver"), but a left-over was kept at kernel-parameters.txt. Fixes: 85993b8c9786 ("isdn: remove hisax driver") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
cb1aaebe |
|
07-Jun-2019 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: fix broken documentation links Mostly due to x86 and acpi conversion, several documentation links are still pointing to the old file. Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
93285c01 |
|
20-May-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
doc: kernel-parameters.txt: fix documentation of nmi_watchdog parameter The default behavior of hardlockup depends on the config of CONFIG_BOOTPARAM_HARDLOCKUP_PANIC. Fix the description of nmi_watchdog to make it clear. Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
970988e1 |
|
22-May-2019 |
Masami Hiramatsu <mhiramat@kernel.org> |
tracing/kprobe: Add kprobe_event= boot parameter Add kprobe_event= boot parameter to define kprobe events at boot time. The definition syntax is similar to tracefs/kprobe_events interface, but use ',' and ';' instead of ' ' and '\n' respectively. e.g. kprobe_event=p,vfs_read,$arg1,$arg2 This puts a probe on vfs_read with argument1 and 2, and enable the new event. Link: http://lkml.kernel.org/r/155851395498.15728.830529496248543583.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
48d07c04 |
|
20-Mar-2019 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
rcu: Enable elimination of Tree-RCU softirq processing Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
de6da1e8 |
|
17-May-2019 |
Feng Tang <feng.tang@intel.com> |
panic: add an option to replay all the printk message in buffer Currently on panic, kernel will lower the loglevel and print out pending printk msg only with console_flush_on_panic(). Add an option for users to configure the "panic_print" to replay all dmesg in buffer, some of which they may have never seen due to the loglevel setting, which will help panic debugging . [feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()] Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com [feng.tang@intel.com: use logbuf lock to protect the console log index] Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5ac893b8 |
|
14-May-2019 |
Waiman Long <longman@redhat.com> |
ipc: allow boot time extension of IPCMNI from 32k to 16M The maximum number of unique System V IPC identifiers was limited to 32k. That limit should be big enough for most use cases. However, there are some users out there requesting for more, especially those that are migrating from Solaris which uses 24 bits for unique identifiers. To satisfy the need of those users, a new boot time kernel option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This is a 512X increase which should be big enough for users out there that need a large number of unique IPC identifier. The use of this new option will change the pattern of the IPC identifiers returned by functions like shmget(2). An application that depends on such pattern may not work properly. So it should only be used if the users really need more than 32k of unique IPC numbers. This new option does have the side effect of reducing the maximum number of unique sequence numbers from 64k down to 128. So it is a trade-off. The computation of a new IPC id is not done in the performance critical path. So a little bit of additional overhead shouldn't have any real performance impact. Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b287a25a |
|
14-May-2019 |
Aaro Koskinen <aaro.koskinen@nokia.com> |
panic/reboot: allow specifying reboot_mode for panic only Allow specifying reboot_mode for panic only. This is needed on systems where ramoops is used to store panic logs, and user wants to use warm reset to preserve those, while still having cold reset on normal reboots. Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e900a918 |
|
14-May-2019 |
Dan Williams <dan.j.williams@intel.com> |
mm: shuffle initial free memory to improve memory-side-cache utilization Patch series "mm: Randomize free memory", v10. This patch (of 3): Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1]. Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here: It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/ A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s." A replacement for zonesort was merged upstream in commit cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case. The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance. Here are some performance impact details of the patches: 1/ An Intel internal synthetic memory bandwidth measurement tool, saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived cased used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node. 2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%. 3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case. 4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3]. While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems. Outside of memory-side-cache utilization concerns there is potentially security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first out behavior for physical pages. Pages are freed in physical address order when first onlined. Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc. I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator." While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches it leaves vast bulk of memory to be predictably in order allocated. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization. Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time. Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch). The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e. 10, 4MB this trades off randomization granularity for time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side cache behavior improvements, and the expectation that the security implications of finer granularity randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work. This initial randomization can be undone over time so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask if the page free entropy is sufficient, but it is not enough due to the in-order initial freeing of pages. At the start of that process putting page1 in front or behind page0 still keeps them close together, page2 is still near page1 and has a high chance of being adjacent. As more pages are added ordering diversity improves, but there is still high page locality for the low address pages and this leads to no significant impact to the cache conflict rate. [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309 [dan.j.williams@intel.com: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a111b7c0 |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
arm64/speculation: Support 'mitigations=' cmdline option Configure arm64 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, and Speculative Store Bypass. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> [will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
56271303 |
|
18-Apr-2019 |
Sebastian Ott <sebott@linux.ibm.com> |
s390/pci: add parameter to disable usage of MIO instructions Allow users to disable usage of MIO instructions by specifying pci=nomio at the kernel command line. Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
fbfe07d4 |
|
26-Feb-2019 |
Sebastian Ott <sebott@linux.ibm.com> |
s390/pci: add parameter to force floating irqs Provide a kernel parameter to force the usage of floating interrupts. Signed-off-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
e5ce5e72 |
|
15-Apr-2019 |
Jeremy Linton <jeremy.linton@arm.com> |
arm64: Provide a command line to disable spectre_v2 mitigation There are various reasons, such as benchmarking, to disable spectrev2 mitigation on a machine. Provide a command-line option to do so. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
2ec16bc0 |
|
22-Mar-2019 |
Ryan Thibodeaux <ryan.thibodeaux@starlab.io> |
x86/xen: Add "xen_timer_slop" command line option Add a new command-line option "xen_timer_slop=<INT>" that sets the minimum delta of virtual Xen timers. This commit does not change the default timer slop value for virtual Xen timers. Lowering the timer slop value should improve the accuracy of virtual timers (e.g., better process dispatch latency), but it will likely increase the number of virtual timer interrupts (relative to the original slop setting). The original timer slop value has not changed since the introduction of the Xen-aware Linux kernel code. This commit provides users an opportunity to tune timer performance given the refinements to hardware and the Xen event channel processing. It also mirrors a feature in the Xen hypervisor - the "timer_slop" Xen command line option. [boris: updated comment describing TIMER_SLOP] Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
b9ac3849 |
|
21-Apr-2019 |
Dave Young <dyoung@redhat.com> |
x86/kdump: Fall back to reserve high crashkernel memory crashkernel=xM tries to reserve memory for the crash kernel under 4G, which is enough, usually. But this could fail sometimes, for example when one tries to reserve a big chunk like 2G, for example. So let the crashkernel=xM just fall back to use high memory in case it fails to find a suitable low range. Do not set the ,high as default because it allocates extra low memory for DMA buffers and swiotlb, and this is not always necessary for all machines. Typically, crashkernel=128M usually works with low reservation under 4G, so keep <4G as default. [ bp: Massage. ] Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: linux-doc@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: piliu@redhat.com Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thymo van Beers <thymovanbeers@gmail.com> Cc: vgoyal@redhat.com Cc: x86-ml <x86@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Zhimin Gu <kookoo.gu@intel.com> Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
|
#
de78a9c4 |
|
18-Apr-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Add a framework for Kernel Userspace Access Protection This patch implements a framework for Kernel Userspace Access Protection. Then subarches will have the possibility to provide their own implementation by providing setup_kuap() and allow/prevent_user_access(). Some platforms will need to know the area accessed and whether it is accessed from read, write or both. Therefore source, destination and size and handed over to the two functions. mpe: Rename to allow/prevent rather than unlock/lock, and add read/write wrappers. Drop the 32-bit code for now until we have an implementation for it. Add kuap to pt_regs for 64-bit as well as 32-bit. Don't split strings, use pr_crit_ratelimited(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
0fb1c25a |
|
18-Apr-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Add skeleton for Kernel Userspace Execution Prevention This patch adds a skeleton for Kernel Userspace Execution Prevention. Then subarches implementing it have to define CONFIG_PPC_HAVE_KUEP and provide setup_kuep() function. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Don't split strings, use pr_crit_ratelimited()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5c14068f |
|
17-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/speculation/mds: Add 'mitigations=' support for MDS Add MDS to the new 'mitigations=' cmdline option. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0336e04a |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
s390/speculation: Support 'mitigations=' cmdline option Configure s390 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Spectre v1 and Spectre v2. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/e4a161805458a5ec88812aac0307ae3908a030fc.1555085500.git.jpoimboe@redhat.com
|
#
782e69ef |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
powerpc/speculation: Support 'mitigations=' cmdline option Configure powerpc CPU runtime speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v1, Spectre v2, and Speculative Store Bypass. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/245a606e1a42a558a310220312d9b6adb9159df6.1555085500.git.jpoimboe@redhat.com
|
#
d68be4c4 |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/speculation: Support 'mitigations=' cmdline option Configure x86 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, Speculative Store Bypass, and L1TF. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoimboe@redhat.com
|
#
98af8452 |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
cpu/speculation: Add 'mitigations=' cmdline option Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
|
#
41475a3e |
|
04-Apr-2019 |
Petr Vorel <pvorel@suse.cz> |
doc/kernel-parameters.txt: Deprecate ima_appraise_tcb Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
#
d71eb0ce |
|
02-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/speculation/mds: Add mds=full,nosmt cmdline option Add the mds=full,nosmt cmdline option. This is like mds=full, but with SMT disabled if the CPU is vulnerable. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Jiri Kosina <jkosina@suse.cz>
|
#
da8739f2 |
|
05-Mar-2019 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow rcu_nocbs= to specify all CPUs Currently, the rcu_nocbs= kernel boot parameter requires that a specific list of CPUs be specified, and has no way to say "all of them". As noted by user RavFX in a comment to Phoronix topic 1002538, this is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL Kconfig option. This commit therefore enables the rcu_nocbs= kernel boot parameter to be given the string "all", as in "rcu_nocbs=all" to specify that all CPUs on the system are to have their RCU callbacks offloaded. Another approach would be to make cpulist_parse() check for "all", but there are uses of cpulist_parse() that do other checking, which could conflict with an "all". This commit therefore focuses on the specific use of cpulist_parse() in rcu_nocb_setup(). Just a note to other people who would like changes to Linux-kernel RCU: If you send your requests to me directly, they might get fixed somewhat faster. RavFX's comment was posted on January 22, 2018 and I first saw it on March 5, 2019. And the only reason that I found it -at- -all- was that I was looking for projects using RCU, and my search engine showed me that Phoronix comment quite by accident. Your choice, though! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
0f0b7e1c |
|
07-Mar-2019 |
Juri Lelli <juri.lelli@redhat.com> |
x86/tsc: Add option to disable tsc clocksource watchdog Clocksource watchdog has been found responsible for generating latency spikes (in the 10-20 us range) when woken up to check for TSC stability. Add an option to disable it at boot. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: linux-rt-users@vger.kernel.org Cc: peterz@infradead.org Cc: bristot@redhat.com Cc: williams@redhat.com Link: https://lkml.kernel.org/r/20190307120913.13168-1-juri.lelli@redhat.com
|
#
5999bbe7 |
|
18-Feb-2019 |
Thomas Gleixner <tglx@linutronix.de> |
Documentation: Add MDS vulnerability documentation Add the initial MDS vulnerability documentation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jon Masters <jcm@redhat.com>
|
#
65fd4cb6 |
|
19-Feb-2019 |
Thomas Gleixner <tglx@linutronix.de> |
Documentation: Move L1TF to separate directory Move L!TF to a separate directory so the MDS stuff can be added at the side. Otherwise the all hardware vulnerabilites have their own top level entry. Should have done that right away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jon Masters <jcm@redhat.com>
|
#
bc124170 |
|
18-Feb-2019 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation/mds: Add mitigation control for MDS Now that the mitigations are in place, add a command line parameter to control the mitigation, a mitigation selector function and a SMT update mechanism. This is the minimal straight forward initial implementation which just provides an always on/off mode. The command line parameter is: mds=[full|off] This is consistent with the existing mitigations for other speculative hardware vulnerabilities. The idle invocation is dynamically updated according to the SMT state of the system similar to the dynamic update of the STIBP mitigation. The idle mitigation is limited to CPUs which are only affected by MSBDS and not any other variant, because the other variants cannot be mitigated on SMT enabled systems. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com>
|
#
89a9684e |
|
12-Feb-2019 |
Kees Cook <keescook@chromium.org> |
LSM: Ignore "security=" when "lsm=" is specified To avoid potential confusion, explicitly ignore "security=" when "lsm=" is used on the command line, and report that it is happening. Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: John Johansen <john.johansen@canonical.com> Signed-off-by: James Morris <james.morris@microsoft.com>
|
#
7bae0432 |
|
17-Feb-2019 |
Dmitry Torokhov <dtor@chromium.org> |
usb: core: add option of only authorizing internal devices On Chrome OS we want to use USBguard to potentially limit access to USB devices based on policy. We however to do not want to wait for userspace to come up before initializing fixed USB devices to not regress our boot times. This patch adds option to instruct the kernel to only authorize devices connected to the internal ports. Previously we could either authorize all or none (or, by default, we'd only authorize wired devices). The behavior is controlled via usbcore.authorized_default command line option. Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
1ea61b68 |
|
13-Feb-2019 |
Feng Tang <feng.tang@intel.com> |
async: Add cmdline option to specify drivers to be async probed Asynchronous driver probing can help much on kernel fastboot, and this option can provide a flexible way to optimize and quickly verify async driver probe. Also it will help in below cases: * Some driver actually covers several families of HWs, some of which could use async probing while others don't. So we can't simply turn on the PROBE_PREFER_ASYNCHRONOUS flag in driver, but use this cmdline option, like igb driver async patch discussed at https://www.spinics.net/lists/netdev/msg545986.html * For SOC (System on Chip) with multiple spi or i2c controllers, most of the slave spi/i2c devices will be assigned with fixed controller number, while async probing may make those controllers get different index for each boot, which prevents those controller drivers to be async probed. For platforms not using these spi/i2c slave devices, they can use this cmdline option to benefit from the async probing. Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
de190555 |
|
24-Jan-2019 |
Jeremy Linton <jeremy.linton@arm.com> |
Documentation: Document arm64 kpti control For a while Arm64 has been capable of force enabling or disabling the kpti mitigations. Lets make sure the documentation reflects that. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
31dcbbef |
|
03-Feb-2019 |
Otto Sabart <ottosabart@seberm.com> |
doc: kernel-parameters.txt: fix documentation of elevator parameter Legacy IO schedulers (cfq, deadline and noop) were removed in f382fb0bcef4. The documentation for deadline was retained because it carries over to mq-deadline as well, but location of the doc file was changed over time. The old iosched algorithms were removed from elevator= kernel parameter and mq-deadline, kyber and bfq were added with a reference to their documentation. Fixes: f382fb0bcef4 ("block: remove legacy IO schedulers") Signed-off-by: Otto Sabart <ottosabart@seberm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
bc3c03cc |
|
31-Jan-2019 |
Julien Thierry <julien.thierry.kdev@gmail.com> |
arm64: Enable the support of pseudo-NMIs Add a build option and a command line parameter to build and enable the support of pseudo-NMIs. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
69c1f396 |
|
02-Feb-2019 |
Ard Biesheuvel <ardb@kernel.org> |
efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation Move the x86 EFI earlyprintk implementation to a shared location under drivers/firmware and tweak it slightly so we can expose it as an earlycon implementation (which is generic) rather than earlyprintk (which is only implemented for a few architectures) This also involves switching to write-combine mappings by default (which is required on ARM since device mappings lack memory semantics, and so memcpy/memset may not be used on them), and adding support for shared memory framebuffers on cache coherent non-x86 systems (which do not tolerate mismatched attributes). Note that 32-bit ARM does not populate its struct screen_info early enough for earlycon=efifb to work, so it is disabled there. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Alexander Graf <agraf@suse.de> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jeffrey Hugo <jhugo@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190202094119.13230-10-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3fc46fc9 |
|
31-Jan-2019 |
Martin Kepplinger <martink@posteo.de> |
ipconfig: add carrier_timeout kernel parameter commit 3fb72f1e6e61 ("ipconfig wait for carrier") added a "wait for carrier" policy, with a fixed worst case maximum wait of two minutes. Now make the wait for carrier timeout configurable on the kernel commandline and use the 120s as the default. The timeout messages introduced with commit 5e404cd65860 ("ipconfig: add informative timeout messages while waiting for carrier") are done in a fixed interval of 20 seconds, just like they were before (240/12). Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8950dcd8 |
|
23-Jan-2019 |
Lu Baolu <baolu.lu@linux.intel.com> |
iommu/vt-d: Leave scalable mode default off Commit 765b6a98c1de3 ("iommu/vt-d: Enumerate the scalable mode capability") enables VT-d scalable mode if hardware advertises the capability. As we will bring up different features and use cases to upstream in different patch series, it will leave some intermediate kernel versions which support partial features. Hence, end user might run into problems when they use such kernels on bare metals or virtualization environments. This leaves scalable mode default off and end users could turn it on with "intel-iommu=sm_on" only when they have clear ideas about which scalable features are supported in the kernel. Cc: Liu Yi L <yi.l.liu@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Suggested-by: Ashok Raj <ashok.raj@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
1a4762b9 |
|
20-Nov-2018 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Now jiffies_till_sched_qs solicits help from cond_resched() The rcutree.jiffies_till_sched_qs kernel boot parameter used to solicit help only from rcu_note_context_switch(), but now also solicits help from cond_resched(). This commit therefore updates kernel-parameters.txt accordingly. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
2ccaff10 |
|
12-Dec-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add sysrq rcu_node-dump capability Life is hard if RCU manages to get stuck without triggering RCU CPU stall warnings or triggering the rcu_check_gp_start_stall() checks for failing to start a grace period. This commit therefore adds a boot-time-selectable sysrq key (commandeering "y") that allows manually dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter must be set for this sysrq to be available. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
79f7865d |
|
19-Sep-2018 |
Kees Cook <keescook@chromium.org> |
LSM: Introduce "lsm=" for boottime LSM selection Provide a way to explicitly choose LSM initialization order via the new "lsm=" comma-separated list of LSMs. Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
d999bd93 |
|
03-Jan-2019 |
Feng Tang <feng.tang@intel.com> |
panic: add options to print system info when panic happens Kernel panic issues are always painful to debug, partially because it's not easy to get enough information of the context when panic happens. And we have ramoops and kdump for that, while this commit tries to provide a easier way to show the system info by adding a cmdline parameter, referring some idea from sysrq handler. Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c10b1332 |
|
18-Dec-2018 |
Manivannan Sadhasivam <mani@kernel.org> |
tty: serial: Add RDA8810PL UART driver Add UART driver for RDA Micro RDA8810PL SoC. Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Olof Johansson <olof@lixom.net>
|
#
3fc9c12d |
|
28-Dec-2018 |
Tejun Heo <tj@kernel.org> |
cgroup: Add named hierarchy disabling to cgroup_no_v1 boot param It can be useful to inhibit all cgroup1 hierarchies especially during transition and for debugging. cgroup_no_v1 can block hierarchies with controllers which leaves out the named hierarchies. Expand it to cover the named hierarchies so that "cgroup_no_v1=all,named" disables all cgroup1 hierarchies. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Marcin Pawlowski <mpawlowski@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
e59f5bd7 |
|
12-Dec-2018 |
Diana Craciun <diana.craciun@nxp.com> |
powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
61cb5758 |
|
05-Dec-2018 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: Add cpuidle.governor= command line parameter Add cpuidle.governor= command line parameter to allow the default cpuidle governor to be replaced. That is useful, for example, if someone running a tickful kernel wants to use the menu governor on it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
5b5e4d62 |
|
13-Nov-2018 |
Michal Hocko <mhocko@suse.com> |
x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever the system is deemed affected by L1TF vulnerability. Even though the limit is quite high for most deployments it seems to be too restrictive for deployments which are willing to live with the mitigation disabled. We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is clearly out of the limit. Drop the swap restriction when l1tf=off is specified. It also doesn't make much sense to warn about too much memory for the l1tf mitigation when it is forcefully disabled by the administrator. [ tglx: Folded the documentation delta change ] Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: <linux-mm@kvack.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
|
#
765b6a98 |
|
09-Dec-2018 |
Lu Baolu <baolu.lu@linux.intel.com> |
iommu/vt-d: Enumerate the scalable mode capability The Intel vt-d spec rev3.0 introduces a new translation mode called scalable mode, which enables PASID-granular translations for first level, second level, nested and pass-through modes. At the same time, the previous Extended Context (ECS) mode is deprecated (no production ever implements ECS). This patch adds enumeration for Scalable Mode and removes the deprecated ECS enumeration. It provides a boot time option to disable scalable mode even hardware claims to support it. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
fc6f9c57 |
|
27-Aug-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Remove cbflood facility Now that the forward-progress code does a full-bore continuous callback flood lasting multiple seconds, there is little point in also posting a mere 60,000 callbacks every second or so. This commit therefore removes the old cbflood testing. Over time, it may be desirable to concurrently do full-bore continuous callback floods on all CPUs simultaneously, but one dragon at a time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
e0c27447 |
|
30-Nov-2018 |
Johannes Weiner <hannes@cmpxchg.org> |
psi: make disabling/enabling easier for vendor kernels Mel Gorman reports a hackbench regression with psi that would prohibit shipping the suse kernel with it default-enabled, but he'd still like users to be able to opt in at little to no cost to others. With the current combination of CONFIG_PSI and the psi_disabled bool set from the commandline, this is a challenge. Do the following things to make it easier: 1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros to enable CONFIG_PSI in their kernel but leave the feature disabled unless a user requests it at boot-time. To avoid double negatives, rename psi_disabled= to psi=. 2. Make psi_disabled a static branch to eliminate any branch costs when the feature is disabled. In terms of numbers before and after this patch, Mel says: : The following is a comparision using CONFIG_PSI=n as a baseline against : your patch and a vanilla kernel : : 4.20.0-rc4 4.20.0-rc4 4.20.0-rc4 : kconfigdisable-v1r1 vanilla psidisable-v1r1 : Amean 1 1.3100 ( 0.00%) 1.3923 ( -6.28%) 1.3427 ( -2.49%) : Amean 3 3.8860 ( 0.00%) 4.1230 * -6.10%* 3.8860 ( -0.00%) : Amean 5 6.8847 ( 0.00%) 8.0390 * -16.77%* 6.7727 ( 1.63%) : Amean 7 9.9310 ( 0.00%) 10.8367 * -9.12%* 9.9910 ( -0.60%) : Amean 12 16.6577 ( 0.00%) 18.2363 * -9.48%* 17.1083 ( -2.71%) : Amean 18 26.5133 ( 0.00%) 27.8833 * -5.17%* 25.7663 ( 2.82%) : Amean 24 34.3003 ( 0.00%) 34.6830 ( -1.12%) 32.0450 ( 6.58%) : Amean 30 40.0063 ( 0.00%) 40.5800 ( -1.43%) 41.5087 ( -3.76%) : Amean 32 40.1407 ( 0.00%) 41.2273 ( -2.71%) 39.9417 ( 0.50%) : : It's showing that the vanilla kernel takes a hit (as the bisection : indicated it would) and that disabling PSI by default is reasonably : close in terms of performance for this particular workload on this : particular machine so; Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
55a97402 |
|
25-Nov-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Provide IBPB always command line options Provide the possibility to enable IBPB always in combination with 'prctl' and 'seccomp'. Add the extra command line options and rework the IBPB selection to evaluate the command instead of the mode selected by the STIPB switch case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
|
#
6b3e64c2 |
|
25-Nov-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Add seccomp Spectre v2 user space protection mode If 'prctl' mode of user space protection from spectre v2 is selected on the kernel command-line, STIBP and IBPB are applied on tasks which restrict their indirect branch speculation via prctl. SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it makes sense to prevent spectre v2 user space to user space attacks as well. The Intel mitigation guide documents how STIPB works: Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core. Ergo setting STIBP protects the task itself from being attacked from a task running on a different hyper-thread and protects the tasks running on different hyper-threads from being attacked. While the document suggests that the branch predictors are shielded between the logical processors, the observed performance regressions suggest that STIBP simply disables the branch predictor more or less completely. Of course the document wording is vague, but the fact that there is also no requirement for issuing IBPB when STIBP is used points clearly in that direction. The kernel still issues IBPB even when STIBP is used until Intel clarifies the whole mechanism. IBPB is issued when the task switches out, so malicious sandbox code cannot mistrain the branch predictor for the next user space task on the same logical processor. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
|
#
7cc765a6 |
|
25-Nov-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Enable prctl mode for spectre_v2_user Now that all prerequisites are in place: - Add the prctl command line option - Default the 'auto' mode to 'prctl' - When SMT state changes, update the static key which controls the conditional STIBP evaluation on context switch. - At init update the static key which controls the conditional IBPB evaluation on context switch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
|
#
fa1202ef |
|
25-Nov-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Add command line control for indirect branch speculation Add command line control for user space indirect branch speculation mitigations. The new option is: spectre_v2_user= The initial options are: - on: Unconditionally enabled - off: Unconditionally disabled -auto: Kernel selects mitigation (default off for now) When the spectre_v2= command line argument is either 'on' or 'off' this implies that the application to application control follows that state even if a contradicting spectre_v2_user= argument is supplied. Originally-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de
|
#
2a5bf23d |
|
20-Nov-2018 |
Peter Zijlstra <peterz@infradead.org> |
perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling Kyle Huey reported that 'rr', a replay debugger, broke due to the following commit: af3bdb991a5c ("perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler") Rework the 'disable_counter_freezing' __setup() parameter such that we can explicitly enable/disable it and switch to default disabled. To this purpose, rename the parameter to "perf_v4_pmi=" which is a much better description and allows requiring a bool argument. [ mingo: Improved the changelog some more. ] Reported-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/20181120170842.GZ2131@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
806654a9 |
|
19-Nov-2018 |
Will Deacon <will@kernel.org> |
Documentation: Use "while" instead of "whilst" Whilst making an unrelated change to some Documentation, Linus sayeth: | Afaik, even in Britain, "whilst" is unusual and considered more | formal, and "while" is the common word. | | [...] | | Can we just admit that we work with computers, and we don't need to | use þe eald Englisc spelling of words that most of the world never | uses? dictionary.com refers to the word as "Chiefly British", which is probably an undesirable attribute for technical documentation. Replace all occurrences under Documentation/ with "while". Cc: David Howells <dhowells@redhat.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
ed8f6fb2 |
|
01-Oct-2018 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Document rcutorture forward-progress test kernel parameters Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
#
781f0766 |
|
19-Oct-2018 |
Kai-Heng Feng <kai.heng.feng@canonical.com> |
USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub Devices connected under Terminus Technology Inc. Hub (1a40:0101) may fail to work after the system resumes from suspend: [ 206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd [ 206.143691] usb 3-2.4: device descriptor read/64, error -32 [ 206.351671] usb 3-2.4: device descriptor read/64, error -32 Info for this hub: T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=480 MxCh= 4 D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1a40 ProdID=0101 Rev=01.11 S: Product=USB 2.0 Hub C: #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub Some expirements indicate that the USB devices connected to the hub are innocent, it's the hub itself is to blame. The hub needs extra delay time after it resets its port. Hence wait for extra delay, if the device is connected to this quirky hub. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable <stable@vger.kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
11295055 |
|
01-Nov-2018 |
Laurence Oberman <loberman@redhat.com> |
watchdog/core: Add watchdog_thresh command line parameter The hard and soft lockup detector threshold has a default value of 10 seconds which can only be changed via sysctl. During early boot lockup detection can trigger when noisy debugging emits a large amount of messages to the console, but there is no way to set a larger threshold on the kernel command line. The detector can only be completely disabled. Add a new watchdog_thresh= command line parameter to allow boot time control over the threshold. It works in the same way as the sysctl and affects both the soft and the hard lockup detectors. Signed-off-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: rdunlap@infradead.org Cc: prarit@redhat.com Link: https://lkml.kernel.org/r/1541079018-13953-1-git-send-email-loberman@redhat.com
|
#
f682a97a |
|
26-Oct-2018 |
Alexander Duyck <alexander.h.duyck@linux.intel.com> |
mm: provide kernel parameter to allow disabling page init poisoning Patch series "Address issues slowing persistent memory initialization", v5. The main thing this patch set achieves is that it allows us to initialize each node worth of persistent memory independently. As a result we reduce page init time by about 2 minutes because instead of taking 30 to 40 seconds per node and going through each node one at a time, we process all 4 nodes in parallel in the case of a 12TB persistent memory setup spread evenly over 4 nodes. This patch (of 3): On systems with a large amount of memory it can take a significant amount of time to initialize all of the page structs with the PAGE_POISON_PATTERN value. I have seen it take over 2 minutes to initialize a system with over 12TB of RAM. In order to work around the issue I had to disable CONFIG_DEBUG_VM and then the boot time returned to something much more reasonable as the arch_add_memory call completed in milliseconds versus seconds. However in doing that I had to disable all of the other VM debugging on the system. In order to work around a kernel that might have CONFIG_DEBUG_VM enabled on a system that has a large amount of memory I have added a new kernel parameter named "vm_debug" that can be set to "-" in order to disable it. Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9b8c7c14 |
|
10-Oct-2018 |
Kees Cook <keescook@chromium.org> |
LSM: Provide init debugging infrastructure Booting with "lsm.debug" will report future details on how LSM ordering decisions are being made. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com> Reviewed-by: James Morris <james.morris@microsoft.com> Signed-off-by: James Morris <james.morris@microsoft.com>
|
#
3a025de6 |
|
08-Oct-2018 |
Yi Sun <yi.y.sun@linux.intel.com> |
x86/hyperv: Enable PV qspinlock for Hyper-V Implement the required wait and kick callbacks to support PV spinlocks in Hyper-V guests. [ tglx: Document the requirement for disabling interrupts in the wait() callback. Remove goto and unnecessary includes. Add prototype for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Michael Kelley (EOSG) <Michael.H.Kelley@microsoft.com> Cc: chao.p.peng@intel.com Cc: chao.gao@intel.com Cc: isaku.yamahata@intel.com Cc: tianyu.lan@microsoft.com Link: https://lkml.kernel.org/r/1538987374-51217-3-git-send-email-yi.y.sun@linux.intel.com
|
#
d90fe2ac |
|
28-Sep-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Wire up memtest Add call to early_memtest() so that kernel compiled with CONFIG_MEMTEST really perform memtest at startup when requested via 'memtest' boot parameter. Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bd0e6c96 |
|
28-Sep-2018 |
Zeng Tao <prime.zeng@hisilicon.com> |
usb: hub: try old enumeration scheme first for high speed devices The new scheme is required just to support legacy low and full-speed devices. For high speed devices, it will slower the enumeration speed. So in this patch we try the "old" enumeration scheme first for high speed devices, and this is what Windows does since Windows 8. Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d2266bbf |
|
02-Oct-2018 |
Feng Tang <feng.tang@intel.com> |
x86/earlyprintk: Add a force option for pciserial device The "pciserial" earlyprintk variant helps much on many modern x86 platforms, but unfortunately there are still some platforms with PCI UART devices which have the wrong PCI class code. In that case, the current class code check does not allow for them to be used for logging. Add a sub-option "force" which overrides the class code check and thus the use of such device can be enforced. [ bp: massage formulations. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thymo van Beers <thymovanbeers@gmail.com> Cc: alan@linux.intel.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.com
|
#
af3bdb99 |
|
08-Aug-2018 |
Andi Kleen <ak@linux.intel.com> |
perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler Implements counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Differences to the old handler: - It relies on counter freezing, which eliminates several MSR writes from the PMI handler and lowers the overhead significantly. It makes the PMI handler more accurate, as all counters get frozen atomically as soon as any counter overflows. So there is much less counting of the PMI handler itself. With the freezing we don't need to disable or enable counters or PEBS. Only BTS which does not support auto-freezing still needs to be explicitly managed. - The PMU acking is done at the end, not the beginning. This makes it possible to avoid manual enabling/disabling of the PMU, instead we just rely on the freezing/acking. - The APIC is acked before reenabling the PMU, which avoids problems with LBRs occasionally not getting unfreezed on Skylake. - Looping is only needed to workaround a corner case which several PMIs are very close to each other. For common cases, the counters are freezed during PMI handler. It doesn't need to do re-check. This patch: - Adds code to enable v4 counter freezing - Fork <=v3 and >=v4 PMI handlers into separate functions. - Add kernel parameter to disable counter freezing. It took some time to debug counter freezing, so in case there are new problems we added an option to turn it off. Would not expect this to be used until there are new bugs. - Only for big core. The patch for small core will be posted later separately. Performance: When profiling a kernel build on Kabylake with different perf options, measuring the length of all NMI handlers using the nmi handler trace point: V3 is without counter freezing. V4 is with counter freezing. The value is the average cost of the PMI handler. (lower is better) perf options ` V3(ns) V4(ns) delta -c 100000 1088 894 -18% -g -c 100000 1862 1646 -12% --call-graph lbr -c 100000 3649 3367 -8% --c.g. dwarf -c 100000 2248 1982 -12% Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
68a6efe8 |
|
20-Sep-2018 |
Zhen Lei <thunder.leizhen@huawei.com> |
iommu: Add "iommu.strict" command line option Add a generic command line option to enable lazy unmapping via IOVA flush queues, which will initally be suuported by iommu-dma. This echoes the semantics of "intel_iommu=strict" (albeit with the opposite default value), but in the driver-agnostic fashion of "iommu.passthrough". Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> [rm: move handling out of SMMUv3 driver, clean up documentation] Signed-off-by: Robin Murphy <robin.murphy@arm.com> [will: dropped broken printk when parsing command-line option] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
9d723b4c |
|
20-Sep-2018 |
Robin Murphy <robin.murphy@arm.com> |
iommu: Fix passthrough option documentation Document that the default for "iommu.passthrough" is now configurable. Fixes: 58d1131777a4 ("iommu: Add config option to set passthrough as default") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
197ecb38 |
|
07-Sep-2018 |
Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> |
xen/balloon: add runtime control for scrubbing ballooned out pages Scrubbing pages on initial balloon down can take some time, especially in nested virtualization case (nested EPT is slow). When HVM/PVH guest is started with memory= significantly lower than maxmem=, all the extra pages will be scrubbed before returning to Xen. But since most of them weren't used at all at that point, Xen needs to populate them first (from populate-on-demand pool). In nested virt case (Xen inside KVM) this slows down the guest boot by 15-30s with just 1.5GB needed to be returned to Xen. Add runtime parameter to enable/disable it, to allow initially disabling scrubbing, then enable it back during boot (for example in initramfs). Such usage relies on assumption that a) most pages ballooned out during initial boot weren't used at all, and b) even if they were, very few secrets are in the guest at that time (before any serious userspace kicks in). Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also enabled by default), controlling default value for the new runtime switch. Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
9b254366 |
|
27-Aug-2018 |
Kees Cook <keescook@chromium.org> |
random: make CPU trust a boot parameter Instead of forcing a distro or other system builder to choose at build time whether the CPU is trusted for CRNG seeding via CONFIG_RANDOM_TRUST_CPU, provide a boot-time parameter for end users to control the choice. The CONFIG will set the default state instead. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c06aed0e |
|
25-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Compute jiffies_till_sched_qs from other kernel parameters The jiffies_till_sched_qs value used to determine how old a grace period must be before RCU enlists the help of the scheduler to force a quiescent state on the holdout CPU. Currently, this defaults to HZ/10 regardless of system size and may be set only at boot time. This can be a problem for very large systems, because if the values of the jiffies_till_first_fqs and jiffies_till_next_fqs kernel parameters are left at their defaults, they are calculated to increase as the number of CPUs actually configured on the system increases. Thus, on a sufficiently large system, RCU would enlist the help of the scheduler before the grace-period kthread had a chance to scan for idle CPUs, which wastes CPU time. This commit therefore allows jiffies_till_sched_qs to be set, if desired, but if left as default, computes is as jiffies_till_first_fqs plus twice jiffies_till_next_fqs, thus allowing three force-quiescent-state scans for idle CPUs. This scales with the number of CPUs, providing sensible default values. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
72ce30dd |
|
07-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Stop testing RCU-bh and RCU-sched Now that the RCU-bh and RCU-sched update-side functions are simple wrappers around their RCU counterparts, there isn't a whole lot of point in testing them. This commit therefore removes the self-test capability and removes the corresponding kernel-boot parameters. It also updates the various rcutorture .boot files to remove the kernel boot parameters that call for testing RCU-bh and RCU-sched. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
77095901 |
|
02-Jul-2018 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Update removal of RCU-bh/sched update machinery The RCU-bh update API is now defined in terms of that of RCU-bh and RCU-sched, so this commit updates the documentation accordingly. In addition, although RCU-sched persists in !PREEMPT kernels, in the PREEMPT case its update API is now defined in terms of that of RCU-preempt, so this commit also updates the documentation accordingly. While in the area, this commit removes the documentation for the now-obsolete synchronize_rcu_mult() and clarifies the Tasks RCU documentation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
8c9a134c |
|
21-Aug-2018 |
Kees Cook <keescook@chromium.org> |
mm: clarify CONFIG_PAGE_POISONING and usage The Kconfig text for CONFIG_PAGE_POISONING doesn't mention that it has to be enabled explicitly. This updates the documentation for that and adds a note about CONFIG_PAGE_POISONING to the "page_poison" command line docs. While here, change description of CONFIG_PAGE_POISONING_ZERO too, as it's not "random" data, but rather the fixed debugging value that would be used when not zeroing. Additionally removes a stray "bool" in the Kconfig. Link: http://lkml.kernel.org/r/20180725223832.GA43733@beast Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laura Abbott <labbott@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
aaca43fd |
|
30-Jul-2018 |
Logan Gunthorpe <logang@deltatee.com> |
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support To support peer-to-peer traffic on a segment of the PCI hierarchy, we must disable the ACS redirect bits for select PCI bridges. The bridges must be selected before the devices are discovered by the kernel and the IOMMU groups created. Therefore, add a kernel command line parameter to specify devices which must have their ACS bits disabled. The new parameter takes a list of devices separated by a semicolon. Each device specified will have its ACS redirect bits disabled. This is similar to the existing 'resource_alignment' parameter. The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P Egress Control bits are disabled, which is sufficient to always allow passing P2P traffic uninterrupted. The bits are set after the kernel (optionally) enables the ACS bits itself. It is also done regardless of whether the kernel or platform firmware sets the bits. If the user tries to disable the ACS redirect for a device without the ACS capability, print a warning to dmesg. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: reorder to add the generic code first and move the device-specific quirk to subsequent patches] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
|
#
45db3370 |
|
30-Jul-2018 |
Logan Gunthorpe <logang@deltatee.com> |
PCI: Allow specifying devices using a base bus and path of devfns When specifying PCI devices on the kernel command line using a bus/device/function address, bus numbers can change when adding or replacing a device, changing motherboard firmware, or applying kernel parameters like "pci=assign-buses". When bus numbers change, it's likely the command line tweak will be applied to the wrong device. Therefore, it is useful to be able to specify devices with a base bus number and the path of devfns needed to get to it, similar to the "device scope" structure in the Intel VT-d spec, Section 8.3.1. Thus, we add an option to specify devices in the following format: [<domain>:]<bus>:<device>.<func>[/<device>.<func>]* The path can be any segment within the PCI hierarchy of any length and determined through the use of 'lspci -t'. When specified this way, it is less likely that a renumbered bus will result in a valid device specification and the tweak won't be applied to the wrong device. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
|
#
07d8d7e5 |
|
30-Jul-2018 |
Logan Gunthorpe <logang@deltatee.com> |
PCI: Make specifying PCI devices in kernel parameters reusable Separate out the code to match a PCI device with a string (typically originating from a kernel parameter) from the pci_specified_resource_alignment() function into its own helper function. While we are at it, this change fixes the kernel style of the function (fixing a number of long lines and extra parentheses). Additionally, make the analogous change to the kernel parameter documentation: Separate the description of how to specify a PCI device into its own section at the head of the "pci=" parameter. This patch should have no functional alterations. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Christian König <christian.koenig@amd.com>
|
#
26cb1f36 |
|
27-Jul-2018 |
Diana Craciun <diana.craciun@nxp.com> |
Documentation: Add nospectre_v1 parameter Currently only supported on powerpc. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
58d11317 |
|
20-Jul-2018 |
Olof Johansson <olof@lixom.net> |
iommu: Add config option to set passthrough as default This allows the default behavior to be controlled by a kernel config option instead of changing the commandline for the kernel to include "iommu.passthrough=on" or "iommu=pt" on machines where this is desired. Likewise, for machines where this config option is enabled, it can be disabled at boot time with "iommu.passthrough=off" or "iommu=nopt". Also corrected iommu=pt documentation for IA-64, since it has no code that parses iommu= at all. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
fe9af81e |
|
19-Jul-2018 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
x86/tsc: Redefine notsc to behave as tsc=unstable Currently, the notsc kernel parameter disables the use of the TSC by sched_clock(). However, this parameter does not prevent the kernel from accessing tsc in other places. The only rationale to boot with notsc is to avoid timing discrepancies on multi-socket systems where TSC are not properly synchronized, and thus exclude TSC from being used for time keeping. But that prevents using TSC as sched_clock() as well, which is not necessary as the core sched_clock() implementation can handle non synchronized TSC based sched clocks just fine. However, there is another method to solve the above problem: booting with tsc=unstable parameter. This parameter allows sched_clock() to use TSC and just excludes it from timekeeping. So there is no real reason to keep notsc, but for compatibility reasons the parameter has to stay. Make it behave like 'tsc=unstable' instead. [ tglx: Massaged changelog ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
|
#
3672476e |
|
21-Jun-2018 |
Tobin C. Harding <me@tobin.cc> |
vsprintf: Add command line option debug_boot_weak_hash Currently printing [hashed] pointers requires enough entropy to be available. Early in the boot sequence this may not be the case resulting in a dummy string '(____ptrval____)' being printed. This makes debugging the early boot sequence difficult. We can relax the requirement to use cryptographically secure hashing during debugging. This enables debugging while keeping development/production kernel behaviour the same. If new command line option debug_boot_weak_hash is enabled use cryptographically insecure hashing and hash pointer value immediately. Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d90a7a0e |
|
13-Jul-2018 |
Jiri Kosina <jkosina@suse.cz> |
x86/bugs, kvm: Introduce boot-time control of L1TF mitigations Introduce the 'l1tf=' kernel command line option to allow for boot-time switching of mitigation that is used on processors affected by L1TF. The possible values are: full Provides all available mitigations for the L1TF vulnerability. Disables SMT and enables all mitigations in the hypervisors. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. full,force Same as 'full', but disables SMT control. Implies the 'nosmt=force' command line option. sysfs control of SMT and the hypervisor flush control is disabled. flush Leaves SMT enabled and enables the conditional hypervisor mitigation. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. flush,nosmt Disables SMT and enables the conditional hypervisor mitigation. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. If SMT is reenabled or flushing disabled at runtime hypervisors will issue a warning. flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is started in a potentially insecure configuration. off Disables hypervisor mitigations and doesn't emit any warnings. Default is 'flush'. Let KVM adhere to these semantics, which means: - 'lt1f=full,force' : Performe L1D flushes. No runtime control possible. - 'l1tf=full' - 'l1tf-flush' - 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if SMT has been runtime enabled or L1D flushing has been run-time enabled - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted. - 'l1tf=off' : L1D flushes are not performed and no warnings are emitted. KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush' module parameter except when lt1f=full,force is set. This makes KVM's private 'nosmt' option redundant, and as it is a bit non-systematic anyway (this is something to control globally, not on hypervisor level), remove that option. Add the missing Documentation entry for the l1tf vulnerability sysfs file while at it. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
|
#
028be12b |
|
08-May-2018 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Change units of onoff_interval to jiffies Some RCU bugs have been sensitive to the frequency of CPU-hotplug operations, which have been gradually increased over time. But this frequency is now at the one-second lower limit that can be specified using the rcutorture.onoff_interval kernel parameter. This commit therefore changes the units of rcutorture.onoff_interval from seconds to jiffies, and also sets the value specified for this kernel parameter in the TREE03 rcutorture scenario to 200, which is 200 milliseconds for HZ=1000. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
6b4c1360 |
|
09-Jul-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
Documentation: Add powerpc options for spec_store_bypass_disable Document the support for spec_store_bypass_disable that was added for powerpc in commit a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit"). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
819d731f |
|
05-Jul-2018 |
Laurentiu Tudor <laurentiu.tudor@nxp.com> |
docs: kernel-parameters.txt: document xhci-hcd.quirks parameter This parameter introduced several years ago in the XHCI host controller driver was somehow left undocumented. Add a few lines in the kernel parameters text. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
25b4e70d |
|
09-Jul-2018 |
Rob Herring <robh@kernel.org> |
driver core: allow stopping deferred probe after init Deferred probe will currently wait forever on dependent devices to probe, but sometimes a driver will never exist. It's also not always critical for a driver to exist. Platforms can rely on default configuration from the bootloader or reset defaults for things such as pinctrl and power domains. This is often the case with initial platform support until various drivers get enabled. There's at least 2 scenarios where deferred probe can render a platform broken. Both involve using a DT which has more devices and dependencies than the kernel supports. The 1st case is a driver may be disabled in the kernel config. The 2nd case is the kernel version may simply not have the dependent driver. This can happen if using a newer DT (provided by firmware perhaps) with a stable kernel version. Deferred probe issues can be difficult to debug especially if the console has dependencies or userspace fails to boot to a shell. There are also cases like IOMMUs where only built-in drivers are supported, so deferring probe after initcalls is not needed. The IOMMU subsystem implemented its own mechanism to handle this using OF_DECLARE linker sections. This commit adds makes ending deferred probe conditional on initcalls being completed or a debug timeout. Subsystems or drivers may opt-in by calling driver_deferred_probe_check_init_done() instead of unconditionally returning -EPROBE_DEFER. They may use additional information from DT or kernel's config to decide whether to continue to defer probe or not. The timeout mechanism is intended for debug purposes and WARNs loudly. The remaining deferred probe pending list will also be dumped after the timeout. Not that this timeout won't work for the console which needs to be enabled before userspace starts. However, if the console's dependencies are resolved, then the kernel log will be printed (as opposed to no output). Cc: Alexander Graf <agraf@suse.de> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
c0addc9a |
|
05-Jul-2018 |
Laurentiu Tudor <laurentiu.tudor@nxp.com> |
docs: kernel-parameters.txt: document xhci-hcd.quirks parameter This parameter introduced several years ago in the XHCI host controller driver was somehow left undocumented. Add a few lines in the kernel parameters text. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
a399477e |
|
01-Jul-2018 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86/KVM/VMX: Add module argument for L1TF mitigation Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka L1 terminal fault. The valid arguments are: - "always" L1D cache flush on every VMENTER. - "cond" Conditional L1D cache flush, explained below - "never" Disable the L1D cache flush mitigation "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed between VMEXIT and VMENTER is considered safe, i.e. is not bringing any interesting information into L1D which might exploited. [ tglx: Split out from a larger patch ] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
26acfb66 |
|
20-Jun-2018 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present If the L1TF CPU bug is present we allow the KVM module to be loaded as the major of users that use Linux and KVM have trusted guests and do not want a broken setup. Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as such they are the ones that should set nosmt to one. Setting 'nosmt' means that the system administrator also needs to disable SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line parameter, or via the /sys/devices/system/cpu/smt/control. See commit 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT"). Other mitigations are to use task affinity, cpu sets, interrupt binding, etc - anything to make sure that _only_ the same guests vCPUs are running on sibling threads. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b5cb15d9 |
|
03-Jul-2018 |
Chris von Recklinghausen <crecklin@redhat.com> |
usercopy: Allow boot cmdline disabling of hardening Enabling HARDENED_USERCOPY may cause measurable regressions in networking performance: up to 8% under UDP flood. I ran a small packet UDP flood using pktgen vs. a host b2b connected. On the receiver side the UDP packets are processed by a simple user space process that just reads and drops them: https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c Not very useful from a functional PoV, but it helps to pin-point bottlenecks in the networking stack. When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8% regression in the receive tput, compared to the same kernel without this option enabled. With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%). The call-chain is: __GI___libc_recvfrom entry_SYSCALL_64_after_hwframe do_syscall_64 __x64_sys_recvfrom __sys_recvfrom inet_recvmsg udp_recvmsg __check_object_size udp_recvmsg() actually calls copy_to_iter() (inlined) and the latters calls check_copy_size() (again, inlined). A generic distro may want to enable HARDENED_USERCOPY in their default kernel config, but at the same time, such distro may want to be able to avoid the performance penalties in with the default configuration and disable the stricter check on a per-boot basis. This change adds a boot parameter that conditionally disables HARDENED_USERCOPY via "hardened_usercopy=off". Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
506a66f3 |
|
29-Jun-2018 |
Thomas Gleixner <tglx@linutronix.de> |
Revert "x86/apic: Ignore secondary threads if nosmt=force" Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com>
|
#
11eb0e0e |
|
04-Jun-2018 |
Sinan Kaya <okaya@codeaurora.org> |
PCI: Make early dump functionality generic Move early dump functionality into common code so that it is available for all architectures. No need to carry arch-specific reads around as the read hooks are already initialized by the time pci_setup_device() is getting called during scan. Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
|
#
05736e4a |
|
29-May-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide knobs to control SMT Provide a command line and a sysfs knob to control SMT. The command line options are: 'nosmt': Enumerate secondary threads, but do not online them 'nosmt=force': Ignore secondary threads completely during enumeration via MP table and ACPI/MADT. The sysfs control file has the following states (read/write): 'on': SMT is enabled. Secondary threads can be freely onlined 'off': SMT is disabled. Secondary threads, even if enumerated cannot be onlined 'forceoff': SMT is permanentely disabled. Writes to the control file are rejected. 'notsupported': SMT is not supported by the CPU The command line option 'nosmt' sets the sysfs control to 'off'. This can be changed to 'on' to reenable SMT during runtime. The command line option 'nosmt=force' sets the sysfs control to 'forceoff'. This cannot be changed during runtime. When SMT is 'on' and the control file is changed to 'off' then all online secondary threads are offlined and attempts to online a secondary thread later on are rejected. When SMT is 'off' and the control file is changed to 'on' then secondary threads can be onlined again. The 'off' -> 'on' transition does not automatically online the secondary threads. When the control file is set to 'forceoff', the behaviour is the same as setting it to 'off', but the operation is irreversible and later writes to the control file are rejected. When the control status is 'notsupported' then writes to the control file are rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
|
#
1ca2c806 |
|
14-Jun-2018 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
kernel-parameters.txt: fix pointers to sound parameters The alsa parameters file was renamed to alsa-configuration.rst. With regards to OSS, it got retired as a hole by at changeset 727dede0ba8a ("sound: Retire OSS"). So, it doesn't make sense to keep mentioning it at kernel-parameters.txt. Fixes: 727dede0ba8a ("sound: Retire OSS") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
|
#
5fb94e9c |
|
08-May-2018 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: Fix some broken references As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
|
#
a43ae4df |
|
29-May-2018 |
Marc Zyngier <maz@kernel.org> |
arm64: Add 'ssbd' command-line option On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit. In any case, default to the mitigation being enabled. Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
a49d9c0a |
|
21-May-2018 |
Omar Sandoval <osandov@fb.com> |
Documentation: document hung_task_panic kernel parameter This parameter has been around since commit e162b39a368f ("softlockup: decouple hung tasks check from softlockup detection") in 2009 but was never documented. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
06e9552f |
|
27-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
x86/pci-dma: remove the experimental forcesac boot option Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying the huge overhead of an IOMMU is rather pointless, and this seriously gets in the way of dma mapping work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
#
45c9a74f |
|
14-May-2018 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge Now that the administrative information for transparent huge pages is nicely separated, move it to its own page under the admin guide. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
372fddf7 |
|
18-May-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Introduce the 'no5lvl' kernel parameter This kernel parameter allows to force kernel to use 4-level paging even if hardware and kernel support 5-level paging. The option may be useful to work around regressions related to 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
43f1831b |
|
03-May-2018 |
Karthikeyan Ramasubramanian <kramasub@codeaurora.org> |
tty: serial: qcom_geni_serial: Add early console support Add early console support in Qualcomm Technologies Inc., GENI based UART controller. Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org> Signed-off-by: Girish Mahadevan <girishm@codeaurora.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
cef74409 |
|
10-May-2018 |
Gil Kupfer <gilkup@gmail.com> |
PCI: Add "pci=noats" boot parameter Adds a "pci=noats" boot parameter. When supplied, all ATS related functions fail immediately and the IOMMU is configured to not use device-IOTLB. Any function that checks for ATS capabilities directly against the devices should also check this flag. Currently, such functions exist only in IOMMU drivers, and they are covered by this patch. The motivation behind this patch is the existence of malicious devices. Lots of research has been done about how to use the IOMMU as protection from such devices. When ATS is supported, any I/O device can access any physical address by faking device-IOTLB entries. Adding the ability to ignore these entries lets sysadmins enhance system security. Signed-off-by: Gil Kupfer <gilkup@cs.technion.ac.il> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Joerg Roedel <jroedel@suse.de>
|
#
18bcaa4e |
|
07-May-2018 |
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> |
docs: driver-api: add clk documentation The clk.rst is already in ReST format. So, move it to the driver-api guide, where it belongs. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
f21b53b2 |
|
03-May-2018 |
Kees Cook <keescook@chromium.org> |
x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass Unless explicitly opted out of, anything running under seccomp will have SSB mitigations enabled. Choosing the "prctl" mode will disable this. [ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a73ec77e |
|
29-Apr-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Add prctl for Speculative Store Bypass mitigation Add prctl based control for Speculative Store Bypass mitigation and make it the default mitigation for Intel and AMD. Andi Kleen provided the following rationale (slightly redacted): There are multiple levels of impact of Speculative Store Bypass: 1) JITed sandbox. It cannot invoke system calls, but can do PRIME+PROBE and may have call interfaces to other code 2) Native code process. No protection inside the process at this level. 3) Kernel. 4) Between processes. The prctl tries to protect against case (1) doing attacks. If the untrusted code can do random system calls then control is already lost in a much worse way. So there needs to be system call protection in some way (using a JIT not allowing them or seccomp). Or rather if the process can subvert its environment somehow to do the prctl it can already execute arbitrary code, which is much worse than SSB. To put it differently, the point of the prctl is to not allow JITed code to read data it shouldn't read from its JITed sandbox. If it already has escaped its sandbox then it can already read everything it wants in its address space, and do much worse. The ability to control Speculative Store Bypass allows to enable the protection selectively without affecting overall system performance. Based on an initial patch from Tim Chen. Completely rewritten. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
24f7fc83 |
|
25-Apr-2018 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation Contemporary high performance processors use a common industry-wide optimization known as "Speculative Store Bypass" in which loads from addresses to which a recent store has occurred may (speculatively) see an older value. Intel refers to this feature as "Memory Disambiguation" which is part of their "Smart Memory Access" capability. Memory Disambiguation can expose a cache side-channel attack against such speculatively read values. An attacker can create exploit code that allows them to read memory outside of a sandbox environment (for example, malicious JavaScript in a web page), or to perform more complex attacks against code running within the same privilege level, e.g. via the stack. As a first step to mitigate against such attacks, provide two boot command line control knobs: nospec_store_bypass_disable spec_store_bypass_disable=[off,auto,on] By default affected x86 processors will power on with Speculative Store Bypass enabled. Hence the provided kernel parameters are written from the point of view of whether to enable a mitigation or not. The parameters are as follows: - auto - Kernel detects whether your CPU model contains an implementation of Speculative Store Bypass and picks the most appropriate mitigation. - on - disable Speculative Store Bypass - off - enable Speculative Store Bypass [ tglx: Reordered the checks so that the whole evaluation is not done when the CPU does not support RDS ] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
|
#
6dddd7a7 |
|
18-Apr-2018 |
Thymo van Beers <thymovanbeers@gmail.com> |
docs: kernel-parameters.txt: Fix whitespace Some lines used spaces instead of tabs at line start. This can cause mangled lines in editors due to inconsistency. Replace spaces for tabs where appropriate. Signed-off-by: Thymo van Beers <thymovanbeers@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
5d12f042 |
|
17-Apr-2018 |
Borislav Petkov <bp@suse.de> |
x86/dumpstack: Remove code_bytes This was added by 86c418374223 ("[PATCH] i386: add option to show more code in oops reports") long time ago but experience shows that 64 instruction bytes are plenty when deciphering an oops. So get rid of it. Removing it will simplify further enhancements to the opcodes dumping machinery coming in the following patches. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180417161124.5294-2-bp@alien8.de
|
#
ad56b738 |
|
21-Mar-2018 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
docs/vm: rename documentation files to .rst Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
a5c6d650 |
|
05-Apr-2018 |
David Rientjes <rientjes@google.com> |
mm, page_alloc: extend kernelcore and movablecore for percent Both kernelcore= and movablecore= can be used to define the amount of ZONE_NORMAL and ZONE_MOVABLE on a system, respectively. This requires the system memory capacity to be known when specifying the command line, however. This introduces the ability to define both kernelcore= and movablecore= as a percentage of total system memory. This is convenient for systems software that wants to define the amount of ZONE_MOVABLE, for example, as a proportion of a system's memory rather than a hardcoded byte value. To define the percentage, the final character of the parameter should be a '%'. mhocko: "why is anyone using these options nowadays?" rientjes: : : Fragmentation of non-__GFP_MOVABLE pages due to low on memory : situations can pollute most pageblocks on the system, as much as 1GB of : slab being fragmented over 128GB of memory, for example. When the : amount of kernel memory is well bounded for certain systems, it is : better to aggressively reclaim from existing MIGRATE_UNMOVABLE : pageblocks rather than eagerly fallback to others. : : We have additional patches that help with this fragmentation if you're : interested, specifically kcompactd compaction of MIGRATE_UNMOVABLE : pageblocks triggered by fallback of non-__GFP_MOVABLE allocations and : draining of pcp lists back to the zone free area to prevent stranding. [rientjes@google.com: updates] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1802131700160.71590@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1802121622470.179479@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4c0fd764 |
|
09-Mar-2018 |
Bjorn Helgaas <bhelgaas@google.com> |
PCI/portdrv: Remove unnecessary "pcie_ports=auto" parameter The "pcie_ports=auto" parameter set pcie_ports_disabled and pcie_ports_auto to their compiled-in defaults, so specifying the parameter is the same as not using it at all. Remove the "pcie_ports=auto" parameter and update the documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
1e447c57 |
|
09-Mar-2018 |
Bjorn Helgaas <bhelgaas@google.com> |
PCI/portdrv: Remove "pcie_hp=nomsi" kernel parameter 7570a333d8b0 ("PCI: Add pcie_hp=nomsi to disable MSI/MSI-X for pciehp driver") added the "pcie_hp=nomsi" kernel parameter to work around this error on shutdown: irq 16: nobody cared (try booting with the "irqpoll" option) Pid: 1081, comm: reboot Not tainted 3.2.0 #1 ... Disabling IRQ #16 This happened on an unspecified system (possibly involving the Integrated Device Technology, Inc. Device 807f bridge) where "an un-wanted interrupt is generated when PCI driver switches from MSI/MSI-X to INTx while shutting down the device." The implication was that the device was buggy, but it is normal for a device to use INTx after MSI/MSI-X have been disabled. The only problem was that the driver was still attached and it wasn't prepared for INTx interrupts. Prarit Bhargava fixed this issue with fda78d7a0ead ("PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()"). There is no automated way to set this parameter, so it's not very useful for distributions or end users. It's really only useful for debugging, and we have "pci=nomsi" for that purpose. Revert 7570a333d8b0 to remove the "pcie_hp=nomsi" parameter. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com> CC: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> CC: Prarit Bhargava <prarit@redhat.com>
|
#
4d8d5a39 |
|
23-Mar-2018 |
Kai-Heng Feng <kai.heng.feng@canonical.com> |
usb: core: Add USB_QUIRK_DELAY_CTRL_MSG to usbcore quirks There's a new quirk, USB_QUIRK_DELAY_CTRL_MSG. Add it to usbcore quirks for completeness. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
2ddc8e2d |
|
21-Mar-2018 |
Filip Alac <filipalac@gmail.com> |
HID: usbhid: extend the polling interval configuration to keyboards For mouse and joystick devices user can change the polling interval via usbhid.mousepoll and usbhid.jspoll. Implement the same thing for keyboards, so user can reduce(or increase) input latency this way. This has been tested with a Cooler Master Devastator with kbpoll=32, resulting in delay between events of 32 ms(values were taken from evtest). Signed-off-by: Filip Alac <filipalac@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
9e67028e |
|
21-Feb-2018 |
Mimi Zohar <zohar@linux.vnet.ibm.com> |
ima: fail signature verification based on policy This patch addresses the fuse privileged mounted filesystems in environments which are unwilling to accept the risk of trusting the signature verification and want to always fail safe, but are for example using a pre-built kernel. This patch defines a new builtin policy named "fail_securely", which can be specified on the boot command line as an argument to "ima_policy=". Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Dongsu Park <dongsu@kinvolk.io> Cc: Alban Crequy <alban@kinvolk.io> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
027bd6ca |
|
19-Mar-2018 |
Kai-Heng Feng <kai.heng.feng@canonical.com> |
usb: core: Add "quirks" parameter for usbcore Trying quirks in usbcore needs to rebuild the driver or the entire kernel if it's builtin. It can save a lot of time if usbcore has similar ability like "usbhid.quirks=" and "usb-storage.quirks=". Rename the original quirk detection function to "static" as we introduce this new "dynamic" function. Now users can use "usbcore.quirks=" as short term workaround before the next kernel release. Also, the quirk parameter can XOR the builtin quirks for debugging purpose. This is inspired by usbhid and usb-storage. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
c4ae60e4 |
|
12-Mar-2018 |
Liran Alon <liran.alon@oracle.com> |
KVM: x86: Add module parameter for supporting VMware backdoor Support access to VMware backdoor requires KVM to intercept #GP exceptions from guest which introduce slight performance hit. Therefore, control this support by module parameter. Note that module parameter is exported as it should be consumed by kvm_intel & kvm_amd to determine if they should intercept #GP or not. This commit doesn't change semantics. It is done as a preparation for future commits. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4ba66a97 |
|
07-Mar-2018 |
Arnd Bergmann <arnd@arndb.de> |
arch: remove blackfin port The Analog Devices Blackfin port was added in 2007 and was rather active for a while, but all work on it has come to a standstill over time, as Analog have changed their product line-up. Aaron Wu confirmed that the architecture port is no longer relevant, and multiple people suggested removing blackfin independently because of some of its oddities like a non-working SMP port, and the amount of duplication between the chip variants, which cause extra work when doing cross-architecture changes. Link: https://docs.blackfin.uclinux.org/ Acked-by: Aaron Wu <Aaron.Wu@analog.com> Acked-by: Bryan Wu <cooloney@gmail.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Mike Frysinger <vapier@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
f736d65d |
|
25-Feb-2018 |
Marc Zyngier <maz@kernel.org> |
irqchip/gic-v3: Allow LPIs to be disabled from the command line For most GICv3 implementations, enabling LPIs is a one way switch. Once they're on, there is no turning back, which completely kills kexec (pending tables will always be live, and we can't tell the secondary kernel where they are). This is really annoying if you plan to use Linux as a bootloader, as it pretty much guarantees that the secondary kernel won't be able to use MSIs, and may even see some memory corruption. Bad. A workaround for this unfortunate situation is to allow the kernel not to enable LPIs, even if the feature is present in the HW. This would allow Linux-as-a-bootloader to leave LPIs alone, and let the secondary kernel to do whatever it wants with them. Let's introduce a boolean "irqchip.gicv3_nolpi" command line option that serves that purpose. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
#
95713fb8 |
|
12-Mar-2018 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Revert "usb: core: Add "quirks" parameter for usbcore" This reverts commit b27560e4d9e5240b5544c9c5650c7442e482646e as it breaks the build for some arches :( Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1d1d53f85ddd..70a7398c20e2 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4368,6 +4368,61 @@ usbcore.nousb [USB] Disable the USB subsystem + usbcore.quirks= + [USB] A list of quirks entries to supplement or + override the built-in usb core quirk list. List + entries are separated by commas. Each entry has + the form VID:PID:Flags where VID and PID are Vendor + and Product ID values (4-digit hex numbers) and + Flags is a set of characters, each corresponding + to a common usb core quirk flag as follows: + a = USB_QUIRK_STRING_FETCH_255 (string + descriptors must not be fetched using + a 255-byte read); + b = USB_QUIRK_RESET_RESUME (device can't resume + correctly so reset it instead); + c = USB_QUIRK_NO_SET_INTF (device can't handle + Set-Interface requests); + d = USB_QUIRK_CONFIG_INTF_STRINGS (device can't + handle its Configuration or Interface + strings); + e = USB_QUIRK_RESET (device can't be reset + (e.g morph devices), don't use reset); + f = USB_QUIRK_HONOR_BNUMINTERFACES (device has + more interface descriptions than the + bNumInterfaces count, and can't handle + talking to these interfaces); + g = USB_QUIRK_DELAY_INIT (device needs a pause + during initialization, after we read + the device descriptor); + h = USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL (For + high speed and super speed interrupt + endpoints, the USB 2.0 and USB 3.0 spec + require the interval in microframes (1 + microframe = 125 microseconds) to be + calculated as interval = 2 ^ + (bInterval-1). + Devices with this quirk report their + bInterval as the result of this + calculation instead of the exponent + variable used in the calculation); + i = USB_QUIRK_DEVICE_QUALIFIER (device can't + handle device_qualifier descriptor + requests); + j = USB_QUIRK_IGNORE_REMOTE_WAKEUP (device + generates spurious wakeup, ignore + remote wakeup capability); + k = USB_QUIRK_NO_LPM (device can't handle Link + Power Management); + l = USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL + (Device reports its bInterval as linear + frames instead of the USB 2.0 + calculation); + m = USB_QUIRK_DISCONNECT_SUSPEND (Device needs + to be disconnected before suspend to + prevent spurious wakeup) + Example: quirks=0781:5580:bk,0a5c:5834:gij + usbhid.mousepoll= [USBHID] The interval which mice are to be polled at. diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index f4a548471f0f..42faaeead81b 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -11,6 +11,143 @@ #include <linux/usb/hcd.h> #include "usb.h" +struct quirk_entry { + u16 vid; + u16 pid; + u32 flags; +}; + +static DEFINE_MUTEX(quirk_mutex); + +static struct quirk_entry *quirk_list; +static unsigned int quirk_count; + +static char quirks_param[128]; + +static int quirks_param_set(const char *val, const struct kernel_param *kp) +{ + char *p, *field; + u16 vid, pid; + u32 flags; + size_t i; + + mutex_lock(&quirk_mutex); + + if (!val || !*val) { + quirk_count = 0; + kfree(quirk_list); + quirk_list = NULL; + goto unlock; + } + + for (quirk_count = 1, i = 0; val[i]; i++) + if (val[i] == ',') + quirk_count++; + + if (quirk_list) { + kfree(quirk_list); + quirk_list = NULL; + } + + quirk_list = kcalloc(quirk_count, sizeof(struct quirk_entry), + GFP_KERNEL); + if (!quirk_list) { + mutex_unlock(&quirk_mutex); + return -ENOMEM; + } + + for (i = 0, p = (char *)val; p && *p;) { + /* Each entry consists of VID:PID:flags */ + field = strsep(&p, ":"); + if (!field) + break; + + if (kstrtou16(field, 16, &vid)) + break; + + field = strsep(&p, ":"); + if (!field) + break; + + if (kstrtou16(field, 16, &pid)) + break; + + field = strsep(&p, ","); + if (!field || !*field) + break; + + /* Collect the flags */ + for (flags = 0; *field; field++) { + switch (*field) { + case 'a': + flags |= USB_QUIRK_STRING_FETCH_255; + break; + case 'b': + flags |= USB_QUIRK_RESET_RESUME; + break; + case 'c': + flags |= USB_QUIRK_NO_SET_INTF; + break; + case 'd': + flags |= USB_QUIRK_CONFIG_INTF_STRINGS; + break; + case 'e': + flags |= USB_QUIRK_RESET; + break; + case 'f': + flags |= USB_QUIRK_HONOR_BNUMINTERFACES; + break; + case 'g': + flags |= USB_QUIRK_DELAY_INIT; + break; + case 'h': + flags |= USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL; + break; + case 'i': + flags |= USB_QUIRK_DEVICE_QUALIFIER; + break; + case 'j': + flags |= USB_QUIRK_IGNORE_REMOTE_WAKEUP; + break; + case 'k': + flags |= USB_QUIRK_NO_LPM; + break; + case 'l': + flags |= USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL; + break; + case 'm': + flags |= USB_QUIRK_DISCONNECT_SUSPEND; + break; + /* Ignore unrecognized flag characters */ + } + } + + quirk_list[i++] = (struct quirk_entry) + { .vid = vid, .pid = pid, .flags = flags }; + } + + if (i < quirk_count) + quirk_count = i; + +unlock: + mutex_unlock(&quirk_mutex); + + return param_set_copystring(val, kp); +} + +static const struct kernel_param_ops quirks_param_ops = { + .set = quirks_param_set, + .get = param_get_string, +}; + +static struct kparam_string quirks_param_string = { + .maxlen = sizeof(quirks_param), + .string = quirks_param, +}; + +module_param_cb(quirks, &quirks_param_ops, &quirks_param_string, 0644); +MODULE_PARM_DESC(quirks, "Add/modify USB quirks by specifying quirks=vendorID:productID:quirks"); + /* Lists of quirky USB devices, split in device quirks and interface quirks. * Device quirks are applied at the very beginning of the enumeration process, * right after reading the device descriptor. They can thus only match on device @@ -320,8 +457,8 @@ static int usb_amd_resume_quirk(struct usb_device *udev) return 0; } -static u32 __usb_detect_quirks(struct usb_device *udev, - const struct usb_device_id *id) +static u32 usb_detect_static_quirks(struct usb_device *udev, + const struct usb_device_id *id) { u32 quirks = 0; @@ -339,21 +476,43 @@ static u32 __usb_detect_quirks(struct usb_device *udev, return quirks; } +static u32 usb_detect_dynamic_quirks(struct usb_device *udev) +{ + u16 vid = le16_to_cpu(udev->descriptor.idVendor); + u16 pid = le16_to_cpu(udev->descriptor.idProduct); + int i, flags = 0; + + mutex_lock(&quirk_mutex); + + for (i = 0; i < quirk_count; i++) { + if (vid == quirk_list[i].vid && pid == quirk_list[i].pid) { + flags = quirk_list[i].flags; + break; + } + } + + mutex_unlock(&quirk_mutex); + + return flags; +} + /* * Detect any quirks the device has, and do any housekeeping for it if needed. */ void usb_detect_quirks(struct usb_device *udev) { - udev->quirks = __usb_detect_quirks(udev, usb_quirk_list); + udev->quirks = usb_detect_static_quirks(udev, usb_quirk_list); /* * Pixart-based mice would trigger remote wakeup issue on AMD * Yangtze chipset, so set them as RESET_RESUME flag. */ if (usb_amd_resume_quirk(udev)) - udev->quirks |= __usb_detect_quirks(udev, + udev->quirks |= usb_detect_static_quirks(udev, usb_amd_resume_quirk_list); + udev->quirks ^= usb_detect_dynamic_quirks(udev); + if (udev->quirks) dev_dbg(&udev->dev, "USB quirks for this device: %x\n", udev->quirks); @@ -372,7 +531,7 @@ void usb_detect_interface_quirks(struct usb_device *udev) { u32 quirks; - quirks = __usb_detect_quirks(udev, usb_interface_quirk_list); + quirks = usb_detect_static_quirks(udev, usb_interface_quirk_list); if (quirks == 0) return; @@ -380,3 +539,11 @@ void usb_detect_interface_quirks(struct usb_device *udev) quirks); udev->quirks |= quirks; } + +void usb_release_quirk_list(void) +{ + mutex_lock(&quirk_mutex); + kfree(quirk_list); + quirk_list = NULL; + mutex_unlock(&quirk_mutex); +} diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c index 2f5fbc56a9dd..0adb6345ff2e 100644 --- a/drivers/usb/core/usb.c +++ b/drivers/usb/core/usb.c @@ -1259,6 +1259,7 @@ static void __exit usb_exit(void) if (usb_disabled()) return; + usb_release_quirk_list(); usb_deregister_device_driver(&usb_generic_driver); usb_major_cleanup(); usb_deregister(&usbfs_driver); diff --git a/drivers/usb/core/usb.h b/drivers/usb/core/usb.h index 149cc7480971..546a2219454b 100644 --- a/drivers/usb/core/usb.h +++ b/drivers/usb/core/usb.h @@ -36,6 +36,7 @@ extern void usb_deauthorize_interface(struct usb_interface *); extern void usb_authorize_interface(struct usb_interface *); extern void usb_detect_quirks(struct usb_device *udev); extern void usb_detect_interface_quirks(struct usb_device *udev); +extern void usb_release_quirk_list(void); extern int usb_remove_device(struct usb_device *udev); extern int usb_get_device_descriptor(struct usb_device *dev,
|
#
b27560e4 |
|
07-Mar-2018 |
Kai-Heng Feng <kai.heng.feng@canonical.com> |
usb: core: Add "quirks" parameter for usbcore Trying quirks in usbcore needs to rebuild the driver or the entire kernel if it's builtin. It can save a lot of time if usbcore has similar ability like "usbhid.quirks=" and "usb-storage.quirks=". Rename the original quirk detection function to "static" as we introduce this new "dynamic" function. Now users can use "usbcore.quirks=" as short term workaround before the next kernel release. Also, the quirk parameter can XOR the builtin quirks for debugging purpose. This is inspired by usbhid and usb-storage. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
11dd2666 |
|
05-Mar-2018 |
Greg Edwards <gedwards@ddn.com> |
audit: do not panic on invalid boot parameter If you pass in an invalid audit boot parameter value, e.g. "audit=off", the kernel panics very early in boot before the regular console is initialized. Unless you have earlyprintk enabled, there is no indication of what the problem is on the console. Convert the panic() calls to pr_err(), and leave auditing enabled if an invalid parameter value was passed in. Modify the parameter to also accept "on" or "off" as valid values, and update the documentation accordingly. Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
ef61f8a3 |
|
02-Feb-2018 |
Jan H. Schönherr <jschoenh@amazon.de> |
x86/boot/e820: Implement a range manipulation operator Add a more versatile memmap= operator, which -- in addition to all the things that were possible before -- allows you to: - redeclare existing ranges -- before, you were limited to adding ranges; - drop any range -- like a mem= for any location; - use any e820 memory type -- not just some predefined ones. The syntax is: memmap=<size>%<offset>-<oldtype>+<newtype> Size and offset work as usual. The "-<oldtype>" and "+<newtype>" are optional and their existence determine the behavior: The command works on the specified range of memory limited to type <oldtype> (if specified). This memory is then configured to show up as <newtype>. If <newtype> is not specified, the memory is removed from the e820 map. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180202231020.15608-1-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
29891061 |
|
24-Oct-2017 |
James Hogan <jhogan@kernel.org> |
docs: Remove metag docs Now that arch/metag/ has been removed, remove Meta architecture specific documentation from the Documentation/ directory. Signed-off-by: James Hogan <jhogan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-metag@vger.kernel.org Cc: linux-doc@vger.kernel.org
|
#
083c6eea |
|
20-Feb-2018 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Update nohz documentation to explain tick offload Update the documentation to reflect the 1Hz tick offload changes. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1519186649-3242-8-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0231d000 |
|
18-Jan-2018 |
Prarit Bhargava <prarit@redhat.com> |
ACPI: SPCR: Make SPCR available to x86 SPCR is currently only enabled or ARM64 and x86 can use SPCR to setup an early console. General fixes include updating Documentation & Kconfig (for x86), updating comments, and changing parse_spcr() to acpi_parse_spcr(), and earlycon_init_is_deferred to earlycon_acpi_spcr_enable to be more descriptive. On x86, many systems have a valid SPCR table but the table version is not 2 so the table version check must be a warning. On ARM64 when the kernel parameter earlycon is used both the early console and console are enabled. On x86, only the earlycon should be enabled by by default. Modify acpi_parse_spcr() to allow options for initializing the early console and console separately. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
12c69f1e |
|
30-Jan-2018 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/paravirt: Remove 'noreplace-paravirt' cmdline option The 'noreplace-paravirt' option disables paravirt patching, leaving the original pv indirect calls in place. That's highly incompatible with retpolines, unless we want to uglify paravirt even further and convert the paravirt calls to retpolines. As far as I can tell, the option doesn't seem to be useful for much other than introducing surprising corner cases and making the kernel vulnerable to Spectre v2. It was probably a debug option from the early paravirt days. So just remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
|
#
31516de3 |
|
20-Dec-2017 |
Fenghua Yu <fenghua.yu@intel.com> |
x86/intel_rdt: Add command line parameter to control L2_CDP L2 CDP can be controlled by kernel parameter "rdt=". If "rdt=l2cdp", L2 CDP is turned on. If "rdt=!l2cdp", L2 CDP is turned off. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: Vikas" <vikas.shivappa@intel.com> Cc: Sai Praneeth" <sai.praneeth.prakhya@intel.com> Cc: Reinette" <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/1513810644-78015-7-git-send-email-fenghua.yu@intel.com
|
#
da285121 |
|
11-Jan-2018 |
David Woodhouse <dwmw@amazon.co.uk> |
x86/spectre: Add boot time option to select Spectre v2 mitigation Add a spectre_v2= option to select the mitigation used for the indirect branch speculation vulnerability. Currently, the only option available is retpoline, in its various forms. This will be expanded to cover the new IBRS/IBPB microcode features. The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a serializing instruction, which is indicated by the LFENCE_RDTSC feature. [ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS integration becomes simple ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: thomas.lendacky@amd.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
|
#
f32ab754 |
|
11-Jan-2018 |
=?UTF-8?q?Christian=20K=C3=B6nig?= <ckoenig.leichtzumerken@gmail.com> |
x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows Only try to enable a 64-bit window on AMD CPUs when "pci=big_root_window" is specified. This taints the kernel because the new 64-bit window uses address space we don't know anything about, and it may contain unreported devices or memory that would conflict with the window. The pci_amd_enable_64bit_bar() quirk that enables the window is specific to AMD CPUs. The generic solution would be to have the firmware enable the window and describe it in the host bridge's _CRS method, or at least describe it in the _PRS method so the OS would have the option of enabling it. Signed-off-by: Christian König <christian.koenig@amd.com> [bhelgaas: changelog, extend doc, mention taint in dmesg] Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
|
#
dba04eb7 |
|
08-Jan-2018 |
David Sterba <dsterba@suse.com> |
locking/Documentation: Remove stale crossrelease_fullstack parameter The cross-release lockdep functionality has been removed in: e966eaeeb623: ("locking/lockdep: Remove the cross-release locking checks") ... leaving the kernel parameter docs behind. The code handling the parameter does not exist so this is a plain documentation change. Signed-off-by: David Sterba <dsterba@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20180108152731.27613-1-dsterba@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
01c9b17b |
|
05-Jan-2018 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/Documentation: Add PTI description Add some details about how PTI works, what some of the downsides are, and how to debug it when things go wrong. Also document the kernel parameter: 'pti/nopti'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Moritz Lipp <moritz.lipp@iaik.tugraz.at> Cc: Daniel Gruss <daniel.gruss@iaik.tugraz.at> Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at> Cc: Richard Fellner <richard.fellner@student.tugraz.at> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com
|
#
cca10d58 |
|
20-Dec-2017 |
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> |
printk: add console_msg_format command line option 0day and kernelCI automatically parse kernel log - basically some sort of grepping using the pre-defined text patterns - in order to detect and report regressions/errors. There are several sources they get the kernel logs from: a) dmesg or /proc/ksmg This is the preferred way. Because `dmesg --raw' (see later Note) and /proc/kmsg output contains facility and log level, which greatly simplifies grepping for EMERG/ALERT/CRIT/ERR messages. b) serial consoles This option is harder to maintain, because serial console messages don't contain facility and log level. This patch introduces a `console_msg_format=' command line option, to switch between different message formatting on serial consoles. For the time being we have just two options - default and syslog. The "default" option just keeps the existing format. While the "syslog" option makes serial console messages to appear in syslog format [syslog() syscall], matching the `dmesg -S --raw' and `cat /proc/kmsg' output formats: - facility and log level - time stamp (depends on printk_time/PRINTK_TIME) - message <%u>[time stamp] text\n NOTE: while Kevin and Fengguang talk about "dmesg --raw", it's actually "dmesg -S --raw" that always prints messages in syslog format [per Petr Mladek]. Running "dmesg --raw" may produce output in non-syslog format sometimes. console_msg_format=syslog enables syslog format, thus in documentation we mention "dmesg -S --raw", not "dmesg --raw". Per Kevin Hilman: : Right now we can get this info from a "dmesg --raw" after bootup, : but it would be really nice in certain automation frameworks to : have a kernel command-line option to enable printing of loglevels : in default boot log. : : This is especially useful when ingesting kernel logs into advanced : search/analytics frameworks (I'm playing with and ELK stack: Elastic : Search, Logstash, Kibana). : : The other important reason for having this on the command line is that : for testing linux-next (and other bleeding edge developer branches), : it's common that we never make it to userspace, so can't even run : "dmesg --raw" (or equivalent.) So we really want this on the primary : boot (serial) console. Per Fengguang Wu, 0day scripts should quickly benefit from that feature, because they will be able to switch to a more reliable parsing, based on messages' facility and log levels [1]: `#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg )' instead of doing text pattern matching `#{grep} -a -F -f /lkp/printk-error-messages #{kmsg_file} | grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern | grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist` [1] https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb Link: http://lkml.kernel.org/r/20171221054149.4398-1-sergey.senozhatsky@gmail.com To: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Mark Brown <broonie@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
|
#
64e05d11 |
|
03-Dec-2017 |
Dou Liyang <douly.fnst@cn.fujitsu.com> |
x86/apic: Update the 'apic=' description of setting APIC driver There are two consumers of apic=: the APIC debug level and the low level generic architecture code, but Linux just documented the first one. Append the second description. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Cc: rdunlap@infradead.org Cc: corbet@lwn.net Link: https://lkml.kernel.org/r/20171204040313.24824-2-douly.fnst@cn.fujitsu.com
|
#
41f4c20b |
|
12-Dec-2017 |
Borislav Petkov <bp@suse.de> |
x86/pti: Add the pti= cmdline option and documentation Keep the "nopti" optional for traditional reasons. [ tglx: Don't allow force on when running on XEN PV and made 'on' printout conditional ] Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
aa8c6248 |
|
04-Dec-2017 |
Thomas Gleixner <tglx@linutronix.de> |
x86/mm/pti: Add infrastructure for page table isolation Add the initial files for kernel page table isolation, with a minimal init function and the boot time detection for this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ffd2e8df |
|
01-Dec-2017 |
Bjorn Helgaas <bhelgaas@google.com> |
resource: Set type of "reserve=" user-specified resources When we reserve regions because the user specified a "reserve=" parameter, set the resource type to either IORESOURCE_IO (for regions below 0x10000) or IORESOURCE_MEM. The test for 0x10000 is just a heuristic; obviously there can be memory below 0x10000 as well. Improve documentation of the "reserve=" parameter. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
d94d1053 |
|
14-Dec-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Document boot parameters dependency on CONFIG_CPU_ISOLATION=y The "isolcpus=" and "nohz_full=" boot parameters depend on CPU Isolation support. Let's document that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: kernel test robot <xiaolong.ye@intel.com> Link: http://lkml.kernel.org/r/1513275507-29200-4-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0f27cff8 |
|
30-Nov-2017 |
Prarit Bhargava <prarit@redhat.com> |
ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs The acpi_mask_gpe= kernel parameter documentation states that the range of mask is 128 GPEs (0x00 to 0x7F). The acpi_masked_gpes mask is a u64 so only 64 GPEs (0x00 to 0x3F) can really be masked. Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256 GPEs can be masked. Fixes: 9c4aa1eecb48 (ACPI / sysfs: Provide quirk mechanism to prevent GPE flooding) Signed-off-by: Prarit Bharava <prarit@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
d22881dc |
|
10-Dec-2017 |
Scott Wood <swood@redhat.com> |
Documentation: Better document the hardlockup_panic sysctl Commit ac1f591249d95372f ("kernel/watchdog.c: add sysctl knob hardlockup_panic") added the hardlockup_panic sysctl, but did not add it to Documentation/sysctl/kernel.txt. Add this, and reference it from the corresponding entry in Documentation/admin-guide/kernel-parameters.txt. Signed-off-by: Scott Wood <swood@redhat.com> Acked-by: Christoph von Recklinghausen <crecklin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
a2f2577d |
|
21-Nov-2017 |
Paul E. McKenney <paulmck@kernel.org> |
torture: Eliminate torture_runnable and perf_runnable The purpose of torture_runnable is to allow rcutorture and locktorture to be started and stopped via sysfs when they are built into the kernel (as in not compiled as loadable modules). However, the 0444 permissions for both instances of torture_runnable prevent this use case from ever being put into practice. Given that there have been no complaints about this deficiency, it is reasonable to conclude that no one actually makes use of this sysfs capability. The perf_runnable module parameter for rcuperf is in the same situation. This commit therefore removes both torture_runnable instances as well as perf_runnable. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
57044031 |
|
14-Nov-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
ACPI / PM: Make it possible to ignore the system sleep blacklist The ACPI code supporting system transitions to sleep states uses an internal blacklist to apply special handling to some machines reported to behave incorrectly in some ways. However, some entries of that blacklist cover problematic as well as non-problematic systems, so give the users of the latter a chance to ignore the blacklist and run their systems in the default way by adding acpi_sleep=nobl to the kernel command line. For example, that allows the users of Dell XPS13 9360 systems not affected by the issue that caused the blacklist entry for this machine to be added by commit 71630b7a832f (ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360) to use suspend-to-idle with the Low Power S0 Idle _DSM interface which in principle should be more energy-efficient than S3 on them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
e7e61fc0 |
|
19-Nov-2017 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: fix profile= options in kernel-parameters.txt Correctly the formatting of several additions to the profile= option that have been added by using <profiletype> and listing the choices for it. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
4675ff05 |
|
15-Nov-2017 |
Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com> |
kmemcheck: rip it out Fix up makefiles, remove references, and git rm kmemcheck. Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexander Potapenko <glider@google.com> Cc: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a7546054 |
|
27-Oct-2017 |
Marc Zyngier <maz@kernel.org> |
KVM: arm/arm64: GICv4: Enable VLPI support All it takes is the has_v4 flag to be set in gic_kvm_info as well as "kvm-arm.vgic_v4_enable=1" being passed on the command line for GICv4 to be enabled in KVM. Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
#
0962289b |
|
27-Oct-2017 |
Marc Zyngier <maz@kernel.org> |
irqchip/gic: Deal with broken firmware exposing only 4kB of GICv2 CPU interface There is a lot of broken firmware out there that don't really expose the information the kernel requires when it comes with dealing with GICv2: (1) Firmware that only describes the first 4kB of GICv2 (2) Firmware that describe 128kB of CPU interface, while the usable portion of the address space is between 60 and 68kB So far, we only deal with (2). But we have platforms exhibiting behaviour (1), resulting in two sub-cases: (a) The GIC is occupying 8kB, as required by the GICv2 architecture (b) It is actually spread 128kB, and this is likely to be a version of (2) This patch tries to work around both (a) and (b) by poking at the outside of the described memory region, and try to work out what is actually there. This is of course unsafe, and should only be enabled if there is no way to otherwise fix the DT provided by the firmware (we provide a "irqchip.gicv2_force_probe" option to that effect). Note that for the time being, we restrict ourselves to GICv2 implementations provided by ARM, since there I have no knowledge of an alternative implementations. This could be relaxed if such an implementation comes to light on a broken platform. Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
#
b0d40d2b |
|
30-Oct-2017 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated Document the latest updates on the isolcpus= boot option. While at it, let's also fix the details about the preferred way to isolate a set of CPUs from the scheduler general domains. Cpusets offer a much better interface to achieve that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509419914-16179-1-git-send-email-frederic@kernel.org [ Clarified the text some more, marked the boot option deprecated. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d141babe |
|
25-Oct-2017 |
Byungchul Park <byungchul.park@lge.com> |
locking/lockdep: Add a boot parameter allowing unwind in cross-release and disable it by default Johan Hovold reported a heavy performance regression caused by lockdep cross-release: > Boot time (from "Linux version" to login prompt) had in fact doubled > since 4.13 where it took 17 seconds (with my current config) compared to > the 35 seconds I now see with 4.14-rc4. > > I quick bisect pointed to lockdep and specifically the following commit: > > 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition > of a crosslock") > > which I've verified is the commit which doubled the boot time (compared > to 28a903f63ec0^) (added by lockdep crossrelease series [1]). Currently cross-release performs unwind on every acquisition, but that is very expensive. This patch makes unwind optional and disables it by default and only records acquire_ip. Full stack traces are sometimes required for full analysis, in which case a boot paramter, crossrelease_fullstack, can be specified. On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU', where $QEMU launches a kernel with init=/bin/true: 1. No lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.756558155 seconds time elapsed ( +- 0.09% ) 2. Lockdep enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.968710420 seconds time elapsed ( +- 0.12% ) 3. Lockdep enabled + cross-release enabled: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 3.153839636 seconds time elapsed ( +- 0.31% ) 4. Lockdep enabled + cross-release enabled + this patch applied: Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs): 2.963669551 seconds time elapsed ( +- 0.11% ) I.e. lockdep cross-release performance is now indistinguishable from vanilla lockdep. Bisected-by: Johan Hovold <johan@kernel.org> Analyzed-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-5-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
07fd1761 |
|
12-Oct-2017 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc/tm: Add commandline option to disable hardware transactional memory Currently the kernel relies on firmware to inform it whether or not the CPU supports HTM and as long as the kernel was built with CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make use of the facility. There may be situations where it would be advantageous for the kernel to not allow userspace to use HTM, currently the only way to achieve this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n. This patch adds a simple commandline option so that HTM can be disabled at boot time. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Simplify to a bool, move to prom.c, put doco in the right place. Always disable, regardless of initial state, to avoid user confusion.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
686140a1 |
|
12-Oct-2017 |
Vasily Gorbik <gor@linux.vnet.ibm.com> |
s390: introduce CPU alternatives Implement CPU alternatives, which allows to optionally patch newer instructions at runtime, based on CPU facilities availability. A new kernel boot parameter "noaltinstr" disables patching. Current implementation is derived from x86 alternatives. Although ideal instructions padding (when altinstr is longer then oldinstr) is added at compile time, and no oldinstr nops optimization has to be done at runtime. Also couple of compile time sanity checks are done: 1. oldinstr and altinstr must be <= 254 bytes long, 2. oldinstr and altinstr must not have an odd length. alternative(oldinstr, altinstr, facility); alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2); Both compile time and runtime padding consists of either 6/4/2 bytes nop or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes. .altinstructions and .altinstr_replacement sections are part of __init_begin : __init_end region and are freed after initialization. Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a405ed85 |
|
09-Oct-2017 |
Tom Saeger <tom.saeger@oracle.com> |
Documentation: fix media related doc refs Make media doc refs valid. Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
1752118d |
|
09-Oct-2017 |
Tom Saeger <tom.saeger@oracle.com> |
Documentation: fix input related doc refs Make `input` document refs valid including: - joystick - joystick-parport Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
3ba9b1b8 |
|
09-Oct-2017 |
Tom Saeger <tom.saeger@oracle.com> |
Documentation: fix admin-guide doc refs Make admin-guide document refs valid. Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
6be53520 |
|
09-Oct-2017 |
Dou Liyang <douly.fnst@cn.fujitsu.com> |
x86/tsc: Append the 'tsc=' description for the 'tsc=unstable' boot parameter Commit: 8309f86cd41e ("x86/tsc: Provide 'tsc=unstable' boot parameter") added a new 'tsc=unstable' parameter, but didn't document it. Document it. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <corbet@lwn.net> Cc: <pasha.tatashin@oracle.com> Cc: <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1507539813-11420-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
2b1516e5 |
|
18-Aug-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcutorture: Add interrupt-disable capability to stall-warning tests When rcutorture sees the rcutorture.stall_cpu kernel boot parameter, it loops with preemption disabled, which does in fact normally generate an RCU CPU stall warning message. However, there are test scenarios that need the stalling CPU to have interrupts disabled. This commit therefore adds an rcutorture.stall_cpu_irqsoff kernel boot parameter that causes the stalling CPU to disable interrupts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
58e7cb9e |
|
05-Oct-2017 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
PM: docs: Fix stale reference in kernel-parameters.txt Commit 7aa7a0360a66 (PM: docs: Delete the obsolete states.txt document) forgot to update kernel-parameters.txt with a reference to the new sleep-states.rst document and it still points to states.txt that was dropped, so fix it now. Fixes: 7aa7a0360a66 (PM: docs: Delete the obsolete states.txt document) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
3ce62385 |
|
03-Oct-2017 |
Borislav Petkov <bp@suse.de> |
Documentation: Improve softlockup_panic= description text It should say what that <integer> range is and what that integer value means. I had to look at the code... Signed-off-by: Borislav Petkov <bp@suse.de> [jc: changed non-null to nonzero] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
ac0a314c |
|
18-Sep-2017 |
Daniel Xu <dxu@dxuuu.xyz> |
console: Update to reflect new default value The default value was changed from 10 minutes to disabled in commit a4199f5eb8096d6. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
53fd40a9 |
|
12-Sep-2017 |
Jani Nikula <jani.nikula@intel.com> |
drm: handle override and firmware EDID at drm_do_get_edid() level Handle debugfs override edid and firmware edid at the low level to transparently and completely replace the real edid. Previously, we practically only used the modes from the override EDID, and none of the other data, such as audio parameters. This change also prevents actual EDID reads when the EDID is to be overridden, but retains the DDC probe. This is useful if the reason for preferring override EDID are problems with reading the data, or corruption of the data. Move firmware EDID loading from helper to core, as the functionality moves to lower level as well. This will result in a change of module parameter from drm_kms_helper.edid_firmware to drm.edid_firmware, which arguably makes more sense anyway. Some future work remains related to override and firmware EDID validation. Like before, no validation is done for override EDID. The firmware EDID is validated separately in the loader. Some unification and deduplication would be in order, to validate all of them at the drm_do_get_edid() level, like "real" EDIDs. v2: move firmware loading to core v3: rebase, commit message refresh Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1e8a710bcac46e5136c1a7b430074893c81f364a.1505203831.git.jani.nikula@intel.com
|
#
c9bff3ee |
|
06-Sep-2017 |
Michal Hocko <mhocko@suse.com> |
mm, page_alloc: rip out ZONELIST_ORDER_ZONE Patch series "cleanup zonelists initialization", v1. This is aimed at cleaning up the zonelists initialization code we have but the primary motivation was bug report [2] which got resolved but the usage of stop_machine is just too ugly to live. Most patches are straightforward but 3 of them need a special consideration. Patch 1 removes zone ordered zonelists completely. I am CCing linux-api because this is a user visible change. As I argue in the patch description I do not think we have a strong usecase for it these days. I have kept sysctl in place and warn into the log if somebody tries to configure zone lists ordering. If somebody has a real usecase for it we can revert this patch but I do not expect anybody will actually notice runtime differences. This patch is not strictly needed for the rest but it made patch 6 easier to implement. Patch 7 removes stop_machine from build_all_zonelists without adding any special synchronization between iterators and updater which I _believe_ is acceptable as explained in the changelog. I hope I am not missing anything. Patch 8 then removes zonelists_mutex which is kind of ugly as well and not really needed AFAICS but a care should be taken when double checking my thinking. This patch (of 9): Supporting zone ordered zonelists costs us just a lot of code while the usefulness is arguable if existent at all. Mel has already made node ordering default on 64b systems. 32b systems are still using ZONELIST_ORDER_ZONE because it is considered better to fallback to a different NUMA node rather than consume precious lowmem zones. This argument is, however, weaken by the fact that the memory reclaim has been reworked to be node rather than zone oriented. This means that lowmem requests have to skip over all highmem pages on LRUs already and so zone ordering doesn't save the reclaim time much. So the only advantage of the zone ordering is under a light memory pressure when highmem requests do not ever hit into lowmem zones and the lowmem pressure doesn't need to reclaim. Considering that 32b NUMA systems are rather suboptimal already and it is generally advisable to use 64b kernel on such a HW I believe we should rather care about the code maintainability and just get rid of ZONELIST_ORDER_ZONE altogether. Keep systcl in place and warn if somebody tries to set zone ordering either from kernel command line or the sysctl. [mhocko@suse.com: reading vm.numa_zonelist_order will never terminate] Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
35b55ef2 |
|
15-Jun-2017 |
Noam Camus <noamca@mellanox.com> |
ARC: [plat-eznps] new command line argument for HW scheduler at MTM We add ability for all cores at NPS SoC to control the number of cycles HW thread can execute before it is replace with another eligible HW thread within the same core. The replacement is done by the HW scheduler. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: simplified handlign of out of range argument value]
|
#
1d9807fc |
|
24-Aug-2017 |
Tony Luck <tony.luck@intel.com> |
x86/intel_rdt: Add command line options for resource director technology Command line options allow us to ignore features that we don't want. Also we can re-enable options that have been disabled on a platform (so long as the underlying h/w actually supports the option). [ tglx: Marked the option array __initdata and the helper function __init ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua" <fenghua.yu@intel.com> Cc: Ravi V" <ravi.v.shankar@intel.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Stephane Eranian" <eranian@google.com> Cc: "Andi Kleen" <ak@linux.intel.com> Cc: "David Carrillo-Cisneros" <davidcc@google.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/0c37b0d4dbc30977a3c1cee08b66420f83662694.1503512900.git.tony.luck@intel.com
|
#
3f429842 |
|
07-Aug-2017 |
Heiko Carstens <hca@linux.ibm.com> |
s390/vmcp: make use of contiguous memory allocator If memory is fragmented it is unlikely that large order memory allocations succeed. This has been an issue with the vmcp device driver since a long time, since it requires large physical contiguous memory ares for large responses. To hopefully resolve this issue make use of the contiguous memory allocator (cma). This patch adds a vmcp specific vmcp cma area with a default size of 4MB. The size can be changed either via the VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma" kernel parameter (e.g. "vmcp_cma=16m"). For any vmcp response buffers larger than 16k memory from the cma area will be allocated. If such an allocation fails, there is a fallback to the buddy allocator. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f99bcb2c |
|
02-Jun-2017 |
Paul E. McKenney <paulmck@kernel.org> |
documentation: Fix relation between nohz_full and rcu_nocbs If a CPU is specified in the nohz_full= kernel boot parameter to a kernel built with CONFIG_NO_HZ_FULL=y, then that CPU's callbacks will be offloaded, just as if that CPU had also been specified in the rcu_nocbs= kernel boot parameter. But the current documentation states that the user must keep these two boot parameters manually synchronized. This commit therefore updates the documentation to reflect reality. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c262f3b9 |
|
17-Jul-2017 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/cpu/AMD: Document AMD Secure Memory Encryption (SME) Create a Documentation entry to describe the AMD Secure Memory Encryption (SME) feature and add documentation for the mem_encrypt= kernel parameter. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ca0a0c13b055fd804cfc92cbaca8acd68057eed0.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f70029bb |
|
06-Jul-2017 |
Michal Hocko <mhocko@suse.com> |
mm, memory_hotplug: drop CONFIG_MOVABLE_NODE Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a good explanation on why it is actually useful. It makes a lot of sense to make movable node semantic opt in but we already have that because the feature has to be explicitly enabled on the kernel command line. A config option on top only makes the configuration space larger without a good reason. It also adds an additional ifdefery that pollutes the code. Just drop the config option and make it de-facto always enabled. This shouldn't introduce any change to the semantic. Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Kani Toshimitsu <toshi.kani@hpe.com> Cc: Chen Yucong <slaoub@gmail.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7660a6fd |
|
06-Jul-2017 |
Kees Cook <keescook@chromium.org> |
mm: allow slab_nomerge to be set at build time Some hardened environments want to build kernels with slab_nomerge already set (so that they do not depend on remembering to set the kernel command line option). This is desired to reduce the risk of kernel heap overflows being able to overwrite objects from merged caches and changes the requirements for cache layout control, increasing the difficulty of these attacks. By keeping caches unmerged, these kinds of exploits can usually only damage objects in the same cache (though the risk to metadata exploitation is unchanged). Link: http://lkml.kernel.org/r/20170620230911.GA25238@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Mack <daniel@zonque.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0790c9aa |
|
29-Jun-2017 |
Andy Lutomirski <luto@kernel.org> |
x86/mm: Add the 'nopcid' boot option to turn off PCID The parameter is only present on x86_64 systems to save a few bytes, as PCID is always disabled on x86_32. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
787e3075 |
|
13-Jun-2017 |
Steffen Maier <maier@linux.vnet.ibm.com> |
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters Another place in lib/Kconfig.debug was already fixed in commit f8998c226587 ("lib/Kconfig.debug: correct documentation paths"). Fixes: 9d85025b0418 ("docs-rst: create an user's manual book") Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
503ceaef |
|
21-Apr-2017 |
Mimi Zohar <zohar@linux.vnet.ibm.com> |
ima: define a set of appraisal rules requiring file signatures The builtin "ima_appraise_tcb" policy should require file signatures for at least a few of the hooks (eg. kernel modules, firmware, and the kexec kernel image), but changing it would break the existing userspace/kernel ABI. This patch defines a new builtin policy named "secure_boot", which can be specified on the "ima_policy=" boot command line, independently or in conjunction with the "ima_appraise_tcb" policy, by specifing ima_policy="appraise_tcb | secure_boot". The new appraisal rules requiring file signatures will be added prior to the "ima_appraise_tcb" rules. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Changelog: - Reference secure boot in the new builtin policy name. (Thiago Bauermann)
|
#
33ce9549 |
|
23-Apr-2017 |
Mimi Zohar <zohar@linux.vnet.ibm.com> |
ima: extend the "ima_policy" boot command line to support multiple policies Add support for providing multiple builtin policies on the "ima_policy=" boot command line. Use "|" as the delimitor separating the policy names. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
|
#
e36361d7 |
|
18-Jun-2017 |
Andreas Färber <afaerber@suse.de> |
tty: serial: Add Actions Semi Owl UART earlycon This implements an earlycon for Actions Semi S500/S900 SoCs. Based on LeMaker linux-actions tree. Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
1be7107f |
|
19-Jun-2017 |
Hugh Dickins <hughd@google.com> |
mm: larger stack guard gap, between vmas Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ff89511e |
|
08-Jun-2017 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Enable GICv3 common sysreg trapping via command-line Now that we're able to safely handle common sysreg access, let's give the user the opportunity to enable it by passing a specific command-line option (vgic_v3.common_trap). Tested-by: Alexander Graf <agraf@suse.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
#
e23f62f7 |
|
08-Jun-2017 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line Now that we're able to safely handle Group-0 sysreg access, let's give the user the opportunity to enable it by passing a specific command-line option (vgic_v3.group0_trap). Tested-by: Alexander Graf <agraf@suse.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
#
182936ee |
|
08-Jun-2017 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line Now that we're able to safely handle Group-1 sysreg access, let's give the user the opportunity to enable it by passing a specific command-line option (vgic_v3.group1_trap). Tested-by: Alexander Graf <agraf@suse.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
#
62a31ce1 |
|
05-Jun-2017 |
Leo Yan <leo.yan@linaro.org> |
doc: Add coresight_cpu_debug.enable to kernel-parameters.txt Add coresight_cpu_debug.enable to kernel-parameters.txt, this flag is used to enable/disable the CPU sampling based debugging. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
90040c9e |
|
10-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Remove *_SLOW_* Kconfig options The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_INIT, RCU_TORTURE_TEST_SLOW_INIT_DELAY, RCU_TORTURE_TEST_SLOW_CLEANUP, and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig options are only useful for torture testing, and there are the rcutree.gp_cleanup_delay, rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters that rcutorture can use instead. The effect of these parameters is to artificially slow down grace period initialization and cleanup in order to make some types of race conditions happen more often. This commit therefore simplifies Tree RCU a bit by removing the Kconfig options and adding the corresponding kernel parameters to rcutorture's .boot files instead. However, this commit also leaves out the kernel parameters for TREE02, TREE04, and TREE07 in order to have about the same number of tests slowed as not slowed. TREE01, TREE03, TREE05, and TREE06 are slowed, and the rest are not slowed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
c350c008 |
|
03-May-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Prevent sdp->srcu_gp_seq_needed counter wrap If a given CPU never happens to ever start an SRCU grace period, the grace-period sequence counter might wrap. If this CPU were to decide to finally start a grace period, the state of its sdp->srcu_gp_seq_needed might make it appear that it has already requested this grace period, which would prevent starting the grace period. If no other CPU ever started a grace period again, this would look like a grace-period hang. Even if some other CPU took pity and started the needed grace period, the leaf rcu_node structure's ->srcu_data_have_cbs field won't have record of the fact that this CPU has a callback pending, which would look like a very localized grace-period hang. This might seem very unlikely, but SRCU grace periods can take less than a microsecond on small systems, which means that overflow can happen in much less than an hour on a 32-bit embedded system. And embedded systems are especially likely to have long-term idle CPUs. Therefore, it makes sense to prevent this scenario from happening. This commit therefore scans each srcu_data structure occasionally, with frequency controlled by the srcutree.counter_wrap_check kernel boot parameter. This parameter can be set to something like 255 in order to exercise the counter-wrap-prevention code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
820687a7 |
|
25-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcuperf: Add writer_holdoff boot parameter This commit adds a writer_holdoff boot parameter to rcuperf, which is intended to be used to test Tree SRCU's auto-expediting. This boot parameter is in microseconds, and defaults to zero (that is, disabled). Set it to a bit larger than srcutree.exp_holdoff, keeping the nanosecond/microsecond conversion, to force Tree SRCU to auto-expedite more aggressively. This commit also adds documentation for this parameter, and fixes some alphabetization while in the neighborhood. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
881ed593 |
|
17-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcuperf: Add ability to performance-test call_rcu() and friends This commit upgrades rcuperf so that it can do performance testing on asynchronous grace-period primitives such as call_srcu(). There is a new rcuperf.gp_async module parameter that specifies this new behavior, with the pre-existing rcuperf.gp_exp testing expedited grace periods such as synchronize_rcu_expedited, and with the default being to test synchronous non-expedited grace periods such as synchronize_rcu(). There is also a new rcuperf.gp_async_max module parameter that specifies the maximum number of outstanding callbacks per writer kthread, defaulting to 1,000. When this limit is exceeded, the writer thread invokes the appropriate flavor of rcu_barrier() to wait for callbacks to drain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
|
#
a2b05b7a |
|
11-May-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Add dt_cpu_ftrs boot time setup option Provide a dt_cpu_ftrs= cmdline option to disable the dt_cpu_ftrs CPU feature discovery, and fall back to the "cputable" based version. Also allow control of advertising unknown features to userspace and with this parameter, and remove the clunky CONFIG option. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add explicit early check of bootargs in dt_cpu_ftrs_init()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8fcc9bc3 |
|
12-May-2017 |
Baoquan He <bhe@redhat.com> |
Documentation/kernel-parameters.txt: Update 'memmap=' boot option description In commit: 9710f581bb4c ("x86, mm: Let "memmap=" take more entries one time") ... 'memmap=' was changed to adopt multiple, comma delimited values in a single entry, so update the related description. In the special case of only specifying size value without an offset, like memmap=nn[KMG], memmap behaves similarly to mem=nn[KMG], so update it too here. Furthermore, for memmap=nn[KMG]$ss[KMG], an escape character needs be added before '$' for some bootloaders. E.g in grub2, if we specify memmap=100M$5G as suggested by the documentation, "memmap=100MG" gets passed to the kernel. Clarify all this. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: douly.fnst@cn.fujitsu.com Cc: dyoung@redhat.com Cc: m.mizuma@jp.fujitsu.com Link: http://lkml.kernel.org/r/1494654390-23861-4-git-send-email-bhe@redhat.com [ Various spelling fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f7c864e7 |
|
03-May-2017 |
Andre Przywara <andre.przywara@arm.com> |
Documentation: earlycon: fix Marvell Armada 3700 UART name The Marvell Armada 3700 UART uses "ar3700_uart" for its earlycon name. Adjust documentation to match the code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
c0c74acb |
|
26-Feb-2017 |
Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> |
docs: remove all references to AVR32 architecture The AVR32 architecture support has been removed from the Linux kernel, hence remove all references to it from Documentation. Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Signed-off-by: Håvard Skinnemoen <hskinnemoen@gmail.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
|
#
22607d66 |
|
25-Apr-2017 |
Paul E. McKenney <paulmck@kernel.org> |
srcu: Specify auto-expedite holdoff time On small systems, in the absence of readers, expedited SRCU grace periods can complete in less than a microsecond. This means that an eight-CPU system can have all CPUs doing synchronize_srcu() in a tight loop and almost always expedite. This might actually be desirable in some situations, but in general it is a good way to needlessly burn CPU cycles. And in those situations where it is desirable, your friend is the function synchronize_srcu_expedited(). For other situations, this commit adds a kernel parameter that specifies a holdoff between completing the last SRCU grace period and auto-expediting the next. If the next grace period starts before the holdoff expires, auto-expediting is disabled. The holdoff is 50 microseconds by default, and can be tuned to the desired number of nanoseconds. A value of zero disables auto-expediting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
|
#
bfd20f1c |
|
26-Apr-2017 |
Shaohua Li <shli@fb.com> |
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on IOMMU harms performance signficantly when we run very fast networking workloads. It's 40GB networking doing XDP test. Software overhead is almost unaware, but it's the IOTLB miss (based on our analysis) which kills the performance. We observed the same performance issue even with software passthrough (identity mapping), only the hardware passthrough survives. The pps with iommu (with software passthrough) is only about ~30% of that without it. This is a limitation in hardware based on our observation, so we'd like to disable the IOMMU force on, but we do want to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not eabling IOMMU is totally ok. So introduce a new boot option to disable the force on. It's kind of silly we need to run into intel_iommu_init even without force on, but we need to disable TBOOT PMR registers. For system without the boot option, nothing is changed. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
#
6d22323b |
|
12-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
nfs: remove the objlayout driver The objlayout code has been in the tree, but it's been unmaintained and no server product for it actually ever shipped. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
fccb4e3b |
|
05-Jan-2017 |
Will Deacon <will@kernel.org> |
iommu: Allow default domain type to be set on the kernel command line The IOMMU core currently initialises the default domain for each group to IOMMU_DOMAIN_DMA, under the assumption that devices will use IOMMU-backed DMA ops by default. However, in some cases it is desirable for the DMA ops to bypass the IOMMU for performance reasons, reserving use of translation for subsystems such as VFIO that require it for enforcing device isolation. Rather than modify each IOMMU driver to provide different semantics for DMA domains, instead we introduce a command line parameter that can be used to change the type of the default domain. Passthrough can then be specified using "iommu.passthrough=1" on the kernel command line. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
b0845ce5 |
|
31-Mar-2017 |
Mark Rutland <mark.rutland@arm.com> |
kasan: report only the first error by default Disable kasan after the first report. There are several reasons for this: - Single bug quite often has multiple invalid memory accesses causing storm in the dmesg. - Write OOB access might corrupt metadata so the next report will print bogus alloc/free stacktraces. - Reports after the first easily could be not bugs by itself but just side effects of the first one. Given that multiple reports usually only do harm, it makes sense to disable kasan after the first one. If user wants to see all the reports, the boot-time parameter kasan_multi_shot must be used. [aryabinin@virtuozzo.com: wrote changelog and doc, added missing include] Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
011d8261 |
|
27-Mar-2017 |
Borislav Petkov <bp@suse.de> |
RAS: Add a Corrected Errors Collector Introduce a simple data structure for collecting correctable errors along with accessors. More detailed description in the code itself. The error decoding is done with the decoding chain now and mce_first_notifier() gets to see the error first and the CEC decides whether to log it and then the rest of the chain doesn't hear about it - basically the main reason for the CE collector - or to continue running the notifiers. When the CEC hits the action threshold, it will try to soft-offine the page containing the ECC and then the whole decoding chain gets to see the error. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
1b5aeebf |
|
21-Mar-2017 |
Lu Baolu <baolu.lu@linux.intel.com> |
x86/earlyprintk: Add support for earlyprintk via USB3 debug port Add support for earlyprintk by writing debug messages to the USB3 debug port. Users can use this type of early printk by specifying the kernel parameter of "earlyprintk=xdbc". This gives users a chance of providing debugging output. The hardware for USB3 debug port requires DMA memory blocks. This requires to delay setting up debugging hardware and registering boot console until the memblocks are filled. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-4-git-send-email-baolu.lu@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
933bfe4d |
|
25-Feb-2017 |
Tobias Jakobi <tjakobi@math.uni-bielefeld.de> |
HID: usbhid: extend polling interval configuration to joysticks For mouse devices we can currently change the polling interval via usbhid.mousepoll. Implement the same thing for joysticks, so users can reduce input latency this way. This has been tested with a Logitech RumblePad 2 with jspoll=2, resulting in a polling rate of 500Hz (verified with evhz). Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
d82f2692 |
|
28-Feb-2017 |
Len Brown <len.brown@intel.com> |
cpufreq: Add the "cpufreq.off=1" cmdline option Add the "cpufreq.off=1" cmdline option. At boot-time, this allows a user to request CONFIG_CPU_FREQ=n behavior from a kernel built with CONFIG_CPU_FREQ=y. This is analogous to the existing "cpuidle.off=1" option and CONFIG_CPU_IDLE=y This capability is valuable when we need to debug end-user issues in the BIOS or in Linux. It is also convenient for enabling comparisons, which may otherwise require a new kernel, or help from BIOS SETUP, which may be buggy or unavailable. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
65a50c65 |
|
02-Mar-2017 |
Todd Brandt <todd.e.brandt@linux.intel.com> |
ftrace/graph: Add ftrace_graph_max_depth kernel parameter Early trace callgraphs can be extremely large on systems with several seconds of boot time. The max_depth parameter limits how deep the graph trace goes and reduces the output size. This parameter is the same as the max_graph_depth file in tracefs. Link: http://lkml.kernel.org/r/1488499935-23216-1-git-send-email-todd.e.brandt@linux.intel.com Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> [ changed comments about debugfs to tracefs ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
1663f26d |
|
22-Feb-2017 |
Tejun Heo <tj@kernel.org> |
slub: make sysfs directories for memcg sub-caches optional SLUB creates a per-cache directory under /sys/kernel/slab which hosts a bunch of debug files. Usually, there aren't that many caches on a system and this doesn't really matter; however, if memcg is in use, each cache can have per-cgroup sub-caches. SLUB creates the same directories for these sub-caches under /sys/kernel/slab/$CACHE/cgroup. Unfortunately, because there can be a lot of cgroups, active or draining, the product of the numbers of caches, cgroups and files in each directory can reach a very high number - hundreds of thousands is commonplace. Millions and beyond aren't difficult to reach either. What's under /sys/kernel/slab is primarily for debugging and the information and control on the a root cache already cover its sub-caches. While having a separate directory for each sub-cache can be helpful for development, it doesn't make much sense to pay this amount of overhead by default. This patch introduces a boot parameter slub_memcg_sysfs which determines whether to create sysfs directories for per-memcg sub-caches. It also adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's default value and defaults to 0. [akpm@linux-foundation.org: kset_unregister(NULL) is legal] Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
47512cfd |
|
15-Feb-2017 |
Thomas Gleixner <tglx@linutronix.de> |
x86/platform/goldfish: Prevent unconditional loading The goldfish platform code registers the platform device unconditionally which causes havoc in several ways if the goldfish_pdev_bus driver is enabled: - Access to the hardcoded physical memory region, which is either not available or contains stuff which is completely unrelated. - Prevents that the interrupt of the serial port can be requested - In case of a spurious interrupt it goes into a infinite loop in the interrupt handler of the pdev_bus driver (which needs to be fixed seperately). Add a 'goldfish' command line option to make the registration opt-in when the platform is compiled in. I'm seriously grumpy about this engineering trainwreck, which has seven SOBs from Intel developers for 50 lines of code. And none of them figured out that this is broken. Impressive fail! Fixes: ddd70cf93d78 ("goldfish: platform device for x86") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
5444ea6a |
|
06-Feb-2017 |
Ding Tianhong <dingtianhong@huawei.com> |
clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter Having a command line option to flip the errata handling for a particular erratum is a little bit unusual, and it's vastly superior to pass this in the DT. By common consensus, it's best to kill off the command line parameter. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> [Mark: split patch, reword commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
#
e16fd002 |
|
20-Jan-2017 |
Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> |
x86/cpufeature: Enable RING3MWAIT for Knights Landing Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights Landing. Presence of this feature cannot be detected automatically (by reading any other MSR) therefore it is required to explicitly check for the family and model of the CPU before attempting to enable it. Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Piotr.Luc@intel.com Cc: dave.hansen@linux.intel.com Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
89175cf7 |
|
11-Jan-2017 |
Heiko Carstens <hca@linux.ibm.com> |
s390: provide sclp based boot console Use the early sclp code to provide a boot console. This boot console is available if the kernel parameter "earlyprintk" has been specified, just like it works for other architectures that also provide an early boot console. This makes debugging of early problems much easier, since now we finally have working console output even before memory detection is running. The boot console will be automatically disabled as soon as another console will be registered. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
e3c50dfb |
|
06-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
doc: Add rcutree.rcu_kick_kthreads to kernel-parameters.txt Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
ec84aa0a |
|
11-Dec-2016 |
Martin Blumenstingl <martin.blumenstingl@googlemail.com> |
tty: serial: lantiq: implement earlycon support This allows enabling earlycon for devices with a Lantiq serial console by splitting lqasc_serial_port_write() from lqasc_console_write() and re-using the new function for earlycon's write callback. The kernel-parameter name matches the driver name ("lantiq"), similar to how other drivers implement this. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9c4aa1ee |
|
15-Dec-2016 |
Lv Zheng <lv.zheng@intel.com> |
ACPI / sysfs: Provide quirk mechanism to prevent GPE flooding Sometimes, the users may require a quirk to be provided from ACPI subsystem core to prevent a GPE from flooding. Normally, if a GPE cannot be dispatched, ACPICA core automatically prevents the GPE from firing. But there are cases the GPE is dispatched by _Lxx/_Exx provided via AML table, and OSPM is lacking of the knowledge to get _Lxx/_Exx correctly executed to handle the GPE, thus the GPE flooding may still occur. The existing quirk mechanism can be enabled/disabled using the following commands to prevent such kind of GPE flooding during runtime: # echo mask > /sys/firmware/acpi/interrupts/gpe00 # echo unmask > /sys/firmware/acpi/interrupts/gpe00 To avoid GPE flooding during boot, we need a boot stage mechanism. This patch provides such a boot stage quirk mechanism to stop this kind of GPE flooding. This patch doesn't fix any feature gap but since the new feature gaps could be found in the future endlessly, and can disappear if the feature gaps are filled, providing a boot parameter rather than a DMI table should suffice. Link: https://bugzilla.kernel.org/show_bug.cgi?id=53071 Link: https://bugzilla.kernel.org/show_bug.cgi?id=117481 Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/887793 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
d68a6fe9 |
|
19-Dec-2016 |
Mimi Zohar <zohar@linux.vnet.ibm.com> |
ima: define a canonical binary_runtime_measurements list format The IMA binary_runtime_measurements list is currently in platform native format. To allow restoring a measurement list carried across kexec with a different endianness than the targeted kernel, this patch defines little-endian as the canonical format. For big endian systems wanting to save/restore the measurement list from a system with a different endianness, a new boot command line parameter named "ima_canonical_fmt" is defined. Considerations: use of the "ima_canonical_fmt" boot command line option will break existing userspace applications on big endian systems expecting the binary_runtime_measurements list to be in platform native format. Link: http://lkml.kernel.org/r/1480554346-29071-10-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fff5d992 |
|
16-Dec-2016 |
Geert Uytterhoeven <geert+renesas@glider.be> |
swiotlb: Add swiotlb=noforce debug option On architectures like arm64, swiotlb is tied intimately to the core architecture DMA support. In addition, ZONE_DMA cannot be disabled. To aid debugging and catch devices not supporting DMA to memory outside the 32-bit address space, add a kernel command line option "swiotlb=noforce", which disables the use of bounce buffers. If specified, trying to map memory that cannot be used with DMA will fail, and a rate-limited warning will be printed. Note that io_tlb_nslabs is set to 1, which is the minimal supported value. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c3cbd075 |
|
01-Dec-2016 |
Balbir Singh <bsingharora@gmail.com> |
ppc/idle: Add documentation for powersave=off Update kernel-parameters.txt to add Documentation for powersave=off. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
#
e52347bd |
|
02-Nov-2016 |
Jani Nikula <jani.nikula@intel.com> |
Documentation/admin-guide: split the kernel parameter list to a separate file Include the literal kernel parameter list from a separate file. This helps the pdf build. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|