#
9843231c |
|
22-Mar-2024 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/boot/64: Move 5-level paging global variable assignments back Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") moved the assignment of the 5-level paging global variables to later in the boot in order to avoid having to use RIP-relative addressing to set them. However, when running with 5-level paging and SME active (mem_encrypt=on), the variables are needed as part of the page table setup that encrypts the kernel (using pgd_none(), p4d_offset(), etc.). Since the variables haven't been set, the page table manipulation is done as if 4-level paging is active, causing the system to crash on boot. While only a subset of the moved assignments needs to be set early, move all of them back into check_la57_support() so that they aren't spread between two locations. Instead of just reverting the fix, use the new RIP_REL_REF() macro when assigning the variables. Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
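A sketch of the restored helper (names as in the mainline code of that era; illustrative, not the verbatim patch):

  static bool __head check_la57_support(void)
  {
      if (!IS_ENABLED(CONFIG_X86_5LEVEL))
          return false;

      /*
       * 5-level paging is detected and enabled at kernel decompression
       * stage. Only check whether it has been enabled there.
       */
      if (!(native_read_cr4() & X86_CR4_LA57))
          return false;

      /* RIP_REL_REF() makes these stores safe from the 1:1 mapping. */
      RIP_REL_REF(__pgtable_l5_enabled) = 1;
      RIP_REL_REF(pgdir_shift)          = 52;
      RIP_REL_REF(ptrs_per_p4d)         = 512;
      RIP_REL_REF(page_offset_base)     = __PAGE_OFFSET_BASE_L5;
      RIP_REL_REF(vmalloc_base)         = __VMALLOC_BASE_L5;
      RIP_REL_REF(vmemmap_base)         = __VMEMMAP_BASE_L5;

      return true;
  }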
|
#
4d0d7e78 |
|
22-Mar-2024 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/boot/64: Apply encryption mask to 5-level pagetable update When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
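The fix itself is essentially a one-line change in __startup_64() (a sketch of the before/after): _PAGE_TABLE carries _PAGE_ENC, the runtime encryption mask, while _PAGE_TABLE_NOENC does not.

  -	pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE_NOENC;
  +	pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;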
|
#
63bed966 |
|
27-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/startup_64: Defer assignment of 5-level paging global variables Assigning the 5-level paging related global variables from the earliest C code using explicit references that use the 1:1 translation of memory is unnecessary, as the startup code itself does not rely on them to create the initial page tables, and this is all it should be doing. So defer these assignments to the primary C entry code that executes via the ordinary kernel virtual mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-13-ardb+git@google.com
|
#
11e36b0f |
|
26-Feb-2024 |
Brian Gerst <brgerst@gmail.com> |
x86/boot/64: Load the final kernel GDT during early boot directly, remove startup_gdt[] Instead of loading a duplicate GDT just for early boot, load the kernel GDT from its physical address. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20240226220544.70769-1-brgerst@gmail.com
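A sketch of the idea, not the verbatim patch (the exact expression for the per-CPU GDT symbol is an assumption here): build a descriptor on the stack that points at the kernel GDT through its 1:1 (physical) address and load that directly.

  	struct desc_ptr startup_gdt_descr = {
  		/* 1:1 address of the kernel's per-CPU GDT page (assumed name) */
  		.address = (unsigned long)&RIP_REL_REF(init_per_cpu_var(gdt_page)),
  		.size    = GDT_SIZE - 1,
  	};

  	native_load_gdt(&startup_gdt_descr);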
|
#
533568e0 |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[] early_top_pgt[] is assigned from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. For legibility and to align with the code that populates the lower page table levels, statically initialize the root level page table with an entry pointing to level3_kernel_pgt[], and overwrite it when needed to enable 5-level paging. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-24-ardb+git@google.com
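For reference, RIP_REL_REF() (added to <asm/asm.h> earlier in this series) looks roughly like the following; the inline asm forces a RIP-relative LEA, so the reference resolves relative to wherever the code is actually executing, 1:1 mapping included.

  static __always_inline __pure void *rip_rel_ptr(void *p)
  {
  	asm("leaq %c1(%%rip), %0" : "=r"(p) : "i"(p));

  	return p;
  }

  #define RIP_REL_REF(var)	(*(typeof(&(var)))rip_rel_ptr(&(var)))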
|
#
eb54c2ae |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Use RIP_REL_REF() to access early page tables The early statically allocated page tables are populated from code that executes from a 1:1 mapping so it cannot use plain accesses from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-23-ardb+git@google.com
|
#
4f8b6cf2 |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Use RIP_REL_REF() to access '__supported_pte_mask' '__supported_pte_mask' is accessed from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-22-ardb+git@google.com
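The change amounts to the following (sketch):

  -	unsigned long *mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
  ...
  -	pmd_entry &= *mask_ptr;
  +	pmd_entry &= RIP_REL_REF(__supported_pte_mask);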
|
#
b0fe5fb6 |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Use RIP_REL_REF() to access early_dynamic_pgts[] early_dynamic_pgts[] and next_early_pgt are accessed from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-21-ardb+git@google.com
|
#
d9ec1158 |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Use RIP_REL_REF() to assign 'phys_base' 'phys_base' is assigned from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. While at it, move the assignment to before the addition of the SME mask so there is no need to subtract it again, and drop the unnecessary addition ('phys_base' is statically initialized to 0x0). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-20-ardb+git@google.com
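Roughly, the assignment becomes a plain RIP-relative store performed before the SME mask is folded into load_delta (a sketch of the before/after):

  -	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
  +	/* Done before the SME mask is added to load_delta: */
  +	RIP_REL_REF(phys_base) = load_delta;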
|
#
5da79367 |
|
20-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
x86/boot/64: Simplify global variable accesses in GDT/IDT programming There are two code paths in the startup code to program an IDT: one that runs from the 1:1 mapping and one that runs from the virtual kernel mapping. Currently, these are strictly separate because fixup_pointer() is used on the 1:1 path, which will produce the wrong value when used while executing from the virtual kernel mapping. Switch to RIP_REL_REF() so that the two code paths can be merged. Also, move the GDT and IDT descriptors to the stack so that they can be referenced directly, rather than via RIP_REL_REF(). Rename startup_64_setup_env() to startup_64_setup_gdt_idt() while at it, to make the call from assembler self-documenting. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-19-ardb+git@google.com
|
#
3b184b71 |
|
19-Dec-2023 |
Vegard Nossum <vegard.nossum@oracle.com> |
x86/asm: Always set A (accessed) flag in GDT descriptors We have no known use for having the CPU track whether GDT descriptors have been accessed or not. Simplify the code by adding the flag to the common flags and removing it everywhere else. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
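Concretely (illustrative, in the raw hex notation the descriptors used before this series made them symbolic): the accessed bit is bit 0 of the descriptor type nibble, so pre-setting it turns e.g. the 32-bit kernel code segment's 0xc09a into 0xc09b and spares the CPU a write back to the GDT the first time the selector is loaded.

  -	[GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff),
  +	[GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),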
|
#
1445f6e1 |
|
19-Dec-2023 |
Vegard Nossum <vegard.nossum@oracle.com> |
x86/asm: Replace magic numbers in GDT descriptors, script-generated change Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
|
#
d2a285d6 |
|
17-Oct-2023 |
Hou Wenlong <houwenlong.hwl@antgroup.com> |
x86/head/64: Move the __head definition to <asm/init.h> Move the __head section definition to a header to widen its use. An upcoming patch will mark the code as __head in mem_encrypt_identity.c too. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/0583f57977be184689c373fe540cbd7d85ca2047.1697525407.git.houwenlong.hwl@antgroup.com
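The definition being moved is a one-liner:

  /* <asm/init.h> */
  #define __head	__section(".head.text")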
|
#
7f6874ed |
|
11-Jul-2023 |
Hou Wenlong <houwenlong.hwl@antgroup.com> |
x86/head/64: Add missing __head annotation to startup_64_load_idt() This function is currently only used in the head code and is only called from startup_64_setup_env(). Although it would normally be inlined by the compiler, it is better to mark it as __head too in case it isn't. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/efcc5b5e18af880e415d884e072bf651c1fa7c34.1689130310.git.houwenlong.hwl@antgroup.com
|
#
dc628300 |
|
11-Jul-2023 |
Hou Wenlong <houwenlong.hwl@antgroup.com> |
x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata As 'startup_gdt[]' and 'startup_gdt_descr' are only used in booting, mark them as __initdata to allow them to be freed after boot. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/c85903a7cfad37d14a7e5a4df9fc7119a3669fb3.1689130310.git.houwenlong.hwl@antgroup.com
|
#
9f76d606 |
|
03-Aug-2023 |
Wang Jinchao <wangjinchao@xfusion.com> |
x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls The '&' before an array parameter is redundant: for an array, '&array' and 'array' evaluate to the same address (they differ only in type, which is irrelevant for a void * parameter). Worse, the '&' can confuse readers who are not familiar with the address-of operation on arrays and make them spend time convincing themselves that the two forms are the same. Harmonize the style by dropping the unnecessary '&'. Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ZMt24BGEX9IhPSY6@fedora
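For example (a sketch of the pattern being harmonized):

  -	pgd = fixup_pointer(&early_top_pgt, physaddr);
  +	pgd = fixup_pointer(early_top_pgt, physaddr);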
|
#
001470fe |
|
07-Aug-2023 |
Yuntao Wang <ytcoode@gmail.com> |
x86/boot: Fix incorrect startup_gdt_descr.size Since the size value is added to the base address to yield the last valid byte address of the GDT, the current size value of startup_gdt_descr is incorrect (too large by one), fix it. [ mingo: This probably never mattered, because startup_gdt[] is only used in a very controlled fashion - but make it consistent nevertheless. ] Fixes: 866b556efa12 ("x86/head/64: Install startup GDT") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20230807084547.217390-1-ytcoode@gmail.com
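The GDT limit in a descriptor is inclusive (base + size = address of the last valid byte), hence the change:

  -	.size = sizeof(startup_gdt),
  +	.size = sizeof(startup_gdt) - 1,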
|
#
4208d2d7 |
|
12-Apr-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/head: Mark *_start_kernel() __noreturn Now that start_kernel() is __noreturn, mark its chain of callers __noreturn. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/c2525f96b88be98ee027ee0291d58003036d4120.1681342859.git.jpoimboe@kernel.org
|
#
82328227 |
|
16-May-2022 |
Pasha Tatashin <pasha.tatashin@soleen.com> |
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros Other architectures and the common mm/ use P*D_MASK and P*D_SIZE. Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE, which are only used in x86/*. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com
|
#
38fa5479 |
|
30-Jun-2022 |
Juergen Gross <jgross@suse.com> |
x86: Clear .brk area at early boot The .brk section has the same properties as .bss: it is an alloc-only section and should be cleared before being used. Not doing so is especially a problem for Xen PV guests, as the hypervisor will validate page tables (check for writable page tables and hypervisor private bits) before accepting them to be used. Make sure .brk is initially zero by letting clear_bss() clear the brk area, too. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com
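A sketch of the resulting helper, assuming the usual linker-script symbols:

  void __init clear_bss(void)
  {
  	memset(__bss_start, 0,
  	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
  	/* .brk is alloc-only like .bss, so it must start out zeroed too. */
  	memset(__brk_base, 0,
  	       (unsigned long) __brk_limit - (unsigned long) __brk_base);
  }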
|
#
96e8fc58 |
|
30-Jun-2022 |
Juergen Gross <jgross@suse.com> |
x86/xen: Use clear_bss() for Xen PV guests Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires passing the start_info address as a parameter to xen_start_kernel() in order to avoid xen_start_info being zeroed again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
|
#
32e72854 |
|
05-Apr-2022 |
Andi Kleen <ak@linux.intel.com> |
x86/tdx: Port I/O: Add early boot support TDX guests cannot do port I/O directly. The TDX module triggers a #VE exception to let the guest kernel emulate port I/O by converting the I/O instructions into TDCALLs to the host. But before IDT handlers are set up, port I/O cannot be emulated using the normal kernel #VE handlers. To support the #VE-based emulation during this boot window, add minimal early #VE handler support to the early exception handlers. This is similar to what AMD SEV does, and is mainly to support earlyprintk's serial driver, as well as potentially the VGA driver. The early handler only supports I/O-related #VE exceptions. Unhandled or failed exceptions will be handled via early_fixup_exceptions() (like normal exception failures). At runtime, I/O-related #VE exceptions (along with other types) are handled by virt_exception_kernel(). Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-19-kirill.shutemov@linux.intel.com
|
#
59bd54a8 |
|
05-Apr-2022 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
x86/tdx: Detect running as a TDX guest in early boot In preparation for extending the cc_platform_has() API to support TDX guests, use the CPUID instruction to detect TDX guest support in the early boot code (via tdx_early_init()). Since copy_bootdata() is the first user of the cc_platform_has() API, detect the TDX guest status before it. Define a synthetic feature flag (X86_FEATURE_TDX_GUEST) and set this bit on a valid TDX guest platform. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-2-kirill.shutemov@linux.intel.com
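A sketch of the detection, assuming the TDX CPUID leaf and signature used by mainline (TDX_CPUID_LEAF_ID and the "IntelTDX    " TDX_IDENT string):

  void __init tdx_early_init(void)
  {
  	u32 eax, sig[3];

  	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);

  	/* The signature spells "IntelTDX    " across EBX/EDX/ECX. */
  	if (memcmp(TDX_IDENT, sig, sizeof(sig)))
  		return;

  	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
  }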
|
#
469693d8 |
|
08-Feb-2022 |
Michael Roth <michael.roth@amd.com> |
x86/head/64: Re-enable stack protection Due to 103a4908ad4d ("x86/head/64: Disable stack protection for head$(BITS).o") kernel/head{32,64}.c are compiled with -fno-stack-protector to allow a call to set_bringup_idt_handler(), which would otherwise have stack protection enabled with CONFIG_STACKPROTECTOR_STRONG. While sufficient for that case, there may still be issues with calls to any external functions that were compiled with stack protection enabled and that in turn make stack-protected calls, or if the exception handlers set up by set_bringup_idt_handler() make calls to stack-protected functions. Subsequent patches for SEV-SNP CPUID validation support will introduce both such cases. Attempting to disable stack protection for everything in scope to address that is prohibitive since much of the code, like the SEV-ES #VC handler, is shared code that remains in use after boot and could benefit from having stack protection enabled. Attempting to inline calls is brittle and can quickly balloon out to library/helper code where that's not really an option. Instead, re-enable stack protection for head32.c/head64.c, and make the appropriate changes to ensure the segment used for the stack canary is initialized in advance of any stack-protected C calls. For head64.c: - The BSP will enter from startup_64() and call into C code (startup_64_setup_env()) shortly after setting up the stack, which may result in calls to stack-protected code. Set up %gs early to allow for this safely. - APs will enter from secondary_startup_64*(), and %gs will be set up soon after. There is one call to C code prior to %gs being set up (__startup_secondary_64()), but it is only to fetch the 'sme_me_mask' global, so just load 'sme_me_mask' directly instead, and remove the now-unused __startup_secondary_64() function. For head32.c: - BSPs/APs will set %fs to __BOOT_DS prior to any C calls. In recent kernels, the compiler is configured to access the stack canary at %fs:__stack_chk_guard [1], which overlaps with the initial per-cpu '__stack_chk_guard' variable in the initial/"master" .data..percpu area. This is sufficient to allow access to the canary for use during initial startup, so no changes are needed there. [1] 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable") [ bp: Massage commit message. ] Suggested-by: Joerg Roedel <jroedel@suse.de> #for 64-bit %gs set up Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-24-brijesh.singh@amd.com
|
#
efac0eed |
|
08-Feb-2022 |
Brijesh Singh <brijesh.singh@amd.com> |
x86/kernel: Mark the .bss..decrypted section as shared in the RMP table The encryption attribute for the .bss..decrypted section is cleared in the initial page table build. This is because the section contains data that needs to be shared between the guest and the hypervisor. When SEV-SNP is active, just clearing the encryption attribute in the page table is not enough. The page state needs to be updated in the RMP table. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-20-brijesh.singh@amd.com
|
#
95d33bfa |
|
08-Feb-2022 |
Brijesh Singh <brijesh.singh@amd.com> |
x86/sev: Register GHCB memory when SEV-SNP is active The SEV-SNP guest is required by the GHCB spec to register the GHCB's Guest Physical Address (GPA). This is because the hypervisor may prefer that a guest uses a consistent and/or specific GPA for the GHCB associated with a vCPU. For more information, see the GHCB specification section "GHCB GPA Registration". [ bp: Cleanup comments. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-18-brijesh.singh@amd.com
|
#
bcce8290 |
|
08-Feb-2022 |
Michael Roth <michael.roth@amd.com> |
x86/sev: Detect/setup SEV/SME features earlier in boot sme_enable() handles feature detection for both SEV and SME. Future patches will also use it for SEV-SNP feature detection/setup, which will need to be done immediately after the first #VC handler is set up. Move it now in preparation. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-9-brijesh.singh@amd.com
|
#
5f117033 |
|
11-Feb-2022 |
Marco Bonelli <marco@mebeim.net> |
x86/head64: Add missing __head annotation to sme_postprocess_startup() This function was previously part of __startup_64() which is marked __head, and is currently only called from there. Mark it __head too. Signed-off-by: Marco Bonelli <marco@mebeim.net> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220211162350.11780-1-marco@mebeim.net
|
#
b64dfcde |
|
17-Dec-2021 |
Borislav Petkov <bp@suse.de> |
x86/mm: Prevent early boot triple-faults with instrumentation Commit in Fixes added a global TLB flush on the early boot path, after the kernel switches off of the trampoline page table. Compiler profiling options enabled with GCOV_PROFILE add measurement code which, on clang, needs to be initialized prior to use. The global flush in x86_64_start_kernel() happens before that initialization can take place, leading to accesses of invalid memory. GCOV_PROFILE builds with gcc are still ok, so this is clang-specific. The second issue this fixes is with KASAN: for a similar reason, kasan_early_init() needs to have happened before KASAN-instrumented functions are called. Therefore, reorder the flush to happen after the KASAN early init and prevent the compilers from adding profiling instrumentation to native_write_cr4(). Fixes: f154f290855b ("x86/mm/64: Flush global TLB on boot and AP bringup") Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Carel Si <beibei.si@intel.com> Tested-by: "J. Bruce Fields" <bfields@fieldses.org> Link: https://lore.kernel.org/r/20211209144141.GC25654@xsang-OptiPlex-9020
|
#
f154f290 |
|
02-Dec-2021 |
Joerg Roedel <jroedel@suse.de> |
x86/mm/64: Flush global TLB on boot and AP bringup The AP bringup code uses the trampoline_pgd page-table which establishes global mappings in the user range of the address space. Flush the global TLB entries after the identity mappings are removed so no stale entries remain in the TLB. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211202153226.22946-3-joro@8bytes.org
|
#
5ed0a99b |
|
10-Nov-2021 |
Borislav Petkov <bp@suse.de> |
x86/head64: Carve out the guest encryption postprocessing into a helper Carve it out so that it is abstracted out of the main boot path. All other encrypted guest-relevant processing should be placed in there. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211110220731.2396491-7-brijesh.singh@amd.com
|
#
e9d1d2bb |
|
08-Sep-2021 |
Tom Lendacky <thomas.lendacky@amd.com> |
treewide: Replace the use of mem_encrypt_active() with cc_platform_has() Replace uses of mem_encrypt_active() with calls to cc_platform_has() with the CC_ATTR_MEM_ENCRYPT attribute. Remove the implementation of mem_encrypt_active() across all arches. For s390, since the default implementation of cc_platform_has() matches the s390 implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
|
#
e759959f |
|
27-Apr-2021 |
Brijesh Singh <brijesh.singh@amd.com> |
x86/sev-es: Rename sev-es.{ch} to sev.{ch} SEV-SNP builds upon the SEV-ES functionality while adding new hardware protection. Version 2 of the GHCB specification adds new NAE events that are SEV-SNP specific. Rename the sev-es.{ch} to sev.{ch} so that all SEV* functionality can be consolidated in one place. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20210427111636.1207-2-brijesh.singh@amd.com
|
#
d9f6e12f |
|
18-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
x86: Fix various typos in comments Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
|
#
61b39ad9 |
|
08-Nov-2020 |
Wang Qing <wangqing@vivo.com> |
x86/head64: Remove duplicate include Remove duplicate header include. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1604893542-20961-1-git-send-email-wangqing@vivo.com
|
#
33def849 |
|
21-Oct-2020 |
Joe Perches <joe@perches.com> |
treewide: Convert macro and uses of __section(foo) to __section("foo") Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1aa9aa8e |
|
08-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
x86/sev-es: Setup GHCB-based boot #VC handler Add the infrastructure to handle #VC exceptions when the kernel runs on virtual addresses and has mapped a GHCB. This handler will be used until the runtime #VC handler takes over. Since the handler runs very early, disable instrumentation for sev-es.c. [ bp: Make vc_ghcb_invalidate() __always_inline so that it can be inlined in noinstr functions like __sev_es_nmi_complete(). ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200908123816.GB3764@8bytes.org
|
#
74d8d9d5 |
|
08-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
x86/sev-es: Setup an early #VC handler Setup an early handler for #VC exceptions. There is no GHCB mapped yet, so just re-use the vc_no_ghcb_handler(). It can only handle CPUID exit-codes, but that should be enough to get the kernel through verify_cpu() and __startup_64() until it runs on virtual addresses. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> [ boot failure Error: kernel_ident_mapping_init() failed. ] Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20200908123517.GA3764@8bytes.org
|
#
4b47cdbd |
|
07-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
x86/head/64: Move early exception dispatch to C code Move the assembly coded dispatch between page-faults and all other exceptions to C code to make it easier to maintain and extend. Also change the return-type of early_make_pgtable() to bool and make it static. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-36-joro@8bytes.org
|
#
f5963ba7 |
|
07-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
x86/head/64: Install a CPU bringup IDT Add a separate bringup IDT for the CPU bringup code that will be used until the kernel switches to the idt_table. There are two reasons for a separate IDT: 1) When the idt_table is set up and the secondary CPUs are booted, it contains entries (e.g. IST entries) which require certain CPU state to be set up. This includes a working TSS (for IST), MSR_GS_BASE (for the stack protector) or CR4.FSGSBASE (for the paranoid_entry path). By using a dedicated IDT for early boot this state need not be set up early. 2) The idt_table is static to idt.c, so any function using/modifying it must be in idt.c too. That means that all compiler-driven instrumentation like tracing or KASAN is also active in this code. But during early CPU bringup the environment is not set up for this instrumentation to work correctly. To avoid all of these hassles and make early exception handling robust, use a dedicated bringup IDT. The IDT is loaded two times, first on the boot CPU while the kernel is still running on direct mapped addresses, and again later after the switch to kernel addresses has happened. The second IDT load happens on the boot and secondary CPUs. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-34-joro@8bytes.org
|
#
866b556e |
|
07-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
x86/head/64: Install startup GDT Handling exceptions during boot requires a working GDT. The kernel GDT can't be used on the direct mapping, so load a startup GDT and setup segments. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-30-joro@8bytes.org
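The startup GDT needs only the kernel code/data segments; it looked roughly like this (values per the mainline code of the time, shown for illustration):

  /*
   * GDT used on the boot CPU before switching to virtual addresses.
   */
  static struct desc_struct startup_gdt[GDT_ENTRIES] = {
  	[GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
  	[GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
  	[GDT_ENTRY_KERNEL_DS]   = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
  };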
|
#
65fddcfc |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: reorder includes after introduction of linux/pgtable.h The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there.

  import sys
  import re

  if len(sys.argv) is not 3:
      print "USAGE: %s <file> <header>" % (sys.argv[0])
      sys.exit(1)

  hdr_to_move="#include <linux/%s>" % sys.argv[2]
  moved = False
  in_hdrs = False

  with open(sys.argv[1], "r") as f:
      lines = f.readlines()
      for _line in lines:
          line = _line.rstrip(' ')
          if line == hdr_to_move:
              continue
          if line.startswith("#include <linux/"):
              in_hdrs = True
          elif not moved and in_hdrs:
              moved = True
              print hdr_to_move
          print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ca5999fd |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: introduce include/linux/pgtable.h The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2aa85f24 |
|
24-Sep-2019 |
Steve Wahl <steve.wahl@hpe.com> |
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area Our hardware (UV aka Superdome Flex) has address ranges marked reserved by the BIOS. Access to these ranges is caught as an error, causing the BIOS to halt the system. Initial page tables mapped a large range of physical addresses that were not checked against the list of BIOS reserved addresses, and sometimes included reserved addresses in part of the mapped range. Including the reserved range in the map allowed processor speculative accesses to the reserved range, triggering a BIOS halt. Used early in booting, the page table level2_kernel_pgt addresses 1 GiB divided into 2 MiB pages, and it was set up to linearly map a full 1 GiB of physical addresses that included the physical address range of the kernel image, as chosen by KASLR. But this also included a large range of unused addresses on either side of the kernel image. And unlike the kernel image's physical address range, this extra mapped space was not checked against the BIOS tables of usable RAM addresses. So there were times when the addresses chosen by KASLR would result in processor accessible mappings of BIOS reserved physical addresses. The kernel code did not directly access any of this extra mapped space, but having it mapped allowed the processor to issue speculative accesses into reserved memory, causing system halts. This was encountered somewhat rarely on a normal system boot, and much more often when starting the crash kernel if "crashkernel=512M,high" was specified on the command line (this heavily restricts the physical address of the crash kernel, in our case usually within 1 GiB of reserved space). The solution is to invalidate the pages of this table outside the kernel image's space before the page table is activated. It fixes this problem on our hardware. [ bp: Touchups. ] Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dimitri.sivanich@hpe.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jordan Borgner <mail@jordan-borgner.de> Cc: Juergen Gross <jgross@suse.com> Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
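A sketch of the resulting fixup loop in __startup_64(), close to the mainline code and shown for illustration:

  	/* invalidate pages before the kernel image */
  	for (i = 0; i < pmd_index((unsigned long)_text); i++)
  		pmd[i] &= ~_PAGE_PRESENT;

  	/* fixup pages that are part of the kernel image */
  	for (; i <= pmd_index((unsigned long)_end); i++)
  		if (pmd[i] & _PAGE_PRESENT)
  			pmd[i] += load_delta;

  	/* invalidate pages after the kernel image */
  	for (; i < PTRS_PER_PMD; i++)
  		pmd[i] &= ~_PAGE_PRESENT;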
|
#
c1887159 |
|
20-Jun-2019 |
Kirill A. Shutemov <kirill@shutemov.name> |
x86/boot/64: Add missing fixup_pointer() for next_early_pgt access __startup_64() uses fixup_pointer() to access global variables in a position-independent fashion. Access to next_early_pgt was wrapped into the helper, but one instance in the 5-level paging branch was missed. GCC generates an R_X86_64_PC32 PC-relative relocation for the access, which doesn't trigger the issue, but Clang emits an R_X86_64_32S, which leads to an invalid memory access and system reboot. Fixes: 187e91fe5e91 ("x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Link: https://lkml.kernel.org/r/20190620112422.29264-1-kirill.shutemov@linux.intel.com
|
#
81c7ed29 |
|
20-Jun-2019 |
Kirill A. Shutemov <kirill@shutemov.name> |
x86/boot/64: Fix crash if kernel image crosses page table boundary A kernel which boots in 5-level paging mode crashes in a small percentage of cases if KASLR is enabled. This issue was tracked down to the case when the kernel image unpacks in a way that it crosses a 1G boundary. The crash is caused by an overrun of the PMD page table in __startup_64() and corruption of the P4D page table allocated next to it. This particular issue is not visible with 4-level paging as P4D page tables are not used. But the P4D and the PUD calculation have similar problems. The PMD index calculation is wrong due to operator precedence, which fails to confine the PMDs in the PMD array on wrap-around. The P4D calculation for 5-level paging and the PUD calculation calculate the first index correctly, but then blindly increment it, which causes the same issue when a kernel image crosses a 512G boundary and, for 5-level paging, a 46T boundary. This wrap-around mishandling was introduced when these parts moved from assembly to C. Restore the correct behaviour. Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190620112345.28833-1-kirill.shutemov@linux.intel.com
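The PMD part of the fix is a precedence one-liner: '%' binds tighter than '+', so the buggy form wrapped only the shifted address, not the sum (a sketch of the before/after):

  -		pmd[i + (physaddr >> PMD_SHIFT) % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
  +		pmd[(i + (physaddr >> PMD_SHIFT)) % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;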
|
#
38418404 |
|
20-Nov-2018 |
Juergen Gross <jgross@suse.com> |
x86/boot: Mostly revert commit ae7e1238e68f2a ("Add ACPI RSDP address to setup_header") Peter Anvin pointed out that commit: ae7e1238e68f2a ("x86/boot: Add ACPI RSDP address to setup_header") should be reverted, as setup_header should only contain items set by the legacy BIOS. So revert said commit. Instead of fully reverting the dependent commit: e7b66d16fe4172 ("x86/acpi, x86/boot: Take RSDP address for boot params if available") just remove the setup_header reference in order to replace it with a boot_params field in a follow-up patch. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: bp@alien8.de Cc: daniel.kiper@oracle.com Cc: sstabellini@kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181120072529.5489-2-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0e96f31e |
|
27-Oct-2018 |
Jordan Borgner <mail@jordan-borgner.de> |
x86: Clean up 'sizeof x' => 'sizeof(x)' "sizeof(x)" is the canonical coding style used in arch/x86 most of the time. Fix the few places that didn't follow the convention. (Also do some whitespace cleanups in a few places while at it.) [ mingo: Rewrote the changelog. ] Signed-off-by: Jordan Borgner <mail@jordan-borgner.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181028125828.7rgammkgzep2wpam@JordanDesktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ae7e1238 |
|
10-Oct-2018 |
Juergen Gross <jgross@suse.com> |
x86/boot: Add ACPI RSDP address to setup_header Xen PVH guests receive the address of the RSDP table from Xen. In order to support booting a Xen PVH guest via Grub2 using the standard x86 boot entry we need a way for Grub2 to pass the RSDP address to the kernel. For this purpose expand struct setup_header to hold the physical address of the RSDP. Being zero means it isn't specified and has to be located the legacy way (searching through low memory or the EBDA). While documenting the new setup_header layout and protocol version 2.14, add the missing documentation of protocol version 2.13. There are Grub2 versions in several distros with a downstream patch violating the boot protocol by writing past the end of setup_header. This requires another update of the boot protocol to enable the kernel to distinguish between a specified RSDP address and one filled with garbage by such a broken Grub2. From protocol 2.14 on, Grub2 will write the version it is supporting (but never a higher value than found to be supported by the kernel) ORed with 0x8000 to the version field of setup_header. This enables the kernel to know up to which field Grub2 has written information to. All fields after that are supposed to be clobbered. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: bp@alien8.de Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
05ab1d8a |
|
19-Sep-2018 |
Feng Tang <feng.tang@intel.com> |
x86/mm: Expand static page table for fixmap space We met a kernel panic when enabling earlycon, caused by the fixmap address of earlycon not being statically set up. Currently the static fixmap setup in head_64.S only covers 2M of virtual address space, while it could actually span 4M with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build-time check to ensure that the fixmap is covered by the initial static page tables. Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
|
#
b3f0907c |
|
14-Sep-2018 |
Brijesh Singh <brijesh.singh@amd.com> |
x86/mm: Add .bss..decrypted section to hold shared variables kvmclock defines a few static variables which are shared with the hypervisor during the kvmclock initialization. When SEV is active, memory is encrypted with a guest-specific key, and if the guest OS wants to share a memory region with the hypervisor then it must clear the C-bit before sharing it. Currently, we use kernel_physical_mapping_init() to split large pages before clearing the C-bit on shared pages. But it fails when called from the kvmclock initialization (mainly because the memblock allocator is not ready that early during boot). Add a __bss_decrypted section attribute which can be used when defining such shared variables. The so-defined variables will be placed in the .bss..decrypted section. This section will be mapped with C=0 early during boot. The .bss..decrypted section has a big chunk of memory that may be unused when memory encryption is not active; free it in that case. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
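A sketch of the attribute and its intended use (the kvmclock variable is shown for illustration):

  #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))

  /* e.g. in kvmclock: shared with the hypervisor, so must never be encrypted */
  static struct pvclock_vsyscall_time_info
  			hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __bss_decrypted;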
|
#
51be1335 |
|
22-Jun-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
Revert "x86/mm: Mark __pgtable_l5_enabled __initdata" This reverts commit e4e961e36f063484c48bed919013c106d178995d. We need to use early version of pgtable_l5_enabled() in early_identify_cpu() as this code runs before cpu_feature_enabled() is usable. But it leads to section mismatch: cpu_init() load_mm_ldt() ldt_slot_va() LDT_BASE_ADDR LDT_PGD_ENTRY pgtable_l5_enabled() __pgtable_l5_enabled __pgtable_l5_enabled marked as __initdata, but cpu_init() is not __init. It's fixable: early code can be isolated into a separate translation unit, but such change collides with other work in the area. That's too much hassle to save 4 bytes of memory. Return __pgtable_l5_enabled back to be __ro_after_init. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180622220841.54135-2-kirill.shutemov@linux.intel.com
|
#
e4e961e3 |
|
18-May-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Mark __pgtable_l5_enabled __initdata __pgtable_l5_enabled shouldn't be needed after system has booted. All preparation is done. We can now mark it as __initdata. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
372fddf7 |
|
18-May-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Introduce the 'no5lvl' kernel parameter This kernel parameter allows forcing the kernel to use 4-level paging even if the hardware and kernel support 5-level paging. The option may be useful to work around regressions related to 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ed7588d5 |
|
18-May-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Stop pretending pgtable_l5_enabled is a variable pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer to it as a variable. This is misleading. Make pgtable_l5_enabled() a function. We cannot literally define it as a function due to circular dependencies between header files, but a function-like macro is close enough. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ad3fe525 |
|
18-May-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Unify pgtable_l5_enabled usage in early boot code Usually pgtable_l5_enabled is defined using cpu_feature_enabled(). cpu_feature_enabled() is not available in early boot code, and we use several different preprocessor tricks to get around it. It's messy. Unify them all: if cpu_feature_enabled() is not yet usable, USE_EARLY_PGTABLE_L5 can be defined before all includes, which makes pgtable_l5_enabled rely on the __pgtable_l5_enabled variable instead. This approach fits all early users; see the sketch below. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
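Combined with the function-style macro from ed7588d5 above, the end result is roughly:

  #ifdef USE_EARLY_PGTABLE_L5
  /*
   * cpu_feature_enabled() is not available yet; fall back to the
   * variable set up by the early boot code.
   */
  extern unsigned int __pgtable_l5_enabled;
  #define pgtable_l5_enabled() __pgtable_l5_enabled
  #else
  #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
  #endif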
|
#
4a09f021 |
|
09-May-2018 |
Alexander Potapenko <glider@google.com> |
x86/boot/64/clang: Use fixup_pointer() to access '__supported_pte_mask' Clang builds with defconfig started crashing after the following commit: fb43d6cb91ef ("x86/mm: Do not auto-massage page protections") This was caused by introducing a new global access in __startup_64(). Code in __startup_64() can be relocated during execution, but the compiler doesn't have to generate PC-relative relocations when accessing globals from that function. Clang actually does not generate them, which leads to boot-time crashes. To work around this problem, every global pointer must be adjusted using fixup_pointer(). Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: dvyukov@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Cc: md@google.com Cc: mka@chromium.org Fixes: fb43d6cb91ef ("x86/mm: Do not auto-massage page protections") Link: http://lkml.kernel.org/r/20180509091822.191810-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
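For context, fixup_pointer() in head64.c rebases a kernel-virtual symbol address onto the 1:1 mapping by hand (roughly):

  static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
  {
  	/* physaddr is where _text actually landed in physical memory. */
  	return ptr - (void *)_text + (void *)physaddr;
  }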
|
#
fb43d6cb |
|
06-Apr-2018 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/mm: Do not auto-massage page protections A PTE is constructed from a physical address and a pgprotval_t. __PAGE_KERNEL, for instance, is a pgprot_t and must be converted into a pgprotval_t before it can be used to create a PTE. This is done implicitly within functions like pfn_pte() by massage_pgprot(). However, this makes it very challenging to set bits (and keep them set) if your bit is being filtered out by massage_pgprot(). This moves the bit filtering out of pfn_pte() and friends. For users of PAGE_KERNEL*, filtering will be done automatically inside those macros but for users of __PAGE_KERNEL*, they need to do their own filtering now. Note that we also just move pfn_pte/pmd/pud() over to check_pgprot() instead of massage_pgprot(). This way, we still *look* for unsupported bits and properly warn about them if we find them. This might happen if an unfiltered __PAGE_KERNEL* value was passed in, for instance. - printk format warning fix from: Arnd Bergmann <arnd@arndb.de> - boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com> - crash bisected by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de> Fixed-by: Tom Lendacky <thomas.lendacky@amd.com> Bisected-by: Mike Galbraith <efault@gmx.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
39b95522 |
|
16-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Optimize boot-time paging mode switching cost By this point we have functioning boot-time switching between 4- and 5-level paging mode. But the naive approach comes with a cost. Numbers below are for a kernel build, allmodconfig, 5 times.

CONFIG_X86_5LEVEL=n:

  Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

     17308719.892691  task-clock:u (msec)        # 26.772 CPUs utilized            ( +- 0.11% )
                   0  context-switches:u         # 0.000 K/sec
                   0  cpu-migrations:u           # 0.000 K/sec
         331,993,164  page-faults:u              # 0.019 M/sec                     ( +- 0.01% )
  43,614,978,867,455  cycles:u                   # 2.520 GHz                       ( +- 0.01% )
  39,371,534,575,126  stalled-cycles-frontend:u  # 90.27% frontend cycles idle     ( +- 0.09% )
  28,363,350,152,428  instructions:u             # 0.65 insn per cycle
                                                 # 1.39 stalled cycles per insn    ( +- 0.00% )
   6,316,784,066,413  branches:u                 # 364.948 M/sec                   ( +- 0.00% )
     250,808,144,781  branch-misses:u            # 3.97% of all branches           ( +- 0.01% )

       646.531974142  seconds time elapsed                                         ( +- 1.15% )

CONFIG_X86_5LEVEL=y:

  Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

     17411536.780625  task-clock:u (msec)        # 26.426 CPUs utilized            ( +- 0.10% )
                   0  context-switches:u         # 0.000 K/sec
                   0  cpu-migrations:u           # 0.000 K/sec
         331,868,663  page-faults:u              # 0.019 M/sec                     ( +- 0.01% )
  43,865,909,056,301  cycles:u                   # 2.519 GHz                       ( +- 0.01% )
  39,740,130,365,581  stalled-cycles-frontend:u  # 90.59% frontend cycles idle     ( +- 0.05% )
  28,363,358,997,959  instructions:u             # 0.65 insn per cycle
                                                 # 1.40 stalled cycles per insn    ( +- 0.00% )
   6,316,784,937,460  branches:u                 # 362.793 M/sec                   ( +- 0.00% )
     251,531,919,485  branch-misses:u            # 3.98% of all branches           ( +- 0.00% )

       658.886307752  seconds time elapsed                                         ( +- 0.92% )

The patch tries to fix the performance regression by using cpu_feature_enabled(X86_FEATURE_LA57) instead of pgtable_l5_enabled in all hot code paths. These will statically patch the target code for additional performance.

CONFIG_X86_5LEVEL=y + the patch:

  Performance counter stats for 'sh -c make -j100 -B -k >/dev/null' (5 runs):

     17381990.268506  task-clock:u (msec)        # 26.907 CPUs utilized            ( +- 0.19% )
                   0  context-switches:u         # 0.000 K/sec
                   0  cpu-migrations:u           # 0.000 K/sec
         331,862,625  page-faults:u              # 0.019 M/sec                     ( +- 0.01% )
  43,697,726,320,051  cycles:u                   # 2.514 GHz                       ( +- 0.03% )
  39,480,408,690,401  stalled-cycles-frontend:u  # 90.35% frontend cycles idle     ( +- 0.05% )
  28,363,394,221,388  instructions:u             # 0.65 insn per cycle
                                                 # 1.39 stalled cycles per insn    ( +- 0.00% )
   6,316,794,985,573  branches:u                 # 363.410 M/sec                   ( +- 0.00% )
     251,013,232,547  branch-misses:u            # 3.97% of all branches           ( +- 0.01% )

       645.991174661  seconds time elapsed                                         ( +- 1.19% )

Unfortunately, this approach doesn't help with text size:

  vmlinux.before .text size: 8190319
  vmlinux.after  .text size: 8200623

The .text section is increased by about 4k. Not sure if we can do anything about this. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180216114948.68868-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6f9dd329 |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Support boot-time switching of paging modes in the early boot code Early boot code should be able to initialize page tables for both 4- and 5-level paging modes. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9b46a051 |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Initialize vmemmap_base at boot-time The vmemmap area has a different placement depending on the paging mode. Let's adjust it during early boot according to the machine's capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a7412546 |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Adjust vmalloc base and size at boot-time The vmalloc area has a different placement and size depending on the paging mode. Let's adjust it during early boot according to the machine's capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4fa5662b |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Initialize 'page_offset_base' at boot-time For 4- and 5-level paging we have a different 'page_offset_base'. Let's initialize it at boot time according to the machine's capability. We also have to split __PAGE_OFFSET_BASE into two constants -- for 4- and 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
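A sketch of the split and the boot-time assignment (the constant values shown are illustrative, not authoritative):

    #define __PAGE_OFFSET_BASE_L5   _AC(0xff10000000000000, UL)
    #define __PAGE_OFFSET_BASE_L4   _AC(0xffff880000000000, UL)

    /* default to the 4-level value ... */
    unsigned long page_offset_base = __PAGE_OFFSET_BASE_L4;

    /* ... and switch early in boot once the paging mode is known */
    if (pgtable_l5_enabled)
            page_offset_base = __PAGE_OFFSET_BASE_L5;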
|
#
b16e770b |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Initialize 'pgdir_shift' and 'ptrs_per_p4d' at boot-time Switching between paging modes requires the folding of the p4d page table level when we only have 4 paging levels, which means we need to adjust 'pgdir_shift' and 'ptrs_per_p4d' during early boot according to paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
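In sketch form, the early-boot adjustment when 5-level paging is detected (the 4-level values are the defaults):

    if (pgtable_l5_enabled) {
            pgdir_shift = 48;     /* each PGD entry then maps 256 TiB */
            ptrs_per_p4d = 512;   /* the p4d level becomes a real level */
    }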
|
#
4c2b4058 |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Initialize 'pgtable_l5_enabled' at boot-time 'pgtable_l5_enabled' indicates which paging mode we are using. We need to initialize it at boot-time according to machine capability. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
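The boot-time detection boils down to a CPUID check; a minimal sketch, assuming the helper name (LA57 is reported in CPUID.(EAX=7,ECX=0):ECX bit 16):

    static bool __head check_la57_support(void)
    {
            if (native_cpuid_eax(0) < 7)    /* leaf 7 not implemented */
                    return false;

            return !!(native_cpuid_ecx(7) & (1 << 16));   /* LA57 */
    }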
|
#
c65e774f |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable For boot-time switching between 4- and 5-level paging we need to be able to fold p4d page table level at runtime. It requires variable PGDIR_SHIFT and PTRS_PER_P4D. The change doesn't affect the kernel image size much: text data bss dec hex filename 8628091 4734304 1368064 14730459 e0c4db vmlinux.before 8628393 4734340 1368064 14730797 e0c62d vmlinux.after Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
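A sketch of the change: the former compile-time constants become ordinary variables hidden behind the same macro names, initialized to the 4-level values:

    unsigned int pgdir_shift = 39;    /* 48 with 5-level paging */
    unsigned int ptrs_per_p4d = 1;    /* 512 with 5-level paging */

    #define PGDIR_SHIFT     pgdir_shift
    #define PTRS_PER_P4D    ptrs_per_p4d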
|
#
e626e6bb |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Introduce 'pgtable_l5_enabled' The new flag indicates which paging mode we are in. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
eedb92ab |
|
14-Feb-2018 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y We need to be able to adjust the virtual memory layout at runtime in order to switch between 4- and 5-level paging at boot time. KASLR already has movable __VMALLOC_BASE, __VMEMMAP_BASE and __PAGE_OFFSET. Let's re-use it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
107cd253 |
|
10-Jan-2018 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/mm: Encrypt the initrd earlier for BSP microcode update Currently the BSP microcode update code examines the initrd very early in the boot process. If SME is active, the initrd is treated as being encrypted but it has not been encrypted (in place) yet. Update the early boot code that encrypts the kernel to also encrypt the initrd so that early BSP microcode updates work. Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180110192634.6026.10452.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boilerplate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information in it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). 
Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. 
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
588787fd |
|
28-Aug-2017 |
Thomas Gleixner <tglx@linutronix.de> |
x86/idt: Move early IDT handler setup to IDT code The early IDT handler setup is done in C entry code on 64-bit kernels and in ASM entry code on 32-bit kernels. Move the 64-bit variant to the IDT code so it can be shared with 32-bit in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.679561404@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
187e91fe |
|
16-Aug-2017 |
Alexander Potapenko <glider@google.com> |
x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt' __startup_64() is normally using fixup_pointer() to access globals in a position-independent fashion. However 'next_early_pgt' was accessed directly, which wasn't guaranteed to work. Luckily GCC was generating a R_X86_64_PC32 PC-relative relocation for 'next_early_pgt', but Clang emitted a R_X86_64_32S, which led to accessing invalid memory and rebooting the kernel. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Davidson <md@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") Link: http://lkml.kernel.org/r/20170816190808.131748-1-glider@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
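For reference, a sketch of the helper and of the fix (the variable name next_pgt_ptr is illustrative):

    /* translate a global's link-time virtual address into its current
     * 1:1 physical alias, given the actual load address */
    static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
    {
            return ptr - (void *)_text + (void *)physaddr;
    }

    /* the fix: access 'next_early_pgt' through the helper */
    unsigned int *next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);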
|
#
aca20d54 |
|
17-Jul-2017 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/mm: Add support to make use of Secure Memory Encryption Add support to check if SME has been enabled and if memory encryption should be activated (checking of command line option based on the configuration of the default state). If memory encryption is to be activated, then the encryption mask is set and the kernel is encrypted "in place." Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/5f0da2fd4cce63f556117549e2c89c170072209f.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b9d05200 |
|
17-Jul-2017 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/mm: Insure that boot memory areas are mapped properly The boot data and command line data are present in memory in a decrypted state and are copied early in the boot process. The early page fault support will map these areas as encrypted, so before attempting to copy them, add decrypted mappings so the data is accessed properly when copied. For the initrd, encrypt this data in place. Since the future mapping of the initrd area will be mapped as encrypted the data will be accessed properly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/bb0d430b41efefd45ee515aaf0979dcfda8b6a44.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
21729f81 |
|
17-Jul-2017 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/mm: Provide general kernel support for memory encryption Changes to the existing page table macros will allow the SME support to be enabled in a simple fashion with minimal changes to files that use these macros. Since the memory encryption mask will now be part of the regular pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization without the encryption mask before SME becomes active. Two new pgprot() macros are defined to allow setting or clearing the page encryption mask. The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does not support encryption for MMIO areas so this define removes the encryption mask from the page attribute. Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow creating a physical address with the encryption mask. These are used when working with the cr3 register so that the PGD can be encrypted. The current __va() macro is updated so that the virtual address is generated based off of the physical address without the encryption mask thus allowing the same virtual address to be generated regardless of whether encryption is enabled for that physical location or not. Also, an early initialization function is added for SME. If SME is active, this function: - Updates the early_pmd_flags so that early page faults create mappings with the encryption mask. - Updates the __supported_pte_mask to include the encryption mask. - Updates the protection_map entries to include the encryption mask so that user-space allocations will automatically have the encryption mask applied. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
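A sketch of the new helpers described above (sme_me_mask is zero when SME is inactive, so all of these degrade to no-ops):

    /* set or clear the encryption bit in a pagetable/prot value */
    #define __sme_set(x)            ((x) | sme_me_mask)
    #define __sme_clr(x)            ((x) & ~sme_me_mask)

    /* physical address carrying the encryption mask, e.g. for CR3 */
    #define __sme_pa(x)             (__pa(x) | sme_me_mask)
    #define __sme_pa_nodebug(x)     (__pa_nodebug(x) | sme_me_mask)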
|
#
5868f365 |
|
17-Jul-2017 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/mm: Add support to enable SME in early boot processing Add support to the early boot code to use Secure Memory Encryption (SME). Since the kernel has been loaded into memory in a decrypted state, encrypt the kernel in place and update the early pagetables with the memory encryption mask so that new pagetable entries will use memory encryption. The routines to set the encryption mask and perform the encryption are stub routines for now with functionality to be added in a later patch. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/e52ad781f085224bf835b3caff9aa3aee6febccb.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
26179670 |
|
16-Jun-2017 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/boot/64: Put __startup_64() into .head.text Put __startup_64() and fixup_pointer() into .head.text section to make sure it's always near startup_64() and always callable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel test robot <fengguang.wu@intel.com> Cc: wfg@linux.intel.com Link: http://lkml.kernel.org/r/20170616113024.ajmif63cmcszry5a@black.fi.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
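The placement is a one-line section attribute; a sketch, assuming the __head macro name (the __startup_64() signature is abbreviated here):

    /* everything marked __head is linked next to startup_64() */
    #define __head  __section(".head.text")

    static void __head *fixup_pointer(void *ptr, unsigned long physaddr);
    void __head __startup_64(unsigned long physaddr);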
|
#
032370b9 |
|
06-Jun-2017 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/boot/64: Add support of additional page table level during early boot This patch adds support for 5-level paging during early boot. It generalizes boot for 4- and 5-level paging on 64-bit systems, with a compile-time switch between them. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-10-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
65ade2f8 |
|
06-Jun-2017 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/boot/64: Rename init_level4_pgt and early_level4_pgt With CONFIG_X86_5LEVEL=y, level 4 is no longer the top level of the page tables. Let's give these variables more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c88d7150 |
|
06-Jun-2017 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/boot/64: Rewrite startup_64() in C This patch rewrites most of the startup_64 logic in C. This is preparation for 5-level paging enabling. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6c690ee1 |
|
12-Jun-2017 |
Andy Lutomirski <luto@kernel.org> |
x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3() The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
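The resulting pair of readers, roughly (CR3_ADDR_MASK strips the PCID and flag bits, leaving only the PGD's physical address):

    /* the raw register value, PCID bits and all */
    static inline unsigned long __read_cr3(void)
    {
            return native_read_cr3();
    }

    /* just the pagetable base, safe even with PCID enabled */
    static inline unsigned long read_cr3_pa(void)
    {
            return __read_cr3() & CR3_ADDR_MASK;
    }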
|
#
be3606ff |
|
13-Mar-2017 |
Andrey Ryabinin <ryabinin.a.a@gmail.com> |
x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before KASAN has been initialized leads to a crash. Use the DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: lkp@01.org Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
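The fix itself is a define that must appear before any #include, so that the likely()/unlikely() macros in this file stay plain __builtin_expect() instead of expanding to calls into the instrumented ftrace_likely_update(); sketched:

    /* top of arch/x86/kernel/head64.c, before all includes */
    #define DISABLE_BRANCH_PROFILING
    #include <linux/init.h>
    #include <linux/linkage.h>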
|
#
66441bd3 |
|
27-Jan-2017 |
Ingo Molnar <mingo@kernel.org> |
x86/boot/e820: Move asm/e820.h to asm/e820/api.h In line with asm/e820/types.h, move the e820 API declarations to asm/e820/api.h and update all usage sites. This is just a mechanical, obviously correct move & replace patch, there will be subsequent changes to clean up the code and to make better use of the new header organization. Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang, Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Jackson <pj@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
007b7560 |
|
10-Aug-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/boot: Run reserve_bios_regions() after we initialize the memory map reserve_bios_regions() is a quirk that reserves memory that we might otherwise think is available. There's no need to run it so early, and running it before we have the memory map initialized with its non-quirky inputs makes it hard to make reserve_bios_regions() more intelligent. Move it right after we populate the memblock state. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
edce2121 |
|
21-Jul-2016 |
Ingo Molnar <mingo@kernel.org> |
x86/boot: Reorganize and clean up the BIOS area reservation code So the reserve_ebda_region() code has accumulated a number of problems over the years that make it really difficult to read and understand: - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks... - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it super confusing to read. - It does not help at all that we have various memory range markers, half of which are 'start of range', half of which are 'end of range' - but this crucial property is not obvious in the naming at all ... gave me a headache trying to understand all this. - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is obvious, all values here are addresses!), while it does not highlight that it's the _start_ of the EBDA region ... - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value that is a pointer to a value, not a memory range address! - The function name itself is a misnomer: it says 'reserve_ebda_region()' while its main purpose is to reserve all the firmware ROM typically between 640K and 1MB, while the 'EBDA' part is only a small part of that ... - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this too should be about whether to reserve firmware areas in the paravirt case. - In fact thinking about this as 'end of RAM' is confusing: what this function *really* wants to reserve is firmware data and code areas! Once the thinking is inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 'reserved area' notion everything becomes a lot clearer. To improve all this rewrite the whole code (without changing the logic): - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start' and propagate this concept through all the variable names and constants. BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES BIOS_START_MIN // was: INSANE_CUTOFF ebda_start // was: ebda_addr bios_start // was: lowmem BIOS_START_MAX // was: LOWMEM_CAP - Then clean up the name of the function itself by renaming it to reserve_bios_regions() and renaming the ::ebda_search paravirt flag to ::reserve_bios_regions. - Fix up all the comments (fix typos), harmonize and simplify their formulation and remove comments that become unnecessary due to the much better naming all around. Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8d152e7a |
|
13-Apr-2016 |
Luis R. Rodriguez <mcgrof@kernel.org> |
x86/rtc: Replace paravirt rtc check with platform legacy quirk We have 4 types of x86 platforms that disable RTC: * Intel MID * Lguest - uses paravirt * Xen dom-U - uses paravirt * x86 on legacy systems annotated with an ACPI legacy flag We can consolidate all of these into a platform specific legacy quirk set early in boot through i386_start_kernel() and through x86_64_start_reservations(). This deals with the RTC quirks which we can rely on through the hardware subarch, the ACPI check can be dealt with separately. For Xen things are bit more complex given that the @X86_SUBARCH_XEN x86_hardware_subarch is shared on for Xen which uses the PV path for both domU and dom0. Since the semantics for differentiating between the two are Xen specific we provide a platform helper to help override default legacy features -- x86_platform.set_legacy_features(). Use of this helper is highly discouraged, its only purpose should be to account for the lack of semantics available within your given x86_hardware_subarch. As per 0-day, this bumps the vmlinux size using i386-tinyconfig as follows: TOTAL TEXT init.text x86_early_init_platform_quirks() +70 +62 +62 +43 Only 8 bytes overhead total, as the main increase in size is all removed via __init. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a91bbe01 |
|
09-Feb-2016 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/boot: Use proper array element type in memset() size calculation I changed open coded zeroing loops to explicit memset()s in the following commit: 5e9ebbd87a99 ("x86/boot: Micro-optimize reset_early_page_tables()") The base for the size argument of memset was sizeof(pud_p/pmd_p), which are pointers - but the initialized array has pud_t/pmd_t elements. Luckily the two types had the same size, so this did not result in any runtime misbehavior. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455025494-4063-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
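The bug and the fix in sketch form; on x86-64 a pointer and a pud_t are both 8 bytes, which is why nothing misbehaved:

    /* before: sizeof(pud_p) is the size of a pointer */
    memset(pud_p, 0, sizeof(pud_p) * PTRS_PER_PUD);

    /* after: sizeof(pud_t) is the size of an array element */
    memset(pud_p, 0, sizeof(pud_t) * PTRS_PER_PUD);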
|
#
5e9ebbd8 |
|
30-Jan-2016 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/boot: Micro-optimize reset_early_page_tables() Save 25 bytes of code and make the bootup a tiny bit faster: text data bss dec filename 9735144 4970776 15474688 30180608 vmlinux.old 9735119 4970776 15474688 30180583 vmlinux Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454140872-16926-1-git-send-email-kuleshovmail@gmail.com [ Fixed various small details. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3fda5bb4 |
|
15-Jan-2016 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
x86/platform/intel-mid: Enable 64-bit build The Intel Tangier SoC is known to have a 64-bit dual-core CPU. Enable the 64-bit build for it. The kernel has been tested on an Intel Edison board: Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 74 model name : Genuine Intel(R) CPU 4000 @ 500MHz stepping : 8 Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5d5aa3cf |
|
01-Jul-2015 |
Alexander Popov <alpopov@ptsecurity.com> |
x86/kasan: Fix KASAN shadow region page tables Currently, KASAN shadow region page tables are created without taking the physical offset (phys_base) into account. This causes a kernel halt when phys_base is not zero. So let's initialize the KASAN shadow region page tables in kasan_early_init() using __pa_nodebug(), which takes phys_base into account. This patch also separates x86_64_start_kernel() from KASAN low-level details by moving kasan_map_early_shadow(init_level4_pgt) into kasan_early_init(). Remove the comment before clear_bss(), which no longer improves code readability; describing all the new ordering dependencies would be too verbose. Signed-off-by: Alexander Popov <alpopov@ptsecurity.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d0f77d4d |
|
01-Jul-2015 |
Andrey Ryabinin <ryabinin.a.a@gmail.com> |
x86/init: Clear 'init_level4_pgt' earlier Currently x86_64_start_kernel() has two KASAN related function calls. The first call maps shadow to early_level4_pgt, the second maps shadow to init_level4_pgt. If we move clear_page(init_level4_pgt) earlier, we could hide KASAN low level detail from generic x86_64 initialization code. The next patch will do it. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: <stable@vger.kernel.org> # 4.0+ Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435828178-10975-2-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
425be567 |
|
22-May-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels. Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers. While we're at it, improve the comments and rename the array and common code. Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change. Before, on x86_64: 0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4 After: 0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3 Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
91d8f041 |
|
17-Mar-2015 |
Alexander Kuleshov <kuleshovmail@gmail.com> |
x86/boot/64: Remove pointless early_printk() message earlyprintk has not been initialised yet by setup_early_printk() at this point, so the message is never printed and the call can be removed. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1426597205-5142-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ef7f0d6a |
|
13-Feb-2015 |
Andrey Ryabinin <ryabinin.a.a@gmail.com> |
x86_64: add KASan support This patch adds arch specific code for the kernel address sanitizer. 16TB of virtual address space is used for shadow memory. It's located in the range [ffffec0000000000 - fffffc0000000000] between vmemmap and the %esp fixup stacks. At an early stage we map the whole shadow region with the zero page. Later, after pages are mapped into the direct mapping address range, we unmap the zero pages from the corresponding shadow (see kasan_map_shadow()) and allocate and map real shadow memory, reusing the vmemmap_populate() function. Also replace __pa with __pa_nodebug before the shadow is initialized: with CONFIG_DEBUG_VIRTUAL=y, __pa makes an external function call (__phys_addr), and __phys_addr is instrumented, so __asan_load could be called before the shadow area is initialized. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1e02ce4c |
|
24-Oct-2014 |
Andy Lutomirski <luto@amacapital.net> |
x86: Store a per-cpu shadow copy of CR4 Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
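In sketch form (the register-write helper name varies across trees): readers hit only the per-cpu cache, and writers keep it coherent with the hardware register:

    /* fast path: no CR4 access at all */
    static inline unsigned long cr4_read_shadow(void)
    {
            return this_cpu_read(cpu_tlbstate.cr4);
    }

    /* update the shadow first, then the hardware register */
    static inline void cr4_set_bits(unsigned long mask)
    {
            unsigned long cr4 = this_cpu_read(cpu_tlbstate.cr4);

            if ((cr4 | mask) != cr4) {
                    cr4 |= mask;
                    this_cpu_write(cpu_tlbstate.cr4, cr4);
                    __write_cr4(cr4);
            }
    }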
|
#
a8fe19eb |
|
04-Jun-2014 |
Borislav Petkov <bp@suse.de> |
kernel/printk: use symbolic defines for console loglevels ... instead of naked numbers. Stuff in sysrq.c used to set it to 8, which is supposed to mean 'above the default level'; set it to DEBUG instead, as we're terminating/killing all tasks and we want to be verbose there. Also, correct the check in x86_64_start_kernel, which should be >=, as we're clearly issuing the string there for all debug levels, not only the magical 10. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2605fc21 |
|
01-May-2014 |
Andi Kleen <ak@linux.intel.com> |
asmlinkage, x86: Add explicit __visible to arch/x86/* As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
25c74b10 |
|
30-Oct-2013 |
Seiji Aguchi <seiji.aguchi@hds.com> |
x86, trace: Register exception handler to trace IDT This patch registers exception handlers for tracing to a trace IDT. To implement this in set_intr_gate(), the patch does the following: - Registers the exception handlers to the trace IDT by prepending "trace_" to the handlers' names. - Also newly introduces trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
a1ed4ddf |
|
05-Aug-2013 |
Andi Kleen <ak@linux.intel.com> |
x86, asmlinkage: Make _*_start_kernel visible Obviously these functions have to be visible, otherwise the whole kernel could be optimized away. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
5e427ec2 |
|
20-May-2013 |
Linus Torvalds <torvalds@linux-foundation.org> |
x86: Fix bit corruption at CPU resume time In commit 78d77df71510 ("x86-64, init: Do not set NX bits on non-NX capable hardware") we added the early_pmd_flags that gets the NX bit set when a CPU supports NX. However, the new variable was marked __initdata, because the main _use_ of this is in an __init routine. However, the bit setting happens from secondary_startup_64(), which is called not only at bootup, but on every secondary CPU start. Including resuming from STR and at CPU hotplug time. So the value cannot be __initdata. Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org # v3.9 Acked-by: Peter Anvin <hpa@linux.intel.com> Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
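The fix amounts to dropping the annotation; sketched below (the initializer follows the NX commit described in the next entry, shown here for illustration):

    /* before: freed after init, yet read again on every secondary CPU
     * startup and on resume -- whatever reuses the page corrupts it */
    pmdval_t early_pmd_flags __initdata =
            __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);

    /* after: ordinary data, valid for the kernel's whole lifetime */
    pmdval_t early_pmd_flags =
            __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);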
|
#
78d77df7 |
|
02-May-2013 |
H. Peter Anvin <hpa@linux.intel.com> |
x86-64, init: Do not set NX bits on non-NX capable hardware During early init, we would incorrectly set the NX bit even if the NX feature was not supported. Instead, only set this bit if NX is actually available and enabled. We already do very early detection of the NX bit to enable it in EFER, this simply extends this detection to the early page table mask. Reported-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1367476850.5660.2.camel@nexus Cc: <stable@vger.kernel.org> v3.9
|
#
8e3c2a8c |
|
04-Mar-2013 |
Borislav Petkov <bp@alien8.de> |
x86: Drop KERNEL_IMAGE_START We have KERNEL_IMAGE_START and __START_KERNEL_map which both contain the start of the kernel text mapping's virtual address. Remove the prior one which has been replicated a lot less times around the tree. No functionality change. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1362428180-8865-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
ac630dd9 |
|
22-Feb-2013 |
Linus Torvalds <torvalds@linux-foundation.org> |
x86-64: don't set the early IDT to point directly to 'early_idt_handler' The code requires the use of the proper per-exception-vector stub functions (set up as the early_idt_handlers[] array - note the 's') that make sure to set up the error vector number. This is true regardless of whether CONFIG_EARLY_PRINTK is set or not. Why? The stack offset for the comparison of __KERNEL_CS won't be right otherwise, nor will the new check (from commit 8170e6bed465: "x86, 64bit: Use a #PF handler to materialize early mappings on demand") for the page fault exception vector. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
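Conceptually, the early IDT must be filled from the per-vector stub array, one stub per exception vector; a sketch (the stub size shown is illustrative):

    /* Each stub pushes a dummy error code if the CPU didn't push one,
     * records its vector number, then jumps to the common handler. */
    extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][9];

    for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
            set_intr_gate(i, &early_idt_handlers[i]);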
|
#
feddc9de |
|
21-Dec-2012 |
Fenghua Yu <fenghua.yu@intel.com> |
x86/head64.c: Early update ucode in 64-bit This updates the microcode on the BSP in 64-bit mode. Paging and virtual addressing are working at this point. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-11-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
6c902b65 |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86: Merge early kernel reserve for 32bit and 64bit They are the same, so we can move them out of head32/64.c into setup.c. We are using memblock, which handles overlapping ranges properly, so we don't need an initial placeholder reservation; we just need to make sure the ranges are reserved before we use memblock to find free memory. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
ee92d815 |
|
28-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86, boot: Support loading bzImage, boot_params and ramdisk above 4G xloadflags bit 1 indicates that we can load the kernel and all data structures above 4G; it is set if the kernel is relocatable and 64-bit. The bootloader checks whether xloadflags bit 1 is set to decide whether it can load the ramdisk and kernel high above 4G. When it loads the ramdisk above 4G, the bootloader fills in ext_ramdisk_image/size with the high 32 bits. The kernel uses get_ramdisk_image/size, which incorporate ext_ramdisk_image/size, to get the right position for the ramdisk. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Rob Landley <rob@landley.net> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
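A sketch of the accessors described above, following the pattern in arch/x86/kernel/setup.c:

    static u64 __init get_ramdisk_image(void)
    {
            u64 ramdisk_image = boot_params.hdr.ramdisk_image;

            /* High 32 bits, filled in by the bootloader for loads above 4G. */
            ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
            return ramdisk_image;
    }

    static u64 __init get_ramdisk_size(void)
    {
            u64 ramdisk_size = boot_params.hdr.ramdisk_size;

            ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
            return ramdisk_size;
    }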
|
#
f1da834c |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86, boot: Add get_cmd_line_ptr() Add an accessor function for the command line address. Later we will add support for holding a 64-bit address via ext_cmd_line_ptr. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
1b8c78be |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86: Merge early_reserve_initrd for 32bit and 64bit They are the same, so we can move them out of head32/64.c into setup.c. We are using memblock, which handles overlapping ranges properly, so we don't need an initial placeholder reservation; we just need to make sure the ranges are reserved before we use memblock to find free memory. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
10054230 |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86, 64bit: Don't set max_pfn_mapped wrong value early on native path max_pfn_mapped is not set correctly until init_memory_mapping(), so don't print its initial value for 64-bit. Also, use KERNEL_IMAGE_SIZE directly for the highmap cleanup. -v2: update comments about max_pfn_mapped according to Stefano Stabellini. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
6b9c75ac |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86, 64bit: #PF handler set page to cover only 2M per #PF We only map a single 2 MiB page per #PF, even though we should be able to do this a full gigabyte at a time with no additional memory cost. This is a workaround for a broken AMD reference BIOS (and its derivatives in shipping systems) which maps a large chunk of memory as WB in the MTRR system but will #MC if the processor wanders off and tries to prefetch that memory, which can happen any time the memory is mapped in the TLB. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-13-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> [ hpa: rewrote the patch description ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
8170e6be |
|
24-Jan-2013 |
H. Peter Anvin <hpa@zytor.com> |
x86, 64bit: Use a #PF handler to materialize early mappings on demand Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all 64-bit code has to use page tables. This makes it awkward, before we have first set up proper all-covering page tables, to access objects that are outside the static kernel range. So far we have dealt with that simply by mapping a fixed amount of low memory, but that fails in at least two upcoming use cases: 1. We will support loading and running the kernel, struct boot_params, ramdisk, command line, etc. above the 4 GiB mark. 2. We need to access the ramdisk early to get the microcode, so it can be updated as early as possible. We could use early_ioremap to access them too, but it would make the code messy and hard to unify with 32-bit. Hence, set up a #PF handler and use a fixed number of buffers to set up page tables on demand. If the buffers fill up then we simply flush them and start over. These buffers are all in __initdata, so this does not increase RAM usage at runtime. Thus, with the help of the #PF handler, we can set up the final kernel mapping from blank, and switch to init_level4_pgt later. During the switchover in head_64.S, before the #PF handler is available, we use three pages to handle the kernel crossing the 1G and 512G boundaries with shared pages, by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. The kernel region itself will be properly mapped; other mappings may be spurious. early_make_pgtable() uses the kernel high-mapping address to access the pages when setting up page tables. -v4: Add phys_base offset to make kexec happy, and add init_mapping_kernel() - Yinghai -v5: Fix compiling with Xen, add back ident level3 and level2 for Xen, and also move init_level4_pgt back from BSS to DATA again, because we have to clear it anyway. - Yinghai -v6: Switch to init_level4_pgt in init_mem_mapping. - Yinghai -v7: Remove the unneeded clear_page for init_level4_pgt; it is already filled with "fill 512,8,0" in head_64.S. - Yinghai -v8: We need to keep that handler alive until init_mem_mapping and not let early_trap_init() trash that early #PF handler. So split early_trap_pf_init() out and move it down. - Yinghai -v9: Make the switchover cover only kernel space instead of 1G, so we can avoid touching possible memory holes. - Yinghai -v11: Change the far jmp back to a far return to initial_code; that is needed to fix a failure reported by Konrad on AMD systems. - Yinghai Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
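A much-simplified sketch of the buffer scheme; early_alloc_pgt() is a hypothetical helper (the real logic is inlined in early_make_pgtable() in head64.c):

    /* Fixed pool of early page-table pages, all __initdata. */
    extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
    static unsigned int next_early_pgt;

    static void *early_alloc_pgt(void)
    {
            /* Pool exhausted: flush all on-demand mappings and start
             * over; reset_early_page_tables() also zeroes the counter. */
            if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES)
                    reset_early_page_tables();

            return early_dynamic_pgts[next_early_pgt++];
    }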
|
#
fa2bbce9 |
|
24-Jan-2013 |
Yinghai Lu <yinghai@kernel.org> |
x86, 64bit: Copy struct boot_params early We want to support struct boot_params (formerly known as the zero-page, or real-mode data) above the 4 GiB mark. We will have a #PF handler to set up page tables for not-yet-accessible RAM early, but we want to confine it to before x86_64_start_reservations, to limit the code change to the native path only. Also, we will need the ramdisk info in struct boot_params to access the microcode blob in the ramdisk in x86_64_start_kernel, so copying struct boot_params early makes accessing the ramdisk info simple. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-9-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
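A sketch modeled on copy_bootdata() in head64.c; the __va() translation of the real-mode command-line pointer is the detail worth noting:

    static void __init copy_bootdata(char *real_mode_data)
    {
            char *command_line;
            unsigned long cmd_line_ptr;

            memcpy(&boot_params, real_mode_data, sizeof(boot_params));
            cmd_line_ptr = get_cmd_line_ptr();
            if (cmd_line_ptr) {
                    command_line = __va(cmd_line_ptr);
                    memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
            }
    }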
|
#
5dcd14ec |
|
29-Jan-2013 |
H. Peter Anvin <hpa@linux.intel.com> |
x86, boot: Sanitize boot_params if not zeroed on creation Use the new sentinel field to detect bootloaders which fail to follow protocol and don't initialize fields in struct boot_params that they do not explicitly initialize to zero. Based on an original patch and research by Yinghai Lu. Changed by hpa to be invoked both in the decompression path and in the kernel proper; the latter for the case where a bootloader takes over decompression. Originally-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
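A sketch of the sentinel check, following asm/bootparam_utils.h (only a subset of the cleared fields is shown):

    static void sanitize_boot_params(struct boot_params *boot_params)
    {
            /* A protocol-compliant bootloader zeroes the whole structure,
             * leaving the sentinel byte 0; legacy loaders copy garbage. */
            if (boot_params->sentinel) {
                    /* Clear fields the bootloader didn't know about
                     * (illustrative subset; the real list is longer). */
                    memset(&boot_params->ext_ramdisk_image, 0,
                           (char *)&boot_params->efi_info -
                           (char *)&boot_params->ext_ramdisk_image);
            }
    }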
|
#
bbee3aec |
|
19-Nov-2012 |
Alexander Duyck <alexander.h.duyck@intel.com> |
x86: Fix warning about cast from pointer to integer of different size This patch fixes a warning reported by the kbuild test robot where we were casting a pointer to a physical address which represents an integer of a different size. Per the suggestion of Peter Anvin I am replacing it and one other spot where I made a similar cast with an unsigned long. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121119182927.3655.7641.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
05a476b6 |
|
16-Nov-2012 |
Alexander Duyck <alexander.h.duyck@intel.com> |
x86: Drop 4 unnecessary calls to __pa_symbol While debugging the __pa_symbol inline patch I found that there were a couple spots where __pa_symbol was used as follows: __pa_symbol(x) - __pa_symbol(y) The compiler had reduced them to: x - y Since we also support a debug case where __pa_symbol is a function call it would probably be useful to just change the two cases I found so that they are always just treated as "x - y". As such I am casting the values to phys_addr_t and then doing simple subtraction so that the correct type and value is returned. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Link: http://lkml.kernel.org/r/20121116215552.8521.68085.stgit@ahduyck-cp1.jf.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
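The resulting pattern, sketched against one of the reservation sites (the symbols come from the kernel linker script; treat the exact call as illustrative):

    /* was: __pa_symbol(__bss_stop) - __pa_symbol(_text)
     * The difference of two physical addresses equals the difference of
     * the virtual ones, so plain casts suffice and stay cheap even when
     * __pa_symbol() is an out-of-line debug function: */
    memblock_reserve(__pa_symbol(_text),
                     (phys_addr_t)__bss_stop - (phys_addr_t)_text);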
|
#
c9b77ccb |
|
08-May-2012 |
Jarkko Sakkinen <jarkko.sakkinen@intel.com> |
x86, realmode: Move ACPI wakeup to unified realmode code Migrated ACPI wakeup code to the real-mode blob. Code existing in .x86_trampoline can be completely removed. Static descriptor table in wakeup_asm.S is courtesy of H. Peter Anvin. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1336501366-28617-7-git-send-email-jarkko.sakkinen@intel.com Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <len.brown@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
fe091c20 |
|
08-Dec-2011 |
Tejun Heo <tj@kernel.org> |
memblock: Kill memblock_init() memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>
|
#
24aa0788 |
|
12-Jul-2011 |
Tejun Heo <tj@kernel.org> |
memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones Other than sanity check and debug message, the x86 specific version of memblock reserve/free functions are simple wrappers around the generic versions - memblock_reserve/free(). This patch adds debug messages with caller identification to the generic versions and replaces x86 specific ones and kills them. arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty after this change and removed. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
e5f15b45 |
|
18-Feb-2011 |
Yinghai Lu <yinghai@kernel.org> |
x86: Cleanup highmap after brk is concluded Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
67e87f0a |
|
13-Oct-2010 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S head_64.S maps up to 512 MiB, but that is not necessarity true for other entry paths, such as Xen. Thus, co-locate the setting of max_pfn_mapped with the code to actually set up the page tables in head_64.S. The 32-bit code is already so co-located. (The Xen code already sets max_pfn_mapped correctly for its own use case.) -v2: Yinghai fixed the following bug in this patch: | | max_pfn_mapped is in .bss section, so we need to set that | after bss get cleared. Without that we crash on bootup. | | That is safe because Xen does not call x86_64_start_kernel(). | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Fixed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <4CB6AB24.9020504@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a9ce6bc1 |
|
25-Aug-2010 |
Yinghai Lu <yinghai@kernel.org> |
x86, memblock: Replace e820_/_early string with memblock_ 1. Include linux/memblock.h directly, so we can later reduce references to e820.h. 2. This patch is done mainly by sed scripts. -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
72d7c3b3 |
|
25-Aug-2010 |
Yinghai Lu <yinghai@kernel.org> |
x86: Use memblock to replace early_res 1. Replace find_e820_area with memblock_find_in_range. 2. Replace reserve_early with memblock_x86_reserve_range. 3. Replace free_early with memblock_x86_free_range. 4. NO_BOOTMEM will switch to using memblock too. 5. Use the _e820/_early wrappers in this patch; a following patch will replace them all. 6. Because memblock_x86_free_range supports partial frees, we can remove some special-case handling. 7. Make sure that memblock_find_in_range() is called after memblock_x86_fill(), so adjust some calls to happen later in setup.c::setup_arch() -- corruption_check and mptable_update. -v2: Move reserve_brk() early, before fill_memblock_area(), to avoid an overlap between brk and memblock_find_in_range() that could otherwise happen: we can have more than 128 RAM entries in the E820 tables, and memblock_x86_fill() could use memblock_find_in_range() to find a new place for the memblock.memory.region array. And we don't need to use extend_brk() after fill_memblock_area(), so move reserve_brk() early, before fill_memblock_area(). -v3: Move find_smp_config early, to make sure memblock_find_in_range doesn't find a wrong place if the BIOS doesn't put the mptable in the right place. -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are already in memblock.reserved anyway. Use __NOT_KEEP_MEMBLOCK to make sure memblock-related code can be freed later. -v5: The generic version of __memblock_find_in_range() goes from high to low, and the active_region for 32-bit does include high pages, so we need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped(). -v6: Use current_limit instead. -v7: Check against MEMBLOCK_ERROR instead of -1ULL or -1L. -v8: Set memblock_can_resize early to handle EFI with more RAM entries. -v9: Update after the kmemleak changes in mainline. Suggested-by: David S. Miller <davem@davemloft.net> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
c967da6a |
|
28-Mar-2010 |
Yinghai Lu <yinghai@kernel.org> |
x86: Make sure free_init_pages() frees pages on page boundary When CONFIG_NO_BOOTMEM=y, it could use memory more efficiently, or in a more compact fashion. Example: Allocated new RAMDISK: 00ec2000 - 0248ce57 Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56 The new RAMDISK's end is not page aligned. The last page could be shared with other users. When free_init_pages() is called for the initrd or .init, that page could be freed and we could corrupt other data. Code segment in free_init_pages(): | for (; addr < end; addr += PAGE_SIZE) { | ClearPageReserved(virt_to_page(addr)); | init_page_count(virt_to_page(addr)); | memset((void *)(addr & ~(PAGE_SIZE-1)), | POISON_FREE_INITMEM, PAGE_SIZE); | free_page(addr); | totalram_pages++; | } The last half page could be used as one whole free page. So page-align the boundaries. -v2: Make the original initramdisk aligned, according to Johannes; otherwise we have a chance of losing one page. We still need to keep initrd_end unaligned, otherwise it could confuse the decompressor. -v3: Change to WARN_ON instead, suggested by Johannes. -v4: Use PAGE_ALIGN, suggested by Johannes. We may fix that macro name later to PAGE_ALIGN_UP and PAGE_ALIGN_DOWN. Add comments about assuming the ramdisk start is aligned in relocate_initrd(); change to re-get ramdisk_image instead of saving it, to make the diff smaller. Add a warning for a wrong range, suggested by Johannes. -v6: Remove one WARN(). We need to align the beginning in free_init_pages(); do not copy more than ramdisk_size, noticed by Johannes. Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Tested-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
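In sketch form, the alignment fix: round the start of a to-be-freed range up and its end down, so a boundary page shared with another user is never handed back (begin_unaligned/end_unaligned are illustrative names):

    unsigned long begin = PAGE_ALIGN(begin_unaligned); /* round up */
    unsigned long end = end_unaligned & PAGE_MASK;     /* round down */

    /* simplified body of free_init_pages() */
    for (; begin < end; begin += PAGE_SIZE)
            free_page(begin);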
|
#
893f38d1 |
|
10-Dec-2009 |
Yinghai Lu <yinghai@kernel.org> |
x86: Use find_e820() instead of hard coded trampoline address Jens found the following crash/regression: [ 0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80 [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page and [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE and bisected it to b24c2a9 ("x86: Move find_smp_config() earlier and avoid bootmem usage"). It turns out the BIOS is using the first 64k for the mptable, without reserving it. So try to find a good range for the real-mode trampoline instead of hard-coding it, in case some BIOS tries to use that range for something. Reported-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Jens Axboe <jens.axboe@oracle.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <4B21630A.6000308@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
47a3d5da |
|
29-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Add early platform detection Platforms like Moorestown require early setup and want to avoid the call to reserve_ebda_region. The x86_init override is too late when the MRST detection happens in setup_arch. Move the default i386 x86_init overrides and the call to reserve_ebda_region into a separate function which is called as the default of a switch case depending on the hardware_subarch id in boot params. This allows us to add a case for MRST and let MRST have its own early setup function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
816c25e7 |
|
19-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Add reserve_ebda_region to x86_init_ops reserve_ebda_region needs to be called before start_kernel. Moorestown needs to override it. Make it an x86_init_ops function and initialize it with the default reserve_ebda_region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
93dbda7c |
|
26-Feb-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
x86: add brk allocation for very, very early allocations Impact: new interface Add a brk()-like allocator which effectively extends the bss in order to allow very early code to do dynamic allocations. This is better than using statically allocated arrays for data in subsystems which may never get used. The space for brk allocations is in the bss ELF segment, so that the space is mapped properly by the code which maps the kernel, and so that bootloaders keep the space free rather than putting a ramdisk or something into it. The bss itself, delimited by __bss_stop, ends before the brk area (__brk_base to __brk_limit). The kernel text, data and bss is reserved up to __bss_stop. Any brk-allocated data is reserved separately just before the kernel pagetable is built, as that code allocates from unreserved spaces in the e820 map, potentially allocating from any unused brk memory. Ultimately any unused memory in the brk area is used in the general kernel memory pool. Initially the brk space is set to 1MB, which is probably much larger than any user needs (the largest current user is i386 head_32.S's code to build the pagetables to map the kernel, which can get fairly large with a big kernel image and no PSE support). So long as the system has sufficient memory for the bootloader to reserve the kernel+1MB brk, there are no bad effects resulting from an over-large brk. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
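The interface this adds, shown in brief; the reservation name and the struct in the usage line are illustrative:

    /* Reserve brk space for a named user at compile time... */
    RESERVE_BRK(early_pgtables, 65536);

    /* ...and the allocator itself; returns zeroed, bss-adjacent memory. */
    void * __init extend_brk(size_t size, size_t align);

    /* e.g. from early setup code (struct foo is hypothetical): */
    struct foo *f = extend_brk(sizeof(*f), __alignof__(*f));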
|
#
8ce03197 |
|
18-Jan-2009 |
Brian Gerst <brgerst@gmail.com> |
x86: remove pda_init() Impact: cleanup Copy the code to cpu_init() to satisfy the requirement that the cpu be reinitialized. Remove all other calls, since the segments are already initialized in head_64.S. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
004aa322 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: misc clean up after the percpu update Do the following cleanups: * kill x86_64_init_pda() which now is equivalent to pda_init() * use per_cpu_offset() instead of cpu_pda() when initializing initial_gs Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b12d8db8 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: make pda a percpu variable [ Based on original patch from Christoph Lameter and Mike Travis. ] As pda is now allocated in percpu area, it can easily be made a proper percpu variable. Make it so by defining per cpu symbol from linker script and declaring it in C code for SMP and simply defining it for UP. This change cleans up code and brings SMP and UP closer a bit. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9939ddaf |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: merge 64 and 32 SMP percpu handling Now that pda is allocated as part of percpu, percpu doesn't need to be accessed through pda. Unify x86_64 SMP percpu access with x86_32 SMP one. Other than the segment register, operand size and the base of percpu symbols, they behave identical now. This patch replaces now unnecessary pda->data_offset with a dummy field which is necessary to keep stack_canary at its place. This patch also moves per_cpu_offset initialization out of init_gdt() into setup_per_cpu_areas(). Note that this change also necessitates explicit per_cpu_offset initializations in voyager_smp.c. With this change, x86_OP_percpu()'s are as efficient on x86_64 as on x86_32 and also x86_64 can use assembly PER_CPU macros. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
1a51e3a0 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: fold pda into percpu area on SMP [ Based on original patch from Christoph Lameter and Mike Travis. ] Currently pdas and percpu areas are allocated separately. %gs points to local pda and percpu area can be reached using pda->data_offset. This patch folds pda into percpu area. Due to strange gcc requirement, pda needs to be at the beginning of the percpu area so that pda->stack_canary is at %gs:40. To achieve this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is added and used to reserve pda sized chunk at the start of the percpu area. After this change, for boot cpu, %gs first points to pda in the data.init area and later during setup_per_cpu_areas() gets updated to point to the actual pda. This means that setup_per_cpu_areas() need to reload %gs for CPU0 while clearing pda area for other cpus as cpu0 already has modified it when control reaches setup_per_cpu_areas(). This patch also removes now unnecessary get_local_pda() and its call sites. A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into per cpu area" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c8f3329a |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: use static _cpu_pda array _cpu_pda array first uses statically allocated storage in data.init and then switches to allocated bootmem to conserve space. However, after folding pda area into percpu area, _cpu_pda array will be removed completely. Drop the reallocation part to simplify the code for soon-to-follow changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f32ff538 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: load pointer to pda into %gs while bringing up a CPU [ Based on original patch from Christoph Lameter and Mike Travis. ] CPU startup code in head_64.S loaded the address of a zero page into %gs for temporary use until the pda is loaded, but the address of the actual pda is already available at that point. Load the real address directly instead. This will help unify percpu and pda handling later on. This patch is mostly taken from Mike Travis' "x86_64: Fold pda into per cpu area" patch. Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
3e5d8f97 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: make percpu symbols zerobased on SMP [ Based on original patch from Christoph Lameter and Mike Travis. ] This patch makes percpu symbols zerobased on x86_64 SMP by adding PERCPU_VADDR() to vmlinux.lds.h, which helps set an explicit vaddr on the percpu output section, and using it in vmlinux_64.lds.S. A new PHDR is added as existing ones cannot contain sections near address zero. PERCPU_VADDR() also adds a new symbol, __per_cpu_load, which always points to the vaddr of the loaded percpu data.init region. The following adjustments have been made to accommodate the address change. * code to locate percpu gdt_page in head_64.S is updated to add the load address to the gdt_page offset. * __per_cpu_load is used in places where access to the init data area is necessary. * pda->data_offset is initialized soon after C code is entered, as a zero value doesn't work anymore. This patch is mostly taken from Mike Travis' "x86_64: Base percpu variables at zero" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
26799a63 |
|
31-Dec-2008 |
Ravikiran G Thirumalai <kiran@scalex86.org> |
x86: fix incorrect __read_mostly on _boot_cpu_pda The pda rework (commit 3461b0af025251bbc6b3d56c821c6ac2de6f7209) to remove static boot cpu pdas introduced a performance bug. _boot_cpu_pda is the actual pda used by the boot cpu and is definitely not "__read_mostly" and ended up polluting the read mostly section with writes. This bug caused regression of about 8-10% on certain syscall intensive workloads. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Acked-by: Mike Travis <travis@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3e1e9002 |
|
07-Dec-2008 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
x86: change static allocation of trampoline area Impact: fix trampoline sizing bug, save space While debugging a suspend-to-RAM related issue it occurred to me that if the trampoline code had grown past 4 KB, we would have been allocating too little memory for it, since the 4 KB size of the trampoline is hardcoded into arch/x86/kernel/e820.c . Change that by making the kernel compute the trampoline size and allocate as much memory as necessary. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
12544697 |
|
28-Sep-2008 |
dcg <diegocg@gmail.com> |
x86_64: be less annoying on boot, v2 Honour "quiet" boot parameter in early_printk() calls Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f6476774 |
|
24-Sep-2008 |
Bill Nottingham <notting@redhat.com> |
x86_64: be less annoying on boot Remove mostly useless message on every boot. Signed-off-by: Bill Nottingham <notting@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
66d4bdf2 |
|
31-Jul-2008 |
Jan Beulich <jbeulich@novell.com> |
x86-64: fix overlap of modules and fixmap areas Plus add a build time check so this doesn't go unnoticed again. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5b09b287 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86_64: add workaround for no %gs-based percpu As a stopgap until Mike Travis's x86-64 gs-based percpu patches are ready, provide workaround functions for x86_read/write_percpu for Xen's use. Specifically, this means that we can't really make use of vcpu placement, because we can't use a single gs-based memory access to get to vcpu fields. So disable all that for now. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
28bb2237 |
|
30-Jun-2008 |
Yinghai Lu <yhlu.kernel@gmail.com> |
x86: move reserve_setup_data to setup.c Ying Huang would like setup_data to be reserved, but not included in the no-save range. Here we try to modify the e820 table to reserve that range early, and also add it to early_res in case the bootloader messes with the ramdisk. Other solutions would be: 1. add early_res_to_highmem... 2. early_res_to_e820... but those could wrongly reserve memory of another type, if early_res has some resource that was reserved early and is not needed later, but is not removed from early_res in time. Like the RAMDISK (already handled). Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: andi@firstfloor.org Tested-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f97013fd |
|
24-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86, 64-bit: split x86_64_start_kernel Split x86_64_start_kernel() into two pieces: The first essentially cleans up after head_64.S. It clears the bss, zaps low identity mappings, and sets up some early exception handlers. The second part preserves the boot data, reserves the kernel's text/data/bss, pagetables and ramdisk, and then starts the kernel proper. This split is so that Xen can call the second part to do the setup it needs done. It doesn't need any of the first part's setup, because it doesn't boot via head_64.S, and it's redundant or actively damaging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5deb0b2a |
|
12-May-2008 |
Mike Travis <travis@sgi.com> |
x86: leave initial __cpu_pda array in place until cpus are booted Ingo Molnar wrote: ... > they crashed after about 3 randconfig iterations with: > > early res: 4 [8000-afff] PGTABLE > early res: 5 [b000-b87f] MEMNODEMAP > PANIC: early exception 0e rip 10:ffffffff8077a150 error 2 cr2 37 > Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git #14 > > Call Trace: > [<ffffffff81466196>] early_idt_handler+0x56/0x6a > [<ffffffff8077a150>] ? numa_set_node+0x30/0x60 > [<ffffffff8077a129>] ? numa_set_node+0x9/0x60 > [<ffffffff8147a543>] numa_init_array+0x93/0xf0 > [<ffffffff8147b039>] acpi_scan_nodes+0x3b9/0x3f0 > [<ffffffff8147a496>] numa_initmem_init+0x136/0x150 > [<ffffffff8146da5f>] setup_arch+0x48f/0x700 > [<ffffffff802566ea>] ? clockevents_register_notifier+0x3a/0x50 > [<ffffffff81466a87>] start_kernel+0xd7/0x440 > [<ffffffff81466422>] x86_64_start_kernel+0x222/0x280 ... Here's the fixup... This one should follow the previous patches. Thanks, Mike Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3461b0af |
|
12-May-2008 |
Mike Travis <travis@sgi.com> |
x86: remove static boot_cpu_pda array v2 * Remove the boot_cpu_pda array and pointer table from the data section. Allocate the pointer table and array during init. do_boot_cpu() will reallocate the pda in node local memory and if the cpu is being brought up before the bootmem array is released (after_bootmem = 0), then it will free the initial pda. This will happen for all cpus present at system startup. This removes 512k + 32k bytes from the data section. For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c45a707d |
|
02-Jun-2008 |
Huang, Ying <ying.huang@intel.com> |
x86: linked list of setup_data for i386 This patch adds support for the linked list of struct setup_data on i386. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: andi@firstfloor.org Cc: mingo@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0c51a965 |
|
02-Jun-2008 |
Huang, Ying <ying.huang@intel.com> |
x86: extract common part of head32.c and head64.c into head.c This patch extracts the common part of head32.c and head64.c into head.c. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: andi@firstfloor.org Cc: mingo@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8b664aa6 |
|
27-Mar-2008 |
Huang, Ying <ying.huang@intel.com> |
x86, boot: add linked list of struct setup_data This patch adds a field of 64-bit physical pointer to NULL terminated single linked list of struct setup_data to real-mode kernel header. This is used as a more extensible boot parameters passing mechanism. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
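The node layout itself, as defined for the boot protocol in bootparam.h:

    struct setup_data {
            __u64 next;    /* physical address of the next node, 0 = end */
            __u32 type;    /* e.g. SETUP_E820_EXT */
            __u32 len;     /* length of the payload in data[] */
            __u8  data[0]; /* payload */
    };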
|
#
356fa0c6 |
|
19-Apr-2008 |
Akinobu Mita <akinobu.mita@gmail.com> |
x86: use get_bios_ebda() Use get_bios_ebda(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
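For reference, a sketch of the helper being adopted, mirroring asm/bios_ebda.h:

    static inline unsigned int get_bios_ebda(void)
    {
            /* The BIOS data area word at 0x40E holds the EBDA segment. */
            unsigned int address = *(unsigned short *)phys_to_virt(0x40E);

            address <<= 4; /* real-mode segment -> physical address */
            return address;
    }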
|
#
2b8106a0 |
|
18-Mar-2008 |
Yinghai Lu <yhlu.kernel@gmail.com> |
x86_64: do not reserve ramdisk two times The ramdisk is reserved via reserve_early() in x86_64_start_kernel(); later, early_res_to_bootmem() will convert that into a bootmem reservation, so there is no need to reserve it again. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5524ea32 |
|
10-Mar-2008 |
Andi Kleen <andi@firstfloor.org> |
x86: don't set up early exception handlers for external interrupts All of early setup runs with interrupts disabled, so there is no need to set up early exception handlers for vectors >= 32. This saves some minor text size. Signed-off-by: Andi Kleen <ak@suse.de> Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ecd94c08 |
|
04-Mar-2008 |
Alexander van Heukelum <heukelum@mailshack.com> |
x86: reserve end-of-conventional-memory to 1MB, 64-bit, use paravirt_enabled Jeremy Fitzhardinge pointed out that looking at the boot_params struct to determine if the system is running in a paravirtual environment is not reliable for the Xen case, currently. He also points out that there already exists a function to determine if the system is running in a paravirtual environment. So let's use that instead. This gets rid of the preprocessor test too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
320a6b2e |
|
01-Mar-2008 |
Alexander van Heukelum <heukelum@mailshack.com> |
x86: reserve end-of-conventional-memory to 1MB, 64-bit This patch is an add-on to the 64-bit ebda patch. It makes the functions reserve_ebda_region (renamed from reserve_ebda) and copy_e820_map equal to the 32-bit versions of the previous patch. Changes: Use u64 and u32 for local variables in copy_e820_map. The amount of conventional memory and the start of the EBDA are detected by reading the BIOS data area directly. Paravirtual environments do not provide this area, so we bail out early in that case. They will just have to set up a correct memory map to start with. Add a safety net for zeroed out BIOS data area. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f6eb62b6 |
|
25-Feb-2008 |
Alexander van Heukelum <heukelum@mailshack.com> |
x86: reserve_early end-of-conventional-memory to 1MB, 64-bit Explicitly reserve_early the whole address range from the end of conventional memory, as reported by the BIOS data area, up to the 1 MB mark. Regard the info retrieved from the BIOS data area with a bit of paranoia, though, because some BIOSes forget to register the EBDA correctly. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu>
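A sketch of the whole sequence, modeled on the 64-bit reserve_ebda_region() of that era (the constants are the conventional ones; treat details as illustrative):

    static void __init reserve_ebda_region(void)
    {
            unsigned int lowmem, ebda_addr;

            /* Conventional-memory size in KiB from the BIOS data area. */
            lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
            lowmem <<= 10;

            ebda_addr = get_bios_ebda();

            /* Paranoia: distrust zeroed or bogus BDA values. */
            if (ebda_addr > 0 && ebda_addr < lowmem)
                    lowmem = ebda_addr;
            if (lowmem == 0 || lowmem >= 0x100000)
                    lowmem = 0x9f000; /* assume 636K of conventional memory */

            reserve_early(lowmem, 0x100000, "BIOS reserved");
    }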
|
#
b4e0409a |
|
21-Feb-2008 |
Ingo Molnar <mingo@elte.hu> |
x86: check vmlinux limits, 64-bit these build-time and link-time checks would have prevented the vmlinux size regression. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
31eedd82 |
|
15-Feb-2008 |
Thomas Gleixner <tglx@linutronix.de> |
x86: zap invalid and unused pmds in early boot The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting from __START_KERNEL_map. The kernel itself only needs _text to _end mapped in the high alias. On relocatible kernels the ASM setup code adjusts the compile time created high mappings to the relocation. This creates invalid pmd entries for negative offsets: 0xffffffff80000000 -> pmd entry: ffffffffff2001e3 It points outside of the physical address space and is marked present. This starts at the virtual address __START_KERNEL_map and goes up to the point where the first valid physical address (0x0) is mapped. Zap the mappings before _text and after _end right away in early boot. This removes also the invalid entries. Furthermore it simplifies the range check for high aliases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
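A sketch of the zapping loop, modeled on the cleanup_highmap() this commit introduced:

    void __init cleanup_highmap(void)
    {
            unsigned long vaddr = __START_KERNEL_map;
            unsigned long end = round_up((unsigned long)_end, PMD_SIZE) - 1;
            pmd_t *pmd = level2_kernel_pgt;
            pmd_t *last_pmd = pmd + PTRS_PER_PMD;

            /* Only _text.._end needs the high alias; zap the rest,
             * including the invalid negative-offset entries. */
            for (; pmd < last_pmd; pmd++, vaddr += PMD_SIZE) {
                    if (pmd_none(*pmd))
                            continue;
                    if (vaddr < (unsigned long)_text || vaddr > end)
                            set_pmd(pmd, __pmd(0));
            }
    }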
|
#
25eff8d4 |
|
01-Feb-2008 |
Yinghai Lu <Yinghai.Lu@Sun.COM> |
x86_64: add debug name for early_res helps debugging problems in this rather murky area of code. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cd582896 |
|
30-Jan-2008 |
Ingo Molnar <mingo@elte.hu> |
x86: fix more non-global TLB flushes fix more __flush_tlb() instances, out of caution. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
75175278 |
|
30-Jan-2008 |
Andi Kleen <ak@linux.intel.com> |
x86: replace hard coded reservations in 64-bit early boot code with dynamic table On x86-64 there are several memory allocations before bootmem. To avoid them stomping on each other, they all used to be hard-coded in bad_area(). Replace this with an array that is filled as needed. This cleans up the code considerably and allows its use to be expanded. Cc: peterz@infradead.org Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
076f9776 |
|
30-Jan-2008 |
Ingo Molnar <mingo@elte.hu> |
x86: make early printk selectable on 64-bit as well Enable CONFIG_EMBEDDED to select CONFIG_EARLY_PRINTK on 64-bit as well. saves ~2K: text data bss dec hex filename 7290283 3672091 1907848 12870222 c4624e vmlinux.before 7288373 3671795 1907848 12868016 c459b0 vmlinux.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8866cd9d |
|
30-Jan-2008 |
Roland McGrath <roland@redhat.com> |
x86: early_idt_handler improvements, 64-bit It's not too pretty, but I found this made the "PANIC: early exception" messages become much more reliably useful: 1. print the vector number, 2. print the %cs value, 3. handle error-code-pushing vs non-pushing vectors. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9de819fe |
|
30-Jan-2008 |
Yinghai Lu <Yinghai.Lu@Sun.COM> |
x86: do not set boot cpu in cpu_online_map at x86_64_start_kernel() In init/main.c boot_cpu_init() does that later. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
eaf76e8b |
|
30-Jan-2008 |
Thomas Gleixner <tglx@linutronix.de> |
x86: remove duplicate start_kernel declaration start_kernel is already declared in a generic header file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
718fc13b |
|
30-Jan-2008 |
Thomas Gleixner <tglx@linutronix.de> |
x86: move debug related declarations to kdebug.h Move them and fixup some users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9d1c6e7c |
|
19-Oct-2007 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: use descriptor's functions instead of inline assembly This patch provides a new set of functions for managing the descriptor tables that can be used instead of putting the raw assembly in .c files. Remodeling of store_tr() suggested by Frederik Deweerdt. [ tglx: arch/x86 adaptation ] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
30c82645 |
|
15-Oct-2007 |
H. Peter Anvin <hpa@zytor.com> |
[x86] remove uses of magic macros for boot_params access Instead of using magic macros for boot_params access, simply use the boot_params structure. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
835c34a1 |
|
12-Oct-2007 |
Dave Jones <davej@redhat.com> |
Delete filenames in comments. Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally: - fix up comment formatting in scx200_32.c - Remove a credit from myself in setup_64.c from a time when we had no SCM - remove longwinded history from tsc_32.c which can be figured out from git. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
250c2277 |
|
11-Oct-2007 |
Thomas Gleixner <tglx@linutronix.de> |
x86_64: move kernel Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|