History log of /linux-master/arch/x86/kernel/head_32.S
Revision Date Author Comments
# ea4654e0 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION

Step 4/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org


# 0b62f6cb 17-Oct-2023 Thomas Gleixner <tglx@linutronix.de>

x86/microcode/32: Move early loading after paging enable

32-bit loads microcode before paging is enabled. The commit which
introduced that has zero justification in the changelog. The cover
letter has slightly more content, but it does not give any technical
justification either:

"The problem in current microcode loading method is that we load a
microcode way, way too late; ideally we should load it before turning
paging on. This may only be practical on 32 bits since we can't get
to 64-bit mode without paging on, but we should still do it as early
as at all possible."

Handwaving word salad with zero technical content.

Someone claimed in an offlist conversation that this is required for
curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires
an microcode update in order to make the usage of PSE safe. But during
early boot, PSE is completely irrelevant and it is evaluated way later.

Neither is it relevant for the AP on single core HT enabled CPUs as the
microcode loading on the AP is not doing anything.

On dual core CPUs there is a theoretical problem if a split of an
executable large page between enabling paging including PSE and loading
the microcode happens. But that's only theoretical, it's practically
irrelevant because the affected dual core CPUs are 64bit enabled and
therefore have paging and PSE enabled before loading the microcode on
the second core. So why would it work on 64-bit but not on 32-bit?

The erratum:

"AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is
Split Into 4-Kbyte Pages

Problem: If software clears the PS (page size) bit in a present PDE
(page directory entry), that will cause linear addresses mapped through
this PDE to use 4-KByte pages instead of using a large page after old
TLB entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then it
may fetch from a different physical address than specified by either the
old large page translation or the new 4-KByte page translation. This
erratum may also cause speculative code fetches from incorrect addresses."

The practical relevance for this is exactly zero because there is no
splitting of large text pages during early boot-time, i.e. between paging
enable and microcode loading, and neither during CPU hotplug.

IOW, this load microcode before paging enable is yet another voodoo
programming solution in search of a problem. What's worse is that it causes
at least two serious problems:

1) When stackprotector is enabled, the microcode loader code has the
stackprotector mechanics enabled. The read from the per CPU variable
__stack_chk_guard is always accessing the virtual address either
directly on UP or via %fs on SMP. In physical address mode this
results in an access to memory above 3GB. So this works by chance as
the hardware returns the same value when there is no RAM at this
physical address. When there is RAM populated above 3G then the read
is by chance the same as nothing changes that memory during the very
early boot stage. That's not necessarily true during runtime CPU
hotplug.

2) When function tracing is enabled, the relevant microcode loader
functions and the functions invoked from there will call into the
tracing code and evaluate global and per CPU variables in physical
address mode. What could potentially go wrong?

Cure this and move the microcode loading after the early paging enable, use
the new temporary initrd mapping and remove the gunk in the microcode
loader which is required to handle physical address mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de


# 94ea9c05 06-Aug-2023 Masahiro Yamada <masahiroy@kernel.org>

x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>

The following commit:

ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")

deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.

Use <linux/export.h> in *.S as well as in *.c files.

After all the <asm/export.h> lines are replaced, <asm/export.h> and
<asm-generic/export.h> will be removed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230806145958.380314-2-masahiroy@kernel.org


# cded3679 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/smpboot: Restrict soft_restart_cpu() to SEV

Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.822234014@linutronix.de


# 666e1156 12-May-2023 Thomas Gleixner <tglx@linutronix.de>

x86/smpboot: Rename start_cpu0() to soft_restart_cpu()

This is used in the SEV play_dead() implementation to re-online CPUs. But
that has nothing to do with CPU0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.662319599@linutronix.de


# ff4c85c0 15-Nov-2022 Brian Gerst <brgerst@gmail.com>

x86/asm/32: Remove setup_once()

After the removal of the stack canary segment setup code, this function
does nothing.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221115184328.70874-1-brgerst@gmail.com


# 3131ef39 13-Jul-2022 Jiri Slaby <jirislaby@kernel.org>

x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

The build on x86_32 currently fails after commit

9bb2ec608a20 (objtool: Update Retpoline validation)

with:

arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is
missing this include. Fix this.

Fixes: 9bb2ec608a20 ("objtool: Update Retpoline validation")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org


# f94909ce 04-Dec-2021 Peter Zijlstra <peterz@infradead.org>

x86: Prepare asm files for straight-line-speculation

Replace all ret/retq instructions with RET in preparation of making
RET a macro. Since AS is case insensitive it's a big no-op without
RET defined.

find arch/x86/ -name \*.S | while read file
do
sed -i 's/\<ret[q]*\>/RET/' $file
done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org


# 33701557 15-Jun-2021 Chris Down <chris@chrisdown.name>

printk: Userspace format indexing support

We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.

As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:

$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name


# 3fb0fdb3 13-Feb-2021 Andy Lutomirski <luto@kernel.org>

x86/stackprotector/32: Make the canary into a regular percpu variable

On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage. It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work. Supporting both variants is a
maintenance and testing mess.

Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable. This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.

(That name is special. We could use any symbol we want for the
%fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.

Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

[ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org


# a13f2ef1 29-Jun-2020 Juergen Gross <jgross@suse.com>

x86/xen: remove 32-bit Xen PV guest support

Xen is requiring 64-bit machines today and since Xen 4.14 it can be
built without 32-bit PV guest support. There is no need to carry the
burden of 32-bit PV guest support in the kernel any longer, as new
guests can be either HVM or PVH, or they can use a 64 bit kernel.

Remove the 32-bit Xen PV support from the kernel.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>


# 67a6af7a 01-Feb-2020 Arvind Sankar <nivedita@alum.mit.edu>

x86/boot: Remove KEEP_SEGMENTS support

Commit a24e785111a3 ("i386: paravirt boot sequence") added this flag for
use by paravirtualized environments such as Xen. However, Xen never made
use of this flag [1], and it was only ever used by lguest [2].

Commit ecda85e70277 ("x86/lguest: Remove lguest support") removed
lguest, so KEEP_SEGMENTS has lost its last user.

[1] https://lore.kernel.org/lkml/4D4B097C.5050405@goop.org
[2] https://www.mail-archive.com/lguest@lists.ozlabs.org/msg00469.html

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200202171353.3736319-2-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# f490e07c 20-Nov-2019 Thomas Gleixner <tglx@linutronix.de>

x86/pti/32: Size initial_page_table correctly

Commit 945fd17ab6ba ("x86/cpu_entry_area: Sync cpu_entry_area to
initial_page_table") introduced the sync for the initial page table for
32bit.

sync_initial_page_table() uses clone_pgd_range() which does the update for
the kernel page table. If PTI is enabled it also updates the user space
page table counterpart, which is assumed to be in the next page after the
target PGD.

At this point in time 32-bit did not have PTI support, so the user space
page table update was not taking place.

The support for PTI on 32-bit which was introduced later on, did not take
that into account and missed to add the user space counter part for the
initial page table.

As a consequence sync_initial_page_table() overwrites any data which is
located in the page behing initial_page_table causing random failures,
e.g. by corrupting doublefault_tss and wreckaging the doublefault handler
on 32bit.

Fix it by adding a "user" page table right after initial_page_table.

Fixes: 7757d607c6b3 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: stable@kernel.org


# 13fbe784 11-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm: Replace WEAK uses by SYM_INNER_LABEL_ALIGN

Use the new SYM_INNER_LABEL_ALIGN for WEAK entries in the middle of x86
assembly functions.

And make sure WEAK is not defined for x86 anymore as these were the last
users.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-29-jslaby@suse.cz


# 6d685e53 11-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm/32: Change all ENTRY+ENDPROC to SYM_FUNC_*

These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.

Now, ENTRY/ENDPROC can be forced to be undefined on X86, so do so.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bill Metzenthen <billm@melbpc.org.au>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-28-jslaby@suse.cz


# 78762b0e 11-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm/32: Add ENDs to some functions and relabel with SYM_CODE_*

All these are functions which are invoked from elsewhere but they are
not typical C functions. So annotate them using the new SYM_CODE_START.
All these were not balanced with any END, so mark their ends by
SYM_CODE_END, appropriately.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20191011115108.12392-26-jslaby@suse.cz


# b1bd27b9 11-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm/head: Annotate data appropriately

Use the new SYM_DATA, SYM_DATA_START, and SYM_DATA_END in both 32 and 64
bit head_*.S. In the 64-bit version, define also
SYM_DATA_START_PAGE_ALIGNED locally using the new SYM_START. It is used
in the code instead of NEXT_PAGE() which was defined in this file and
had been using the obsolete macro GLOBAL().

Now, the data in the 64-bit object file look sane:
Value Size Type Bind Vis Ndx Name
0000 4096 OBJECT GLOBAL DEFAULT 15 init_level4_pgt
1000 4096 OBJECT GLOBAL DEFAULT 15 level3_kernel_pgt
2000 2048 OBJECT GLOBAL DEFAULT 15 level2_kernel_pgt
3000 4096 OBJECT GLOBAL DEFAULT 15 level2_fixmap_pgt
4000 4096 OBJECT GLOBAL DEFAULT 15 level1_fixmap_pgt
5000 2 OBJECT GLOBAL DEFAULT 15 early_gdt_descr
5002 8 OBJECT LOCAL DEFAULT 15 early_gdt_descr_base
500a 8 OBJECT GLOBAL DEFAULT 15 phys_base
0000 8 OBJECT GLOBAL DEFAULT 17 initial_code
0008 8 OBJECT GLOBAL DEFAULT 17 initial_gs
0010 8 OBJECT GLOBAL DEFAULT 17 initial_stack
0000 4 OBJECT GLOBAL DEFAULT 19 early_recursion_flag
1000 4096 OBJECT GLOBAL DEFAULT 19 early_level4_pgt
2000 0x40000 OBJECT GLOBAL DEFAULT 19 early_dynamic_pgts
0000 4096 OBJECT GLOBAL DEFAULT 22 empty_zero_page

All have correct size and type now.

Note that this also removes implicit 16B alignment previously inserted
by ENTRY:

* initial_code, setup_once_ref, initial_page_table, initial_stack,
boot_gdt are still aligned
* early_gdt_descr is now properly aligned as was intended before ENTRY
was added there long time ago
* phys_base's alignment is kept by an explicitly added new alignment

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: linux-arch@vger.kernel.org
Cc: Maran Wilson <maran.wilson@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-12-jslaby@suse.cz


# ef77e688 11-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm: Annotate local pseudo-functions

Use the newly added SYM_CODE_START_LOCAL* to annotate beginnings of
all pseudo-functions (those ending with END until now) which do not
have ".globl" annotation. This is needed to balance END for tools that
generate debuginfo. Note that ENDs are switched to SYM_CODE_END too so
that everybody can see the pairing.

C-like functions (which handle frame ptr etc.) are not annotated here,
hence SYM_CODE_* macros are used here, not SYM_FUNC_*. Note that the
32bit version of early_idt_handler_common already had ENDPROC -- switch
that to SYM_CODE_END for the same reason as above (and to be the same as
64bit).

While early_idt_handler_common is LOCAL, it's name is not prepended with
".L" as it happens to appear in call traces.

bad_get_user*, and bad_put_user are now aligned, as they are separate
functions. They do not mind to be aligned -- no need to be compact
there.

early_idt_handler_common is aligned now too, as it is after
early_idt_handler_array, so as well no need to be compact there.

verify_cpu is self-standing and included in other .S files, so align it
too.

The others have alignment preserved to what it used to be (using the
_NOALIGN variant of macros).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: linux-arch@vger.kernel.org
Cc: Maran Wilson <maran.wilson@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-6-jslaby@suse.cz


# 5aa5cbd2 03-Oct-2019 Jiri Slaby <jirislaby@kernel.org>

x86/asm: Make boot_gdt_descr local

As far as I can see, it was never used outside of head_32.S. Not even
when added in 2004. So make it local.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191003095238.29831-2-jslaby@suse.cz


# e3238faf 18-Jul-2018 Joerg Roedel <jroedel@suse.de>

x86/pgtable/32: Allocate 8k page-tables when PTI is enabled

Allocate a kernel and a user page-table root when PTI is enabled. Also
allocate a full page per root for PAE because otherwise the bit to flip in
CR3 to switch between them would be non-constant, which creates a lot of
hassle. Keep that for a later optimization.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-18-git-send-email-joro@8bytes.org


# 050e9baa 13-Jun-2018 Linus Torvalds <torvalds@linux-foundation.org>

Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables

The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.

That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.

HOWEVER.

It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like

CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_STACKPROTECTOR_AUTO=y

and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:

CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.

The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one. This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).

This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:

CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.

Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b399151c 31-Dec-2017 Jia Zhang <qianyue.zj@alibaba-inc.com>

x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping

x86_mask is a confusing name which is hard to associate with the
processor's stepping.

Additionally, correct an indent issue in lib/cpu.c.

Signed-off-by: Jia Zhang <qianyue.zj@alibaba-inc.com>
[ Updated it to more recent kernels. ]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b0ce5b8c 27-Oct-2017 Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

x86/boot: Relocate definition of the initial state of CR0

Both head_32.S and head_64.S utilize the same value to initialize the
control register CR0. Also, other parts of the kernel might want to access
this initial definition (e.g., emulation code for User-Mode Instruction
Prevention uses this state to provide a sane dummy value for CR0 when
emulating the smsw instruction). Thus, relocate this definition to a
header file from which it can be conveniently accessed.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: linux-mm@kvack.org
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-arch@vger.kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/1509135945-13762-3-git-send-email-ricardo.neri-calderon@linux.intel.com


# 82c62fa0 20-Oct-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/asm: Don't use the confusing '.ifeq' directive

I find the '.ifeq <expression>' directive to be confusing. Reading it
quickly seems to suggest its opposite meaning, or that it's missing an
argument.

Improve readability by replacing all of its x86 uses with
'.if <expression> == 0'.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/757da028e802c7e98d23fbab8d234b1063e161cf.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 04b5de3a 31-Aug-2017 Jiri Slaby <jirislaby@kernel.org>

x86/idt: Remove superfluous ALIGNment

Commit 87e81786b13b ("x86/idt: Move early IDT setup out of 32-bit asm")
switched early_ignore_irq to use ENTRY. ENTRY aligns the code, so there
is no need for one more ALIGN right before the function.

And add one \n after the function to separate it from the data.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/20170831121653.28917-1-jslaby@suse.cz


# 87e81786 28-Aug-2017 Thomas Gleixner <tglx@linutronix.de>

x86/idt: Move early IDT setup out of 32-bit asm

The early IDT setup can be done in C code like it's done on 64-bit kernels.
Reuse the 64-bit version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.757980775@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 16bc18d8 28-Aug-2017 Thomas Gleixner <tglx@linutronix.de>

x86/idt: Move 32-bit idt_descr to C code

32-bit kernels have the idt_descr defined in the low level assembly entry code,
but there is no good reason for that.

Move it into the C file and use the 64-bit version of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.445862201@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ecda85e7 16-Aug-2017 Juergen Gross <jgross@suse.com>

x86/lguest: Remove lguest support

Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".

Remove it in order to be able to clean up the paravirt code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 630c1863 28-Jul-2017 Andy Lutomirski <luto@kernel.org>

x86/traps: Don't clear segment high bits in early_idt_handler_common()

Now that pt_regs defines the segment fields as 16-bit, there's no
need to sanitize the values.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5a7670ee 03-Feb-2017 Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C

The new Xen PVH entry point requires page tables to be setup by the
kernel since it is entered with paging disabled.

Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be
invoked from both the new Xen entry point and the existing startup_32()
code.

Convert resulting common code to C.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: matt@codeblueprint.co.uk
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1e620f9b 08-Dec-2016 Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C

The new Xen PVH entry point requires page tables to be setup by the
kernel since it is entered with paging disabled.

Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be
invoked from both the new Xen entry point and the existing startup_32()
code.

Convert resulting common code to C.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: matt@codeblueprint.co.uk
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 553bbc11 16-Nov-2016 Arnd Bergmann <arnd@arndb.de>

x86/boot: Avoid warning for zero-filling .bss

The latest binutils are warning about a .fill directive with an explicit
value in a .bss section:

arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:677: Warning: ignoring fill value in section `.bss..page_aligned'
arch/x86/kernel/head_32.S:679: Warning: ignoring fill value in section `.bss..page_aligned'

This comes from the 'ENTRY()' macro padding the space between the symbols
with 'nop' via:

.align 4,0x90

Open-coding the .globl directive without the padding avoids that warning,
as all the symbols are already page aligned.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161116141726.2013389-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 22dc3918 21-Sep-2016 Josh Poimboeuf <jpoimboe@redhat.com>

x86/boot: Fix the end of the stack for idle tasks

Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right below their saved pt_regs,
regardless of which syscall was used to enter the kernel. That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for the unwinder to verify the stack is
sane.

However, the boot CPU's idle "swapper" task doesn't follow that
convention. Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6616a147 21-Sep-2016 Josh Poimboeuf <jpoimboe@redhat.com>

x86/boot/32: Fix the end of the stack for idle tasks

The frame at the end of each idle task stack is inconsistent with real
task stacks, which have a stack frame header and a real return address
before the pt_regs area. This inconsistency can be confusing for stack
unwinders. It also hides useful information about what asm code was
involved in calling into C.

Fix that by changing the initial code jumps to calls. Also add infinite
loops after the calls to make it clear that the calls don't return, and
to hang if they do.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1b00255f 21-Sep-2016 Josh Poimboeuf <jpoimboe@redhat.com>

x86/entry/32, x86/boot/32: Use local labels

Add the local label prefix to all non-function named labels in head_32.S
and entry_32.S. In addition to decluttering the symbol table, it also
will help stack traces to be more sensible. For example, the last
reported function in the idle task stack trace will be startup_32_smp()
instead of is486().

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9f7afd478b23a762f40734da1a57c0c273f6e.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b32f96c7 18-Aug-2016 Josh Poimboeuf <jpoimboe@redhat.com>

x86/asm/head: Rename 'stack_start' -> 'initial_stack'

The 'stack_start' variable is similar in usage to 'initial_code' and
'initial_gs': they're all stored in head_64.S and they're all updated by
SMP and ACPI suspend before starting a CPU.

Rename it to 'initial_stack' to be consistent with the others.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87063d773a3212051b77e17b0ee427f6582a5050.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 784d5699 11-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

x86: move exports to actual definitions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e16d8a6c 26-Apr-2016 Andy Lutomirski <luto@kernel.org>

Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"

This reverts commit 320d25b6a05f8b73c23fc21025d2906ecdd2d4fc.

This change was problematic for a couple of reasons:

1. It missed a some entry points (Xen things and 64-bit native).

2. The entry it changed can be executed more than once. This isn't
really a problem, but it conflated per-cpu state setup and global
state setup.

3. It broke 64-bit non-NX. 64-bit non-NX worked the other way around from
32-bit -- __supported_pte_mask had NX set initially and was *cleared*
in x86_configure_nx. With the patch applied, it never got cleared.

Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0e861fbb 02-Apr-2016 Andy Lutomirski <luto@kernel.org>

x86/head: Move early exception panic code into early_fixup_exception()

This removes a bunch of assembly and adds some C code instead. It
changes the actual printouts on both 32-bit and 64-bit kernels, but
they still seem okay.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4085070316fc3ab29538d3fcfe282648d1d4ee2e.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0d0efc07 02-Apr-2016 Andy Lutomirski <luto@kernel.org>

x86/head: Move the early NMI fixup into C

C is nicer than asm.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/dd068269f8d59fe44e9e43a50d0efd67da65c2b5.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7bbcdb1c 02-Apr-2016 Andy Lutomirski <luto@kernel.org>

x86/head: Pass a real pt_regs and trapnr to early_fixup_exception()

early_fixup_exception() is limited by the fact that it doesn't have a
real struct pt_regs. Change both the 32-bit and 64-bit asm and the
C code to pass and accept a real pt_regs.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/e3fb680fcfd5e23e38237e8328b64a25cc121d37.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# cd4d09ec 26-Jan-2016 Borislav Petkov <bp@suse.de>

x86/cpufeature: Carve out X86_FEATURE_*

Move them to a separate header and have the following
dependency:

x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 320d25b6 19-Jan-2016 Andy Lutomirski <luto@kernel.org>

x86/mm/32: Set NX in __supported_pte_mask before enabling paging

There's a short window in which very early mappings can end up
with NX clear because they are created before we've noticed that
we have NX.

It turns out that we detect NX very early, so there's no need to
defer __supported_pte_mask setup.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2b544627345f7110160545a3f47031eb45c3ad4f.1453239349.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fe055896 20-Oct-2015 Borislav Petkov <bp@suse.de>

x86/microcode: Merge the early microcode loader

Merge the early loader functionality into the driver proper. The
diff is huge but logically, it is simply moving code from the
_early.c files into the main driver.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 04c17341 19-Jun-2015 Borislav Petkov <bp@alien8.de>

x86/boot: Fix overflow warning with 32-bit binutils

When building the kernel with 32-bit binutils built with support
only for the i386 target, we get the following warning:

arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31)

The problem is that in that case, binutils' internal type
representation is 32-bit wide and the shift range overflows.

In order to fix this, manipulate the shift expression which
creates the 4GiB constant to not overflow the shift count.

Suggested-by: Michael Matz <matz@suse.de>
Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 425be567 22-May-2015 Andy Lutomirski <luto@kernel.org>

x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart. It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count. Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code. The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels. If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

0000000000000000 <early_idt_handlers>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9>
5: R_X86_64_PC32 early_idt_handler-0x4
...
48: 66 90 xchg %ax,%ax
4a: 6a 08 pushq $0x8
4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51>
4d: R_X86_64_PC32 early_idt_handler-0x4
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: e9 00 00 00 00 jmpq 120 <early_idt_handler>
11c: R_X86_64_PC32 early_idt_handler-0x4

After:

0000000000000000 <early_idt_handler_array>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common>
...
48: 6a 08 pushq $0x8
4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common>
4f: cc int3
50: cc int3
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: eb 03 jmp 120 <early_idt_handler_common>
11d: cc int3
11e: cc int3
11f: cc int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# cdeb6048 22-May-2015 Andy Lutomirski <luto@kernel.org>

x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart. It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count. Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code. The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels. If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

0000000000000000 <early_idt_handlers>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9>
5: R_X86_64_PC32 early_idt_handler-0x4
...
48: 66 90 xchg %ax,%ax
4a: 6a 08 pushq $0x8
4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51>
4d: R_X86_64_PC32 early_idt_handler-0x4
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: e9 00 00 00 00 jmpq 120 <early_idt_handler>
11c: R_X86_64_PC32 early_idt_handler-0x4

After:

0000000000000000 <early_idt_handler_array>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common>
...
48: 6a 08 pushq $0x8
4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common>
4f: cc int3
50: cc int3
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: eb 03 jmp 120 <early_idt_handler_common>
11d: cc int3
11e: cc int3
11f: cc int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e839004b 16-May-2015 Borislav Petkov <bp@suse.de>

x86/asm/head*.S: Change global labels to local

Make the disassembly look less confusing:

-- head_64.o.before.asm
++ head_64.o.after.asm
0000000000000120 <early_idt_handler>:
120: fc cld
121: 83 3c 24 02 cmpl $0x2,(%rsp)
- 125: 0f 84 9d 00 00 00 je 1c8 <is_nmi>
+ 125: 0f 84 9d 00 00 00 je 1c8 <early_idt_handler+0xa8>
12b: 83 3d 00 00 00 00 02 cmpl $0x2,0x0(%rip) # 132 <early_idt_handler+0x12>
132: 74 7e je 1b2 <early_idt_handler+0x92>
134: ff 05 00 00 00 00 incl 0x0(%rip) # 13a <early_idt_handler+0x1a>
@@ -1198,9 +1198,7 @@ Disassembly of section .init.text:
1bf: 5a pop %rdx
1c0: 59 pop %rcx
1c1: 58 pop %rax
- 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <is_nmi>
-
-00000000000001c8 <is_nmi>:
+ 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <early_idt_handler+0xa8>
1c8: 48 83 c4 10 add $0x10,%rsp
1cc: 48 cf iretq

-- head_32.o.before.asm
++ head_32.o.after.asm
0000016c <early_idt_handler>:
16c: fc cld
16d: 83 3c 24 02 cmpl $0x2,(%esp)
- 171: 74 73 je 1e6 <is_nmi>
+ 171: 74 73 je 1e6 <ex_entry+0xc>
173: 36 83 3d 00 00 00 00 cmpl $0x2,%ss:0x0
17a: 02
17b: 74 5a je 1d7 <hlt_loop>
@@ -483,8 +483,6 @@ Disassembly of section .init.text:
1dd: 59 pop %ecx
1de: 58 pop %eax
1df: 36 ff 0d 00 00 00 00 decl %ss:0x0
-
-000001e6 <is_nmi>:
1e6: 83 c4 08 add $0x8,%esp
1e9: cf iret
1ea: 66 90 xchg %ax,%ax

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# fb148d83 19-Feb-2015 Alexander Kuleshov <kuleshovmail@gmail.com>

x86/asm/boot: Use already defined KEEP_SEGMENTS macro in head_{32,64}.S

There is already defined macro KEEP_SEGMENTS in
<asm/bootparam.h>, let's use it instead of hardcoded
constants.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424331298-7456-1-git-send-email-kuleshovmail@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b01d4e68 07-Mar-2014 Linus Torvalds <torvalds@linux-foundation.org>

x86: fix compile error due to X86_TRAP_NMI use in asm files

It's an enum, not a #define, you can't use it in asm files.

Introduced in commit 5fa10196bdb5 ("x86: Ignore NMIs that come in during
early boot"), and sadly I didn't compile-test things like I should have
before pushing out.

My weak excuse is that the x86 tree generally doesn't introduce stupid
things like this (and the ARM pull afterwards doesn't cause me to do a
compile-test either, since I don't cross-compile).

Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5fa10196 07-Mar-2014 H. Peter Anvin <hpa@linux.intel.com>

x86: Ignore NMIs that come in during early boot

Don Zickus reports:

A customer generated an external NMI using their iLO to test kdump
worked. Unfortunately, the machine hung. Disabling the nmi_watchdog
made things work.

I speculated the external NMI fired, caused the machine to panic (as
expected) and the perf NMI from the watchdog came in and was latched.
My guess was this somehow caused the hang.

----

It appears that the latched NMI stays latched until the early page
table generation on 64 bits, which causes exceptions to happen which
end in IRET, which re-enable NMI. Therefore, ignore NMIs that come in
during early execution, until we have proper exception handling.

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort


# 148f9bb8 18-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

x86: delete __cpuinit usage from all x86 files

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files. x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 237d1548 28-Jun-2013 Wang YanQing <udknight@gmail.com>

x86: Fix override new_cpu_data.x86 with 486

We should set X86 to 486 before use cpuid to detect the cpu type, if
we set X86 to 486 after cpuid, then we will get 486 until cpu_detect
runs.

Signed-off-by: Wang YanQing <udknight@gmail.com>
Link: http://lkml.kernel.org/r/20130628144516.GA2177@udknight
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 60e019eb 29-Apr-2013 H. Peter Anvin <hpa@zytor.com>

x86: Get rid of ->hard_math and all the FPU asm fu

Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 5e2a044d 11-Feb-2013 Borislav Petkov <bp@suse.de>

x86, head_32: Give the 6 label a real name

Jumping here we are about to enable paging so rename the label
accordingly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# c3a22a26 11-Feb-2013 Borislav Petkov <bp@suse.de>

x86, head_32: Remove second CPUID detection from default_entry

We do that once earlier now and cache it into new_cpu_data.cpuid_level
so no need for the EFLAGS.ID toggling dance anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 9efb58de 11-Feb-2013 Borislav Petkov <bp@suse.de>

x86: Detect CPUID support early at boot

We detect CPUID function support on each CPU and save it for later use,
obviating the need to play the toggle EFLAGS.ID game every time. C code
is looking at ->cpuid_level anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 166df91d 11-Feb-2013 Borislav Petkov <bp@suse.de>

x86, head_32: Remove i386 pieces

Remove code fragments detecting a 386 CPU since we don't support those
anymore. Also, do not do alignment checks because they're done only at
CPL3. Also, no need to preserve EFLAGS.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1360592538-10643-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 63b553c6 21-Dec-2012 Fenghua Yu <fenghua.yu@intel.com>

x86/head_32.S: Early update ucode in 32-bit

This updates ucode in 32-bit kernel on BSP and AP. At this point, there is no
paging and no virtual address yet.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1356075872-3054-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 021ef050 19-Jan-2013 H. Peter Anvin <hpa@linux.intel.com>

x86-32: Start out cr0 clean, disable paging before modifying cr3/4

Patch

5a5a51db78e x86-32: Start out eflags and cr4 clean

... made x86-32 match x86-64 in that we initialize %eflags and %cr4
from scratch. This broke OLPC XO-1.5, because the XO enters the
kernel with paging enabled, which the kernel doesn't expect.

Since we no longer support 386 (the source of most of the variability
in %cr0 configuration), we can simply match further x86-64 and
initialize %cr0 to a fixed value -- the one variable part remaining in
%cr0 is for FPU control, but all that is handled later on in
initialization; in particular, configuring %cr0 as if the FPU is
present until proven otherwise is correct and necessary for the probe
to work.

To deal with the XO case sanely, explicitly disable paging in %cr0
before we muck with %cr3, %cr4 or EFER -- those operations are
inherently unsafe with paging enabled.

NOTE: There is still a lot of 386-related junk in head_32.S which we
can and should get rid of, however, this is intended as a minimal fix
whereas the cleanup can be deferred to the next merge window.

Reported-by: Andres Salomon <dilinger@queued.net>
Tested-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6662c34f 27-Nov-2012 H. Peter Anvin <hpa@linux.intel.com>

x86-32: Unbreak booting on some 486 clones

There appear to have been some 486 clones, including the "enhanced"
version of Am486, which have CPUID but not CR4. These 486 clones had
only the FPU flag, if any, unlike the Intel 486s with CPUID, which
also had VME and therefore needed CR4.

Therefore, look at the basic CPUID flags and require at least one bit
other than bit 0 before we modify CR4.

Thanks to Christian Ludloff of sandpile.org for confirming this as a
problem.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 3e2a0cc3 13-Nov-2012 Fenghua Yu <fenghua.yu@intel.com>

x86-32, hotplug: Add start_cpu0() entry point to head_32.S

start_cpu0() is defined in head_32.S for 32-bit. The function sets up stack and
jumps to start_secondary() for CPU0 wake up.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 5a5a51db 24-Sep-2012 H. Peter Anvin <hpa@linux.intel.com>

x86-32: Start out eflags and cr4 clean

%cr4 is supposed to reflect a set of features into which the operating
system is opting in. If the BIOS or bootloader leaks bits here, this
is not desirable. Consider a bootloader passing in %cr4.pae set to a
legacy paging kernel, for example -- it will not have any immediate
effect, but the kernel would crash when turning paging on.

A similar argument applies to %eflags, and since we have to look for
%eflags.id being settable we can use a sequence which clears %eflags
as a side effect.

Note that we already do this for x86-64.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348529239-17943-1-git-send-email-hpa@linux.intel.com


# 48927bbb 08-May-2012 Jarkko Sakkinen <jarkko.sakkinen@intel.com>

x86, realmode: Move SMP trampoline to unified realmode code

Migrated SMP trampoline code to the real mode blob.
SMP trampoline code is not yet removed from
.x86_trampoline because it is needed by the wakeup
code.

[ hpa: always enable compiling startup_32_smp in head_32.S... it is
only a few instructions which go into .init on UP builds, and it makes
the rest of the code less #ifdef ugly. ]

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-6-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 4c5023a3 18-Apr-2012 H. Peter Anvin <hpa@zytor.com>

x86-32: Handle exception table entries during early boot

If we get an exception during early boot, walk the exception table to
see if we should intercept it. The main use case for this is to allow
rdmsr_safe()/wrmsr_safe() during CPU initialization.

Since the exception table is currently sorted at runtime, and fairly
late in startup, this code walks the exception table linearly. We
obviously don't need to worry about modules, however: none have been
loaded at this point.

This patch changes the early IDT setup to look a lot more like x86-64:
we now install handlers for all 32 exception vectors. The output of
the early exception handler has changed somewhat as it directly
reflects the stack frame of the exception handler, and the stack frame
has been somewhat restructured.

Finally, centralize the code that can and should be run only once.

[ v2: Use early_fixup_exception() instead of linear search ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1334794610-5546-6-git-send-email-hpa@zytor.com


# 7bf04be8 25-Feb-2011 Stratos Psomadakis <psomas@cslab.ece.ntua.gr>

x86, asm: Cleanup unnecssary macros in asm-offsets.c

PAGE_SIZE_asm, PAGE_SHIFT_asm, THREAD_SIZE_asm can be safely removed from
asm-offsets.c, and be replaced by their non-'_asm' counterparts in the code
that uses them, since the _AC macro defined in include/linux/const.h makes
PAGE_SIZE/PAGE_SHIFT/THREAD_SIZE work with as.

Signed-off-by: Stratos Psomadakis <psomas@cslab.ece.ntua.gr>
LKML-Reference: <1298666774-17646-2-git-send-email-psomas@cslab.ece.ntua.gr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# dc3119e70 23-Feb-2011 Thomas Gleixner <tglx@linutronix.de>

x86: OLPC: Cleanup config maze completely

Neither CONFIG_OLPC_OPENFIRMWARE nor CONFIG_OLPC_OPENFIRMWARE_DT are
really necessary.

OLPC selects OLPC_OPENFIRMWARE unconditionally, so move the "select
OF" part under OLPC config option and fixup the dependencies in
Makefiles and code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andres Salomon <dilinger@queued.net>


# 11d4c3f9 04-Feb-2011 H. Peter Anvin <hpa@linux.intel.com>

x86-32: Make sure the stack is set up before we use it

Since checkin ebba638ae723d8a8fc2f7abce5ec18b688b791d7 we call
verify_cpu even in 32-bit mode. Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code! This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.

This retains start_stack as a virtual address, even though a physical
address would be more convenient for 32 bits; the 64-bit code wants
the other way around...

Reported-by: Matthieu Castet <castet.matthieu@free.fr>
LKML-Reference: <4D41E86D.8060205@free.fr>
Tested-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# d50d8fe1 03-Jan-2011 Rusty Russell <rusty@rustcorp.com.au>

x86, mm: Initialize initial_page_table before paravirt jumps

v2.6.36-rc8-54-gb40827f (x86-32, mm: Add an initial page table
for core bootstrapping) made x86 boot using initial_page_table
and broke lguest.

For 2.6.37 we simply cut & paste the initialization code into
lguest (da32dac10126 "lguest: populate initial_page_table"), now
we fix it properly by doing that initialization before the
paravirt jump.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: lguest <lguest@ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 147dd561 16-Dec-2010 H. Peter Anvin <hpa@linux.intel.com>

x86-32: Make sure we can map all of lowmem if we need to

A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high. Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>


# da32dac1 16-Dec-2010 Rusty Russell <rusty@rustcorp.com.au>

lguest: populate initial_page_table

Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.

In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables. With
the first patch, the direct mapping tables used that memory:

Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000

I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...

2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.

This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir. So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.

For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org


# ebba638a 10-Nov-2010 Kees Cook <keescook@chromium.org>

x86, cpu: Call verify_cpu during 32bit CPU startup

The XD_DISABLE-clearing side-effect needs to happen for both 32bit
and 64bit, but the 32bit init routines were not calling verify_cpu()
yet. This adds that call to gain the side-effect.

The longmode/SSE tests being performed in verify_cpu() need to happen very
early for 64bit but not for 32bit. Instead of including it in two places
for 32bit, we can just include it once in arch/x86/kernel/head_32.S.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# b595076a 01-Nov-2010 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

tree-wide: fix comment/printk typos

"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# b40827fa 28-Aug-2010 Borislav Petkov <bp@alien8.de>

x86-32, mm: Add an initial page table for core bootstrapping

This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs - we have
initial_page_table for that.

Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101020070526.GA9588@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# fd89a137 16-Aug-2010 Joerg Roedel <joerg.roedel@amd.com>

x86-32: Separate 1:1 pagetables from swapper_pg_dir

This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

1. Always do a global TLB flush when a new cr3 is loaded and the
old page table was swapper_pg_dir. I consider this a hack hard
to understand and with performance implications

2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# fd699c76 18-Jun-2010 Andres Salomon <dilinger@queued.net>

x86, olpc: Add support for calling into OpenFirmware

Add support for saving OFW's cif, and later calling into it to run OFW
commands. OFW remains resident in memory, living within virtual range
0xff800000 - 0xffc00000. A single page directory entry points to the
pgdir that OFW actually uses, so rather than saving the entire page
table, we grab and install that one entry permanently in the kernel's
page table.

This is currently only used by the OLPC XO. Note that this particular
calling convention breaks PAE and PAT, and so cannot be used on newer
x86 hardware.

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100618174653.7755a39a@dev.queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 8a50e513 13-Nov-2009 H. Peter Anvin <hpa@zytor.com>

x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX

Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>


# dd17c8f7 29-Oct-2009 Rusty Russell <rusty@rustcorp.com.au>

percpu: remove per_cpu__ prefix.

Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

tj: * Updated to convert stuff which were missed by or added after the
original patch.

* Kill per_cpu_var() macro.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>


# abe1ee3a 20-Sep-2009 Tim Abbott <tabbott@ksplice.com>

Use macros for .data.page_aligned section.

This patch changes the remaining direct references to
.data.page_aligned in C and assembly code to use the macros in
include/linux/linkage.h.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 02b7da37 20-Sep-2009 Tim Abbott <tabbott@ksplice.com>

Use macros for .bss.page_aligned section.

This patch changes the remaining direct references to
.bss.page_aligned in C and assembly code to use the macros in
include/linux/linkage.h.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 4ae59b91 16-Sep-2009 Tim Abbott <tabbott@ksplice.com>

x86: convert to use __HEAD and HEAD_TEXT macros.

This has the consequence of changing the section name use for head
code from ".text.head" to ".head.text". It also eliminates the
".text.head" output section (instead placing head code at the start of
the .text output section), which should be harmless.

This patch only changes the sections in the actual kernel, not those
in the compressed boot loader.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 1ea0d14e 03-Sep-2009 Jeremy Fitzhardinge <jeremy@goop.org>

x86/i386: Make sure stack-protector segment base is cache aligned

The Intel Optimization Reference Guide says:

In Intel Atom microarchitecture, the address generation unit
assumes that the segment base will be 0 by default. Non-zero
segment base will cause load and store operations to experience
a delay.
- If the segment base isn't aligned to a cache line
boundary, the max throughput of memory operations is
reduced to one [e]very 9 cycles.
[...]
Assembly/Compiler Coding Rule 15. (H impact, ML generality)
For Intel Atom processors, use segments with base set to 0
whenever possible; avoid non-zero segment base address that is
not aligned to cache line boundary at all cost.

We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AA01893.6000507@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 162bc7ab 28-Aug-2009 Pan, Jacob jun <jacob.jun.pan@intel.com>

x86: Add hardware_subarch ID for Moorestown

x86 bootprotocol 2.07 has introduced hardware_subarch ID in the boot
parameters provided by FW. We use it to identify Moorestown platforms.

[ tglx: Cleanup and paravirt fix ]

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 78b89ecd 18-Aug-2009 Jan Beulich <JBeulich@novell.com>

i386: Fix section mismatches for init code with !HOTPLUG_CPU

Commit 0e83815be719d3391bf5ea24b7fe696c07dbd417 changed the
section the initial_code variable gets allocated in, in an
attempt to address a section conflict warning. This, however
created a new section conflict when building without
HOTPLUG_CPU. The apparently only (reasonable) way to address
this is to always use __REFDATA.

Once at it, also fix a second section mismatch when not using
HOTPLUG_CPU.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4A8AE7CD020000780001054B@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0e83815b 27-Jul-2009 Robert Richter <robert.richter@amd.com>

x86: fix section mismatch for i386 init code

Startup code for i386 in arch/x86/kernel/head_32.S is using the
reference variable initial_code that is located in the .cpuinit.data
section. If CONFIG_HOTPLUG_CPU is enabled, startup code is not in an
init section and can be called later too. In this case the reference
initial_code must be kept too. This patch fixes this. See below for
the section mismatch warning.

WARNING: vmlinux.o(.cpuinit.data+0x0): Section mismatch in reference
from the variable initial_code to the function
.init.text:i386_start_kernel()
The variable __cpuinitdata initial_code references
a function __init i386_start_kernel().
If i386_start_kernel is only used by initial_code then
annotate i386_start_kernel with a matching annotation.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1248716632-26844-1-git-send-email-robert.richter@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# bc3f5d3d 17-Jun-2009 Alexander van Heukelum <heukelum@fastmail.fm>

x86: de-assembler-ize asm/desc.h

asm/desc.h is included in three assembly files, but the only macro
it defines, GET_DESC_BASE, is never used. This patch removes the
includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
from asm/desc.h.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# dfc20895 29-Apr-2009 Sam Ravnborg <sam@ravnborg.org>

x86, vmlinux.lds: unify .text output sections

32 bit x86 had a dedicated .text.head output section,
whereas 64 bit had it all in a single output section.

In the unified version the dedicated .text.head output section
was kept to have full control over the head code.

32 bit:

- Moved definition of _stext to the linker script.
The definition is located _after_ .text.page_aligned as this
is what 32 bit did before.

The ALIGN(8) was introduced so we hit the exact same address
(on the tested config) before and after the move.

I assume that it is a bug that _stext did not cover the
.text.page_aligned section - if this is true it can be fixed
in a follow-up patch (and the ugly ALIGN() can be dropped).

[ Impact: 64-bit: cleanup, 32-bit: use the 64-bit linker script ]

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-5-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 60ac9821 17-Mar-2009 H. Peter Anvin <hpa@linux.intel.com>

x86-32: tighten the bound on additional memory to map

Impact: Tighten bound to avoid masking errors

The definition of MAPPING_BEYOND_END was excessive; this has a nasty
tendency to mask bugs. We have learned over time that this kind of
bug hiding can cause some very strange errors. Therefore, tighten the
bound to only need to map the actual kernel area.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>


# b8a22a62 16-Mar-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86-32: remove ALLOCATOR_SLOP from head_32.S

Impact: cleanup

ALLOCATOR_SLOP is a vestigial remain from when we used the
bootmem allocator to allocate the kernel's linear memory mapping.
Now we directly reserve pages from the e820 mapping, and no
longer require secondary structures to keep track of allocated
pages.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# c090f532 16-Mar-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86-32: make sure we map enough to fit linear map pagetables

Impact: crash fix

head_32.S needs to map the kernel itself, and enough space so
that mm/init.c can allocate space from the e820 allocator
for the linear map of low memory.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 2bd2753f 09-Mar-2009 Yinghai Lu <yinghai@kernel.org>

x86: put initial_pg_tables into .bss

Impact: makes vmlinux section information more useful

Don't use ram after _end blindly for pagetables. aka init pages is before _end
put those pg table into .bss

[Adapted to use brk segment - Jeremy]

v2: keep initial page table up to 512M only.
v4: put initial page tables just before _end

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 796216a5 12-Mar-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86: allow extend_brk users to reserve brk space

Impact: new interface; remove hard-coded limit

Add RESERVE_BRK(name, size) macro to reserve space in the brk
area. This should be a conservative (ie, larger) estimate of
how much space might possibly be required from the brk area.
Any unused space will be freed, so there's no real downside
on making the reservation too large (within limits).

The name should be unique within a given file, and somewhat
descriptive.

The C definition of RESERVE_BRK() ends up being more complex than
one would expect to work around a cluster of gcc infelicities:

The first attempt was to simply try putting __section(.brk_reservation)
on a variable. This doesn't work because it ends up making it a
@progbits section, which gets actual space allocated in the vmlinux
executable.

The second attempt was to emit the space into a section using asm,
but gcc doesn't allow arguments to be passed to file-level asm()
statements, making it hard to pass in the size.

The final attempt is to wrap the asm() in a function to allow
it to have arguments, and put the function itself into the
.discard section, which vmlinux*.lds drops entirely from the
emitted vmlinux.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 7543c1de 12-Mar-2009 Yinghai Lu <yinghai@kernel.org>

x86-32: compute initial mapping size more accurately

Impact: simplification

We only need to map the kernel in head_32.S, not the whole of
lowmem. We use 512MB as a reasonable (but arbitrary) limit on
the maximum size of the kernel image.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# ccf3fe02 27-Feb-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86-32: use brk segment for allocating initial kernel pagetable

Impact: use new interface instead of previous ad hoc implementation

Rather than having special purpose init_pg_table_start/end variables
to delimit the kernel pagetable built by head_32.S, just use the brk
mechanism to extend the bss for the new pagetable.

This patch removes init_pg_table_start/end and pg0, defines __brk_base
(which is page-aligned and immediately follows _end), initializes
the brk region to start there, and uses it for the 32-bit pagetable.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 0341c14d 13-Feb-2009 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

x86: use _types.h headers in asm where available

In general, the only definitions that assembly files can use
are in _types.S headers (where available), so convert them.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>


# 5c79d2a5 11-Feb-2009 Tejun Heo <tj@kernel.org>

x86: fix x86_32 stack protector bugs

Impact: fix x86_32 stack protector

Brian Gerst found out that %gs was being initialized to stack_canary
instead of stack_canary - 20, which basically gave the same canary
value for all threads. Fixing this also exposed the following bugs.

* cpu_idle() didn't call boot_init_stack_canary()

* stack canary switching in switch_to() was being done too late making
the initial run of a new thread use the old stack canary value.

Fix all of them and while at it update comment in cpu_idle() about
calling boot_init_stack_canary().

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 60a5317f 09-Feb-2009 Tejun Heo <tj@kernel.org>

x86: implement x86_32 stack protector

Impact: stack protector for x86_32

Implement stack protector for x86_32. GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry. As %gs is otherwise unused by
the kernel, the canary can be anywhere. It's defined as a percpu
variable.

x86_32 exception handlers take register frame on stack directly as
struct pt_regs. With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed. For now, -fno-stack-protector
is added to all files which contain those functions. We definitely
need something better.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d5e397cb 25-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86: improve early fault/irq printout

Impact: add a stack dump to early IRQs/faults

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 34707bcd 26-Jan-2009 Ingo Molnar <mingo@elte.hu>

x86, debug: remove early_printk() #ifdefs from head_32.S

Impact: cleanup

Remove such constructs:

#ifdef CONFIG_EARLY_PRINTK
call early_printk
#else
call printk
#endif

Not only are they ugly, they are also pointless: a call to printk()
maps to early_printk during early bootup anyway, if CONFIG_EARLY_PRINTK
is enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0dd76d73 21-Jan-2009 Brian Gerst <brgerst@gmail.com>

x86: set %fs to __KERNEL_PERCPU unconditionally for x86_32

Impact: cleanup

%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus. Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# b2bc2731 23-Sep-2008 Suresh Siddha <suresh.b.siddha@intel.com>

x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 583323b9 27-Jul-2008 Thomas Gleixner <tglx@linutronix.de>

x86: fix cpu hotplug on 32bit

commit 3e9704739daf46a8ba6593d749c67b5f7cd633d2 ("x86: boot secondary
cpus through initial_code") causes the kernel to crash when a CPU is
brought online after the read only sections have been write
protected. The write to initial_code in do_boot_cpu() fails.

Move inital_code to .cpuinit.data section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>


# 3e970473 28-May-2008 Glauber Costa <gcosta@redhat.com>

x86: boot secondary cpus through initial_code

remove "initialize_secondary". Boot both architectures via
initial_code.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e3f77edf 27-May-2008 Glauber Costa <gcosta@redhat.com>

x86: use initial_code for i386

x86_64 jumps to whatever is written in "initial_code" symbol,
instead of a fixed address. Do it for i386 too. It will allow us
to integrate more of the smp boot code.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 86b2b70e 02-Jun-2008 Joe Korty <joe.korty@ccur.com>

x86: fix asm warning in head_32.S

On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> AS arch/x86/kernel/head_32.o
> arch/x86/kernel/head_32.S: Assembler messages:
> arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6af61a76 02-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: clean up max_pfn_mapped usage - 32-bit

on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f0d43100 29-May-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: extend e820 early_res support 32bit -fix #3

introduce init_pg_table_start, so xen PV could specify the value.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 575ca735 25-Apr-2008 Vegard Nossum <vegard.nossum@gmail.com>

x86: fix early-BUG message

The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.

Replace .asciz with multiple .ascii directives and terminate with .asciz.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# cf9b111c 08-Mar-2008 WANG Cong <xiyou.wangcong@gmail.com>

x86: remove pointless comments

Remove old comments that include the old arch/i386 directory.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 700efc1b 23-Feb-2008 Eric W. Biederman <ebiederm@xmission.com>

x86: introduce kernel/head32.c

Copy x86_64 and add a head32.c so we can start moving early
architecture initialization out of assembly.

[ Sam Ravnborg <sam@ravnborg.org>: updated it to x86 ]

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e215f3c2 12-Mar-2008 Jiri Slaby <jirislaby@kernel.org>

x86: fix fault_msg nul termination

The fault_msg text is not explictly nul terminated now in startup
assembly. Do so by converting .ascii to .asciz.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ed2b7e2b 22-Feb-2008 Adrian Bunk <bunk@kernel.org>

x86: don't make swapper_pg_pmd global

There doesn't seem to be any reason for swapper_pg_pmd being global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# aa65af3f 13-Feb-2008 Adrian Bunk <bunk@kernel.org>

x86: don't make swapper_pg_fixmap global

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 551889a6 09-Feb-2008 Ian Campbell <ijc@hellion.org.uk>

x86: construct 32-bit boot time page tables in native format.

Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
kernel are in PAE format.

early_ioremap is updated to use the standard page table accessors.

Clear any mappings beyond max_low_pfn from the boot page tables in
native_pagetable_setup_start because the initial mappings can extend
beyond the range of physical memory and into the vmalloc area.

Derived from patches by Eric Biederman and H. Peter Anvin.

[ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8b2f7fff 30-Jan-2008 Sam Ravnborg <sam@ravnborg.org>

x86: fix Section mismatch: reference to .init.text:lguest_entry

fix:

> WARNING: vmlinux.o(.data+0x4): Section mismatch: reference to .init.text:lguest_entry (between 'subarch_entries' and 'stack_start')

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 5756dd59 30-Jan-2008 Ian Campbell <ijc@hellion.org.uk>

x86_32: always run the full set of paging state.

I am preparing to convert the boot time page table to the kernels
native format. To achieve that I need to enable PAE. Enabling PSE
and the no execute bit would not hurt. So this patch modifies
the boot cpu path to execute all of the kernels enable code
if and only if we have the proper bits set in mmu_cr4_features.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 50359501 30-Jan-2008 Ian Campbell <ijc@hellion.org.uk>

x86_32: remove unnecessary use of %ebx as the boot cpu flag

Currently in head_32.S there are two ways we test to see if we
are the boot cpu. By looking at %ebx and by looking at the
static variable ready. When changing things around I have
found that it gets tricky to preserve %ebx. So this
patch just switches head.S over to the more reliable
test of always using ready.

Hopefully later we can kill these tests entirely.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 94878efd 30-Jan-2008 Ingo Molnar <mingo@elte.hu>

x86: early fault debugging improvement

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 476c6c11 01-Jan-2008 Rusty Russell <rusty@rustcorp.com.au>

fix lguest rmmod "bad pgd"

After 17d57a9206b4de6ad082ac9f2d2346985abbd2aa ("x86: fix x86-32 early
fixmap initialization.") removing lg.ko caused a printk from vunmap:

mm/memory.c:115: bad pgd 004b3027.

On the second use after module load, the kernel crashes.

This fixes the immediate problem (accessed and dirty bits not set as
expected in pmd_none_or_clear_bad). I can't see why this would cause
a crash, but I haven't been able to reproduce it once this is applied.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 17d57a92 01-Dec-2007 Eric W. Biederman <ebiederm@xmission.com>

x86: fix x86-32 early fixmap initialization.

pageexec@freemail.hu writes:

> i've just noticed that the chunk in i386/kernel/head.S ended up in a
> weird place, namely, it's not going to be executed as it's just after
> a 'jmp 3f' and before startup_32_smp, probably not what you intended.
> on a sidenote, the whole thing can be done in a single insn, like:
>
> movl $(swapper_pg_pmd - __PAGE_OFFSET + 0x067), (swapper_pg_dir -
> __PAGE_OFFSET+ 4092)

Thanks for the reminder I thought we had fixed this problem a while ago.

Needed to get fixed virtual address for USB debug and earlycon with mmio.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fa76dab9 23-Oct-2007 H. Peter Anvin <hpa@zytor.com>

x86: clean up setup.h and the boot code

Make <asm/setup.h> usable by the boot code.

Clean up vestiges of the old command-line protocol from setup.h and
head_32.S (it is still supported from the boot loader point of
view, since it is converted to the new command-line protocol by the
boot code.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a24e7851 21-Oct-2007 Rusty Russell <rusty@rustcorp.com.au>

i386: paravirt boot sequence

This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:

1. Check the bootparams loadflags to see if we should reload the
segment registers and clear interrupts. This is appropriate
for normal native boot and some paravirtualized environments, but
inapproprate for others.

2. Check the hardware architecture, and dispatch to the appropriate
kernel entrypoint. If the bootloader doesn't set this, then we
simply do the normal boot sequence.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 382f64ab 17-Oct-2007 Ingo Molnar <mingo@elte.hu>

i386: print better early fault info

improve early fault output.

old format:

Int 14: CR2 010001e3 err 00000002 EIP c011f2f9 CS 00000060 flags 00010046
Stack: c073695e c0791c10 00000000 ffffffff 00000000 01000000 00001000 c0791c10

new format:

BUG: Int 14: CR2 010001e3
EDI c1000000 ESI c0693c10 EBP c0637f9c ESP c0637f08
EBX 00000000 EDX 0000000e ECX 00000000 EAX 010001e3
err 00000002 EIP c0123119 CS 00000060 flg 00010046
Stack: c064d589 c0693000 00000000 c0637f60 00c001e3 01000000 00038000 00000163
00000000 00000163 00000000 ffffffff 00038000 00000000 00000000 00001000
00001000 00000000 c0637f88 c06509be c0a2ae60 00001000 00001000 00000000

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1e3e1972 17-Oct-2007 Ingo Molnar <mingo@elte.hu>

x86: prepare page allocator for high allocations on PAGEALLOC=y

To preserve the DMA pool in CONFIG_DEBUG_PAGEALLOC=y kernels, we'll
allocate pagetables from above the 16MB DMA limit, so we'll have to set
up boot pagetables to cover 16MB more RAM (worst-case).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 9a163ed8 11-Oct-2007 Thomas Gleixner <tglx@linutronix.de>

i386: move kernel

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>