History log of /linux-master/include/asm-generic/vmlinux.lds.h
# 2947a456 09-Jan-2024 Nathan Chancellor <nathan@kernel.org>

treewide: update LLVM Bugzilla links

LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues. While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.

Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number. Thankfully, LLVM maintains this mapping
through two shortlinks:

https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>

Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information. Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.

Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# d81f0d7b 13-Dec-2023 Rae Moar <rmoar@google.com>

kunit: add KUNIT_INIT_TABLE to init linker section

Add KUNIT_INIT_TABLE to the INIT_DATA linker section.

Alter the KUnit macros to create init tests:
kunit_test_init_section_suites

Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and
KUNIT_INIT_TABLE.
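
A minimal sketch of what the new table can look like in vmlinux.lds.h, modeled
on the existing KUNIT_TABLE (the exact section and symbol names here are
assumptions, not verified against the final patch):

	/* Sketch only: a bounded init-suite table next to KUNIT_TABLE;
	 * section and symbol names are assumed. */
	#define KUNIT_INIT_TABLE()					\
			. = ALIGN(8);					\
			BOUNDED_SECTION_POST_LABEL(.kunit_init_test_suites, \
					__kunit_init_suites, _start, _end)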

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>


# 69dfdce1 13-Dec-2023 Rae Moar <rmoar@google.com>

kunit: move KUNIT_TABLE out of INIT_DATA

Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and
into DATA_DATA.

Data for KUnit tests does not need to be in the init section.

In order to run tests again after boot, the KUnit data cannot be labeled as
init data, as the kernel could write over it.

Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init
data/functions.

Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>


# 6a4e59ee 22-Oct-2023 Masahiro Yamada <masahiroy@kernel.org>

linux/init: remove __memexit* annotations

We have never used __memexit, __memexitdata, or __memexitconst.

These were unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>


# 15e86643 30-Sep-2023 Masahiro Yamada <masahiroy@kernel.org>

vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros

Remove the left-over of commit e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 5fc52248 11-Jul-2023 Petr Pavlu <petr.pavlu@suse.com>

vmlinux.lds.h: Remove a reference to no longer used sections .text..refcount

Sections .text..refcount were previously used to hold error-path code
for fast refcount overflow protection on x86, see commit 7a46ec0e2f48
("locking/refcounts, x86/asm: Implement fast refcount overflow
protection") and commit 564c9cc84e2a ("locking/refcounts, x86/asm: Use
unique .text section for refcount exceptions").

The code was replaced and removed in commit fb041bb7c0a9
("locking/refcount: Consolidate implementations of refcount_t") and no
sections .text..refcount are present since then.

Therefore, remove the relic referencing these sections from TEXT_TEXT, to
avoid confusing people like me. This is a non-functional change.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Link: https://lore.kernel.org/r/20230711125054.9000-1-petr.pavlu@suse.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# d4035ff16 23-May-2023 Jisheng Zhang <jszhang@kernel.org>

vmlinux.lds.h: use correct .init.data.* section name

If building with -fdata-sections on riscv, LD_ORPHAN_WARN will emit warnings
similar to the one below:

riscv64-linux-gnu-ld: warning: orphan section `.init.data.efi_loglevel'
from `./drivers/firmware/efi/libstub/printk.stub.o' being placed in
section `.init.data.efi_loglevel'

I believe this is caused by a typo:
init.data.* should be .init.data.*
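
For illustration, the corrected glob sits in INIT_DATA roughly like this
(the surrounding lines are assumed context, not quoted from the patch):

	#define INIT_DATA						\
		KEEP(*(SORT(___kentry+*)))				\
		*(.init.data .init.data.*)	/* leading dot restored */ \
		MEM_DISCARD(init.data*)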

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
Link: https://lore.kernel.org/r/20230523165502.2592-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# ddb5cdba 11-Jun-2023 Masahiro Yamada <masahiroy@kernel.org>

kbuild: generate KSYMTAB entries by modpost

Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing
CONFIG_MODULE_REL_CRCS") made modpost output CRCs in the same way
whether the EXPORT_SYMBOL() is placed in *.c or *.S.

For further cleanups, this commit applies a similar approach to the
entire data structure of EXPORT_SYMBOL().

The EXPORT_SYMBOL() compilation is split into two stages.

When a source file is compiled, EXPORT_SYMBOL() will be converted into
a dummy symbol in the .export_symbol section.

For example,

EXPORT_SYMBOL(foo);
EXPORT_SYMBOL_NS_GPL(bar, BAR_NAMESPACE);

will be encoded into the following assembly code:

.section ".export_symbol","a"
__export_symbol_foo:
.asciz "" /* license */
.asciz "" /* name space */
.balign 8
.quad foo /* symbol reference */
.previous

.section ".export_symbol","a"
__export_symbol_bar:
.asciz "GPL" /* license */
.asciz "BAR_NAMESPACE" /* name space */
.balign 8
.quad bar /* symbol reference */
.previous

They are mere markers to tell modpost the name, license, and namespace
of the symbols. They will be dropped from the final vmlinux and modules
because the *(.export_symbol) will go into /DISCARD/ in the linker script.

Then, modpost extracts all the information about EXPORT_SYMBOL() from the
.export_symbol section, and generates the final C code:

KSYMTAB_FUNC(foo, "", "");
KSYMTAB_FUNC(bar, "_gpl", "BAR_NAMESPACE");

KSYMTAB_FUNC() (or KSYMTAB_DATA() if it is data) expands to a struct
kernel_symbol that will be linked into vmlinux or a module.
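
As a rough, hedged sketch of what such an entry boils down to (ignoring the
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS variant and the exact macro plumbing),
KSYMTAB_FUNC(foo, "", "") produces something along these lines:

	extern void foo(void);

	static const char __kstrtab_foo[] __section("__ksymtab_strings") = "foo";
	static const char __kstrtabns_foo[] __section("__ksymtab_strings") = "";

	static const struct kernel_symbol __ksymtab_foo
		__section("___ksymtab+foo") __used = {
		.value	   = (unsigned long)&foo,
		.name	   = __kstrtab_foo,
		.namespace = __kstrtabns_foo,
	};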

With this change, EXPORT_SYMBOL() works in the same way for *.c and *.S
files, providing the following benefits.

[1] Deprecate EXPORT_DATA_SYMBOL()

In the old days, EXPORT_SYMBOL() was only available in C files. To export
a symbol in *.S, EXPORT_SYMBOL() was placed in a separate *.c file.
arch/arm/kernel/armksyms.c is one example written in the classic manner.

Commit 22823ab419d8 ("EXPORT_SYMBOL() for asm") removed this limitation.
Since then, EXPORT_SYMBOL() can be placed close to the symbol definition
in *.S files. It was a nice improvement.

However, as that commit mentioned, you need to use EXPORT_DATA_SYMBOL()
for data objects on some architectures.

In the new approach, modpost checks the symbol's type (STT_FUNC or not)
and outputs KSYMTAB_FUNC() or KSYMTAB_DATA() accordingly.

There are only two users of EXPORT_DATA_SYMBOL:

EXPORT_DATA_SYMBOL_GPL(empty_zero_page) (arch/ia64/kernel/head.S)
EXPORT_DATA_SYMBOL(ia64_ivt) (arch/ia64/kernel/ivt.S)

They are transformed as follows and output into .vmlinux.export.c:

KSYMTAB_DATA(empty_zero_page, "_gpl", "");
KSYMTAB_DATA(ia64_ivt, "", "");

The other EXPORT_SYMBOL users in ia64 assembly are output as
KSYMTAB_FUNC().

EXPORT_DATA_SYMBOL() is now deprecated.

[2] merge <linux/export.h> and <asm-generic/export.h>

There are two similar header implementations:

include/linux/export.h for .c files
include/asm-generic/export.h for .S files

Ideally, the functionality should be consistent between them, but they
tend to diverge.

Commit 8651ec01daed ("module: add support for symbol namespaces.") did
not support the namespace for *.S files.

This commit shifts the essential implementation part to C, which supports
EXPORT_SYMBOL_NS() for *.S files.

<asm/export.h> and <asm-generic/export.h> will remain as a wrapper of
<linux/export.h> for a while.

They will be removed after #include <asm/export.h> directives are all
replaced with #include <linux/export.h>.

[3] Implement CONFIG_TRIM_UNUSED_KSYMS in one-pass algorithm (by a later commit)

When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses
the directory tree to determine which EXPORT_SYMBOL to trim. If an
EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins the
second traversal, where some source files are recompiled with their
EXPORT_SYMBOL() turned into a no-op.

We can do this better now; modpost can selectively emit KSYMTAB entries
that are really used by modules.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>


# b9f174c8 13-Jun-2023 Omar Sandoval <osandov@fb.com>

x86/unwind/orc: Add ELF section with ORC version identifier

Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.

It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.

This patch adds an allocated ELF section, .orc_header, containing the
20-byte hash, to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.
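
A hedged sketch of how kernel code could consume the new symbols (the
declarations and the helper below are illustrative, not taken from the patch):

	#include <linux/printk.h>
	#include <linux/types.h>

	extern const u8 __start_orc_header[], __stop_orc_header[];

	static void dump_orc_header(void)
	{
		/* The section holds the 20-byte hash of the orc_types.h definitions. */
		print_hex_dump(KERN_INFO, "orc_header: ", DUMP_PREFIX_NONE, 16, 1,
			       __start_orc_header,
			       __stop_orc_header - __start_orc_header, false);
	}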

1: https://github.com/osandov/drgn
2: https://github.com/osandov/drgn/issues/303

Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata")
Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com


# f7ba52f3 18-Apr-2023 Josh Poimboeuf <jpoimboe@kernel.org>

vmlinux.lds.h: Discard .note.gnu.property section

When tooling reads ELF notes, it assumes each note entry is aligned to
the value listed in the .note section header's sh_addralign field.

The kernel-created ELF notes in the .note.Linux and .note.Xen sections
are aligned to 4 bytes. This causes the toolchain to set those
sections' sh_addralign values to 4.

On the other hand, the GCC-created .note.gnu.property section has an
sh_addralign value of 8 for some reason, despite being based on struct
Elf32_Nhdr which only needs 4-byte alignment.

When the mismatched input sections get linked together into the vmlinux
.notes output section, the higher alignment "wins", resulting in an
sh_addralign of 8, which confuses tooling. For example:

$ readelf -n .tmp_vmlinux.btf
...
readelf: .tmp_vmlinux.btf: Warning: note with invalid namesz and/or descsz found at offset 0x170
readelf: .tmp_vmlinux.btf: Warning: type: 0x4, namesize: 0x006e6558, descsize: 0x00008801, alignment: 8

In this case readelf thinks there's alignment padding where there is
none, so it starts reading an ELF note in the middle.

With newer toolchains (e.g., latest Fedora Rawhide), a similar mismatch
triggers a build failure when combined with CONFIG_X86_KERNEL_IBT:

btf_encoder__encode: btf__dedup failed!
Failed to encode BTF
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255

This latter error was caused by pahole crashing when it encountered the
corrupt .notes section. This crash has been fixed in dwarves version
1.25. As Tianyi Liu describes:

"Pahole reads .notes to look for LINUX_ELFNOTE_BUILD_LTO. When LTO is
enabled, pahole needs to call cus__merge_and_process_cu to merge
compile units, at which point there should only be one unspecified
type (used to represent some compilation information) in the global
context.

However, when the kernel is compiled without LTO, if pahole calls
cus__merge_and_process_cu due to alignment issues with notes,
multiple unspecified types may appear after merging the cus, and
older versions of pahole only support up to one. This is why pahole
1.24 crashes, while newer versions support multiple. However, the
latest version of pahole still does not solve the problem of
incorrect LTO recognition, so compiling the kernel may be slower
than normal."

Even with the newer pahole, the note section misalignment issue still
exists and pahole is misinterpreting the LTO note. Fix it by discarding
the .note.gnu.property section. While GNU properties are important for
user space (and VDSO), they don't seem to have any use for vmlinux.

(In fact, they're already getting (inadvertently) stripped from vmlinux
when CONFIG_DEBUG_INFO_BTF is enabled. The BTF data is extracted from
vmlinux.o with "objcopy --only-section=.BTF" into .btf.vmlinux.bin.o.
That file doesn't have .note.gnu.property, so when it gets modified and
linked back into the main object, the linker automatically strips it
(see "How GNU properties are merged" in the ld man page).)

Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lkml.kernel.org/bpf/57830c30-cd77-40cf-9cd1-3bb608aa602e@app.fastmail.com
Debugged-by: Tianyi Liu <i.pear@outlook.com>
Suggested-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Link: https://lore.kernel.org/r/20230418214925.ay3jpf2zhw75kgmd@treble
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>


# 2b5a0e42 12-Jan-2023 Peter Zijlstra <peterz@infradead.org>

objtool/idle: Validate __cpuidle code as noinstr

Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org


# 99cb0d91 26-Dec-2022 Masahiro Yamada <masahiroy@kernel.org>

arch: fix broken BuildID for arm64 and riscv

Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
since commit 994b7ac1697b ("arm64: remove special treatment for the
link order of head.o").

The issue is that the type of .notes section, which contains the BuildID,
changed from NOTES to PROGBITS.

Ard Biesheuvel figured out that whichever object gets linked first gets
to decide the type of a section. The PROGBITS type is the result of the
compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.

While Ard provided a fix for arm64, I want to fix this globally because
the same issue is happening on riscv since commit 2348e6bf4421 ("riscv:
remove special treatment for the link order of head.o"). This problem
will happen in general for other architectures if they start to drop
unneeded entries from scripts/head-object-list.txt.

Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.

Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
Fixes: 994b7ac1697b ("arm64: remove special treatment for the link order of head.o")
Fixes: 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>


# 1d926e25 17-Nov-2022 Jim Cromie <jim.cromie@gmail.com>

vmlinux.lds.h: add HEADERED_SECTION_* macros

These macros elaborate on BOUNDED_SECTION_(PRE|POST)_LABEL macros,
prepending an optional KEEP(.gnu.linkonce##_sec_) reservation, and a
linker-symbol to address it.

This allows a developer to define a header struct (which must fit with
the section's base struct-type), which could contain:

1- fields whose value is common to the entire set of data-records.
This allows the header & data structs to specialize, complement
each other, and shrink.

2- an uplink pointer to an organizing struct which refs other
related/sub data-tables. The header record is addressable via the
extern'd header linker-symbol.

Once the linker-symbols created by the macro are ref'd extern in code,
that code can compute a record's index (ptr - start) in the "primary"
table, then use it to index into the related/sub tables. Adding a
primary.map_* field for each sub-table would then allow deduplication
and remapping of that sub-table.
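
A hedged sketch of the indexing idea (all names here are hypothetical, not
from the patch):

	struct foo { int a; };			/* the section's base struct-type */

	extern struct foo __start___foo[];	/* linker symbols bounding the table */
	extern struct foo __stop___foo[];
	extern unsigned short foo_map[];	/* a related/sub table, one entry per record */

	static unsigned short foo_related(const struct foo *rec)
	{
		/* the record's index in the primary table selects the sub-table entry */
		return foo_map[rec - __start___foo];
	}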

This is aimed at dyndbg's struct _ddebug __dyndbg[] section, whose 3
columns (function, file, module) are 50%, 90%, and 100% redundant,
respectively. The module column is fully recoverable after
dynamic_debug_init() saves it to each ddebug_table.module as the builtin
__dyndbg[] table is parsed.

Given that those 3 columns use 24 of the 56 bytes of a _ddebug record, a
dyndbg=y kernel with ~5k callsites could reduce kernel memory substantially.
Returning that memory to the kernel buddy allocator might then be possible.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20221117171633.923628-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 435d6b65 17-Nov-2022 Jim Cromie <jim.cromie@gmail.com>

vmlinux.lds.h: fix BOUNDED_SECTION_(PRE|POST)_LABEL macros

Commit 2f465b921bb8 ("vmlinux.lds.h: place optional header space in
BOUNDED_SECTION") added the BOUNDED_SECTION_(PRE|POST)_LABEL macros,
encapsulating the basic boilerplate to KEEP/pack records into a section,
and to mark the beginning and end of the section with linker-symbols.

But it tried to do extra, adding KEEP(*(.gnu.linkonce.##_sec_)) to
optionally reserve a header record in front of the data. It wrongly
placed the KEEP after the linker-symbol starting the section,
so if a header was added, it would wind up in the data.

Moving the KEEP to the "correct" place proved brittle, and too clever
by half. The obvious safe fix is to remove the KEEP and restore the
plain old boilerplate. The header can be added later, with separate
macros.

Also, the macro var-names _s_ and _e_ are nearly invisible; change them
to the more obvious names _BEGIN_ and _END_.
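
The resulting plain boilerplate looks roughly like this (a sketch of the
macro family after the fix; the exact argument spelling is assumed):

	#define BOUNDED_SECTION_PRE_LABEL(_sec_, _label_, _BEGIN_, _END_) \
		_BEGIN_##_label_ = .;					\
		KEEP(*(_sec_))						\
		_END_##_label_ = .;

	#define BOUNDED_SECTION_POST_LABEL(_sec_, _label_, _BEGIN_, _END_) \
		_label_##_BEGIN_ = .;					\
		KEEP(*(_sec_))						\
		_label_##_END_ = .;

	#define BOUNDED_SECTION_BY(_sec_, _label_)			\
		BOUNDED_SECTION_PRE_LABEL(_sec_, _label_, __start, __stop)

	#define BOUNDED_SECTION(_sec)	BOUNDED_SECTION_BY(_sec, _sec)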

Fixes: 2f465b921bb8 ("vmlinux.lds.h: place optional header space in BOUNDED_SECTION")
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20221117171633.923628-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 2f465b92 22-Oct-2022 Jim Cromie <jim.cromie@gmail.com>

vmlinux.lds.h: place optional header space in BOUNDED_SECTION

Extend the recently added BOUNDED_SECTION(_name) macro by adding a
KEEP(*(.gnu.linkonce.##_name)) before the KEEP(*(_name)).

This does nothing by itself, vmlinux is the same before and after this
patch. But if a developer adds a .gnu.linkonce.foo record, that
record is placed in the front of the section, where it can be used as
a header for the table.

The intent is to create an up-link to another organizing struct, from
where related tables can be referenced. And since every item in a
table has a known offset from its header, that same offset can be used
to fetch records from the related tables.

By itself, this doesn't gain much, unless maybe the pattern of access
is to scan 1 or 2 fields in each fat record, but with two 16-bit .map*
fields added, we could de-duplicate 2 related tables.

The use case here is struct _ddebug, which has 3 pointers (function,
file, module) with substantial repetition: 53% and 90% for function and
file respectively, and the module column is fully recoverable after
dynamic_debug_init() splits the table into a linked list of "module"
chunks.

On a DYNAMIC_DEBUG=y kernel with 5k pr_debugs, the memory savings
should be ~100 KiB.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20221022225637.1406715-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 9b351be2 22-Oct-2022 Jim Cromie <jim.cromie@gmail.com>

vmlinux.lds.h: add BOUNDED_SECTION* macros

vmlinux.lds.h has ~45 occurrences of this general pattern:

__start_foo = .;
KEEP(*(foo))
__stop_foo = .;

Reduce this pattern to a group of 4 macros, and use them to reduce the
line count. This was inspired by the codetag patchset.

No functional change.

CC: Suren Baghdasaryan <surenb@google.com>
CC: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20221022225637.1406715-2-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d49a0626 15-Sep-2022 Peter Zijlstra <peterz@infradead.org>

arch: Introduce CONFIG_FUNCTION_ALIGNMENT

Generic function-alignment infrastructure.

Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the
FUNCTION_ALIGNMENT symbol is then set to the largest such selected
size, 0 otherwise.

From this the -falign-functions compiler argument and __ALIGN macro
are set.

This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future
alignment requirements for x86_64 (later in this series) into a single
place.

NOTE: this also removes the 0x90 filler byte from the generic __ALIGN
primitive; that value makes no sense outside of x86.

NOTE: .balign 0 reverts to a no-op.
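
A hedged sketch of where the pieces end up (simplified; exact file locations
and spellings are assumed): the Kconfig symbol resolves to the largest
selected alignment, and the generic __ALIGN macro consumes it.

	/* include/linux/linkage.h (sketch): */
	#define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT
	#define __ALIGN_STR	__stringify(__ALIGN)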

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org


# 68c76ad4 27-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: unwind: add asynchronous unwind tables to kernel and modules

Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.

This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# 000f8870 08-Nov-2022 Nathan Chancellor <nathan@kernel.org>

vmlinux.lds.h: Fix placement of '.data..decrypted' section

Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
fixed an orphan section warning by adding the '.data..decrypted' section
to the linker script under the PERCPU_DECRYPTED_SECTION define, but that
placement introduced a panic with !SMP: the percpu sections are not
instantiated with that configuration, so attempting to access variables
defined with DEFINE_PER_CPU_DECRYPTED() results in a page fault.

Move the '.data..decrypted' section to the DATA_MAIN define so that the
variables in it are properly instantiated at boot time with
CONFIG_SMP=n.

Cc: stable@vger.kernel.org
Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/
Debugged-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Tested-by: xiafukun <xiafukun@huawei.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221108174934.3384275-1-nathan@kernel.org


# 883bbbff 18-Oct-2022 Peter Zijlstra <peterz@infradead.org>

ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()

Different function signatures mean they need to be different
functions; otherwise CFI gets upset.

As triggered by the ftrace boot tests:

[] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347)

Fixes: 3c516f89e17e ("x86: Add support for CONFIG_CFI_CLANG")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net


# 66f4006b 04-Sep-2022 Jim Cromie <jim.cromie@gmail.com>

kernel/module: add __dyndbg_classes section

Add __dyndbg_classes section, using __dyndbg as a model. Use it:

vmlinux.lds.h:

KEEP the new section, which also silences orphan section warning on
loadable modules. Add (__start_/__stop_)__dyndbg_classes linker
symbols for the c externs (below).

kernel/module/main.c:
- fill new fields in find_module_sections(), using section_objs()
- extend callchain prototypes
to pass classes, length
load_module(): pass new info to dynamic_debug_setup()
dynamic_debug_setup(): new params, pass through to ddebug_add_module()

dynamic_debug.c:
- add externs to the linker symbols.

ddebug_add_module():
- It currently builds a debug_table, and *will* find and attach classes.

dynamic_debug_init():
- add class fields to the _ddebug_info cursor var: di.

Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-16-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 9440155c 03-Sep-2022 Peter Zijlstra (Intel) <peterz@infradead.org>

ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE

x86 will shortly start using -fpatchable-function-entry for purposes
other than ftrace; make sure the __patchable_function_entry section
isn't merged into the mcount_loc section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org


# 89245600 08-Sep-2022 Sami Tolvanen <samitolvanen@google.com>

cfi: Switch to -fsanitize=kcfi

Switch from Clang's original forward-edge control-flow integrity
implementation to -fsanitize=kcfi, which is better suited for the
kernel, as it doesn't require LTO, doesn't use a jump table that
requires altering function references, and won't break cross-module
function address equality.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-6-samitolvanen@google.com


# 13b05669 22-Sep-2022 Will Deacon <will@kernel.org>

vmlinux.lds.h: CFI: Reduce alignment of jump-table to function alignment

Due to undocumented, hysterical raisins on x86, the CFI jump-table
sections in .text are needlessly aligned to PMD_SIZE in the vmlinux
linker script. When compiling a CFI-enabled arm64 kernel with a 64KiB
page-size, a PMD maps 512MiB of virtual memory and so the .text section
increases to a whopping 940MiB and blows the final Image up to 960MiB.
Others report a link failure.

Since the CFI jump-table requires only instruction alignment, reduce the
alignment directives to function alignment for parity with other parts
of the .text section. This reduces the size of the .text section for the
aforementioned 64KiB page size arm64 kernel to 19MiB for a much more
reasonable total Image size of 39MiB.

Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Mohan Rao .vanimina" <mailtoc.mohanrao@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/CAL_GTzigiNOMYkOPX1KDnagPhJtFNqSK=1USNbS0wUL4PW6-Uw@mail.gmail.com/
Fixes: cf68fffb66d6 ("add support for Clang CFI")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220922215715.13345-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# 546a3fee 17-May-2022 Peter Zijlstra <peterz@infradead.org>

sched: Reverse sched_class layout

Because GCC-12 is fully stupid about array bounds and it's just really
hard to get a solid array definition from a linker script, flip the
array order to avoid needing negative offsets :-/

This makes the whole relational pointer magic a little less obvious, but
alas.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/YoOLLmLG7HRTXeEm@hirez.programming.kicks-ass.net


# b44544fe 08-Mar-2022 Peter Zijlstra <peterz@infradead.org>

static_call: Avoid building empty .static_call_sites

Without CONFIG_HAVE_STATIC_CALL_INLINE there's no point in creating
the .static_call_sites section and its related symbols.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154317.223798256@infradead.org


# b9794a82 28-Jan-2022 Daniel Lezcano <daniel.lezcano@linaro.org>

powercap/drivers/dtpm: Convert the init table section to a simple array

The init table section is freed after the system has booted. However,
the next changes will make the DTPM description per-module, so the
table won't be accessible when the module is loaded.

In order to fix that, we would have to move the table to the data
section, but there are very few entries and it would be strange to add
them there.

The main goal of the table was to keep the code self-encapsulated, and
we can keep it almost as-is by using an array instead.

Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-2-daniel.lezcano@linaro.org


# 771856ca 21-Oct-2021 Luis Chamberlain <mcgrof@kernel.org>

vmlinux.lds.h: wrap built-in firmware support under FW_LOADER

The firmware loader's built-in firmware is only available when FW_LOADER
is built-in, so tuck the sections for built-in firmware away under it.

This ensures no oddball user tries to use these sections without
first enabling FW_LOADER=y.

Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211021155843.1969401-6-mcgrof@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ca136cac 13-Oct-2021 Kristen Carlson Accardi <kristen@linux.intel.com>

vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext

When using -ffunction-sections to place each function in its own text
section (so it can be randomized at load time in the future FGKASLR
series), the linker will place most of the functions into separate .text.*
sections. SIZEOF(.text) won't work here for calculating the ORC lookup
table size, so the total text size must be calculated to include .text
AND all .text.* sections.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
[ alobakin: move it to vmlinux.lds.h and make arch-indep ]
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211013175742.1197608-5-keescook@chromium.org


# 34cdd18b 17-Jun-2020 Steven Rostedt (VMware) <rostedt@goodmis.org>

tracing: Use linker magic instead of recasting ftrace_ops_list_func()

In an effort to enable -Wcast-function-type in the top-level Makefile to
support Control Flow Integrity builds, all function casts need to be
removed.

This means that ftrace_ops_list_func() can no longer be defined as
ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is that it is used
when an architecture calls ftrace_ops_list_func() with only two parameters
(called from assembly). And to make sure there are no C side effects, those
archs call ftrace_ops_no_ops(), which only has two parameters, as
ftrace_ops_list_func() has four parameters.

Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to
arch_ftrace_ops_list_func() that will define the proper set of parameters.
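
A hedged sketch of the linker-script side (the placement inside MCOUNT_REC()
and the surrounding lines are assumed): the alias makes the
ftrace_ops_list_func symbol resolve to the arch-defined function, so no
C-level cast is needed.

	#define MCOUNT_REC()	. = ALIGN(8);				\
				__start_mcount_loc = .;			\
				KEEP(*(__mcount_loc))			\
				__stop_mcount_loc = .;			\
				ftrace_ops_list_func = arch_ftrace_ops_list_func;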

Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com
Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home
Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/

Requested-by: Oscar Carter <oscar.carter@gmx.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 6f20fa2d 10-Sep-2021 Nick Desaulniers <ndesaulniers@google.com>

vmlinux.lds.h: remove old check for GCC 4.9

Now that GCC 5.1 is the minimally supported version of GCC, we can
effectively revert commit 85c2ce9104eb ("sched, vmlinux.lds: Increase
STRUCT_ALIGNMENT to 64 bytes for GCC-4.9")

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 33701557 15-Jun-2021 Chris Down <chris@chrisdown.name>

printk: Userspace format indexing support

We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
remediation for the machine), rinse, and repeat.

As a couple of examples of places where we have this implemented inside
Facebook (although this isn't a Facebook-specific problem), we have this
inside our netconsole processing (for alarm classification) and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As one would expect, monitoring using printk is rather brittle for a
number of reasons -- most notably that the message might disappear
entirely in a new version of the kernel, or that the message may change
in some way that the regex or other classification methods start to
silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be a rare hardware bug which one wants to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This provides a solution to the issue of silently changed or deleted
printks: we record pointers to all printk format strings known at
compile time into a new .printk_index section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/index/<module>, which emits the following format, both
readable by humans and able to be parsed by machines:

$ head -1 vmlinux; shuf -n 5 vmlinux
# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
<6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n"
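
A hedged sketch of the recording side (the struct layout and macro below are
simplified assumptions, not the kernel's exact definitions): each printk
callsite drops a small descriptor into the .printk_index section, and the
boundary symbols let the debugfs code walk them.

	struct pi_entry {		/* simplified; field set assumed */
		const char *fmt;
		const char *func;
		const char *file;
		unsigned int line;
	};

	#define printk_index_emit(_fmt)					\
	do {								\
		static const struct pi_entry _entry			\
			__used __section(".printk_index") = {		\
			.fmt  = _fmt,					\
			.func = __func__,				\
			.file = __FILE__,				\
			.line = __LINE__,				\
		};							\
	} while (0)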

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h}
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name


# 84837881 30-Jul-2021 Nathan Chancellor <nathan@kernel.org>

vmlinux.lds.h: Handle clang's module.{c,d}tor sections

A recent change in LLVM causes module_{c,d}tor sections to appear when
CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings
because these are not handled anywhere:

ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'

Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN
flag, so it is in a separate section even with -fno-function-sections
(default)".

Place them in the TEXT_TEXT section so that these technologies continue
to work with the newer compiler versions. All of the KASAN and KCSAN
KUnit tests continue to pass after this change.

Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1432
Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a59a01865
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org


# d4c63999 05-May-2021 Nathan Chancellor <nathan@kernel.org>

vmlinux.lds.h: Avoid orphan section with !SMP

With x86_64_defconfig and the following configs, there is an orphan
section warning:

CONFIG_SMP=n
CONFIG_AMD_MEM_ENCRYPT=y
CONFIG_HYPERVISOR_GUEST=y
CONFIG_KVM=y
CONFIG_PARAVIRT=y

ld: warning: orphan section `.data..decrypted' from `arch/x86/kernel/cpu/vmware.o' being placed in section `.data..decrypted'
ld: warning: orphan section `.data..decrypted' from `arch/x86/kernel/kvm.o' being placed in section `.data..decrypted'

These sections are created with DEFINE_PER_CPU_DECRYPTED, which
ultimately turns into __PCPU_ATTRS, which in turn has a section
attribute with a value of PER_CPU_BASE_SECTION + the section name. When
CONFIG_SMP is not set, the base section is .data and that is not
currently handled in any linker script.
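
A hedged sketch of the name composition (simplified from the percpu headers;
exact context assumed):

	#ifdef CONFIG_SMP
	#define PER_CPU_BASE_SECTION ".data..percpu"
	#else
	#define PER_CPU_BASE_SECTION ".data"
	#endif

	#define __PCPU_ATTRS(sec)					\
		__percpu __attribute__((section(PER_CPU_BASE_SECTION sec))) \
		PER_CPU_ATTRIBUTES

	/* DEFINE_PER_CPU_DECRYPTED() passes "..decrypted", so with CONFIG_SMP=n
	 * the variable ends up in ".data..decrypted". */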

Add .data..decrypted to PERCPU_DECRYPTED_SECTION, which is included in
PERCPU_INPUT -> PERCPU_SECTION, which is included in the x86 linker
script when either CONFIG_X86_64 or CONFIG_SMP is unset, taking care of
the warning.

Fixes: ac26963a1175 ("percpu: Introduce DEFINE_PER_CPU_DECRYPTED")
Link: https://github.com/ClangBuiltLinux/linux/issues/1360
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210506001410.1026691-1-nathan@kernel.org


# cf68fffb 08-Apr-2021 Sami Tolvanen <samitolvanen@google.com>

add support for Clang CFI

This change adds support for Clang’s forward-edge Control Flow
Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
injects a runtime check before each indirect function call to ensure
the target is a valid function with the correct static type. This
restricts possible call targets and makes it more difficult for
an attacker to exploit bugs that allow the modification of stored
function pointers. For more details, see:

https://clang.llvm.org/docs/ControlFlowIntegrity.html

Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
visibility to possible call targets. Kernel modules are supported
with Clang’s cross-DSO CFI mode, which allows checking between
independently compiled components.

With CFI enabled, the compiler injects a __cfi_check() function into
the kernel and each module for validating local call targets. For
cross-module calls that cannot be validated locally, the compiler
calls the global __cfi_slowpath_diag() function, which determines
the target module and calls the correct __cfi_check() function. This
patch includes a slowpath implementation that uses __module_address()
to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
shadow map that speeds up module look-ups by ~3x.

Clang implements indirect call checking using jump tables and
offers two methods of generating them. With canonical jump tables,
the compiler renames each address-taken function to <function>.cfi
and points the original symbol to a jump table entry, which passes
__cfi_check() validation. This isn’t compatible with stand-alone
assembly code, which the compiler doesn’t instrument, and would
result in indirect calls to assembly code failing. Therefore, we
default to using non-canonical jump tables instead, where the compiler
generates a local jump table entry <function>.cfi_jt for each
address-taken function, and replaces all references to the function
with the address of the jump table entry.

Note that because non-canonical jump table addresses are local
to each component, they break cross-module function address
equality. Specifically, the address of a global function will be
different in each module, as it's replaced with the address of a local
jump table entry. If this address is passed to a different module,
it won’t match the address of the same function taken there. This
may break code that relies on comparing addresses passed from other
components.

CFI checking can be disabled in a function with the __nocfi attribute.
Additionally, CFI can be disabled for an entire compilation unit by
filtering out CC_FLAGS_CFI.
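
For example, a function that must make an uninstrumented indirect call (say,
into firmware-provided code) might be opted out like this (illustrative only):

	static int __nocfi call_untyped_thunk(int (*thunk)(void))
	{
		/* The indirect call below is not checked because of __nocfi. */
		return thunk();
	}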

By default, CFI failures result in a kernel panic to stop a potential
exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
kernel prints out a rate-limited warning instead, and allows execution
to continue. This option is helpful for locating type mismatches, but
should only be enabled during development.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com


# f5b6a74d 29-Jan-2021 Nathan Chancellor <nathan@kernel.org>

vmlinux.lds.h: Define SANITIZER_DISCARDS with CONFIG_GCOV_KERNEL=y

clang produces .eh_frame sections when CONFIG_GCOV_KERNEL is enabled,
even when -fno-asynchronous-unwind-tables is in KBUILD_CFLAGS:

$ make CC=clang vmlinux
...
ld: warning: orphan section `.eh_frame' from `init/main.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/version.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/do_mounts.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/do_mounts_initrd.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/initramfs.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/calibrate.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/init_task.o' being placed in section `.eh_frame'
...

$ rg "GCOV_KERNEL|GCOV_PROFILE_ALL" .config
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_GCOV_PROFILE_ALL=y

This was already handled for a couple of other options in
commit d812db78288d ("vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted
sections") and there is an open LLVM bug for this issue. Take advantage
of that section for this config as well so that there are no more orphan
warnings.
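
A hedged sketch of the shape of that discard helper (the exact config guards
here are an assumption, not quoted from the linker script):

	#if defined(CONFIG_GCOV_KERNEL) || defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
	# define SANITIZER_DISCARDS					\
		*(.eh_frame)
	#else
	# define SANITIZER_DISCARDS
	#endif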

Link: https://bugs.llvm.org/show_bug.cgi?id=46478
Link: https://github.com/ClangBuiltLinux/linux/issues/1069
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Fixes: d812db78288d ("vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted sections")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210130004650.2682422-1-nathan@kernel.org


# 49387f62 23-Feb-2021 Alexander Lobakin <alobakin@pm.me>

vmlinux.lds.h: catch even more instrumentation symbols into .data

LKP caught another bunch of orphaned instrumentation symbols [0]:

mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
`init/main.o' being placed in section `.data.$LPBX1'
mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
`init/main.o' being placed in section `.data.$LPBX0'
mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
`init/do_mounts.o' being placed in section `.data.$LPBX1'
mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
`init/do_mounts.o' being placed in section `.data.$LPBX0'
mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
`init/do_mounts_initrd.o' being placed in section `.data.$LPBX1'
mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
`init/do_mounts_initrd.o' being placed in section `.data.$LPBX0'
mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
`init/initramfs.o' being placed in section `.data.$LPBX1'
mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
`init/initramfs.o' being placed in section `.data.$LPBX0'
mipsel-linux-ld: warning: orphan section `.data.$LPBX1' from
`init/calibrate.o' being placed in section `.data.$LPBX1'
mipsel-linux-ld: warning: orphan section `.data.$LPBX0' from
`init/calibrate.o' being placed in section `.data.$LPBX0'

[...]

Soften the wildcard to .data.$L* to grab these ones into .data too.
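
A hedged sketch of the resulting wildcard list under
LD_DEAD_CODE_DATA_ELIMINATION (the exact spelling is assumed):

	#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* \
		.data..compoundliteral* .data.$__unnamed_* .data.$L*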

[0] https://lore.kernel.org/lkml/202102231519.lWPLPveV-lkp@intel.com

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>


# 3c4fa46b 05-Feb-2021 Nick Desaulniers <ndesaulniers@google.com>

vmlinux.lds.h: add DWARF v5 sections

We expect toolchains to produce these new debug info sections as part of
DWARF v5. Add explicit placements to prevent the linker warnings from
--orphan-section=warn.

Compilers may produce such sections with explicit -gdwarf-5, or based on
the implicit default version of DWARF when -g is used via DEBUG_INFO.
This implicit default changes over time, and has changed to DWARF v5
with GCC 11.

.debug_sup was mentioned in review, but without compilers producing it
today, let's wait to add it until it becomes necessary.

Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1922707
Reported-by: Chris Murphy <lists@colorremedies.com>
Suggested-by: Fangrui Song <maskray@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Mark Wielaard <mark@klomp.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 36794822 02-Feb-2021 Christoph Hellwig <hch@lst.de>

module: remove EXPORT_UNUSED_SYMBOL*

EXPORT_UNUSED_SYMBOL* is not actually used anywhere. Remove the
unused functionality as we generally just remove unused code anyway.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>


# f1c3d73e 02-Feb-2021 Christoph Hellwig <hch@lst.de>

module: remove EXPORT_SYMBOL_GPL_FUTURE

As far as I can tell this has never been used at all, and certainly
not any time recently.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jessica Yu <jeyu@kernel.org>


# dc5723b0 11-Dec-2020 Sami Tolvanen <samitolvanen@google.com>

kbuild: add support for Clang LTO

This change adds build system support for Clang's Link Time
Optimization (LTO). With -flto, instead of ELF object files, Clang
produces LLVM bitcode, which is compiled into native code at link
time, allowing the final binary to be optimized globally. For more
details, see:

https://llvm.org/docs/LinkTimeOptimization.html

The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
which defaults to LTO being disabled. To use LTO, the architecture
must select ARCH_SUPPORTS_LTO_CLANG and support:

- compiling with Clang,
- compiling all assembly code with Clang's integrated assembler,
- and linking with LLD.

While using CONFIG_LTO_CLANG_FULL results in the best runtime
performance, the compilation is not scalable in time or
memory. CONFIG_LTO_CLANG_THIN enables ThinLTO, which allows
parallel optimization and faster incremental builds. ThinLTO is
used by default if the architecture also selects
ARCH_SUPPORTS_LTO_CLANG_THIN:

https://clang.llvm.org/docs/ThinLTO.html

To enable LTO, LLVM tools must be used to handle bitcode files, by
passing LLVM=1 and LLVM_IAS=1 options to make:

$ make LLVM=1 LLVM_IAS=1 defconfig
$ scripts/config -e LTO_CLANG_THIN
$ make LLVM=1 LLVM_IAS=1

To prepare for LTO support with other compilers, common parts are
gated behind the CONFIG_LTO option, and LTO can be disabled for
specific files by filtering out CC_FLAGS_LTO.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201211184633.3213045-3-samitolvanen@google.com


# 73f44fe1 27-Jan-2021 Josh Poimboeuf <jpoimboe@redhat.com>

static_call: Allow module use without exposing static_call_key

When exporting static_call_key with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called. This is
not desirable in general.

Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.

Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampoline back to a key before it constructs the normal sites list.

Doing this requires a trampoline -> key association, so add another
magic section that keeps those.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble


# fa07eca8 16-Feb-2021 Alexander Lobakin <alobakin@pm.me>

vmlinux.lds.h: catch more UBSAN symbols into .data

LKP triggered lots of LD orphan warnings [0]:

mipsel-linux-ld: warning: orphan section `.data.$Lubsan_data299' from
`init/do_mounts_rd.o' being placed in section `.data.$Lubsan_data299'
mipsel-linux-ld: warning: orphan section `.data.$Lubsan_data183' from
`init/do_mounts_rd.o' being placed in section `.data.$Lubsan_data183'
mipsel-linux-ld: warning: orphan section `.data.$Lubsan_type3' from
`init/do_mounts_rd.o' being placed in section `.data.$Lubsan_type3'
mipsel-linux-ld: warning: orphan section `.data.$Lubsan_type2' from
`init/do_mounts_rd.o' being placed in section `.data.$Lubsan_type2'
mipsel-linux-ld: warning: orphan section `.data.$Lubsan_type0' from
`init/do_mounts_rd.o' being placed in section `.data.$Lubsan_type0'

[...]

Seems like "unnamed data" isn't the only type of symbols that UBSAN
instrumentation can emit.
Catch these into .data with the wildcard as well.

[0] https://lore.kernel.org/linux-mm/202102160741.k57GCNSR-lkp@intel.com

Fixes: f41b233de0ae ("vmlinux.lds.h: catch UBSAN's "unnamed data" into data")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>


# f41b233d 10-Jan-2021 Alexander Lobakin <alobakin@pm.me>

vmlinux.lds.h: catch UBSAN's "unnamed data" into data

When building the kernel with both LD_DEAD_CODE_DATA_ELIMINATION and
UBSAN, the LLVM stack generates lots of "unnamed data" sections:

ld.lld: warning: net/built-in.a(netfilter/utils.o): (.data.$__unnamed_2)
is being placed in '.data.$__unnamed_2'
ld.lld: warning: net/built-in.a(netfilter/utils.o): (.data.$__unnamed_3)
is being placed in '.data.$__unnamed_3'
ld.lld: warning: net/built-in.a(netfilter/utils.o): (.data.$__unnamed_4)
is being placed in '.data.$__unnamed_4'
ld.lld: warning: net/built-in.a(netfilter/utils.o): (.data.$__unnamed_5)
is being placed in '.data.$__unnamed_5'

[...]

Also handle this by adding the related sections to generic definitions.

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>


# 9a427556 10-Jan-2021 Alexander Lobakin <alobakin@pm.me>

vmlinux.lds.h: catch compound literals into data and BSS

When building the kernel with LD_DEAD_CODE_DATA_ELIMINATION, the LLVM
stack generates separate sections for compound literals, just like in
the case of enabled LTO [0]:

ld.lld: warning: drivers/built-in.a(mtd/nand/spi/gigadevice.o):
(.data..compoundliteral.14) is being placed in
'.data..compoundliteral.14'
ld.lld: warning: drivers/built-in.a(mtd/nand/spi/gigadevice.o):
(.data..compoundliteral.15) is being placed in
'.data..compoundliteral.15'
ld.lld: warning: drivers/built-in.a(mtd/nand/spi/gigadevice.o):
(.data..compoundliteral.16) is being placed in
'.data..compoundliteral.16'
ld.lld: warning: drivers/built-in.a(mtd/nand/spi/gigadevice.o):
(.data..compoundliteral.17) is being placed in
'.data..compoundliteral.17'

[...]

Handle this by adding the related sections to generic definitions
as suggested by Sami [0].

[0] https://lore.kernel.org/lkml/20201211184633.3213045-3-samitolvanen@google.com
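
For reference, the wildcards involved look roughly like this (a sketch,
not the exact hunk):

#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral*
#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral*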

Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>


# a20d0ef9 08-Dec-2020 Daniel Lezcano <daniel.lezcano@linaro.org>

powercap/drivers/dtpm: Add API for dynamic thermal power management

In the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature from going above the
normative and legally stated 'skin temperature'.

Another aspect is sustaining performance for a given power budget: for
example, in virtual reality the user can feel dizzy if the GPU
performance is capped while a big CPU is processing something else, or
battery charging may have to be reduced because the dissipated power is
too high compared with the power consumed by other devices.

Userspace is the most suitable place to act dynamically on the
different devices by limiting their power according to an application
profile: it has the knowledge of the platform.

These userspace daemons are in charge of the Dynamic Thermal Power
Management (DTPM).

Nowadays, the dtpm daemons abuse the thermal framework: they act on the
cooling device state to force a specific, arbitrary state without taking
the governor decisions into account. Given the closed loop of some
governors, that can confuse the logic or lead directly to a decision
conflict.

As cooling device support is today limited to the CPU and the GPU, the
dtpm daemons have little control over the power dissipation of the
system. The out-of-tree solutions hack around here and there in the
drivers and the frameworks to gain control over the devices; the common
approach is to declare them as cooling devices.

There is no unified power limitation unit; opaque states are used
instead.

This patch provides a way to create a hierarchy of constraints using
the powercap framework. The devices which are registered as power
limit-able devices are represented in this hierarchy as a tree. They
are linked together with intermediate nodes which are just there to
propagate the constraint to the children.

The leaves of the tree are the real devices, the intermediate nodes
are virtual, aggregating the children constraints and power
characteristics.

Each node has a weight on a 2^10 basis, reflecting the percentage of
the power distribution among the children nodes. This percentage is
used to dispatch the power limit to the children.

The weight is computed against the max power of the siblings.

This simple approach allows a fair distribution of the power limit.
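
As a purely illustrative sketch of the 2^10-based dispatch (the helper
names below are made up for this example, not taken from the patch):

#include <linux/math64.h>

/*
 * Weight of one child, scaled by 2^10, relative to the max power of its
 * siblings (hypothetical helpers for illustration only).
 */
static u64 dtpm_weight(u64 child_max_power, u64 siblings_max_power_sum)
{
	return div64_u64(child_max_power << 10, siblings_max_power_sum);
}

/* Share of the parent's power limit dispatched to that child. */
static u64 dtpm_dispatch(u64 parent_limit, u64 weight)
{
	return (parent_limit * weight) >> 10;
}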

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 793f49a8 09-Feb-2021 Fangrui Song <maskray@google.com>

firmware_loader: align .builtin_fw to 8

arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error. 32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).
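
A rough sketch of the fix (section and symbol details from memory, not
the exact hunk):

	.builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) ALIGN(8) {
		__start_builtin_fw = .;
		KEEP(*(.builtin_fw))
		__end_builtin_fw = .;
	}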

Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Signed-off-by: Fangrui Song <maskray@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3e663148 04-Oct-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Keep .ctors.* with .ctors

Under some circumstances, the compiler generates .ctors.* sections. This
is seen doing a cross compile of x86_64 from a powerpc64el host:

x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being
placed in section `.ctors.65435'

Include these orphans along with the regular .ctors section.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 83109d5d5fba ("x86/build: Warn on orphan section placement")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org


# 90a025a8 04-Aug-2020 Brendan Higgins <brendanhiggins@google.com>

vmlinux.lds.h: add linker section for KUnit test suites

Add a linker section where KUnit can put references to its test suites.
This patch is the first step in transitioning to dispatching all KUnit
tests from a centralized executor rather than having each as its own
separate late_initcall.
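
The resulting table macro looks roughly like this (a sketch, not the
exact hunk):

#define KUNIT_TABLE()							\
		. = ALIGN(8);						\
		__kunit_suites_start = .;				\
		KEEP(*(.kunit_test_suites))				\
		__kunit_suites_end = .;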

Co-developed-by: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>


# 65c20439 19-Sep-2020 Tony Ambardar <tony.ambardar@gmail.com>

bpf: Prevent .BTF section elimination

Systems with memory or disk constraints often reduce the kernel footprint
by configuring LD_DEAD_CODE_DATA_ELIMINATION. However, this can result in
removal of any BTF information.

Use the KEEP() macro to preserve the BTF data as done with other important
sections, while still allowing for smaller kernels.
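
A rough sketch of the resulting .BTF output section description (not the
exact hunk):

	.BTF : AT(ADDR(.BTF) - LOAD_OFFSET) {
		__start_BTF = .;
		KEEP(*(.BTF))
		__stop_BTF = .;
	}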

Fixes: 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/a635b5d3e2da044e7b51ec1315e8910fbce0083f.1600417359.git.Tony.Ambardar@gmail.com


# 1e7e4788 18-Aug-2020 Josh Poimboeuf <jpoimboe@redhat.com>

x86/static_call: Add inline static call implementation for x86-64

Add the inline static call implementation for x86-64. The generated code
is identical to the out-of-line case, except we move the trampoline into
its own section.

Objtool uses the trampoline naming convention to detect all the call
sites. It then annotates those call sites in the .static_call_sites
section.

During boot (and module init), the call sites are patched to call
directly into the destination function. The temporary trampoline is
then no longer used.

[peterz: merged trampolines, put trampoline in section]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20200818135804.864271425@infradead.org


# 9183c3f9 18-Aug-2020 Josh Poimboeuf <jpoimboe@redhat.com>

static_call: Add inline static call infrastructure

Add infrastructure for an arch-specific CONFIG_HAVE_STATIC_CALL_INLINE
option, which is a faster version of CONFIG_HAVE_STATIC_CALL. At
runtime, the static call sites are patched directly, rather than using
the out-of-line trampolines.

Compared to out-of-line static calls, the performance benefits are more
modest, but still measurable. Steven Rostedt did some tracepoint
measurements:

https://lkml.kernel.org/r/20181126155405.72b4f718@gandalf.local.home

This code is heavily inspired by the jump label code (aka "static
jumps"), as some of the concepts are very similar.

For more details, see the comments in include/linux/static_call.h.

[peterz: simplified interface; merged trampolines]
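
For reference, a call site is recorded as a pair of relative offsets and
collected by the linker script; a rough sketch (not the exact hunk):

struct static_call_site {
	s32 addr;
	s32 key;
};

	. = ALIGN(8);
	__start_static_call_sites = .;
	KEEP(*(.static_call_sites))
	__stop_static_call_sites = .;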

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20200818135804.684334440@infradead.org


# eff8728f 21-Aug-2020 Nick Desaulniers <ndesaulniers@google.com>

vmlinux.lds.h: Add PGO and AutoFDO input sections

Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too.

When compiling with profiling information (collected via PGO
instrumentations or AutoFDO sampling), Clang will separate code into
.text.hot, .text.unlikely, or .text.unknown sections based on profiling
information. After D79600 (clang-11), these sections will have a
trailing `.` suffix, ie. .text.hot., .text.unlikely., .text.unknown..

When using -ffunction-sections together with profiling information,
either explicitly (FGKASLR) or implicitly (LTO), code may be placed in
sections following the convention:
.text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz>
where <foo>, <bar>, and <baz> are functions. (This produces one section
per function; we generally try to merge these all back via linker script
so that we don't have 50k sections).

For the above cases, we need to teach our linker scripts that such
sections might exist and that we'd explicitly like them grouped
together, otherwise we can wind up with code outside of the
_stext/_etext boundaries that might not be mapped properly for some
architectures, resulting in boot failures.

If the linker script is not told about possible input sections, then
where the section is placed as output is a heuristic-laden mess that's
non-portable between linkers (ie. BFD and LLD), and has resulted in many
hard to debug bugs. Kees Cook is working on cleaning this up by adding
--orphan-handling=warn linker flag used in ARCH=powerpc to additional
architectures. In the case of linker scripts, borrowing from the Zen of
Python: explicit is better than implicit.

Also, ld.bfd's internal linker script considers .text.hot AND
.text.hot.* to be part of .text, as well as .text.unlikely and
.text.unlikely.*. I didn't see support for .text.unknown.*, and didn't
see Clang producing such code in our kernel builds, but I see code in
LLVM that can produce such section names if profiling information is
missing. That may point to a larger issue with generating or collecting
profiles, but I would much rather be safe and explicit than have to
debug yet another issue related to orphan section placement.
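
An abbreviated sketch of the relevant part of TEXT_TEXT after this change
(a sketch, not the exact hunk):

#define TEXT_TEXT							\
		ALIGN_FUNCTION();					\
		*(.text.hot .text.hot.*)				\
		*(TEXT_MAIN .text.fixup)				\
		*(.text.unlikely .text.unlikely.*)			\
		*(.text.unknown .text.unknown.*)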

Reported-by: Jian Cai <jiancai@google.com>
Suggested-by: Fāng-ruì Sòng <maskray@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Luis Lozano <llozano@google.com>
Tested-by: Manoj Gupta <manojgupta@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655
Link: https://reviews.llvm.org/D79600
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760
Link: https://lore.kernel.org/r/20200821194310.3089815-7-keescook@chromium.org

Debugged-by: Luis Lozano <llozano@google.com>


# a840c4de 21-Aug-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Add .symtab, .strtab, and .shstrtab to ELF_DETAILS

When linking vmlinux with LLD, the synthetic sections .symtab, .strtab,
and .shstrtab are listed as orphaned. Add them to the ELF_DETAILS section
so there will be no warnings when --orphan-handling=warn is used more
widely. (They are added above comment as it is the more common
order[1].)

ld.lld: warning: <internal>:(.symtab) is being placed in '.symtab'
ld.lld: warning: <internal>:(.shstrtab) is being placed in '.shstrtab'
ld.lld: warning: <internal>:(.strtab) is being placed in '.strtab'

[1] https://lore.kernel.org/lkml/20200622224928.o2a7jkq33guxfci4@google.com/
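
A rough sketch of the added output section descriptions (non-allocated,
hence the 0 address; placement relative to .comment as per the note
above; not the exact hunk):

		.symtab 0 : { *(.symtab) }
		.strtab 0 : { *(.strtab) }
		.shstrtab 0 : { *(.shstrtab) }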

Reported-by: Fangrui Song <maskray@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-6-keescook@chromium.org


# c604abc3 21-Aug-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG

The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org


# d812db78 21-Aug-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted sections

KASAN (-fsanitize=kernel-address) and KCSAN (-fsanitize=thread)
produce unwanted[1] .eh_frame and .init_array.* sections. Add them to
COMMON_DISCARDS, except with CONFIG_CONSTRUCTORS, which wants to keep
.init_array.* sections.

[1] https://bugs.llvm.org/show_bug.cgi?id=46478

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Marco Elver <elver@google.com>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-4-keescook@chromium.org


# dfbe6968 21-Aug-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Add .gnu.version* to COMMON_DISCARDS

For vmlinux linking, no architecture uses the .gnu.version* sections,
so remove it via the COMMON_DISCARDS macro in preparation for adding
--orphan-handling=warn more widely. This is a work-around for what
appears to be a bug[1] in ld.bfd which warns for this synthetic section
even when none is found in input objects, and even when no section is
emitted for an output object[2].

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=26153
[2] https://lore.kernel.org/lkml/202006221524.CEB86E036B@keescook/

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-3-keescook@chromium.org


# 03c2b85c 21-Aug-2020 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Create COMMON_DISCARDS

Collect the common DISCARD sections for architectures that need more
specialized discard control than what the standard DISCARDS section
provides.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-2-keescook@chromium.org


# 7f897acb 14-Aug-2020 Romain Naour <romain.naour@gmail.com>

include/asm-generic/vmlinux.lds.h: align ro_after_init

Since the patch [1], building the kernel using a toolchain built with
binutils 2.33.1 prevents booting a sh4 system under Qemu. Apply the patch
provided by Alan Modra [2] that fix alignment of rodata.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
[2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html

Signed-off-by: Romain Naour <romain.naour@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Bin Meng <bin.meng@windriver.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Link: https://marc.info/?l=linux-sh&m=158429470221261
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e5ebffe1 19-Jul-2020 Jim Cromie <jim.cromie@gmail.com>

dyndbg: rename __verbose section to __dyndbg

dyndbg populates its callsite info into __verbose section, change that
to a more specific and descriptive name, __dyndbg.

Also, per checkpatch:
simplify __attribute(..) to __section(__dyndbg) declaration.

and 1 spelling fix, decriptor

Acked-by: <jbaron@akamai.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20200719231058.1586423-6-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# de2b41be 21-Jul-2020 Joerg Roedel <jroedel@suse.de>

x86, vmlinux.lds: Page-align end of ..page_aligned sections

On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
page-aligned, but the end of the .bss..page_aligned section is not
guaranteed to be page-aligned.

As a result, objects from other .bss sections may end up on the same 4k
page as the idt_table, and will accidentally get mapped read-only during
boot, causing unexpected page-faults when the kernel writes to them.

This could be worked around by making the objects in the page aligned
sections page sized, but that's wrong.

Explicit sections which store only page aligned objects have an implicit
guarantee that the object is alone in the page in which it is placed. That
works for all objects except the last one. That's inconsistent.

Enforcing page-sized objects for these sections would break memory
sanitizers, because the object becomes artificially larger than it should
be and out-of-bounds accesses become legitimate.

Align the end of the .bss..page_aligned and .data..page_aligned section on
page-size so all objects places in these sections are guaranteed to have
their own page.
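
A rough sketch of the data side of the fix (the .bss..page_aligned input
gets the same trailing alignment; not the exact hunk):

#define PAGE_ALIGNED_DATA(page_align)					\
	. = ALIGN(page_align);						\
	*(.data..page_aligned)						\
	. = ALIGN(page_align);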

[ tglx: Amended changelog ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org


# 5a2798ab 11-Jul-2020 Jiri Olsa <jolsa@kernel.org>

bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros

Adding support to generate .BTF_ids section that will hold BTF
ID lists for verifier.

Adding macros that will help to define lists of BTF ID values
placed in .BTF_ids section. They are initially filled with zeros
(during compilation) and resolved later during the linking phase
by resolve_btfids tool.

Following defines list of one BTF ID value:

BTF_ID_LIST(bpf_skb_output_btf_ids)
BTF_ID(struct, sk_buff)

It also defines following variable to access the list:

extern u32 bpf_skb_output_btf_ids[];

The BTF_ID_UNUSED macro defines 4 zero bytes. It's used when we
want to define 'unused' entry in BTF_ID_LIST, like:

BTF_ID_LIST(bpf_skb_output_btf_ids)
BTF_ID(struct, sk_buff)
BTF_ID_UNUSED
BTF_ID(struct, task_struct)

Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200711215329.41165-4-jolsa@kernel.org


# 85c2ce91 30-Jun-2020 Peter Zijlstra <peterz@infradead.org>

sched, vmlinux.lds: Increase STRUCT_ALIGNMENT to 64 bytes for GCC-4.9

For some mysterious reason GCC-4.9 has a 64 byte section alignment for
structures, all other GCC versions (and Clang) tested (including 4.8
and 5.0) are fine with the 32 bytes alignment.

Getting this right is important for the new SCHED_DATA macro that
creates an explicitly ordered array of 'struct sched_class' in the
linker script and expects pointer arithmetic to work.
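
Conceptually the workaround looks like this (the exact compiler-version
check is abbreviated here; a sketch, not the actual hunk):

#if CONFIG_GCC_VERSION >= 40900 && CONFIG_GCC_VERSION < 50000
#define STRUCT_ALIGNMENT	64
#else
#define STRUCT_ALIGNMENT	32
#endif
#define STRUCT_ALIGN()		. = ALIGN(STRUCT_ALIGNMENT)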

Fixes: c3a340f7e7ea ("sched: Have sched_class_highest define by vmlinux.lds.h")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200630144905.GX4817@hirez.programming.kicks-ass.net


# c3a340f7 19-Dec-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

sched: Have sched_class_highest define by vmlinux.lds.h

Now that the sched_class descriptors are defined by the linker script,
the script needs to be aware of the existence of stop_sched_class, which
depends on whether SMP is enabled, as it is used as the "highest"
priority when defined. Move the declaration of sched_class_highest to
the same location in the linker script that inserts stop_sched_class;
this will also make it easier to see what should be defined as the
highest class, as this linker script location defines the priorities as
well.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191219214558.682913590@goodmis.org


# 590d6979 19-Dec-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

sched: Force the address order of each sched class descriptor

In order to enable a micro-optimization in pick_next_task(), the sched
class descriptor addresses must be ordered the same way as their
priorities relative to each other. That is:

&idle_sched_class < &fair_sched_class < &rt_sched_class <
&dl_sched_class < &stop_sched_class

In order to guarantee this order of the sched class descriptors, add each
one into its own data section and force the order in the linker script.
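
A rough sketch of the resulting ordering in the linker script (not the
exact hunk):

#define SCHED_DATA							\
	STRUCT_ALIGN();							\
	__begin_sched_classes = .;					\
	*(__idle_sched_class)						\
	*(__fair_sched_class)						\
	*(__rt_sched_class)						\
	*(__dl_sched_class)						\
	*(__stop_sched_class)						\
	__end_sched_classes = .;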

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/157675913272.349305.8936736338884044103.stgit@localhost.localdomain


# 65538966 09-Mar-2020 Thomas Gleixner <tglx@linutronix.de>

vmlinux.lds.h: Create section for protection against instrumentation

Some code paths, especially the low level entry code, must be protected
against instrumentation for various reasons:

- Low level entry code can be a fragile beast, especially on x86.

- With NO_HZ_FULL RCU state needs to be established before using it.

Having a dedicated section for such code allows tooling to validate that
no unsafe functions are invoked.

Add the .noinstr.text section and the noinstr attribute to mark
functions. noinstr implies notrace. Kprobes will gain a section check
later.

Provide also a set of markers: instrumentation_begin()/end()

These are used to mark code inside a noinstr function which calls
into regular instrumentable text section as safe.

The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is
enabled as the end marker emits a NOP to prevent the compiler from merging
the annotation points. This means the objtool verification requires a
kernel compiled with this option.
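
The resulting macro looks roughly like this (a sketch, not the exact
hunk):

#define NOINSTR_TEXT							\
		ALIGN_FUNCTION();					\
		__noinstr_text_start = .;				\
		*(.noinstr.text)					\
		__noinstr_text_end = .;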

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.075416272@linutronix.de


# 84d5f77f 26-Mar-2020 H.J. Lu <hjl.tools@gmail.com>

x86, vmlinux.lds: Add RUNTIME_DISCARD_EXIT to generic DISCARDS

In the x86 kernel, .exit.text and .exit.data sections are discarded at
runtime, not by the linker. Add RUNTIME_DISCARD_EXIT to generic DISCARDS
and define it in the x86 kernel linker script to keep them.

The sections are added before the DISCARD directive, so only document
the situation explicitly here, as this change doesn't have any effect on
the generated kernel. Also, other architectures like ARM64 will use it
too, so generalize the approach with the RUNTIME_DISCARD_EXIT define.

[ bp: Massage and extend commit message. ]
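
Conceptually (a sketch, not the exact hunk; EXIT_DISCARDS is then
expanded inside the generic DISCARDS definition):

#ifdef RUNTIME_DISCARD_EXIT
#define EXIT_DISCARDS
#else
#define EXIT_DISCARDS							\
	EXIT_TEXT							\
	EXIT_DATA
#endif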

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200326193021.255002-1-hjl.tools@gmail.com


# 90ceddcb 18-Mar-2020 Fangrui Song <maskray@google.com>

bpf: Support llvm-objcopy for vmlinux BTF

Simplify gen_btf logic to make it work with llvm-objcopy. The existing
'file format' and 'architecture' parsing logic is brittle and does not
work with llvm-objcopy/llvm-objdump.

'file format' output of llvm-objdump>=11 will match GNU objdump, but
'architecture' (bfdarch) may not.

.BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag
because it is part of vmlinux image used for introspection. C code
can reference the section via linker script defined __start_BTF and
__stop_BTF. This fixes a small problem that previous .BTF had the
SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data).

Additionally, `objcopy -I binary` synthesized symbols
_binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not
used elsewhere) are replaced with more commonplace __start_BTF and
__stop_BTF.

Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns
"empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?"

We use a dd command to change the e_type field in the ELF header from
ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting
ET_EXEC as an input file is an extremely rare GNU ld feature that lld
does not intend to support, because this is error-prone.

The output section description .BTF in include/asm-generic/vmlinux.lds.h
avoids potential subtle orphan section placement issues and suppresses
--orphan-handling=warn warnings.

Fixes: df786c9b9476 ("bpf: Force .BTF section start to zero when dumping from vmlinux")
Fixes: cb0cc635c7a9 ("powerpc: Include .BTF section")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://github.com/ClangBuiltLinux/linux/issues/871
Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com


# 46f94692 18-Nov-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

ftrace: Rename ftrace_graph_stub to ftrace_stub_graph

The ftrace_graph_stub was created and points to ftrace_stub as a way to
assign the functon graph tracer function pointer to a stub function with a
different prototype than what ftrace_stub has and not trigger the C
verifier. The ftrace_graph_stub was created via the linker script
vmlinux.lds.h. Unfortunately, powerpc already uses the name
ftrace_graph_stub for its internal implementation of the function graph
tracer, and even though powerpc would still build, the change via the linker
script kept the function tracer on powerpc from working.

By using the name ftrace_stub_graph, which does not exist anywhere else in
the kernel, this should not be a problem.

Link: https://lore.kernel.org/r/1573849732.5937.136.camel@lca.pw

Fixes: b83b43ffc6e4 ("fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# b83b43ff 15-Oct-2019 Steven Rostedt (VMware) <rostedt@goodmis.org>

fgraph: Fix function type mismatches of ftrace_graph_return using ftrace_stub

The C compiler is allowing more checks to make sure that function pointers
are assigned to the correct prototype function. Unfortunately, the function
graph tracer uses a special name with its assigned ftrace_graph_return
function pointer that maps to a stub function used by the function tracer
(ftrace_stub). The ftrace_graph_return variable is compared to the
ftrace_stub in some archs to know if the function graph tracer is enabled or
not. This means we can not just simply create a new function stub that
compares it without modifying all the archs.

Instead, have the linker script create a function_graph_stub that maps to
ftrace_stub, and this way we can define the prototype for it to match the
prototype of ftrace_graph_return, and make the compiler checks all happy!

Link: http://lkml.kernel.org/r/20191015090055.789a0aed@gandalf.local.home

Cc: linux-sh@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# a1326b17 16-Oct-2019 Mark Rutland <mark.rutland@arm.com>

module/ftrace: handle patchable-function-entry

When using patchable-function-entry, the compiler will record the
callsites into a section named "__patchable_function_entries" rather
than "__mcount_loc". Let's abstract this difference behind a new
FTRACE_CALLSITE_SECTION, so that architectures don't have to handle this
explicitly (e.g. with custom module linker scripts).

As parisc currently handles this explicitly, it is fixed up accordingly,
with its custom linker script removed. Since FTRACE_CALLSITE_SECTION is
only defined when DYNAMIC_FTRACE is selected, the parisc module loading
code is updated to only use the definition in that case. When
DYNAMIC_FTRACE is not selected, modules shouldn't have this section, so
this removes some redundant work in that case.

To make sure that this is kept up-to-date for modules and the main
kernel, a comment is added to vmlinux.lds.h, with the existing ifdeffery
simplified for legibility.

I built parisc generic-{32,64}bit_defconfig with DYNAMIC_FTRACE enabled,
and verified that the section made it into the .ko files for modules.
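
The abstraction boils down to roughly this (a sketch of the ftrace.h
side, not the exact hunk):

#ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
#define FTRACE_CALLSITE_SECTION	"__patchable_function_entries"
#else
#define FTRACE_CALLSITE_SECTION	"__mcount_loc"
#endif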

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-parisc@vger.kernel.org


# b8c2f776 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA

Many architectures have an EXCEPTION_TABLE that needs to be only
readable. As such, it should live in RO_DATA. Create a macro to identify
this case for the architectures that can move EXCEPTION_TABLE into
RO_DATA.
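
Conceptually (a sketch, not the exact hunk; RO_EXCEPTION_TABLE is then
expanded inside RO_DATA when the architecture opts in):

#ifdef RO_EXCEPTION_TABLE_ALIGN
#define RO_EXCEPTION_TABLE	EXCEPTION_TABLE(RO_EXCEPTION_TABLE_ALIGN)
#else
#define RO_EXCEPTION_TABLE
#endif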

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-15-keescook@chromium.org


# c9174047 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Replace RW_DATA_SECTION with RW_DATA

Rename RW_DATA_SECTION to RW_DATA. (Calling this a "section" is a lie,
since it's multiple sections and section flags cannot be applied to
the macro.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-14-keescook@chromium.org


# 93240b32 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Replace RO_DATA_SECTION with RO_DATA

Finish renaming RO_DATA_SECTION to RO_DATA. (Calling this a "section"
is a lie, since it's multiple sections and section flags cannot be
applied to the macro.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-13-keescook@chromium.org


# c8231825 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Replace RODATA with RO_DATA

There's no reason to keep the RODATA macro: replace the callers with
the expected RO_DATA macro.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-12-keescook@chromium.org


# eaf93707 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Move NOTES into RO_DATA

The .notes section should be non-executable read-only data. As such,
move it to the RO_DATA macro instead of being per-architecture defined.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-11-keescook@chromium.org


# fbe6a8e6 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Move Program Header restoration into NOTES macro

In preparation for moving NOTES into RO_DATA, make the Program Header
assignment restoration be part of the NOTES macro itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-10-keescook@chromium.org


# 441110a5 29-Oct-2019 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes

In preparation for moving NOTES into RO_DATA, provide a mechanism for
architectures that want to emit a PT_NOTE Program Header to do so.
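
Conceptually (a sketch, not the exact hunk; the matching
NOTES_HEADERS_RESTORE from the entry above undoes the program header
assignment for whatever section follows):

#ifdef EMIT_PT_NOTE
#define NOTES_HEADERS	:text :note
#else
#define NOTES_HEADERS
#endif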

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-9-keescook@chromium.org


# e6b1db98 19-Aug-2019 Matthew Garrett <matthewgarrett@google.com>

security: Support early LSMs

The lockdown module is intended to allow for kernels to be locked down
early in boot - sufficiently early that we don't have the ability to
kmalloc() yet. Add support for early initialisation of some LSMs, and
then add them to the list of names when we do full initialisation later.
Early LSMs are initialised in link order and cannot be overridden via
boot parameters, and cannot make use of kmalloc() (since the allocator
isn't initialised yet).

(Fixed by Stephen Rothwell to include a stub to fix builds when
!CONFIG_SECURITY)

Signed-off-by: Matthew Garrett <mjg59@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>


# 980af75e 12-Jun-2019 Daniel Lezcano <daniel.lezcano@linaro.org>

thermal/drivers/core: Add init section table for self-encapsulation

Currently the governors are declared in their respective files but they
export their [un]register functions which in turn call the [un]register
governors core's functions. That implies a cyclic dependency which is
not desirable. There is a way to self-encapsulate the governors by letting
them declare themselves in an __init section table.

Define the table in the asm generic linker description like the other
tables and provide the specific macros to deal with it.
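
A rough sketch of the resulting table macro (not the exact hunk):

#define THERMAL_TABLE(name)						\
	. = ALIGN(8);							\
	__##name##_thermal_table = .;					\
	KEEP(*(__##name##_thermal_table))				\
	__##name##_thermal_table_end = .;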

Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>


# 6ca63662 05-Jun-2019 Sven Schnelle <svens@stackframe.org>

parisc: add dynamic ftrace

This patch implements dynamic ftrace for PA-RISC. The required mcount
call sequences can get pretty long, so instead of patching the
whole call sequence out of the functions, we are using
-fpatchable-function-entry from gcc. This puts a configurable amount of
NOPS before/at the start of the function. Taking do_sys_open() as example,
which would look like this when the call is patched out:

1036b248: 08 00 02 40 nop
1036b24c: 08 00 02 40 nop
1036b250: 08 00 02 40 nop
1036b254: 08 00 02 40 nop

1036b258 <do_sys_open>:
1036b258: 08 00 02 40 nop
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)

When ftrace gets enabled for this function the kernel will patch these
NOPs to:

1036b248: 10 19 57 20 <address of ftrace>
1036b24c: 6f c1 00 80 stw,ma r1,40(sp)
1036b250: 48 21 3f d1 ldw -18(r1),r1
1036b254: e8 20 c0 02 bv,n r0(r1)

1036b258 <do_sys_open>:
1036b258: e8 3f 1f df b,l,n .-c,r1
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)

So the first NOP in do_sys_open() will be patched to jump backwards into
some minimal trampoline code which pushes a stackframe, saves r1 which
holds the return address, loads the address of the real ftrace function,
and branches to that location. For 64 Bit things are getting a bit more
complicated (and longer) because we must make sure that the address of
ftrace location is 8 byte aligned, and the offset passed to ldd for
fetching the address is 8 byte aligned as well.

Note that gcc has a bug which misplaces the function label, and needs a
patch to make dynamic ftrace work. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90751 for details.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>


# 54e6c11b 07-Apr-2019 Joel Fernandes (Google) <joel@joelfernandes.org>

srcu: Remove unused vmlinux srcu linker entries

The SRCU for modules optimization (commit title "srcu: Allocate per-CPU
data for DEFINE_SRCU() in modules") introduced vmlinux linker entries
which are unused, since vmlinux.lds.h applies only to the built-in vmlinux.
So remove them to prevent any space usage due to the 8-byte alignment they
added.
vmlinux.lds.h has no effect on module loading and is not used for
building the module object, so the changes were not needed in the first
place since the optimization is specific to modules.

Tested with SRCU torture_type and rcutorture. Put prints in module
loader to confirm it is able to find and initialize the srcu structures.

Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: kernel-team@android.com
Cc: paulmck@linux.vnet.ibm.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>


# fe15b50c 05-Apr-2019 Paul E. McKenney <paulmck@kernel.org>

srcu: Allocate per-CPU data for DEFINE_SRCU() in modules

Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module requires
that the size of the reserved region be increased, which is not something
we want to be doing all that often. One approach would be to require
that loadable modules define an srcu_struct and invoke init_srcu_struct()
from their module_init function and cleanup_srcu_struct() from their
module_exit function. However, this is more than a bit user unfriendly.

This commit therefore creates an ___srcu_struct_ptrs linker section,
and pointers to srcu_struct structures created by DEFINE_SRCU() and
DEFINE_STATIC_SRCU() within a module are placed into that module's
___srcu_struct_ptrs section. The required init_srcu_struct() and
cleanup_srcu_struct() functions are then automatically invoked as needed
when that module is loaded and unloaded, thus allowing modules to continue
to use DEFINE_SRCU() and DEFINE_STATIC_SRCU() while avoiding the need
to increase the size of the reserved region.
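
A rough sketch of the module-side macro (details from memory, not the
exact hunk):

/* For modules: register the srcu_struct via a pointer in the new section. */
#define __DEFINE_SRCU(name, is_static)					\
	is_static struct srcu_struct name;				\
	static struct srcu_struct * const __srcu_struct_##name		\
		__section("___srcu_struct_ptrs") = &name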

Many of the algorithms and some of the code was cheerfully cherry-picked
from other code making use of linker sections, perhaps most notably from
tracepoints. All bugs are nevertheless the sole property of the author.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ paulmck: Use __section() and use "default" in srcu_module_notify()'s
"switch" statement as suggested by Joel Fernandes. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>


# 898490c0 29-Apr-2019 Alexey Gladkov <gladkov.alexey@gmail.com>

moduleparam: Save information about built-in modules in separate file

Problem:

When a kernel module is compiled as a separate module, some important
information about the kernel module is available via the .modinfo section of
the module. In contrast, when the kernel module is compiled into the
kernel, that information is not available.

Information about built-in modules is necessary in the following cases:

1. When it is necessary to find out what additional parameters can be
passed to the kernel at boot time.

2. When you need to know which module names and their aliases are in
the kernel. This is very useful for creating an initrd image.

Proposal:

The proposed patch does not remove .modinfo section with module
information from the vmlinux at the build time and saves it into a
separate file after kernel linking. So, the kernel does not increase in
size and no additional information remains in it. Information is stored
in the same format as in the separate modules (null-terminated string
array). Because the .modinfo section is already exported with a separate
modules, we are not creating a new API.

It can be easily read in the userspace:

$ tr '\0' '\n' < modules.builtin.modinfo
ext4.softdep=pre: crc32c
ext4.license=GPL
ext4.description=Fourth Extended Filesystem
ext4.author=Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others
ext4.alias=fs-ext4
ext4.alias=ext3
ext4.alias=fs-ext3
ext4.alias=ext2
ext4.alias=fs-ext2
md_mod.alias=block-major-9-*
md_mod.alias=md
md_mod.description=MD RAID framework
md_mod.license=GPL
md_mod.parmtype=create_on_open:bool
md_mod.parmtype=start_dirty_degraded:int
...

Co-Developed-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 9672e2cb 30-Dec-2018 Mathias Krause <minipli@googlemail.com>

vmlinux.lds.h: drop unused __vermagic

The reference to '__vermagic' is a relic from v2.5 times. And even
there it had a very short life time, from v2.5.59 (commit 1d411b80ee18
("Module Sanity Check") in the historic tree) to v2.5.69 (commit
67ac5b866bda ("[PATCH] complete modinfo section")).

Neither current kernels nor modules contain a '__vermagic' section any
more, so get rid of it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>


# f76a16ad 06-Mar-2019 Josh Poimboeuf <jpoimboe@redhat.com>

x86/unwind/orc: Fix ORC unwind table alignment

The .orc_unwind section is a packed array of 6-byte structs. It's
currently aligned to 6 bytes, which is causing warnings in the LLD
linker.

Six isn't a power of two, so it's not a valid alignment value. The
actual alignment doesn't matter much because it's an array of packed
structs. An alignment of two is sufficient. In reality it always gets
aligned to four bytes because it comes immediately after the
4-byte-aligned .orc_unwind_ip section.

Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/218
Link: https://lkml.kernel.org/r/d55027ee95fe73e952dcd8be90aebd31b0095c45.1551892041.git.jpoimboe@redhat.com


# 52c8ee5b 13-Sep-2018 Peter Oberparleiter <oberpar@linux.ibm.com>

vmlinux.lds.h: Fix linker warnings about orphan .LPBX sections

Enabling both CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and
CONFIG_GCOV_PROFILE_ALL=y results in linker warnings:

warning: orphan section `.data..LPBX1' being placed in
section `.data..LPBX1'.

LD_DEAD_CODE_DATA_ELIMINATION adds compiler flag -fdata-sections. This
option causes GCC to create separate data sections for data objects,
including those generated by GCC internally for gcov profiling. The
names of these objects start with a dot (.LPBX0, .LPBX1), resulting in
section names starting with 'data..'.

As section names starting with 'data..' are used for specific purposes
in the Linux kernel, the linker script does not automatically include
them in the output data section, resulting in the "orphan section"
linker warnings.

Fix this by specifically including sections named "data..LPBX*" in the
data section.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>


# 8dcf86ca 12-Sep-2018 Peter Oberparleiter <oberpar@linux.ibm.com>

vmlinux.lds.h: Fix incomplete .text.exit discards

Enabling CONFIG_GCOV_PROFILE_ALL=y causes linker errors on ARM:

`.text.exit' referenced in section `.ARM.exidx.text.exit':
defined in discarded section `.text.exit'

`.text.exit' referenced in section `.fini_array.00100':
defined in discarded section `.text.exit'

And related errors on NDS32:

`.text.exit' referenced in section `.dtors.65435':
defined in discarded section `.text.exit'

The gcov compiler flags cause certain compiler versions to generate
additional destructor-related sections that are not yet handled by the
linker script, resulting in references between discarded and
non-discarded sections.

Since destructors are not used in the Linux kernel, fix this by
discarding these additional sections.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Greentime Hu <green.hu@gmail.com>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>


# 3ac946d1 10-Oct-2018 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Move LSM_TABLE into INIT_DATA

Since the struct lsm_info table is not an initcall, we can just move it
into INIT_DATA like all the other tables.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: James Morris <james.morris@microsoft.com>


# b048ae6e 10-Oct-2018 Kees Cook <keescook@chromium.org>

LSM: Rename .security_initcall section to .lsm_info

In preparation for switching from initcall to just a regular set of
pointers in a section, rename the internal section name.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>


# 1e80cd16 10-Oct-2018 Kees Cook <keescook@chromium.org>

vmlinux.lds.h: Avoid copy/paste of security_init section

Avoid copy/paste by defining SECURITY_INIT in terms of SECURITY_INITCALL.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>


# e872267b 19-Sep-2018 Ard Biesheuvel <ardb@kernel.org>

jump_table: Move entries into ro_after_init region

The __jump_table sections emitted into the core kernel and into
each module consist of statically initialized references into
other parts of the code, and with the exception of entries that
point into init code, which are defused at post-init time, these
data structures are never modified.

So let's move them into the ro_after_init section, to prevent them
from being corrupted inadvertently by buggy code, or deliberately
by an attacker.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: https://lkml.kernel.org/r/20180919065144.25010-9-ard.biesheuvel@linaro.org


# 7953002a 20-Aug-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

vmlinux.lds.h: remove stale <linux/export.h> include

This is unneeded since commit a62143850053 ("vmlinux.lds.h: remove
no-op macro VMLINUX_SYMBOL()").

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# ac6bbf0c 09-Jul-2018 Rob Herring <robh@kernel.org>

iommu: Remove IOMMU_OF_DECLARE

Now that we use the driver core to stop deferred probe for missing
drivers, IOMMU_OF_DECLARE can be removed.

This is slightly less optimal than having a list of built-in drivers in
that we'll now defer probe twice before giving up. This shouldn't have a
significant impact on boot times as past discussions about deferred
probe have given no evidence of deferred probe having a substantial
impact.

Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Cc: devicetree@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 266ff2a8 09-May-2018 Nicholas Piggin <npiggin@gmail.com>

kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION

KEEP more tables, and add the function/data section wildcard to more
section selections.

This is a little ad-hoc at the moment, but kernel code should be moved
to consistently use .text..x (note: double dots) for explicit sections
and all references to it in the linker script can be made with
TEXT_MAIN, and similarly for other sections.

For now, let's see if major architectures move to enabling this option
then we can do some refactoring passes. Otherwise if it remains unused
or superseded by LTO, this may not be required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# a6214385 09-May-2018 Masahiro Yamada <yamada.masahiro@socionext.com>

vmlinux.lds.h: remove no-op macro VMLINUX_SYMBOL()

Now that VMLINUX_SYMBOL() is a no-op, clean up the linker script.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>


# dd709e72 06-Apr-2018 Daniel Kurtz <djkurtz@chromium.org>

earlycon: Use a pointer table to fix __earlycon_table stride

Commit 99492c39f39f ("earlycon: Fix __earlycon_table stride") tried to fix
__earlycon_table stride by forcing the earlycon_id struct alignment to 32
and asking the linker to 32-byte align the __earlycon_table symbol. This
fix was based on commit 07fca0e57fca92 ("tracing: Properly align linker
defined symbols") which tried a similar fix for the tracing subsystem.

However, this fix doesn't quite work because there is no guarantee that
gcc will pack the structures into an array layout.
chooses to 64-byte align these structs by inserting additional padding
between the entries because it has no clue that they are supposed to be in
an array. If we are unlucky, the linker will assign symbol
"__earlycon_table" to a 32-byte aligned address which does not correspond
to the 64-byte aligned contents of section "__earlycon_table".

To address this same problem, the fix to the tracing system was
subsequently re-implemented using a more robust table of pointers approach
by commits:
3d56e331b653 ("tracing: Replace syscall_meta_data struct array with pointer array")
654986462939 ("tracepoints: Fix section alignment using pointer array")
e4a9ea5ee7c8 ("tracing: Replace trace_event struct array with pointer array")

Let's use this same "array of pointers to structs" approach for
EARLYCON_TABLE.

Fixes: 99492c39f39f ("earlycon: Fix __earlycon_table stride")
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Suggested-by: Aaron Durbin <adurbin@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
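
A hedged sketch of the "array of pointers to structs" pattern described
above (identifiers are illustrative, not the exact serial core code): the
pointers all have the same size and alignment, so the linker-assembled
section forms a well-defined array no matter how the compiler pads the
structs themselves.

  struct example_earlycon_id {
          char    name[16];
          int     (*setup)(void);         /* real setup() takes arguments */
  };

  #define EXAMPLE_EARLYCON_DECLARE(_name, _fn)                          \
          static const struct example_earlycon_id __earlycon_##_name =  \
                  { .name = #_name, .setup = _fn };                     \
          /* only the pointer goes into the linker-built table */       \
          static const struct example_earlycon_id                       \
                  * const __earlycon_ptr_##_name                        \
                  __attribute__((used, section("__earlycon_table")))    \
                  = &__earlycon_##_name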


# c4f6699d 28-Mar-2018 Alexei Starovoitov <ast@kernel.org>

bpf: introduce BPF_RAW_TRACEPOINT

Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

From the bpf program's point of view, access to the arguments looks like:

  struct bpf_raw_tracepoint_args {
          __u64 args[0];
  };

  int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
  {
          // program can read args[N] where N depends on tracepoint
          // and is statically verified at program load+attach time
  }

The kprobe+bpf infrastructure allows programs to access function arguments.
This feature allows programs to access raw tracepoint arguments.

Similar to the proposed 'dynamic ftrace events', there are no ABI guarantees
about what the tracepoint arguments are or what they mean.
The program needs to type-cast args properly and use the bpf_probe_read()
helper to access struct fields when an argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
0xffffffff81132080 <+0>: mov %ecx,%ecx
0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All ~500 __bpf_trace_*() functions are only 5-10 bytes long,
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
when calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second,
this is a valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# c38d0852 06-Feb-2018 Jia He <jia.he@hxt-semitech.com>

ACPI/IORT: Remove linker section for IORT entries again

In commit 316ca8804ea8 ("ACPI/IORT: Remove linker section for IORT entries
probing"), the IORT entries were removed from vmlinux.lds.h. But in
commit 2fcc112af37f ("clocksource/drivers: Rename clksrc table to timer"),
this line was incorrectly brought back.

It does no harm except for adding some useless symbols, so fix it.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>


# 663faf9f 12-Jan-2018 Masami Hiramatsu <mhiramat@kernel.org>

error-injection: Add injectable error types

Add injectable error types for each error-injectable function.

One motivation of error injection testing is to find software flaws,
mistakes or mis-handling of expected errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.

But if the tester injects the wrong kind of error (e.g. just returns a
success code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.

To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:

- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.

The ALLOW_ERROR_INJECTION() macro takes one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function, e.g.

ALLOW_ERROR_INJECTION(open_ctree, ERRNO)

These error types are shown in debugfs as below.

====
/ # cat /sys/kernel/debug/error_injection/list
open_ctree [btrfs] ERRNO
io_ctl_init [btrfs] ERRNO
====

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 540adea3 12-Jan-2018 Masami Hiramatsu <mhiramat@kernel.org>

error-injection: Separate error-injection from kprobe

The error-injection framework is not limited to use
by kprobes or bpf; other kernel subsystems can use it
freely for checking the safety of error injection, e.g.
livepatch, ftrace etc.
So separate the error-injection framework from kprobes.

Some differences have been made:

- The "kprobe" word is removed from all APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
  ALLOW_ERROR_INJECTION() since it is not limited to BPF either.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item for this
  feature. It is automatically enabled if the arch supports the
  error injection feature for kprobes, ftrace etc.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 0500871f 02-Jan-2018 David Howells <dhowells@redhat.com>

Construct init thread stack in the linker script rather than by union

Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.

The following symbols are then made available from INIT_TASK_DATA() linker
script macro:

init_thread_union
init_stack

INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
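
Roughly, the INIT_TASK_DATA() shape described above looks like this in the
generic linker script (simplified sketch, details vary):

  #define INIT_TASK_DATA(align)                                 \
          . = ALIGN(align);                                     \
          __start_init_task = .;                                \
          init_thread_union = .;                                \
          init_stack = .;                                       \
          *(.data..init_task)                                   \
          *(.data..init_thread_info)                            \
          . = __start_init_task + THREAD_SIZE;                  \
          __end_init_task = .;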


# 92ace999 11-Dec-2017 Josef Bacik <jbacik@fb.com>

add infrastructure for tagging functions as error injectable

Using BPF we can override kprob'ed functions and return arbitrary
values. Obviously this can be a bit unsafe, so make this feature opt-in
for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
order to give BPF access to that function for error injection purposes.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# b1fca27d 17-Nov-2017 Andi Kleen <ak@linux.intel.com>

kernel debug: support resetting WARN*_ONCE

I like _ONCE warnings because it's guaranteed that they don't flood the
log.

During testing I find it useful to reset the state of the once warnings,
so that I can rerun tests and see if they trigger again, or can
guarantee that a test run always hits the same warnings.

This patch adds a debugfs interface to reset all the _ONCE warnings so
that they appear again:

echo 1 > /sys/kernel/debug/clear_warn_once

This is implemented by putting all the warning booleans into a special
section, and clearing it.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
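
A minimal sketch of the mechanism (illustrative only; the macro and helper
below are hypothetical, the section/symbol names follow the generic linker
script conventions):

  #include <linux/printk.h>
  #include <linux/string.h>
  #include <linux/types.h>

  /* every *_ONCE call site keeps its "already warned" flag in one section */
  #define EXAMPLE_WARN_ONCE(cond, fmt...) ({                            \
          static bool __warned                                          \
                  __attribute__((section(".data.once")));               \
          if ((cond) && !__warned) {                                    \
                  __warned = true;                                      \
                  printk(fmt);                                          \
          }                                                             \
          (cond);                                                       \
  })

  extern char __start_once[], __end_once[];

  /* writing to the debugfs file boils down to re-arming every flag at once */
  static void example_clear_warn_once(void)
  {
          memset(__start_once, 0, __end_once - __start_once);
  }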


# ac26963a 20-Oct-2017 Brijesh Singh <brijesh.singh@amd.com>

percpu: Introduce DEFINE_PER_CPU_DECRYPTED

KVM guest defines three per-CPU variables (steal-time, apf_reason, and
kvm_pic_eoi) which are shared between a guest and a hypervisor.

When SEV is active, memory is encrypted with a guest-specific key, and if
the guest OS wants to share the memory region with the hypervisor then it
must clear the C-bit (i.e set decrypted) before sharing it.

DEFINE_PER_CPU_DECRYPTED can be used to define the per-CPU variables
which will be shared between a guest and a hypervisor.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: linux-arch@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Link: https://lkml.kernel.org/r/20171020143059.3291-16-brijesh.singh@amd.com
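
Usage follows the other DEFINE_PER_CPU_* variants; a sketch of how the KVM
guest side might declare and access such a variable (the exact declarations
live in arch/x86/kernel/kvm.c):

  #include <linux/percpu-defs.h>
  #include <asm/kvm_para.h>

  /* lands in the ..decrypted per-CPU section, shared with the hypervisor */
  DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);

  static void example_use(void)
  {
          /* accessed like any other per-CPU variable */
          struct kvm_steal_time *st = this_cpu_ptr(&steal_time);

          (void)st;
  }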


# 11af8474 13-Oct-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*'

Rename the unwinder config options from:

CONFIG_ORC_UNWINDER
CONFIG_FRAME_POINTER_UNWINDER
CONFIG_GUESS_UNWINDER

to:

CONFIG_UNWINDER_ORC
CONFIG_UNWINDER_FRAME_POINTER
CONFIG_UNWINDER_GUESS

... in order to give them a more logical config namespace.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 564c9cc8 02-Sep-2017 Kees Cook <keescook@chromium.org>

locking/refcounts, x86/asm: Use unique .text section for refcount exceptions

Using .text.unlikely for refcount exceptions isn't safe because gcc may
move entire functions into .text.unlikely (e.g. in6_dev_dev()), which
would cause any uses of a protected refcount_t function to stay inline
with the function, triggering the protection unconditionally:

  .section .text.unlikely,"ax",@progbits
  .type   in6_dev_get, @function
in6_dev_getx:
.LFB4673:
  .loc 2 4128 0
  .cfi_startproc
  ...
  lock; incl 480(%rbx)
  js 111f
  .pushsection .text.unlikely
111:  lea 480(%rbx), %rcx
112:  .byte 0x0f, 0xff
  .popsection
113:

This creates a unique .text..refcount section and adds an additional
test to the exception handler to WARN in the case of having none of OF,
SF, nor ZF set so we can see things like this more easily in the future.

The double dot for the section name keeps it out of the TEXT_MAIN macro
namespace, to avoid collisions and so it can be put at the end with
text.unlikely to keep the cold code together.

See commit:

cb87481ee89db ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")

... which matches C names: [a-zA-Z0-9_] but not ".".

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch <linux-arch@vger.kernel.org>
Fixes: 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
Link: http://lkml.kernel.org/r/1504382986-49301-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 129f6c48 21-Jul-2017 Arnd Bergmann <arnd@arndb.de>

mtd: only use __xipram annotation when XIP_KERNEL is set

When XIP_KERNEL is enabled, some functions are defined in the .data
ELF section because we require them to be in RAM whenever we communicate
with the flash chip. However this causes problems when FTRACE is
enabled and gcc emits calls to __gnu_mcount_nc in the function
prolog:

drivers/built-in.o: In function `cfi_chip_setup':
:(.data+0x272fc): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
drivers/built-in.o: In function `cfi_probe_chip':
:(.data+0x27de8): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
/tmp/ccY172rP.s: Assembler messages:
/tmp/ccY172rP.s:70: Warning: ignoring changed section attributes for .data
/tmp/ccY172rP.s: Error: 1 warning, treating warnings as errors
make[5]: *** [drivers/mtd/chips/cfi_probe.o] Error 1
/tmp/ccK4rjeO.s: Assembler messages:
/tmp/ccK4rjeO.s:421: Warning: ignoring changed section attributes for .data
/tmp/ccK4rjeO.s: Error: 1 warning, treating warnings as errors
make[5]: *** [drivers/mtd/chips/cfi_util.o] Error 1
/tmp/ccUvhCYR.s: Assembler messages:
/tmp/ccUvhCYR.s:1895: Warning: ignoring changed section attributes for .data
/tmp/ccUvhCYR.s: Error: 1 warning, treating warnings as errors

Specifically, this does not work because the .data section is not
marked executable, which leads LD to not generate trampolines for
long calls.

This moves the __xipram functions into their own .xiptext section instead.
The section is still placed next to .data and located in RAM but is marked
executable, which avoids the build errors.

Also, we only need to place the XIP functions into a separate section
if both CONFIG_XIP_KERNEL and CONFIG_MTD_XIP are set: When only MTD_XIP
is used, the whole kernel is still in RAM and we do not need to worry
about pulling out the rug under it. When only XIP_KERNEL but not MTD_XIP
is set, the kernel is in some form of ROM, but we never write to it.

Note that MTD_XIP has been broken on ARM since around 2011 or 2012. I
have sent another patch[2] to fix compilation, which I plan to merge
through arm-soc unless there are objections. The obvious alternative
to that would be to completely rip out the MTD_XIP support from the
kernel, since obviously nobody has been using it in a long while.

Link: [1] https://patchwork.kernel.org/patch/8109771/
Link: [2] https://patchwork.kernel.org/patch/9855225/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
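
A hedged sketch of the annotation shape described above (the real definition
lives in the MTD XIP headers and may differ in detail):

  #if defined(CONFIG_XIP_KERNEL) && defined(CONFIG_MTD_XIP)
  /* keep these functions in RAM, in an executable section placed near .data */
  #define __xipram        noinline __attribute__((__section__(".xiptext")))
  #else
  #define __xipram
  #endif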


# 229a7186 02-Aug-2017 Masami Hiramatsu <mhiramat@kernel.org>

irq: Make the irqentry text section unconditional

Generate irqentry and softirqentry text sections without
any Kconfig dependencies. This will add extra sections, but
there should be no performance impact.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S . Miller <davem@davemloft.net>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-cris-kernel@axis.com
Cc: mathieu.desnoyers@efficios.com
Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# cb87481e 26-Jul-2017 Nicholas Piggin <npiggin@gmail.com>

kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured

The .data and .bss sections were modified in the generic linker script to
pull in sections named .data.<C identifier>, which are generated by gcc with
-ffunction-sections and -fdata-sections options.

The problem with this pattern is it can also match section names that Linux
defines explicitly, e.g., .data.unlikely. This can cause Linux sections to
get moved into the wrong place.

The way to avoid this is to use ".." separators for explicit section names
(the dot character is valid in a section name but not a C identifier).
However currently there are sections which don't follow this rule, so for
now just disable the wild card by default.

Example: http://marc.info/?l=linux-arm-kernel&m=150106824024221&w=2

Cc: <stable@vger.kernel.org> # 4.9
Fixes: b67067f1176df ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
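
The direction being described can be sketched as follows (roughly what the
generic script ends up doing once an architecture opts in; exact macro bodies
may differ):

  #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
  /* match compiler-generated per-symbol sections: C identifiers, no dots */
  #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
  #define DATA_MAIN .data .data.[0-9a-zA-Z_]*
  #else
  /* default: no wildcard, so explicit ".data.foo" style names stay put */
  #define TEXT_MAIN .text
  #define DATA_MAIN .data
  #endif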


# ee9f8fce 24-Jul-2017 Josh Poimboeuf <jpoimboe@redhat.com>

x86/unwind: Add the ORC unwinder

Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.

It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.

For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debuginfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.

Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# bb0eb050 26-May-2017 Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers: Rename CLKSRC_OF to TIMER_OF

The config option name is now renamed to 'TIMER_OF' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>


# 2fcc112a 26-May-2017 Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers: Rename clksrc table to timer

The table name is now renamed to 'timer' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>


# 02fd7f68 31-May-2017 Jeremy Linton <jeremy.linton@arm.com>

trace: rename kernel enum section to eval

The kernel and its modules have sections containing the enum
string to value conversions. Rename this section because we
intend to store more than enums in it.

Link: http://lkml.kernel.org/r/20170531215653.3240-2-jeremy.linton@arm.com

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 8e093102 26-May-2017 Daniel Lezcano <daniel.lezcano@linaro.org>

Revert "clockevents: Add a clkevt-of mechanism like clksrc-of"

After discussing it, this feature is dropped as it is not considered
adequate:

https://patchwork.kernel.org/patch/9639317/

There is no user of this macro yet, so there is no impact on the drivers.

This reverts commit 376bc27150f180d9f5eddec6a14117780177589d.

Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


# 83a092cf 11-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc: Link warning for orphan sections

Add --orphan-handling=warn to final link flags. This ensures we can
handle all sections explicitly. This would have caught subtle breakage
such as 7de3b27bac47da9de08409df1d69664acbb72197 at build-time.

Also bring existing orphan sections into the fold:
- .text.hot and .text.unlikely are compiler generated sections.
- .sdata2, .dynsbss, .plt are used by PPC32
- We previously did not specify DWARF_DEBUG or STABS_DEBUG
- DWARF_DEBUG did not include all DWARF sections that can be emitted
- A number of sections are unused and can be discarded.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 316ca880 10-Apr-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

ACPI/IORT: Remove linker section for IORT entries probing

The IORT linker section introduced by commit 34ceea275f62
("ACPI/IORT: Introduce linker section for IORT entries probing")
was needed to make sure SMMU drivers are registered (and therefore
probed) in the kernel before devices using the SMMU have a chance
to probe in turn.

Through the introduction of deferred IOMMU configuration the linker
section based IORT probing infrastructure is not needed any longer, in
that device/SMMU probe dependencies are managed through the probe
deferral mechanism, making the IORT linker section infrastructure
unused, so that it can be removed.

Remove the unused IORT linker section probing infrastructure
from the kernel to complete the ACPI IORT IOMMU configure probe
deferral mechanism implementation.

Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>


# d79bf21e 07-Apr-2017 Jessica Yu <jeyu@redhat.com>

vmlinux.lds: add missing VMLINUX_SYMBOL macros

When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:

kernel/extable.c:169: undefined reference to `__start_ro_after_init'
kernel/extable.c:169: undefined reference to `__end_ro_after_init'

The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.

Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.

Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 906f2a51 31-Mar-2017 Kees Cook <keescook@chromium.org>

mm: fix section name for .data..ro_after_init

A section name for .data..ro_after_init was added by both:

commit d07a980c1b8d ("s390: add proper __ro_after_init support")

and

commit d7c19b066dcf ("mm: kmemleak: scan .data.ro_after_init")

The latter adds incorrect wrapping around the existing s390 section, and
came later. I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).

Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts]
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 19d43626 25-Feb-2017 Peter Zijlstra <peterz@infradead.org>

debug: Add _ONCE() logic to report_bug()

Josh suggested moving the _ONCE logic inside the trap handler, using a
bit in the bug_entry::flags field, avoiding the need for the extra
variable.

Sadly this only works for WARN_ON_ONCE(), since the others have
printk() statements prior to triggering the trap.

Still, this saves a fair amount of text and some data:

text data filename
10682460 4530992 defconfig-build/vmlinux.orig
10665111 4530096 defconfig-build/vmlinux.patched

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 8d09617b 22-Mar-2017 Alexander Kochetkov <al.kochet@gmail.com>

vmlinux.lds: Add __clkevt_of_table to kernel

The code introduced by commit 0c8893c9095d ("clockevents: Add a
clkevt-of mechanism like clksrc-of") refers to __clkevt_of_table,
which doesn't exist in vmlinux. As a result the kernel build
failed with the error: "clkevt-probe.c:63: undefined reference to
`__clkevt_of_table'"

Fixes: 0c8893c9095d ("clockevents: Add a clkevt-of mechanism like clksrc-of")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


# 34ceea27 21-Nov-2016 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

ACPI/IORT: Introduce linker section for IORT entries probing

Since commit e647b532275b ("ACPI: Add early device probing
infrastructure") the kernel has gained the infrastructure that allows
adding linker script section entries to execute ACPI driver callbacks
(ie probe routines) for all subsystems that register a table entry
in the respective kernel section (eg clocksource, irqchip).

Since ARM IOMMU device data is described through IORT tables when
booting with ACPI, the ARM IOMMU drivers must be made able to hook ACPI
callback routines that are called to probe IORT entries and initialize
the respective IOMMU devices.

To avoid adding driver specific hooks into IORT table initialization
code (breaking therefore code modularity - ie ACPI IORT code must be made
aware of ARM SMMU drivers ACPI init callbacks), this patch adds code
that allows ARM SMMU drivers to take advantage of the ACPI early probing
infrastructure, so that they can add linker script section entries
containing drivers callback to be executed on IORT tables detection.

Since IORT nodes are differentiated by a type, the callback routines
can easily parse the IORT table entries, check the IORT nodes and
carry out some actions whenever the IORT node type associated with
the driver specific callback is matched.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 4b89b7f7 23-Nov-2016 Nicholas Piggin <npiggin@gmail.com>

kbuild: keep data tables through dead code elimination

When CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is enabled we must ensure
that we still keep various programmatically-accessed tables.

[npiggin: Fold Paul's patches into one, and add a few more tables.
diff symbol tables of allyesconfig with/without -gc-sections shows up
lost tables quite easily.]

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>


# d7c19b06 10-Nov-2016 Jakub Kicinski <kuba@kernel.org>

mm: kmemleak: scan .data.ro_after_init

Limit the number of kmemleak false positives by including
.data.ro_after_init in memory scanning. To achieve this we need to add
symbols for start and end of the section to the linker scripts.

The problem was uncovered by commit 56989f6d8568 ("genetlink: mark
families as __ro_after_init").

Link: http://lkml.kernel.org/r/1478274173-15218-1-git-send-email-jakub.kicinski@netronome.com
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6727ad9e 07-Oct-2016 Chris Metcalf <cmetcalf@mellanox.com>

nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.

This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.

Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
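
A sketch of the two halves of the scheme (names follow the kernel's
conventions, but the helper below is illustrative):

  #include <linux/types.h>

  /* tag idle entry points so they land in .cpuidle.text */
  #define __cpuidle       __attribute__((__section__(".cpuidle.text")))

  extern char __cpuidle_text_start[], __cpuidle_text_end[];

  /* the backtrace code tests the interrupted PC against the section bounds */
  static bool example_in_cpuidle_text(unsigned long pc)
  {
          return pc >= (unsigned long)__cpuidle_text_start &&
                 pc <  (unsigned long)__cpuidle_text_end;
  }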


# 0f4c4af0 13-Sep-2016 Nicholas Piggin <npiggin@gmail.com>

kbuild: -ffunction-sections fix for archs with conflicting sections

Enabling -ffunction-sections modified the generic linker script to
pull .text.* sections into regular TEXT_TEXT section, conflicting
with some architectures. Revert that change and require archs that
enable the option to ensure they have no conflicting section names,
and do the appropriate merging.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>


# b67067f1 24-Aug-2016 Nicholas Piggin <npiggin@gmail.com>

kbuild: allow archs to select link dead code/data elimination

Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.

On a random powerpc64le build, this yields a significant size saving;
it boots and runs fine, but there is a lot I haven't tested yet, so
these savings may be reduced if there are bugs in the link.

text data bss dec filename
11169741 1180744 1923176 14273661 vmlinux
10445269 1004127 1919707 13369103 vmlinux.dce

~700K text, ~170K data, 6% removed from kernel image size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>


# e41f501d 14-Jul-2016 Dmitry Vyukov <dvyukov@google.com>

vmlinux.lds: account for destructor sections

If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array. The .dtors section is not explicitly accounted for in
the linker script and messes up the vvar/percpu layout.

We want:
ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page

We got:
ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page

This happens because __vvar_page and .vvar get different addresses in
arch/x86/kernel/vmlinux.lds.S:

  . = ALIGN(PAGE_SIZE);
  __vvar_page = .;

  .vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
          /* work around gold bug 13023 */
          __vvar_beginning_hack = .;

Discard .dtors/.fini_array/.text.exit, since we don't call dtors.
Merge .text.startup into init text.

Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 177cf6e5 06-Jun-2016 Daniel Lezcano <daniel.lezcano@linaro.org>

clocksources: Switch back to the clksrc table

All the clocksource drivers's init function are now converted to return
an error code. CLOCKSOURCE_OF_DECLARE is no longer used as well as the
clksrc-of table.

Let's convert back the names:
- CLOCKSOURCE_OF_DECLARE_RET => CLOCKSOURCE_OF_DECLARE
- clksrc-of-ret => clksrc-of

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

For exynos_mct and samsung_pwm_timer:
Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>

For arch/arc:
Acked-by: Vineet Gupta <vgupta@synopsys.com>

For mediatek driver:
Acked-by: Matthias Brugger <matthias.bgg@gmail.com>

For the Rockchip-part
Acked-by: Heiko Stuebner <heiko@sntech.de>

For STi :
Acked-by: Patrice Chotard <patrice.chotard@st.com>

For the mps2-timer.c and versatile.c changes:
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>

For the OXNAS part :
Acked-by: Neil Armstrong <narmstrong@baylibre.com>

For LPC32xx driver:
Acked-by: Sylvain Lemieux <slemieux.tyco@gmail.com>

For Broadcom Kona timer change:
Acked-by: Ray Jui <ray.jui@broadcom.com>

For Sun4i and Sun5i:
Acked-by: Chen-Yu Tsai <wens@csie.org>

For Meson6:
Acked-by: Carlo Caione <carlo@caione.org>

For Keystone:
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>

For NPS:
Acked-by: Noam Camus <noamca@mellanox.com>

For bcm2835:
Acked-by: Eric Anholt <eric@anholt.net>


# b7c4db86 31-May-2016 Daniel Lezcano <daniel.lezcano@linaro.org>

clocksource/drivers/clksrc-probe: Introduce init functions with return code

Currently, the clksrc-probe is not able to handle any error from the init
functions. There are different issues with the current code:
- the code is duplicated in the init functions by writing error messages
- every driver tends to panic in its own init function
- counting the number of clocksources is not reliable

This patch adds another table to store the functions returning an error.
The table is temporary while we convert all the drivers to return an error
and will disappear.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


# 5ee02af1 13-Jun-2016 Masahiro Yamada <yamada.masahiro@socionext.com>

vmlinux.lds.h: replace config_enabled() with IS_ENABLED()

The use of config_enabled() against config options is ambiguous.

Now, IS_ENABLED() is implemented purely with macro expansion, so
let's replace config_enabled() with IS_ENABLED().

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>


# 32fb2fc5 06-Jun-2016 Heiko Carstens <hca@linux.ibm.com>

vmlinux.lds.h: allow arch specific handling of ro_after_init data section

commit c74ba8b3480d ("arch: Introduce post-init read-only memory")
introduced the __ro_after_init attribute which allows to add variables
to the ro_after_init data section.

This new section was added to rodata, even though it contains writable
data. This in turn causes problems on architectures which, very early on,
mark the page table entries that point to rodata as read-only.

This patch allows architectures to implement their own handling of the
.data..ro_after_init section.
Usually that would be:
- mark the rodata section read-only very early
- mark the ro_after_init section read-only within mark_rodata_ro

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 91ed140d 31-Mar-2016 Borislav Petkov <bp@suse.de>

x86/asm: Make sure verify_cpu() has a good stack

04633df0c43d ("x86/cpu: Call verify_cpu() after having entered long mode too")
added the call to verify_cpu() for sanitizing CPU configuration.

The latter uses the stack minimally and it can happen that we land in
startup_64() directly from a 64-bit bootloader. Then we want to use our
own, known good stack.

Do that.

APs don't need this as the trampoline sets up a stack for them.

Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459434062-31055-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# be7635e7 25-Mar-2016 Alexander Potapenko <glider@google.com>

arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections

KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c74ba8b3 17-Feb-2016 Kees Cook <keescook@chromium.org>

arch: Introduce post-init read-only memory

One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.

Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.

This introduces __ro_after_init as a way to mark such memory, and adds
some documentation about the existing __read_mostly marking.

This improves the security of the Linux kernel by marking formerly
read-write memory regions as read-only on a fully booted up system.

Based on work by PaX Team and Brad Spengler.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
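
Example use of the new marking (the variable, parameter name and setup hook
below are made up):

  #include <linux/cache.h>
  #include <linux/init.h>
  #include <linux/kernel.h>

  /* written exactly once during boot, read-only afterwards */
  static unsigned long example_limit __ro_after_init = 128;

  static int __init example_limit_setup(char *s)
  {
          example_limit = simple_strtoul(s, NULL, 0);   /* still writable here */
          return 1;
  }
  __setup("example_limit=", example_limit_setup);

  /* after mark_rodata_ro() the page holding example_limit becomes read-only */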


# 2eaa7909 16-Jan-2016 Peter Hurley <peter@hurleysoftware.com>

earlycon: Use common framework for earlycon declarations

Use a single common table of struct earlycon_id for both command line
and devicetree. Re-define OF_EARLYCON_DECLARE() macro to instance a
unique earlycon declaration (the declaration is only guaranteed to be
unique within a compilation unit; separate compilation units must still
use unique earlycon names).

The semantics of OF_EARLYCON_DECLARE() is different; it declares an
earlycon which can matched either on the command line or by devicetree.
EARLYCON_DECLARE() is semantically unchanged; it declares an earlycon
which is matched by command line only. Remove redundant instances of
EARLYCON_DECLARE().

This enables all earlycons to properly initialize struct console
with the appropriate name and index, which improves diagnostics and
enables direct earlycon-to-console handoff.

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c625f76a 28-Sep-2015 Marc Zyngier <maz@kernel.org>

clocksource / ACPI: Add probing infrastructure for ACPI-based clocksources

DT enjoys a rather nice probing infrastructure for clocksources,
while ACPI is so far stuck into a very distant past.

This patch introduces a declarative API, allowing clocksources
to be self-contained and be called when parsing the GTDT table.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 46e589a3 28-Sep-2015 Marc Zyngier <maz@kernel.org>

irqchip / ACPI: Add probing infrastructure for ACPI-based irqchips

DT enjoys a rather nice probing infrastructure for irqchips, while
ACPI is so far stuck into a very distant past.

This patch introduces a declarative API, allowing irqchips to be
self-contained and be called when a particular entry is matched
in the MADT table.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e647b532 28-Sep-2015 Marc Zyngier <maz@kernel.org>

ACPI: Add early device probing infrastructure

IRQ controllers and timers are the two types of device the kernel
requires before being able to use the device driver model.

ACPI so far lacks a proper probing infrastructure similar to the one
we have with DT, where we're able to declare IRQ chips and
clocksources inside the driver code, and let the core code pick it up
and call us back on a match. This leads to all kind of really ugly
hacks all over the arm64 code and even in the ACPI layer.

In order to allow some basic probing based on the ACPI tables,
introduce "struct acpi_probe_entry" which contains just enough
data and callbacks to match a table, an optional subtable, and
call a probe function. A driver can, at build time, register itself
and expect being called if the right entry exists in the ACPI
table.

An acpi_probe_device_table() helper is provided, taking an identifier for
a set of acpi_probe_entries and iterating over the registered
entries.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
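
A very rough sketch of the declarative pattern being described (the struct
layout and macro below are purely illustrative, not the real acpi_probe_entry
definition):

  #include <linux/acpi.h>

  struct example_acpi_probe_entry {
          char    id[5];                          /* ACPI table signature */
          int     (*probe)(struct acpi_table_header *);
  };

  #define EXAMPLE_ACPI_DECLARE(name, table_id, fn)                      \
          static const struct example_acpi_probe_entry                  \
                  __acpi_probe_##name                                   \
                  __attribute__((used,                                  \
                          section("__example_acpi_probe_table")))       \
                  = { .id = table_id, .probe = fn }

  /* an acpi_probe_device_table()-style helper walks the section and calls
   * ->probe for each entry whose table (and optional subtable) matches */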


# 9bebe9e5 19-Jul-2015 Andi Kleen <ak@linux.intel.com>

kbuild: Fix .text.unlikely placement

When building a kernel with .text.unlikely text the unlikely text for
each translation unit was put next to the main .text code in the
final vmlinux.

The problem is that the linker doesn't allow more specific submatches
of a section name in a different linker script statement after the
main match.

So we need to move them all into one line. With that change
.text.unlikely is at the end of everything again.

I also moved .text.hot into the same statement though, even though
that's not strictly needed.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Michal Marek <mmarek@suse.com>


# 99492c39 03-Apr-2015 Peter Hurley <peter@hurleysoftware.com>

earlycon: Fix __earlycon_table stride

The compiler and the linker must agree on the alignment of
struct earlycon_id; empirical testing and commit 07fca0e57fca92
("tracing: Properly align linker defined symbols") suggests
32-byte alignment is the LCD.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0c564a53 24-Mar-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values

Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.

For example, the tracepoint trace_tlb_flush() has:

__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

Which maps the enum values to the strings they represent. But perf and
trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.

With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:

By adding:

TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

$ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })

The above is what userspace expects to see, and tools do not need to
be modified to parse them.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 779c88c9 24-Mar-2015 Ard Biesheuvel <ardb@kernel.org>

ARM: 8321/1: asm-generic: introduce .text.fixup input section

This introduces a new .text.fixup input section that gets emitted
together with the .text section for each input object file.

Note that

*(.text)
*(.text.fixup)

is not the same as

*(.text .text.fixup)

and we are looking for the latter, to ensure that fixup snippets that
are assembled into a separate section in the object file do not end
up out of range for the relative branch instructions it contains if
the .text section itself grows very large.

This helps prevent linker failures on large ARM kernels.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>


# 470ca0de 09-Mar-2015 Peter Hurley <peter@hurleysoftware.com>

serial: earlycon: Enable earlycon without command line param

Earlycon matching can only be triggered if 'earlycon=...' has been
specified on the kernel command line. To workaround this limitation
requires tight coupling between arches and specific serial drivers
in order to start an earlycon. Devicetree avoids this limitation
with a link table that contains the required data to match earlycons.

Mirror this approach for earlycon match by name. Re-purpose
EARLYCON_DECLARE to generate a table entry which associates name with
setup() function. Re-purpose setup_earlycon() to scan this table for
an earlycon match, which is registered if found.

Declare one "earlycon" early_param, which calls setup_earlycon().

This design allows setup_earlycon() to be called directly with a
param string (as if 'earlycon=...' had been set on the command line).
Re-registration (either directly or by early_param) is prevented.

Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 449e056c 02-Feb-2015 Daniel Lezcano <daniel.lezcano@linaro.org>

ARM: cpuidle: Add a cpuidle ops structure to be used for DT

The current state of the different cpuidle drivers is that the different PM
operations are passed via platform_data using the platform driver
paradigm.

This approach allowed to split the low level PM code from the arch specific
and the generic cpuidle code.

Unfortunately there are complaints about this approach as, in the context of the
single kernel image, we have multiple drivers loaded in memory for nothing and
the platform driver is not adequate for cpuidle.

This patch provides a common interface via cpuidle ops for all new cpuidle
driver and a definition for the device tree.

It will allow, with the next patches, to have a common definition with ARM64
and to share the same cpuidle driver.

The code is optimized to use the __init section intensively in order to reduce
the memory footprint after the driver is initialized and unify the function
names with ARM64.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rob Herring <robherring2@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>


# 9ddf8252 13-Feb-2015 Andrey Ryabinin <ryabinin.a.a@gmail.com>

kernel: add support for .init_array.* constructors

KASan uses constructors for initializing redzones for global variables.
Globals instrumentation in GCC 4.9.2 produces constructors with priority
(.init_array.00099)

Currently the kernel ignores such constructors; only constructors with
the default priority (.init_array) are supported.

This patch adds support for constructors with priorities. For kernel
image we put pointers to constructors between __ctors_start/__ctors_end
and do_ctors() will call them on start up. For modules we merge
.init_array.* sections into resulting .init_array. Module code properly
handles constructors in .init_array section.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
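
On the kernel image side this boils down to placing the sorted, prioritized
sections ahead of the default one inside the constructor table; roughly
(simplified from the KERNEL_CTORS() macro of the era):

  __ctors_start = .;
  *(.ctors)                       /* old-style constructor sections */
  *(SORT(.init_array.*))          /* prioritized constructors, in order */
  *(.init_array)                  /* default-priority constructors */
  __ctors_end = .;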


# 1cd076bf 27-Aug-2014 Will Deacon <will@kernel.org>

iommu: provide early initialisation hook for IOMMU drivers

IOMMU drivers must be initialised before any of their upstream devices,
otherwise the relevant iommu_ops won't be configured for the bus in
question. To solve this, a number of IOMMU drivers use initcalls to
initialise the driver before anything has a chance to be probed.

Whilst this solves the immediate problem, it leaves the job of probing
the IOMMU completely separate from the iommu_ops to configure the IOMMU,
which are called on a per-bus basis and require the driver to figure out
exactly which instance of the IOMMU is being requested. In particular,
the add_device callback simply passes a struct device to the driver,
which then has to parse firmware tables or probe buses to identify the
relevant IOMMU instance.

This patch takes the first step in addressing this problem by adding an
early initialisation pass for IOMMU drivers, giving them the ability to
store some per-instance data in their iommu_ops structure and store that
in their of_node. This can later be used when parsing OF masters to
identify the IOMMU instance in question.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 562c85ca 25-Sep-2014 Yalin Wang <Yalin.Wang@sonymobile.com>

ARM: 8168/1: extend __init_end to a page align address

This patch changes the __init_end address to a
page-aligned address, so that free_initmem() can
free the whole .init section, because if the end
address is not page aligned, it will round down to
a page-aligned address, and then the tail unaligned page
will not be freed.

Signed-off-by: wang <yalin.wang2010@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>


# 330d2822 01-Jul-2014 Zhengyu He <hzy@google.com>

core: fix typo in percpu read_mostly section

This fixes a typo that named the read_mostly section of percpu as
readmostly. It works fine with SMP because the linker script specifies
.data..percpu..readmostly. However, UP kernel builds don't have percpu
sections defined and the non-percpu version of the section is called
data..read_mostly, so .data..readmostly will float around and may break
things unexpectedly.

Looking at the original change that introduced data..percpu..readmostly
(commit c957ef2c59e952803766ddc22e89981ab534606f), it looks like this
was the original intention.

Tested: Built UP kernel and confirmed the sections got merged.

- Before the patch:
$ objdump -h vmlinux.o | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004418 0000000000000000 0000000000000000 00431ac0 2**6
50 .data..readmostly 00000014 0000000000000000 0000000000000000 00444000 2**3

- After the patch:
$ objdump -h vmlinux.o | grep '\.data\.\.read.*mostly'
38 .data..read_mostly 00004438 0000000000000000 0000000000000000 00431ac0 2**6

Signed-off-by: Zhengyu He <hzy@google.com>
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 7d2a01b8 03-Jun-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Add pci_fixup_suspend_late quirk pass

Add pci_fixup_suspend_late as a new pci_fixup_pass. The pass is called
from suspend_noirq and poweroff_noirq. Using the same pass for suspend
and hibernate is consistent with resume_early which is called by
resume_noirq and restore_noirq.

The new quirk pass is required for Thunderbolt support on Apple
hardware.
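
A hedged usage sketch of the new pass (device ID and fixup body are
illustrative and not taken from this commit):

#include <linux/pci.h>

/* Runs from the suspend_noirq/poweroff_noirq phase described above. */
static void foo_suspend_late_quirk(struct pci_dev *pdev)
{
	/* e.g. quiesce the device or save state that resume_early restores */
}
DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL, 0x1547,
			       foo_suspend_late_quirk);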

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b0b6abd3 27-Mar-2014 Rob Herring <robh@kernel.org>

serial: earlycon: add DT support

This adds the infrastructure to generic earlycon for earlycon setup
using DT. The actual setup is not enabled until a following commit to
add the FDT parsing.

Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Grant Likely <grant.likely@linaro.org>


# 06309288 24-Mar-2014 Rob Herring <robh@kernel.org>

vmlinux.lds: define OF table sections with macros

OF table sections all have the same pattern, so create a macro to define
them and ensure consistency.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Herring <robh@kernel.org>


# 9a721c41 24-Mar-2014 Rob Herring <robh@kernel.org>

ARM: align cpu_method_of_table naming

The cpu_method_of_table is the oddball of the various OF linker sections.
In preparation to have common linker section definitions, align the
cpu_method_of_table with the other definitions for the naming and ending
with a blank struct.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>


# 735e0da7 24-Mar-2014 Rob Herring <robh@kernel.org>

irqchip: align irqchip OF match table section naming

Make the irqchip OF match table section naming aligned with other
OF match table sections in preparation to have a common definition.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Herring <robh@kernel.org>


# 69902c71 30-Apr-2014 Vineet Gupta <Vineet.Gupta1@synopsys.com>

kprobes: Ensure blacklist data is aligned

ARC Linux (not supporting native unaligned access) was failing
to boot because __start_kprobe_blacklist was not aligned.

This was because, per the generated vmlinux.lds, it was emitted right
next to .rodata (strings etc.) and hence could end up randomly
unaligned.

Fix that by ensuring word alignment. While 4 would suffice for
32-bit arches and the problem at hand, it is probably better to use 8.

| Path: (null) CPU: 0 PID: 1 Comm: swapper Not tainted
| 3.15.0-rc3-next-20140430 #2
| task: 8f044000 ti: 8f01e000 task.ti: 8f01e000
|
| [ECR ]: 0x00230400 => Misaligned r/w from 0x800fb0d3
| [EFA ]: 0x800fb0d3
| [BLINK ]: do_one_initcall+0x86/0x1bc
| [ERET ]: init_kprobes+0x52/0x120
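
Roughly, the resulting vmlinux.lds.h fragment looks like the sketch below
(VMLINUX_SYMBOL() wrapping elided, so details may differ from the actual
header):

#ifdef CONFIG_KPROBES
#define KPROBE_BLACKLIST()	. = ALIGN(8);			      \
				__start_kprobe_blacklist = .;	      \
				*(_kprobe_blacklist)		      \
				__stop_kprobe_blacklist = .;
#else
#define KPROBE_BLACKLIST()
#endif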

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: <torvalds@linux-foundation.org>
Cc: <rusty@rustcorp.com.au>
Cc: <rdunlap@infradead.org>
Cc: <jeremy@goop.org>
Cc: <arnd@arndb.de>
Cc: <dl9pf@gmx.de>
Cc: <sparse@chrisli.org>
Cc: <anil.s.keshavamurthy@intel.com>
Cc: <davem@davemloft.net>
Cc: <ananth@in.ibm.com>
Cc: <masami.hiramatsu.pt@hitachi.com>
Cc: <chrisw@sous-sol.org>
Cc: <akataria@vmware.com>
Cc: anton Kolesov <Anton.Kolesov@synopsys.com>
Link: http://lkml.kernel.org/r/5361DB14.7010406@synopsys.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 376e2424 17-Apr-2014 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

kprobes: Introduce NOKPROBE_SYMBOL() macro to maintain kprobes blacklist

Introduce NOKPROBE_SYMBOL() macro which builds a kprobes
blacklist at kernel build time.

The usage of this macro is similar to EXPORT_SYMBOL(),
placed after the function definition:

NOKPROBE_SYMBOL(function);

Since this macro will inhibit inlining of static/inline
functions, this patch also introduces a nokprobe_inline macro
for static/inline functions. In this case, we must use
NOKPROBE_SYMBOL() for the inline function caller.

When CONFIG_KPROBES=y, the macro stores the given function
address in the "_kprobe_blacklist" section.

Since the data structures are not fully initialized by the
macro (because there is no "size" information), those
are re-initialized at boot time by using kallsyms.
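
A hedged sketch of the resulting pattern (function names and bodies are
illustrative):

#include <linux/errno.h>
#include <linux/kprobes.h>
#include <linux/ptrace.h>

/* nokprobe_inline keeps the helper inlined, so the annotation must be
 * carried by its (non-inline) caller instead. */
static nokprobe_inline int fault_in_progress(struct pt_regs *regs)
{
	return 0;	/* illustrative body */
}

int foo_fault_handler(struct pt_regs *regs)
{
	if (fault_in_progress(regs))
		return -EBUSY;
	return 0;
}
/* Store the function's address in the "_kprobe_blacklist" section. */
NOKPROBE_SYMBOL(foo_fault_handler);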

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20140417081705.26341.96719.stgit@ltc230.yrl.intra.hitachi.co.jp
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jan-Simon Möller <dl9pf@gmx.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-sparse@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f618c470 28-Feb-2014 Marek Szyprowski <m.szyprowski@samsung.com>

drivers: of: add support for custom reserved memory drivers

Add support for custom reserved memory drivers. Call their init() function
for each reserved region and prepare for using the operations they provide
via the reserved_mem->ops array.

Based on previous code provided by Josh Cartwright <joshc@codeaurora.org>

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>


# 6c3ff8b1 30-Oct-2013 Stephen Boyd <sboyd@codeaurora.org>

ARM: Introduce CPU_METHOD_OF_DECLARE() for cpu hotplug/smp

The goal of multi-platform kernels is to remove the need for mach
directories and machine descriptors. To further that goal,
introduce CPU_METHOD_OF_DECLARE() to allow cpu hotplug/smp
support to be separated from the machine descriptors.
Implementers should specify an enable-method property in their
cpus node and then implement a matching set of smp_ops in their
hotplug/smp code, wiring it up with the CPU_METHOD_OF_DECLARE()
macro. When the kernel is compiled we'll collect all the
enable-method smp_ops into one section for use at boot.

At boot time we'll look for an enable-method in each cpu node and
try to match that against all known CPU enable methods in the
kernel. If there are no enable-methods in the cpu nodes, we
fall back to the cpus node and try to use any enable-method found
there. If that doesn't work, we fall back to the old way of using
the machine descriptor.
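
A hedged sketch of the wiring (platform name, compatible string and the ops
body are illustrative):

#include <linux/init.h>
#include <asm/smp.h>

/* Illustrative hook; a real implementation kicks the secondary core out of
 * its holding pen. */
static int foo_boot_secondary(unsigned int cpu, struct task_struct *idle)
{
	return 0;
}

static const struct smp_operations foo_smp_ops __initconst = {
	.smp_boot_secondary	= foo_boot_secondary,
};

/* Matched against the "enable-method" property in the DT cpu/cpus nodes. */
CPU_METHOD_OF_DECLARE(foo_smp, "vendor,foo-smp", &foo_smp_ops);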

Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <devicetree@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Kumar Gala <galak@codeaurora.org>


# eb3057df 14-Oct-2013 Frantisek Hrbata <fhrbata@redhat.com>

kernel: add support for init_array constructors

This adds the .init_array section as yet another section with constructors. This
is needed because gcc could add __gcov_init calls to the .init_array or .ctors
section, depending on the gcc (and binutils) version.

v2: - reuse mod->ctors for .init_array section for modules, because gcc uses
.ctors or .init_array, but not both at the same time
v3: - fail to load if that does happen somehow.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 102c9323 12-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add __tracepoint_string() to export string pointers

There are several tracepoints (mostly in RCU) that reference a string
pointer and use the print format "%s" to display a string that
exists in the kernel, instead of copying the actual string to the
ring buffer (saving time and ring buffer space).

But this has an issue with userspace tools that read the binary buffers:
they have the address of the string but no access to the string
itself. The end result is just output that looks like:

rcu_dyntick: ffffffff818adeaa 1 0
rcu_dyntick: ffffffff818adeb5 0 140000000000000
rcu_dyntick: ffffffff818adeb5 0 140000000000000
rcu_utilization: ffffffff8184333b
rcu_utilization: ffffffff8184333b

The above is pretty useless when read by the userspace tools. Ideally
we would want something that looks like this:

rcu_dyntick: Start 1 0
rcu_dyntick: End 0 140000000000000
rcu_dyntick: Start 140000000000000 0
rcu_callback: rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4
rcu_callback: rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5
rcu_dyntick: End 0 1

trace_printk(), which also only stores the address of the format
string instead of recording the string into the buffer itself, exports
the mapping of kernel addresses to format strings via the printk_format
file in the debugfs tracing directory.

The tracepoint strings can use this same method and output the format
to the same file and the userspace tools will be able to decipher
the address without any modification.

The tracepoint strings need their own section to save the strings, because
the trace_printk section will cause the trace_printk() buffers to be
allocated if anything exists within the section. As trace_printk() is only
used for debugging and should never be left in the kernel, we cannot reuse
the trace_printk sections.

Add a new tracepoint_str section that will also be examined by the output
of the printk_format file.
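
A hedged usage sketch (trace_foo_event() stands in for a real tracepoint such
as the RCU ones above; the variable name is illustrative):

#include <linux/tracepoint.h>

/* The string itself stays in the kernel image; only its address is recorded,
 * and the address->string mapping is exported via printk_format. */
static const char *foo_start_msg __tracepoint_string = "Start";

static void foo_report(void)
{
	trace_foo_event(foo_start_msg);		/* hypothetical tracepoint */
}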

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2ec3ba69 03-Jul-2013 Alexandre Bounine <alexandre.bounine@idt.com>

rapidio: convert switch drivers to modules

Rework RapidIO switch drivers to add an option to build them as loadable
kernel modules.

This patch removes the RapidIO-specific vmlinux section and converts switch
drivers to be compatible with the LDM driver registration method. To simplify
registration of device-specific callback routines, this patch introduces the
rio_switch_ops data structure. The sw_sysfs() callback is removed from
the list of device-specific operations because under the new structure its
functions can be handled by the switch driver's probe() and remove() routines.

If a specific switch device driver is not loaded the RapidIO subsystem
core will use default standard-based operations to configure a switch.
Because the current implementation of RapidIO enumeration/discovery method
relies on availability of device-specific operations for error management,
switch device drivers must be loaded before the RapidIO
enumeration/discovery starts.

This patch also moves several common routines from enumeration/discovery
module into the RapidIO core code to make switch-specific operations
accessible to all components of RapidIO subsystem.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Cc: Stef van Os <stef.van.os@Prodrive.nl>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e24f6628 19-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

modpost: remove all traces of cpuinit/cpuexit sections

Delete all audit rules that were checking how the .cpuXYZ
related sections were inter-operating with other __init
like sections, now that __cpuinit is gone. Update the linker
script to not have any knowledge of .cpuinit sections.

[lds.h update courtesy of Ralf Baechle <ralf@linux-mips.org>]

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 40b31360 20-May-2013 Stephen Rothwell <sfr@canb.auug.org.au>

Finally eradicate CONFIG_HOTPLUG

Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b92021b0 14-Mar-2013 Rusty Russell <rusty@rustcorp.com.au>

CONFIG_SYMBOL_PREFIX: cleanup.

We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
"_". But Al Viro broke this in "consolidate cond_syscall and
SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
do so.

Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
prefix something with it. So various places define helpers which are
defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
for pasting.

(arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

Let's solve this properly:
1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
2) Make linux/export.h usable from asm.
3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
4) Make everyone use them.
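
A sketch of the resulting helpers, closely modelled on include/linux/export.h
after this change (exact guards and layout may differ):

#ifdef CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
#define __VMLINUX_SYMBOL(x)		_##x
#define __VMLINUX_SYMBOL_STR(x)		"_" #x
#else
#define __VMLINUX_SYMBOL(x)		x
#define __VMLINUX_SYMBOL_STR(x)		#x
#endif

/* Indirection so that macro arguments are expanded before pasting or
 * stringification. */
#define VMLINUX_SYMBOL(x)		__VMLINUX_SYMBOL(x)
#define VMLINUX_SYMBOL_STR(x)		__VMLINUX_SYMBOL_STR(x)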

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com> (metag)


# f2f6c255 03-Jan-2013 Prashant Gaikwad <pgaikwad@nvidia.com>

clk: add common of_clk_init() function

Modify the of_clk_init() function so that it determines which
driver to initialize based on the device tree, instead of each driver
registering with it.

Based on a similar patch for drivers/irqchip by Thomas Petazzoni and
drivers/clocksource by Stephen Warren.
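
A hedged sketch of the driver side of this scheme (compatible string, names
and init body are illustrative):

#include <linux/clk-provider.h>
#include <linux/init.h>
#include <linux/of.h>

static void __init foo_clk_init(struct device_node *np)
{
	/* register the fixed-rate/gate/mux clocks described by this node */
}
CLK_OF_DECLARE(foo_clk, "vendor,foo-clock", foo_clk_init);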

Signed-off-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Tested-by: Tony Prisk <linux@prisktech.co.nz>
Tested-by: Pawel Moll <pawel.moll@arm.com>
Tested-by: Rob Herring <rob.herring@calxeda.com>
Tested-by: Josh Cartwright <josh.cartwright@ni.com>
Reviewed-by: Josh Cartwright <josh.cartwright@ni.com>
Acked-by: Maxime Ripard <maxime.ripard@anandra.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: merge conflict from missing CLKSRC_OF_TABLES()]

Signed-off-by: Mike Turquette <mturquette@linaro.org>


# f6e916b8 20-Nov-2012 Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

irqchip: add basic infrastructure

With the recent creation of the drivers/irqchip/ directory, it is
desirable to move irq controller drivers here. At the moment, the only
driver here is irq-bcm2835, the driver for the irq controller found in
the ARM BCM2835 SoC, present in Raspberry Pi systems. This irq
controller driver was exporting its initialization function and its
irq handling function through a header file in
<linux/irqchip/bcm2835.h>.

When proposing to also move another irq controller driver in
drivers/irqchip, Rob Herring raised the very valid point that moving
things to drivers/irqchip was good in order to remove more stuff from
arch/arm, but if it means adding gazillions of headers files in
include/linux/irqchip/, it would not be very nice.

So, upon the suggestion of Rob Herring and Arnd Bergmann, this commit
introduces a small infrastructure that defines a central
irqchip_init() function in drivers/irqchip/irqchip.c, which is meant
to be called as the ->init_irq() callback of ARM platforms. This
function calls of_irq_init() with an array of match strings and init
functions generated from a special linker section.

Note that the irq controller driver initialization function is
responsible for setting the global handle_arch_irq() variable, so that
ARM platforms no longer have to define the ->handle_irq field in their
DT_MACHINE structure.

A global header, <linux/irqchip.h>, is also added to expose the single
irqchip_init() function to the rest of the kernel.

A further commit moves the BCM2835 irq controller driver to this new
small infrastructure, therefore removing the include/linux/irqchip/
directory.
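
A hedged sketch of how a driver plugs into the table that irqchip_init() and
of_irq_init() walk (controller name, compatible string and init body are
illustrative):

#include <linux/init.h>
#include <linux/irqchip.h>
#include <linux/of.h>

static int __init foo_intc_of_init(struct device_node *node,
				   struct device_node *parent)
{
	/* map registers, create the irq domain, set handle_arch_irq() ... */
	return 0;
}
IRQCHIP_DECLARE(foo_intc, "vendor,foo-intc", foo_intc_of_init);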

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Stephen Warren <swarren@wwwdotorg.org>
Reviewed-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
[rob.herring: reword commit message to reflect use of linker sections.]
Signed-off-by: Rob Herring <rob.herring@calxeda.com>


# ae278a93 19-Nov-2012 Stephen Warren <swarren@nvidia.com>

clocksource: add common of_clksrc_init() function

It is desirable to move all clocksource drivers to drivers/clocksource,
yet each requires its own initialization function. We'd rather not
pollute <linux/> with a header for each function. Instead, create a
single of_clksrc_init() function which will determine which clocksource
driver to initialize based on device tree.

Based on a similar patch for drivers/irqchip by Thomas Petazzoni.

Signed-off-by: Stephen Warren <swarren@nvidia.com>


# c87728ca 14-Aug-2012 David Daney <david.daney@cavium.com>

vmlinux.lds.h: Allow architectures to add sections to the front of .bss

A follow-on MIPS patch will put an object here that needs 64K alignment
to minimize padding.

For those architectures that don't define BSS_FIRST_SECTIONS, there is
no change.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org,
Cc: linux-kernel@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Patchwork: https://patchwork.linux-mips.org/patch/4221/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>


# 9fd49328 24-Apr-2012 Steven Rostedt <srostedt@redhat.com>

ftrace: Sort all function addresses, not just per page

Instead of just sorting the ip's of the functions per ftrace page,
sort the entire list before adding them to the ftrace pages.

This will allow the bsearch algorithm to be sped up as it can
also sort by pages, not just records within a page.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 026cee00 25-Mar-2012 Pawel Moll <pawel.moll@arm.com>

params: <level>_initcall-like kernel parameters

This patch adds a set of macros that can be used to declare
kernel parameters to be parsed _before_ initcalls at a chosen
level are executed. We rename the now-unused "flags" field of
struct kernel_param as the level. It's signed, for when we
use this for early params as well, in future.

The linker macro collating initcalls had to be modified in order
to add additional symbols between levels, which are later used
by the init code to split the calls into blocks.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 7ccaba53 23-Mar-2012 Jan Beulich <JBeulich@suse.com>

consolidate WARN_...ONCE() static variables

Due to the alignment of the following variables, these typically consume
more than just the single byte that 'bool' requires, and as there are a
few hundred instances, the cache pollution (not so much the waste of
memory) adds up. Put these variables into their own section, outside of
any even halfway frequently used memory range.

Do the same also to the __warned variable of rcu_lockdep_assert().
(Don't, however, include the ones used by printk_once() and alike, as
they can potentially be hot.)
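
A hedged sketch of the resulting pattern (names are illustrative; the section
name is the one introduced by this change, written with a raw GCC attribute):

#include <linux/compiler.h>
#include <linux/printk.h>
#include <linux/types.h>

/* The once-only flag lives in .data.unlikely, away from hot cache lines. */
static bool __attribute__((section(".data.unlikely"))) foo_warned;

void foo_check(int bad)
{
	if (unlikely(bad) && !foo_warned) {
		foo_warned = true;
		pr_warn("foo: unexpected condition\n");
	}
}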

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bc74ee97 03-Sep-2011 Kirill Tkhai <tkhai@yandex.ru>

m68k: Finally remove leftover markers sections

Markers have already been removed twice:

1: fc5377668c3d808e1d53c4aee152c836f55c3490
2: eb878b3bc0349344dbf70c51bf01fc734d5cf2d3

But a little bit is still here.

Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>


# f02e8a65 14-Apr-2011 Alessio Igor Bogani <abogani@kernel.org>

module: Sort exported symbols

This patch places every exported symbol in its own section
(i.e. "___ksymtab+printk"). Thus the linker will use its SORT() directive
to sort and finally merge all symbols into the right and final section
(i.e. "__ksymtab").

The symbol-prefixed archs use an underscore as the prefix for symbols.
To avoid collisions we use a different character to create the temporary
section names.

This work was supported by a hardware donation from the CE Linux Forum.

Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (folded in '+' fixup)
Tested-by: Dirk Behme <dirk.behme@googlemail.com>


# d430d3d7 16-Mar-2011 Jason Baron <jbaron@redhat.com>

jump label: Introduce static_branch() interface

Introduce:

static __always_inline bool static_branch(struct jump_label_key *key);

instead of the old JUMP_LABEL(key, label) macro.

In this way, jump labels become really easy to use:

Define:

struct jump_label_key jump_key;

Can be used as:

if (static_branch(&jump_key))
do unlikely code

enable/disable via:

jump_label_inc(&jump_key);
jump_label_dec(&jump_key);

that's it!

For the jump labels disabled case, the static_branch() becomes an
atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
atomic_dec() operations. We show testing results for this change below.

Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.

Since we now require a 'struct jump_label_key *key', we can store a pointer into
the jump table addresses. In this way, we can enable/disable jump labels, in
basically constant time. This change allows us to completely remove the previous
hashtable scheme. Thanks to Peter Zijlstra for this re-write.

Testing:

I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
configurations, where tracepoints were disabled.

jump label configured in
avg: 815.6

jump label *not* configured in (using atomic reads)
avg: 800.1

jump label *not* configured in (regular reads)
avg: 803.4

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110316212947.GA8792@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Suggested-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 6ea0c34d 03-Apr-2011 Mike Frysinger <vapier@gentoo.org>

percpu: Unify input section names

The two percpu helper macros have the section names duplicated. So create
a new define to merge the two. This also allows arches that need to link
things more directly themselves to avoid duplicating the input sections in
their linker script.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 0415b00d1 24-Mar-2011 Tejun Heo <tj@kernel.org>

percpu: Always align percpu output section to PAGE_SIZE

The percpu allocator honors alignment requests up to PAGE_SIZE, and both the
percpu addresses in the percpu address space and the translated kernel
addresses should be aligned accordingly. The calculation of the
former depends on the alignment of the percpu output section in the kernel
image.

The linker script macros PERCPU_VADDR() and PERCPU() are used to
define this output section and the latter takes @align parameter.
Several architectures are using @align smaller than PAGE_SIZE breaking
percpu memory alignment.

This patch removes @align parameter from PERCPU(), renames it to
PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it,
add PCPU_SETUP_BUG_ON() checks such that alignment problems are
reliably detected and remove percpu alignment comment recently added
in workqueue.c as the condition would trigger BUG way before reaching
there.

For um, this patch raises the alignment of percpu area. As the area
is in .init, there shouldn't be any noticeable difference.

This problem was discovered by David Howells while debugging boot
failure on mn10300.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net


# ea714547 07-Mar-2011 Jiri Olsa <jolsa@redhat.com>

x86: Separate out entry text section

Put x86 entry code into a separate link section: .entry.text.

Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.

Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:

before patch:
19417627 L1-icache-load-misses ( +- 0.147% )

after patch:
16490788 L1-icache-load-misses ( +- 0.180% )

The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.

Whole perf output follows:

- results for current tip tree:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

19417627 L1-icache-load-misses ( +- 0.147% )
2676914223 instructions # 0.497 IPC ( +- 0.079% )
5389516026 cycles ( +- 0.144% )

0.206267711 seconds time elapsed ( +- 0.138% )

- results for current tip tree with the patch applied:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

16490788 L1-icache-load-misses ( +- 0.180% )
2717734941 instructions # 0.502 IPC ( +- 0.079% )
5414756975 cycles ( +- 0.148% )

0.206747566 seconds time elapsed ( +- 0.137% )

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3d56e331 02-Feb-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Replace syscall_meta_data struct array with pointer array

Currently the syscall_meta structures for the syscall tracepoints are
placed in the __syscall_metadata section, and at link time, the linker
makes one large array of all these syscall metadata structures. On boot
up, this array is read (much like the initcall sections) and the syscall
data is processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are supposed to be in an array.

A hack was previously used to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the __syscall_metadata section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The __syscall_metadata section is also moved into the .init.data section
as it is now only needed at boot up.
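
A hedged sketch of the technique (structure layout and names are illustrative;
the point is that only a naturally aligned pointer is emitted into the
section, so the linker packs it like an ordinary array):

struct syscall_metadata_sketch {
	const char *name;
	int nb_args;
};

static struct syscall_metadata_sketch meta_sys_example = {
	.name		= "sys_example",
	.nb_args	= 0,
};

/* Only the pointer goes into __syscalls_metadata. */
static struct syscall_metadata_sketch *meta_sys_example_ptr
	__attribute__((__used__, __section__("__syscalls_metadata"))) =
		&meta_sys_example;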

Suggested-by: David Miller <davem@davemloft.net>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 65498646 26-Jan-2011 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

tracepoints: Fix section alignment using pointer array

Make the tracepoints more robust, making them solid enough to handle compiler
changes by not relying on anything based on compiler-specific behavior with
respect to structure alignment. Implement an approach proposed by David Miller:
use an array of const pointers to refer to the individual structures, and export
this pointer array through the linker script rather than the structures per se.
It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
for the pointers), but are less likely to break due to compiler changes.

History:

commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
added the aligned(32) type and variable attribute to the tracepoint structures
to deal with gcc happily aligning statically defined structures on 32-byte
multiples.

One attempt was to use a 8-byte alignment for tracepoint structures by applying
both the variable and type attribute to tracepoint structures definitions and
declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

The reason is that the "aligned" attribute only specify the _minimum_ alignment
for a structure, leaving both the compiler and the linker free to align on
larger multiples. Because tracepoint.c expects the structures to be placed as an
array within each section, up-alignment cause NULL-pointer exceptions due to the
extra unexpected padding.

(this patch applies on top of -tip)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20110126222622.GA10794@Krystal>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# e4a9ea5e 27-Jan-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Replace trace_event struct array with pointer array

Currently the trace_event structures are placed in the _ftrace_events
section, and at link time, the linker makes one large array of all
the trace_event structures. On boot up, this array is read (much like
the initcall sections) and the events are processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are supposed to be in an array.

A hack was previously used to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the _ftrace_event section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The _ftrace_event section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 19df0c2f 25-Jan-2011 Tejun Heo <tj@kernel.org>

percpu: align percpu readmostly subsection to cacheline

Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.

This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.

This is based on Shaohua's x86 only patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>


# e94965ed 15-Dec-2010 Dmitry Torokhov <dtor@vmware.com>

module: show version information for built-in modules in sysfs

Currently only drivers that are built as modules have their versions
shown in /sys/module/<module_name>/version, but this information might
be useful for built-in drivers as well. This is especially important
for drivers that do not define any parameters: such drivers, if
built in, are completely invisible from userspace.

This patch changes the MODULE_VERSION() macro so that, when a module is
built in, its version information is stored in a separate
section. The kernel then uses this data to create the 'version' sysfs
attribute in the same fashion it creates attributes for module parameters.
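
A hedged usage sketch (the version string is illustrative); with this change
the same line is useful whether the driver is modular or built in:

#include <linux/module.h>

MODULE_DESCRIPTION("Example driver");
MODULE_VERSION("1.2.3");	/* exposed as /sys/module/<name>/version */
MODULE_LICENSE("GPL");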

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 8369744f 12-Jan-2011 Shaohua Li <shaohua.li@intel.com>

include/asm-generic/vmlinux.lds.h: make readmostly section correctly align

The readmostly section should end at a cacheline-aligned address;
otherwise the last few objects might share a cacheline with other data and
the readmostly data would still suffer cacheline bouncing.

For example, on ia64, secpath_cachep is the last readmostly object, and it
shares a cacheline with init_uts_ns.

a000000100e80480 d secpath_cachep
a000000100e80488 D init_uts_ns

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aab94339 22-Dec-2010 Dirk Brandewie <dirk.brandewie@gmail.com>

of: Add support for linking device tree blobs into vmlinux

This patch adds support for linking device tree blob(s) into
vmlinux. It modifies asm-generic/vmlinux.lds.h to support linking
.dtb sections into vmlinux. To maintain compatibility with the of/fdt
driver code, platforms MUST copy the blob to a non-init memory location
before the kernel frees the .init.* sections in the image.

Modifies scripts/Makefile.lib to add a kbuild command to
compile DTS files to device tree blobs and a rule to create objects to
wrap the blobs for linking.

STRUCT_ALIGNMENT is defined in vmlinux.lds.h for use in the rule to
create wrapper objects for the dtb in Makefile.lib. The
STRUCT_ALIGN() macro in vmlinux.lds.h is modified to use the
STRUCT_ALIGNMENT definition.

The DTBs are placed on 32-byte boundaries to allow parsing the blob
with drivers/of/fdt.c during early boot without having to copy the blob
to get the structure alignment GCC expects.

A DTB is linked in by adding the DTB object to the list of objects to
be linked into vmlinux in the architecture-specific Makefile using
obj-y += foo.dtb.o

Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Acked-by: Michal Marek <mmarek@suse.cz>
[grant.likely@secretlab.ca: cleaned up whitespace inconsistencies]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>


# d8826262 26-Oct-2010 Mike Frysinger <vapier@gentoo.org>

vmlinux.lds.h: lower init ramfs alignment to 4

The new init ramfs format (cpio based) requires an alignment of 4 (per the
documentation and per the source files themselves). As for compressed
sources, the decompressors can all deal with unaligned buffers.

The cpio source is also found in the __init sections of the kernel, so
once they are read and expanded into a tmpfs, the source is freed. That
means there is no need to force page alignment here either.

This has been used on Blackfin systems for many releases without issue.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d356c0b6 26-Oct-2010 Mike Frysinger <vapier@gentoo.org>

vmlinux.lds.h: gather .data..shared_aligned sections in DATA_DATA

With the recent change "net: remove time limit in process_backlog()", the
softnet_data variable changed from "DEFINE_PER_CPU()" to
"DEFINE_PER_CPU_ALIGNED()" which moved it from the .data section to the
.data..shared_aligned section. I'm not saying this patch is wrong, just that
it is what caused me to notice this larger problem. No one else in the
kernel is using this aligned macro variant, so I imagine that's why no one
has noticed yet.

Since .data..shared_aligned isn't declared in any vmlinux files that I can
see, the linker just places it last. This "just works" for most people,
but when building a ROM kernel on Blackfin systems, it causes section
overlap errors:

bfin-uclinux-ld.real:
section .init.data [00000000202e06b8 -> 00000000202e48b7] overlaps
section .data.shared_aligned [00000000202e06b8 -> 00000000202e0723]

I imagine other arches which support the ROM config option and thus do
funky placement would see similar issues ...

On x86, it is stuck in a dedicated section at the end:
[8] .data PROGBITS ffffffff810ec000 2ec0000303a8 00 WA 0 0 4096
[9] .data.shared_alig PROGBITS ffffffff8111c3c0 31c3c00000c8 00 WA 0 0 64

So make sure we include this section in the DATA_DATA macro so that it is
placed in the right location.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2aeb66d3 21-Oct-2010 H. Peter Anvin <hpa@zytor.com>

x86-32, percpu: Correct the ordering of the percpu readmostly section

Checkin c957ef2c59e952803766ddc22e89981ab534606f had inconsistent
ordering of .data..percpu..page_aligned and .data..percpu..readmostly;
the still-broken version affected x86-32 at least.

The page aligned version really must be page aligned...

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>


# c957ef2c 19-Oct-2010 Shaohua Li <shaohua.li@intel.com>

percpu: Introduce a read-mostly percpu API

Add a new readmostly percpu section and API. This can be used to
avoid dirtying data lines which are generally not written to, which is
especially important for data which may be accessed by processors
other than the one the percpu area belongs to.

[ hpa: moved it *after* the page-aligned section, for obvious
reasons. ]
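
A hedged usage sketch of the new API (the variable name is illustrative):

#include <linux/percpu.h>

/* Read by many CPUs, written rarely: keep it off write-hot percpu lines. */
DEFINE_PER_CPU_READ_MOSTLY(unsigned long, foo_feature_mask);

static int foo_feature_enabled(int bit)
{
	return (this_cpu_read(foo_feature_mask) >> bit) & 1;
}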

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# ffe8018c 17-Sep-2010 Hendrik Brueckner <brueckner@linux.vnet.ibm.com>

initramfs: fix initramfs size calculation

The size of a built-in initramfs is calculated in init/initramfs.c by
"__initramfs_end - __initramfs_start". Those symbols are defined in the
linker script include/asm-generic/vmlinux.lds.h:

#define INIT_RAM_FS \
. = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__initramfs_start) = .; \
*(.init.ramfs) \
VMLINUX_SYMBOL(__initramfs_end) = .;

If the initramfs file has an odd number of bytes, the "__initramfs_end"
symbol points to an odd address, for example, the symbols in the
System.map might look like:

0000000000572000 T __initramfs_start
00000000005bcd05 T __initramfs_end <-- odd address

At least on s390 this causes a problem:

Certain s390 instructions, especially instructions for loading addresses
(larl) or branch addresses, must use even addresses. The compiler loads
the symbol addresses with the "larl" instruction. This instruction sets
the last bit to 0 and, therefore, for odd-sized files, the calculated size
is one byte less than it should be:

0000000000540a9c <populate_rootfs>:
540a9c: eb cf f0 78 00 24 stmg %r12,%r15,120(%r15),
540aa2: c0 10 00 01 8a af larl %r1,572000 <__initramfs_start>
540aa8: c0 c0 00 03 e1 2e larl %r12,5bcd04 <initramfs_end>
(Instead of 5bcd05)
...
540abe: 1b c1 sr %r12,%r1

To fix the problem, this patch introduces the global variable
__initramfs_size, which is calculated in the "usr/initramfs_data.S" file.
The populate_rootfs() function can then use the start marker of the
.init.ramfs section and the value of __initramfs_size for loading the
initramfs. Because the start marker and size is sufficient, the
__initramfs_end symbol is no longer needed and is removed.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# bf5438fc 17-Sep-2010 Jason Baron <jbaron@redhat.com>

jump label: Base patch for jump label

Base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statement. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current use case these are being implemented for.
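
A hedged, x86-64-flavoured sketch of the underlying 'asm goto' mechanism
(section and label names are purely illustrative, not the kernel's actual
jump label implementation):

/* The nop is the fast path; the recorded (address, target) pair lets the
 * kernel later patch that nop into a jump to the out-of-line slow path. */
static inline __attribute__((always_inline)) int foo_branch_enabled(void)
{
	asm goto("1: nop\n\t"
		 ".pushsection __foo_jump_table, \"aw\"\n\t"
		 ".quad 1b, %l[l_yes]\n\t"
		 ".popsection\n\t"
		 : : : : l_yes);
	return 0;
l_yes:
	return 1;
}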

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Baron <jbaron@redhat.com>
LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com>

[ cleaned up some formatting ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c7f52cdc 22-Jul-2010 Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

support multiple .discard.* sections to avoid section type conflicts

gcc 4.4.4 will complain if you use a .discard section for both text and
data ("causes a section type conflict"). Add support for ".discard.*"
sections, and use .discard.text for a dummy function in the x86
RESERVE_BRK() macro.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>


# da5e37ef 13-Jul-2010 Sam Ravnborg <sam@ravnborg.org>

vmlinux.lds: fix .data..init_task output section (fix powerpc boot)

The .data..init_task output section was missing
a load offset, causing a powerpc target to fail to boot.

Sean MacLennan tracked it down to the definition of
INIT_TASK_DATA_SECTION().

There are only two users of INIT_TASK_DATA_SECTION()
in the kernel today: cris and powerpc.
cris does not support relocatable kernels and is thus not
impacted by this change.

Fix INIT_TASK_DATA_SECTION() to specify load offset like
all other output sections.

Reported-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 07fca0e5 10-Jul-2010 Sam Ravnborg <sam@ravnborg.org>

tracing: Properly align linker defined symbols

We define a number of symbols in the linker script like this:

__start_syscalls_metadata = .;
*(__syscalls_metadata)

But we do not know the alignment of "." when we assign
the __start_syscalls_metadata symbol.
gcc started to use bigger alignment for structs (32 bytes),
so we saw situations where the linker, due to alignment
constraints, increased the value of "." after the symbol assignment.

This resulted in boot fails.

Fix this by forcing a 32 byte alignment of "." before the
assignment.

This patch introduces the forced alignment for
ftrace_events and syscalls_metadata.
It may be required in more places.

Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <20100710063459.GA14596@merkur.ravnborg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# eb878b3b 15-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com>

tracing: Remove leftover markers section

Markers have been removed, but we forgot to remove their
section.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>


# 058f88d6 26-May-2010 Alexandre Bounine <alexandre.bounine@idt.com>

rapidio: modify initialization of switch operations

Modify the way RapidIO switch operations are declared. Multiple
assignments through the linker script are replaced by a single
initialization call.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e5cabeb3 26-May-2010 Alexandre Bounine <alexandre.bounine@idt.com>

rapidio: add Port-Write handling for EM

Add RapidIO Port-Write message handling in the context of Error
Management Extensions Specification Rev.1.3.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 07b3bb1e 19-Feb-2010 Denys Vlasenko <vda.linux@googlemail.com>

Rename .data.nosave to .data..nosave.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 54cb27a7 19-Feb-2010 Denys Vlasenko <vda.linux@googlemail.com>

Rename .data.read_mostly to .data..read_mostly.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 3d9a854c 19-Feb-2010 Denys Vlasenko <vda.linux@googlemail.com>

Rename .data[.percpu][.XXX] to .data[..percpu][..XXX].

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 7c74df07 19-Feb-2010 Tim Abbott <tabbott@ksplice.com>

Rename .bss.page_aligned to .bss..page_aligned.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 75b13483 19-Feb-2010 Tim Abbott <tabbott@ksplice.com>

Rename .data.page_aligned to .data..page_aligned.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 2af7687f 19-Feb-2010 Tim Abbott <tabbott@ksplice.com>

Rename .data.init_task to .data..init_task.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 4af57b78 19-Feb-2010 Tim Abbott <tabbott@ksplice.com>

Rename .data.cacheline_aligned to .data..cacheline_aligned.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>


# 9e1b9b80 07-Nov-2009 Alan Jenkins <alan-jenkins@tuffmail.co.uk>

module: make MODULE_SYMBOL_PREFIX into a CONFIG option

The next commit will require the use of MODULE_SYMBOL_PREFIX in
.tmp_exports-asm.S. Currently it is mixed in with C structure
definitions in "asm/module.h". Move the definition of this arch option
into Kconfig, so it can be easily accessed by any code.

This also lets modpost.c use the same definition. Previously modpost
relied on a hardcoded list of architectures in mk_elfconfig.c.

A build test for blackfin, one of the two MODULE_SYMBOL_PREFIX archs,
showed the generated code was unchanged (vmlinux was identical save
for build ids, and an apparently randomized suffix on a single "__key"
symbol in the kallsyms data).

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org> (blackfin)
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 1b208622 24-Sep-2009 Tim Abbott <tabbott@ksplice.com>

Optimize the ordering of sections in RW_DATA_SECTION.

The old RW_DATA_SECTION had INIT_TASK_DATA (which was
more-than-PAGE_SIZE-aligned), followed by a bunch of small alignment
stuff, followed by more PAGE_SIZE-aligned stuff, so you wasted memory
in the middle of .data re-aligning back up to PAGE_SIZE.

This patch sorts the sections by alignment requirements, which should
pack them essentially optimally.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4b3b4c5e 27-Jul-2009 John Reiser <jreiser@bitwagon.com>

ftrace: __start_mcount_loc should be .init.rodata

__start_mcount_loc[] is unused after init, yet occupies RAM forever
as part of .rodata. 152kiB is typical on a 64-bit architecture. Instead,
__start_mcount_loc should be in the interval [__init_begin, __init_end)
so that the space is reclaimed after init.

__start_mcount_loc[] is generated during the load portion
of kernel build, and is used only by ftrace_init(). ftrace_init is declared
'__init' and is in .init.text, which is freed after init.
__start_mcount_loc is placed into .rodata by a call to MCOUNT_REC inside
the RO_DATA macro of include/asm-generic/vmlinux.lds.h. The array *is*
read-only, but more importantly it is not used after init. So the call to
MCOUNT_REC should be moved from RO_DATA to INIT_DATA.

This patch has been tested on x86_64 with CONFIG_DEBUG_PAGEALLOC=y
which verifies that the address range never is accessed after init.

Signed-off-by: John Reiser <jreiser@BitWagon.com>
LKML-Reference: <4A6DF0B6.7080402@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 04e448d9 12-Jul-2009 Tim Abbott <tabbott@ksplice.com>

vmlinux.lds.h: restructure BSS linker script macros

The BSS section macros in vmlinux.lds.h currently place the .sbss
input section outside the bounds of [__bss_start, __bss_end]. On all
architectures except for microblaze that handle both .sbss and
__bss_start/__bss_end, this is wrong: the .sbss input section is
within the range [__bss_start, __bss_end]. Relatedly, the example
code at the top of the file actually has __bss_start/__bss_end defined
twice; I believe the right fix here is to define them in the
BSS_SECTION macro but not in the BSS macro.

Another problem with the current macros is that several
architectures have an ALIGN(4) or some other small number just before
__bss_stop in their linker scripts. The BSS_SECTION macro currently
hardcodes this to 4, while it should really be an argument. It also
ignores its sbss_align argument; fix that.

mn10300 is the only user at present of any of the macros touched by
this patch. It looks like mn10300 actually was incorrectly converted
to use the new BSS() macro (the alignment of 4 prior to conversion was
a __bss_stop alignment, but the argument to the BSS macro is a start
alignment). So fix this as well.

I'd like acks from Sam and David on this one. Also CCing Paul, since
he has a patch from me which will need to be updated to use
BSS_SECTION(0, PAGE_SIZE, 4) once this gets merged.

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 023bf6f1 08-Jul-2009 Tejun Heo <tj@kernel.org>

linker script: unify usage of discard definition

Discarded sections in different archs share some commonality but have
considerable differences. This led to the linker script for each arch
implementing its own /DISCARD/ definition, which makes maintenance
tedious and adding new entries error-prone.

This patch makes all linker scripts move discard definitions to the
end of the linker script and use the common DISCARDS macro. As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.

ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.

defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390. Michal Simek tested microblaze.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>


# 2a2325e6 30-Jun-2009 Heiko Carstens <hca@linux.ibm.com>

gcov: fix __ctors_start alignment

The ctors section for each object file is eight-byte aligned (on 64 bit).
However the __ctors_start symbol starts at an arbitrary address dependent
on the size of the previous sections.

Therefore the linker may add some zeroes after __ctors_start to make sure
the ctors contents are properly aligned. However the extra zeroes at the
beginning aren't expected by the code. When walking the function
pointers contained in there, the extra zeroes may result in
random jumps. So make sure that the __ctors_start symbol is always
aligned as well.

Fixes this crash on an allyesconfig on s390:

[ 0.582482] Kernel BUG at 0000000000000012 [verbose debug info unavailable]
[ 0.582489] illegal operation: 0001 [#1] SMP DEBUG_PAGEALLOC
[ 0.582496] Modules linked in:
[ 0.582501] CPU: 0 Tainted: G W 2.6.31-rc1-dirty #273
[ 0.582506] Process swapper (pid: 1, task: 000000003f218000, ksp: 000000003f2238e8)
[ 0.582510] Krnl PSW : 0704200180000000 0000000000000012 (0x12)
[ 0.582518] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
[ 0.582524] Krnl GPRS: 0000000000036727 0000000000000010 0000000000000001 0000000000000001
[ 0.582529] 00000000001dfefa 0000000000000000 0000000000000000 0000000000000040
[ 0.582534] 0000000001fff0f0 0000000001790628 0000000002296048 0000000002296048
[ 0.582540] 00000000020c438e 0000000001786000 0000000002014a66 000000003f223e60
[ 0.582553] Krnl Code:>0000000000000012: 0000 unknown
[ 0.582559] 0000000000000014: 0000 unknown
[ 0.582564] 0000000000000016: 0000 unknown
[ 0.582570] 0000000000000018: 0000 unknown
[ 0.582575] 000000000000001a: 0000 unknown
[ 0.582580] 000000000000001c: 0000 unknown
[ 0.582585] 000000000000001e: 0000 unknown
[ 0.582591] 0000000000000020: 0000 unknown
[ 0.582596] Call Trace:
[ 0.582599] ([<0000000002014a46>] kernel_init+0x622/0x7a0)
[ 0.582607] [<0000000000113e22>] kernel_thread_starter+0x6/0xc
[ 0.582615] [<0000000000113e1c>] kernel_thread_starter+0x0/0xc
[ 0.582621] INFO: lockdep is turned off.
[ 0.582624] Last Breaking-Event-Address:
[ 0.582627] [<0000000002014a64>] kernel_init+0x640/0x7a0

Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 39a449d9 23-Jun-2009 Tim Abbott <tabbott@ksplice.com>

asm-generic/vmlinux.lds.h: shuffle INIT_TASK* macro names in vmlinux.lds.h

We recently added a INIT_TASK(align) in include/asm-generic/vmlinux.lds.h,
but there is already a macro INIT_TASK in include/linux/init_task.h, which
is quite confusing. We should switch the macro in the linker script to
INIT_TASK_DATA. (Sorry that I missed this in reviewing the patch). Since
the macros are new, there is only one user of the INIT_TASK in
vmlinux.lds.h, arch/mn10300/kernel/vmlinux.lds.S.

However, we are currently using INIT_TASK_DATA for laying down an entire
.data.init_task section. So rename that to INIT_TASK_DATA_SECTION.

I would be worried about changing the meaning of INIT_TASK_DATA, but the
old INIT_TASK_DATA implementation had no users, and in fact if anyone had
tried to use it, it would have failed to compile because it didn't pass
the alignment to the old INIT_TASK.
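Roughly, the intended split looks like this (a simplified sketch, not the
literal hunk): INIT_TASK_DATA only lays down the contents for use inside a
larger .data output section, while INIT_TASK_DATA_SECTION emits a complete
output section of its own:

#define INIT_TASK_DATA(align)			\
	. = ALIGN(align);			\
	*(.data.init_task)

#define INIT_TASK_DATA_SECTION(align)		\
	. = ALIGN(align);			\
	.data.init_task : {			\
		INIT_TASK_DATA(align)		\
	}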

Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jesper Nilsson <Jesper.Nilsson@axis.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 73f1d939 23-Jun-2009 Paul Mundt <lethal@linux-sh.org>

asm-generic/vmlinux.lds.h: Fix up RW_DATA_SECTION definition.

RW_DATA_SECTION is defined to take 4 different alignment parameters,
while NOSAVE_DATA currently uses a fixed PAGE_SIZE alignment as noted
in the comments.

There are presently no in-tree users of this, and I just stumbled
across it while implementing the simplified script on a new
architecture port, where it subsequently resulted in a syntax error.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 405d967d 24-Jun-2009 Tejun Heo <tj@kernel.org>

linker script: throw away .discard section

x86 throws away .discard section but no other archs do. Also,
.discard is not thrown away while linking modules. Make every arch
and module linking throw it away. This will be used to define dummy
variables for percpu declarations and definitions.

This patch is based on Ivan Kokshaysky's alpha percpu patch.

[ Impact: always throw away everything in .discard ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>


# eadfe219 22-Jun-2009 David Howells <dhowells@redhat.com>

LDSCRIPT: Name INIT_RAM_FS consistently

In asm-generic/vmlinux.lds.h, name INIT_RAM_FS consistently, no matter the
setting of CONFIG_BLK_DEV_INITRD. This corrects:

commit ef53dae8658cf0e93d380983824a661067948d87
Author: Sam Ravnborg <sam@ravnborg.org>
Date: Sun Jun 7 20:46:37 2009 +0200
Subject: Improve vmlinux.lds.h support for arch specific linker scripts

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b99b87f7 17-Jun-2009 Peter Oberparleiter <oberpar@linux.vnet.ibm.com>

kernel: constructor support

Call constructors (gcc-generated initcall-like functions) during kernel
start and module load. Constructors are e.g. used for gcov data
initialization.

Disable constructor support for usermode Linux to prevent conflicts with
host glibc.
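As a rough sketch of the mechanism (the linker-script side collects .ctors
between two symbols, and a small helper - modelled on what init/main.c
does, not necessarily verbatim - walks the array at boot):

#ifdef CONFIG_CONSTRUCTORS
#define KERNEL_CTORS()	VMLINUX_SYMBOL(__ctors_start) = .;	\
			*(.ctors)				\
			VMLINUX_SYMBOL(__ctors_end) = .;
#else
#define KERNEL_CTORS()
#endif

/* called during kernel start; module load walks the module's own list */
static void __init do_ctors(void)
{
#ifdef CONFIG_CONSTRUCTORS
	ctor_fn_t *fn = (ctor_fn_t *) __ctors_start;

	for (; fn < (ctor_fn_t *) __ctors_end; fn++)
		(*fn)();
#endif
}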

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7923f90f 14-Jun-2009 Sam Ravnborg <sam@ravnborg.org>

vmlinux.lds.h update

Updated after review by Tim Abbott.
- Use HEAD_TEXT_SECTION
- Drop use of section-names.h and delete file
- Introduce EXIT_CALL

Deleting section-names.h required a few simple
updates of init.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@ksplice.com>


# ef53dae8 07-Jun-2009 Sam Ravnborg <sam@ravnborg.org>

Improve vmlinux.lds.h support for arch specific linker scripts

To support alignment of the individual architecture specific linker scripts,
provide a set of general definitions in vmlinux.lds.h.

With these definitions applied the diverse linker scripts can be reduced
in line count and their readability is improved - IMO.

A sample linker script is included to give the preferred
order of the sections for the architectures that do not
have any special requirements.

These definitions are also a first step towards eventual
support for -ffunction-sections.
The definitions make it much easier to do a global
renaming of section names - but the main purpose is
to clean up the linker scripts.

Tim Abbott has provided a lot of input to improve
the definitions - all faults are mine.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@mit.edu>


# fd6c3a8d 12-Mar-2009 Jan Beulich <jbeulich@novell.com>

initconst adjustments

- add .init.rodata to INIT_DATA, and group all initconst flavors
together (sketched after this list)
- move strings generated from __setup_param() into .init.rodata
- add .*init.rodata to modpost's sets of init sections
- make modpost warn about references between meminit and cpuinit
as well as memexit and cpuexit sections (as CPU and memory
hotplug are independently selectable features)
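A sketch of the first point - .init.rodata grouped into INIT_DATA next to
the existing conditional-discard wrappers (simplified; the real entry list
is longer):

#define INIT_DATA			\
	*(.init.data)			\
	DEV_DISCARD(init.data)		\
	CPU_DISCARD(init.data)		\
	MEM_DISCARD(init.data)		\
	*(.init.rodata)			\
	DEV_DISCARD(init.rodata)	\
	CPU_DISCARD(init.rodata)	\
	MEM_DISCARD(init.rodata)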

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 27b18332 27-Apr-2009 Tim Abbott <tabbott@MIT.EDU>

Remove unused support code for refok sections.

The old refok sections

.text.init.refok
.data.init.refok
.exit.text.refok

have been deprecated since commit
312b1485fb509c9bc32eda28ad29537896658cb8. After the other patches in
this patch series nothing is put in these sections, so clean things up
by eliminating all the remaining references to them.

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c80d471a 25-Apr-2009 Tim Abbott <tabbott@MIT.EDU>

Add new HEAD_TEXT_SECTION macro.

This patch is preparation for replacing all uses of ".head.text" or
".text.head" in the kernel with macros, so that the section name can
later be changed without having to touch a lot of the kernel.

Since some linker scripts do more complex things than referencing
HEAD_TEXT, we add a HEAD_TEXT_SECTION macro that just contains the
actual name.

I've defined HEAD_TEXT_SECTION in a new header,
include/linux/section-names.h, so that this section name only needs to
appear in one place. I anticipate creating similar macro structures
for a number of other section names.

The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections. This requires renaming all magic sections with names
of the form ".text.foo".
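A sketch of the macro layering this introduces (names per this patch; the
exact spelling of the header contents may differ slightly):

/* include/linux/section-names.h */
#define HEAD_TEXT_SECTION .head.text

/* include/asm-generic/vmlinux.lds.h */
#define HEAD_TEXT  *(HEAD_TEXT_SECTION)

/* an arch linker script that needs its own output section can write: */
	HEAD_TEXT_SECTION : {
		HEAD_TEXT
	}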

Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5f77a88b 08-Apr-2009 Tom Zanussi <tzanussi@gmail.com>

tracing/infrastructure: separate event tracer from event support

Add a new config option, CONFIG_EVENT_TRACING that gets selected
when CONFIG_TRACING is selected and adds everything needed by the stuff
in trace_export - basically all the event tracing support needed by e.g.
bprint, minus the actual events, which are only included if
CONFIG_EVENT_TRACER is selected.

So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
(what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
on or off the base event tracing support used by both the event tracer and
the other things such as bprint that can't be configured out.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178441.10295.34.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e9d376f0 05-Feb-2009 Jason Baron <jbaron@redhat.com>

dynamic debug: combine dprintk and dynamic printk

This patch combines Greg Banks' dprintk() work with the existing dynamic
printk patchset; we are now calling it 'dynamic debug'.

The new feature of this patchset is a richer /debugfs control file interface
(an example output from my system is at the bottom), which allows fine-grained
control over the debug output. The output can be controlled by function,
file, module, format string, and line number.

For example, to enable all debug messages in module 'nf_conntrack':

echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

to disable them:

echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

A further explanation can be found in the documentation patch.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# bed1ffca 13-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/syscalls: core infrastructure for syscalls tracing, enhancements

Impact: new feature

This adds the generic support for syscalls tracing. This is
currently exploited through a devoted tracer but other tracing
engines can use it. (They just have to play with
{start,stop}_ftrace_syscalls() and use the display callbacks
unless they want to override them.)

The syscall prototype definitions are abused here to steal
some metadata information:

- syscall name, param types, param names, number of params

The syscall addr is not directly saved during this definition
because we don't know if its prototype is available in the
namespace. But we don't really need it. The arch just has to
build a function able to resolve the syscall number to its
metadata struct.
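A sketch of the metadata record this implies, assuming the fields listed
above (the actual struct layout and the arch hook name may differ slightly):

struct syscall_metadata {
	const char	*name;		/* syscall name */
	int		nb_args;	/* number of parameters */
	const char	**types;	/* parameter type strings */
	const char	**args;		/* parameter name strings */
};

/* each arch provides something along the lines of: */
struct syscall_metadata *syscall_nr_to_meta(int nr);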

The current tracer prints the syscall names, parameter names
and values (and optionally their types). Currently the value is
raw hex, but higher level value displaying is on my TODO list.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1236955332-10133-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8a20d84d 09-Mar-2009 Ingo Molnar <mingo@elte.hu>

tracing: trace_printk() fix, move format array to data section

Impact: fix kernel crash when using trace_printk()

The trace_printk_fmt section is placed in the read-only section.
But we do:

trace_printk_fmt = fmt;

to fill in that table of format strings - i.e. we write to it, so it must
not be read-only. Under CONFIG_DEBUG_RODATA=y this crashes ...

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1ba28e02 06-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com>

tracing: add trace_bprintk()

Impact: add a generic printk() for tracing, like trace_printk()

trace_bprintk() uses the infrastructure to record events on ring_buffer.

[ fweisbec@gmail.com: ported to latest -tip, made it work if
!CONFIG_MODULES, never free the format strings from modules
because we can't keep track of them and conditionally create
the ftrace format strings section (reported by Steven Rostedt) ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b77e38aa 24-Feb-2009 Steven Rostedt <srostedt@redhat.com>

tracing: add event trace infrastructure

This patch creates the event tracing infrastructure of ftrace.
It will create the files:

/debug/tracing/available_events
/debug/tracing/set_event

The available_events will list the trace points that have been
registered with the event tracer.

set_event will allow the user to enable or disable an event hook.

example:

# echo sched_wakeup > /debug/tracing/set_event

Will enable the sched_wakeup event (if it is registered).

# echo "!sched_wakeup" >> /debug/tracing/set_event

Will disable the sched_wakeup event (and only that event).

# echo > /debug/tracing/set_event

Will disable all events (notice the '>')

# cat /debug/tracing/available_events > /debug/tracing/set_event

Will enable all registered event hooks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 3ac6cffe 30-Jan-2009 Tejun Heo <tj@kernel.org>

linker script: use separate simpler definition for PERCPU()

Impact: fix linker screwup on x86_32

Recent x86_64 zerobased patches introduced PERCPU_VADDR() to put
.data.percpu to a predefined address and re-defined PERCPU() in terms
of it. The new macro defined one extra symbol, __per_cpu_load, for
LMA of the section so that the init data could be accessed. This new
symbol introduced the following problems to x86_32.

1. If __per_cpu_load is defined outside of .data.percpu as an absolute
symbol, relocation generation for relocatable kernel fails due to
absolute relocation.

2. If __per_cpu_load is put inside .data.percpu with absolute address
assignment to work around #1, linker gets confused and under
certain configurations ends up relocating the symbol against
.data.percpu such that the load address gets added on top of
already set load address.

As x86_32 doesn't use predefined address for .data.percpu, there's no
need for it to care about the possibility of __per_cpu_load being
different from __per_cpu_start.

This patch defines PERCPU() separately so that __per_cpu_load is
defined inside .data.percpu so that everything is ordinary
linking-wise.
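A sketch of the separate definition, with __per_cpu_load emitted inside the
output section so it stays an ordinary section-relative symbol (simplified;
the real macro may carry more input-section entries):

#define PERCPU(align)							\
	. = ALIGN(align);						\
	.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {		\
		VMLINUX_SYMBOL(__per_cpu_load) = .;			\
		VMLINUX_SYMBOL(__per_cpu_start) = .;			\
		*(.data.percpu.first)					\
		*(.data.percpu.page_aligned)				\
		*(.data.percpu)						\
		*(.data.percpu.shared_aligned)				\
		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
	}

Since __per_cpu_load and __per_cpu_start now coincide and both live inside
.data.percpu, the relocation problems above cannot occur on x86_32.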

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# dba3d36b 29-Jan-2009 Ingo Molnar <mingo@elte.hu>

Revert "generic, x86: fix __per_cpu_load relocation"

This reverts commit 5a611268b69f05262936dd177205acbce4471358.

It is causing occasional boot crashes, caused by certain
linker versions (GNU ld version 2.18.50.0.6-2 20080403) messing up:

82dcc000 D __per_cpu_load
c16e6000 A __per_cpu_load_abs

The __per_cpu_load value is out of whack. Hpa noticed the following
detail:

* (gdb) p/x -(0xc16e6000-0x82dcc000)
* $2 = 0xc16e6000
* I.e. one is the other << 1

The two symbols should be equal.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5a611268 26-Jan-2009 Brian Gerst <brgerst@gmail.com>

generic, x86: fix __per_cpu_load relocation

This patch fixes this linker error:

WARNING: Absolute relocations present
Offset Info Type Sym.Value Sym.Name
c0a4e07d 00e78001 R_386_32 c0ab0000 __per_cpu_load

Now, __per_cpu_load is a section-relative symbol:

c0aa4000 D __per_cpu_load
c0aa4000 A __per_cpu_load_abs

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6b7c38d5 18-Jan-2009 Tejun Heo <tj@kernel.org>

linker script: kill PERCPU_VADDR_PREALLOC()

Impact: cleanup

With .data.percpu.first in place, PERCPU_VADDR_PREALLOC() is no longer
necessary. Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>


# 0bd74fa8 18-Jan-2009 Brian Gerst <brgerst@gmail.com>

percpu: refactor percpu.h

Impact: cleanup

Refactor the DEFINE_PER_CPU_* macros and add .data.percpu.first
section.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 145cd30b 16-Jan-2009 Tejun Heo <tj@kernel.org>

linker script: add missing VMLINUX_SYMBOL

The newly added PERCPU_*() macros define and use __per_cpu_load but
VMLINUX_SYMBOL() was missing from usages causing build failures on
archs where linker visible symbol is different from C symbols
(e.g. blackfin).

Signed-off-by: Tejun Heo <tj@kernel.org>


# 1a51e3a0 13-Jan-2009 Tejun Heo <tj@kernel.org>

x86: fold pda into percpu area on SMP

[ Based on original patch from Christoph Lameter and Mike Travis. ]

Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.

Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.

After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() needs
to reload %gs for CPU0 while clearing the pda area for other cpus, as cpu0
has already modified it by the time control reaches setup_per_cpu_areas().

This patch also removes now unnecessary get_local_pda() and its call
sites.

A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3e5d8f97 13-Jan-2009 Tejun Heo <tj@kernel.org>

x86: make percpu symbols zerobased on SMP

[ Based on original patch from Christoph Lameter and Mike Travis. ]

This patch makes percpu symbols zerobased on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
the percpu output section and using it in vmlinux_64.lds.S. A new
PHDR is added as existing ones cannot contain sections near address
zero. PERCPU_VADDR() also adds a new symbol __per_cpu_load which
always points to the vaddr of the loaded percpu data.init region.

The following adjustments have been made to accommodate the address
change.

* code to locate percpu gdt_page in head_64.S is updated to add the
load address to the gdt_page offset.

* __per_cpu_load is used in places where access to the init data area
is necessary.

* pda->data_offset is initialized soon after C code is entered as zero
value doesn't work anymore.

This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a0343e82 09-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com>

tracing/function-graph-tracer: add a new .irqentry.text section

Impact: let the function-graph-tracer be aware of the irq entrypoints

Add a new .irqentry.text section to store the irq entrypoint functions
in one place. This way, the tracer will be able to signal
an interrupt triggering in its output by recognizing these entrypoints.

Also, make this section recordable for dynamic tracing.
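Roughly, the new collector in vmlinux.lds.h and the annotation feeding it
look like this (a sketch; guarded by the function-graph tracer config of
the time):

#define IRQENTRY_TEXT						\
		ALIGN_FUNCTION();				\
		VMLINUX_SYMBOL(__irqentry_text_start) = .;	\
		*(.irqentry.text)				\
		VMLINUX_SYMBOL(__irqentry_text_end) = .;

/* irq entry points are then tagged with something like: */
#define __irq_entry __attribute__((__section__(".irqentry.text")))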

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2bcd521a 20-Nov-2008 Steven Rostedt <srostedt@redhat.com>

trace: profile all if conditionals

Impact: feature to profile if statements

This patch adds a branch profiler for all if () statements.
The results will be found in:

/debugfs/tracing/profile_branch

For example:

miss hit % Function File Line
------- --------- - -------- ---- ----
0 1 100 x86_64_start_reservations head64.c 127
0 1 100 copy_bootdata head64.c 69
1 0 0 x86_64_start_kernel head64.c 111
32 0 0 set_intr_gate desc.h 319
1 0 0 reserve_ebda_region head.c 51
1 0 0 reserve_ebda_region head.c 47
0 1 100 reserve_ebda_region head.c 42
0 0 X maxcpus main.c 165

Miss means the branch was not taken. Hit means the branch was taken.
The percent is the percentage the branch was taken.

This adds a significant amount of overhead and should only be used
by those analyzing their system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 45b79749 20-Nov-2008 Steven Rostedt <srostedt@redhat.com>

trace: consolidate unlikely and likely profiler

Impact: clean up to make one profiler for the likely and unlikely tracers

The likely and unlikely profiler prints out the file and line numbers
of the annotated branches that it is profiling. It shows the number
of times it was correct or incorrect in its guess. Having two
different files or sections for that matter to tell us if it was a
likely or unlikely is pretty pointless. We really only care if
it was correct or not.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 7e066fb8 14-Nov-2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()

Impact: API *CHANGE*. Must update all tracepoint users.

Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
structure in a single spot for the whole kernel. It helps reduce memory
consumption, especially when declaring a lot of tracepoints, e.g. for
kmalloc tracing.

*API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
to do it. The name previously used was misleading.

Updates scheduler instrumentation to follow this API change.
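A sketch of the usage implied by the API change (sched_wakeup is used
purely as an example; TPPROTO/TPARGS are the tracepoint prototype helpers
of that era):

/* in the header: declaration only */
DECLARE_TRACE(sched_wakeup,
	TPPROTO(struct rq *rq, struct task_struct *p),
	TPARGS(rq, p));

/* in exactly one .c file: emits the tracepoint structure */
DEFINE_TRACE(sched_wakeup);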

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2ed84eeb 12-Nov-2008 Steven Rostedt <srostedt@redhat.com>

trace: rename unlikely profiler to branch profiler

Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step further, and renamed all the unlikely configurations
to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1f0d69a9 11-Nov-2008 Steven Rostedt <rostedt@goodmis.org>

tracing: profile likely and unlikely annotations

Impact: new unlikely/likely profiler

Andrew Morton recently suggested having an in-kernel way to profile
likely and unlikely macros. This patch achieves that goal.

When configured, every(*) likely and unlikely macro gets a counter attached
to it. When the condition is hit, the hit and misses of that condition
are recorded. These numbers can later be retrieved by:

/debugfs/tracing/profile_likely - All likely markers
/debugfs/tracing/profile_unlikely - All unlikely markers.

# cat /debug/tracing/profile_unlikely | head
correct incorrect % Function File Line
------- --------- - -------- ---- ----
2167 0 0 do_arch_prctl process_64.c 832
0 0 0 do_arch_prctl process_64.c 804
2670 0 0 IS_ERR err.h 34
71230 5693 7 __switch_to process_64.c 673
76919 0 0 __switch_to process_64.c 639
43184 33743 43 __switch_to process_64.c 624
12740 64181 83 __switch_to process_64.c 594
12740 64174 83 __switch_to process_64.c 590

# cat /debug/tracing/profile_unlikely | \
awk '{ if ($3 > 25) print $0; }' |head -20
44963 35259 43 __switch_to process_64.c 624
12762 67454 84 __switch_to process_64.c 594
12762 67447 84 __switch_to process_64.c 590
1478 595 28 syscall_get_error syscall.h 51
0 2821 100 syscall_trace_leave ptrace.c 1567
0 1 100 native_smp_prepare_cpus smpboot.c 1237
86338 265881 75 calc_delta_fair sched_fair.c 408
210410 108540 34 calc_delta_mine sched.c 1267
0 54550 100 sched_info_queued sched_stats.h 222
51899 66435 56 pick_next_task_fair sched_fair.c 1422
6 10 62 yield_task_fair sched_fair.c 982
7325 2692 26 rt_policy sched.c 144
0 1270 100 pre_schedule_rt sched_rt.c 1261
1268 48073 97 pick_next_task_rt sched_rt.c 884
0 45181 100 sched_info_dequeued sched_stats.h 177
0 15 100 sched_move_task sched.c 8700
0 15 100 sched_move_task sched.c 8690
53167 33217 38 schedule sched.c 4457
0 80208 100 sched_info_switch sched_stats.h 270
30585 49631 61 context_switch sched.c 2619

# cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
39900 36577 47 pick_next_task sched.c 4397
20824 15233 42 switch_mm mmu_context_64.h 18
0 7 100 __cancel_work_timer workqueue.c 560
617 66484 99 clocksource_adjust timekeeping.c 456
0 346340 100 audit_syscall_exit auditsc.c 1570
38 347350 99 audit_get_context auditsc.c 732
0 345244 100 audit_syscall_entry auditsc.c 1541
38 1017 96 audit_free auditsc.c 1446
0 1090 100 audit_alloc auditsc.c 862
2618 1090 29 audit_alloc auditsc.c 858
0 6 100 move_masked_irq migration.c 9
1 198 99 probe_sched_wakeup trace_sched_switch.c 58
2 2 50 probe_wakeup trace_sched_wakeup.c 227
0 2 100 probe_wakeup_sched_switch trace_sched_wakeup.c 144
4514 2090 31 __grab_cache_page filemap.c 2149
12882 228786 94 mapping_unevictable pagemap.h 50
4 11 73 __flush_cpu_slab slub.c 1466
627757 330451 34 slab_free slub.c 1731
2959 61245 95 dentry_lru_del_init dcache.c 153
946 1217 56 load_elf_binary binfmt_elf.c 904
102 82 44 disk_put_part genhd.h 206
1 1 50 dst_gc_task dst.c 82
0 19 100 tcp_mss_split_point tcp_output.c 1126

As you can see by the above, there's a bit of work to do in rethinking
the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
that were more than 25%.

Note: After submitting my first version of this patch, Andrew Morton
showed me a version written by Daniel Walker, where I picked up
the following ideas from:

1) Using __builtin_constant_p to avoid profiling fixed values.
2) Using __FILE__ instead of instruction pointers.
3) Using the preprocessor to stop all profiling of likely
annotations from vsyscall_64.c.

Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
for their feed back on this patch.

(*) Not every unlikely is recorded; those that are used by vsyscalls
(a few of them) had to have profiling disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 346e15be 12-Aug-2008 Jason Baron <jbaron@redhat.com>

driver core: basic infrastructure for per-module dynamic debug messages

Base infrastructure to enable per-module debug messages.

I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
control of debugging statements on a per-module basis in one /proc file,
currently <debugfs>/dynamic_printk/modules. When CONFIG_DYNAMIC_PRINTK_DEBUG
is not set, debugging statements can still be enabled as before, often by
defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
effect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.

The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
can be dynamically enabled/disabled on a per-module basis.

Future plans include extending this functionality to subsystems that define
their own debug levels and flags.

Usage:

Dynamic debugging is controlled by the debugfs file,
<debugfs>/dynamic_printk/modules. This file contains a list of the modules that
can be enabled. The format of the file is as follows:

<module_name> <enabled=0/1>
.
.
.

<module_name> : Name of the module in which the debug call resides
<enabled=0/1> : whether the messages are enabled or not

For example:

snd_hda_intel enabled=0
fixup enabled=1
driver enabled=0

Enable a module:

$echo "set enabled=1 <module_name>" > dynamic_printk/modules

Disable a module:

$echo "set enabled=0 <module_name>" > dynamic_printk/modules

Enable all modules:

$echo "set enabled=1 all" > dynamic_printk/modules

Disable all modules:

$echo "set enabled=0 all" > dynamic_printk/modules

Finally, passing "dynamic_printk" at the command line enables
debugging for all modules. This mode can be turned off via the above
disable command.

[gkh: minor cleanups and tweaks to make the build work quietly]

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# d6c88a50 15-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

genirq: revert dynarray

Revert the dynarray changes. They need more thought and polishing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1f3fcd4b 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

add per_cpu_dyn_array support

allow dyn-array in per_cpu area, allocated dynamically.

usage:

| /* in .h */
| struct kernel_stat {
| struct cpu_usage_stat cpustat;
| unsigned int *irqs;
| };
|
| /* in .c */
| DEFINE_PER_CPU(struct kernel_stat, kstat);
|
| DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);

after setup_percpu()/per_cpu_alloc_dyn_array(), the dyn_array in
per_cpu area is ready to use.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3ddfda11 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

generic: add dyn_array support

Allow crazy big arrays via bootmem at init stage.
Architectures use CONFIG_HAVE_DYN_ARRAY to enable it.

usage:

| static struct irq_desc irq_desc_init __initdata = {
| .status = IRQ_DISABLED,
| .chip = &no_irq_chip,
| .handle_irq = handle_bad_irq,
| .depth = 1,
| .lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
| #ifdef CONFIG_SMP
| .affinity = CPU_MASK_ALL
| #endif
| };
|
| static void __init init_work(void *data)
| {
| struct dyn_array *da = data;
| struct irq_desc *desc;
| int i;
|
| desc = *da->name;
|
| for (i = 0; i < *da->nr; i++)
| memcpy(&desc[i], &irq_desc_init, sizeof(struct irq_desc));
| }
|
| struct irq_desc *irq_desc;
| DEFINE_DYN_ARRAY(irq_desc, sizeof(struct irq_desc), nr_irqs, PAGE_SIZE, init_work);

after pre_alloc_dyn_array() after setup_arch(), the array is ready to be
used.

Via this facility we can replace irq_desc[NR_IRQS] array with dyn_array
irq_desc[nr_irqs].

v2: remove _nopanic in pre_alloc_dyn_array()

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8da3821b 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: create __mcount_loc section

This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.

For example:

objdump -dr init/main.o
[...]
Disassembly of section .text:

0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]

We will add a section to point to each function call.

.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]

The offset of the mcount call site in init_post is an offset from
the start of the section, not from the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.

.text + 0x185 == init_post + 0xa

We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.

To do this, we use the start of the first function in the section,
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.

.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]

Then we can compile the tmp.s into a tmp.o

gcc -c tmp.s -o tmp.o

And link it back into main.o.

ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o

But we have a problem: what happens if the first function in a section
is not exported and is a static function? The linker will not let
tmp.o use it. This case exists in main.o as well.

Disassembly of section .init.text:

0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc

The first function in .init.text is a static function.

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices

The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link tmp.o with its reference to set_reset_devices, we end
up with two symbols: one local and one global.

.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices

We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worse, it could be exported someplace else,
and then we would have a reference to the wrong location.

To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices

Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
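The vmlinux.lds.h side of this is small - a sketch of the collector that
brackets __mcount_loc with start/stop symbols so boot code can walk it
(conditional spelling approximate):

#ifdef CONFIG_FTRACE_MCOUNT_RECORD
#define MCOUNT_REC()	VMLINUX_SYMBOL(__start_mcount_loc) = .;	\
			*(__mcount_loc)				\
			VMLINUX_SYMBOL(__stop_mcount_loc) = .;
#else
#define MCOUNT_REC()
#endif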

Note: a well documented perl script (scripts/recordmcount.pl) is used
to do all this in one place.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 97e1c18e 17-Jul-2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

tracing: Kernel Tracepoints

Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, whose
prototypes are described in a tracepoint declaration placed in a header
file."

Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure no type mismatch happens due to
connection of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.

Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.

Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.

While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.

The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8

Below are the results. As you can see, no major performance regression
was found in any case. Even if the number of processes increases, the
differences between the marker-enabled and marker-disabled kernels don't
increase. Moreover, if the number of CPUs increases, the differences
don't increase either.

Curiously, the marker-enabled kernel is better than the marker-disabled kernel
in more than half the cases, although I guess this comes from differences
in memory access patterns.

* 2 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------

* 4 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------

* 8 CPUs

Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c6de0026 31-Jul-2008 Yoshinori Sato <ysato@users.sourceforge.jp>

Missing symbol prefix on vmlinux.lds.h

ARCH=h8300:

init/main.c:781: undefined reference to `___early_initcall_end'

The same problem affects:
__start___bug_table
__stop___bug_table
__tracedata_start
__tracedata_end
__per_cpu_start
__per_cpu_end

When defining a symbol in vmlinux.lds, use the VMLINUX_SYMBOL macro.
VMLINUX_SYMBOL adds a prefix character.

You can't just use straight symbol names in common header files as they
don't take into consideration weird arch-specific ABI conventions. In the
case of Blackfin/h8300, the ABI dictates that any C-visible symbols have
an underscore prefixed to them. Thus all symbols in vmlinux.lds.h need to
be wrapped in VMLINUX_SYMBOL() so that each arch can hide this magic
in its own files.
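A sketch of the mechanism (spelling approximate): the shared header
provides an identity default, and an underscore-prefixing ABI overrides it
before the header is included:

/* default, in include/asm-generic/vmlinux.lds.h */
#ifndef VMLINUX_SYMBOL
#define VMLINUX_SYMBOL(sym) sym
#endif

/* e.g. in an arch linker script for an underscore-prefixing ABI */
#define VMLINUX_SYMBOL(sym) _##sym

/* symbol definitions in the shared header are then written as */
	VMLINUX_SYMBOL(__early_initcall_end) = .;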

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Mike Frysinger" <vapier.adi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c2147a50 25-Jul-2008 Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

Better interface for hooking early initcalls

Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls. Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.

This is required by CPU hotplug, because early users have to register
notifiers before going SMP. One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code. This in turn is used by kmemtrace.
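A hypothetical usage sketch (my_early_notifier_setup is an invented name,
not from this patch) - registered exactly like a regular initcall, but run
from do_pre_smp_initcalls(), before the other CPUs are brought up:

static int __init my_early_notifier_setup(void)
{
	/* e.g. register a CPU hotplug notifier before going SMP */
	return 0;
}
early_initcall(my_early_notifier_setup);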

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fb5e2b37 17-Jun-2008 Jan Beulich <jbeulich@novell.com>

vmlinux.lds: move __attribute__((__cold__)) functions back into final .text section

Due to the addition of __attribute__((__cold__)) to a few symbols
without adjusting the linker scripts, those symbols currently may end
up outside the [_stext,_etext) range, as they get placed in
.text.unlikely by (at least) gcc 4.3.0. This may not only confuse code
outside of the kernel; symbol_put_addr()'s BUG() could also trigger.
Hence we need to add .text.unlikely (and, for future uses of
__attribute__((__hot__)), also .text.hot) to the TEXT_TEXT() macro.

Issue observed by Lukas Lipavsky.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Tested-by: Lukas Lipavsky <llipavsky@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 5658c769 23-May-2008 David Woodhouse <dwmw2@infradead.org>

firmware: allow firmware files to be built into kernel image

Some drivers have their own hacks to bypass the kernel's firmware loader
and build their firmware into the kernel; this renders those unnecessary.

Other drivers don't use the firmware loader at all, because they always
want the firmware to be available. This allows them to start using the
firmware loader.

A third set of drivers already use the firmware loader, but can't be
used without help from userspace, which sometimes requires an initrd.
This allows them to work in a static kernel.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>


# e1a2a51e 15-May-2008 Rafael J. Wysocki <rjw@rjwysocki.net>

Suspend/Resume bug in PCI layer wrt quirks

Some quirks should be called with interrupts disabled; we can't directly
call them in .resume_early. Also, the patch introduces
pci_fixup_resume_early and pci_fixup_suspend, which matches current
device core callbacks (.suspend/.resume_early).

TBD: Somebody who knows why we need quirk resume should double-check whether a
quirk should be called in resume or resume_early. I changed some per my
understanding, but can't be sure I fixed them all.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 63687a52 12-May-2008 Jan Beulich <jbeulich@novell.com>

x86: move tracedata to RODATA

.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 6360b1fb 12-May-2008 Jan Beulich <jbeulich@novell.com>

move BUG_TABLE into RODATA

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 63cc8c75 12-May-2008 Eric Dumazet <dada1@cosmosbay.com>

percpu: introduce DEFINE_PER_CPU_PAGE_ALIGNED() macro

While examining holes in percpu section I found this :

c05f5000 D per_cpu__current_task
c05f5000 D __per_cpu_start
c05f5004 D per_cpu__cpu_number
c05f5008 D per_cpu__irq_regs
c05f500c d per_cpu__cpu_devices
c05f5040 D per_cpu__cyc2ns

<Big Hole of about 4000 bytes>

c05f6000 d per_cpu__cpuid4_info
c05f6004 d per_cpu__cache_kobject
c05f6008 d per_cpu__index_kobject

<Big Hole of about 4000 bytes>

c05f7000 D per_cpu__gdt_page

This is because gdt_page is a percpu variable, defined with
a page alignment, and the linker is doing its job, two times because of .o
nesting in the build process.

I introduced a new macro DEFINE_PER_CPU_PAGE_ALIGNED() to avoid
wasting this space. All page aligned variables (only one at this time)
are put in a separate
subsection .data.percpu.page_aligned, at the very beginning of the percpu zone.

Before (first line) and after (second line) the patch, on an x86_32 machine:

.data.percpu 30232 3227471872
.data.percpu 22168 3227471872

That's 8064 bytes saved for each CPU.
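A sketch of the new macro next to the plain one (simplified; the attribute
spelling follows the existing DEFINE_PER_CPU definitions of the time):

#define DEFINE_PER_CPU(type, name)					\
	__attribute__((__section__(".data.percpu")))			\
	__typeof__(type) per_cpu__##name

#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)				\
	__attribute__((__section__(".data.percpu.page_aligned")))	\
	__typeof__(type) per_cpu__##name

The linker script then places *(.data.percpu.page_aligned) first inside
.data.percpu, so only that one sub-block pays for page alignment.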

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 37c514e3 19-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

Add missing init section definitions

When adding __devinitconst etc. the __initconst variant
was missed.
Add this one and proper definitions for .head.text for use
in .S files.
The naming .head.text is preferred over .text.head as the
latter would conflict with a function named head when introducing
-ffunction-sections.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# edeed305 30-Jan-2008 Arjan van de Ven <arjan@infradead.org>

x86: add testcases for RODATA and NX protections/attributes

Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.

From: Arjan van de Ven <arjan@linux.intel.com>

This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)

As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which led to 1 page less being marked read only.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 312b1485 28-Jan-2008 Sam Ravnborg <sam@ravnborg.org>

Introduce new section reference annotations tags: __ref, __refdata, __refconst

Today we have the following annotations for functions/data
referencing __init/__exit functions / data:

__init_refok => for init functions
__initdata_refok => for init data
__exit_refok => for exit functions

There is really no difference between the __init and __exit
versions, so to simplify things and to introduce a shorter annotation
the following new annotations are introduced:

__ref => for functions (code) that
references __*init / __*exit
__refdata => for variables
__refconst => for const variables

With this annotation it is more obvious what the annotation
is for, and there is no longer the arbitrary division
between __init and __exit code.

The mechanism is the same as before - a special section
is created which is made part of the usual sections
in the linker script.

We will start to see annotations like this:

-static struct pci_serial_quirk pci_serial_quirks[] = {
+static const struct pci_serial_quirk pci_serial_quirks[] __refconst = {
-----------------
-static struct notifier_block __cpuinitdata cpuid_class_cpu_notifier =
+static struct notifier_block cpuid_class_cpu_notifier __refdata =
----------------
-static int threshold_cpu_callback(struct notifier_block *nfb,
+static int __ref threshold_cpu_callback(struct notifier_block *nfb,

[The above are just random samples.]
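Under the hood these are just section annotations; a sketch of how they
might be spelled in include/linux/init.h, with the sections folded back in
by the linker script:

#define __ref		__section(.ref.text) noinline
#define __refdata	__section(.ref.data)
#define __refconst	__section(.ref.rodata)

/* and in vmlinux.lds.h, inside the normal text/data/rodata output sections: */
	*(.ref.text)
	*(.ref.data)
	*(.ref.rodata)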

Note: No modifications were needed in modpost
to support the new sections due to the newly introduced
blacklisting.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 1a3fb6d4 24-Jan-2008 Adrian Bunk <bunk@kernel.org>

asm-generic/vmlinux.lds.h: simplify __mem{init,exit}* dependencies

Simplify the dependencies on __mem{init,exit}* (ACPI_HOTPLUG_MEMORY requires
MEMORY_HOTPLUG).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# eb8f6890 20-Jan-2008 Sam Ravnborg <sam@ravnborg.org>

Use separate sections for __dev/__cpu/__mem code/data

Introducing separate sections for __dev* (HOTPLUG),
__cpu* (HOTPLUG_CPU) and __mem* (MEMORY_HOTPLUG)
allows us to do a much more reliable Section mismatch
check in modpost. We are no longer dependent on the actual
configuration of for example HOTPLUG.

This has the effect that all users see much more
Section mismatch warnings than before because they
were almost all hidden when HOTPLUG was enabled.
The advantage of this is that when building a piece
of code it is much more likely that the Section
mismatch errors are spotted, and the warnings will
feel less random in nature.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@kernel.org>


# 01ba2bdc 20-Jan-2008 Sam Ravnborg <sam@ravnborg.org>

all archs: consolidate init and exit sections in vmlinux.lds.h

This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.

This is a preparatory patch - alone it does not buy
us much.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 8256e47c 19-Oct-2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Linux Kernel Markers

The marker activation functions sit in kernel/marker.c. A hash table is used
to keep track of the registered probes and armed markers, so the markers
within a newly loaded module that should be active can be activated at module
load time.

marker_query has been removed. marker_get_first, marker_get_next and
marker_release should be used as iterators on the markers.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 23ec23c2 13-Oct-2007 Al Viro <viro@ftp.linux.org.uk>

fix sparc32 breakage (result of vmlinux.lds.S bug)

In commit 4665079cbb2a3e17de82f2ab2940b9f97f37d65e ("[NETNS]: Move some
code into __init section when CONFIG_NET_NS=n") we got a new section -
.exit.text.refok (more of 'let's tell modpost that some bogus calls are
not bogus', a-la text.init.refok).

Unfortunately, the commit in question forgot to add it to TEXT_TEXT,
with rather amusing results.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cbe87121 19-Jul-2007 Roland McGrath <roland@redhat.com>

i386: Put allocated ELF notes in read-only data segment

This changes the i386 linker script and the asm-generic macro it uses so that
ELF note sections with SHF_ALLOC set are linked into the kernel image along
with other read-only data. The PT_NOTE also points to their location.

This paves the way for putting useful build-time information into ELF notes
that can be found easily later in a kernel memory dump.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5fb7dc37 19-Jul-2007 Fenghua Yu <fenghua.yu@intel.com>

define new percpu interface for shared data

per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearly separated out. This can potentially cause the same data
cacheline to be shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.

One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.

This patch:

Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. This cleanly differentiates the local-only
data from the remotely accessed data.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
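
A minimal sketch of the idea, assuming the DEFINE_PER_CPU_SHARED_ALIGNED
interface this work led to (the variable names are hypothetical):

    #include <linux/percpu.h>

    /* Local-only per-cpu data: no special alignment needed. */
    static DEFINE_PER_CPU(unsigned long, local_counter);

    /* Per-cpu data that remote cpus also touch: placed in the separate,
     * cacheline-aligned per-cpu section so it cannot share a cacheline
     * with purely local data. */
    static DEFINE_PER_CPU_SHARED_ALIGNED(unsigned long, shared_counter);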


# 4096b46f 29-May-2007 Sam Ravnborg <sam@ravnborg.org>

sparc64: fix alignment bug in linker definition script

The RO_DATA section was hardcoded to a specific
alignment in include/asm-generic/vmlinux.lds.h,
but for sparc64 this did not match the PAGE_SIZE.

Introduce a new section definition named
RO_DATA that takes the actual alignment as a parameter.
RODATA is provided for backward compatibility.

On top of this, avoid hardcoding the alignment for
sparc64 in the rest of the script.
The fix is build-tested on sparc64 + x86_64.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 0e0d314e 17-May-2007 Sam Ravnborg <sam@ravnborg.org>

kbuild: introduce __init_refok/__initdata_refok to suppress section mismatch warnings

Throughout the kernel there are a few legitimate references
to init or exit sections. Most of these are covered by the
patterns included in modpost, but a few need special attention.
To avoid hardcoding a lot of function names in modpost, introduce
a marker so the relevant functions/data can be marked.
When modpost sees a reference to an init/exit function from
a function or data item marked this way, no warning will be issued.

Idea from: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
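
A hedged sketch of the intended usage (the names are hypothetical): a function
marked __init_refok may reference init code or data without modpost warning
about the section mismatch:

    #include <linux/init.h>

    static int __initdata boot_setting;      /* init data, freed after boot */

    /* Only ever called during early bring-up, so the reference to init data
     * is legitimate; the annotation tells modpost not to warn about it. */
    int __init_refok read_boot_setting(void)
    {
            return boot_setting;
    }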


# ca967258 17-May-2007 Sam Ravnborg <sam@ravnborg.org>

all-archs: consolidate .data section definition in asm-generic

With this consolidation we can now modify the .data
section definition in one spot for all archs.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 7664709b 12-May-2007 Sam Ravnborg <sam@ravnborg.org>

all-archs: consolidate .text section definition in asm-generic

Move definition of .text section to asm-generic.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 03df4f6e 02-May-2007 Jeremy Fitzhardinge <jeremy@goop.org>

[PATCH] i386: Clean up ELF note generation

Three cleanups:

1: ELF notes are never mapped, so there's no need to have any access
flags in their phdr.

2: When generating them from asm, tell the assembler to use a SHT_NOTE
section type. There doesn't seem to be a way to do this from C.

3: Use ANSI rather than traditional cpp behaviour to stringify the
macro argument.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>


# 1597cacb 04-Dec-2006 Alan Cox <alan@lxorguk.ukuu.org.uk>

PCI: Fix multiple problems with VIA hardware

This patch is designed to fix:
- Disk eating corruptor on KT7 after resume from RAM
- VIA IRQ handling
- VIA fixups for bus lockups after resume from RAM

The core of this is to add a table of resume fixups run at resume time.
We need to do this for a variety of boards and features, but particularly
we need to do this to get various critical VIA fixups done on resume.

The second part of the problem is to handle VIA IRQ number rules which
are a bit odd and need special handling for PIC interrupts. Various
patches broke various boxes and while this one may not be perfect
(hopefully it is) it ensures the workaround is applied to the right
devices only.

From: Jean Delvare <khali@linux-fr.org>

Now that PCI quirks are replayed on software resume, we can safely
re-enable the Asus SMBus unhiding quirk even when software suspend support
is enabled.

[akpm@osdl.org: fix const warning]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
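
A minimal sketch of registering such a resume-time fixup with the PCI quirk
machinery (the device match and function name here are placeholders, not the
actual VIA quirks from this patch):

    #include <linux/pci.h>

    static void quirk_via_resume_example(struct pci_dev *dev)
    {
            /* Re-apply whatever chipset setup was lost across suspend. */
            dev_info(&dev->dev, "re-applying VIA resume quirk\n");
    }
    DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
                             quirk_via_resume_example);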


# d1526e2c 15-Dec-2006 Linus Torvalds <torvalds@woody.osdl.org>

Remove stack unwinder for now

It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed. We can put it back when
it's stable and isn't likely to make warning or bug events worse.

In the meantime, enable frame pointers for more readable stack traces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8d610dd5 11-Dec-2006 Linus Torvalds <torvalds@woody.osdl.org>

Make sure we populate the initroot filesystem late enough

We should not initialize rootfs before all the core initializers have
run. So do it as a separate stage just before starting the regular
driver initializers.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
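
A hedged illustration of the rootfs initcall level this change relies on (the
function name is hypothetical):

    #include <linux/init.h>

    /* Runs after the core/postcore/arch/subsys/fs initcalls but before the
     * regular driver (device) initcalls, so the rootfs is populated late
     * enough yet still ahead of drivers that may want to read it. */
    static int __init populate_example_rootfs(void)
    {
            return 0;
    }
    rootfs_initcall(populate_example_rootfs);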


# 7664c5a1 08-Dec-2006 Jeremy Fitzhardinge <jeremy@goop.org>

[PATCH] Generic BUG implementation

This patch adds common handling for kernel BUGs, for use by architectures as
they wish. The code is derived from arch/powerpc.

The advantages of having common BUG handling are:
- consistent BUG reporting across architectures
- shared implementation of out-of-line file/line data
- implement CONFIG_DEBUG_BUGVERBOSE consistently

This means that the inline impact of BUG is just the illegal instruction
itself, which is an improvement for i386 and x86-64.

A BUG is represented in the instruction stream as an illegal instruction,
which has file/line information associated with it. This extra information is
stored in the __bug_table section in the ELF file.

When the kernel gets an illegal instruction, it first confirms it might
possibly be from a BUG (ie, in kernel mode, the right illegal instruction).
It then calls report_bug(). This searches __bug_table for a matching
instruction pointer, and if found, prints the corresponding file/line
information. If report_bug() determines that it wasn't a BUG which caused the
trap, it returns BUG_TRAP_TYPE_NONE.

Some architectures (powerpc) implement WARN using the same mechanism; if the
illegal instruction was the result of a WARN, then report_bug() returns
BUG_TRAP_TYPE_WARN; otherwise it returns BUG_TRAP_TYPE_BUG.

lib/bug.c keeps a list of loaded modules which can be searched for __bug_table
entries. The architecture must call
module_bug_finalize()/module_bug_cleanup() from its corresponding
module_finalize/cleanup functions.

Unsetting CONFIG_DEBUG_BUGVERBOSE will reduce the kernel size by some amount.
At the very least, filename and line information will not be recorded for each
BUG, but architectures may decide to store no extra information per BUG at
all.

Unfortunately, gcc doesn't have a general way to mark an asm() as noreturn, so
architectures will generally have to include an infinite loop (or similar) in
the BUG code, so that gcc knows execution won't continue beyond that point.
gcc does have a __builtin_trap() operator which may be useful to achieve the
same effect, unfortunately it cannot be used to actually implement the BUG
itself, because there's no way to get the instruction's address for use in
generating the __bug_table entry.

[randy.dunlap@oracle.com: Handle BUG=n, GENERIC_BUG=n to prevent build errors]
[bunk@stusta.de: include/linux/bug.h must always #include <linux/module.h>]
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
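
A simplified, self-contained sketch of the lookup report_bug() is described as
doing; the entry layout shown is an assumption for illustration, not the exact
kernel structure:

    #include <stddef.h>

    struct bug_entry {
            unsigned long  bug_addr;   /* address of the trapping instruction */
            const char    *file;       /* recorded when CONFIG_DEBUG_BUGVERBOSE */
            unsigned short line;
            unsigned short flags;
    };

    /* Scan the __bug_table entries for the faulting instruction pointer. */
    static const struct bug_entry *find_bug_entry(const struct bug_entry *start,
                                                  const struct bug_entry *end,
                                                  unsigned long bugaddr)
    {
            const struct bug_entry *bug;

            for (bug = start; bug < end; bug++)
                    if (bug->bug_addr == bugaddr)
                            return bug;
            return NULL;    /* not a BUG: caller reports BUG_TRAP_TYPE_NONE */
    }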


# b65780e1 06-Dec-2006 Jan Beulich <jbeulich@novell.com>

[PATCH] unwinder: move .eh_frame to RODATA

The .eh_frame section contents are never written to, so the section can as well
benefit from CONFIG_DEBUG_RODATA.

Diff-ed against firstfloor tree.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>


# 6569580d 06-Dec-2006 Vivek Goyal <vgoyal@in.ibm.com>

[PATCH] i386: Distinguish absolute symbols

Ld knows about two kinds of symbols, absolute and section
relative. Section relative symbols change value
when a section is moved and absolute symbols do not.

Currently in the linker script we have several labels
marking the beginning and ending of sections that
are outside of sections, making them absolute symbols.
Having a mixture of absolute and section relative
symbols referring to the same data is currently harmless
but it is confusing.

This must be done carefully as newer revs of ld do not place
symbols that appear in sections without data and instead
ld makes those symbols global :(

My ultimate goal is to build a relocatable kernel. The
safest and least intrusive technique is to generate
relocation entries so the kernel can be relocated at load
time. The only penalty would be an increase in the size
of the kernel binary. The problem is that if absolute and
relocatable symbols are not properly specified absolute symbols
will be relocated or section relative symbols won't be, which
is fatal.

The practical motivation is that when generating kernels that
will run from a reserved area for analyzing what caused
a kernel panic, it is simpler if you don't need to hard code
the physical memory location they will run at, especially
for the distributions.

[AK: and merged:]

o Also put a message so that in future people can be aware of it and
avoid introducing absolute symbols.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>


# b3438f82 20-Nov-2006 Linus Torvalds <torvalds@woody.osdl.org>

Add "pure_initcall" for static variable initialization

This is a quick hack to overcome the fact that SRCU currently does not
allow static initializers, and we need to sometimes initialize those
things before any other initializers (even "core" ones) can do so.

Currently we don't allow this at all for modules, and the only user that
needs it right now is cpufreq. As reported by Thomas Gleixner:

"Commit b4dfdbb3c707474a2254c5b4d7e62be31a4b7da9 ("[PATCH] cpufreq:
make the transition_notifier chain use SRCU") breaks cpu frequency
notification users, which register the callback on core_init
level."

Cc: Thomas Gleixner <tglx@timesys.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>,
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
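
A hedged sketch of the intended use (the names are hypothetical): an SRCU-based
notifier head gets its runtime initialization from a pure_initcall so it is ready
before any core initcalls run:

    #include <linux/init.h>
    #include <linux/notifier.h>

    static struct srcu_notifier_head example_transition_notifier_list;

    static int __init init_example_notifier_list(void)
    {
            srcu_init_notifier_head(&example_transition_notifier_list);
            return 0;
    }
    pure_initcall(init_example_notifier_list);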


# 735a7ffb 27-Oct-2006 Andrew Morton <akpm@osdl.org>

[PATCH] drivers: wait for threaded probes between initcall levels

The multithreaded-probing code has a problem: after one initcall level (eg,
core_initcall) has been processed, we will then start processing the next
level (postcore_initcall) while the kernel threads which are handling
core_initcall are still executing. This breaks the guarantees which the
layered initcalls previously gave us.

IOW, we want to be multithreaded _within_ an initcall level, but not between
different levels.

Fix that up by causing the probing code to wait for all outstanding probes at
one level to complete before we start processing the next level.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 61ce1efe 27-Oct-2006 Andrew Morton <akpm@osdl.org>

[PATCH] vmlinux.lds: consolidate initcall sections

Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.

This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
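
For illustration (hypothetical function names), these are the C-side macros that
feed the leveled initcall table; levels run in order, and within a level ordering
follows link order:

    #include <linux/init.h>

    static int __init setup_core(void)    { return 0; }
    static int __init setup_subsys(void)  { return 0; }
    static int __init setup_device(void)  { return 0; }

    core_initcall(setup_core);       /* early levels run first ...   */
    subsys_initcall(setup_subsys);   /* ... then subsystem setup ... */
    device_initcall(setup_device);   /* ... then regular drivers     */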


# 690a973f 21-Oct-2006 Jan Beulich <jbeulich@novell.com>

[PATCH] x86-64: Speed up dwarf2 unwinder

This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear search. The linker is unfortunately not
able to build a proper lookup table at link time; instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
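
A generic sketch of the kind of binary search described; the table layout and
names are hypothetical and not the actual unwinder code:

    #include <stddef.h>

    struct fde_ref {
            unsigned long start;    /* initial location covered by the FDE */
            const void   *fde;
    };

    /* Table is sorted by 'start'; find the last entry with start <= pc. */
    static const struct fde_ref *lookup_fde(const struct fde_ref *tbl,
                                            size_t n, unsigned long pc)
    {
            size_t lo = 0, hi = n;

            while (lo < hi) {
                    size_t mid = lo + (hi - lo) / 2;

                    if (tbl[mid].start <= pc)
                            lo = mid + 1;
                    else
                            hi = mid;
            }
            return lo ? &tbl[lo - 1] : NULL;
    }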


# 7583ddfd 27-Sep-2006 Marcelo Tosatti <marcelo@kvack.org>

[PATCH] Include __param section in read-only data range

The param section is an array of "kernel_param" structures, storing only
constant data: pointer to name, permission of the variable pointed to by
(void *)arg and pointers to set/get methods.

Move end_rodata down to include __param section in the read-only range used
by CONFIG_DEBUG_RODATA.

Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
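
As a small illustration (the parameter name is hypothetical): each module_param()
emits a constant struct kernel_param entry into the __param section, which this
patch pulls inside the range protected by CONFIG_DEBUG_RODATA:

    #include <linux/moduleparam.h>

    static int debug_level;                 /* the variable itself stays writable */
    module_param(debug_level, int, 0644);   /* the describing entry lands in __param */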


# 9c9b8b38 26-Sep-2006 Jeremy Fitzhardinge <jeremy@xensource.com>

[PATCH] x86: put .note.* sections into a PT_NOTE segment in vmlinux

This patch will pack any .note.* section into a PT_NOTE segment in the output
file.

To do this, we tell ld that we need a PT_NOTE segment. This requires us to
start explicitly mapping sections to segments, so we also need to explicitly
create PT_LOAD segments for text and data, and map the sections to them
appropriately. Fortunately, each section will default to its previous
section's segment, so it doesn't take many changes to vmlinux.lds.S.

This only changes i386 for now, but I presume the corresponding changes for
other architectures will be as simple.

This change also adds <linux/elfnote.h>, which defines C and Assembler macros
for actually creating ELF notes.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
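
A hand-rolled sketch of what such a note looks like in C (this does not use the
<linux/elfnote.h> macros; the note name, section, and payload are hypothetical).
An ELF note is a header (namesz, descsz, type) followed by the name and the
descriptor, each padded to a 4-byte multiple:

    #include <stdint.h>

    struct example_note {
            uint32_t namesz;
            uint32_t descsz;
            uint32_t type;
            char     name[8];       /* "Linux" + NUL, padded to 8 bytes */
            uint32_t desc;          /* 4-byte payload */
    };

    __attribute__((section(".note.example"), aligned(4), used))
    static const struct example_note build_note = {
            .namesz = sizeof("Linux"),
            .descsz = sizeof(uint32_t),
            .type   = 1,
            .name   = "Linux",
            .desc   = 0x20060926,   /* arbitrary build-time payload */
    };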


# f71d20e9 28-Jun-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] Add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL

Temporarily add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL. These
will be used as a transition measure for symbols that aren't used in the
kernel and are on the way out. When a module uses such a symbol, a warning
is printk'd at modprobe time.

The main reason for removing unused exports is size: each export takes
roughly between 100 and 150 bytes of kernel space in the binary. This
patch gives users the option to immediately get this size gain via a config
option.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
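
Usage was the same as the ordinary export macros; a hedged sketch with a
hypothetical symbol (note that these transitional macros were later removed from
the kernel altogether):

    #include <linux/module.h>

    int legacy_helper(void)
    {
            return 0;
    }
    /* Still exported, but any module using it triggers a warning at load time. */
    EXPORT_UNUSED_SYMBOL(legacy_helper);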


# 9f28bb7e 20-Mar-2006 Greg Kroah-Hartman <gregkh@suse.de>

[PATCH] add EXPORT_SYMBOL_GPL_FUTURE()

This patch adds the ability to mark symbols that will be changed in the
future, so that kernel modules that don't include MODULE_LICENSE("GPL")
and use the symbols, will be flagged and printed out to the system log.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 37b73c82 06-Jan-2006 Arjan van de Ven <arjan@infradead.org>

[PATCH] x86/x86_64: mark rodata section read only: generic infrastructure

Generic prep-work for marking the .rodata section readonly:
* Align the rodata section at 4Kb boundary
* call the mark_rodata_ro() function when available

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
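
A rough sketch of the arch-side hook this infrastructure arranges to call; the
set_memory_ro() helper shown is how later kernels express it, so treat this as
an approximation rather than the original x86 implementation:

    #include <linux/mm.h>
    #include <asm/sections.h>       /* __start_rodata, __end_rodata */
    #include <asm/set_memory.h>     /* set_memory_ro() */

    void mark_rodata_ro(void)
    {
            unsigned long start = (unsigned long)__start_rodata;
            unsigned long size  = __end_rodata - __start_rodata;

            set_memory_ro(start, size >> PAGE_SHIFT);
            pr_info("Write-protected read-only data: %luk\n", size >> 10);
    }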


# 394b701c 07-Nov-2005 Matt Porter <mporter@kernel.crashing.org>

[PATCH] RapidIO support: core base

Adds a RapidIO subsystem to the kernel. RIO is a switched fabric interconnect
used in higher-end embedded applications. The curious can look at the specs
over at http://www.rapidio.org

The core code implements enumeration/discovery, management of
devices/resources, and interfaces for RIO drivers.

There's a lot more to do to take advantage of all the hardware features.
However, this should provide a good base for folks with RIO hardware to start
contributing.

Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a7d0c210 10-Sep-2005 Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>

[PATCH] i386 / uml: add dwarf sections to static link script

Inside the linker script, insert the code for DWARF debug info sections. This
may help GDB'ing a UML binary. Actually, it seems that ld is able to guess
what I added correctly, but normal linker scripts include these sections, so
adding them explicitly should be correct anyway.

On request by Sam Ravnborg <sam@ravnborg.org>, I've added it to
asm-generic/vmlinux.lds.s. I've also moved there the stabs debug section,
used the new macro in i386 linker script and added DWARF debug section to
that.

In truth, I've not been able to verify the difference in GDB behaviour
after this change (I've seen large improvements with another patch). This
may depend on my binutils version; older ones may have worse defaults.

However, this section is present in normal linker scripts, so add it at
least for the sake of cleanliness.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d0aaff97 06-Sep-2005 Prasanna S Panchamukhi <prasanna@in.ibm.com>

[PATCH] Kprobes: prevent possible race conditions generic

There are possible race conditions if probes are placed on routines within the
kprobes files and on routines used by kprobes. For example, if you put a probe
on the get_kprobe() routine, the system can hang while inserting probes on any
routine such as do_fork(). Because while inserting probes on do_fork(),
register_kprobes() routine grabs the kprobes spin lock and executes
get_kprobe() routine and to handle probe of get_kprobe(), kprobes_handler()
gets executed and tries to grab kprobes spin lock, and spins forever. This
patch avoids such possible race conditions by preventing probes on routines
within the kprobes file and routines used by kprobes.

I have modified the patches as per Andi Kleen's suggestion to move kprobes
routines and other routines used by kprobes to a separate section
.kprobes.text.

Also moved page fault and exception handlers, general protection fault to
.kprobes.text section.

These patches have been tested on i386, x86_64 and ppc64 architectures, also
compiled on ia64 and sparc64 architectures.

Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
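
A minimal sketch of the annotation this introduces (the function is hypothetical):
marking a helper __kprobes moves it into .kprobes.text, so kprobes will refuse to
place a probe inside it:

    #include <linux/kprobes.h>

    /* Called from the kprobe breakpoint path itself, so it must never be probed. */
    static int __kprobes helper_on_probe_path(unsigned long addr)
    {
            return addr != 0;
    }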


# 6d30e3a8 14-Jul-2005 Sam Ravnborg <sam@mars.(none)>

kbuild: Avoid inconsistent kallsyms data

Several reports of inconsistent kallsyms data have been caused by the aliased symbols
__sched_text_start and __down shifting places in the output of nm.
The root cause was that on the second pass ld aligned __sched_text_start to a 4 byte boundary,
which is the function alignment on i386.
sched.text and spinlock.text are now aligned to an 8 byte boundary to make sure they
are aligned to the function alignment on most (all?) archs.

Tested by: Paulo Marques <pmarques@grupopie.com>
Tested by: Alexander Stohr <Alexander.Stohr@gmx.de>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# 60bad7fa 25-Jun-2005 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] kexec: vmlinux: fix physical addresses

In vmlinux.lds.h the code is careful to define every section so vmlinux
properly reports the correct physical load address of the code, as well as
its virtual address.

The new SECURITY_INIT definition fails to follow that convention
and causes an incorrect physical address to appear in the vmlinux if
there are any security initcalls.

This patch updates the SECURITY_INIT to follow the convention in the rest of
the file.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!