History log of /linux-master/arch/powerpc/kernel/rtas.c
Revision Date Author Comments
# fad87dbd 22-Feb-2024 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: use correct function name for resetting TCE tables

The PAPR spec spells the function name as

"ibm,reset-pe-dma-windows"

but in practice firmware uses the singular form:

"ibm,reset-pe-dma-window"

in the device tree. Since we have the wrong spelling in the RTAS
function table, reverse lookups (token -> name) fail and warn:

unexpected failed lookup for token 86
WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4
CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30
Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries
NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4
LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4
Call Trace:
__do_enter_rtas_trace+0x2a0/0x2b4 (unreliable)
rtas_call+0x1f8/0x3e0
enable_ddw.constprop.0+0x4d0/0xc84
dma_iommu_dma_supported+0xe8/0x24c
dma_set_mask+0x5c/0xd8
mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core]
probe_one+0xfc/0x32c [mlx5_core]
local_pci_probe+0x68/0x12c
pci_call_probe+0x68/0x1ec
pci_device_probe+0xbc/0x1a8
really_probe+0x104/0x570
__driver_probe_device+0xb8/0x224
driver_probe_device+0x54/0x130
__driver_attach+0x158/0x2b0
bus_for_each_dev+0xa8/0x120
driver_attach+0x34/0x48
bus_add_driver+0x174/0x304
driver_register+0x8c/0x1c4
__pci_register_driver+0x68/0x7c
mlx5_init+0xb8/0x118 [mlx5_core]
do_one_initcall+0x60/0x388
do_init_module+0x7c/0x2a4
init_module_from_file+0xb4/0x108
idempotent_init_module+0x184/0x34c
sys_finit_module+0x90/0x114

And oopses are possible when lockdep is enabled or the RTAS
tracepoints are active, since those paths dereference the result of
the lookup.

Use the correct spelling to match firmware's behavior, adjusting the
related constants to match.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups")
Reported-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac63c@linux.ibm.com


# e3681107 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Warn if per-function lock isn't held

If the function descriptor has a populated lock member, then callers
are required to hold it across calls. Now that the firmware activation
sequence is appropriately guarded, we can warn when the requirement
isn't satisfied.

__do_enter_rtas_trace() gets reorganized a bit as a result of
performing the function descriptor lookup unconditionally now.

Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-8-e9eafd0c8c6c@linux.ibm.com


# dc7637c4 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Serialize firmware activation sequences

Use rtas_ibm_activate_firmware_lock to prevent interleaving call
sequences of the ibm,activate-firmware RTAS function, which typically
requires multiple calls to complete the update. While the spec does
not specifically prohibit interleaved sequences, there's almost
certainly no advantage to allowing them.

Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-7-e9eafd0c8c6c@linux.ibm.com


# adf7a019 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Facilitate high-level call sequences

On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.

However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.

Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.

But where a sequence-based RTAS function has both user space and
in-kernel users, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.

So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, somehow. Preferably without adding more function-specific hacks
to sys_rtas(). This is a prerequisite for adding an in-kernel call
sequence of ibm,get-vpd, which is in a change to follow.

Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().

* Add an optional mutex member to struct rtas_function.

* Statically define a mutex for each RTAS function with known call
sequence serialization requirements, and assign its address to the
.lock member of the corresponding function table entry, along with
justifying commentary.

* In sys_rtas(), if the table entry for the RTAS function being
called has a populated lock member, acquire it before taking
rtas_lock and entering RTAS.

* Kernel-based RTAS call sequences are expected to access the
appropriate mutex explicitly by name. For example, a user of the
ibm,activate-firmware RTAS function would do:

    int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
    int fwrc;

    mutex_lock(&rtas_ibm_activate_firmware_lock);

    do {
            fwrc = rtas_call(token, 0, 1, NULL);
    } while (rtas_busy_delay(fwrc));

    mutex_unlock(&rtas_ibm_activate_firmware_lock);

There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock.
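
The lock ordering described above can be modeled in user space. This is a minimal sketch, not the kernel code: toy_mutex, toy_function, toy_sys_rtas() and the token values are hypothetical stand-ins, and the "mutex" only records state so that the ordering can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex; it only records state. */
struct toy_mutex {
	int held;
	int acquisitions;
};

/* Hypothetical stand-in for struct rtas_function. */
struct toy_function {
	const char *name;
	int token;			/* made-up values for illustration */
	struct toy_mutex *lock;		/* NULL: no sequence serialization needed */
};

static struct toy_mutex activate_firmware_lock;

static struct toy_function toy_table[] = {
	{ "ibm,activate-firmware", 10, &activate_firmware_lock },
	{ "get-time-of-day",       11, NULL },
};

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->held);	/* a real mutex would block here instead */
	m->held = 1;
	m->acquisitions++;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = 0;
}

/* Models entering firmware; fails if a required lock is not held. */
static int toy_enter_rtas(const struct toy_function *fn)
{
	if (fn->lock && !fn->lock->held)
		return -1;
	return 0;
}

/* Models the syscall path: take the optional per-function lock first. */
static int toy_sys_rtas(int token)
{
	size_t i;
	int rc;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		struct toy_function *fn = &toy_table[i];

		if (fn->token != token)
			continue;
		if (fn->lock)
			toy_lock(fn->lock);
		rc = toy_enter_rtas(fn);
		if (fn->lock)
			toy_unlock(fn->lock);
		return rc;
	}
	return -22;	/* unknown token; mirrors -EINVAL */
}
```

Functions without a populated lock member take the same path minus the lock and unlock steps, which is why callers of unrelated functions see no change.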

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com


# e7582edb 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Move token validation from block_rtas_call() to sys_rtas()

The rtas system call handler sys_rtas() delegates certain input
validation steps to a helper function: block_rtas_call(). One of these
steps ensures that the user-supplied token value maps to a known RTAS
function. This is done by performing a "reverse" token-to-function
lookup via rtas_token_to_function_untrusted() to obtain an
rtas_function object.

In changes to come, sys_rtas() itself will need the function
descriptor for the token. To prepare:

* Move the lookup and validation up into sys_rtas() and pass the
resulting rtas_function pointer to block_rtas_call(), which is
otherwise unconcerned with the token value.

* Change block_rtas_call() to report the RTAS function name instead of
the token value on validation failures, since it can now rely on
having a valid function descriptor.

One behavior change is that sys_rtas() now silently errors out when
passed a bad token, before calling block_rtas_call(). So we will no
longer log "RTAS call blocked - exploit attempt?" on invalid
tokens. This is consistent with how sys_rtas() currently handles other
"metadata" (nargs and nret), while block_rtas_call() is primarily
concerned with validating the arguments to be passed to specific RTAS
functions.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-5-e9eafd0c8c6c@linux.ibm.com


# 669acc7e 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Fall back to linear search on failed token->function lookup

Enabling any of the powerpc:rtas_* tracepoints at boot is likely to
result in an oops on RTAS platforms. For example, booting a QEMU
pseries model with 'trace_event=powerpc:rtas_input' in the command
line leads to:

BUG: Kernel NULL pointer dereference on read at 0x00000008
Oops: Kernel access of bad area, sig: 7 [#1]
NIP [c00000000004231c] do_enter_rtas+0x1bc/0x460
LR [c00000000004231c] do_enter_rtas+0x1bc/0x460
Call Trace:
do_enter_rtas+0x1bc/0x460 (unreliable)
rtas_call+0x22c/0x4a0
rtas_get_boot_time+0x80/0x14c
read_persistent_clock64+0x124/0x150
read_persistent_wall_and_boot_offset+0x28/0x58
timekeeping_init+0x70/0x348
start_kernel+0xa0c/0xc1c
start_here_common+0x1c/0x20

(This is preceded by a warning for the failed lookup in
rtas_token_to_function().)

This happens when __do_enter_rtas_trace() attempts a token to function
descriptor lookup before the xarray containing the mappings has been
set up.

Fall back to linear scan of the table if rtas_token_to_function_xarray
is empty.
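
The fallback can be sketched in user space as follows. This is an illustration under stated assumptions: a direct-indexed array stands in for the xarray, and toy_table, toy_map and the token values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TOKEN 64

struct toy_function {
	const char *name;
	int token;	/* made-up values for illustration */
};

static struct toy_function toy_table[] = {
	{ "get-time-of-day", 3 },
	{ "set-indicator",   7 },
};

/* Stand-in for the token-to-function xarray; empty until populated. */
static struct toy_function *toy_map[TOY_MAX_TOKEN];
static size_t toy_map_entries;

/* Models the later boot step that fills in the fast lookup structure. */
static void toy_map_populate(void)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		assert(toy_table[i].token >= 0 &&
		       toy_table[i].token < TOY_MAX_TOKEN);
		toy_map[toy_table[i].token] = &toy_table[i];
		toy_map_entries++;
	}
}

static struct toy_function *toy_token_to_function(int token)
{
	size_t i;

	if (token < 0 || token >= TOY_MAX_TOKEN)
		return NULL;

	/* Fast path: the map has been set up. */
	if (toy_map_entries > 0)
		return toy_map[token];

	/* Early boot: the map is still empty, so scan the table. */
	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i].token == token)
			return &toy_table[i];
	}
	return NULL;
}
```

Both paths return the same descriptor for a given token, so callers do not need to know which phase of boot they are running in.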

Fixes: 24098f580e2b ("powerpc/rtas: add tracepoints around RTAS entry")
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-3-e9eafd0c8c6c@linux.ibm.com


# c500c6e7 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Add for_each_rtas_function() iterator

Add a convenience macro for iterating over every element of the
internal function table and convert the one site that can use it. An
additional user of the macro is anticipated in changes to follow.

Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-2-e9eafd0c8c6c@linux.ibm.com


# 01e346ff 12-Dec-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Avoid warning on invalid token argument to sys_rtas()

rtas_token_to_function() WARNs when passed an invalid token; it's
meant to catch bugs in kernel-based users of RTAS functions. However,
user space controls the token value passed to rtas_token_to_function()
by block_rtas_call(), so user space with sufficient privilege to use
sys_rtas() can trigger the warnings at will:

unexpected failed lookup for token 2048
WARNING: CPU: 20 PID: 2247 at arch/powerpc/kernel/rtas.c:556
rtas_token_to_function+0xfc/0x110
...
NIP rtas_token_to_function+0xfc/0x110
LR rtas_token_to_function+0xf8/0x110
Call Trace:
rtas_token_to_function+0xf8/0x110 (unreliable)
sys_rtas+0x188/0x880
system_call_exception+0x268/0x530
system_call_common+0x160/0x2c4

It's desirable to continue warning on bogus tokens in
rtas_token_to_function(). Currently it is used to look up RTAS
function descriptors when tracing, where we know there has to have
been a successful descriptor lookup by different means already, and it
would be a serious inconsistency for the reverse lookup to fail.

So instead of weakening rtas_token_to_function()'s contract by
removing the warnings, introduce rtas_token_to_function_untrusted(),
which has no opinion on failed lookups. Convert block_rtas_call() and
rtas_token_to_function() to use it.
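
The split between the two lookups might look roughly like this in user space. It is only a sketch: the toy_ names, the table contents and the warning counter are hypothetical, and fprintf() stands in for the kernel's WARN machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_function {
	const char *name;
	int token;	/* made-up values for illustration */
};

static const struct toy_function toy_table[] = {
	{ "get-time-of-day", 3 },
	{ "set-indicator",   7 },
};

/* Counts warnings so the behavior can be observed. */
static unsigned int toy_warn_count;

/* No opinion on failure: suitable for tokens supplied by user space. */
static const struct toy_function *toy_lookup_untrusted(int token)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i].token == token)
			return &toy_table[i];
	}
	return NULL;
}

/* For internal callers: a failed lookup indicates a bug, so complain. */
static const struct toy_function *toy_lookup(int token)
{
	const struct toy_function *fn = toy_lookup_untrusted(token);

	if (!fn) {
		toy_warn_count++;
		fprintf(stderr, "unexpected failed lookup for token %d\n", token);
	}
	return fn;
}
```

Layering the strict variant on top of the quiet one keeps a single search implementation while letting each caller choose the contract it needs.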

Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-1-e9eafd0c8c6c@linux.ibm.com


# 19773eda 06-Nov-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Remove trailing space

Use scripts/cleanfile to remove instances of trailing space in the
core RTAS code and header.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231106-rtas-trivial-v1-6-61847655c51f@linux.ibm.com


# 1d8faf1f 06-Nov-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Remove unused rtas_service_present()

rtas_service_present() has no more users.

rtas_function_implemented() is now the appropriate API for determining
whether a given RTAS function is available to call.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231106-rtas-trivial-v1-4-61847655c51f@linux.ibm.com


# e160bf64 18-Aug-2023 Mahesh Salgaonkar <mahesh@linux.ibm.com>

powerpc/rtas: export rtas_error_rc() for reuse.

Also, #define descriptive names for common RTAS return codes and use
them instead of numeric values.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/169235811556.193557.1023625262204809514.stgit@jupiter


# b949ee68 08-Jun-2023 Hari Bathini <hbathini@linux.ibm.com>

powerpc/fadump: invoke ibm,os-term with rtas_call_unlocked()

Invoke ibm,os-term call with rtas_call_unlocked(), without using the
RTAS spinlock, to avoid deadlock in the unlikely event of a machine
crash while making an RTAS call.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230609071404.425529-1-hbathini@linux.ibm.com


# af8bc682 06-Mar-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: lockdep annotations

Add lockdep annotations for the following properties that must hold:

* Any error log retrieval must be atomically coupled with the prior
RTAS call, without a window for another RTAS call to occur before the
error log can be retrieved.

* All users of the core rtas_args parameter block must hold rtas_lock.

Move the definitions of rtas_lock and rtas_args up in the file so that
__do_enter_rtas_trace() can refer to them.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-6-010e4416f13f@linux.ibm.com


# 32740fce 06-Mar-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: fix miswording in rtas_function kerneldoc

The 'filter' member is a pointer, not a bool; fix the wording
accordingly.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-4-010e4416f13f@linux.ibm.com


# 1792e46e 06-Mar-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: rtas_call_unlocked() kerneldoc

Add documentation for rtas_call_unlocked(), including details on how
it differs from rtas_call().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-3-010e4416f13f@linux.ibm.com


# 271208ee 06-Mar-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: use memmove for potentially overlapping buffer copy

Using memcpy() isn't safe when buf is identical to rtas_err_buf, which
can happen during boot before slab is up. Full context which may not
be obvious from the diff:

    if (altbuf) {
            buf = altbuf;
    } else {
            buf = rtas_err_buf;
            if (slab_is_available())
                    buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
    }
    if (buf)
            memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);

This was found by inspection and I'm not aware of it causing problems
in practice. It appears to have been introduced by commit
033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
old ppc64 version of this code did not have this problem.

Use memmove() instead.
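
A small user-space sketch of why memmove() is the safe choice here. The buffer name and size are hypothetical stand-ins for rtas_err_buf and RTAS_ERROR_LOG_MAX; the point is only that the destination may be the source itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ERROR_LOG_MAX 16

/* Stand-in for the static firmware error log buffer. */
static char toy_err_buf[TOY_ERROR_LOG_MAX];

/*
 * Copy the error log into the caller's buffer, or use the static
 * buffer when none is supplied (the early-boot case). In that case
 * source and destination are the same object, which memmove() is
 * defined to handle and memcpy() is not.
 */
static char *toy_fetch_error_log(char *altbuf)
{
	char *buf = altbuf ? altbuf : toy_err_buf;

	memmove(buf, toy_err_buf, TOY_ERROR_LOG_MAX);
	return buf;
}
```

Calling toy_fetch_error_log(NULL) exercises the fully overlapping case and leaves the buffer contents unchanged.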

Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-2-010e4416f13f@linux.ibm.com


# 08273c9f 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: arch-wide function token lookup conversions

With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is optimal and safe for arbitrary
contexts, there is no need to use rtas_token() or cache its result.

Most conversions are trivial, but a few are worth describing in more
detail:

* Error injection token comparisons for lockdown purposes are
consolidated into a simple predicate: token_is_restricted_errinjct().

* A couple of special cases in block_rtas_call() do not use
rtas_token() but perform string comparisons against names in the
function table. These are converted to compare against token values
instead, which is logically equivalent but less expensive.

* The lookup for the ibm,os-term token can be deferred until needed,
instead of caching it at boot to avoid device tree traversal during
panic.

* Since rtas_function_token() accesses a read-only data structure
without taking any locks, xmon's lookup of set-indicator can be
performed as needed instead of cached at startup.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-20-26929c8cce78@linux.ibm.com


# 716bfc97 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: introduce rtas_function_token() API

Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.

Since the core RTAS code now has consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.

So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.
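
The idea of an opaque handle indexing a static table can be sketched as below. This is an illustration, not the kernel's definitions: every toy_ and TOY_ name and the token values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A struct, so a plain integer token can't be passed by mistake. */
typedef struct {
	unsigned int index;
} toy_fn_handle_t;

enum toy_fn_index {
	TOY_FNIDX_GET_TIME_OF_DAY,
	TOY_FNIDX_SET_INDICATOR,
	TOY_FNIDX_COUNT,
};

/* Predefined handles; a misspelled name fails to compile. */
#define TOY_FN_GET_TIME_OF_DAY ((toy_fn_handle_t){ TOY_FNIDX_GET_TIME_OF_DAY })
#define TOY_FN_SET_INDICATOR   ((toy_fn_handle_t){ TOY_FNIDX_SET_INDICATOR })

#define TOY_UNKNOWN_SERVICE (-1)

struct toy_function {
	const char *name;
	int token;	/* would be filled in from firmware data at boot */
};

static struct toy_function toy_table[TOY_FNIDX_COUNT] = {
	[TOY_FNIDX_GET_TIME_OF_DAY] = { "get-time-of-day", 3 },
	[TOY_FNIDX_SET_INDICATOR]   = { "set-indicator", TOY_UNKNOWN_SERVICE },
};

/* Constant-time lookup: the handle is just an array index. */
static int toy_function_token(toy_fn_handle_t handle)
{
	assert(handle.index < TOY_FNIDX_COUNT);
	return toy_table[handle.index].token;
}

static int toy_function_implemented(toy_fn_handle_t handle)
{
	return toy_function_token(handle) != TOY_UNKNOWN_SERVICE;
}
```

Passing an int where a toy_fn_handle_t is expected is a type error, which is the build-time checking the string-based API could not offer.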

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-19-26929c8cce78@linux.ibm.com


# 43033bc6 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/pseries: add RTAS work area allocator

Various pseries-specific RTAS functions take a temporary "work area"
parameter - a buffer in memory accessible to RTAS. Typically such
functions are passed the statically allocated rtas_data_buf buffer as
the argument. This buffer is protected by a global spinlock. So users
of rtas_data_buf cannot perform sleeping operations while accessing
the buffer.

Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
leads to uncomfortable constructions like this:

    do {
            spin_lock(&rtas_data_buf_lock);
            rc = rtas_call(token, __pa(rtas_data_buf), ...);
            if (rc == 0) {
                    /* parse or copy out rtas_data_buf contents */
            }
            spin_unlock(&rtas_data_buf_lock);
    } while (rtas_busy_delay(rc));

Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.

If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.

We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.

There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.

So a special-purpose allocator for RTAS work area buffers seems worth
trying.

Properties:

* The backing memory for the allocator is reserved early in boot in
order to satisfy RTAS addressing requirements, and then managed with
genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-served fairness over throughput.
* Early boot allocations before the allocator has been initialized are
served via an internal static buffer.

Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com


# 24098f58 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: add tracepoints around RTAS entry

Decompose the RTAS entry C code into tracing and non-tracing variants,
calling the just-added tracepoints in the tracing-enabled path. Skip
tracing in contexts known to be unsafe (real mode, CPU offline).

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-11-26929c8cce78@linux.ibm.com


# 77f85f69 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline

Make do_enter_rtas() take a pointer to struct rtas_args and do the
__pa() conversion in one place instead of leaving it to callers. This
also makes it possible to introduce enter/exit tracepoints that access
the rtas_args struct fields.

There's no apparent reason to force inlining of do_enter_rtas()
either, and it seems to bloat the code a bit. Let the compiler decide.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-9-26929c8cce78@linux.ibm.com


# 8252b882 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: improve function information lookups

The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.

First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.

Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.

We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.

Introduce:

* A new rtas_function type, aggregating function name, token,
and filter. Other function characteristics could be added in the
future.

* An array of rtas_function, where each element corresponds to a known
RTAS function. All information in the table is static save the token
values, which are derived from the device tree at boot. The array is
sorted by function name to allow binary search.

* A named constant for each known RTAS function, used to index the
function array. These also will be used in a client-facing API to be
added later.

* An xarray that maps valid tokens to rtas_function objects.

Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").

Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.
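
A name-to-token lookup over a sorted table can be sketched with the C library's bsearch(). The table contents and token values here are hypothetical, and the kernel uses its own helpers rather than the hosted C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_function {
	const char *name;
	int token;	/* made-up values for illustration */
};

/* Must stay sorted by name in strcmp() order for bsearch() to work. */
static const struct toy_function toy_table[] = {
	{ "get-time-of-day",        3 },
	{ "ibm,activate-firmware", 10 },
	{ "ibm,get-vpd",           12 },
	{ "set-indicator",          7 },
};

/* bsearch() comparator: key is the name string, elem is a table entry. */
static int toy_cmp(const void *key, const void *elem)
{
	const struct toy_function *fn = elem;

	return strcmp(key, fn->name);
}

/* Returns the token for name, or -1 if name is not in the table. */
static int toy_token(const char *name)
{
	const struct toy_function *fn;

	fn = bsearch(name, toy_table,
		     sizeof(toy_table) / sizeof(toy_table[0]),
		     sizeof(toy_table[0]), toy_cmp);
	return fn ? fn->token : -1;
}
```

Because the table is read-only after setup, a lookup like this needs no lock, unlike a walk of a device tree node's property list.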

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com


# 836b5b9f 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: ensure 4KB alignment for rtas_data_buf

Some RTAS functions that have work area parameters impose alignment
requirements on the work area passed to them by the OS. Examples
include:

- ibm,configure-connector
- ibm,update-nodes
- ibm,update-properties

4KB is the greatest alignment required by PAPR for such
buffers. rtas_data_buf used to have a __page_aligned attribute in the
arch/ppc64 days, but that was changed to __cacheline_aligned for
unknown reasons by commit 033ef338b6e0 ("powerpc: Merge rtas.c into
arch/powerpc/kernel"). That works out to 128-byte alignment
on ppc64, which isn't right.

This was found by inspection and I'm not aware of any real problems
caused by this. Either current RTAS implementations don't enforce the
alignment constraints, or rtas_data_buf is always being placed at a
4KB boundary by accident (or both, perhaps).

Use __aligned(SZ_4K) to ensure the rtas_data_buf has alignment
appropriate for all users.
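
The difference between 128-byte and 4KB alignment can be checked with a short user-space sketch. C11 alignas plays the role of the kernel's __aligned(SZ_4K) here, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define TOY_SZ_4K 4096

/* Statically allocated work buffer with an explicit 4KB alignment. */
static alignas(TOY_SZ_4K) char toy_data_buf[TOY_SZ_4K];

/* Returns nonzero if p sits on a 4KB boundary. */
static int toy_is_4k_aligned(const void *p)
{
	return ((uintptr_t)p % TOY_SZ_4K) == 0;
}
```

An address that is only 128-byte aligned, such as toy_data_buf + 128, fails the check, which is the situation the cacheline-aligned buffer could have been in.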

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-6-26929c8cce78@linux.ibm.com


# 09d1ea72 09-Feb-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: handle extended delays safely in early boot

Some code that runs early in boot calls RTAS functions that can return
-2 or 990x statuses, which mean the caller should retry. An example is
pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
treats these benign statuses as errors instead of retrying.

pSeries_cmo_feature_init() and similar code should be made to retry
until they succeed or receive a real error, using the usual pattern:

    do {
            rc = rtas_call(token, etc...);
    } while (rtas_busy_delay(rc));

But rtas_busy_delay() will perform a timed sleep on any 990x
status. This isn't safe so early in boot, before the CPU scheduler and
timer subsystem have initialized.

The -2 RTAS status is much more likely to occur during single-threaded
boot than 990x in practice, at least on PowerVM. This is because -2
usually means that RTAS made progress but exhausted its self-imposed
timeslice, while 990x is associated with concurrent requests from the
OS causing internal contention. Regardless, according to the language
in PAPR, the OS should be prepared to handle either type of status at
any time.

Add a fallback path to rtas_busy_delay() to handle this as safely as
possible, performing a small delay on 990x. Include a counter to
detect retry loops that aren't making progress and bail out. Add __ref
to rtas_busy_delay() since it now conditionally calls an __init
function.
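
The shape of such a fallback path might be sketched as follows. This is an assumption-laden illustration: the toy_ names are hypothetical, the bail-out threshold of 3 is chosen only to keep the example small, and toy_short_delay() stands in for a brief busy-wait that needs no scheduler or timers.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BUSY            (-2)
#define TOY_EXT_DELAY_MIN   9900
#define TOY_EXT_DELAY_MAX   9905
#define TOY_MAX_EXT_DELAYS  3	/* hypothetical bail-out threshold */

/* Consecutive extended-delay statuses seen so far. */
static unsigned int toy_successive_ext_delays;

/* Hypothetical stand-in for a short delay that is safe before scheduling works. */
static void toy_short_delay(void)
{
}

/* Returns true if the caller should retry the call that produced status. */
static bool toy_busy_delay_early(int status)
{
	if (status == TOY_BUSY) {
		/* Firmware made progress; retry immediately. */
		toy_successive_ext_delays = 0;
		return true;
	}

	if (status >= TOY_EXT_DELAY_MIN && status <= TOY_EXT_DELAY_MAX) {
		toy_short_delay();
		if (++toy_successive_ext_delays > TOY_MAX_EXT_DELAYS)
			return false;	/* looks stuck: stop retrying */
		return true;
	}

	/* Success or a real error: nothing to retry. */
	toy_successive_ext_delays = 0;
	return false;
}
```

Resetting the counter on any other status means only an unbroken run of extended-delay statuses triggers the bail-out.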

This was found by inspection and I'm not aware of any real
failures. However, the implementation of rtas_busy_delay() before
commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
was not susceptible to this problem, so let's treat this as a
regression.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-1-26929c8cce78@linux.ibm.com


# 12fd6665 24-Jan-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: upgrade internal arch spinlocks

At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a
raw spinlock") was written, the spinlock lockup detection code called
__delay(), which will not make progress if the timebase is not
advancing. Since the interprocessor timebase synchronization sequence
for chrp, cell, and some now-unsupported Power models can temporarily
freeze the timebase through an RTAS function (freeze-time-base), the
lock that serializes most RTAS calls was converted to arch_spinlock_t
to prevent kernel hangs in the lockup detection code.

However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock
lockup detection code") removed that inconvenient property from the
lock debug code several years ago. So now it should be safe to
reintroduce generic locks into the RTAS support code, primarily to
increase lockdep coverage.

Making rtas_lock a spinlock_t would violate lock type nesting rules
because it can be acquired while holding raw locks, e.g. pci_lock and
irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent
reason not to upgrade timebase_lock as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-5-nathanl@linux.ibm.com


# 599af491 24-Jan-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: remove lock and args fields from global rtas struct

Only code internal to the RTAS subsystem needs access to the central
lock and parameter block. Remove these from the globally visible
'rtas' struct and make them file-static in rtas.c.

Some changed lines in rtas_call() lack appropriate spacing around
operators and cause checkpatch errors; fix these as well.

Suggested-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-4-nathanl@linux.ibm.com


# 9bce6243 24-Jan-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: make all exports GPL

The first symbol exports of RTAS functions and data came with the (now
removed) scanlog driver in 2003:

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=f92e361842d5251e50562b09664082dcbd0548bb

At the time this was applied, EXPORT_SYMBOL_GPL() was very new, and
the exports of rtas_call() etc have remained non-GPL. As new APIs have
been added to the RTAS subsystem, their symbol exports have followed
the convention set by existing code.

However, the historical evidence is that RTAS function exports have been
added over time only to satisfy the needs of in-kernel users, and these
clients must have fairly intimate knowledge of how the APIs work to use
them safely. No out of tree users are known, and future ones seem
unlikely.

Arguably the default for RTAS symbols should have become
EXPORT_SYMBOL_GPL once it was available. Let's make it so now, and
exceptions can be evaluated as needed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-3-nathanl@linux.ibm.com


# 0d7e812f 26-Jan-2023 Michael Ellerman <mpe@ellerman.id.au>

powerpc/rtas: Drop unused export symbols

Some RTAS symbols are never used by modular code, drop their exports.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127111231.84294-1-mpe@ellerman.id.au


# 5ff92e2f 24-Jan-2023 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: unexport 'rtas' symbol

No modular code needs access to the 'rtas' struct, so remove the
symbol export.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-2-nathanl@linux.ibm.com


# 98c738c8 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: mandate RTAS syscall filtering

CONFIG_PPC_RTAS_FILTER has been optional but default-enabled since its
introduction. It's been enabled in enterprise distro kernels for a
while without causing ABI breakage that wasn't easily fixed, and it
prevents harmful abuses of the rtas syscall.

Let's make it unconditional.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-10-nathanl@linux.ibm.com


# f975b655 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: define pr_fmt and convert printk call sites

Set pr_fmt to "rtas: " and convert the handful of printk() uses in
rtas.c, adjusting the messages to remove now-redundant "RTAS"
strings.

Note that rtas_restart(), rtas_power_off(), and rtas_halt() all
currently use printk() without specifying a log level. These have been
changed to use pr_emerg(), which matches the behavior of
rtas_os_term().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-9-nathanl@linux.ibm.com


# 9581f8a0 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: clean up includes

rtas.c used to host complex code related to pseries-specific guest
migration and suspend, which used atomics, completions, hcalls, and
CPU hotplug APIs. That's all been deleted or moved, so remove the
include directives that have been rendered unnecessary. Sort the
remainder (with linux/ before asm/) to impose some order on where
future additions go.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-8-nathanl@linux.ibm.com


# c67a0e41 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: clean up rtas_error_log_max initialization

The code in rtas_get_error_log_max() doesn't cause problems in
practice, but there are no measures to ensure that the lazy
initialization of the static rtas_error_log_max variable is atomic,
and it's not worth adding them.

Initialize the static rtas_error_log_max variable at boot when we're
single-threaded instead of lazily on first use. Use the more
appropriate of_property_read_u32() API instead of rtas_token() to
consult the "rtas-error-log-max" property, which is not the name of an
RTAS function. Convert use of printk() to pr_warn() and distinguish
the possible error cases.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-7-nathanl@linux.ibm.com


# 6c606e57 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: avoid scheduling in rtas_os_term()

It's unsafe to use rtas_busy_delay() to handle a busy status from
the ibm,os-term RTAS function in rtas_os_term():

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 2, expected: 0
CPU: 7 PID: 1 Comm: swapper/0 Tainted: G D 6.0.0-rc5-02182-gf8553a572277-dirty #9
Call Trace:
[c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
[c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
[c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
[c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
[c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
[c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
[c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
[c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
[c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
[c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
[c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

Use rtas_busy_delay_time() instead, which signals without side effects
whether to attempt the ibm,os-term RTAS call again.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
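
As a rough illustration of the approach described above, the retry decision
can be modeled as a pure predicate with no sleeping or rescheduling, which is
what makes it usable in panic context. This is a user-space sketch, not the
kernel code: fake_os_term(), status_wants_retry(), and os_term_sketch() are
hypothetical names, and the fake firmware simply reports busy twice before
succeeding.

```c
#include <stdbool.h>

#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905

/* Hypothetical stand-in for the firmware call: busy twice, then success. */
static int fake_calls;

static int fake_os_term(void)
{
	fake_calls++;
	return fake_calls < 3 ? RTAS_BUSY : 0;
}

/*
 * Side-effect-free check: reports whether the status asks for a retry,
 * but never sleeps or reschedules (assumed to mirror the role that
 * rtas_busy_delay_time() plays in the commit above).
 */
static bool status_wants_retry(int status)
{
	return status == RTAS_BUSY ||
	       (status >= RTAS_EXTENDED_DELAY_MIN &&
		status <= RTAS_EXTENDED_DELAY_MAX);
}

/* Retry loop that is safe in atomic context because nothing in it blocks. */
static int os_term_sketch(void)
{
	int status;

	do {
		status = fake_os_term();
	} while (status_wants_retry(status));

	return status;
}
```

The loop spins rather than sleeps, which is acceptable only because the
system is already going down when this path runs.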


# ed2213bf 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: avoid device tree lookups in rtas_os_term()

rtas_os_term() is called during panic. Its behavior depends on a couple
of conditions in the /rtas node of the device tree, the traversal of
which entails locking and local IRQ state changes. If the kernel panics
while devtree_lock is held, rtas_os_term() as currently written could
hang.

Instead of discovering the relevant characteristics at panic time,
cache them in file-static variables at boot. Note the lookup for
"ibm,extended-os-term" is converted to of_property_read_bool() since it
is a boolean property, not an RTAS function token.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Incorporate suggested change from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com


# 336e2554 18-Nov-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: document rtas_call()

rtas_call() has a complex calling convention, non-standard return
values, and many users. Add kernel-doc for it and remove the less
structured commentary from rtas.h.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com


# b8f3e488 26-Sep-2022 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: block error injection when locked down

The error injection facility on pseries VMs allows corruption of
arbitrary guest memory, potentially enabling a sufficiently privileged
user to disable lockdown or perform other modifications of the running
kernel via the rtas syscall.

Block the PAPR error injection facility from being opened or called
when locked down.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com


# f88aabad 07-Sep-2022 Nathan Lynch <nathanl@linux.ibm.com>

Revert "powerpc/rtas: Implement reentrant rtas call"

At the time this was submitted by Leonardo, I confirmed -- or thought
I had confirmed -- with PowerVM partition firmware development that
the following RTAS functions:

- ibm,get-xive
- ibm,int-off
- ibm,int-on
- ibm,set-xive

were safe to call on multiple CPUs simultaneously, not only with
respect to themselves as indicated by PAPR, but with arbitrary other
RTAS calls:

https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/

Recent discussion with firmware development makes it clear that this
is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas:
Implement reentrant rtas call") is unsafe, likely explaining several
strange bugs we've seen in internal testing involving DLPAR and
LPM. These scenarios use ibm,configure-connector, whose internal state
can be corrupted by the concurrent use of the "reentrant" functions,
leading to symptoms like endless busy statuses from RTAS.

Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220907220111.223267-1-nathanl@linux.ibm.com


# 7bc08056 14-Jun-2022 Andrew Donnellan <ajd@linux.ibm.com>

powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address

Add a special case to block_rtas_call() to allow the ibm,platform-dump RTAS
call through the RTAS filter if the buffer address is 0.

According to PAPR, ibm,platform-dump is called with a null buffer address
to notify the platform firmware that processing of a particular dump is
finished.

Without this, on a pseries machine with CONFIG_PPC_RTAS_FILTER enabled, an
application such as rtas_errd that is attempting to retrieve a dump will
encounter an error at the end of the retrieval process.

Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: stable@vger.kernel.org
Reported-by: Sathvika Vasireddy <sathvika@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220614134952.156010-1-ajd@linux.ibm.com
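
The special case described above might look roughly like the following
user-space model. It is a sketch under assumptions: should_block(),
in_user_region(), and the region bounds are hypothetical and only stand in
for the filter's buffer check; they are not the kernel's block_rtas_call().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds of the memory region set aside for user space. */
static const uint64_t region_base = 0x1000;
static const uint64_t region_size = 0x10000;

/* True if [base, base + size) lies entirely within the user region. */
static bool in_user_region(uint64_t base, uint64_t size)
{
	return base >= region_base && size <= region_size &&
	       base - region_base <= region_size - size;
}

/* Returns true if the call should be rejected by the filter. */
static bool should_block(const char *name, uint64_t buf_addr,
			 uint64_t buf_size)
{
	/*
	 * A null buffer address tells firmware that dump processing is
	 * finished, so let it through instead of failing the range check.
	 */
	if (strcmp(name, "ibm,platform-dump") == 0 && buf_addr == 0)
		return false;

	return !in_user_region(buf_addr, buf_size);
}
```

Any other function passing a null buffer is still rejected, so the exception
stays as narrow as the commit message describes.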


# 743cdb7b 19-May-2022 Paul Mackerras <paulus@ozlabs.org>

powerpc/kasan: Mark more real-mode code as not to be instrumented

This marks more files and functions that can possibly be called in
real mode as not to be instrumented by KASAN. Most were found by
inspection, except for get_pseries_errorlog() which was reported as
causing a crash in testing.

Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoX1kZPnmUX4RZEK@cleo


# 804c0a16 08-Mar-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/rtas: ensure rtas_call is called with MMU enabled

rtas_call must not be called with the MMU disabled because in case
of rtas error, log_error is called which requires MMU enabled. Add
a test and warning for this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-14-npiggin@gmail.com


# c5a65e0a 08-Mar-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/rtas: Call enter_rtas with MSR[EE] disabled

Disable MSR[EE] in C code rather than asm.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-5-npiggin@gmail.com


# b6b1c3ce 03-May-2022 Laurent Dufour <ldufour@linux.ibm.com>

powerpc/rtas: Keep MSR[RI] set when calling RTAS

RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big
endian mode (MSR[SF,LE] unset).

The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, which leads to the following stack trace:

watchdog: CPU 24 Hard LOCKUP
watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
...
Supported: No, Unreleased kernel
CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
CFAR: 000000000000011c IRQMASK: 1
GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
NIP [000000001fb41050] 0x1fb41050
LR [000000001fb4104c] 0x1fb4104c
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Oops: Unrecoverable System Reset, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
...
Supported: No, Unreleased kernel
CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
CFAR: 000000000000011c IRQMASK: 1
GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
NIP [000000001fb41050] 0x1fb41050
LR [000000001fb4104c] 0x1fb4104c
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.

RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":

R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
its own critical regions from recursion by setting the MSR[RI] bit to
0 when in the critical regions.

Fix this by reworking the way the MSR is computed before calling RTAS. Now a
hardcoded value meaning real mode, 32-bit big endian mode and Recoverable
Interrupt is loaded. In the case MSR[S] is set, it will remain set while
entering RTAS as only urfid can unset it (thanks Fabiano).

In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
Power KVM Guest
P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
P8 9119-MME (FW860.A1)
p9 9008-22L (FW950.00)
P10 9080-HEX (FW1010.00)

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
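
To make the MSR description above concrete, here is a small user-space
sketch of the bit arithmetic. The bit positions are the usual 64-bit Power
MSR layout. Including MSR[ME] in the entry value is an assumption of this
sketch (the commit message names only real mode, 32-bit big endian, and RI),
and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_LE (1ULL << 0)   /* little endian */
#define MSR_RI (1ULL << 1)   /* recoverable interrupt */
#define MSR_DR (1ULL << 4)   /* data relocation */
#define MSR_IR (1ULL << 5)   /* instruction relocation */
#define MSR_ME (1ULL << 12)  /* machine check enable */
#define MSR_SF (1ULL << 63)  /* 64-bit mode */

/* Hardcoded entry value: real mode, 32-bit, big endian, RI set. */
static uint64_t rtas_entry_msr(void)
{
	return MSR_ME | MSR_RI;	/* ME is an assumption of this sketch */
}

/* Real mode (IR/DR clear), 32-bit (SF clear), big endian (LE clear), RI set. */
static bool msr_valid_for_rtas(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR | MSR_SF | MSR_LE)) == 0 &&
	       (msr & MSR_RI) != 0;
}

/* Models the added check that the caller itself runs with RI set. */
static bool caller_msr_recoverable(uint64_t msr)
{
	return (msr & MSR_RI) != 0;
}
```

Applied to the MSR value in the trace above (8000000002981000), the RI bit
is clear, which matches the reported unrecoverable system reset.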


# e6f6390a 08-Mar-2022 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Add missing headers

Don't inherit headers "by chance" from asm/prom.h, asm/mpc52xx.h,
asm/pci.h etc...

Include the needed headers, and remove asm/prom.h when it was
needed exclusively for pulling necessary headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu


# 7c5ed82b 04-Feb-2022 Sourabh Jain <sourabhjain@linux.ibm.com>

powerpc: Set crashkernel offset to mid of RMA region

On large config LPARs (having 192 or more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel gets some memory below the RMA region, which
used to be 256MB in size. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

This patch is dependent on changes to paca allocation for boot CPU. It
expects the boot CPU to discover 1T segment support, which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com


# dd5cde45 16-Nov-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: rtas_busy_delay_time() kernel-doc

Provide API documentation for rtas_busy_delay_time(), explaining why we
return the same value for 9900 and -2.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com


# 38f7b706 16-Nov-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: rtas_busy_delay() improvements

Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

int rc;

do {
rc = rtas_call(rtas_token("some-function"), ...);
} while (rtas_busy_delay(rc));

/* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

* It performs msleep() for all 990x and even when no delay is
suggested (-2), but this is understood to actually sleep for two jiffies
minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
appear to be the most common extended delay statuses, and the
oversleeping measurably lengthens DLPAR operations, which perform
many RTAS calls.

* It does not sleep on 990x unless need_resched() is true, causing code
like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

* When passed RTAS_BUSY, perform cond_resched() and return without
sleeping. The caller should reattempt immediately.

* Always sleep when passed an extended delay status, using usleep_range()
for precise shorter sleeps. Limit the sleep time to one second even
though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

- 1,935 probe:rtas_call # 0.003 M/sec ( +- 0.22% )
- 609.99 msec task-clock # 0.183 CPUs utilized ( +- 0.19% )
+ 1,956 probe:rtas_call # 0.003 M/sec ( +- 0.17% )
+ 618.56 msec task-clock # 0.278 CPUs utilized ( +- 0.11% )

- 3.3322 +- 0.0670 seconds time elapsed ( +- 2.01% )
+ 2.2222 +- 0.0416 seconds time elapsed ( +- 1.87% )

Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

- 6,224 probe:rtas_call # 0.008 M/sec ( +- 2.57% )
- 750.36 msec task-clock # 0.190 CPUs utilized ( +- 2.01% )
+ 843 probe:rtas_call # 0.003 M/sec ( +- 0.12% )
+ 250.66 msec task-clock # 0.068 CPUs utilized ( +- 0.17% )

- 3.9394 +- 0.0890 seconds time elapsed ( +- 2.26% )
+ 3.678 +- 0.113 seconds time elapsed ( +- 3.07% )

With all CPUs 100% busy (- before, + after):

Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

- 2,979 probe:rtas_call # 0.003 M/sec ( +- 0.12% )
- 1,096.62 msec task-clock # 0.105 CPUs utilized ( +- 0.10% )
+ 2,981 probe:rtas_call # 0.003 M/sec ( +- 0.22% )
+ 1,095.26 msec task-clock # 0.154 CPUs utilized ( +- 0.21% )

- 10.476 +- 0.104 seconds time elapsed ( +- 1.00% )
+ 7.1124 +- 0.0865 seconds time elapsed ( +- 1.22% )

Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

- 2,702 probe:rtas_call # 0.004 M/sec ( +- 4.00% )
- 722.71 msec task-clock # 0.067 CPUs utilized ( +- 2.41% )
+ 1,246 probe:rtas_call # 0.003 M/sec ( +- 0.25% )
+ 487.73 msec task-clock # 0.049 CPUs utilized ( +- 0.20% )

- 10.829 +- 0.163 seconds time elapsed ( +- 1.51% )
+ 9.9887 +- 0.0866 seconds time elapsed ( +- 0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
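
The revised policy described above can be sketched as a small decision
function. This is a user-space model, not the kernel implementation:
plan_retry(), struct retry_plan, and the enum are hypothetical names, and the
hint formula (10 to the power of status minus 9900, in milliseconds) is
assumed from the 9900 (1ms) and 9901 (10ms) values quoted in the message.

```c
#define RTAS_BUSY               (-2)
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905
#define MAX_SLEEP_MS            1000	/* cap: one second */

enum retry_action { RETRY_NONE, RETRY_RESCHED, RETRY_SLEEP };

struct retry_plan {
	enum retry_action action;
	unsigned int sleep_ms;	/* meaningful only for RETRY_SLEEP */
};

/* Suggested delay for a 990x status: 1ms, 10ms, 100ms, ... */
static unsigned int extended_delay_hint_ms(int status)
{
	unsigned int ms = 1;

	for (int order = status - RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}

/* Decide what the caller's retry loop should do for a given status. */
static struct retry_plan plan_retry(int status)
{
	struct retry_plan plan = { RETRY_NONE, 0 };

	if (status == RTAS_BUSY) {
		/* Retry right away, yielding the CPU only if needed. */
		plan.action = RETRY_RESCHED;
	} else if (status >= RTAS_EXTENDED_DELAY_MIN &&
		   status <= RTAS_EXTENDED_DELAY_MAX) {
		/* Always sleep, but never longer than the cap. */
		plan.action = RETRY_SLEEP;
		plan.sleep_ms = extended_delay_hint_ms(status);
		if (plan.sleep_ms > MAX_SLEEP_MS)
			plan.sleep_ms = MAX_SLEEP_MS;
	}
	return plan;
}
```

In the kernel, the RETRY_RESCHED case corresponds to cond_resched() and the
RETRY_SLEEP case to a usleep_range()-style sleep, as the message explains.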


# 53cadf7d 16-Nov-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: kernel-doc fixes

Fix the following issues reported by kernel-doc:

$ scripts/kernel-doc -v -none arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/rtas.c:810: info: Scanning doc for function rtas_activate_firmware
arch/powerpc/kernel/rtas.c:818: warning: contents before sections
arch/powerpc/kernel/rtas.c:841: info: Scanning doc for function rtas_call_reentrant
arch/powerpc/kernel/rtas.c:893: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Find a specific pseries error log in an RTAS extended event log.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116215806.928235-1-nathanl@linux.ibm.com


# c0891ac1 02-Aug-2021 Alexey Dobriyan <adobriyan@gmail.com>

isystem: ship and use stdarg.h

Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>.
stdarg.h is the only userspace header commonly used in the kernel.

GPL 2 version of <stdarg.h> can be extracted from
http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 59dc5bfc 17-Jun-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: avoid reloading (H)SRR registers if they are still valid

When an interrupt is taken, the SRR registers are set to return to where
it left off. Unless they are modified in the meantime, or the return
address or MSR are modified, there is no need to reload these registers
when returning from interrupt.

Introduce per-CPU flags that track the validity of SRR and HSRR
registers. These are cleared when returning from interrupt, when
using the registers for something else (e.g., OPAL calls), when
adjusting the return address or MSR of a context, and when context
switching (which changes the return address and MSR).

This improves the performance of interrupt returns.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fixup patch from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com


# e5d56763 08-Apr-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE

RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-6-nathanl@linux.ibm.com


# 0649cdc8 08-Apr-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: move syscall filter setup into separate function

Reduce conditionally compiled sections within rtas_initialize() by
moving the filter table initialization into its own function already
guarded by CONFIG_PPC_RTAS_FILTER. No behavior change intended.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-5-nathanl@linux.ibm.com


# 0ab1c929 08-Apr-2021 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: remove ibm_suspend_me_token

There's not a compelling reason to cache the value of the token for
the ibm,suspend-me function. Just look it up when needed in the RTAS
syscall's special case for it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-4-nathanl@linux.ibm.com


# f10881a4 08-Dec-2020 Tyrel Datwyler <tyreld@linux.ibm.com>

powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter

Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
introduced the following error when invoking the errinjct userspace
tool:

[root@ltcalpine2-lp5 librtas]# errinjct open
[327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
[327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
errinjct: Could not open RTAS error injection facility
errinjct: librtas: open: Unexpected I/O error

The entry for ibm,open-errinjct in rtas_filter array has a typo where
the "j" is omitted in the rtas call name. After fixing this typo the
errinjct tool functions again as expected.

[root@ltcalpine2-lp5 linux]# errinjct open
RTAS error injection facility open, token = 1

Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com


# 1b248817 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: remove unused rtas_suspend_last_cpu()

rtas_suspend_last_cpu() is now unused, remove it and
__rtas_suspend_last_cpu() which also becomes unused.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-24-nathanl@linux.ibm.com


# 395b2c09 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: remove rtas_suspend_cpu()

rtas_suspend_cpu() no longer has users; remove it and
__rtas_suspend_cpu() which now becomes unused as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-22-nathanl@linux.ibm.com


# 5f6665e4 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()

rtas_ibm_suspend_me_unsafe() is now unused; remove it and
rtas_percpu_suspend_me() which becomes unused as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-17-nathanl@linux.ibm.com


# 4d756894 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: dispatch partition migration requests to pseries

sys_rtas() cannot call ibm,suspend-me directly in the same way it
handles other inputs. Instead it must dispatch the request to code
that can first perform the H_JOIN sequence before any call to
ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
amount of platform-specific code to implement this.

Since a different, more robust implementation of the suspend sequence
is now in the pseries platform code, we want to dispatch the request
there.

Note that invoking ibm,suspend-me via the RTAS syscall is all but
deprecated; this change preserves ABI compatibility for old programs
while providing to them the benefit of the new partition suspend
implementation. This is a behavior change in that the kernel performs
the device tree update and firmware activation before returning, but
experimentation indicates this is tolerated fine by legacy user space.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-16-nathanl@linux.ibm.com


# 5f485a66 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: add rtas_activate_firmware()

Provide a documented wrapper function for the ibm,activate-firmware
service, which must be called after a partition migration or
hibernation.

If the function is absent or the call fails, the OS will continue to
run normally with the current firmware, so there is no need to perform
any recovery. Just log it and continue.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-6-nathanl@linux.ibm.com


# 701ba683 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: add rtas_ibm_suspend_me()

Now that the name is available, provide a simple wrapper for
ibm,suspend-me which returns both a Linux errno and optionally the
actual RTAS status to the caller.
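
The contract described above can be sketched as follows. This is a minimal
model, not the kernel's code: the function name suspend_me_result() and the
"zero is success, anything else is -EIO" mapping are assumptions made purely
for illustration.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the wrapper's contract: translate a firmware
 * status into a Linux errno and, if the caller asked for it, also
 * hand back the raw status. The 0 / nonzero mapping is an assumption.
 */
static int suspend_me_result(int fw_status, int *fw_status_out)
{
	if (fw_status_out != NULL)
		*fw_status_out = fw_status;	/* optional raw RTAS status */

	return fw_status == 0 ? 0 : -EIO;
}
```

A caller that only cares about success or failure passes NULL for the second
argument; a caller that needs to distinguish firmware outcomes passes a
pointer and inspects the raw status.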

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-5-nathanl@linux.ibm.com


# 7049b288 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe

The pseries partition suspend sequence requires that all active CPUs
call H_JOIN, which suspends all but one of them with interrupts
disabled. The "chosen" CPU is then to call ibm,suspend-me to complete
the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to
use H_PROD to wake the joined CPUs.

Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to
implement partition migration, is susceptible to deadlock with other
users of on_each_cpu() and with users of stop_machine APIs. The
callback passed to on_each_cpu() is not allowed to synchronize with
other CPUs in the way it is used here.

Complicating the fix is the fact that rtas_ibm_suspend_me() also
occupies the function name that should be used to provide a more
conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me()
to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that
it should not gain users.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-4-nathanl@linux.ibm.com


# de0f7349 07-Dec-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: prevent suspend-related sys_rtas use on LE

While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.

drmgr is the only known (or plausible) user of ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-2-nathanl@linux.ibm.com


# bd59380c 19-Aug-2020 Andrew Donnellan <ajd@linux.ibm.com>

powerpc/rtas: Restrict RTAS requests from userspace

A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.

The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).

Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.

In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point, however we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.

The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.
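
As a rough illustration of the filtering idea (not the kernel's actual
implementation), the sketch below checks a call name against a small
allow-list and requires any buffer argument to lie within the RMO buffer.
The struct layout, the helper names in_rmo_buf() and call_permitted(), and
the two table entries are hypothetical and chosen only for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical allow-list entry: a call name plus the index of its
 * buffer-address argument and buffer-size argument (-1 if none).
 */
struct filter_entry {
	const char *name;
	int buf_idx;
	int size_idx;
};

/* Illustrative table only; the real list is derived from librtas usage. */
static const struct filter_entry allowed[] = {
	{ "get-time-of-day", -1, -1 },
	{ "ibm,get-system-parameter", 1, 2 },
};

/* True if [base, base + size) lies entirely inside the RMO buffer. */
static bool in_rmo_buf(uint32_t base, uint32_t size,
		       uint32_t rmo_base, uint32_t rmo_size)
{
	uint64_t end = (uint64_t)base + size;	/* 64-bit avoids wraparound */

	return base >= rmo_base && end <= (uint64_t)rmo_base + rmo_size;
}

static bool call_permitted(const char *name, const uint32_t *args, int nargs,
			   uint32_t rmo_base, uint32_t rmo_size)
{
	for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		const struct filter_entry *f = &allowed[i];
		uint32_t size;

		if (strcmp(f->name, name) != 0)
			continue;
		if (f->buf_idx < 0)
			return true;	/* no buffer argument to check */
		if (f->buf_idx >= nargs || f->size_idx >= nargs)
			return false;	/* malformed call */

		/* With no size argument, require at least one byte in range. */
		size = f->size_idx >= 0 ? args[f->size_idx] : 1;
		return in_rmo_buf(args[f->buf_idx], size, rmo_base, rmo_size);
	}
	return false;			/* not on the allow-list */
}
```

The range check is done in 64-bit arithmetic so that a large base plus size
cannot wrap around and appear to fall inside the buffer.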

Cc: stable@vger.kernel.org
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200820044512.7543-1-ajd@linux.ibm.com


# ec2fc2a9 11-Jun-2020 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: don't online CPUs for partition suspend

Partition suspension, used for hibernation and migration, requires
that the OS place all but one of the LPAR's processor threads into one
of two states prior to calling the ibm,suspend-me RTAS function:

* the architected offline state (via RTAS stop-self); or
* the H_JOIN hcall, which does not return until the partition
resumes execution

Using H_CEDE as the offline mode, introduced by
commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into
an appropriate offline state"), means that any threads which are
offline from Linux's point of view must be moved to one of those two
states before a partition suspension can proceed.

This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring
all threads online prior to migration/hibernation"), which added code
to temporarily bring up any offline processor threads so they can call
H_JOIN. Conceptually this is fine, but the implementation has had
multiple races with cpu hotplug operations initiated from user
space[1][2][3], the error handling is fragile, and it generates
user-visible cpu hotplug events which is a lot of noise for a platform
feature that's supposed to minimize disruption to workloads.

With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU
into an appropriate offline state") reverted, this code becomes
unnecessary, so remove it. Since any offline CPUs now are truly
offline from the platform's point of view, it is no longer necessary
to bring up CPUs only to have them call H_JOIN and then go offline
again upon resuming. Only active threads are required to call H_JOIN;
stopped threads can be left alone.

[1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
serialization during LPM")
[2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races
with suspend/migration")
[3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between
CPU-Offline & Migration")

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com


# b664db8e 18-May-2020 Leonardo Bras <leobras.c@gmail.com>

powerpc/rtas: Implement reentrant rtas call

Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off", "ibm,get-xive" and "ibm,set-xive".

On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:

2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.

So, these rtas-calls can be called in a lockless way, if using
a different buffer for each cpu doing such rtas call.

For this, it was suggested to add the buffer (struct rtas_args)
in the PACA struct, so each cpu can have its own buffer.
The PACA struct received a pointer to rtas buffer, which is
allocated in the memory range available to rtas 32-bit.
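
The per-cpu buffer idea can be sketched roughly as below. This is a
simplified user-space model: the names sketch_rtas_args, percpu_args and
prepare_reentrant_call() are hypothetical, the CPU count is made up, and the
endian conversion and 32-bit addressing constraints that real RTAS calls
need are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count */

/* Simplified stand-in for the kernel's struct rtas_args. */
struct sketch_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[16];
};

/* One argument buffer per CPU, so simultaneous calls never share one. */
static struct sketch_rtas_args percpu_args[SKETCH_NR_CPUS];

/* Fill this CPU's private buffer for a one-input, one-output call. */
static struct sketch_rtas_args *prepare_reentrant_call(unsigned int cpu,
							uint32_t token,
							uint32_t arg)
{
	struct sketch_rtas_args *a;

	if (cpu >= SKETCH_NR_CPUS)
		return NULL;

	a = &percpu_args[cpu];
	a->token = token;
	a->nargs = 1;
	a->nret = 1;
	a->args[0] = arg;
	return a;	/* no global lock was needed to get here */
}
```

Because each CPU only ever touches its own slot, two CPUs can prepare calls
at the same time without serializing on a shared lock.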

Reentrant rtas calls are useful to avoid deadlocks in the crash path,
where rtas-calls are needed but some other thread may have crashed
while holding rtas.lock.

This is a backtrace of a deadlock from a kdump testing environment:

#0 arch_spin_lock
#1 lock_rtas ()
#2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
#3 ics_rtas_mask_real_irq (hw_irq=4100)
#4 machine_kexec_mask_interrupts
#5 default_machine_crash_shutdown
#6 machine_crash_shutdown
#7 __crash_kexec
#8 crash_kexec
#9 oops_end

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
[mpe: Move under #ifdef PSERIES to avoid build breakage]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com


# 10e4850d 02-Aug-2019 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: allow rescheduling while changing cpu states

rtas_cpu_state_change_mask() potentially operates on scores of cpus,
so explicitly allow rescheduling in the loop body.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-3-nathanl@linux.ibm.com


# a6717c01 02-Aug-2019 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: use device model APIs and serialization during LPM

The LPAR migration implementation and userspace-initiated cpu hotplug
can interleave their executions like so:

1. Set cpu 7 offline via sysfs.

2. Begin a partition migration, whose implementation requires the OS
to ensure all present cpus are online; cpu 7 is onlined:

rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up

This sets cpu 7 online in all respects except for the cpu's
corresponding struct device; dev->offline remains true.

3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
already online and returns success. The driver core (device_online)
sets dev->offline = false.

4. The migration completes and restores cpu 7 to offline state:

rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down

This leaves cpu7 in a state where the driver core considers the cpu
device online, but in all other respects it is offline and
unused. Attempts to online the cpu via sysfs appear to succeed but the
driver core actually does not pass the request to the lower-level
cpuhp support code. This makes the cpu unusable until the cpu device
is manually set offline and then online again via sysfs.

Instead of directly calling cpu_up/cpu_down, the migration code should
use the higher-level device core APIs to maintain consistent state and
serialize operations.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com


# ae2e953f 18-Jul-2019 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: Unexport rtas_online_cpus_mask, rtas_offline_cpus_mask

These aren't used by modular code, nor should they be.

Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190718162214.5694-1-nathanl@linux.ibm.com


# 9fb60305 21-Jun-2019 Nathan Lynch <nathanl@linux.ibm.com>

powerpc/rtas: retry when cpu offline races with suspend/migration

The protocol for suspending or migrating an LPAR requires all present
processor threads to enter H_JOIN. So if we have threads offline, we
have to temporarily bring them up. This can race with administrator
actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
Fix a potential race between CPU-Offline & Migration"),
rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
for what almost certainly is a transient condition in any reasonable
scenario.

Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
returned, and it is typical during a migration for that to happen
repeatedly for several minutes polling the H_VASI_STATE hcall result
before proceeding to the next stage.

So return -EAGAIN instead of -EBUSY when this race is
encountered. Additionally: logging this event is still appropriate but
use pr_info instead of pr_err; and remove use of unlikely() while here
as this is not a hot path at all.
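
The caller-side behavior this relies on can be sketched as a bounded retry
loop. The names suspend_with_retry() and fake_attempt(), the callback type,
and the retry bound are all hypothetical; real callers poll firmware state
between attempts rather than looping immediately.

```c
#include <errno.h>

/* Hypothetical callback type standing in for one suspend attempt. */
typedef int (*suspend_attempt_fn)(void *ctx);

/* Keep retrying while the attempt reports a transient -EAGAIN. */
static int suspend_with_retry(suspend_attempt_fn attempt, void *ctx,
			      int max_tries)
{
	int rc = -EAGAIN;

	for (int i = 0; i < max_tries && rc == -EAGAIN; i++)
		rc = attempt(ctx);

	return rc;
}

/* Demo stub: reports -EAGAIN until the counter in ctx reaches zero. */
static int fake_attempt(void *ctx)
{
	int *transient_failures = ctx;

	if (*transient_failures > 0) {
		(*transient_failures)--;
		return -EAGAIN;
	}
	return 0;
}
```

With -EBUSY the loop above would stop on the first race; with -EAGAIN the
transient condition is simply retried.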

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# caa75932 13-Jun-2019 Nadav Amit <namit@vmware.com>

smp: Remove smp_call_function() and on_each_cpu() return values

The return value is fixed. Remove it and amend the callers.

[ tglx: Fixup arm/bL_switcher and powerpc/rtas ]

Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20190613064813.8102-2-namit@vmware.com


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0ba9e6ed 12-Mar-2019 Mike Rapoport <rppt@kernel.org>

memblock: drop memblock_alloc_base()

The memblock_alloc_base() function tries to allocate memory up to the
limit specified by its max_addr parameter and panics if the allocation
fails. Replace its usage with memblock_phys_alloc_range() and make the
callers check the return value and panic in case of error.

Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com> [c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com> [Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dfd718a2 01-Oct-2018 Gautham R. Shenoy <ego@linux.vnet.ibm.com>

powerpc/rtas: Fix a potential race between CPU-Offline & Migration

Live Partition Migrations require all the present CPUs to execute the
H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
before initiating the migration for this purpose.

The commit 85a88cabad57
("powerpc/pseries: Disable CPU hotplug across migrations")
disables any CPU-hotplug operations once all the offline CPUs are
brought online to prevent any further state change. Once the
CPU-Hotplug operation is disabled, the code assumes that all the CPUs
are online.

However, there is a minor window in rtas_ibm_suspend_me() between
onlining the offline CPUs and disabling CPU-Hotplug when a concurrent
CPU-offline operation initiated by userspace can succeed, thereby
nullifying the aforementioned assumption. In this unlikely case
these offlined CPUs will not call H_JOIN, resulting in a system hang.

Fix this by verifying that all the present CPUs are actually online
after CPU-Hotplug has been disabled, failing which we restore the
state of the offline CPUs in rtas_ibm_suspend_me() and return an
-EBUSY.
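
The verification step can be modeled with plain bitmasks, one bit per CPU.
This is only a sketch: the function name and the 64-bit mask representation
are assumptions, and the kernel uses its cpumask APIs rather than raw
integers.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model: one bit per CPU. Returns -EBUSY if any present
 * CPU is not online, since such a CPU would never call H_JOIN.
 */
static int check_present_cpus_online(uint64_t present_mask,
				     uint64_t online_mask)
{
	if (present_mask & ~online_mask)
		return -EBUSY;
	return 0;
}
```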

Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 65b9fdad 09-Oct-2018 Michael Bringmann <mwb@linux.vnet.ibm.com>

powerpc/pseries/mobility: Extend start/stop topology update scope

The powerpc mobility code may receive RTAS requests to perform PRRN
(Platform Resource Reassignment Notification) topology changes at any
time, including during LPAR migration operations.

In some configurations where the affinity of CPUs or memory is being
changed on that platform, the PRRN requests may apply or refer to
outdated information prior to the complete update of the device-tree.

This patch changes the duration for which topology updates are
suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
/ 'ibm,suspend-me' call(s) to cover the entire migration_store()
operation to allow all changes to the device-tree to be applied prior
to accepting and applying any PRRN requests.

For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 85a88cab 17-Sep-2018 Nathan Fontenot <nfont@linux.vnet.ibm.com>

powerpc/pseries: Disable CPU hotplug across migrations

When performing partition migrations all present CPUs must be online
as all present CPUs must make the H_JOIN call as part of the migration
process. Once all present CPUs make the H_JOIN call, one CPU is returned
to make the rtas call to perform the migration to the destination system.

During testing of migration and changing the SMT state we have found
instances where CPUs are offlined, as part of the SMT state change,
before they make the H_JOIN call. This results in a hung system where
every CPU is either in H_JOIN or offline.

To prevent this, this patch disables CPU hotplug during the migration
process.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ac851744 19-Jun-2018 Paul Burton <paulburton@kernel.org>

powerpc: Remove -Wattribute-alias pragmas

With SYSCALL_DEFINEx() disabling -Wattribute-alias generically, there's
no need to duplicate that for PowerPC syscalls.

This reverts commit 415520373975 ("powerpc: fix build failure by
disabling attribute-alias warning in pci_32") and commit 2479bfc9bc60
("powerpc: Fix build by disabling attribute-alias warning for
SYSCALL_DEFINEx").

Signed-off-by: Paul Burton <paul.burton@mips.com>
Acked-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


# 2479bfc9 29-May-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Fix build by disabling attribute-alias warning for SYSCALL_DEFINEx

GCC 8.1 emits warnings such as the following. As arch/powerpc code is
built with -Werror, this breaks the build with GCC 8.1.

In file included from arch/powerpc/kernel/pci_64.c:23:
./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias
between functions of incompatible types 'long int(long int, long
unsigned int, long unsigned int)' and 'long int(long int, long int,
long int)' [-Werror=attribute-alias]
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)

This patch inhibits those warnings.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4c392e65 02-May-2018 Al Viro <viro@zeniv.linux.org.uk>

powerpc/syscalls: switch rtas(2) to SYSCALL_DEFINE

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 58788a9b 17-Oct-2017 Will Deacon <will@kernel.org>

locking/arch, powerpc/rtas: Use arch_spin_lock() instead of arch_spin_lock_flags()

arch_spin_lock_flags() is an internal part of the spinlock implementation
and is no longer available when SMP=n and DEBUG_SPINLOCK=y, so the PPC
RTAS code fails to compile in this configuration:

arch/powerpc/kernel/rtas.c: In function 'lock_rtas':
>> arch/powerpc/kernel/rtas.c:81:2: error: implicit declaration of function 'arch_spin_lock_flags' [-Werror=implicit-function-declaration]
arch_spin_lock_flags(&rtas.lock, flags);
^~~~~~~~~~~~~~~~~~~~

Since there's no good reason to use arch_spin_lock_flags() here (the code
in question already calls local_irq_save(flags)), switch it over to
arch_spin_lock and get things building again.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508327469-20231-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0ee931c4 13-Sep-2017 Michal Hocko <mhocko@suse.com>

mm: treewide: remove GFP_TEMPORARY allocation flag

GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing callers provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect short-term
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8b257783 23-Jan-2017 Gavin Shan <gwshan@linux.vnet.ibm.com>

powerpc/kernel: Fix unbalanced refcount on RTAS device node

The RTAS device-tree node's refcount is increased by one in the call
to of_find_node_by_name(), but it is not decreased by one in the
error path. This leads to an unbalanced refcount on the RTAS
device-tree node.

This fixes the above issue by decreasing the RTAS device-tree node's
refcount in the error path.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# de6d2d1b 23-Jan-2017 Gavin Shan <gwshan@linux.vnet.ibm.com>

powerpc/kernel: Use of_property_read_u32() in rtas_initialize()

This uses of_property_read_u32() in rtas_initialize() so that we
needn't explicitly handle the CPU's endianness.
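
For context, device-tree cells are stored big-endian, so a caller reading
the raw property has to convert them by hand. The sketch below shows that
kind of manual decoding, which a helper such as of_property_read_u32()
spares the caller from. read_be32_cell() is a hypothetical name and this is
not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Decode a 32-bit big-endian cell byte by byte, so the result is the
 * same on little-endian and big-endian hosts.
 */
static uint32_t read_be32_cell(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```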

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# dbecd509 23-Jan-2017 Gavin Shan <gwshan@linux.vnet.ibm.com>

powerpc/kernel: Remove nested if statements in rtas_initialize()

This removes the unnecessary nested if statements in function
rtas_initialize(), to simplify the code. No functional changes
introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 95ec77c0 11-Jul-2016 Daniel Axtens <dja@axtens.net>

powerpc: Make ppc_md.{halt, restart} __noreturn

powernv marks its halt and restart calls as __noreturn. However,
ppc_md does not have this annotation. Add the annotation to ppc_md,
and then to every halt/restart function that is missing it.

Additionally, I have verified that all of these functions do not
return. Occasionally I have added a spin loop to be sure.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 484cc1ed 04-Jul-2016 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/rtas: Don't test for machine type in rtas_initialize()

The test is unnecessary; FW_FEATURE_LPAR is sufficient, as no other
LPAR type has RTAS.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a9862c74 18-Mar-2016 Andrew Donnellan <andrew.donnellan@au1.ibm.com>

powerpc/rtas: Fix array overrun in ppc_rtas() syscall

If ppc_rtas() is called with args.nargs == 16 and args.nret == 0,
args.rets is set to point to &args.args[16], which is beyond the end of
the args.args array. This results in a minor read overrun of the array
when we check the first return code (which, per PAPR, is a required
output of all RTAS calls) to see if there's been a hardware error.

Change the nargs/nret check to ensure nargs is <= 15, allowing room for
the status code. Users shouldn't be calling with nret == 0, but there's
no real harm if they do, so we don't stop them.
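
The bounds logic can be sketched as follows, assuming a 16-entry args array
as described above. The struct is a simplified stand-in for the kernel's
struct rtas_args, and the names sketch_rtas_args, rtas_counts_valid() and
rtas_rets() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ARGS 16	/* size of the args array */

/* Simplified stand-in for the kernel's struct rtas_args. */
struct sketch_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[SKETCH_MAX_ARGS];
};

/* Reject counts that would leave no slot for the status word. */
static bool rtas_counts_valid(uint32_t nargs, uint32_t nret)
{
	return nargs < SKETCH_MAX_ARGS &&
	       nret <= SKETCH_MAX_ARGS &&
	       nargs + nret <= SKETCH_MAX_ARGS;
}

/* Returns where the outputs start, or NULL if the counts are rejected. */
static uint32_t *rtas_rets(struct sketch_rtas_args *a)
{
	if (!rtas_counts_valid(a->nargs, a->nret))
		return NULL;
	return &a->args[a->nargs];	/* always inside args[] here */
}
```

With nargs capped at 15, the first output slot is always a valid array
element, so reading the status word cannot run past the end even when nret
is 0.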

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# cd5cdeb6 24-Nov-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc/rtas: Make enter_rtas() private

There are no longer any users of enter_rtas() outside of rtas.c, so make
it "private", by moving the declaration inside rtas.c. Hopefully this
will encourage people to use one of the wrappers which takes the sharp
edges off the RTAS calling sequence.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4456f452 24-Nov-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()

Although call_rtas_display_status() does actually want to use the
regular RTAS locking, it doesn't want the extra logic that is in
rtas_call(), so currently it open codes the logic.

Instead we can use rtas_call_unlocked(), after taking the RTAS lock.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 209eb4e5 16-Dec-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc/rtas: Add rtas_call_unlocked()

Most users of RTAS (Run-Time Abstraction Services) use rtas_call(),
which deals with locking as well as endian handling.

However we have two users outside of rtas.c that can't use rtas_call()
because they have different locking requirements.

The hotplug CPU code can't take the RTAS lock because the CPU would go
offline with the lock held and no other CPUs would be able to call RTAS
until the CPU came back online.

The xmon code doesn't want to take the lock because it would risk
deadlocking when we are trying to recover from a crash.

Both sites required multiple patches when we added little endian
support, proving that programmers can't do endian right.

Although that ship has sailed, we can still clean the code up by
providing an unlocked version of rtas_call() which avoids the need to
open code the logic elsewhere.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
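
The split between a locked wrapper and an unlocked core, as described in 209eb4e5, can be sketched like this. It is a userspace illustration only: fw_call(), fw_call_unlocked() and the C11 atomic_flag spinlock are hypothetical stand-ins, and the arithmetic in the core is a placeholder for the real firmware entry.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the RTAS lock: a trivial C11 spinlock. */
static atomic_flag call_lock = ATOMIC_FLAG_INIT;

/*
 * Hypothetical core routine: does the marshalling and the call itself
 * but takes no lock, so callers with special locking needs can use it.
 */
int fw_call_unlocked(uint32_t token, uint32_t arg)
{
	return (int)(token + arg);  /* placeholder for the firmware call */
}

/* Regular entry point: the same work, serialized by the lock. */
int fw_call(uint32_t token, uint32_t arg)
{
	int rc;

	while (atomic_flag_test_and_set_explicit(&call_lock,
						 memory_order_acquire))
		;  /* spin until the lock is free */

	rc = fw_call_unlocked(token, arg);

	atomic_flag_clear_explicit(&call_lock, memory_order_release);
	return rc;
}
```

The point of the design is that the shared logic lives in one place, and only the locking policy differs between the two entry points.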


# 8832317f 16-Oct-2015 Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

powerpc/rtas: Validate rtas.entry before calling enter_rtas()

Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls the rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making the enter_rtas() call.

Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000
CFAR: c000000000009c0c SOFTE: 0
NIP [0000000000000000] (null)
LR [0000000000009c14] 0x9c14
Call Trace:
[c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
[c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
[c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org # v3.2+
Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
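
The guard added by 8832317f amounts to refusing the call when no firmware entry point was ever set up. A minimal sketch follows; struct fw_state, guarded_enter() and fake_entry() are hypothetical names, and the -EINVAL return value is an assumption for the example.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*fw_entry_fn)(void *args);

/* Hypothetical stand-in for the global firmware state. */
struct fw_state {
	fw_entry_fn entry;  /* NULL when the platform has no such firmware */
};

/* Demo entry point used only to exercise the guard. */
int fake_entry(void *args)
{
	(void)args;
	return 0;
}

/* Jump to the firmware entry only if one was discovered at boot. */
int guarded_enter(const struct fw_state *fw, void *args)
{
	if (fw == NULL || fw->entry == NULL)
		return -EINVAL;  /* assumed error code */

	return fw->entry(args);
}
```

Without the check, calling through a NULL entry produces exactly the NIP 0000000000000000 seen in the oops above.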


# 1c2cb594 16-Jul-2015 Thomas Huth <thuth@redhat.com>

powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS to become ready if
necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupt handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
[c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
[c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
[c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
[c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
[c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
[c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
[c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
[c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
[c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
[c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
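
The idea behind a "fast" variant in 1c2cb594 is that a busy status is never waited on; it is treated as a failure instead. The sketch below illustrates that under stated assumptions: the status values (-2 for busy, 9900 to 9905 for extended delays), the helper name sensor_status_fast() and the choice of -EIO are all illustrative, not taken from the patch.

```c
#include <errno.h>

#define ST_BUSY           (-2)   /* assumed "busy, retry" status */
#define ST_EXT_DELAY_MIN  9900   /* assumed extended-delay range */
#define ST_EXT_DELAY_MAX  9905

/*
 * Map a firmware status for use in atomic context: anything that would
 * require waiting becomes an error, everything else passes through.
 */
int sensor_status_fast(int fw_status)
{
	if (fw_status == ST_BUSY ||
	    (fw_status >= ST_EXT_DELAY_MIN && fw_status <= ST_EXT_DELAY_MAX))
		return -EIO;  /* cannot sleep here, so give up */

	return fw_status;
}
```

A caller in interrupt context can then rely on the function never sleeping, at the cost of only being usable for sensors that do not report a busy condition.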


# 9ef03193 22-Jul-2015 Thomas Huth <thuth@redhat.com>

powerpc/rtas: Replace magic values with defines

rtas.h already has some nice #defines for RTAS return status
codes - let's use them instead of hard-coded "magic" values!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f691fa10 29-Mar-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Replace mem_init_done with slab_is_available()

We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c03e7374 27-Mar-2015 Tyrel Datwyler <tyreld@linux.vnet.ibm.com>

powerpc/pseries: Simplify check for suspendability during suspend/migration

During a suspend/migration operation we must wait for the VASI state reported
by the hypervisor to become Suspending before making the ibm,suspend-me
RTAS call. Callers of rtas_ibm_suspend_me() pass a vasi_state variable
that exposes the VASI state to the caller. This is unnecessary, as the caller
only really cares about three conditions: on error we should bail out;
on success we have suspended and woken back up, so we proceed to the device
tree update; or we are not suspendable yet, so we try calling
rtas_ibm_suspend_me() again shortly.

This patch removes the extraneous vasi_state variable and simply uses the
return code to communicate how to proceed. We either succeed, fail, or get
-EAGAIN in which case we sleep for a second before trying to call
rtas_ibm_suspend_me again. The behaviour of ppc_rtas() remains the same,
but migrate_store() now returns the propagated error code on failure.
Previously -1 was returned from migrate_store() in the failure case which
equates to -EPERM and was clearly wrong.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenont <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
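
The three-way outcome described in c03e7374 (succeed, fail, or -EAGAIN and retry) can be sketched as a small retry loop. The names suspend_with_retry() and demo_try_suspend(), the callback types, and the max_tries bound are assumptions added for the example; the commit itself only says the caller sleeps for a second before trying again.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*suspend_fn)(void *ctx);
typedef void (*wait_fn)(void);

/* Demo stub: "not suspendable yet" until the counter reaches zero. */
int demo_try_suspend(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}

/*
 * Keep calling try_suspend() while it reports -EAGAIN, pausing between
 * attempts. Any other result (0 or a real error) ends the loop.
 */
int suspend_with_retry(suspend_fn try_suspend, void *ctx,
		       wait_fn wait, int max_tries)
{
	int rc = -EAGAIN;

	for (int i = 0; i < max_tries && rc == -EAGAIN; i++) {
		if (i > 0 && wait != NULL)
			wait();  /* e.g. sleep for a second */
		rc = try_suspend(ctx);
	}
	return rc;
}
```

Because the return code alone drives the loop, the caller no longer needs to inspect any intermediate state.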


# 3df76a9d 20-Jan-2015 Cyril Bur <cyrilbur@gmail.com>

powerpc/pseries: Fix endian problems with LE migration

RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.

The "ibm,suspend-me" 'RTAS' call makes a sequence of hypercalls to set up
one true RTAS call. This means that "ibm,suspend-me" is handled
specially in the ppc_rtas() syscall.

The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend-me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me()),
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
is returned in cpu endian and is left unconverted, which is also a bug on
little endian systems.

rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.

migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.

This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code paths.

For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 14ed7409 17-Sep-2014 Anton Blanchard <anton@samba.org>

powerpc: Remove some old bootmem related comments

Now bootmem is gone from powerpc we can remove comments mentioning it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9d0c4dfe 01-Apr-2014 Rob Herring <robh@kernel.org>

of/fdt: update of_get_flat_dt_prop in prep for libfdt

Make of_get_flat_dt_prop arguments compatible with libfdt fdt_getprop
call in preparation to convert FDT code to use libfdt. Make the return
value const and the property length ptr type an int.

Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>


# a08a53ea 04-Apr-2014 Greg Kurz <groug@kaod.org>

powerpc/le: Enable RTAS events support

The current kernel code assumes big endian and parses RTAS events all
wrong. The most visible effect is that we cannot honor EPOW events,
meaning, for example, we cannot shut down a guest properly from the
hypervisor.

This new patch is largely inspired by Nathan's work: we get rid of all
the bit fields in the RTAS event structures (even the unused ones, for
consistency). We also introduce endian safe accessors for the fields used
by the kernel (trivial rtas_error_type() accessor added for consistency).

Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
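
Replacing bit fields with endian-safe accessors, as a08a53ea describes, can be sketched as follows. The header layout here (a version byte, one packed byte holding severity, disposition and an "extended" flag, then a 32-bit big endian length) and all names are hypothetical, chosen only to show the technique.

```c
#include <stdint.h>

/* Hypothetical event header; fields are bytes, not bit fields. */
struct event_hdr_sketch {
	uint8_t version;
	uint8_t byte1;       /* severity:3 | disposition:2 | extended:1 | rsvd:2 */
	uint8_t raw_len[4];  /* 32-bit length, stored big endian */
};

/* Top three bits of byte1. */
unsigned int evt_severity(const struct event_hdr_sketch *e)
{
	return (e->byte1 & 0xE0) >> 5;
}

/* Next two bits of byte1. */
unsigned int evt_disposition(const struct event_hdr_sketch *e)
{
	return (e->byte1 & 0x18) >> 3;
}

/* Single flag bit following the disposition. */
unsigned int evt_extended(const struct event_hdr_sketch *e)
{
	return (e->byte1 & 0x04) >> 2;
}

/* Assemble the length byte by byte, independent of host endianness. */
uint32_t evt_length(const struct event_hdr_sketch *e)
{
	return ((uint32_t)e->raw_len[0] << 24) |
	       ((uint32_t)e->raw_len[1] << 16) |
	       ((uint32_t)e->raw_len[2] << 8) |
	        (uint32_t)e->raw_len[3];
}
```

Explicit shifts and masks give the same answer on big and little endian builds, whereas the in-memory order of C bit fields depends on the compiler and target.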


# 599d2870 19-Mar-2014 Greg Kurz <groug@kaod.org>

powerpc/le: Big endian arguments for ppc_rtas()

The ppc_rtas() syscall allows userspace to interact directly with RTAS.
For the moment, it assumes everything is big endian and returns either
EINVAL or EFAULT when called in a little endian environment.

As suggested by Benjamin, to avoid bugs when userspace wants to pass
a non 32 bit value to RTAS, it is far better to stick with a simple
rationale: ppc_rtas() should be called with a big endian rtas_args
structure.

With this patch, it is now up to userspace to forge big endian arguments,
as expected by RTAS.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
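
From the userspace side, "forging big endian arguments" as 599d2870 puts it means converting every 32-bit field before the call. A minimal sketch, assuming a simplified argument structure: struct be_rtas_args and fill_be_args() are hypothetical, and only the standard htonl() from <arpa/inet.h> is used for the conversion.

```c
#include <arpa/inet.h>  /* htonl() */
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ARGS 16  /* assumed size of the args array */

/* Simplified, hypothetical view of the argument buffer. */
struct be_rtas_args {
	uint32_t token;
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[SKETCH_MAX_ARGS];
};

/* Fill the buffer with every field converted to big endian. */
void fill_be_args(struct be_rtas_args *out, uint32_t token,
		  const uint32_t *in, uint32_t nargs, uint32_t nret)
{
	memset(out, 0, sizeof(*out));
	out->token = htonl(token);
	out->nargs = htonl(nargs);
	out->nret = htonl(nret);

	for (uint32_t i = 0; i < nargs && i < SKETCH_MAX_ARGS; i++)
		out->args[i] = htonl(in[i]);
}
```

On a big endian host htonl() is a no-op, so the same code serves both environments; return values read back from the buffer would need the matching ntohl().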


# 27128264 06-Aug-2013 Anton Blanchard <anton@samba.org>

powerpc: Make RTAS calls endian safe

RTAS expects arguments in the call buffer to be big endian, so we
need to byteswap on little endian builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 08bc1dc5 06-Aug-2013 Anton Blanchard <anton@samba.org>

powerpc: Make RTAS device tree accesses endian safe

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 061d19f2 24-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: Delete __cpuinit usage from all users

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. The fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bug that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the powerpc uses of the __cpuinit macros. There
are no __CPUINIT users in assembly files in powerpc.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 120496ac 06-May-2013 Robert Jennings <rcj@linux.vnet.ibm.com>

powerpc: Bring all threads online prior to migration/hibernation

This patch brings online all threads which are present but not online
prior to migration/hibernation. After migration/hibernation those
threads are taken back offline.

During migration/hibernation all online CPUs must call H_JOIN, this is
required by the hypervisor. Without this patch, threads that are offline
(H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
deadlocked (all threads either JOIN'd or CEDE'd).

Cc: <stable@kernel.org>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f459d63e 02-Oct-2012 Nathan Fontenot <nfont@linux.vnet.ibm.com>

powerpc+of: Remove the pSeries_reconfig.h file

Remove the pSeries_reconfig.h header file. At this point there is only one
definition in the file, pSeries_coalesce_init(), which can be
moved to rtas.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ae3a197e 28-Mar-2012 David Howells <dhowells@redhat.com>

Disintegrate asm/system.h for PowerPC

Disintegrate asm/system.h for PowerPC.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org


# 6431f208 21-Mar-2012 Anton Blanchard <anton@samba.org>

powerpc: Make function that parses RTAS error logs global

The IO event interrupt code has a function that finds specific
sections in an RTAS error log. We want to use it in the EPOW
code so make it global.

Rename things to make it less cryptic:

find_xelog_section() -> get_pseries_errorlog()
struct pseries_elog_section -> struct pseries_errorlog

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 444080d1 10-Jan-2012 Brian King <brking@linux.vnet.ibm.com>

powerpc/pseries: Fix partition migration hang in stop_topology_update

This fixes a hang that was observed during live partition migration.
Since stop_topology_update must not be called from an interrupt
context, call it earlier in the migration process. The hang observed
can be seen below:

WARNING: at kernel/timer.c:1011
Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop ibmveth sg ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh sd_mod crc_t10dif ibmvfc scsi_transport_fc scsi_tgt scsi_mod dm_snapshot dm_mod
NIP: c0000000000c52d8 LR: c00000000004be28 CTR: 0000000000000000
REGS: c00000005ffd77d0 TRAP: 0700 Not tainted (3.2.0-git-00001-g07d106d)
MSR: 8000000000021032 <ME,CE,IR,DR> CR: 48000084 XER: 00000001
CFAR: c00000000004be20
TASK = c00000005ec78860[0] 'swapper/3' THREAD: c00000005ec98000 CPU: 3
GPR00: 0000000000000001 c00000005ffd7a50 c000000000fbbc98 c000000000ec8340
GPR04: 00000000282a0020 0000000000000000 0000000000004000 0000000000000101
GPR08: 0000000000000012 c00000005ffd4000 0000000000000020 c000000000f3ba88
GPR12: 0000000000000000 c000000007f40900 0000000000000001 0000000000000004
GPR16: 0000000000000001 0000000000000000 0000000000000000 c000000001022310
GPR20: 0000000000000001 0000000000000000 0000000000200200 c000000001029e14
GPR24: 0000000000000000 0000000000000001 0000000000000040 c00000003f74bc80
GPR28: c00000003f74bc84 c000000000f38038 c000000000f16b58 c000000000ec8340
NIP [c0000000000c52d8] .del_timer_sync+0x28/0x60
LR [c00000000004be28] .stop_topology_update+0x20/0x38
Call Trace:
[c00000005ffd7a50] [c00000005ec78860] 0xc00000005ec78860 (unreliable)
[c00000005ffd7ad0] [c00000000004be28] .stop_topology_update+0x20/0x38
[c00000005ffd7b40] [c000000000028378] .__rtas_suspend_last_cpu+0x58/0x260
[c00000005ffd7bf0] [c0000000000fa230] .generic_smp_call_function_interrupt+0x160/0x358
[c00000005ffd7cf0] [c000000000036ec8] .smp_ipi_demux+0x88/0x100
[c00000005ffd7d80] [c00000000005c154] .icp_hv_ipi_action+0x5c/0x80
[c00000005ffd7e00] [c00000000012a088] .handle_irq_event_percpu+0x100/0x318
[c00000005ffd7f00] [c00000000012e774] .handle_percpu_irq+0x84/0xd0
[c00000005ffd7f90] [c000000000022ba8] .call_handle_irq+0x1c/0x2c
[c00000005ec9ba20] [c00000000001157c] .do_IRQ+0x22c/0x2a8
[c00000005ec9bae0] [c0000000000054bc] hardware_interrupt_entry+0x18/0x1c
Exception: 501 at .cpu_idle+0x194/0x2f8
LR = .cpu_idle+0x194/0x2f8
[c00000005ec9bdd0] [c000000000017e58] .cpu_idle+0x188/0x2f8 (unreliable)
[c00000005ec9be90] [c00000000067ec18] .start_secondary+0x3e4/0x524
[c00000005ec9bf90] [c0000000000093e8] .start_secondary_prolog+0x10/0x14
Instruction dump:
ebe1fff8 4e800020 fbe1fff8 7c0802a6 f8010010 7c7f1b78 f821ff81 78290464
80090014 5400019e 7c0000d0 78000fe0 <0b000000> 4800000c 7c210b78 7c421378

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4b16f8e2 22-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: various straight conversions from module.h --> export.h

All these files were including module.h just for the basic
EXPORT_SYMBOL infrastructure. We can shift them to the
export.h header, which has a much smaller footprint, and thus
realize some compile time gains.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5f41752 25-Jul-2011 Amerigo Wang <amwang@redhat.com>

notifiers: sys: move reboot notifiers into reboot.h

It is not necessary to share the same notifier.h.

This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.

[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9ee820fa 04-May-2011 Brian King <brking@linux.vnet.ibm.com>

powerpc/pseries: Add page coalescing support

Add support for page coalescing, a feature on IBM Power servers that
allows identical pages to be coalesced between logical partitions.
Hint text pages as coalesce candidates, since they are the pages most
likely to be coalescible between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.

[BenH: Moved a couple of things around to fix compile problems]

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# eca590f4 06-Apr-2011 Anton Blanchard <anton@samba.org>

powerpc/rtas: Only sleep in rtas_busy_delay if we have useful work to do

RTAS returns extended error codes as a hint of how long the
OS might want to wait before retrying a call. If we have nothing
else useful to do we may as well call back straight away.

This was found when testing the new dynamic dma window feature.
Firmware split the zeroing of the TCE table into 32k chunks but
returned 9901 (which is a suggested wait of 10ms). All up this took
about 10 minutes to complete since msleep is jiffies based and will
round 10ms up to 20ms.

With the patch below we take 3 seconds to complete the same test.
The hint firmware is returning in the RTAS call should definitely
be decreased, but even if we slept 1ms each iteration this would
take 32s.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
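
The relationship between an extended status code and the suggested wait, and the "only sleep if there is other work" rule from eca590f4, can be sketched as below. The commit states that 9901 suggests 10ms; the rest of the mapping (9900 to 9905 as powers of ten, -2 as a plain busy status worth 1ms) and both function names are assumptions for illustration.

```c
#define ST_BUSY           (-2)   /* assumed "busy, retry" status */
#define ST_EXT_DELAY_MIN  9900   /* assumed extended-delay range */
#define ST_EXT_DELAY_MAX  9905

/* Suggested wait in milliseconds for a status, or 0 if none applies. */
unsigned int busy_delay_ms(int status)
{
	unsigned int ms = 0;

	if (status == ST_BUSY) {
		ms = 1;
	} else if (status >= ST_EXT_DELAY_MIN && status <= ST_EXT_DELAY_MAX) {
		/* 9900 + n suggests a wait of 10^n milliseconds */
		ms = 1;
		for (int order = status - ST_EXT_DELAY_MIN; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/*
 * Sleep only when a delay was suggested AND something else could use
 * the CPU; otherwise the caller may as well retry straight away.
 */
int should_sleep(unsigned int ms, int other_work_pending)
{
	return ms != 0 && other_work_pending;
}
```

Treating the firmware value as a hint rather than a mandatory delay is what removes the per-call sleep when the system is otherwise idle.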


# 3b7a27db 30-Nov-2010 Jesse Larrew <jlarrew@linux.vnet.ibm.com>

powerpc: Disable VPHN polling during a suspend operation

Tie the polling mechanism into the ibm,suspend-me rtas call to
stop/restart polling before/after a suspend, hibernate, migrate,
or checkpoint restart operation. This ensures that the system has a
chance to disable the polling if the partition is migrated to a system
that does not support VPHN (and vice versa).

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# d8862be1 10-Sep-2010 Nathan Fontenot <nfont@austin.ibm.com>

powerpc/pseries: Export rtas_ibm_suspend_me()

Export the rtas_ibm_suspend_me() routine. This is needed to perform
partition migration in the kernel.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# cd3db0c4 06-Jul-2010 Benjamin Herrenschmidt <benh@kernel.crashing.org>

memblock: Remove rmo_size, burry it in arch/powerpc where it belongs

The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
server ppc64 though I hijack it on embedded ppc64 for similar purposes)
and represents the area of memory that can be accessed in real mode
(aka with MMU off), or on embedded, from the exception vectors (which
is bolted in the TLB) which pretty much boils down to the same thing.

We take that out of the generic MEMBLOCK data structure and move it into
arch/powerpc where it belongs, renaming it to "RMA" while at it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 95f72d1e 11-Jul-2010 Yinghai Lu <yinghai@kernel.org>

lmb: rename to memblock

via the following scripts:

FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

sed -i \
-e 's/lmb/memblock/g' \
-e 's/LMB/MEMBLOCK/g' \
$FILES

for N in $(find . -name lmb.[ch]); do
M=$(echo $N | sed 's/lmb/memblock/g')
mv $N $M
done

and revert some wrong changes, such as those to lmbench and dlmb.

also move memblock.c from lib/ to mm/

Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 8fe93f8d 06-Jul-2010 Brian King <brking@linux.vnet.ibm.com>

powerpc/pseries: Migration code reorganization / hibernation prep

Partition hibernation will use some of the same code as is
currently used for Live Partition Migration. This function
further abstracts this code such that code outside of rtas.c
can utilize it. It also changes the error field in the suspend
me data structure to be an atomic type, since it is set and
checked on different cpus without any barriers or locking.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
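
The reason 8fe93f8d gives for an atomic error field is that it is written on one CPU and read on another with no lock in between. A userspace analogue using C11 atomics is sketched here; the struct and function names are hypothetical, and C11's default sequentially consistent ordering is stronger than what the kernel's atomic_t set/read operations provide.

```c
#include <stdatomic.h>

/* Hypothetical shared state for a suspend operation. */
struct suspend_state_sketch {
	atomic_int error;  /* written by one CPU, read by another */
};

/* Record an error code; safe to call without holding any lock. */
void suspend_report_error(struct suspend_state_sketch *s, int err)
{
	atomic_store(&s->error, err);
}

/* Read back the most recently recorded error code. */
int suspend_read_error(struct suspend_state_sketch *s)
{
	return atomic_load(&s->error);
}
```

With a plain int, the compiler and hardware would be free to tear or reorder the accesses; the atomic type makes each store and load a single indivisible operation.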


# e9bbc8cd 17-Feb-2010 Anton Blanchard <anton@samba.org>

powerpc/pseries: Call ibm,os-term if the ibm,extended-os-term is present

We have had issues in the past with ibm,os-term initiating shutdown of a
partition. This is confusing to the user, especially if panic_timeout is
non zero.

The temporary fix was to avoid calling ibm,os-term if a panic_timeout was set
and since we set it on every boot we basically never call ibm,os-term.

An extended version of ibm,os-term has since been implemented which gives us
the behaviour we want:

"When the platform supports extended ibm,os-term behavior, the return to the
RTAS will always occur unless there is a kernel assisted dump active as
initiated by an ibm,configure-kernel-dump call."

This patch checks for the ibm,extended-os-term property and calls ibm,os-term
if it exists.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 0199c4e6 02-Dec-2009 Thomas Gleixner <tglx@linutronix.de>

locking: Convert __raw_spin* functions to arch_spin*

Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org


# edc35bd7 02-Dec-2009 Thomas Gleixner <tglx@linutronix.de>

locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED

Further name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org


# 445c8951 02-Dec-2009 Thomas Gleixner <tglx@linutronix.de>

locking: Convert raw_spinlock to arch_spinlock

The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested converting the raw_ locks to arch_ locks and cleaning up
the name space instead of using an artificial name like core_spin,
atomic_spin or whatever.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org


# 46db2f86 27-Aug-2009 Brian King <brking@linux.vnet.ibm.com>

powerpc/pseries: Fix to handle slb resize across migration

The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.

BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
breaking ppc32 build

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c4007a2f 16-Jun-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Use one common impl. of RTAS timebase sync and use raw spinlock

Several platforms use their own copy of what is essentially the same code,
using RTAS to synchronize the timebases when bringing up new CPUs. This
moves it all into a single common implementation and additionally
turns the spinlock into a raw spinlock since the former can rely on
the timebase not being frozen when spinlock debugging is enabled, and finally
masks interrupts while the timebase is disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f97bb36f 16-Jun-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/rtas: Turn rtas lock into a raw spinlock

RTAS currently uses a normal spinlock. However it can be called from
contexts where this is not necessarily a good idea. For example, it
can be called while syncing timebases, with the core timebase being
frozen. Unfortunately, that will deadlock in case of lock contention
when spinlock debugging is enabled as the spin lock debugging code
will try to use __delay() which ... relies on the timebase being
enabled.

Also RTAS can be used in some low level IRQ handling code path so it
may as well be a raw spinlock for -rt sake.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f52862f4 16-Feb-2009 Brian King <brking@linux.vnet.ibm.com>

powerpc/pseries: Fix partition migration hang under load

While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system, causing the migration to hang, or in some scenarios cause
the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# edc72ac4 11-Dec-2008 Nathan Lynch <ntl@pobox.com>

powerpc/pseries: Check for GIQ indicator before calling set-indicator

Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad5a426f04a0bb15c4315fa4bb40cde9) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.

Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
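
The lookup described above can be sketched in plain C as follows. This is
an illustration only: it assumes the property has already been decoded into
an array of (token, maxindex) pairs, and the names indicator_elem_sketch and
indicator_present are hypothetical, not the kernel's.

```c
#include <stddef.h>

/* Hypothetical decoded form of one /rtas/rtas-indicators entry. */
struct indicator_elem_sketch {
	int token;	/* indicator token */
	int maxindex;	/* highest valid index for that indicator */
};

/*
 * Return 1 if 'token' appears in the decoded property, 0 otherwise.
 * If found and 'maxindex' is non-NULL, also report the entry's maxindex.
 */
static int indicator_present(const struct indicator_elem_sketch *elems,
			     size_t count, int token, int *maxindex)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (elems[i].token != token)
			continue;
		if (maxindex)
			*maxindex = elems[i].maxindex;
		return 1;
	}
	return 0;
}
```

A caller would check the result before touching the indicator and skip the
set-indicator call when the platform does not advertise it.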

Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# b79998fc 30-Jul-2008 Nathan Fontenot <nfont@austin.ibm.com>

powerpc: Zero fill the return values of rtas argument buffer

The kernel copy of the rtas args struct contains the return
value(s) for the specified rtas call. These are copied back
to user space with the assumption that every value has been
set by the rtas call, which turns out to be not always true.
Thus userspace can see random values and think the call failed
when in fact it succeeded, but for some reason didn't set one
of the return values.

This fixes the problem by zeroing out the return value fields
of the rtas args struct before processing the rtas call.
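
A minimal sketch of the zeroing step, assuming a simplified argument buffer
in which the return slots directly follow the input slots. The struct and
function names (rtas_args_sketch, rtas_zero_rets) are hypothetical, and the
layout is a stand-in rather than the kernel's actual struct rtas_args.

```c
#include <stdint.h>
#include <string.h>

#define RTAS_MAX_ARGS 16	/* assumed buffer capacity for this sketch */

/* Simplified stand-in for the kernel's RTAS argument buffer. */
struct rtas_args_sketch {
	uint32_t token;
	uint32_t nargs;			/* number of input slots */
	uint32_t nret;			/* number of return slots */
	uint32_t args[RTAS_MAX_ARGS];	/* inputs first, then returns */
};

/*
 * Clear the nret return slots that follow the nargs input slots, so a
 * call that leaves a slot unset cannot leak stale data to the caller.
 * Returns 0 on success, -1 if the counts do not fit the buffer.
 */
static int rtas_zero_rets(struct rtas_args_sketch *a)
{
	if (a->nargs > RTAS_MAX_ARGS || a->nret > RTAS_MAX_ARGS ||
	    a->nargs + a->nret > RTAS_MAX_ARGS)
		return -1;

	memset(&a->args[a->nargs], 0, a->nret * sizeof(a->args[0]));
	return 0;
}
```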

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 15c8b6c1 09-May-2008 Jens Axboe <jens.axboe@oracle.com>

on_each_cpu(): kill unused 'retry' parameter

It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 1c21a293 07-May-2008 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Fix sparse warnings in arch/powerpc/kernel

Make a few things static in lparcfg.c
Make init and exit routines static in rtas_flash.c
Make things static in rtas_pci.c
Make some functions static in rtas.c
Make fops static in rtas-proc.c
Remove unneeded extern for do_gtod in smp.c
Make clocksource_init() static in time.c
Make last_tick_len and ticklen_to_xs static in time.c
Move the declaration of the pvr per-cpu into smp.h
Make kexec_smp_down() and kexec_stack static in machine_kexec_64.c
Don't return void in arch_teardown_msi_irqs() in msi.c
Move declaration of GregorianDay() into asm/time.h

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 950e4da3 26-Feb-2008 Matthew Wilcox <willy@infradead.org>

arch: Remove unnecessary inclusions of asm/semaphore.h

None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have to
fix any build failures as they come up.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>


# e48b1b45 28-Mar-2008 Harvey Harrison <harvey.harrison@gmail.com>

[POWERPC] Replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# d9b2b2a2 13-Feb-2008 David S. Miller <davem@davemloft.net>

[LIB]: Make PowerPC LMB code generic so sparc64 can use it too.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f515061 02-Dec-2007 Paul Mackerras <paulus@samba.org>

Revert "[POWERPC] Fix RTAS os-term usage on kernel panic"

This reverts commit a2b51812a4dc5db09ab4d4638d4d8ed456e2457e.

It turns out that this change caused some machines to fail to come
back up when being rebooted, and generated an error in the hypervisor
error log on some machines. The platform architecture (PAPR) is a
little unclear on exactly when the RTAS ibm,os-term function should be
called. Until that is clarified I'm reverting this commit.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# a2b51812 19-Nov-2007 Linas Vepstas <linas@austin.ibm.com>

[POWERPC] Fix RTAS os-term usage on kernel panic

The rtas_os_term() routine was being called at the wrong time.
The actual rtas call "os-term" will not ever return, and so
calling it from the panic notifier is too early. Instead,
call it from the machine_reset() call.

This splits the rtas_os_term() routine into two: one part to capture
the kernel panic message, invoked during the panic notifier, and
another part that is invoked during machine_reset().

Prior to this patch, the os-term call was never being made,
because panic_timeout was always non-zero. Calling os-term
helps keep the hypervisor happy! We have to keep the hypervisor
happy to avoid service, dump and error reporting problems.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 8f5c7579 13-Nov-2007 Nathan Lynch <ntl@pobox.com>

[POWERPC] Fix multiple bugs in rtas_ibm_suspend_me code

There are several issues with the rtas_ibm_suspend_me code, which
enables platform-assisted suspension of an LPAR as covered in PAPR
2.2.

1.) rtas_ibm_suspend_me uses on_each_cpu() to invoke
rtas_percpu_suspend_me on all cpus via IPI:

if (on_each_cpu(rtas_percpu_suspend_me, &data, 1, 0))
...

'data' is on the calling task's stack, but rtas_ibm_suspend_me takes
no measures to ensure that all instances of rtas_percpu_suspend_me are
finished accessing 'data' before returning. This can result in the
IPI'd cpus accessing random stack data and getting stuck in H_JOIN.

This is addressed by using an atomic count of workers and a completion
on the stack.

2.) rtas_percpu_suspend_me is needlessly calling H_JOIN in a loop.
The only event that can cause a cpu to return from H_JOIN is an H_PROD
from another cpu or a NMI/system reset. Each cpu need call H_JOIN
only once per suspend operation.

Remove the loop and the now unnecessary 'waiting' state variable.

3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
with it on; the local_irq_disable() in on_each_cpu() is not
sufficient.

Fix this by explicitly saving the MSR and clearing the EE bit before
calling H_JOIN.

4.) H_PROD is being called with the Linux logical cpu number as the
parameter, not the platform interrupt server value. (It's also being
called for all possible cpus, which is harmless, but unnecessary.)

This is fixed by calling H_PROD for each online cpu using
get_hard_smp_processor_id(cpu) for the argument.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 8c8dc322 23-Apr-2007 Stephen Rothwell <sfr@canb.auug.org.au>

[POWERPC] Remove old interface find_path_device

Replaced by of_find_node_by_path.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e2eb6392 03-Apr-2007 Stephen Rothwell <sfr@canb.auug.org.au>

[POWERPC] Rename get_property to of_get_property: arch/powerpc

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c5a69d57 17-Feb-2007 Tobias Klauser <tklauser@distanz.ch>

Storage class should be before const qualifier

The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the
beginning of the declaration specifiers in a declaration is an
obsolescent feature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# f2d6d2d8 06-Dec-2006 Nathan Lynch <ntl@pobox.com>

[POWERPC] Add rtas_service_present() helper

To test for the existence of an RTAS function, we typically do:

foo_token = rtas_token("foo");
if (foo_token == RTAS_UNKNOWN_SERVICE)
return;

Add a rtas_service_present method, which provides a more conventional
boolean interface for testing the existence of an RTAS method.
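
The helper amounts to a one-line wrapper around the token lookup. The
sketch below shows the idea in userspace C; the token table and its values
are made up for illustration and stand in for the real device tree lookup
done by rtas_token().

```c
#include <stddef.h>
#include <string.h>

#define RTAS_UNKNOWN_SERVICE (-1)

/* Made-up table standing in for the properties of the /rtas node. */
static const struct {
	const char *name;
	int token;
} fake_rtas_node[] = {
	{ "set-indicator", 11 },
	{ "get-sensor-state", 12 },
};

/* Stand-in for rtas_token(): map a service name to its token. */
static int rtas_token(const char *service)
{
	size_t i;

	for (i = 0; i < sizeof(fake_rtas_node) / sizeof(fake_rtas_node[0]); i++)
		if (strcmp(fake_rtas_node[i].name, service) == 0)
			return fake_rtas_node[i].token;
	return RTAS_UNKNOWN_SERVICE;
}

/* Boolean wrapper: non-zero if the named RTAS service exists. */
static int rtas_service_present(const char *service)
{
	return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
}
```

Callers that only need a yes/no answer can then write
"if (!rtas_service_present("foo")) return;" without keeping the token.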

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 0332c2d4 04-Dec-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Move rtas_stop_self() into platforms/pseries/hotplug-cpu.c

As the first step in consolidating the pseries hotplug cpu code,
create platforms/pseries/hotplug-cpu.c and move rtas_stop_self()
into it. Do the rtas token initialisation in a new initcall, rather
than rtas_initialize().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 088df4d2 16-Nov-2006 Linas Vepstas <linas@austin.ibm.com>

[POWERPC] Wrap cpu_die() with CONFIG_HOTPLUG_CPU

Per email discussion, it appears that rtas_stop_self()
and pSeries_mach_cpu_die() should not be compiled if
CONFIG_HOTPLUG_CPU is not defined. This patch adds
#ifdefs around these bits of code.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 39ed2fe6 21-Aug-2006 Olaf Hering <olaf@aepfle.de>

[POWERPC] reboot when panic_timout is set

Only call into RTAS when booted with panic=0 because the RTAS call
does not return. The system has to be rebooted via the HMC or via the
management console right now. This is cumbersome and not what the
default panic=180 is supposed to do.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 9a2ded55 16-Aug-2006 Michael Neuling <mikey@neuling.org>

[POWERPC] powerpc: Make RTAS console init generic

The rtas console doesn't have to be Cell specific. If we get both
RTAS tokens, we should just enable the console then and there.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 81b73dd9 27-Jul-2006 Haren Myneni <haren@us.ibm.com>

[POWERPC] Fix might-sleep warning on removing cpus

The following might_sleep warning (dump_stack()) appears during kdump
testing when CONFIG_DEBUG_SPINLOCK_SLEEP is enabled. All secondary CPUs
call rtas_set_indicator with interrupts disabled to remove themselves
from the global interrupt queue.

BUG: sleeping function called from invalid context at
arch/powerpc/kernel/rtas.c:463
in_atomic():1, irqs_disabled():1
Call Trace:
[C00000000FFFB970] [C000000000010234] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFFBA10] [C000000000059354] .__might_sleep+0xd8/0xf4
[C00000000FFFBA90] [C00000000001D1BC] .rtas_busy_delay+0x20/0x5c
[C00000000FFFBB20] [C00000000001D8A8] .rtas_set_indicator+0x6c/0xcc
[C00000000FFFBBC0] [C000000000048BF4] .xics_teardown_cpu+0x118/0x134
[C00000000FFFBC40] [C00000000004539C] .pseries_kexec_cpu_down_xics+0x74/0x8c
[C00000000FFFBCC0] [C00000000002DF08] .crash_ipi_callback+0x15c/0x188
[C00000000FFFBD50] [C0000000000296EC] .smp_message_recv+0x84/0xdc
[C00000000FFFBDC0] [C000000000048E08] .xics_ipi_dispatch+0xf0/0x130
[C00000000FFFBE50] [C00000000009EF10] .handle_IRQ_event+0x7c/0xf8
[C00000000FFFBF00] [C0000000000A0A14] .handle_percpu_irq+0x90/0x10c
[C00000000FFFBF90] [C00000000002659C] .call_handle_irq+0x1c/0x2c
[C00000000058B9C0] [C00000000000CA10] .do_IRQ+0xf4/0x1a4
[C00000000058BA50] [C0000000000044EC] hardware_interrupt_entry+0xc/0x10
--- Exception: 501 at .plpar_hcall_norets+0x14/0x1c
LR = .pseries_dedicated_idle_sleep+0x190/0x1d4
[C00000000058BD40] [C00000000058BDE0] 0xc00000000058bde0 (unreliable)
[C00000000058BDF0] [C00000000001270C] .cpu_idle+0x10c/0x1e0
[C00000000058BE70] [C000000000009274] .rest_init+0x44/0x5c

To fix this issue, add rtas_set_indicator_fast(), which does not wait
for the RTAS 'busy' delay, and use it for kdump (in
xics_teardown_cpu()) and for CPU hotplug (xics_migrate_irqs_away() and
xics_setup_cpu()).

Note that the platform architecture spec says that set-indicator
on the indicator we're using here is not permitted to return the
busy or extended busy status codes.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# b9377ffc 18-Jul-2006 Anton Blanchard <anton@samba.org>

[POWERPC] clean up pseries hcall interfaces

Our pseries hcall interfaces are out of control:

plpar_hcall_norets
plpar_hcall
plpar_hcall_8arg_2ret
plpar_hcall_4out
plpar_hcall_7arg_7ret
plpar_hcall_9arg_9ret

Create 3 interfaces to cover all cases:

plpar_hcall_norets: 7 arguments no returns
plpar_hcall: 6 arguments 4 returns
plpar_hcall9: 9 arguments 9 returns

There are only 2 cases in the kernel that need plpar_hcall9, hopefully
we can keep it that way.

Pass in a buffer to stash return parameters so we avoid the &dummy1,
&dummy2 madness.

Signed-off-by: Anton Blanchard <anton@samba.org>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a7f67bdf 11-Jul-2006 Jeremy Kerr <jk@ozlabs.org>

[POWERPC] Constify & voidify get_property()

Now that get_property() returns a void *, there's no need to cast its
return value. Also, treat the return value as const, so we can
constify get_property later.

powerpc core changes.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# cc46bb98 23-Jun-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Add udbg support for RTAS console

Add udbg hooks for the RTAS console, based on the RTAS put-term-char
and get-term-char calls. Along with my previous patches, this should
enable debugging as soon as early_init_dt_scan_rtas() is called.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 458148c0 23-Jun-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Setup RTAS values earlier, to enable rtas_call() earlier

Although RTAS is instantiated when we enter the kernel, we can't actually
call into it until we know its entry point address. Currently we grab that
in rtas_initialize(), however that's quite late in the boot sequence.

To enable rtas_call() earlier, we can grab the RTAS entry etc. values while
we're scanning the flattened device tree. There's existing code to retrieve
the values from /chosen, however we don't store them there anymore, so remove
that code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# ab3ab74d 23-Jun-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Move RTAS exports next to their declarations

Move RTAS exports next to their declarations.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 24da3dd5 23-Jun-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Make rtas_call() safe if RTAS hasn't been initialised

Currently it's unsafe to call rtas_call() prior to rtas_initialize(). This
is because the rtas.entry value hasn't been setup and so we don't know
where to enter, but we just try anyway.

We can't do anything intelligent without rtas.entry, so if it's not set, just
return. Code that calls rtas_call() early needs to be aware that the call
might fail.
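
A minimal sketch of the guard, assuming the entry point is recorded in a
global state structure and is zero until initialisation has run. The names
rtas_sketch and rtas_call_sketch are hypothetical, and the actual firmware
call is replaced by a placeholder that reports success.

```c
/* Stand-in for the kernel's global RTAS state. */
struct rtas_state_sketch {
	unsigned long entry;	/* firmware entry point, 0 until set up */
};

static struct rtas_state_sketch rtas_sketch;

/*
 * Refuse to enter firmware when the entry point is unknown.
 * Returns -1 in that case; otherwise pretends the call succeeded.
 */
static int rtas_call_sketch(int token)
{
	if (!rtas_sketch.entry)
		return -1;

	(void)token;	/* a real implementation would enter RTAS here */
	return 0;
}
```

Early callers therefore have to treat a negative return as "RTAS not
available yet" rather than as a firmware error.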

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 7932f0b8 15-Jun-2006 John Rose <johnrose@austin.ibm.com>

[POWERPC] RTAS delay, fix module build breaks

Export both new RTAS delay functions, and change the scanlog module to
use the new delay functions.

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 368a6ba5 12-Jun-2006 Dave C Boutcher <boutcher@cs.umn.edu>

[POWERPC] check firmware state before suspending

Currently the kernel blindly halts all the processors and calls the
ibm,suspend-me rtas call. If the firmware is not in the correct
state, we then re-start all the processors and return. It is much
smarter to first check the firmware state, and only if it is waiting,
call the ibm,suspend-me call.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# 507279db 05-Jun-2006 John Rose <johnrose@austin.ibm.com>

[PATCH] powerpc: reorg RTAS delay code

This patch attempts to handle RTAS "busy" return codes in a more simple
and consistent manner. Typical callers of RTAS shouldn't have to
manage wait times and delay calls.

This patch also changes the kernel to use msleep() rather than udelay()
when a runtime delay is necessary. This will avoid CPU soft lockups
for extended delay conditions.
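
One way to centralise the wait-time decision is sketched below. It assumes
the usual RTAS status convention, in which -2 means "busy" and 9900 through
9905 are extended-delay hints asking for roughly 10^(status - 9900)
milliseconds. The function name busy_delay_ms is hypothetical and the
constants are restated for illustration, not copied from this patch.

```c
#define RTAS_BUSY		(-2)
#define RTAS_EXTENDED_DELAY_MIN	9900
#define RTAS_EXTENDED_DELAY_MAX	9905

/*
 * Map an RTAS status to a suggested wait in milliseconds.
 * Returns 0 when the status is not a busy indication, i.e. the caller
 * should not retry on account of it.
 */
static unsigned int busy_delay_ms(int status)
{
	unsigned int ms = 0;
	int order;

	if (status == RTAS_BUSY) {
		ms = 1;
	} else if (status >= RTAS_EXTENDED_DELAY_MIN &&
		   status <= RTAS_EXTENDED_DELAY_MAX) {
		order = status - RTAS_EXTENDED_DELAY_MIN;
		for (ms = 1; order > 0; order--)
			ms *= 10;
	}
	return ms;
}
```

A caller in process context would sleep for the returned time (for example
with msleep()) and then retry the call while the result is non-zero.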

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 706c8c93 30-Mar-2006 Segher Boessenkool <segher@kernel.crashing.org>

[PATCH] powerpc/pseries: Change H_StudlyCaps to H_SHOUTING_CAPS

Also cleans up some nearby whitespace problems.

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 0e551954 28-Mar-2006 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

[PATCH] for_each_possible_cpu: powerpc

for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e8222502 28-Mar-2006 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] powerpc: Kill _machine and hard-coded platform numbers

This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a7f31841 22-Mar-2006 Arnd Bergmann <abergman@de.ibm.com>

[PATCH] powerpc: declare arch syscalls in <asm/syscalls.h>

powerpc currently declares some of its own system calls
in <asm/unistd.h>, but not all of them. That place also
contains remainders of the now almost unused kernel syscall
hack.

- Add a new <asm/syscalls.h> with clean declarations
- Include that file from every source that implements one
of these
- Get rid of old declarations in <asm/unistd.h>

This patch is required as a base for implementing system
calls from an SPU, but also makes sense as a general
cleanup.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# b4fd884a 03-Feb-2006 Dave C Boutcher <boutcher@cs.umn.edu>

[PATCH] powerpc: remove useless call to touch_softlockup_watchdog

It turns out that we can't stop the watchdog from
triggering here. If we touch the timer (which just uses the current jiffie
value) before we enable interrupts, it does nothing because jiffies
are not mass-updated until after we enable interrupts. If we touch the
timer after we enable interrupts, it's too late because the softlockup
watchdog will already have triggered. The touch_softlockup_watchdog
call removed below does nothing.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 82a4df74 03-Feb-2006 Dave C Boutcher <boutcher@cs.umn.edu>

[PATCH] powerpc: prod all processors after ibm,suspend-me

We need to prod everyone here since this is the only CPU that is
guaranteed to be running after the ibm,suspend-me RTAS call returns.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c4cb8ecc 03-Feb-2006 Dave C Boutcher <boutcher@cs.umn.edu>

[PATCH] powerpc: return correct rtas status from ibm,suspend-me

Correctly return the status from the RTAS call. rtas_call expects
to return the status as a return value.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 31a7f67e 30-Jan-2006 Michael Ellerman <michael@ellerman.id.au>

[PATCH] powerpc: Fix !SMP build of rtas.c

arch/powerpc/kernel/rtas.c is getting hvcall.h via spinlock.h, but when we're
building for UP we don't include spinlock.h.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 91dc182c 13-Jan-2006 Dave C Boutcher <sleddog@us.ibm.com>

[PATCH] powerpc: special-case ibm,suspend-me RTAS call

Handle the ibm,suspend-me RTAS call specially. It needs
to be wrapped in a set of synchronization hypervisor calls
(H_Join). When the H_Join calls are made on all CPUs, the
intent is that only one will return with H_Continue, meaning
that he is the "last man standing". That CPU then issues the
ibm,suspend-me call. What is interesting, of course, is that
the CPU running when the rtas syscall is made, may NOT be the
CPU that ultimately executes the ibm,suspend-me rtas call.

Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a9415644 11-Jan-2006 Randy Dunlap <rdunlap@infradead.org>

[PATCH] capable/capability.h (arch/)

arch: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 296167ae 10-Jan-2006 Michael Ellerman <michael@ellerman.id.au>

[PATCH] powerpc: Make early debugging configurable via Kconfig

This patch adds Kconfig entries to control the early debugging options,
currently in setup_64.c.

Doing this via Kconfig rather than #defines means you can have one source tree,
which is buildable for multiple platforms - and you can enable the correct
early debug option for each platform via .config.

I made udbg_early_init() a static inline because otherwise GCC is too daft to
optimise it away when debugging is off.

Now that we have udbg_init_rtas() we can make call_rtas_display_status* static.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 943ffb58 09-Jan-2006 Adrian Bunk <bunk@stusta.de>

spelling: s/retreive/retrieve/

Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 799d6046 09-Nov-2005 Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: merge code values for identifying platforms

This patch merges platform codes. systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 as well as on ppc32. Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 21fe3301 06-Nov-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] ppc: fix a bunch of warnings

Building a PowerMac kernel with ARCH=powerpc causes a bunch of warnings,
this fixes some of them

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 2249ca9d 06-Nov-2005 Paul Mackerras <paulus@samba.org>

powerpc: Various UP build fixes

Mostly this involves adding #include <asm/smp.h>, since that defines
things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which
are SMP-related but still needed on UP. This incorporates fixes
posted by Olof Johansson and Heikki Lindholm.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# f4fcbbe9 02-Nov-2005 Paul Mackerras <paulus@samba.org>

powerpc: Merge remaining RTAS code

This moves rtas-proc.c and rtas_flash.c into arch/powerpc/kernel, since
cell wants them as well as pseries (and chrp can use rtas-proc.c too,
at least in principle). rtas_fw.c is gone, with its bits moved into
rtas_flash.c and rtas.c.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# 033ef338 26-Oct-2005 Paul Mackerras <paulus@samba.org>

powerpc: Merge rtas.c into arch/powerpc/kernel

This splits arch/ppc64/kernel/rtas.c into arch/powerpc/kernel/rtas.c,
which contains generic RTAS functions useful on any CHRP platform,
and arch/powerpc/platforms/pseries/rtas-fw.[ch], which contain
some pSeries-specific firmware flashing bits. The parts of rtas.c
that are to do with pSeries-specific error logging are protected
by a new CONFIG_RTAS_ERROR_LOGGING symbol. The inclusion of rtas.o
is controlled by the CONFIG_PPC_RTAS symbol, and the relevant
platforms select that.

Signed-off-by: Paul Mackerras <paulus@samba.org>