History log of /linux-master/drivers/char/random.c
Revision Date Author Comments
# e871abcd 17-Apr-2024 Jason A. Donenfeld <Jason@zx2c4.com>

random: handle creditable entropy from atomic process context

The entropy accounting changes a static key when the RNG has
initialized, since it only ever initializes once. Static key changes,
however, cannot be made from atomic context, so depending on where the
last creditable entropy comes from, the static key change might need to
be deferred to a worker.

Previously the code used the execute_in_process_context() helper
function, which accounts for whether or not the caller is
in_interrupt(). However, that doesn't account for the case where the
caller is actually in process context but is holding a spinlock.

This turned out to be the case with input_handle_event() in
drivers/input/input.c contributing entropy:

[<ffffffd613025ba0>] die+0xa8/0x2fc
[<ffffffd613027428>] bug_handler+0x44/0xec
[<ffffffd613016964>] brk_handler+0x90/0x144
[<ffffffd613041e58>] do_debug_exception+0xa0/0x148
[<ffffffd61400c208>] el1_dbg+0x60/0x7c
[<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
[<ffffffd613011294>] el1h_64_sync+0x64/0x6c
[<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
[<ffffffd613102b54>] __might_sleep+0x44/0x7c
[<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
[<ffffffd6132c2820>] static_key_enable+0x14/0x38
[<ffffffd61400ac08>] crng_set_ready+0x14/0x28
[<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
[<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
[<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
[<ffffffd613857e54>] add_input_randomness+0x38/0x48
[<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
[<ffffffd613a81310>] input_event+0x6c/0x98

According to Guoyong, it's not really possible to refactor the various
drivers to never hold a spinlock there. And in_atomic() isn't reliable.

So, rather than trying to be too fancy, just punt the change in the
static key to a workqueue always. There's basically no drawback of doing
this, as the code already needed to account for the static key not
changing immediately, and given that it's just an optimization, there's
not exactly a hurry to change the static key right away, so deferal is
fine.

Reported-by: Guoyong Wang <guoyong.wang@mediatek.com>
Cc: stable@vger.kernel.org
Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9fd7874c 04-Dec-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: replace import_single_range() with import_ubuf()

With the removal of the 'iov' argument to import_single_range(), the two
functions are now fully identical. Convert the import_single_range()
callers to import_ubuf(), and remove the former fully.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 6ac805d1 04-Dec-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: remove unused 'iov' argument from import_single_range()

It is entirely unused, just get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>


# ed1aa959 02-Oct-2023 Joel Granados <j.granados@samsung.com>

char-misc: Remove the now superfluous sentinel element from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel from impi_table and random_table

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# b0072734 22-May-2023 David Howells <dhowells@redhat.com>

tty, proc, kernfs, random: Use copy_splice_read()

Use copy_splice_read() for tty, procfs, kernfs and random files rather
than going through generic_file_splice_read() as they just copy the file
into the output buffer and don't splice pages. This avoids the need for
them to have a ->read_folio() to satisfy filemap_splice_read().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Miklos Szeredi <miklos@szeredi.hu>
cc: Arnd Bergmann <arnd@arndb.de>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8ca09d5f 06-Mar-2023 Linus Torvalds <torvalds@linux-foundation.org>

cpumask: fix incorrect cpumask scanning result checks

It turns out that commit 596ff4a09b89 ("cpumask: re-introduce
constant-sized cpumask optimizations") exposed a number of cases of
drivers not checking the result of "cpumask_next()" and friends
correctly.

The documented correct check for "no more cpus in the cpumask" is to
check for the result being equal or larger than the number of possible
CPU ids, exactly _because_ we've always done those constant-sized
cpumask scans using a widened type before. So the return value of a
cpumask scan should be checked with

if (cpu >= nr_cpu_ids)
...

because the cpumask scan did not necessarily stop exactly *at* that
maximum CPU id.

But a few cases ended up instead using checks like

if (cpu == nr_cpumask_bits)
...

which used that internal "widened" number of bits. And that used to
work pretty much by accident (ok, in this case "by accident" is simply
because it matched the historical internal implementation of the cpumask
scanning, so it was more of a "intentionally using implementation
details rather than an accident").

But the extended constant-sized optimizations then did that internal
implementation differently, and now that code that did things wrong but
matched the old implementation no longer worked at all.

Which then causes subsequent odd problems due to using what ends up
being an invalid CPU ID.

Most of these cases require either unusual hardware or special uses to
hit, but the random.c one triggers quite easily.

All you really need is to have a sufficiently small CONFIG_NR_CPUS value
for the bit scanning optimization to be triggered, but not enough CPUs
to then actually fill that widened cpumask. At that point, the cpumask
scanning will return the NR_CPUS constant, which is _not_ the same as
nr_cpumask_bits.

This just does the mindless fix with

sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/'

to fix the incorrect uses.

The ones in the SCSI lpfc driver in particular could probably be fixed
more cleanly by just removing that repeated pattern entirely, but I am
not emptionally invested enough in that driver to care.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/
Reported-by: Vernon Yang <vernon2gm@gmail.com>
Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6bb20c15 28-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not include <asm/archrandom.h> from random.h

The <asm/archrandom.h> header is a random.c private detail, not
something to be called by other code. As such, don't make it
automatically available by way of random.h.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 39ec9e6b 29-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: align entropy_timer_state to cache line

The theory behind the jitter dance is that multiple things are poking at
the same cache line. This only works, however, if what's being poked at
is actually all in the same cache line. Ensure this is the case by
aligning the struct on the stack to the cache line size.

We can't use ____cacheline_aligned on a stack variable, because gcc
assumes 16 byte alignment when only 8 byte alignment is provided by the
kernel, which means gcc could technically do something pathological
like `(rsp & ~48) - 64`. It doesn't, but rather than risk it, just do
the stack alignment manually with PTR_ALIGN and an oversized buffer.

Fixes: 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it")
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b83e45fd 29-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: mix in cycle counter when jitter timer fires

Rather than just relying on interaction between cache lines of the timer
and the main loop, also explicitly take into account the fact that the
timer might fire at some time that's hard to predict, due to scheduling,
interrupts, or cross-CPU conditions. Mix in a cycle counter during the
firing of the timer, in addition to the existing one during the
scheduling of the timer. It can't hurt and can only help.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1c21fe00 30-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: spread out jitter callback to different CPUs

Rather than merely hoping that the callback gets called on another CPU,
arrange for that to actually happen, by round robining which CPU the
timer fires on. This way, on multiprocessor machines, we exacerbate
jitter by touching the same memory from multiple different cores.

There's a little bit of tricky bookkeeping involved here, because using
timer_setup_on_stack() + add_timer_on() + del_timer_sync() will result
in a use after free. See this sample code: <https://xn--4db.cc/xBdEiIKO/c>.

Instead, it's necessary to call [try_to_]del_timer_sync() before calling
add_timer_on(), so that the final call to del_timer_sync() at the end of
the function actually succeeds at making sure no handlers are running.

Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0e42d14b 28-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove extraneous period and add a missing one in comments

Just some trivial typo fixes, and reflowing of lines.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# de4eda9d 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

use less confusing names for iov_iter direction initializers

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# bbc7e1be 16-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: add back async readiness notifier

This is required by vsprint, because it can't do things synchronously
from hardirq context, and it will be useful for an EFI notifier as well.
I didn't initially want to do this, but with two potential consumers
now, it seems worth it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9148de31 17-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: reseed in delayed work rather than on-demand

Currently, we reseed when random bytes are requested, if the current
seed is too old. Since random bytes can be requested from all contexts,
including hard IRQ, this means sometimes we wind up adding a bit of
latency to hard IRQ. This was so much of a problem on s390x that now
s390x just doesn't provide its architectural RNG from hard IRQ context,
so we miss out in that case.

Instead, let's just schedule a persistent delayed work, so that the
reseeding and potentially expensive operations will always happen from
process context, reducing unexpected latencies from hard IRQ.

This also has the nice effect of accumulating a transcript of random
inputs over time, since it means that we amass more input values. And it
should make future vDSO integration a bit easier.

Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Juergen Christ <jchrist@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# db516da9 06-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

hw_random: use add_hwgenerator_randomness() for early entropy

Rather than calling add_device_randomness(), the add_early_randomness()
function should use add_hwgenerator_randomness(), so that the early
entropy can be potentially credited, which allows for the RNG to
initialize earlier without having to wait for the kthread to come up.

This requires some minor API refactoring, by adding a `sleep_after`
parameter to add_hwgenerator_randomness(), so that we don't hit a
blocking sleep from add_early_randomness().

Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 19258d05 03-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: modernize documentation comment on get_random_bytes()

The prior text was very old and made outdated references to TCP sequence
numbers, which should use one of the integer functions instead, since
batched entropy was introduced. The current way of describing the
quality of functions is just to say that it's as good as /dev/urandom,
which now all the functions are.

Fixes: f5b98461cb81 ("random: use chacha20 for get_random_int/long")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b240bab5 01-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: adjust comment to account for removed function

Since de492c83cae0 ("prandom: remove unused functions"),
get_random_int() no longer exists, so remove its reference from this
comment.

Fixes: de492c83cae0 ("prandom: remove unused functions")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 2c03e16f 28-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove early archrandom abstraction

The arch_get_random*_early() abstraction is not completely useful and
adds complexity, because it's not a given that there will be no calls to
arch_get_random*() between random_init_early(), which uses
arch_get_random*_early(), and init_cpu_features(). During that gap,
crng_reseed() might be called, which uses arch_get_random*(), since it's
mostly not init code.

Instead we can test whether we're in the early phase in
arch_get_random*() itself, and in doing so avoid all ambiguity about
where we are. Fortunately, the only architecture that currently
implements arch_get_random*_early() also has an alternatives-based cpu
feature system, one flag of which determines whether the other flags
have been initialized. This makes it possible to do the early check with
zero cost once the system is initialized.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b9b01a56 01-Nov-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use random.trust_{bootloader,cpu} command line option only

It's very unusual to have both a command line option and a compile time
option, and apparently that's confusing to people. Also, basically
everybody enables the compile time option now, which means people who
want to disable this wind up having to use the command line option to
ensure that anyway. So just reduce the number of moving pieces and nix
the compile time option in favor of the more versatile command line
option.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7f576b25 19-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: add helpers for random numbers with given floor or range

Now that we have get_random_u32_below(), it's nearly trivial to make
inline helpers to compute get_random_u32_above() and
get_random_u32_inclusive(), which will help clean up open coded loops
and manual computations throughout the tree.

One snag is that in order to make get_random_u32_inclusive() operate on
closed intervals, we have to do some (unlikely) special case handling if
get_random_u32_inclusive(0, U32_MAX) is called. The least expensive way
of doing this is actually to adjust the slowpath of
get_random_u32_below() to have its undefined 0 result just return the
output of get_random_u32(). We can make this basically free by calling
get_random_u32() before the branch, so that the branch latency gets
interleaved.

Cc: stable@vger.kernel.org # to ease future backports that use this api
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e9a688bc 08-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use rejection sampling for uniform bounded random integers

Until the very recent commits, many bounded random integers were
calculated using `get_random_u32() % max_plus_one`, which not only
incurs the price of a division -- indicating performance mostly was not
a real issue -- but also does not result in a uniformly distributed
output if max_plus_one is not a power of two. Recent commits moved to
using `prandom_u32_max(max_plus_one)`, which replaces the division with
a faster multiplication, but still does not solve the issue with
non-uniform output.

For some users, maybe this isn't a problem, and for others, maybe it is,
but for the majority of users, probably the question has never been
posed and analyzed, and nobody thought much about it, probably assuming
random is random is random. In other words, the unthinking expectation
of most users is likely that the resultant numbers are uniform.

So we implement here an efficient way of generating uniform bounded
random integers. Through use of compile-time evaluation, and avoiding
divisions as much as possible, this commit introduces no measurable
overhead. At least for hot-path uses tested, any potential difference
was lost in the noise. On both clang and gcc, code generation is pretty
small.

The new function, get_random_u32_below(), lives in random.h, rather than
prandom.h, and has a "get_random_xxx" function name, because it is
suitable for all uses, including cryptography.

In order to be efficient, we implement a kernel-specific variant of
Daniel Lemire's algorithm from "Fast Random Integer Generation in an
Interval", linked below. The kernel's variant takes advantage of
constant folding to avoid divisions entirely in the vast majority of
cases, works on both 32-bit and 64-bit architectures, and requests a
minimal amount of bytes from the RNG.

Link: https://arxiv.org/pdf/1805.10941.pdf
Cc: stable@vger.kernel.org # to ease future backports that use this api
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f5e4ec15 28-Oct-2022 Jean-Philippe Brucker <jean-philippe@linaro.org>

random: use arch_get_random*_early() in random_init()

While reworking the archrandom handling, commit d349ab99eec7 ("random:
handle archrandom with multiple longs") switched to the non-early
archrandom helpers in random_init(), which broke initialization of the
entropy pool from the arm64 random generator.

Indeed at that point the arm64 CPU features, which verify that all CPUs
have compatible capabilities, are not finalized so arch_get_random_seed_longs()
is unsuccessful. Instead random_init() should use the _early functions,
which check only the boot CPU on arm64. On other architectures the
_early functions directly call the normal ones.

Fixes: d349ab99eec7 ("random: handle archrandom with multiple longs")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# de492c83 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

prandom: remove unused functions

With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a890d1c6 04-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: clear new batches when bringing new CPUs online

The commit that added the new get_random_{u8,u16}() functions neglected
to update the code that clears the batches when bringing up a new CPU.
It also forgot a few comments and helper defines, so add those in too.

Fixes: 585cd5fe9f73 ("random: add 8-bit and 16-bit batches")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d687772e 01-Oct-2022 William Zijl <postmaster@gusted.xyz>

random: fix typos in get_random_bytes() comment

Remove extra whitespace and add a missing word to a sentence describing
get_random_bytes().

Signed-off-by: William Zijl <postmaster@gusted.xyz>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 12273347 30-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: schedule jitter credit for next jiffy, not in two jiffies

Counterintuitively, mod_timer(..., jiffies + 1) will cause the timer to
fire not in the next jiffy, but in two jiffies. The way to cause
the timer to fire in the next jiffy is with mod_timer(..., jiffies).
Doing so then lets us bump the upper bound back up again.

Fixes: 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it")
Fixes: 829d680e82a9 ("random: cap jitter samples per bit to factor of HZ")
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 585cd5fe 28-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: add 8-bit and 16-bit batches

There are numerous places in the kernel that would be sped up by having
smaller batches. Currently those callsites do `get_random_u32() & 0xff`
or similar. Since these are pretty spread out, and will require patches
to multiple different trees, let's get ahead of the curve and lay the
foundation for `get_random_u8()` and `get_random_u16()`, so that it's
then possible to start submitting conversion patches leisurely.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# dd54fd7d 27-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use init_utsname() instead of utsname()

Rather than going through the current-> indirection for utsname, at this
point in boot, init_utsname()==utsname(), so just use it directly that
way. Additionally, init_utsname() appears to be available nearly always,
so move it into random_init_early().

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f6238499 26-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: split initialization into early step and later step

The full RNG initialization relies on some timestamps, made possible
with initialization functions like time_init() and timekeeping_init().
However, these are only available rather late in initialization.
Meanwhile, other things, such as memory allocator functions, make use of
the RNG much earlier.

So split RNG initialization into two phases. We can provide arch
randomness very early on, and then later, after timekeeping and such are
available, initialize the rest.

This ensures that, for example, slabs are properly randomized if RDRAND
is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree
of its security, because its random seed is potentially deterministic,
since it hasn't yet incorporated RDRAND. It also makes it possible to
use a better seed in kfence, which currently relies on only the cycle
counter.

Another positive consequence is that on systems with RDRAND, running
with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all.

One subtle side effect of this change is that on systems with no RDRAND,
RDTSC is now only queried by random_init() once, committing the moment
of the function call, instead of multiple times as before. This is
intentional, as the multiple RDTSCs in a loop before weren't
accomplishing very much, with jitter being better provided by
try_to_generate_entropy(). Plus, filling blocks with RDTSC is still
being done in extract_entropy(), which is necessarily called before
random bytes are served anyway.

Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 748bc4dd 22-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use expired timer rather than wq for mixing fast pool

Previously, the fast pool was dumped into the main pool periodically in
the fast pool's hard IRQ handler. This worked fine and there weren't
problems with it, until RT came around. Since RT converts spinlocks into
sleeping locks, problems cropped up. Rather than switching to raw
spinlocks, the RT developers preferred we make the transformation from
originally doing:

do_some_stuff()
spin_lock()
do_some_other_stuff()
spin_unlock()

to doing:

do_some_stuff()
queue_work_on(some_other_stuff_worker)

This is an ordinary pattern done all over the kernel. However, Sherry
noticed a 10% performance regression in qperf TCP over a 40gbps
InfiniBand card. Quoting her message:

> MT27500 Family [ConnectX-3] cards:
> Infiniband device 'mlx4_0' port 1 status:
> default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
> base lid: 0x6
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 40 Gb/sec (4X QDR)
> link_layer: InfiniBand
>
> Cards are configured with IP addresses on private subnet for IPoIB
> performance testing.
> Regression identified in this bug is in TCP latency in this stack as reported
> by qperf tcp_lat metric:
>
> We have one system listen as a qperf server:
> [root@yourQperfServer ~]# qperf
>
> Have the other system connect to qperf server as a client (in this
> case, it’s X7 server with Mellanox card):
> [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat

Rather than incur the scheduling latency from queue_work_on, we can
instead switch to running on the next timer tick, on the same core. This
also batches things a bit more -- once per jiffy -- which is okay now
that mix_interrupt_randomness() can credit multiple bits at once.

Reported-by: Sherry Yang <sherry.yang@oracle.com>
Tested-by: Paul Webb <paul.x.webb@oracle.com>
Cc: Sherry Yang <sherry.yang@oracle.com>
Cc: Phillip Goerl <phillip.goerl@oracle.com>
Cc: Jack Vogel <jack.vogel@oracle.com>
Cc: Nicky Veitch <nicky.veitch@oracle.com>
Cc: Colm Harrington <colm.harrington@oracle.com>
Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: stable@vger.kernel.org
Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9ee0507e 22-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: avoid reading two cache lines on irq randomness

In order to avoid reading and dirtying two cache lines on every IRQ,
move the work_struct to the bottom of the fast_pool struct. add_
interrupt_randomness() always touches .pool and .count, which are
currently split, because .mix pushes everything down. Instead, move .mix
to the bottom, so that .pool and .count are always in the first cache
line, since .mix is only accessed when the pool is full.

Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e78a802a 22-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: clamp credited irq bits to maximum mixed

Since the most that's mixed into the pool is sizeof(long)*2, don't
credit more than that many bytes of entropy.

Fixes: e3e33fc2ea7f ("random: do not use input pool from hard IRQs")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d775335e 20-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: throttle hwrng writes if no entropy is credited

If a hwrng source does not provide an entropy estimate, it currently
does not contribute at all to the CRNG. In order to help fix this, in
case add_hwgenerator_randomness() is called with the entropy parameter
set to zero, go to sleep until one reseed interval has passed.

While the hwrng thread currently only runs under conditions where this
is non-zero, this change is not harmful and prepares for future updates
to the hwrng core.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 745558f9 03-Sep-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: use hwgenerator randomness more frequently at early boot

Mix in randomness from hw-rng sources more frequently during early
boot, approximately once for every rng reseed.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# cd4f24ae 08-Sep-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: restore O_NONBLOCK support

Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would
return -EAGAIN if there was no entropy. When the pools were unified in
5.6, this was lost. The post 5.6 behavior of blocking until the pool is
initialized, and ignoring O_NONBLOCK in the process, went unnoticed,
with no reports about the regression received for two and a half years.
However, eventually this indeed did break somebody's userspace.

So we restore the old behavior, by returning -EAGAIN if the pool is not
initialized. Unlike the old /dev/random, this can only occur during
early boot, after which it never blocks again.

In order to make this O_NONBLOCK behavior consistent with other
expectations, also respect users reading with preadv2(RWF_NOWAIT) and
similar.

Fixes: 30c08efec888 ("random: make /dev/random be almost like /dev/urandom")
Reported-by: Guozihua <guozihua@huawei.com>
Reported-by: Zhongguohua <zhongguohua1@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 261e224d 30-Jun-2022 Kalesh Singh <kaleshsingh@google.com>

pm/sleep: Add PM_USERSPACE_AUTOSLEEP Kconfig

Systems that initiate frequent suspend/resume from userspace
can make the kernel aware by enabling PM_USERSPACE_AUTOSLEEP
config.

This allows for certain sleep-sensitive code (wireguard/rng) to
decide on what preparatory work should be performed (or not) in
their pm_notification callbacks.

This patch was prompted by the discussion at [1] which attempts
to remove CONFIG_ANDROID that currently guards these code paths.

[1] https://lore.kernel.org/r/20220629150102.1582425-1-hch@lst.de/

Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Link: https://lore.kernel.org/r/20220630191230.235306-1-kaleshsingh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7f637be4 29-Jul-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: correct spelling of "overwrites"

It was missing an 'r'.

Fixes: 186873c549df ("random: use simpler fast key erasure flow on per-cpu keys")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d349ab99 16-Jul-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: handle archrandom with multiple longs

The archrandom interface was originally designed for x86, which supplies
RDRAND/RDSEED for receiving random words into registers, resulting in
one function to generate an int and another to generate a long. However,
other architectures don't follow this.

On arm64, the SMCCC TRNG interface can return between one and three
longs. On s390, the CPACF TRNG interface can return arbitrary amounts,
with four longs having the same cost as one. On UML, the os_getrandom()
interface can return arbitrary amounts.

So change the api signature to take a "max_longs" parameter designating
the maximum number of longs requested, and then return the number of
longs generated.

Since callers need to check this return value and loop anyway, each arch
implementation does not bother implementing its own loop to try again to
fill the maximum number of longs. Additionally, all existing callers
pass in a constant max_longs parameter. Taken together, these two things
mean that the codegen doesn't really change much for one-word-at-a-time
platforms, while performance is greatly improved on platforms such as
s390.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b7a68f67 14-Jul-2022 Uros Bizjak <ubizjak@gmail.com>

random: use try_cmpxchg in _credit_init_bits

Use `!try_cmpxchg(ptr, &orig, new)` instead of `cmpxchg(ptr, orig, new)
!= orig` in _credit_init_bits. This has two benefits:

- The x86 cmpxchg instruction returns success in the ZF flag, so this
change saves a compare after cmpxchg, as well as a related move
instruction in front of cmpxchg.

- try_cmpxchg implicitly assigns the *ptr value to &orig when cmpxchg
fails, enabling further code simplifications.

This patch has no functional change.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 829d680e 13-Jul-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: cap jitter samples per bit to factor of HZ

Currently the jitter mechanism will require two timer ticks per
iteration, and it requires N iterations per bit. This N is determined
with a small measurement, and if it's too big, it won't waste time with
jitter entropy because it'd take too long or not have sufficient entropy
anyway.

With the current max N of 32, there are large timeouts on systems with a
small CONFIG_HZ. Rather than set that maximum to 32, instead choose a
factor of CONFIG_HZ. In this case, 1/30 seems to yield sane values for
different configurations of CONFIG_HZ.

Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Fixes: 78c768e619fb ("random: vary jitter iterations based on cycle counter speed")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 63b8ea5e 20-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: update comment from copy_to_user() -> copy_to_iter()

This comment wasn't updated when we moved from read() to read_iter(), so
this patch makes the trivial fix.

Fixes: 1b388e7765f2 ("random: convert to using fops->read_iter()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c01d4d0a 16-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: quiet urandom warning ratelimit suppression message

random.c ratelimits how much it warns about uninitialized urandom reads
using __ratelimit(). When the RNG is finally initialized, it prints the
number of missed messages due to ratelimiting.

It has been this way since that functionality was introduced back in
2018. Recently, cc1e127bfa95 ("random: remove ratelimiting for in-kernel
unseeded randomness") put a bit more stress on the urandom ratelimiting,
which teased out a bug in the implementation.

Specifically, when under pressure, __ratelimit() will print its own
message and reset the count back to 0, making the final message at the
end less useful. Secondly, it does so as a pr_warn(), which apparently
is undesirable for people's CI.

Fortunately, __ratelimit() has the RATELIMIT_MSG_ON_RELEASE flag exactly
for this purpose, so we set the flag.

Fixes: 4e00b339e264 ("random: rate limit unseeded randomness warnings")
Cc: stable@vger.kernel.org
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Ron Economos <re@w6rz.net>
Tested-by: Ron Economos <re@w6rz.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 534d2eaf 15-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: schedule mix_interrupt_randomness() less often

It used to be that mix_interrupt_randomness() would credit 1 bit each
time it ran, and so add_interrupt_randomness() would schedule mix() to
run every 64 interrupts, a fairly arbitrary number, but nonetheless
considered to be a decent enough conservative estimate.

Since e3e33fc2ea7f ("random: do not use input pool from hard IRQs"),
mix() is now able to credit multiple bits, depending on the number of
calls to add(). This was done for reasons separate from this commit, but
it has the nice side effect of enabling this patch to schedule mix()
less often.

Currently the rules are:
a) Credit 1 bit for every 64 calls to add().
b) Schedule mix() once a second that add() is called.
c) Schedule mix() once every 64 calls to add().

Rules (a) and (c) no longer need to be coupled. It's still important to
have _some_ value in (c), so that we don't "over-saturate" the fast
pool, but the once per second we get from rule (b) is a plenty enough
baseline. So, by increasing the 64 in rule (c) to something larger, we
avoid calling queue_work_on() as frequently during irq storms.

This commit changes that 64 in rule (c) to be 1024, which means we
schedule mix() 16 times less often. And it does *not* need to change the
64 in rule (a).

Fixes: 58340f8e952b ("random: defer fast pool mixing to worker")
Cc: stable@vger.kernel.org
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e052a478 08-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove rng_has_arch_random()

With arch randomness being used by every distro and enabled in
defconfigs, the distinction between rng_has_arch_random() and
rng_is_initialized() is now rather small. In fact, the places where they
differ are now places where paranoid users and system builders really
don't want arch randomness to be used, in which case we should respect
that choice, or places where arch randomness is known to be broken, in
which case that choice is all the more important. So this commit just
removes the function and its one user.

Reviewed-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 60e5b288 07-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not use jump labels before they are initialized

Stephen reported that a static key warning splat appears during early
boot on systems that credit randomness from device trees that contain an
"rng-seed" property, because because setup_machine_fdt() is called
before jump_label_init() during setup_arch():

static_key_enable_cpuslocked(): static key '0xffffffe51c6fcfc0' used before call to jump_label_init()
WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xb0/0xb8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0+ #224 44b43e377bfc84bc99bb5ab885ff694984ee09ff
pstate: 600001c9 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : static_key_enable_cpuslocked+0xb0/0xb8
lr : static_key_enable_cpuslocked+0xb0/0xb8
sp : ffffffe51c393cf0
x29: ffffffe51c393cf0 x28: 000000008185054c x27: 00000000f1042f10
x26: 0000000000000000 x25: 00000000f10302b2 x24: 0000002513200000
x23: 0000002513200000 x22: ffffffe51c1c9000 x21: fffffffdfdc00000
x20: ffffffe51c2f0831 x19: ffffffe51c6fcfc0 x18: 00000000ffff1020
x17: 00000000e1e2ac90 x16: 00000000000000e0 x15: ffffffe51b710708
x14: 0000000000000066 x13: 0000000000000018 x12: 0000000000000000
x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
x8 : 0000000000000000 x7 : 61632065726f6665 x6 : 6220646573752027
x5 : ffffffe51c641d25 x4 : ffffffe51c13142c x3 : ffff0a00ffffff05
x2 : 40000000ffffe003 x1 : 00000000000001c0 x0 : 0000000000000065
Call trace:
static_key_enable_cpuslocked+0xb0/0xb8
static_key_enable+0x2c/0x40
crng_set_ready+0x24/0x30
execute_in_process_context+0x80/0x90
_credit_init_bits+0x100/0x154
add_bootloader_randomness+0x64/0x78
early_init_dt_scan_chosen+0x140/0x184
early_init_dt_scan_nodes+0x28/0x4c
early_init_dt_scan+0x40/0x44
setup_machine_fdt+0x7c/0x120
setup_arch+0x74/0x1d8
start_kernel+0x84/0x44c
__primary_switched+0xc0/0xc8
---[ end trace 0000000000000000 ]---
random: crng init done
Machine model: Google Lazor (rev1 - 2) with LTE

A trivial fix went in to address this on arm64, 73e2d827a501 ("arm64:
Initialize jump labels before setup_machine_fdt()"). I wrote patches as
well for arm32 and risc-v. But still patches are needed on xtensa,
powerpc, arc, and mips. So that's 7 platforms where things aren't quite
right. This sort of points to larger issues that might need a larger
solution.

Instead, this commit just defers setting the static branch until later
in the boot process. random_init() is called after jump_label_init() has
been called, and so is always a safe place from which to adjust the
static branch.

Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Reported-by: Phil Elwell <phil@raspberrypi.com>
Tested-by: Phil Elwell <phil@raspberrypi.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 77fc95f8 07-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: account for arch randomness in bits

Rather than accounting in bytes and multiplying (shifting), we can just
account in bits and avoid the shift. The main motivation for this is
there are other patches in flux that expand this code a bit, and
avoiding the duplication of "* 8" everywhere makes things a bit clearer.

Cc: stable@vger.kernel.org
Fixes: 12e45a2a6308 ("random: credit architectural init the exact amount")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 39e0f991 07-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: mark bootloader randomness code as __init

add_bootloader_randomness() and the variables it touches are only used
during __init and not after, so mark these as __init. At the same time,
unexport this, since it's only called by other __init code that's
built-in.

Cc: stable@vger.kernel.org
Fixes: 428826f5358c ("fdt: add support for rng-seed")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9b29b6b2 07-Jun-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: avoid checking crng_ready() twice in random_init()

The current flow expands to:

if (crng_ready())
...
else if (...)
if (!crng_ready())
...

The second crng_ready() call is redundant, but can't so easily be
optimized out by the compiler.

This commit simplifies that to:

if (crng_ready()
...
else if (...)
...

Fixes: 560181c27b58 ("random: move initialization functions out of hot pages")
Cc: stable@vger.kernel.org
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1ce6c8d6 22-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: check for signals after page of pool writes

get_random_bytes_user() checks for signals after producing a PAGE_SIZE
worth of output, just like /dev/zero does. write_pool() is doing
basically the same work (actually, slightly more expensive), and so
should stop to check for signals in the same way. Let's also name it
write_pool_user() to match get_random_bytes_user(), so this won't be
misused in the future.

Before this patch, massive writes to /dev/urandom would tie up the
process for an extremely long time and make it unterminatable. After, it
can be successfully interrupted. The following test program can be used
to see this works as intended:

#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>

static unsigned char x[~0U];

static void handle(int) { }

int main(int argc, char *argv[])
{
pid_t pid = getpid(), child;
int fd;
signal(SIGUSR1, handle);
if (!(child = fork())) {
for (;;)
kill(pid, SIGUSR1);
}
fd = open("/dev/urandom", O_WRONLY);
pause();
printf("interrupted after writing %zd bytes\n", write(fd, x, sizeof(x)));
close(fd);
kill(child, SIGTERM);
return 0;
}

Result before: "interrupted after writing 2147479552 bytes"
Result after: "interrupted after writing 4096 bytes"

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 79025e72 19-May-2022 Jens Axboe <axboe@kernel.dk>

random: wire up fops->splice_{read,write}_iter()

Now that random/urandom is using {read,write}_iter, we can wire it up to
using the generic splice handlers.

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[Jason: added the splice_write path. Note that sendfile() and such still
does not work for read, though it does for write, because of a file
type restriction in splice_direct_to_actor(), which I'll address
separately.]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 22b0a222 19-May-2022 Jens Axboe <axboe@kernel.dk>

random: convert to using fops->write_iter()

Now that the read side has been converted to fix a regression with
splice, convert the write side as well to have some symmetry in the
interface used (and help deprecate ->write()).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[Jason: cleaned up random_ioctl a bit, require full writes in
RNDADDENTROPY since it's crediting entropy, simplify control flow of
write_pool(), and incorporate suggestions from Al.]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1b388e77 19-May-2022 Jens Axboe <axboe@kernel.dk>

random: convert to using fops->read_iter()

This is a pre-requisite to wiring up splice() again for the random
and urandom drivers. It also allows us to remove the INT_MAX check in
getrandom(), because import_single_range() applies capping internally.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[Jason: rewrote get_random_bytes_user() to simplify and also incorporate
additional suggestions from Al.]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 3092adce 14-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: unify batched entropy implementations

There are currently two separate batched entropy implementations, for
u32 and u64, with nearly identical code, with the goal of avoiding
unaligned memory accesses and letting the buffers be used more
efficiently. Having to maintain these two functions independently is a
bit of a hassle though, considering that they always need to be kept in
sync.

This commit factors them out into a type-generic macro, so that the
expansion produces the same code as before, such that diffing the
assembly shows no differences. This will also make it easier in the
future to add u16 and u8 batches.

This was initially tested using an always_inline function and letting
gcc constant fold the type size in, but the code gen was less efficient,
and in general it was more verbose and harder to follow. So this patch
goes with the boring macro solution, similar to what's already done for
the _wait functions in random.h.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5ad7dd88 14-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: move randomize_page() into mm where it belongs

randomize_page is an mm function. It is documented like one. It contains
the history of one. It has the naming convention of one. It looks
just like another very similar function in mm, randomize_stack_top().
And it has always been maintained and updated by mm people. There is no
need for it to be in random.c. In the "which shape does not look like
the other ones" test, pointing to randomize_page() is correct.

So move randomize_page() into mm/util.c, right next to the similar
randomize_stack_top() function.

This commit contains no actual code changes.

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 6701de6c 15-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove mostly unused async readiness notifier

The register_random_ready_notifier() notifier is somewhat complicated,
and was already recently rewritten to use notifier blocks. It is only
used now by one consumer in the kernel, vsprintf.c, for which the async
mechanism is really overly complex for what it actually needs. This
commit removes register_random_ready_notifier() and unregister_random_
ready_notifier(), because it just adds complication with little utility,
and changes vsprintf.c to just check on `!rng_is_initialized() &&
!rng_has_arch_random()`, which will eventually be true. Performance-
wise, that code was already using a static branch, so there's basically
no overhead at all to this change.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 248561ad 14-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove get_random_bytes_arch() and add rng_has_arch_random()

The RNG incorporates RDRAND into its state at boot and every time it
reseeds, so there's no reason for callers to use it directly. The
hashing that the RNG does on it is preferable to using the bytes raw.

The only current use case of get_random_bytes_arch() is vsprintf's
siphash key for pointer hashing, which uses it to initialize the pointer
secret earlier than usual if RDRAND is available. In order to replace
this narrow use case, just expose whether RDRAND is mixed into the RNG,
with a new function called rng_has_arch_random(). With that taken care
of, there are no users of get_random_bytes_arch() left, so it can be
removed.

Later, if trust_cpu gets turned on by default (as most distros are
doing), this one use of rng_has_arch_random() can probably go away as
well.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 560181c2 13-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: move initialization functions out of hot pages

Much of random.c is devoted to initializing the rng and accounting for
when a sufficient amount of entropy has been added. In a perfect world,
this would all happen during init, and so we could mark these functions
as __init. But in reality, this isn't the case: sometimes the rng only
finishes initializing some seconds after system init is finished.

For this reason, at the moment, a whole host of functions that are only
used relatively close to system init and then never again are intermixed
with functions that are used in hot code all the time. This creates more
cache misses than necessary.

In order to pack the hot code closer together, this commit moves the
initialization functions that can't be marked as __init into
.text.unlikely by way of the __cold attribute.

Of particular note is moving credit_init_bits() into a macro wrapper
that inlines the crng_ready() static branch check. This avoids a
function call to a nop+ret, and most notably prevents extra entropy
arithmetic from being computed in mix_interrupt_randomness().

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a1940263 13-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: make consistent use of buf and len

The current code was a mix of "nbytes", "count", "size", "buffer", "in",
and so forth. Instead, let's clean this up by naming input parameters
"buf" (or "ubuf") and "len", so that you always understand that you're
reading this variety of function argument.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f5bda35f 03-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use static branch for crng_ready()

Since crng_ready() is only false briefly during initialization and then
forever after becomes true, we don't need to evaluate it after, making
it a prime candidate for a static branch.

One complication, however, is that it changes state in a particular call
to credit_init_bits(), which might be made from atomic context, which
means we must kick off a workqueue to change the static key. Further
complicating things, credit_init_bits() may be called sufficiently early
on in system initialization such that system_wq is NULL.

Fortunately, there exists the nice function execute_in_process_context(),
which will immediately execute the function if !in_interrupt(), and
otherwise defer it to a workqueue. During early init, before workqueues
are available, in_interrupt() is always false, because interrupts
haven't even been enabled yet, which means the function in that case
executes immediately. Later on, after workqueues are available,
in_interrupt() might be true, but in that case, the work is queued in
system_wq and all goes well.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 12e45a2a 12-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: credit architectural init the exact amount

RDRAND and RDSEED can fail sometimes, which is fine. We currently
initialize the RNG with 512 bits of RDRAND/RDSEED. We only need 256 bits
of those to succeed in order to initialize the RNG. Instead of the
current "all or nothing" approach, actually credit these contributions
the amount that is actually contributed.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 2f14062b 04-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: handle latent entropy and command line from random_init()

Currently, start_kernel() adds latent entropy and the command line to
the entropy bool *after* the RNG has been initialized, deferring when
it's actually used by things like stack canaries until the next time
the pool is seeded. This surely is not intended.

Rather than splitting up which entropy gets added where and when between
start_kernel() and random_init(), just do everything in random_init(),
which should eliminate these kinds of bugs in the future.

While we're at it, rename the awkwardly titled "rand_initialize()" to
the more standard "random_init()" nomenclature.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 8a5b8a4a 10-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use proper jiffies comparison macro

This expands to exactly the same code that it replaces, but makes things
consistent by using the same macro for jiffy comparisons throughout.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# cc1e127b 09-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove ratelimiting for in-kernel unseeded randomness

The CONFIG_WARN_ALL_UNSEEDED_RANDOM debug option controls whether the
kernel warns about all unseeded randomness or just the first instance.
There's some complicated rate limiting and comparison to the previous
caller, such that even with CONFIG_WARN_ALL_UNSEEDED_RANDOM enabled,
developers still don't see all the messages or even an accurate count of
how many were missed. This is the result of basically parallel
mechanisms aimed at accomplishing more or less the same thing, added at
different points in random.c history, which sort of compete with the
first-instance-only limiting we have now.

It turns out, however, that nobody cares about the first unseeded
randomness instance of in-kernel users. The same first user has been
there for ages now, and nobody is doing anything about it. It isn't even
clear that anybody _can_ do anything about it. Most places that can do
something about it have switched over to using get_random_bytes_wait()
or wait_for_random_bytes(), which is the right thing to do, but there is
still much code that needs randomness sometimes during init, and as a
geeneral rule, if you're not using one of the _wait functions or the
readiness notifier callback, you're bound to be doing it wrong just
based on that fact alone.

So warning about this same first user that can't easily change is simply
not an effective mechanism for anything at all. Users can't do anything
about it, as the Kconfig text points out -- the problem isn't in
userspace code -- and kernel developers don't or more often can't react
to it.

Instead, show the warning for all instances when CONFIG_WARN_ALL_UNSEEDED_RANDOM
is set, so that developers can debug things need be, or if it isn't set,
don't show a warning at all.

At the same time, CONFIG_WARN_ALL_UNSEEDED_RANDOM now implies setting
random.ratelimit_disable=1 on by default, since if you care about one
you probably care about the other too. And we can clean up usage around
the related urandom_warning ratelimiter as well (whose behavior isn't
changing), so that it properly counts missed messages after the 10
message threshold is reached.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 68c9c8b1 09-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: move initialization out of reseeding hot path

Initialization happens once -- by way of credit_init_bits() -- and then
it never happens again. Therefore, it doesn't need to be in
crng_reseed(), which is a hot path that is called multiple times. It
also doesn't make sense to have there, as initialization activity is
better associated with initialization routines.

After the prior commit, crng_reseed() now won't be called by multiple
concurrent callers, which means that we can safely move the
"finialize_init" logic into crng_init_bits() unconditionally.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# fed7ef06 09-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: avoid initializing twice in credit race

Since all changes of crng_init now go through credit_init_bits(), we can
fix a long standing race in which two concurrent callers of
credit_init_bits() have the new bit count >= some threshold, but are
doing so with crng_init as a lower threshold, checked outside of a lock,
resulting in crng_reseed() or similar being called twice.

In order to fix this, we can use the original cmpxchg value of the bit
count, and only change crng_init when the bit count transitions from
below a threshold to meeting the threshold.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e3d2c5e7 08-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use symbolic constants for crng_init states

crng_init represents a state machine, with three states, and various
rules for transitions. For the longest time, we've been managing these
with "0", "1", and "2", and expecting people to figure it out. To make
the code more obvious, replace these with proper enum values
representing the transition, and then redocument what each of these
states mean.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e73aaae2 07-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

siphash: use one source of truth for siphash permutations

The SipHash family of permutations is currently used in three places:

- siphash.c itself, used in the ordinary way it was intended.
- random32.c, in a construction from an anonymous contributor.
- random.c, as part of its fast_mix function.

Each one of these places reinvents the wheel with the same C code, same
rotation constants, and same symmetry-breaking constants.

This commit tidies things up a bit by placing macros for the
permutations and constants into siphash.h, where each of the three .c
users can access them. It also leaves a note dissuading more users of
them from emerging.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 791332b3 06-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: help compiler out with fast_mix() by using simpler arguments

Now that fast_mix() has more than one caller, gcc no longer inlines it.
That's fine. But it also doesn't handle the compound literal argument we
pass it very efficiently, nor does it handle the loop as well as it
could. So just expand the code to spell out this function so that it
generates the same code as it did before. Performance-wise, this now
behaves as it did before the last commit. The difference in actual code
size on x86 is 45 bytes, which is less than a cache line.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e3e33fc2 06-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not use input pool from hard IRQs

Years ago, a separate fast pool was added for interrupts, so that the
cost associated with taking the input pool spinlocks and mixing into it
would be avoided in places where latency is critical. However, one
oversight was that add_input_randomness() and add_disk_randomness()
still sometimes are called directly from the interrupt handler, rather
than being deferred to a thread. This means that some unlucky interrupts
will be caught doing a blake2s_compress() call and potentially spinning
on input_pool.lock, which can also be taken by unprivileged users by
writing into /dev/urandom.

In order to fix this, add_timer_randomness() now checks whether it is
being called from a hard IRQ and if so, just mixes into the per-cpu IRQ
fast pool using fast_mix(), which is much faster and can be done
lock-free. A nice consequence of this, as well, is that it means hard
IRQ context FPU support is likely no longer useful.

The entropy estimation algorithm used by add_timer_randomness() is also
somewhat different than the one used for add_interrupt_randomness(). The
former looks at deltas of deltas of deltas, while the latter just waits
for 64 interrupts for one bit or for one second since the last bit. In
order to bridge these, and since add_interrupt_randomness() runs after
an add_timer_randomness() that's called from hard IRQ, we add to the
fast pool credit the related amount, and then subtract one to account
for add_interrupt_randomness()'s contribution.

A downside of this, however, is that the num argument is potentially
attacker controlled, which puts a bit more pressure on the fast_mix()
sponge to do more than it's really intended to do. As a mitigating
factor, the first 96 bits of input aren't attacker controlled (a cycle
counter followed by zeros), which means it's essentially two rounds of
siphash rather than one, which is somewhat better. It's also not that
much different from add_interrupt_randomness()'s use of the irq stack
instruction pointer register.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Filipe Manana <fdmanana@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a4b5c26b 06-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: order timer entropy functions below interrupt functions

There are no code changes here; this is just a reordering of functions,
so that in subsequent commits, the timer entropy functions can call into
the interrupt ones.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e85c0fc1 30-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not pretend to handle premature next security model

Per the thread linked below, "premature next" is not considered to be a
realistic threat model, and leads to more serious security problems.

"Premature next" is the scenario in which:

- Attacker compromises the current state of a fully initialized RNG via
some kind of infoleak.
- New bits of entropy are added directly to the key used to generate the
/dev/urandom stream, without any buffering or pooling.
- Attacker then, somehow having read access to /dev/urandom, samples RNG
output and brute forces the individual new bits that were added.
- Result: the RNG never "recovers" from the initial compromise, a
so-called violation of what academics term "post-compromise security".

The usual solutions to this involve some form of delaying when entropy
gets mixed into the crng. With Fortuna, this involves multiple input
buckets. With what the Linux RNG was trying to do prior, this involves
entropy estimation.

However, by delaying when entropy gets mixed in, it also means that RNG
compromises are extremely dangerous during the window of time before
the RNG has gathered enough entropy, during which time nonces may become
predictable (or repeated), ephemeral keys may not be secret, and so
forth. Moreover, it's unclear how realistic "premature next" is from an
attack perspective, if these attacks even make sense in practice.

Put together -- and discussed in more detail in the thread below --
these constitute grounds for just doing away with the current code that
pretends to handle premature next. I say "pretends" because it wasn't
doing an especially great job at it either; should we change our mind
about this direction, we would probably implement Fortuna to "fix" the
"problem", in which case, removing the pretend solution still makes
sense.

This also reduces the crng reseed period from 5 minutes down to 1
minute. The rationale from the thread might lead us toward reducing that
even further in the future (or even eliminating it), but that remains a
topic of a future commit.

At a high level, this patch changes semantics from:

Before: Seed for the first time after 256 "bits" of estimated
entropy have been accumulated since the system booted. Thereafter,
reseed once every five minutes, but only if 256 new "bits" have been
accumulated since the last reseeding.

After: Seed for the first time after 256 "bits" of estimated entropy
have been accumulated since the system booted. Thereafter, reseed
once every minute.

Most of this patch is renaming and removing: POOL_MIN_BITS becomes
POOL_INIT_BITS, credit_entropy_bits() becomes credit_init_bits(),
crng_reseed() loses its "force" parameter since it's now always true,
the drain_entropy() function no longer has any use so it's removed,
entropy estimation is skipped if we've already init'd, the various
notifiers for "low on entropy" are now only active prior to init, and
finally, some documentation comments are cleaned up here and there.

Link: https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nadia Heninger <nadiah@cs.ucsd.edu>
Cc: Tom Ristenpart <ristenpart@cornell.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5c3b747e 30-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use first 128 bits of input as fast init

Before, the first 64 bytes of input, regardless of how entropic it was,
would be used to mutate the crng base key directly, and none of those
bytes would be credited as having entropy. Then 256 bits of credited
input would be accumulated, and only then would the rng transition from
the earlier "fast init" phase into being actually initialized.

The thinking was that by mixing and matching fast init and real init, an
attacker who compromised the fast init state, considered easy to do
given how little entropy might be in those first 64 bytes, would then be
able to bruteforce bits from the actual initialization. By keeping these
separate, bruteforcing became impossible.

However, by not crediting potentially creditable bits from those first 64
bytes of input, we delay initialization, and actually make the problem
worse, because it means the user is drawing worse random numbers for a
longer period of time.

Instead, we can take the first 128 bits as fast init, and allow them to
be credited, and then hold off on the next 128 bits until they've
accumulated. This is still a wide enough margin to prevent bruteforcing
the rng state, while still initializing much faster.

Then, rather than trying to piecemeal inject into the base crng key at
various points, instead just extract from the pool when we need it, for
the crng_init==0 phase. Performance may even be better for the various
inputs here, since there are likely more calls to mix_pool_bytes() then
there are to get_random_bytes() during this phase of system execution.

Since the preinit injection code is gone, bootloader randomness can then
do something significantly more straight forward, removing the weird
system_wq hack in hwgenerator randomness.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# cbe89e5a 03-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not use batches when !crng_ready()

It's too hard to keep the batches synchronized, and pointless anyway,
since in !crng_ready(), we're updating the base_crng key really often,
where batching only hurts. So instead, if the crng isn't ready, just
call into get_random_bytes(). At this stage nothing is performance
critical anyhow.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b7b67d13 01-May-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: mix in timestamps and reseed on system restore

Since the RNG loses freshness with system suspend/hibernation, when we
resume, immediately reseed using whatever data we can, which for this
particular case is the various timestamps regarding system suspend time,
in addition to more generally the RDSEED/RDRAND/RDTSC values that happen
whenever the crng reseeds.

On systems that suspend and resume automatically all the time -- such as
Android -- we skip the reseeding on suspend resumption, since that could
wind up being far too busy. This is the same trade-off made in
WireGuard.

In addition to reseeding upon resumption always mix into the pool these
various stamps on every power notification event.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 78c768e6 22-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: vary jitter iterations based on cycle counter speed

Currently, we do the jitter dance if two consecutive reads to the cycle
counter return different values. If they do, then we consider the cycle
counter to be fast enough that one trip through the scheduler will yield
one "bit" of credited entropy. If those two reads return the same value,
then we assume the cycle counter is too slow to show meaningful
differences.

This methodology is flawed for a variety of reasons, one of which Eric
posted a patch to fix in [1]. The issue that patch solves is that on a
system with a slow counter, you might be [un]lucky and read the counter
_just_ before it changes, so that the second cycle counter you read
differs from the first, even though there's usually quite a large period
of time in between the two. For example:

| real time | cycle counter |
| --------- | ------------- |
| 3 | 5 |
| 4 | 5 |
| 5 | 5 |
| 6 | 5 |
| 7 | 5 | <--- a
| 8 | 6 | <--- b
| 9 | 6 | <--- c

If we read the counter at (a) and compare it to (b), we might be fooled
into thinking that it's a fast counter, when in reality it is not. The
solution in [1] is to also compare counter (b) to counter (c), on the
theory that if the counter is _actually_ slow, and (a)!=(b), then
certainly (b)==(c).

This helps solve this particular issue, in one sense, but in another
sense, it mostly functions to disallow jitter entropy on these systems,
rather than simply taking more samples in that case.

Instead, this patch takes a different approach. Right now we assume that
a difference in one set of consecutive samples means one "bit" of
credited entropy per scheduler trip. We can extend this so that a
difference in two sets of consecutive samples means one "bit" of
credited entropy per /two/ scheduler trips, and three for three, and
four for four. In other words, we can increase the amount of jitter
"work" we require for each "bit", depending on how slow the cycle
counter is.

So this patch takes whole bunch of samples, sees how many of them are
different, and divides to find the amount of work required per "bit",
and also requires that at least some minimum of them are different in
order to attempt any jitter entropy.

Note that this approach is still far from perfect. It's not a real
statistical estimate on how much these samples vary; it's not a
real-time analysis of the relevant input data. That remains a project
for another time. However, it makes the same (partly flawed) assumptions
as the code that's there now, so it's probably not worse than the status
quo, and it handles the issue Eric mentioned in [1]. But, again, it's
probably a far cry from whatever a really robust version of this would
be.

[1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/
https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/

Cc: Eric Biggers <ebiggers@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 4b758eda 12-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: insist on random_get_entropy() existing in order to simplify

All platforms are now guaranteed to provide some value for
random_get_entropy(). In case some bug leads to this not being so, we
print a warning, because that indicates that something is really very
wrong (and likely other things are impacted too). This should never be
hit, but it's a good and cheap way of finding out if something ever is
problematic.

Since we now have viable fallback code for random_get_entropy() on all
platforms, which is, in the worst case, not worse than jiffies, we can
count on getting the best possible value out of it. That means there's
no longer a use for using jiffies as entropy input. It also means we no
longer have a reason for doing the round-robin register flow in the IRQ
handler, which was always of fairly dubious value.

Instead we can greatly simplify the IRQ handler inputs and also unify
the construction between 64-bits and 32-bits. We now collect the cycle
counter and the return address, since those are the two things that
matter. Because the return address and the irq number are likely
related, to the extent we mix in the irq number, we can just xor it into
the top unchanging bytes of the return address, rather than the bottom
changing bytes of the cycle counter as before. Then, we can do a fixed 2
rounds of SipHash/HSipHash. Finally, we use the same construction of
hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're
not actually discarding any entropy, since that entropy is carried
through until the next time. And more importantly, it lets us do the
same sponge-like construction everywhere.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 8717627d 18-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: document crng_fast_key_erasure() destination possibility

This reverts 35a33ff3807d ("random: use memmove instead of memcpy for
remaining 32 bytes"), which was made on a totally bogus basis. The thing
it was worried about overlapping came from the stack, not from one of
its arguments, as Eric pointed out.

But the fact that this confusion even happened draws attention to the
fact that it's a bit non-obvious that the random_data parameter can
alias chacha_state, and in fact should do so when the caller can't rely
on the stack being cleared in a timely manner. So this commit documents
that.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 35a33ff3 13-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use memmove instead of memcpy for remaining 32 bytes

In order to immediately overwrite the old key on the stack, before
servicing a userspace request for bytes, we use the remaining 32 bytes
of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f ->
4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of
memcpy(), this doesn't actually appear to be a problem in practice. But
relying on that characteristic seems a bit brittle. So let's change that
to a proper memmove(), which is the by-the-books way of handling
overlapping memory copies.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b0c3e796 08-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: make random_get_entropy() return an unsigned long

Some implementations were returning type `unsigned long`, while others
that fell back to get_cycles() were implicitly returning a `cycles_t` or
an untyped constant int literal. That makes for weird and confusing
code, and basically all code in the kernel already handled it like it
was an `unsigned long`. I recently tried to handle it as the largest
type it could be, a `cycles_t`, but doing so doesn't really help with
much.

Instead let's just make random_get_entropy() return an unsigned long all
the time. This also matches the commonly used `arch_get_random_long()`
function, so now RDRAND and RDTSC return the same sized integer, which
means one can fallback to the other more gracefully.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5209aed5 07-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: allow partial reads if later user copies fail

Rather than failing entirely if a copy_to_user() fails at some point,
instead we should return a partial read for the amount that succeeded
prior, unless none succeeded at all, in which case we return -EFAULT as
before.

This makes it consistent with other reader interfaces. For example, the
following snippet for /dev/zero outputs "4" followed by "1":

int fd;
void *x = mmap(NULL, 4096, PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
assert(x != MAP_FAILED);
fd = open("/dev/zero", O_RDONLY);
assert(fd >= 0);
printf("%zd\n", read(fd, x, 4));
printf("%zd\n", read(fd, x + 4095, 4));
close(fd);

This brings that same standard behavior to the various RNG reader
interfaces.

While we're at it, we can streamline the loop logic a little bit.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# e3c1c4fd 05-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: check for signals every PAGE_SIZE chunk of /dev/[u]random

In 1448769c9cdb ("random: check for signal_pending() outside of
need_resched() check"), Jann pointed out that we previously were only
checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
had TIF_NEED_RESCHED set, which meant in practice, super long reads to
/dev/[u]random would delay signal handling by a long time. I tried this
using the below program, and indeed I wasn't able to interrupt a
/dev/urandom read until after several megabytes had been read. The bug
he fixed has always been there, and so code that reads from /dev/urandom
without checking the return value of read() has mostly worked for a long
time, for most sizes, not just for <= 256.

Maybe it makes sense to keep that code working. The reason it was so
small prior, ignoring the fact that it didn't work anyway, was likely
because /dev/random used to block, and that could happen for pretty
large lengths of time while entropy was gathered. But now, it's just a
chacha20 call, which is extremely fast and is just operating on pure
data, without having to wait for some external event. In that sense,
/dev/[u]random is a lot more like /dev/zero.

Taking a page out of /dev/zero's read_zero() function, it always returns
at least one chunk, and then checks for signals after each chunk. Chunk
sizes there are of length PAGE_SIZE. Let's just copy the same thing for
/dev/[u]random, and check for signals and cond_resched() for every
PAGE_SIZE amount of data. This makes the behavior more consistent with
expectations, and should mitigate the impact of Jann's fix for the
age-old signal check bug.

---- test program ----

#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sys/random.h>

static unsigned char x[~0U];

static void handle(int) { }

int main(int argc, char *argv[])
{
pid_t pid = getpid(), child;
signal(SIGUSR1, handle);
if (!(child = fork())) {
for (;;)
kill(pid, SIGUSR1);
}
pause();
printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
kill(child, SIGTERM);
return 0;
}

Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1448769c 05-Apr-2022 Jann Horn <jannh@google.com>

random: check for signal_pending() outside of need_resched() check

signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
signal that the task should bail out of the syscall when possible. This
is a separate concept from need_resched(), which checks
TIF_NEED_RESCHED, signaling that the task should preempt.

In particular, with the current code, the signal_pending() bailout
probably won't work reliably.

Change this to look like other functions that read lots of data, such as
read_zero().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# aba120cc 05-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not allow user to keep crng key around on stack

The fast key erasure RNG design relies on the key that's used to be used
and then discarded. We do this, making judicious use of
memzero_explicit(). However, reads to /dev/urandom and calls to
getrandom() involve a copy_to_user(), and userspace can use FUSE or
userfaultfd, or make a massive call, dynamically remap memory addresses
as it goes, and set the process priority to idle, in order to keep a
kernel stack alive indefinitely. By probing
/proc/sys/kernel/random/entropy_avail to learn when the crng key is
refreshed, a malicious userspace could mount this attack every 5 minutes
thereafter, breaking the crng's forward secrecy.

In order to fix this, we just overwrite the stack's key with the first
32 bytes of the "free" fast key erasure output. If we're returning <= 32
bytes to the user, then we can still return those bytes directly, so
that short reads don't become slower. And for long reads, the difference
is hopefully lost in the amortization, so it doesn't change much, with
that amortization helping variously for medium reads.

We don't need to do this for get_random_bytes() and the various
kernel-space callers, and later, if we ever switch to always batching,
this won't be necessary either, so there's no need to change the API of
these functions.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Fixes: c92e040d575a ("random: add backtracking protection to the CRNG")
Fixes: 186873c549df ("random: use simpler fast key erasure flow on per-cpu keys")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 48bff105 05-Apr-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: opportunistically initialize on /dev/urandom reads

In 6f98a4bfee72 ("random: block in /dev/urandom"), we tried to make a
successful try_to_generate_entropy() call *required* if the RNG was not
already initialized. Unfortunately, weird architectures and old
userspaces combined in TCG test harnesses, making that change still not
realistic, so it was reverted in 0313bc278dac ("Revert "random: block in
/dev/urandom"").

However, rather than making a successful try_to_generate_entropy() call
*required*, we can instead make it *best-effort*.

If try_to_generate_entropy() fails, it fails, and nothing changes from
the current behavior. If it succeeds, then /dev/urandom becomes safe to
use for free. This way, we don't risk the regression potential that led
to us reverting the required-try_to_generate_entropy() call before.

Practically speaking, this means that at least on x86, /dev/urandom
becomes safe. Probably other architectures with working cycle counters
will also become safe. And architectures with slow or broken cycle
counters at least won't be affected at all by this change.

So it may not be the glorious "all things are unified!" change we were
hoping for initially, but practically speaking, it makes a positive
impact.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 527a9867 04-Apr-2022 Jan Varho <jan.varho@gmail.com>

random: do not split fast init input in add_hwgenerator_randomness()

add_hwgenerator_randomness() tries to only use the required amount of input
for fast init, but credits all the entropy, rather than a fraction of
it. Since it's hard to determine how much entropy is left over out of a
non-unformly random sample, either give it all to fast init or credit
it, but don't attempt to do both. In the process, we can clean up the
injection code to no longer need to return a value.

Signed-off-by: Jan Varho <jan.varho@gmail.com>
[Jason: expanded commit message]
Fixes: 73c7733f122e ("random: do not throw away excess input to crng_fast_load")
Cc: stable@vger.kernel.org # 5.17+, requires af704c856e88
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1754abb3 31-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: mix build-time latent entropy into pool at init

Prior, the "input_pool_data" array needed no real initialization, and so
it was easy to mark it with __latent_entropy to populate it during
compile-time. In switching to using a hash function, this required us to
specifically initialize it to some specific state, which means we
dropped the __latent_entropy attribute. An unfortunate side effect was
this meant the pool was no longer seeded using compile-time random data.
In order to bring this back, we declare an array in rand_initialize()
with __latent_entropy and call mix_pool_bytes() on that at init, which
accomplishes the same thing as before. We make this __initconst, so that
it doesn't take up space at runtime after init.

Fixes: 6e8ec2552c7d ("random: use computational hash for entropy extraction")
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# dd7aa36e 22-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: re-add removed comment about get_random_{u32,u64} reseeding

The comment about get_random_{u32,u64}() not invoking reseeding got
added in an unrelated commit, that then was recently reverted by
0313bc278dac ("Revert "random: block in /dev/urandom""). So this adds
that little comment snippet back, and improves the wording a bit too.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d97c68d1 22-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: treat bootloader trust toggle the same way as cpu trust toggle

If CONFIG_RANDOM_TRUST_CPU is set, the RNG initializes using RDRAND.
But, the user can disable (or enable) this behavior by setting
`random.trust_cpu=0/1` on the kernel command line. This allows system
builders to do reasonable things while avoiding howls from tinfoil
hatters. (Or vice versa.)

CONFIG_RANDOM_TRUST_BOOTLOADER is basically the same thing, but regards
the seed passed via EFI or device tree, which might come from RDRAND or
a TPM or somewhere else. In order to allow distros to more easily enable
this while avoiding those same howls (or vice versa), this commit adds
the corresponding `random.trust_bootloader=0/1` toggle.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Graham Christensen <graham@grahamc.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://github.com/NixOS/nixpkgs/pull/165355
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# af704c85 21-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: skip fast_init if hwrng provides large chunk of entropy

At boot time, EFI calls add_bootloader_randomness(), which in turn calls
add_hwgenerator_randomness(). Currently add_hwgenerator_randomness()
feeds the first 64 bytes of randomness to the "fast init"
non-crypto-grade phase. But if add_hwgenerator_randomness() gets called
with more than POOL_MIN_BITS of entropy, there's no point in passing it
off to the "fast init" stage, since that's enough entropy to bootstrap
the real RNG. The "fast init" stage is just there to provide _something_
in the case where we don't have enough entropy to properly bootstrap the
RNG. But if we do have enough entropy to bootstrap the RNG, the current
logic doesn't serve a purpose. So, in the case where we're passed
greater than or equal to POOL_MIN_BITS of entropy, this commit makes us
skip the "fast init" phase.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0313bc27 22-Mar-2022 Linus Torvalds <torvalds@linux-foundation.org>

Revert "random: block in /dev/urandom"

This reverts commit 6f98a4bfee72c22f50aedb39fb761567969865fe.

It turns out we still can't do this. Way too many platforms that don't
have any real source of randomness at boot and no jitter entropy because
they don't even have a cycle counter.

As reported by Guenter Roeck:

"This causes a large number of qemu boot test failures for various
architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I
observed).

Common denominator is that boot hangs at 'Saving random seed:'"

This isn't hugely unexpected - we tried it, it failed, so now we'll
revert it.

Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/
Reported-and-bisected-by: Guenter Roeck <linux@roeck-us.net>
Cc: Jason Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3e504d20 08-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: check for signal and try earlier when generating entropy

Rather than waiting a full second in an interruptable waiter before
trying to generate entropy, try to generate entropy first and wait
second. While waiting one second might give an extra second for getting
entropy from elsewhere, we're already pretty late in the init process
here, and whatever else is generating entropy will still continue to
contribute. This has implications on signal handling: we call
try_to_generate_entropy() from wait_for_random_bytes(), and
wait_for_random_bytes() always uses wait_event_interruptible_timeout()
when waiting, since it's called by userspace code in restartable
contexts, where signals can pend. Since try_to_generate_entropy() now
runs first, if a signal is pending, it's necessary for
try_to_generate_entropy() to check for signals, since it won't hit the
wait until after try_to_generate_entropy() has returned. And even before
this change, when entering a busy loop in try_to_generate_entropy(), we
should have been checking to see if any signals are pending, so that a
process doesn't get stuck in that loop longer than expected.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7a7ff644 08-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: reseed more often immediately after booting

In order to chip away at the "premature first" problem, we augment our
existing entropy accounting with more frequent reseedings at boot.

The idea is that at boot, we're getting entropy from various places, and
we're not very sure which of early boot entropy is good and which isn't.
Even when we're crediting the entropy, we're still not totally certain
that it's any good. Since boot is the one time (aside from a compromise)
that we have zero entropy, it's important that we shepherd entropy into
the crng fairly often.

At the same time, we don't want a "premature next" problem, whereby an
attacker can brute force individual bits of added entropy. In lieu of
going full-on Fortuna (for now), we can pick a simpler strategy of just
reseeding more often during the first 5 minutes after boot. This is
still bounded by the 256-bit entropy credit requirement, so we'll skip a
reseeding if we haven't reached that, but in case entropy /is/ coming
in, this ensures that it makes its way into the crng rather rapidly
during these early stages.

Ordinarily we reseed if the previous reseeding is 300 seconds old. This
commit changes things so that for the first 600 seconds of boot time, we
reseed if the previous reseeding is uptime / 2 seconds old. That means
that we'll reseed at the very least double the uptime of the previous
reseeding.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a96cfe2d 08-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: make consistent usage of crng_ready()

Rather than sometimes checking `crng_init < 2`, we should always use the
crng_ready() macro, so that should we change anything later, it's
consistent. Additionally, that macro already has a likely() around it,
which means we don't need to open code our own likely() and unlikely()
annotations.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f5eab0e2 11-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use SipHash as interrupt entropy accumulator

The current fast_mix() function is a piece of classic mailing list
crypto, where it just sort of sprung up by an anonymous author without a
lot of real analysis of what precisely it was accomplishing. As an ARX
permutation alone, there are some easily searchable differential trails
in it, and as a means of preventing malicious interrupts, it completely
fails, since it xors new data into the entire state every time. It can't
really be analyzed as a random permutation, because it clearly isn't,
and it can't be analyzed as an interesting linear algebraic structure
either, because it's also not that. There really is very little one can
say about it in terms of entropy accumulation. It might diffuse bits,
some of the time, maybe, we hope, I guess. But for the most part, it
fails to accomplish anything concrete.

As a reminder, the simple goal of add_interrupt_randomness() is to
simply accumulate entropy until ~64 interrupts have elapsed, and then
dump it into the main input pool, which uses a cryptographic hash.

It would be nice to have something cryptographically strong in the
interrupt handler itself, in case a malicious interrupt compromises a
per-cpu fast pool within the 64 interrupts / 1 second window, and then
inside of that same window somehow can control its return address and
cycle counter, even if that's a bit far fetched. However, with a very
CPU-limited budget, actually doing that remains an active research
project (and perhaps there'll be something useful for Linux to come out
of it). And while the abundance of caution would be nice, this isn't
*currently* the security model, and we don't yet have a fast enough
solution to make it our security model. Plus there's not exactly a
pressing need to do that. (And for the avoidance of doubt, the actual
cluster of 64 accumulated interrupts still gets dumped into our
cryptographically secure input pool.)

So, for now we are going to stick with the existing interrupt security
model, which assumes that each cluster of 64 interrupt data samples is
mostly non-malicious and not colluding with an infoleaker. With this as
our goal, we have a few more choices, simply aiming to accumulate
entropy, while discarding the least amount of it.

We know from <https://eprint.iacr.org/2019/198> that random oracles,
instantiated as computational hash functions, make good entropy
accumulators and extractors, which is the justification for using
BLAKE2s in the main input pool. As mentioned, we don't have that luxury
here, but we also don't have the same security model requirements,
because we're assuming that there aren't malicious inputs. A
pseudorandom function instance can approximately behave like a random
oracle, provided that the key is uniformly random. But since we're not
concerned with malicious inputs, we can pick a fixed key, which is not
secret, knowing that "nature" won't interact with a sufficiently chosen
fixed key by accident. So we pick a PRF with a fixed initial key, and
accumulate into it continuously, dumping the result every 64 interrupts
into our cryptographically secure input pool.

For this, we make use of SipHash-1-x on 64-bit and HalfSipHash-1-x on
32-bit, which are already in use in the kernel's hsiphash family of
functions and achieve the same performance as the function they replace.
It would be nice to do two rounds, but we don't exactly have the CPU
budget handy for that, and one round alone is already sufficient.

As mentioned, we start with a fixed initial key (zeros is fine), and
allow SipHash's symmetry breaking constants to turn that into a useful
starting point. Also, since we're dumping the result (or half of it on
64-bit so as to tax our hash function the same amount on all platforms)
into the cryptographically secure input pool, there's no point in
finalizing SipHash's output, since it'll wind up being finalized by
something much stronger. This means that all we need to do is use the
ordinary round function word-by-word, as normal SipHash does.
Simplified, the flow is as follows:

Initialize:

siphash_state_t state;
siphash_init(&state, key={0, 0, 0, 0});

Update (accumulate) on interrupt:

siphash_update(&state, interrupt_data_and_timing);

Dump into input pool after 64 interrupts:

blake2s_update(&input_pool, &state, sizeof(state) / 2);

The result of all of this is that the security model is unchanged from
before -- we assume non-malicious inputs -- yet we now implement that
model with a stronger argument. I would like to emphasize, again, that
the purpose of this commit is to improve the existing design, by making
it analyzable, without changing any fundamental assumptions. There may
well be value down the road in changing up the existing design, using
something cryptographically strong, or simply using a ring buffer of
samples rather than having a fast_mix() at all, or changing which and
how much data we collect each interrupt so that we can use something
linear, or a variety of other ideas. This commit does not invalidate the
potential for those in the future.

For example, in the future, if we're able to characterize the data we're
collecting on each interrupt, we may be able to inch toward information
theoretic accumulators. <https://eprint.iacr.org/2021/523> shows that `s
= ror32(s, 7) ^ x` and `s = ror64(s, 19) ^ x` make very good
accumulators for 2-monotone distributions, which would apply to
timestamp counters, like random_get_entropy() or jiffies, but would not
apply to our current combination of the two values, or to the various
function addresses and register values we mix in. Alternatively,
<https://eprint.iacr.org/2021/1002> shows that max-period linear
functions with no non-trivial invariant subspace make good extractors,
used in the form `s = f(s) ^ x`. However, this only works if the input
data is both identical and independent, and obviously a collection of
address values and counters fails; so it goes with theoretical papers.
Future directions here may involve trying to characterize more precisely
what we actually need to collect in the interrupt handler, and building
something specific around that.

However, as mentioned, the morass of data we're gathering at the
interrupt handler presently defies characterization, and so we use
SipHash for now, which works well and performs well.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f3c2682b 01-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: provide notifier for VM fork

Drivers such as WireGuard need to learn when VMs fork in order to clear
sessions. This commit provides a simple notifier_block for that, with a
register and unregister function. When no VM fork detection is compiled
in, this turns into a no-op, similar to how the power notifier works.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5acd3548 01-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: replace custom notifier chain with standard one

We previously rolled our own randomness readiness notifier, which only
has two users in the whole kernel. Replace this with a more standard
atomic notifier block that serves the same purpose with less code. Also
unexport the symbols, because no modules use it, only unconditional
builtins. The only drawback is that it's possible for a notification
handler returning the "stop" code to prevent further processing, but
given that there are only two users, and that we're unexporting this
anyway, that doesn't seem like a significant drawback for the
simplification we receive here.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a4107d34 01-Mar-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not export add_vmfork_randomness() unless needed

Since add_vmfork_randomness() is only called from vmgenid.o, we can
guard it in CONFIG_VMGENID, similarly to how we do with
add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things
calling into add_vmfork_randomness(), we can add another shared Kconfig
symbol for that, but for now, this is good enough. Even though
add_vmfork_randomess() is a pretty small function, removing it means
that there are only calls to crng_reseed(false) and none to
crng_reseed(true), which means the compiler can constant propagate the
false, removing branches from crng_reseed() and its descendants.

Additionally, we don't even need the symbol to be exported if
CONFIG_VMGENID is not a module, so conditionalize that too.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# ae099e8e 23-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: add mechanism for VM forks to reinitialize crng

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API, add_vmfork_
randomness(), for that, by force reseeding the crng.

This has the added benefit of also draining the entropy pool and setting
its timer back, so that any old entropy that was there prior -- which
could have already been used by a different fork, or generally gone
stale -- does not contribute to the accounting of the next 256 bits.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jann Horn <jannh@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 77553cf8 28-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: don't let 644 read-only sysctls be written to

We leave around these old sysctls for compatibility, and we keep them
"writable" for compatibility, but even after writing, we should keep
reporting the same value. This is consistent with how userspaces tend to
use sysctl_random_write_wakeup_bits, writing to it, and then later
reading from it and using the value.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d0efdf35 28-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: give sysctl_random_min_urandom_seed a more sensible value

This isn't used by anything or anywhere, but we can't delete it due to
compatibility. So at least give it the correct value of what it's
supposed to be instead of a garbage one.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 6f98a4bf 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: block in /dev/urandom

This topic has come up countless times, and usually doesn't go anywhere.
This time I thought I'd bring it up with a slightly narrower focus,
updated for some developments over the last three years: we finally can
make /dev/urandom always secure, in light of the fact that our RNG is
now always seeded.

Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy
rather than passively wait for it"), the RNG does a haveged-style jitter
dance around the scheduler, in order to produce entropy (and credit it)
for the case when we're stuck in wait_for_random_bytes(). How ever you
feel about the Linus Jitter Dance is beside the point: it's been there
for three years and usually gets the RNG initialized in a second or so.

As a matter of fact, this is what happens currently when people use
getrandom(). It's already there and working, and most people have been
using it for years without realizing.

So, given that the kernel has grown this mechanism for seeding itself
from nothing, and that this procedure happens pretty fast, maybe there's
no point any longer in having /dev/urandom give insecure bytes. In the
past we didn't want the boot process to deadlock, which was
understandable. But now, in the worst case, a second goes by, and the
problem is resolved. It seems like maybe we're finally at a point when
we can get rid of the infamous "urandom read hole".

The one slight drawback is that the Linus Jitter Dance relies on random_
get_entropy() being implemented. The first lines of try_to_generate_
entropy() are:

stack.now = random_get_entropy();
if (stack.now == random_get_entropy())
return;

On most platforms, random_get_entropy() is simply aliased to get_cycles().
The number of machines without a cycle counter or some other
implementation of random_get_entropy() in 2022, which can also run a
mainline kernel, and at the same time have a both broken and out of date
userspace that relies on /dev/urandom never blocking at boot is thought
to be exceedingly low. And to be clear: those museum pieces without
cycle counters will continue to run Linux just fine, and even
/dev/urandom will be operable just like before; the RNG just needs to be
seeded first through the usual means, which should already be the case
now.

On systems that really do want unseeded randomness, we already offer
getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding
their hash tables at boot. Nothing in this commit would affect
GRND_INSECURE, and it remains the means of getting those types of random
numbers.

This patch goes a long way toward eliminating a long overdue userspace
crypto footgun. After several decades of endless user confusion, we will
finally be able to say, "use any single one of our random interfaces and
you'll be fine. They're all the same. It doesn't matter." And that, I
think, is really something. Finally all of those blog posts and
disagreeing forums and contradictory articles will all become correct
about whatever they happened to recommend, and along with it, a whole
class of vulnerabilities eliminated.

With very minimal downside, we're finally in a position where we can
make this change.

Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c2a7de4f 13-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do crng pre-init loading in worker rather than irq

Taking spinlocks from IRQ context is generally problematic for
PREEMPT_RT. That is, in part, why we take trylocks instead. However, a
spin_try_lock() is also problematic since another spin_lock() invocation
can potentially PI-boost the wrong task, as the spin_try_lock() is
invoked from an IRQ-context, so the task on CPU (random task or idle) is
not the actual owner.

Additionally, by deferring the crng pre-init loading to the worker, we
can use the cryptographic hash function rather than xor, which is
perhaps a meaningful difference when considering this data has only been
through the relatively weak fast_mix() function.

The biggest downside of this approach is that the pre-init loading is
now deferred until later, which means things that need random numbers
after interrupts are enabled, but before workqueues are running -- or
before this particular worker manages to run -- are going to get into
trouble. Hopefully in the real world, this window is rather small,
especially since this code won't run until 64 interrupts had occurred.

Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# abded93e 24-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: unify cycles_t and jiffies usage and types

random_get_entropy() returns a cycles_t, not an unsigned long, which is
sometimes 64 bits on various 32-bit platforms, including x86.
Conversely, jiffies is always unsigned long. This commit fixes things to
use cycles_t for fields that use random_get_entropy(), named "cycles",
and unsigned long for fields that use jiffies, named "now". It's also
good to mix in a cycles_t and a jiffies in the same way for both
add_device_randomness and add_timer_randomness, rather than using xor in
one case. Finally, we unify the order of these volatile reads, always
reading the more precise cycles counter, and then jiffies, so that the
cycle counter is as close to the event as possible.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 64276a99 24-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: cleanup UUID handling

Rather than hard coding various lengths, we can use the right constants.
Strings should be `char *` while buffers should be `u8 *`. Rather than
have a nonsensical and unused maxlength, just remove it. Finally, use
snprintf instead of sprintf, just out of good hygiene.

As well, remove the old comment about returning a binary UUID via the
binary sysctl syscall. That syscall was removed from the kernel in 5.5,
and actually, the "uuid_strategy" function and related infrastructure
for even serving it via the binary sysctl syscall was removed with
894d2491153a ("sysctl drivers: Remove dead binary sysctl support") back
in 2.6.33.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a3f9e891 22-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: only wake up writers after zap if threshold was passed

The only time that we need to wake up /dev/random writers on
RNDCLEARPOOL/RNDZAPPOOL is when we're changing from a value that is
greater than or equal to POOL_MIN_BITS to zero, because if we're
changing from below POOL_MIN_BITS to zero, the writers are already
unblocked.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# da3951eb 22-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: round-robin registers as ulong, not u32

When the interrupt handler does not have a valid cycle counter, it calls
get_reg() to read a register from the irq stack, in round-robin.
Currently it does this assuming that registers are 32-bit. This is
_probably_ the case, and probably all platforms without cycle counters
are in fact 32-bit platforms. But maybe not, and either way, it's not
quite correct. This commit fixes that to deal with `unsigned long`
rather than `u32`.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 3191dd5a 13-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: clear fast pool, crng, and batches in cpuhp bring up

For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.

As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 1daf2f38 12-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: check for crng_init == 0 in add_device_randomness()

This has no real functional change, as crng_pre_init_inject() (and
before that, crng_slow_init()) always checks for == 0, not >= 2. So
correct the outer unlocked change to reflect that. Before this used
crng_ready(), which was not correct.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# da792c6d 12-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: unify early init crng load accounting

crng_fast_load() and crng_slow_load() have different semantics:

- crng_fast_load() xors and accounts with crng_init_cnt.
- crng_slow_load() hashes and doesn't account.

However add_hwgenerator_randomness() can afford to hash (it's called
from a kthread), and it should account. Additionally, ones that can
afford to hash don't need to take a trylock but can take a normal lock.
So, we combine these into one function, crng_pre_init_inject(), which
allows us to control these in a uniform way. This will make it simpler
later to simplify this all down when the time comes for that.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# afba0b80 11-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not take pool spinlock at boot

Since rand_initialize() is run while interrupts are still off and
nothing else is running, we don't need to repeatedly take and release
the pool spinlock, especially in the RDSEED loop.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 58340f8e 04-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: defer fast pool mixing to worker

On PREEMPT_RT, it's problematic to take spinlocks from hard irq
handlers. We can fix this by deferring to a workqueue the dumping of
the fast pool into the input pool.

We accomplish this with some careful rules on fast_pool->count:

- When it's incremented to >= 64, we schedule the work.
- If the top bit is set, we never schedule the work, even if >= 64.
- The worker is responsible for setting it back to 0 when it's done.

There are two small issues around using workqueues for this purpose that
we work around.

The first issue is that mix_interrupt_randomness() might be migrated to
another CPU during CPU hotplug. This issue is rectified by checking that
it hasn't been migrated (after disabling irqs). If it has been migrated,
then we set the count to zero, so that when the CPU comes online again,
it can requeue the work. As part of this, we switch to using an
atomic_t, so that the increment in the irq handler doesn't wipe out the
zeroing if the CPU comes back online while this worker is running.

The second issue is that, though relatively minor in effect, we probably
want to make sure we get a consistent view of the pool onto the stack,
in case it's interrupted by an irq while reading. To do this, we don't
reenable irqs until after the copy. There are only 18 instructions
between the cli and sti, so this is a pretty tiny window.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5f75d9f3 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: rewrite header introductory comment

Now that we've re-documented the various sections, we can remove the
outdated text here and replace it with a high-level overview.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0deff3c4 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group sysctl functions

This pulls all of the sysctl-focused functions into the sixth labeled
section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a6adf8e7 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group userspace read/write functions

This pulls all of the userspace read/write-focused functions into the
fifth labeled section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 92c653cf 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group entropy collection functions

This pulls all of the entropy collection-focused functions into the
fourth labeled section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a5ed7cb1 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group entropy extraction functions

This pulls all of the entropy extraction-focused functions into the
third labeled section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 3655adc7 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group crng functions

This pulls all of the crng-focused functions into the second labeled
section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5f1bb112 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: group initialization wait functions

This pulls all of the readiness waiting-focused functions into the first
labeled section.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 87e7d5ab 11-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove whitespace and reorder includes

This is purely cosmetic. Future work involves figuring out which of
these headers we need and which we don't.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 246c03dd 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: introduce drain_entropy() helper to declutter crng_reseed()

In preparation for separating responsibilities, break out the entropy
count management part of crng_reseed() into its own function.

No functional changes.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b2f408fe 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: deobfuscate irq u32/u64 contributions

In the irq handler, we fill out 16 bytes differently on 32-bit and
64-bit platforms, and for 32-bit vs 64-bit cycle counters, which doesn't
always correspond with the bitness of the platform. Whether or not you
like this strangeness, it is a matter of fact. But it might not be a
fact you well realized until now, because the code that loaded the irq
info into 4 32-bit words was quite confusing. Instead, this commit
makes everything explicit by having separate (compile-time) branches for
32-bit and 64-bit types.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a07fdae3 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: add proper SPDX header

Convert the current license into the SPDX notation of "(GPL-2.0 OR
BSD-3-Clause)". This infers GPL-2.0 from the text "ALTERNATIVELY, this
product may be distributed under the terms of the GNU General Public
License, in which case the provisions of the GPL are required INSTEAD OF
the above restrictions" and it infers BSD-3-Clause from the verbatim
BSD 3 clause license in the file.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 14c17463 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove unused tracepoints

These explicit tracepoints aren't really used and show sign of aging.
It's work to keep these up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 95e6060c 10-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove ifdef'd out interrupt bench

With tools like kbench9000 giving more finegrained responses, and this
basically never having been used ever since it was initially added,
let's just get rid of this. There *is* still work to be done on the
interrupt handler, but this really isn't the way it's being developed.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0791e8b6 09-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: tie batched entropy generation to base_crng generation

Now that we have an explicit base_crng generation counter, we don't need
a separate one for batched entropy. Rather, we can just move the
generation forward every time we change crng_init state or update the
base_crng key.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7191c628 09-Feb-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: fix locking for crng_init in crng_reseed()

crng_init is protected by primary_crng->lock. Therefore, we need
to hold this lock when increasing crng_init to 2. As we shouldn't
hold this lock for too long, only hold it for those parts which
require protection.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7b5164fb 09-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: zero buffer after reading entropy from userspace

This buffer may contain entropic data that shouldn't stick around longer
than needed, so zero out the temporary buffer at the end of write_pool().

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 434537ae 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove outdated INT_MAX >> 6 check in urandom_read()

In 79a8468747c5 ("random: check for increase of entropy_count because of
signed conversion"), a number of checks were added around what values
were passed to account(), because account() was doing fancy fixed point
fractional arithmetic, and a user had some ability to pass large values
directly into it. One of things in that commit was limiting those values
to INT_MAX >> 6. The first >> 3 was for bytes to bits, and the next >> 3
was for bits to 1/8 fractional bits.

However, for several years now, urandom reads no longer touch entropy
accounting, and so this check serves no purpose. The current flow is:

urandom_read_nowarn()-->get_random_bytes_user()-->chacha20_block()

Of course, we don't want that size_t to be truncated when adding it into
the ssize_t. But we arrive at urandom_read_nowarn() in the first place
either via ordinary fops, which limits reads to MAX_RW_COUNT, or via
getrandom() which limits reads to INT_MAX.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 04ec96b7 09-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: make more consistent use of integer types

We've been using a flurry of int, unsigned int, size_t, and ssize_t.
Let's unify all of this into size_t where it makes sense, as it does in
most places, and leave ssize_t for return values with possible errors.

In addition, keeping with the convention of other functions in this
file, functions that are dealing with raw bytes now take void *
consistently instead of a mix of that and u8 *, because much of the time
we're actually passing some other structure that is then interpreted as
bytes by the function.

We also take the opportunity to fix the outdated and incorrect comment
in get_random_bytes_arch().

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 66e4c2b9 08-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use hash function for crng_slow_load()

Since we have a hash function that's really fast, and the goal of
crng_slow_load() is reportedly to "touch all of the crng's state", we
can just hash the old state together with the new state and call it a
day. This way we dont need to reason about another LFSR or worry about
various attacks there. This code is only ever used at early boot and
then never again.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 186873c5 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use simpler fast key erasure flow on per-cpu keys

Rather than the clunky NUMA full ChaCha state system we had prior, this
commit is closer to the original "fast key erasure RNG" proposal from
<https://blog.cr.yp.to/20170723-random.html>, by simply treating ChaCha
keys on a per-cpu basis.

All entropy is extracted to a base crng key of 32 bytes. This base crng
has a birthdate and a generation counter. When we go to take bytes from
the crng, we first check if the birthdate is too old; if it is, we
reseed per usual. Then we start working on a per-cpu crng.

This per-cpu crng makes sure that it has the same generation counter as
the base crng. If it doesn't, it does fast key erasure with the base
crng key and uses the output as its new per-cpu key, and then updates
its local generation counter. Then, using this per-cpu state, we do
ordinary fast key erasure. Half of this first block is used to overwrite
the per-cpu crng key for the next call -- this is the fast key erasure
RNG idea -- and the other half, along with the ChaCha state, is returned
to the caller. If the caller desires more than this remaining half, it
can generate more ChaCha blocks, unlocked, using the now detached ChaCha
state that was just returned. Crypto-wise, this is more or less what we
were doing before, but this simply makes it more explicit and ensures
that we always have backtrack protection by not playing games with a
shared block counter.

The flow looks like this:

──extract()──► base_crng.key ◄──memcpy()───┐
│ │
└──chacha()──────┬─► new_base_key
└─► crngs[n].key ◄──memcpy()───┐
│ │
└──chacha()───┬─► new_key
└─► random_bytes

└────►

There are a few hairy details around early init. Just as was done
before, prior to having gathered enough entropy, crng_fast_load() and
crng_slow_load() dump bytes directly into the base crng, and when we go
to take bytes from the crng, in that case, we're doing fast key erasure
with the base crng rather than the fast unlocked per-cpu crngs. This is
fine as that's only the state of affairs during very early boot; once
the crng initializes we never use these paths again.

In the process of all this, the APIs into the crng become a bit simpler:
we have get_random_bytes(buf, len) and get_random_bytes_user(buf, len),
which both do what you'd expect. All of the details of fast key erasure
and per-cpu selection happen only in a very short critical section of
crng_make_state(), which selects the right per-cpu key, does the fast
key erasure, and returns a local state to the caller's stack. So, we no
longer have a need for a separate backtrack function, as this happens
all at once here. The API then allows us to extend backtrack protection
to batched entropy without really having to do much at all.

The result is a bit simpler than before and has fewer foot guns. The
init time state machine also gets a lot simpler as we don't need to wait
for workqueues to come online and do deferred work. And the multi-core
performance should be increased significantly, by virtue of having hardly
any locking on the fast path.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c30c575d 08-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: absorb fast pool into input pool after fast load

During crng_init == 0, we never credit entropy in add_interrupt_
randomness(), but instead dump it directly into the primary_crng. That's
fine, except for the fact that we then wind up throwing away that
entropy later when we switch to extracting from the input pool and
xoring into (and later in this series overwriting) the primary_crng key.
The two other early init sites -- add_hwgenerator_randomness()'s use
crng_fast_load() and add_device_ randomness()'s use of crng_slow_load()
-- always additionally give their inputs to the input pool. But not
add_interrupt_randomness().

This commit fixes that shortcoming by calling mix_pool_bytes() after
crng_fast_load() in add_interrupt_randomness(). That's partially
verboten on PREEMPT_RT, where it implies taking spinlock_t from an IRQ
handler. But this also only happens during early boot and then never
again after that. Plus it's a trylock so it has the same considerations
as calling crng_fast_load(), which we're already using.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Suggested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 91c2afca 08-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not xor RDRAND when writing into /dev/random

Continuing the reasoning of "random: ensure early RDSEED goes through
mixer on init", we don't want RDRAND interacting with anything without
going through the mixer function, as a backdoored CPU could presumably
cancel out data during an xor, which it'd have a harder time doing when
being forced through a cryptographic hash function. There's actually no
need at all to be calling RDRAND in write_pool(), because before we
extract from the pool, we always do so with 32 bytes of RDSEED hashed in
at that stage. Xoring at this stage is needless and introduces a minor
liability.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a02cf3d0 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: ensure early RDSEED goes through mixer on init

Continuing the reasoning of "random: use RDSEED instead of RDRAND in
entropy extraction" from this series, at init time we also don't want to
be xoring RDSEED directly into the crng. Instead it's safer to put it
into our entropy collector and then re-extract it, so that it goes
through a hash function with preimage resistance. As a matter of hygiene,
we also order these now so that the RDSEED byte are hashed in first,
followed by the bytes that are likely more predictable (e.g. utsname()).

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 85664172 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: inline leaves of rand_initialize()

This is a preparatory commit for the following one. We simply inline the
various functions that rand_initialize() calls that have no other
callers. The compiler was doing this anyway before. Doing this will
allow us to reorganize this after. We can then move the trust_cpu and
parse_trust_cpu definitions a bit closer to where they're actually used,
which makes the code easier to read.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a9412d51 06-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: get rid of secondary crngs

As the comment said, this is indeed a "hack". Since it was introduced,
it's been a constant state machine nightmare, with lots of subtle early
boot issues and a wildly complex set of machinery to keep everything in
sync. Rather than continuing to play whack-a-mole with this approach,
this commit simply removes it entirely. This commit is preparation for
"random: use simpler fast key erasure flow on per-cpu keys" in this
series, which introduces a simpler (and faster) mechanism to accomplish
the same thing.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 28f425e5 07-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use RDSEED instead of RDRAND in entropy extraction

When /dev/random was directly connected with entropy extraction, without
any expansion stage, extract_buf() was called for every 10 bytes of data
read from /dev/random. For that reason, RDRAND was used rather than
RDSEED. At the same time, crng_reseed() was still only called every 5
minutes, so there RDSEED made sense.

Those olden days were also a time when the entropy collector did not use
a cryptographic hash function, which meant most bets were off in terms
of real preimage resistance. For that reason too it didn't matter
_that_ much whether RDSEED was mixed in before or after entropy
extraction; both choices were sort of bad.

But now we have a cryptographic hash function at work, and with that we
get real preimage resistance. We also now only call extract_entropy()
every 5 minutes, rather than every 10 bytes. This allows us to do two
important things.

First, we can switch to using RDSEED in extract_entropy(), as Dominik
suggested. Second, we can ensure that RDSEED input always goes into the
cryptographic hash function with other things before being used
directly. This eliminates a category of attacks in which the CPU knows
the current state of the crng and knows that we're going to xor RDSEED
into it, and so it computes a malicious RDSEED. By going through our
hash function, it would require the CPU to compute a preimage on the
fly, which isn't going to happen.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7c2fe2b3 05-Feb-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: fix locking in crng_fast_load()

crng_init is protected by primary_crng->lock, so keep holding that lock
when incrementing crng_init from 0 to 1 in crng_fast_load(). The call to
pr_notice() can wait until the lock is released; this code path cannot
be reached twice, as crng_fast_load() aborts early if crng_init > 0.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 77760fd7 28-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove batched entropy locking

Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.

This should improve performance a bit. It will also fix the linked splat
that Jonathan received with a PROVE_RAW_LOCK_NESTING=y.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5d58ea3a 04-Feb-2022 Eric Biggers <ebiggers@google.com>

random: remove use_input_pool parameter from crng_reseed()

The primary_crng is always reseeded from the input_pool, while the NUMA
crngs are always reseeded from the primary_crng. Remove the redundant
'use_input_pool' parameter from crng_reseed() and just directly check
whether the crng is the primary_crng.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a49c010e 03-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: make credit_entropy_bits() always safe

This is called from various hwgenerator drivers, so rather than having
one "safe" version for userspace and one "unsafe" version for the
kernel, just make everything safe; the checks are cheap and sensible to
have anyway.

Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 489c7fc4 05-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: always wake up entropy writers after extraction

Now that POOL_BITS == POOL_MIN_BITS, we must unconditionally wake up
entropy writers after every extraction. Therefore there's no point of
write_wakeup_threshold, so we can move it to the dustbin of unused
compatibility sysctls. While we're at it, we can fix a small comparison
where we were waking up after <= min rather than < min.

Cc: Theodore Ts'o <tytso@mit.edu>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c5704490 03-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use linear min-entropy accumulation crediting

30e37ec516ae ("random: account for entropy loss due to overwrites")
assumed that adding new entropy to the LFSR pool probabilistically
cancelled out old entropy there, so entropy was credited asymptotically,
approximating Shannon entropy of independent sources (rather than a
stronger min-entropy notion) using 1/8th fractional bits and replacing
a constant 2-2/√𝑒 term (~0.786938) with 3/4 (0.75) to slightly
underestimate it. This wasn't superb, but it was perhaps better than
nothing, so that's what was done. Which entropy specifically was being
cancelled out and how much precisely each time is hard to tell, though
as I showed with the attack code in my previous commit, a motivated
adversary with sufficient information can actually cancel out
everything.

Since we're no longer using an LFSR for entropy accumulation, this
probabilistic cancellation is no longer relevant. Rather, we're now
using a computational hash function as the accumulator and we've
switched to working in the random oracle model, from which we can now
revisit the question of min-entropy accumulation, which is done in
detail in <https://eprint.iacr.org/2019/198>.

Consider a long input bit string that is built by concatenating various
smaller independent input bit strings. Each one of these inputs has a
designated min-entropy, which is what we're passing to
credit_entropy_bits(h). When we pass the concatenation of these to a
random oracle, it means that an adversary trying to receive back the
same reply as us would need to become certain about each part of the
concatenated bit string we passed in, which means becoming certain about
all of those h values. That means we can estimate the accumulation by
simply adding up the h values in calls to credit_entropy_bits(h);
there's no probabilistic cancellation at play like there was said to be
for the LFSR. Incidentally, this is also what other entropy accumulators
based on computational hash functions do as well.

So this commit replaces credit_entropy_bits(h) with essentially `total =
min(POOL_BITS, total + h)`, done with a cmpxchg loop as before.

What if we're wrong and the above is nonsense? It's not, but let's
assume we don't want the actual _behavior_ of the code to change much.
Currently that behavior is not extracting from the input pool until it
has 128 bits of entropy in it. With the old algorithm, we'd hit that
magic 128 number after roughly 256 calls to credit_entropy_bits(1). So,
we can retain more or less the old behavior by waiting to extract from
the input pool until it hits 256 bits of entropy using the new code. For
people concerned about this change, it means that there's not that much
practical behavioral change. And for folks actually trying to model
the behavior rigorously, it means that we have an even higher margin
against attacks.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9c07f578 02-Feb-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: simplify entropy debiting

Our pool is 256 bits, and we only ever use all of it or don't use it at
all, which is decided by whether or not it has at least 128 bits in it.
So we can drastically simplify the accounting and cmpxchg loop to do
exactly this. While we're at it, we move the minimum bit size into a
constant so it can be shared between the two places where it matters.

The reason we want any of this is for the case in which an attacker has
compromised the current state, and then bruteforces small amounts of
entropy added to it. By demanding a particular minimum amount of entropy
be present before reseeding, we make that bruteforcing difficult.

Note that this rationale no longer includes anything about /dev/random
blocking at the right moment, since /dev/random no longer blocks (except
for at ~boot), but rather uses the crng. In a former life, /dev/random
was different and therefore required a more nuanced account(), but this
is no longer.

Behaviorally, nothing changes here. This is just a simplification of
the code.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 6e8ec255 16-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: use computational hash for entropy extraction

The current 4096-bit LFSR used for entropy collection had a few
desirable attributes for the context in which it was created. For
example, the state was huge, which meant that /dev/random would be able
to output quite a bit of accumulated entropy before blocking. It was
also, in its time, quite fast at accumulating entropy byte-by-byte,
which matters given the varying contexts in which mix_pool_bytes() is
called. And its diffusion was relatively high, which meant that changes
would ripple across several words of state rather quickly.

However, it also suffers from a few security vulnerabilities. In
particular, inputs learned by an attacker can be undone, but moreover,
if the state of the pool leaks, its contents can be controlled and
entirely zeroed out. I've demonstrated this attack with this SMT2
script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
a matter of seconds on a single core of my laptop, resulting in little
proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.

For basically all recent formal models of RNGs, these attacks represent
a significant cryptographic flaw. But how does this manifest
practically? If an attacker has access to the system to such a degree
that he can learn the internal state of the RNG, arguably there are
other lower hanging vulnerabilities -- side-channel, infoleak, or
otherwise -- that might have higher priority. On the other hand, seed
files are frequently used on systems that have a hard time generating
much entropy on their own, and these seed files, being files, often leak
or are duplicated and distributed accidentally, or are even seeded over
the Internet intentionally, where their contents might be recorded or
tampered with. Seen this way, an otherwise quasi-implausible
vulnerability is a bit more practical than initially thought.

Another aspect of the current mix_pool_bytes() function is that, while
its performance was arguably competitive for the time in which it was
created, it's no longer considered so. This patch improves performance
significantly: on a high-end CPU, an i7-11850H, it improves performance
of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
improves performance by 103%.

This commit replaces the LFSR of mix_pool_bytes() with a straight-
forward cryptographic hash function, BLAKE2s, which is already in use
for pool extraction. Universal hashing with a secret seed was considered
too, something along the lines of <https://eprint.iacr.org/2013/338>,
but the requirement for a secret seed makes for a chicken & egg problem.
Instead we go with a formally proven scheme using a computational hash
function, described in sections 5.1, 6.4, and B.1.8 of
<https://eprint.iacr.org/2019/198>.

BLAKE2s outputs 256 bits, which should give us an appropriate amount of
min-entropy accumulation, and a wide enough margin of collision
resistance against active attacks. mix_pool_bytes() becomes a simple
call to blake2s_update(), for accumulation, while the extraction step
becomes a blake2s_final() to generate a seed, with which we can then do
a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
back as an init key for subsequent blake2s_update()s, and the rest we
produce to the caller. This then is provided to our CRNG like usual. In
that expansion step, we make opportunistic use of 32 bytes of RDRAND
output, just as before. We also always reseed the crng with 32 bytes,
unconditionally, or not at all, rather than sometimes with 16 as before,
as we don't win anything by limiting beyond the 16 byte threshold.

Going for a hash function as an entropy collector is a conservative,
proven approach. The result of all this is a much simpler and much less
bespoke construction than what's there now, which not only plugs a
vulnerability but also improves performance considerably.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 322cbb50 24-Jan-2022 Christoph Hellwig <hch@lst.de>

block: remove genhd.h

There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it. So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9d5505f1 30-Jan-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: only call crng_finalize_init() for primary_crng

crng_finalize_init() returns instantly if it is called for another pool
than primary_crng. The test whether crng_finalize_init() is still required
can be moved to the relevant caller in crng_reseed(), and
crng_need_final_init can be reset to false if crng_finalize_init() is
called with workqueues ready. Then, no previous callsite will call
crng_finalize_init() unless it is needed, and we can get rid of the
superfluous function parameter.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# ebf76063 30-Jan-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: access primary_pool directly rather than through pointer

Both crng_initialize_primary() and crng_init_try_arch_early() are
only called for the primary_pool. Accessing it directly instead of
through a function parameter simplifies the code.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 042e293e 28-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: wake up /dev/random writers after zap

When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c321e907 25-Jan-2022 Dominik Brodowski <linux@dominikbrodowski.net>

random: continually use hwgenerator randomness

The rngd kernel thread may sleep indefinitely if the entropy count is
kept above random_write_wakeup_bits by other entropy sources. To make
best use of multiple sources of randomness, mix entropy from hardware
RNGs into the pool at least once within CRNG_RESEED_INTERVAL.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5475e8f0 21-Jan-2022 Xiaoming Ni <nixiaoming@huawei.com>

random: move the random sysctl declarations to its own file

kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the random sysctls to their own file and use
register_sysctl_init().

[mcgrof@kernel.org: commit log update to justify the move]

Link: https://lkml.kernel.org/r/20211124231435.1445213-3-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a254a0e4 17-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: simplify arithmetic function flow in account()

Now that have_bytes is never modified, we can simplify this function.
First, we move the check for negative entropy_count to be first. That
ensures that subsequent reads of this will be non-negative. Then,
have_bytes and ibytes can be folded into their one use site in the
min_t() function.

Suggested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 248045b8 15-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: selectively clang-format where it makes sense

This is an old driver that has seen a lot of different eras of kernel
coding style. In an effort to make it easier to code for, unify the
coding style around the current norm, by accepting some of -- but
certainly not all of -- the suggestions from clang-format. This should
remove ambiguity in coding style, especially with regards to spacing,
when code is being changed or amended. Consequently it also makes code
review easier on the eyes, following one uniform style rather than
several.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 6c0eace6 15-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: access input_pool_data directly rather than through pointer

This gets rid of another abstraction we no longer need. It would be nice
if we could instead make pool an array rather than a pointer, but the
latent entropy plugin won't be able to do its magic in that case. So
instead we put all accesses to the input pool's actual data through the
input_pool_data array directly.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 18263c4e 13-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: cleanup fractional entropy shift constants

The entropy estimator is calculated in terms of 1/8 bits, which means
there are various constants where things are shifted by 3. Move these
into our pool info enum with the other relevant constants. While we're
at it, move an English assertion about sizes into a proper BUILD_BUG_ON
so that the compiler can ensure this invariant.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# b3d51c1f 14-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: prepend remaining pool constants with POOL_

The other pool constants are prepended with POOL_, but not these last
ones. Rename them. This will then let us move them into the enum in the
following commit.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5b87adf3 13-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: de-duplicate INPUT_POOL constants

We already had the POOL_* constants, so deduplicate the older INPUT_POOL
ones. As well, fold EXTRACT_SIZE into the poolinfo enum, since it's
related.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0f637027 13-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove unused OUTPUT_POOL constants

We no longer have an output pool. Rather, we have just a wakeup bits
threshold for /dev/random reads, presumably so that processes don't
hang. This value, random_write_wakeup_bits, is configurable anyway. So
all the no longer usefully named OUTPUT_POOL constants were doing was
setting a reasonable default for random_write_wakeup_bits. This commit
gets rid of the constants and just puts it all in the default value of
random_write_wakeup_bits.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 90ed1e67 12-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: rather than entropy_store abstraction, use global

Originally, the RNG used several pools, so having things abstracted out
over a generic entropy_store object made sense. These days, there's only
one input pool, and then an uneven mix of usage via the abstraction and
usage via &input_pool. Rather than this uneasy mixture, just get rid of
the abstraction entirely and have things always use the global. This
simplifies the code and makes reading it a bit easier.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 8b2d953b 12-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove unused extract_entropy() reserved argument

This argument is always set to zero, as a result of us not caring about
keeping a certain amount reserved in the pool these days. So just remove
it and cleanup the function signatures.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a4bfa9b3 12-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: remove incomplete last_data logic

There were a few things added under the "if (fips_enabled)" banner,
which never really got completed, and the FIPS people anyway are
choosing a different direction. Rather than keep around this halfbaked
code, get rid of it so that we can focus on a single design of the RNG
rather than two designs.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# d38bb085 09-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: cleanup integer types

Rather than using the userspace type, __uXX, switch to using uXX. And
rather than using variously chosen `char *` or `unsigned char *`, use
`u8 *` uniformly for things that aren't strings, in the case where we
are doing byte-by-byte traversal.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 91ec0fe1 09-Jan-2022 Jason A. Donenfeld <Jason@zx2c4.com>

random: cleanup poolinfo abstraction

Now that we're only using one polynomial, we can cleanup its
representation into constants, instead of passing around pointers
dynamically to select different polynomials. This improves the codegen
and makes the code a bit more straightforward.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# c0a8a61e 14-Jan-2022 Schspa Shi <schspa@gmail.com>

random: fix typo in comments

s/or/for

Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 6c8e11e0 03-Jan-2022 Jann Horn <jannh@google.com>

random: don't reset crng_init_cnt on urandom_read()

At the moment, urandom_read() (used for /dev/urandom) resets crng_init_cnt
to zero when it is called at crng_init<2. This is inconsistent: We do it
for /dev/urandom reads, but not for the equivalent
getrandom(GRND_INSECURE).

(And worse, as Jason pointed out, we're only doing this as long as
maxwarn>0.)

crng_init_cnt is only read in crng_fast_load(); it is relevant at
crng_init==0 for determining when to switch to crng_init==1 (and where in
the RNG state array to write).

As far as I understand:

- crng_init==0 means "we have nothing, we might just be returning the same
exact numbers on every boot on every machine, we don't even have
non-cryptographic randomness; we should shove every bit of entropy we
can get into the RNG immediately"
- crng_init==1 means "well we have something, it might not be
cryptographic, but at least we're not gonna return the same data every
time or whatever, it's probably good enough for TCP and ASLR and stuff;
we now have time to build up actual cryptographic entropy in the input
pool"
- crng_init==2 means "this is supposed to be cryptographically secure now,
but we'll keep adding more entropy just to be sure".

The current code means that if someone is pulling data from /dev/urandom
fast enough at crng_init==0, we'll keep resetting crng_init_cnt, and we'll
never make forward progress to crng_init==1. It seems to be intended to
prevent an attacker from bruteforcing the contents of small individual RNG
inputs on the way from crng_init==0 to crng_init==1, but that's misguided;
crng_init==1 isn't supposed to provide proper cryptographic security
anyway, RNG users who care about getting secure RNG output have to wait
until crng_init==2.

This code was inconsistent, and it probably made things worse - just get
rid of it.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 2ee25b69 30-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: avoid superfluous call to RDRAND in CRNG extraction

RDRAND is not fast. RDRAND is actually quite slow. We've known this for
a while, which is why functions like get_random_u{32,64} were converted
to use batching of our ChaCha-based CRNG instead.

Yet CRNG extraction still includes a call to RDRAND, in the hot path of
every call to get_random_bytes(), /dev/urandom, and getrandom(2).

This call to RDRAND here seems quite superfluous. CRNG is already
extracting things based on a 256-bit key, based on good entropy, which
is then reseeded periodically, updated, backtrack-mutated, and so
forth. The CRNG extraction construction is something that we're already
relying on to be secure and solid. If it's not, that's a serious
problem, and it's unlikely that mixing in a measly 32 bits from RDRAND
is going to alleviate things.

And in the case where the CRNG doesn't have enough entropy yet, we're
already initializing the ChaCha key row with RDRAND in
crng_init_try_arch_early().

Removing the call to RDRAND improves performance on an i7-11850H by
370%. In other words, the vast majority of the work done by
extract_crng() prior to this commit was devoted to fetching 32 bits of
RDRAND.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 96562f28 31-Dec-2021 Dominik Brodowski <linux@dominikbrodowski.net>

random: early initialization of ChaCha constants

Previously, the ChaCha constants for the primary pool were only
initialized in crng_initialize_primary(), called by rand_initialize().
However, some randomness is actually extracted from the primary pool
beforehand, e.g. by kmem_cache_create(). Therefore, statically
initialize the ChaCha constants for the primary pool.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <linux-crypto@vger.kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7b873241 30-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: use IS_ENABLED(CONFIG_NUMA) instead of ifdefs

Rather than an awkward combination of ifdefs and __maybe_unused, we can
ensure more source gets parsed, regardless of the configuration, by
using IS_ENABLED for the CONFIG_NUMA conditional code. This makes things
cleaner and easier to follow.

I've confirmed that on !CONFIG_NUMA, we don't wind up with excess code
by accident; the generated object file is the same.

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 161212c7 29-Dec-2021 Dominik Brodowski <linux@dominikbrodowski.net>

random: harmonize "crng init done" messages

We print out "crng init done" for !TRUST_CPU, so we should also print
out the same for TRUST_CPU.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 57826fee 29-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: mix bootloader randomness into pool

If we're trusting bootloader randomness, crng_fast_load() is called by
add_hwgenerator_randomness(), which sets us to crng_init==1. However,
usually it is only called once for an initial 64-byte push, so bootloader
entropy will not mix any bytes into the input pool. So it's conceivable
that crng_init==1 when crng_initialize_primary() is called later, but
then the input pool is empty. When that happens, the crng state key will
be overwritten with extracted output from the empty input pool. That's
bad.

In contrast, if we're not trusting bootloader randomness, we call
crng_slow_load() *and* we call mix_pool_bytes(), so that later
crng_initialize_primary() isn't drawing on nothing.

In order to prevent crng_initialize_primary() from extracting an empty
pool, have the trusted bootloader case mirror that of the untrusted
bootloader case, mixing the input into the pool.

[linux@dominikbrodowski.net: rewrite commit message]
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 73c7733f 29-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not throw away excess input to crng_fast_load

When crng_fast_load() is called by add_hwgenerator_randomness(), we
currently will advance to crng_init==1 once we've acquired 64 bytes, and
then throw away the rest of the buffer. Usually, that is not a problem:
When add_hwgenerator_randomness() gets called via EFI or DT during
setup_arch(), there won't be any IRQ randomness. Therefore, the 64 bytes
passed by EFI exactly matches what is needed to advance to crng_init==1.
Usually, DT seems to pass 64 bytes as well -- with one notable exception
being kexec, which hands over 128 bytes of entropy to the kexec'd kernel.
In that case, we'll advance to crng_init==1 once 64 of those bytes are
consumed by crng_fast_load(), but won't continue onward feeding in bytes
to progress to crng_init==2. This commit fixes the issue by feeding
any leftover bytes into the next phase in add_hwgenerator_randomness().

[linux@dominikbrodowski.net: rewrite commit message]
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9c3ddde3 29-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not re-init if crng_reseed completes before primary init

If the bootloader supplies sufficient material and crng_reseed() is called
very early on, but not too early that wqs aren't available yet, then we
might transition to crng_init==2 before rand_initialize()'s call to
crng_initialize_primary() made. Then, when crng_initialize_primary() is
called, if we're trusting the CPU's RDRAND instructions, we'll
needlessly reinitialize the RNG and emit a message about it. This is
mostly harmless, as numa_crng_init() will allocate and then free what it
just allocated, and excessive calls to invalidate_batched_entropy()
aren't so harmful. But it is funky and the extra message is confusing,
so avoid the re-initialization all together by checking for crng_init <
2 in crng_initialize_primary(), just as we already do in crng_reseed().

Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# f7e67b8e 29-Dec-2021 Dominik Brodowski <linux@dominikbrodowski.net>

random: fix crash on multiple early calls to add_bootloader_randomness()

Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, multiple calls
to add_bootloader_randomness() are broken and can cause a NULL pointer
dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical
problem, as qemu on arm64 may provide bootloader entropy via EFI and via
devicetree.

On the first call to add_hwgenerator_randomness(), crng_fast_load() is
executed, and if the seed is long enough, crng_init will be set to 1.
On subsequent calls to add_bootloader_randomness() and then to
add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
wait_event_interruptible() and then credit_entropy_bits() will be called.
If the entropy count for that second seed is large enough, that proceeds
to crng_reseed().

However, both wait_event_interruptible() and crng_reseed() depends
(at least in numa_crng_init()) on workqueues. Therefore, test whether
system_wq is already initialized, which is a sufficient indicator that
workqueue_init_early() has progressed far enough.

If we wind up hitting the !system_wq case, we later want to do what
would have been done there when wqs are up, so set a flag, and do that
work later from the rand_initialize() call.

Reported-by: Ivan T. Ivanov <iivanov@suse.de>
Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness")
Cc: stable@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
[Jason: added crng_need_done state and related logic.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 0d9488ff 24-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: do not sign extend bytes for rotation when mixing

By using `char` instead of `unsigned char`, certain platforms will sign
extend the byte when `w = rol32(*bytes++, input_rotate)` is called,
meaning that bit 7 is overrepresented when mixing. This isn't a real
problem (unless the mixer itself is already broken) since it's still
invertible, but it's not quite correct either. Fix this by using an
explicit unsigned type.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 9f9eff85 21-Dec-2021 Jason A. Donenfeld <Jason@zx2c4.com>

random: use BLAKE2s instead of SHA1 in extraction

This commit addresses one of the lower hanging fruits of the RNG: its
usage of SHA1.

BLAKE2s is generally faster, and certainly more secure, than SHA1, which
has [1] been [2] really [3] very [4] broken [5]. Additionally, the
current construction in the RNG doesn't use the full SHA1 function, as
specified, and allows overwriting the IV with RDRAND output in an
undocumented way, even in the case when RDRAND isn't set to "trusted",
which means potential malicious IV choices. And its short length means
that keeping only half of it secret when feeding back into the mixer
gives us only 2^80 bits of forward secrecy. In other words, not only is
the choice of hash function dated, but the use of it isn't really great
either.

This commit aims to fix both of these issues while also keeping the
general structure and semantics as close to the original as possible.
Specifically:

a) Rather than overwriting the hash IV with RDRAND, we put it into
BLAKE2's documented "salt" and "personal" fields, which were
specifically created for this type of usage.
b) Since this function feeds the full hash result back into the
entropy collector, we only return from it half the length of the
hash, just as it was done before. This increases the
construction's forward secrecy from 2^80 to a much more
comfortable 2^128.
c) Rather than using the raw "sha1_transform" function alone, we
instead use the full proper BLAKE2s function, with finalization.

This also has the advantage of supplying 16 bytes at a time rather than
SHA1's 10 bytes, which, in addition to having a faster compression
function to begin with, means faster extraction in general. On an Intel
i7-11850H, this commit makes initial seeding around 131% faster.

BLAKE2s itself has the nice property of internally being based on the
ChaCha permutation, which the RNG is already using for expansion, so
there shouldn't be any issue with newness, funkiness, or surprising CPU
behavior, since it's based on something already in use.

[1] https://eprint.iacr.org/2005/010.pdf
[2] https://www.iacr.org/archive/crypto2005/36210017/36210017.pdf
[3] https://eprint.iacr.org/2015/967.pdf
[4] https://shattered.io/static/shattered.pdf
[5] https://www.usenix.org/system/files/sec20-leurent.pdf

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 009ba856 20-Dec-2021 Eric Biggers <ebiggers@google.com>

random: fix data race on crng init time

_extract_crng() does plain loads of crng->init_time and
crng_global_init_time, which causes undefined behavior if
crng_reseed() and RNDRESEEDCRNG modify these corrently.

Use READ_ONCE() and WRITE_ONCE() to make the behavior defined.

Don't fix the race on crng->init_time by protecting it with crng->lock,
since it's not a problem for duplicate reseedings to occur. I.e., the
lockless access with READ_ONCE() is fine.

Fixes: d848e5f8e1eb ("random: add new ioctl RNDRESEEDCRNG")
Fixes: e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 5d73d1e3 20-Dec-2021 Eric Biggers <ebiggers@google.com>

random: fix data race on crng_node_pool

extract_crng() and crng_backtrack_protect() load crng_node_pool with a
plain load, which causes undefined behavior if do_numa_crng_init()
modifies it concurrently.

Fix this by using READ_ONCE(). Note: as per the previous discussion
https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u,
READ_ONCE() is believed to be sufficient here, and it was requested that
it be used here instead of smp_load_acquire().

Also change do_numa_crng_init() to set crng_node_pool using
cmpxchg_release() instead of mb() + cmpxchg(), as the former is
sufficient here but is more lightweight.

Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 703f7066 07-Dec-2021 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

random: remove unused irq_flags argument from add_interrupt_randomness()

Since commit
ee3e00e9e7101 ("random: use registers from interrupted code for CPU's w/o a cycle counter")

the irq_flags argument is no longer used.

Remove unused irq_flags.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 2b6c6e3d 01-Dec-2021 Mark Brown <broonie@kernel.org>

random: document add_hwgenerator_randomness() with other input functions

The section at the top of random.c which documents the input functions
available does not document add_hwgenerator_randomness() which might lead
a reader to overlook it. Add a brief note about it.

Signed-off-by: Mark Brown <broonie@kernel.org>
[Jason: reorganize position of function in doc comment and also document
add_bootloader_randomness() while we're at it.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 118a4417 21-Mar-2021 Eric Biggers <ebiggers@google.com>

random: remove dead code left over from blocking pool

Remove some dead code that was left over following commit 90ea1c6436d2
("random: remove the blocking pool").

Cc: linux-crypto@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# a181e0fd 21-Mar-2021 Eric Biggers <ebiggers@google.com>

random: initialize ChaCha20 constants with correct endianness

On big endian CPUs, the ChaCha20-based CRNG is using the wrong
endianness for the ChaCha20 constants.

This doesn't matter cryptographically, but technically it means it's not
ChaCha20 anymore. Fix it to always use the standard constants.

Cc: linux-crypto@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 11a0b5e0 12-Jan-2021 Eric Biggers <ebiggers@google.com>

random: fix the RNDRESEEDCRNG ioctl

The RNDRESEEDCRNG ioctl reseeds the primary_crng from itself, which
doesn't make sense. Reseed it from the input_pool instead.

Fixes: d848e5f8e1eb ("random: add new ioctl RNDRESEEDCRNG")
Cc: stable@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210112192818.69921-1-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 390596c9 05-Nov-2020 Ard Biesheuvel <ardb@kernel.org>

random: avoid arch_get_random_seed_long() when collecting IRQ randomness

When reseeding the CRNG periodically, arch_get_random_seed_long() is
called to obtain entropy from an architecture specific source if one
is implemented. In most cases, these are special instructions, but in
some cases, such as on ARM, we may want to back this using firmware
calls, which are considerably more expensive.

Another call to arch_get_random_seed_long() exists in the CRNG driver,
in add_interrupt_randomness(), which collects entropy by capturing
inter-interrupt timing and relying on interrupt jitter to provide
random bits. This is done by keeping a per-CPU state, and mixing in
the IRQ number, the cycle counter and the return address every time an
interrupt is taken, and mixing this per-CPU state into the entropy pool
every 64 invocations, or at least once per second. The entropy that is
gathered this way is credited as 1 bit of entropy. Every time this
happens, arch_get_random_seed_long() is invoked, and the result is
mixed in as well, and also credited with 1 bit of entropy.

This means that arch_get_random_seed_long() is called at least once
per second on every CPU, which seems excessive, and doesn't really
scale, especially in a virtualization scenario where CPUs may be
oversubscribed: in cases where arch_get_random_seed_long() is backed
by an instruction that actually goes back to a shared hardware entropy
source (such as RNDRRS on ARM), we will end up hitting it hundreds of
times per second.

So let's drop the call to arch_get_random_seed_long() from
add_interrupt_randomness(), and instead, rely on crng_reseed() to call
the arch hook to get random seed material from the platform.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20201105152944.16953-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# a24d22b2 12-Nov-2020 Eric Biggers <ebiggers@google.com>

crypto: sha - split sha.h into sha1.h and sha2.h

Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.

This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.

Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.

This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# c51f8f88 09-Aug-2020 George Spelvin <lkml@sdf.org>

random32: make prandom_u32() output unpredictable

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.

Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>


# f227e3ec 10-Jul-2020 Willy Tarreau <w@1wt.eu>

random32: update the net random state on interrupt and activity

This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.

Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.

In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state. For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.

Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a2541dcb 02-Jun-2020 Christoph Hellwig <hch@lst.de>

random: fix an incorrect __user annotation on proc_do_entropy

No user pointers for sysctls anymore.

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Reported-by: build test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 228c4f26 02-May-2020 Eric Biggers <ebiggers@google.com>

crypto: lib/sha1 - fold linux/cryptohash.h into crypto/sha.h

<linux/cryptohash.h> sounds very generic and important, like it's the
header to include if you're doing cryptographic hashing in the kernel.
But actually it only includes the library implementation of the SHA-1
compression function (not even the full SHA-1). This should basically
never be used anymore; SHA-1 is no longer considered secure, and there
are much better ways to do cryptographic hashing in the kernel.

Remove this header and fold it into <crypto/sha.h> which already
contains constants and functions for SHA-1 (along with SHA-2).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 6b0b0fa2 02-May-2020 Eric Biggers <ebiggers@google.com>

crypto: lib/sha1 - rename "sha" to "sha1"

The library implementation of the SHA-1 compression function is
confusingly called just "sha_transform()". Alongside it are some "SHA_"
constants and "sha_init()". Presumably these are left over from a time
when SHA just meant SHA-1. But now there are also SHA-2 and SHA-3, and
moreover SHA-1 is now considered insecure and thus shouldn't be used.

Therefore, rename these functions and constants to make it very clear
that they are for SHA-1. Also add a comment to make it clear that these
shouldn't be used.

For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to
SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that
it matches the same definition in <crypto/sha.h>. This prepares for
merging <linux/cryptohash.h> into <crypto/sha.h>.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 32927393 24-Apr-2020 Christoph Hellwig <hch@lst.de>

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ab9a7e27 09-Mar-2020 Mark Rutland <mark.rutland@arm.com>

random: avoid warnings for !CONFIG_NUMA builds

As crng_initialize_secondary() is only called by do_numa_crng_init(),
and the latter is under ifdeffery for CONFIG_NUMA, when CONFIG_NUMA is
not selected the compiler will warn that the former is unused:

| drivers/char/random.c:820:13: warning: 'crng_initialize_secondary' defined but not used [-Wunused-function]
| 820 | static void crng_initialize_secondary(struct crng_state *crng)
| | ^~~~~~~~~~~~~~~~~~~~~~~~~

Stephen reports that this happens for x86_64 noallconfig builds.

We could move crng_initialize_secondary() and crng_init_try_arch() under
the CONFIG_NUMA ifdeffery, but this has the unfortunate property of
separating them from crng_initialize_primary() and
crng_init_try_arch_early() respectively. Instead, let's mark
crng_initialize_secondary() as __maybe_unused.

Link: https://lore.kernel.org/r/20200310121747.GA49602@lakrids.cambridge.arm.com
Fixes: 5cbe0f13b51a ("random: split primary/secondary crng init paths")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# e00d996a 25-Feb-2020 Qian Cai <cai@lca.pw>

random: fix data races at timer_rand_state

Fields in "struct timer_rand_state" could be accessed concurrently.
Lockless plain reads and writes result in data races. Fix them by adding
pairs of READ|WRITE_ONCE(). The data races were reported by KCSAN,

BUG: KCSAN: data-race in add_timer_randomness / add_timer_randomness

write to 0xffff9f320a0a01d0 of 8 bytes by interrupt on cpu 22:
add_timer_randomness+0x100/0x190
add_timer_randomness at drivers/char/random.c:1152
add_disk_randomness+0x85/0x280
scsi_end_request+0x43a/0x4a0
scsi_io_completion+0xb7/0x7e0
scsi_finish_command+0x1ed/0x2a0
scsi_softirq_done+0x1c9/0x1d0
blk_done_softirq+0x181/0x1d0
__do_softirq+0xd9/0x57c
irq_exit+0xa2/0xc0
do_IRQ+0x8b/0x190
ret_from_intr+0x0/0x42
cpuidle_enter_state+0x15e/0x980
cpuidle_enter+0x69/0xc0
call_cpuidle+0x23/0x40
do_idle+0x248/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0

no locks held by swapper/22/0.
irq event stamp: 32871382
_raw_spin_unlock_irqrestore+0x53/0x60
_raw_spin_lock_irqsave+0x21/0x60
_local_bh_enable+0x21/0x30
irq_exit+0xa2/0xc0

read to 0xffff9f320a0a01d0 of 8 bytes by interrupt on cpu 2:
add_timer_randomness+0xe8/0x190
add_disk_randomness+0x85/0x280
scsi_end_request+0x43a/0x4a0
scsi_io_completion+0xb7/0x7e0
scsi_finish_command+0x1ed/0x2a0
scsi_softirq_done+0x1c9/0x1d0
blk_done_softirq+0x181/0x1d0
__do_softirq+0xd9/0x57c
irq_exit+0xa2/0xc0
do_IRQ+0x8b/0x190
ret_from_intr+0x0/0x42
cpuidle_enter_state+0x15e/0x980
cpuidle_enter+0x69/0xc0
call_cpuidle+0x23/0x40
do_idle+0x248/0x280
cpu_startup_entry+0x1d/0x1f
start_secondary+0x1b2/0x230
secondary_startup_64+0xb6/0xc0

no locks held by swapper/2/0.
irq event stamp: 37846304
_raw_spin_unlock_irqrestore+0x53/0x60
_raw_spin_lock_irqsave+0x21/0x60
_local_bh_enable+0x21/0x30
irq_exit+0xa2/0xc0

Reported by Kernel Concurrency Sanitizer on:
Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018

Link: https://lore.kernel.org/r/1582648024-13111-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 69efea71 21-Feb-2020 Jason A. Donenfeld <Jason@zx2c4.com>

random: always use batched entropy for get_random_u{32,64}

It turns out that RDRAND is pretty slow. Comparing these two
constructions:

for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
arch_get_random_long(&ret);

and

long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
extract_crng((u8 *)buf);

it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.

And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.

Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:

do {
val = get_random_u32();
} while (hash_table_contains_key(val));

That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give pretty distributed numbers,
no matter what.

So, this simplification seems better both from a security perspective
and from a performance perspective.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200221201037.30231-1-Jason@zx2c4.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 253d3194 10-Feb-2020 Mark Rutland <mark.rutland@arm.com>

random: add arch_get_random_*long_early()

Some architectures (e.g. arm64) can have heterogeneous CPUs, and the
boot CPU may be able to provide entropy while secondary CPUs cannot. On
such systems, arch_get_random_long() and arch_get_random_seed_long()
will fail unless support for RNG instructions has been detected on all
CPUs. This prevents the boot CPU from being able to provide
(potentially) trusted entropy when seeding the primary CRNG.

To make it possible to seed the primary CRNG from the boot CPU without
adversely affecting the runtime versions of arch_get_random_long() and
arch_get_random_seed_long(), this patch adds new early versions of the
functions used when initializing the primary CRNG.

Default implementations are provided atop of the existing
arch_get_random_long() and arch_get_random_seed_long() so that only
architectures with such constraints need to provide the new helpers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200210130015.17664-3-mark.rutland@arm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 5cbe0f13 10-Feb-2020 Mark Rutland <mark.rutland@arm.com>

random: split primary/secondary crng init paths

Currently crng_initialize() is used for both the primary CRNG and
secondary CRNGs. While we wish to share common logic, we need to do a
number of additional things for the primary CRNG, and this would be
easier to deal with were these handled in separate functions.

This patch splits crng_initialize() into crng_initialize_primary() and
crng_initialize_secondary(), with common logic factored out into a
crng_init_try_arch() helper.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200210130015.17664-2-mark.rutland@arm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 09a6d00a 07-Jan-2020 Yangtao Li <tiny.windzz@gmail.com>

random: remove some dead code of poolinfo

Since it is not being used, so delete it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Link: https://lore.kernel.org/r/20190607182517.28266-5-tiny.windzz@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 727d499a 07-Jan-2020 Yangtao Li <tiny.windzz@gmail.com>

random: fix typo in add_timer_randomness()

s/entimate/estimate

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Link: https://lore.kernel.org/r/20190607182517.28266-4-tiny.windzz@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 12cd53af 07-Jun-2019 Yangtao Li <tiny.windzz@gmail.com>

random: Add and use pr_fmt()

Prefix all printk/pr_<level> messages with "random: " to make the
logging a bit more consistent.

Miscellanea:

o Convert a printks to pr_notice
o Whitespace to align to open parentheses
o Remove embedded "random: " from pr_* as pr_fmt adds it

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Link: https://lore.kernel.org/r/20190607182517.28266-3-tiny.windzz@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 12faac30 07-Jun-2019 Yangtao Li <tiny.windzz@gmail.com>

random: convert to ENTROPY_BITS for better code readability

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Link: https://lore.kernel.org/r/20190607182517.28266-2-tiny.windzz@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 870e05b1 07-Jan-2020 Yangtao Li <tiny.windzz@gmail.com>

random: remove unnecessary unlikely()

WARN_ON() already contains an unlikely(), so it's not necessary to use
unlikely.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Link: https://lore.kernel.org/r/20190607182517.28266-1-tiny.windzz@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c95ea0c6 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: remove kernel.random.read_wakeup_threshold

It has no effect any more, so remove it. We can revert this if
there is some user code that expects to be able to set this sysctl.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/a74ed2cf0b5a5451428a246a9239f5bc4e29358f.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 84df7cdf 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: delete code to pull data into pools

There is no pool that pulls, so it was just dead code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/4a05fe0c7a5c831389ef4aea51d24528ac8682c7.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 90ea1c64 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: remove the blocking pool

There is no longer any interface to read data from the blocking
pool, so remove it.

This enables quite a bit of code deletion, much of which will be
done in subsequent patches.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/511225a224bf0a291149d3c0b8b45393cd03ab96.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 30c08efe 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: make /dev/random be almost like /dev/urandom

This patch changes the read semantics of /dev/random to be the same
as /dev/urandom except that reads will block until the CRNG is
ready.

None of the cleanups that this enables have been done yet. As a
result, this gives a warning about an unused function.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/5e6ac8831c6cf2e56a7a4b39616d1732b2bdd06c.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 48446f19 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: ignore GRND_RANDOM in getentropy(2)

The separate blocking pool is going away. Start by ignoring
GRND_RANDOM in getentropy(2).

This should not materially break any API. Any code that worked
without this change should work at least as well with this change.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/705c5a091b63cc5da70c99304bb97e0109be0a26.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 75551dbf 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: add GRND_INSECURE to return best-effort non-cryptographic bytes

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/d5473b56cf1fa900ca4bd2b3fc1e5b8874399919.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c6f1deb1 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: Add a urandom_read_nowait() for random APIs that don't warn

/dev/random and getrandom() never warn. Split the meat of
urandom_read() into urandom_read_nowarn() and leave the warning code
in urandom_read().

This has no effect on kernel behavior, but it makes subsequent
patches more straightforward. It also makes the fact that
getrandom() never warns more obvious.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/c87ab200588de746431d9f916501ef11e5242b13.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 4c8d0621 23-Dec-2019 Andy Lutomirski <luto@kernel.org>

random: Don't wake crng_init_wait when crng_init == 1

crng_init_wait is only used to wayt for crng_init to be set to 2, so
there's no point to waking it when crng_init is set to 1. Remove the
unnecessary wake_up_interruptible() call.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/6fbc0bfcbfc1fa2c76fd574f5b6f552b11be7fde.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 1b710b1b 13-Nov-2019 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>

char/random: silence a lockdep splat with printk()

Sergey didn't like the locking order,

uart_port->lock -> tty_port->lock

uart_write (uart_port->lock)
__uart_start
pl011_start_tx
pl011_tx_chars
uart_write_wakeup
tty_port_tty_wakeup
tty_port_default
tty_port_tty_get (tty_port->lock)

but those code is so old, and I have no clue how to de-couple it after
checking other locks in the splat. There is an onging effort to make all
printk() as deferred, so until that happens, workaround it for now as a
short-term fix.

LTP: starting iogen01 (export LTPROOT; rwtest -N iogen01 -i 120s -s
read,write -Da -Dv -n 2 500b:$TMPDIR/doio.f1.$$
1000b:$TMPDIR/doio.f2.$$)
WARNING: possible circular locking dependency detected
------------------------------------------------------
doio/49441 is trying to acquire lock:
ffff008b7cff7290 (&(&zone->lock)->rlock){..-.}, at: rmqueue+0x138/0x2050

but task is already holding lock:
60ff000822352818 (&pool->lock/1){-.-.}, at: start_flush_work+0xd8/0x3f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&pool->lock/1){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
__queue_work+0x4b4/0xa10
queue_work_on+0xac/0x11c
tty_schedule_flip+0x84/0xbc
tty_flip_buffer_push+0x1c/0x28
pty_write+0x98/0xd0
n_tty_write+0x450/0x60c
tty_write+0x338/0x474
__vfs_write+0x88/0x214
vfs_write+0x12c/0x1a4
redirected_tty_write+0x90/0xdc
do_loop_readv_writev+0x140/0x180
do_iter_write+0xe0/0x10c
vfs_writev+0x134/0x1cc
do_writev+0xbc/0x130
__arm64_sys_writev+0x58/0x8c
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

-> #3 (&(&port->lock)->rlock){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock_irqsave+0x7c/0x9c
tty_port_tty_get+0x24/0x60
tty_port_default_wakeup+0x1c/0x3c
tty_port_tty_wakeup+0x34/0x40
uart_write_wakeup+0x28/0x44
pl011_tx_chars+0x1b8/0x270
pl011_start_tx+0x24/0x70
__uart_start+0x5c/0x68
uart_write+0x164/0x1c8
do_output_char+0x33c/0x348
n_tty_write+0x4bc/0x60c
tty_write+0x338/0x474
redirected_tty_write+0xc0/0xdc
do_loop_readv_writev+0x140/0x180
do_iter_write+0xe0/0x10c
vfs_writev+0x134/0x1cc
do_writev+0xbc/0x130
__arm64_sys_writev+0x58/0x8c
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

-> #2 (&port_lock_key){-.-.}:
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
pl011_console_write+0xec/0x2cc
console_unlock+0x794/0x96c
vprintk_emit+0x260/0x31c
vprintk_default+0x54/0x7c
vprintk_func+0x218/0x254
printk+0x7c/0xa4
register_console+0x734/0x7b0
uart_add_one_port+0x734/0x834
pl011_register_port+0x6c/0xac
sbsa_uart_probe+0x234/0x2ec
platform_drv_probe+0xd4/0x124
really_probe+0x250/0x71c
driver_probe_device+0xb4/0x200
__device_attach_driver+0xd8/0x188
bus_for_each_drv+0xbc/0x110
__device_attach+0x120/0x220
device_initial_probe+0x20/0x2c
bus_probe_device+0x54/0x100
device_add+0xae8/0xc2c
platform_device_add+0x278/0x3b8
platform_device_register_full+0x238/0x2ac
acpi_create_platform_device+0x2dc/0x3a8
acpi_bus_attach+0x390/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_attach+0x108/0x3cc
acpi_bus_scan+0x7c/0xb0
acpi_scan_init+0xe4/0x304
acpi_init+0x100/0x114
do_one_initcall+0x348/0x6a0
do_initcall_level+0x190/0x1fc
do_basic_setup+0x34/0x4c
kernel_init_freeable+0x19c/0x260
kernel_init+0x18/0x338
ret_from_fork+0x10/0x18

-> #1 (console_owner){-...}:
lock_acquire+0x320/0x360
console_lock_spinning_enable+0x6c/0x7c
console_unlock+0x4f8/0x96c
vprintk_emit+0x260/0x31c
vprintk_default+0x54/0x7c
vprintk_func+0x218/0x254
printk+0x7c/0xa4
get_random_u64+0x1c4/0x1dc
shuffle_pick_tail+0x40/0xac
__free_one_page+0x424/0x710
free_one_page+0x70/0x120
__free_pages_ok+0x61c/0xa94
__free_pages_core+0x1bc/0x294
memblock_free_pages+0x38/0x48
__free_pages_memory+0xcc/0xfc
__free_memory_core+0x70/0x78
free_low_memory_core_early+0x148/0x18c
memblock_free_all+0x18/0x54
mem_init+0xb4/0x17c
mm_init+0x14/0x38
start_kernel+0x19c/0x530

-> #0 (&(&zone->lock)->rlock){..-.}:
validate_chain+0xf6c/0x2e2c
__lock_acquire+0x868/0xc2c
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
rmqueue+0x138/0x2050
get_page_from_freelist+0x474/0x688
__alloc_pages_nodemask+0x3b4/0x18dc
alloc_pages_current+0xd0/0xe0
alloc_slab_page+0x2b4/0x5e0
new_slab+0xc8/0x6bc
___slab_alloc+0x3b8/0x640
kmem_cache_alloc+0x4b4/0x588
__debug_object_init+0x778/0x8b4
debug_object_init_on_stack+0x40/0x50
start_flush_work+0x16c/0x3f0
__flush_work+0xb8/0x124
flush_work+0x20/0x30
xlog_cil_force_lsn+0x88/0x204 [xfs]
xfs_log_force_lsn+0x128/0x1b8 [xfs]
xfs_file_fsync+0x3c4/0x488 [xfs]
vfs_fsync_range+0xb0/0xd0
generic_write_sync+0x80/0xa0 [xfs]
xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
xfs_file_write_iter+0x1a0/0x218 [xfs]
__vfs_write+0x1cc/0x214
vfs_write+0x12c/0x1a4
ksys_write+0xb0/0x120
__arm64_sys_write+0x54/0x88
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

other info that might help us debug this:

Chain exists of:
&(&zone->lock)->rlock --> &(&port->lock)->rlock --> &pool->lock/1

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&pool->lock/1);
lock(&(&port->lock)->rlock);
lock(&pool->lock/1);
lock(&(&zone->lock)->rlock);

*** DEADLOCK ***

4 locks held by doio/49441:
#0: a0ff00886fc27408 (sb_writers#8){.+.+}, at: vfs_write+0x118/0x1a4
#1: 8fff00080810dfe0 (&xfs_nondir_ilock_class){++++}, at:
xfs_ilock+0x2a8/0x300 [xfs]
#2: ffff9000129f2390 (rcu_read_lock){....}, at:
rcu_lock_acquire+0x8/0x38
#3: 60ff000822352818 (&pool->lock/1){-.-.}, at:
start_flush_work+0xd8/0x3f0

stack backtrace:
CPU: 48 PID: 49441 Comm: doio Tainted: G W
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS
L50_5.13_1.11 06/18/2019
Call trace:
dump_backtrace+0x0/0x248
show_stack+0x20/0x2c
dump_stack+0xe8/0x150
print_circular_bug+0x368/0x380
check_noncircular+0x28c/0x294
validate_chain+0xf6c/0x2e2c
__lock_acquire+0x868/0xc2c
lock_acquire+0x320/0x360
_raw_spin_lock+0x64/0x80
rmqueue+0x138/0x2050
get_page_from_freelist+0x474/0x688
__alloc_pages_nodemask+0x3b4/0x18dc
alloc_pages_current+0xd0/0xe0
alloc_slab_page+0x2b4/0x5e0
new_slab+0xc8/0x6bc
___slab_alloc+0x3b8/0x640
kmem_cache_alloc+0x4b4/0x588
__debug_object_init+0x778/0x8b4
debug_object_init_on_stack+0x40/0x50
start_flush_work+0x16c/0x3f0
__flush_work+0xb8/0x124
flush_work+0x20/0x30
xlog_cil_force_lsn+0x88/0x204 [xfs]
xfs_log_force_lsn+0x128/0x1b8 [xfs]
xfs_file_fsync+0x3c4/0x488 [xfs]
vfs_fsync_range+0xb0/0xd0
generic_write_sync+0x80/0xa0 [xfs]
xfs_file_buffered_aio_write+0x66c/0x6e4 [xfs]
xfs_file_write_iter+0x1a0/0x218 [xfs]
__vfs_write+0x1cc/0x214
vfs_write+0x12c/0x1a4
ksys_write+0xb0/0x120
__arm64_sys_write+0x54/0x88
el0_svc_handler+0x170/0x240
el0_sync_handler+0x150/0x250
el0_sync+0x164/0x180

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/1573679785-21068-1-git-send-email-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 4aa37c46 17-Dec-2019 Jason A. Donenfeld <Jason@zx2c4.com>

random: don't forget compat_ioctl on urandom

Recently, there's been some compat ioctl cleanup, in which large
hardcoded lists were replaced with compat_ptr_ioctl. One of these
changes involved removing the random.c hardcoded list entries and adding
a compat ioctl function pointer to the random.c fops. In the process,
urandom was forgotten about, so this commit fixes that oversight.

Fixes: 507e4e2b430b ("compat_ioctl: remove /dev/random commands")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20191217172455.186395-1-Jason@zx2c4.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 08e97aec 16-Nov-2019 Herbert Xu <herbert@gondor.apana.org.au>

Revert "hwrng: core - Freeze khwrng thread during suspend"

This reverts commit 03a3bb7ae631 ("hwrng: core - Freeze khwrng
thread during suspend"), ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") and 59b569480dc8 ("random:
Use wait_event_freezable() in add_hwgenerator_randomness()").

These patches introduced regressions and we need more time to
get them ready for mainline.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 507e4e2b 07-Sep-2018 Arnd Bergmann <arnd@arndb.de>

compat_ioctl: remove /dev/random commands

These are all handled by the random driver, so instead of listing
each ioctl, we can use the generic compat_ptr_ioctl() helper.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 3fd57e7a 01-Oct-2019 Borislav Petkov <bp@alien8.de>

char/random: Add a newline at the end of the file

On Tue, Oct 01, 2019 at 10:14:40AM -0700, Linus Torvalds wrote:
> The previous state of the file didn't have that 0xa at the end, so you get that
>
>
> -EXPORT_SYMBOL_GPL(add_bootloader_randomness);
> \ No newline at end of file
> +EXPORT_SYMBOL_GPL(add_bootloader_randomness);
>
> which is "the '-' line doesn't have a newline, the '+' line does" marker.

Aaha, that makes total sense, thanks for explaining. Oh well, let's fix
it then so that people don't scratch heads like me.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 50ee7529 28-Sep-2019 Linus Torvalds <torvalds@linux-foundation.org>

random: try to actively add entropy rather than passively wait for it

For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
caused a bootup regression due to lack of entropy at bootup together
with arguably broken user space that was asking for secure random
numbers when it really didn't need to.

See commit 72dbcf721566 (Revert "ext4: make __ext4_get_inode_loc plug").

This aims to solve the issue by actively generating entropy noise using
the CPU cycle counter when waiting for the random number generator to
initialize. This only works when you have a high-frequency time stamp
counter available, but that's the case on all modern x86 CPU's, and on
most other modern CPU's too.

What we do is to generate jitter entropy from the CPU cycle counter
under a somewhat complex load: calling the scheduler while also
guaranteeing a certain amount of timing noise by also triggering a
timer.

I'm sure we can tweak this, and that people will want to look at other
alternatives, but there's been a number of papers written on jitter
entropy, and this should really be fairly conservative by crediting one
bit of entropy for every timer-induced jump in the cycle counter. Not
because the timer itself would be all that unpredictable, but because
the interaction between the timer and the loop is going to be.

Even if (and perhaps particularly if) the timer actually happens on
another CPU, the cacheline interaction between the loop that reads the
cycle counter and the timer itself firing is going to add perturbations
to the cycle counter values that get mixed into the entropy pool.

As Thomas pointed out, with a modern out-of-order CPU, even quite simple
loops show a fair amount of hard-to-predict timing variability even in
the absense of external interrupts. But this tries to take that further
by actually having a fairly complex interaction.

This is not going to solve the entropy issue for architectures that have
no CPU cycle counter, but it's not clear how (and if) that is solvable,
and the hardware in question is largely starting to be irrelevant. And
by doing this we can at least avoid some of the even more contentious
approaches (like making the entropy waiting time out in order to avoid
the possibly unbounded waiting).

Cc: Ahmed Darwish <darwish.07@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nicholas Mc Guire <hofrat@opentech.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 59b56948 05-Sep-2019 Stephen Boyd <swboyd@chromium.org>

random: Use wait_event_freezable() in add_hwgenerator_randomness()

Sebastian reports that after commit ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") we can call might_sleep() when the
task state is TASK_INTERRUPTIBLE (state=1). This leads to the following warning.

do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000349d1489>] prepare_to_wait_event+0x5a/0x180
WARNING: CPU: 0 PID: 828 at kernel/sched/core.c:6741 __might_sleep+0x6f/0x80
Modules linked in:

CPU: 0 PID: 828 Comm: hwrng Not tainted 5.3.0-rc7-next-20190903+ #46
RIP: 0010:__might_sleep+0x6f/0x80

Call Trace:
kthread_freezable_should_stop+0x1b/0x60
add_hwgenerator_randomness+0xdd/0x130
hwrng_fillfn+0xbf/0x120
kthread+0x10c/0x140
ret_from_fork+0x27/0x50

We shouldn't call kthread_freezable_should_stop() from deep within the
wait_event code because the task state is still set as
TASK_INTERRUPTIBLE instead of TASK_RUNNING and
kthread_freezable_should_stop() will try to call into the freezer with
the task in the wrong state. Use wait_event_freezable() instead so that
it calls schedule() in the right place and tries to enter the freezer
when the task state is TASK_RUNNING instead.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Keerthy <j-keerthy@ti.com>
Fixes: ff296293b353 ("random: Support freezable kthreads in add_hwgenerator_randomness()")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 428826f5 23-Aug-2019 Hsin-Yi Wang <hsinyi@chromium.org>

fdt: add support for rng-seed

Introducing a chosen node, rng-seed, which is an entropy that can be
passed to kernel called very early to increase initial device
randomness. Bootloader should provide this entropy and the value is
read from /chosen/rng-seed in DT.

Obtain of_fdt_crc32 for CRC check after early_init_dt_scan_nodes(),
since early_init_dt_scan_chosen() would modify fdt to erase rng-seed.

Add a new interface add_bootloader_randomness() for rng-seed use case.
Depends on whether the seed is trustworthy, rng seed would be passed to
add_hwgenerator_randomness(). Otherwise it would be passed to
add_device_randomness(). Decision is controlled by kernel config
RANDOM_TRUST_BOOTLOADER.

Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # drivers/char/random.c
Signed-off-by: Will Deacon <will@kernel.org>


# ff296293 19-Aug-2019 Stephen Boyd <swboyd@chromium.org>

random: Support freezable kthreads in add_hwgenerator_randomness()

The kthread calling this function is freezable after commit 03a3bb7ae631
("hwrng: core - Freeze khwrng thread during suspend") is applied.
Unfortunately, this function uses wait_event_interruptible() but doesn't
check for the kthread being woken up by the fake freezer signal. When a
user suspends the system, this kthread will wake up and if it fails the
entropy size check it will immediately go back to sleep and not go into
the freezer. Eventually, suspend will fail because the task never froze
and a warning message like this may appear:

PM: suspend entry (deep)
Filesystems sync: 0.000 seconds
Freezing user space processes ... (elapsed 0.001 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ...
Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
hwrng R running task 0 289 2 0x00000020
[<c08c64c4>] (__schedule) from [<c08c6a10>] (schedule+0x3c/0xc0)
[<c08c6a10>] (schedule) from [<c05dbd8c>] (add_hwgenerator_randomness+0xb0/0x100)
[<c05dbd8c>] (add_hwgenerator_randomness) from [<bf1803c8>] (hwrng_fillfn+0xc0/0x14c [rng_core])
[<bf1803c8>] (hwrng_fillfn [rng_core]) from [<c015abec>] (kthread+0x134/0x148)
[<c015abec>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)

Check for a freezer signal here and skip adding any randomness if the
task wakes up because it was frozen. This should make the kthread freeze
properly and suspend work again.

Fixes: 03a3bb7ae631 ("hwrng: core - Freeze khwrng thread during suspend")
Reported-by: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 58be0106 21-May-2019 Theodore Ts'o <tytso@mit.edu>

random: fix soft lockup when trying to read from an uninitialized blocking pool

Fixes: eb9d1bf079bb: "random: only read from /dev/random after its pool has received 128 bits"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# b7d5dc21 19-Apr-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

random: add a spinlock_t to struct batched_entropy

The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
This is just a preempt_disable() which ensures that the variable is only
from the local CPU. It does not protect against users on the same CPU
from another context. It is possible that a preemptible context reads
slot 0 and then an interrupt occurs and the same value is read again.

The above scenario is confirmed by lockdep if we add a spinlock:
| ================================
| WARNING: inconsistent lock state
| 5.1.0-rc3+ #42 Not tainted
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes:
| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0
| {SOFTIRQ-ON-W} state was registered at:
| _raw_spin_lock+0x2a/0x40
| get_random_u32+0x3e/0xe0
| new_slab+0x15c/0x7b0
| ___slab_alloc+0x492/0x620
| __slab_alloc.isra.73+0x53/0xa0
| kmem_cache_alloc_node+0xaf/0x2a0
| copy_process.part.41+0x1e1/0x2370
| _do_fork+0xdb/0x6d0
| kernel_thread+0x20/0x30
| kthreadd+0x1ba/0x220
| ret_from_fork+0x3a/0x50

| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(batched_entropy_u32.lock);
| <Interrupt>
| lock(batched_entropy_u32.lock);
|
| *** DEADLOCK ***
|
| stack backtrace:
| Call Trace:

| kmem_cache_alloc_trace+0x20e/0x270
| ipmi_alloc_recv_msg+0x16/0x40

| __do_softirq+0xec/0x48d
| run_ksoftirqd+0x37/0x60
| smpboot_thread_fn+0x191/0x290
| kthread+0xfe/0x130
| ret_from_fork+0x3a/0x50

Add a spinlock_t to the batched_entropy data structure and acquire the
lock while accessing it. Acquire the lock with disabled interrupts
because this function may be used from interrupt context.

Remove the batched_entropy_reset_lock lock. Now that we have a lock for
the data scructure, we can access it from a remote CPU.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 92e507d2 19-Apr-2019 George Spelvin <lkml@sdf.org>

random: document get_random_int() family

Explain what these functions are for and when they offer
an advantage over get_random_bytes().

(We still need documentation on rng_is_initialized(), the
random_ready_callback system, and early boot in general.)

Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# fe6f1a6a 19-Apr-2019 Jon DeVree <nuxi@vault24.org>

random: fix CRNG initialization when random.trust_cpu=1

When the system boots with random.trust_cpu=1 it doesn't initialize the
per-NUMA CRNGs because it skips the rest of the CRNG startup code. This
means that the code from 1e7f583af67b ("random: make /dev/urandom scalable
for silly userspace programs") is not used when random.trust_cpu=1.

crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314029] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$6 = (struct crng_state **) 0x0

After adding the missing call to numa_crng_init() the per-NUMA CRNGs are
initialized again:

crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314031] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$1 = (struct crng_state **) 0xffff9a915f4014a0

The call to invalidate_batched_entropy() was also missing. This is
important for architectures like PPC and S390 which only have the
arch_get_random_seed_* functions.

Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng")
Signed-off-by: Jon DeVree <nuxi@vault24.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# d5553523 19-Apr-2019 Kees Cook <keescook@chromium.org>

random: move rand_initialize() earlier

Right now rand_initialize() is run as an early_initcall(), but it only
depends on timekeeping_init() (for mixing ktime_get_real() into the
pools). However, the call to boot_init_stack_canary() for stack canary
initialization runs earlier, which triggers a warning at boot:

random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0

Instead, this moves rand_initialize() to after timekeeping_init(), and moves
canary initialization here as well.

Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust=on"
or CONFIG_RANDOM_TRUST_CPU=y).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# eb9d1bf0 20-Feb-2019 Theodore Ts'o <tytso@mit.edu>

random: only read from /dev/random after its pool has received 128 bits

Immediately after boot, we allow reads from /dev/random before its
entropy pool has been fully initialized. Fix this so that we don't
allow this until the blocking pool has received 128 bits.

We do this by repurposing the initialized flag in the entropy pool
struct, and use the initialized flag in the blocking pool to indicate
whether it is safe to pull from the blocking pool.

To do this, we needed to rework when we decide to push entropy from the
input pool to the blocking pool, since the initialized flag for the
input pool was used for this purpose. To simplify things, we no
longer use the initialized flag for that purpose, nor do we use the
entropy_total field any more.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 764ed189 01-Nov-2018 Rasmus Villemoes <linux@rasmusvillemoes.dk>

drivers/char/random.c: make primary_crng static

Since the definition of struct crng_state is private to random.c, and
primary_crng is neither declared or used elsewhere, there's no reason
for that symbol to have external linkage.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 3bd0b5bf 01-Nov-2018 Rasmus Villemoes <linux@rasmusvillemoes.dk>

drivers/char/random.c: remove unused stuct poolinfo::poolbits

This field is never used, might as well remove it.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 26e0854a 01-Nov-2018 Rasmus Villemoes <linux@rasmusvillemoes.dk>

drivers/char/random.c: constify poolinfo_table

Never modified, might as well be put in .rodata.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 1ca1b917 16-Nov-2018 Eric Biggers <ebiggers@google.com>

crypto: chacha20-generic - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor
chacha20-generic to support different numbers of rounds. The
justification for needing XChaCha12 support is explained in more detail
in the patch "crypto: chacha - add XChaCha12 support".

The only difference between ChaCha{8,12,20} are the number of rounds
itself; all other parts of the algorithm are the same. Therefore,
remove the "20" from all definitions, structures, functions, files, etc.
that will be shared by all ChaCha versions.

Also make ->setkey() store the round count in the chacha_ctx (previously
chacha20_ctx). The generic code then passes the round count through to
chacha_block(). There will be a ->setkey() function for each explicitly
allowed round count; the encrypt/decrypt functions will be the same. I
decided not to do it the opposite way (same ->setkey() function for all
round counts, with different encrypt/decrypt functions) because that
would have required more boilerplate code in architecture-specific
implementations of ChaCha and XChaCha.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# a5e9f557 11-Sep-2018 Eric Biggers <ebiggers@google.com>

crypto: chacha20 - Fix chacha20_block() keystream alignment (again)

In commit 9f480faec58c ("crypto: chacha20 - Fix keystream alignment for
chacha20_block()"), I had missed that chacha20_block() can be called
directly on the buffer passed to get_random_bytes(), which can have any
alignment. So, while my commit didn't break anything, it didn't fully
solve the alignment problems.

Revert my solution and just update chacha20_block() to use
put_unaligned_le32(), so the output buffer need not be aligned.
This is simpler, and on many CPUs it's the same speed.

But, I kept the 'tmp' buffers in extract_crng_user() and
_get_random_bytes() 4-byte aligned, since that alignment is actually
needed for _crng_backtrack_protect() too.

Reported-by: Stephan Müller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 9b254366 27-Aug-2018 Kees Cook <keescook@chromium.org>

random: make CPU trust a boot parameter

Instead of forcing a distro or other system builder to choose
at build time whether the CPU is trusted for CRNG seeding via
CONFIG_RANDOM_TRUST_CPU, provide a boot-time parameter for end users to
control the choice. The CONFIG will set the default state instead.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9a47249d 31-Jul-2018 Jason A. Donenfeld <Jason@zx2c4.com>

random: Make crng state queryable

It is very useful to be able to know whether or not get_random_bytes_wait
/ wait_for_random_bytes is going to block or not, or whether plain
get_random_bytes is going to return good randomness or bad randomness.

The particular use case is for mitigating certain attacks in WireGuard.
A handshake packet arrives and is queued up. Elsewhere a worker thread
takes items from the queue and processes them. In replying to these
items, it needs to use some random data, and it has to be good random
data. If we simply block until we can have good randomness, then it's
possible for an attacker to fill the queue up with packets waiting to be
processed. Upon realizing the queue is full, WireGuard will detect that
it's under a denial of service attack, and behave accordingly. A better
approach is just to drop incoming handshake packets if the crng is not
yet initialized.

This patch, therefore, makes that information directly accessible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# b34fbaa9 22-Jul-2018 Ingo Molnar <mingo@elte.hu>

random: remove preempt disabled region

No need to keep preemption disabled across the whole function.

mix_pool_bytes() uses a spin_lock() to protect the pool and there are
other places like write_pool() whhich invoke mix_pool_bytes() without
disabling preemption.
credit_entropy_bits() is invoked from other places like
add_hwgenerator_randomness() without disabling preemption.

Before commit 95b709b6be49 ("random: drop trickle mode") the function
used __this_cpu_inc_return() which would require disabled preemption.
The preempt_disable() section was added in commit 43d5d3018c37 ("[PATCH]
random driver preempt robustness", history tree). It was claimed that
the code relied on "vt_ioctl() being called under BKL".

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: enhance the commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 39a8883a 17-Jul-2018 Theodore Ts'o <tytso@mit.edu>

random: add a config option to trust the CPU's hwrng

This gives the user building their own kernel (or a Linux
distribution) the option of deciding whether or not to trust the CPU's
hardware random number generator (e.g., RDRAND for x86 CPU's) as being
correctly implemented and not having a back door introduced (perhaps
courtesy of a Nation State's law enforcement or intelligence
agencies).

This will prevent getrandom(2) from blocking, if there is a
willingness to trust the CPU manufacturer.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 753d433b 21-Jun-2018 Tobin C. Harding <me@tobin.cc>

random: Return nbytes filled from hw RNG

Currently the function get_random_bytes_arch() has return value 'void'.
If the hw RNG fails we currently fall back to using get_random_bytes().
This defeats the purpose of requesting random material from the hw RNG
in the first place.

There are currently no intree users of get_random_bytes_arch().

Only get random bytes from the hw RNG, make function return the number
of bytes retrieved from the hw RNG.

Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 8ddd6efa 21-Jun-2018 Tobin C. Harding <me@tobin.cc>

random: Fix whitespace pre random-bytes work

There are a couple of whitespace issues around the function
get_random_bytes_arch(). In preparation for patching this function
let's clean them up.

Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 81e69df3 14-Jul-2018 Theodore Ts'o <tytso@mit.edu>

random: mix rdrand with entropy sent in from userspace

Fedora has integrated the jitter entropy daemon to work around slow
boot problems, especially on VM's that don't support virtio-rng:

https://bugzilla.redhat.com/show_bug.cgi?id=1572944

It's understandable why they did this, but the Jitter entropy daemon
works fundamentally on the principle: "the CPU microarchitecture is
**so** complicated and we can't figure it out, so it *must* be
random". Yes, it uses statistical tests to "prove" it is secure, but
AES_ENCRYPT(NSA_KEY, COUNTER++) will also pass statistical tests with
flying colors.

So if RDRAND is available, mix it into entropy submitted from
userspace. It can't hurt, and if you believe the NSA has backdoored
RDRAND, then they probably have enough details about the Intel
microarchitecture that they can reverse engineer how the Jitter
entropy daemon affects the microarchitecture, and attack its output
stream. And if RDRAND is in fact an honest DRNG, it will immeasurably
improve on what the Jitter entropy daemon might produce.

This also provides some protection against someone who is able to read
or set the entropy seed file.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>


# a11e1d43 28-Jun-2018 Linus Torvalds <torvalds@linux-foundation.org>

Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL

The poll() changes were not well thought out, and completely
unexplained. They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead. That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case. The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
individually because that would just be unnecessarily messy - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 89b310a2 09-Apr-2018 Christoph Hellwig <hch@lst.de>

random: convert to ->poll_mask

The big change is that random_read_wait and random_write_wait are merged
into a single waitqueue that uses keyed wakeups. Because wait_event_*
doesn't know about that this will lead to occassional spurious wakeups
in _random_read and add_hwgenerator_randomness, but wait_event_* is
designed to handle these and were are not in a a hot path there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 4e00b339 24-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: rate limit unseeded randomness warnings

On systems without sufficient boot randomness, no point spamming dmesg.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 6c1e851c 23-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: fix possible sleeping allocation from irq context

We can do a sleeping allocation from an irq context when CONFIG_NUMA
is enabled. Fix this by initializing the NUMA crng instances in a
workqueue.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot+9de458f6a5e713ee8c1a@syzkaller.appspotmail.com
Fixes: 8ef35c866f8862df ("random: set up the NUMA crng instances...")
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# d848e5f8 11-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: add new ioctl RNDRESEEDCRNG

Add a new ioctl which forces the the crng to be reseeded.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org


# 0bb29a84 11-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: crng_reseed() should lock the crng instance that it is modifying

Reported-by: Jann Horn <jannh@google.com>
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly...")
Cc: stable@kernel.org # 4.8+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>


# 8ef35c86 11-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: set up the NUMA crng instances after the CRNG is fully initialized

Until the primary_crng is fully initialized, don't initialize the NUMA
crng nodes. Otherwise users of /dev/urandom on NUMA systems before
the CRNG is fully initialized can get very bad quality randomness. Of
course everyone should move to getrandom(2) where this won't be an
issue, but there's a lot of legacy code out there. This related to
CVE-2018-1108.

Reported-by: Jann Horn <jannh@google.com>
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly...")
Cc: stable@kernel.org # 4.8+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# dc12baac 11-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: use a different mixing algorithm for add_device_randomness()

add_device_randomness() use of crng_fast_load() was highly
problematic. Some callers of add_device_randomness() can pass in a
large amount of static information. This would immediately promote
the crng_init state from 0 to 1, without really doing much to
initialize the primary_crng's internal state with something even
vaguely unpredictable.

Since we don't have the speed constraints of add_interrupt_randomness(),
we can do a better job mixing in the what unpredictability a device
driver or architecture maintainer might see fit to give us, and do it
in a way which does not bump the crng_init_cnt variable.

Also, since add_device_randomness() doesn't bump any entropy
accounting in crng_init state 0, mix the device randomness into the
input_pool entropy pool as well. This is related to CVE-2018-1108.

Reported-by: Jann Horn <jannh@google.com>
Fixes: ee7998c50c26 ("random: do not ignore early device randomness")
Cc: stable@kernel.org # 4.13+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 43838a23 11-Apr-2018 Theodore Ts'o <tytso@mit.edu>

random: fix crng_ready() test

The crng_init variable has three states:

0: The CRNG is not initialized at all
1: The CRNG has a small amount of entropy, hopefully good enough for
early-boot, non-cryptographical use cases
2: The CRNG is fully initialized and we are sure it is safe for
cryptographic use cases.

The crng_ready() function should only return true once we are in the
last state. This addresses CVE-2018-1108.

Reported-by: Jann Horn <jannh@google.com>
Fixes: e192be9d9a30 ("random: replace non-blocking pool...")
Cc: stable@kernel.org # 4.8+
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jann Horn <jannh@google.com>


# 5e747dd9 28-Feb-2018 Rasmus Villemoes <linux@rasmusvillemoes.dk>

drivers/char/random.c: remove unused dont_count_entropy

Ever since "random: kill dead extract_state struct" [1], the
dont_count_entropy member of struct timer_rand_state has been
effectively unused. Since it hasn't found a new use in 12 years, it's
probably safe to finally kill it.

[1] Pre-git, https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=c1c48e61c251f57e7a3f1bf11b3c462b2de9dcb5

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# e8e8a2e4 28-Feb-2018 Andi Kleen <ak@linux.intel.com>

random: optimize add_interrupt_randomness

add_interrupt_randomess always wakes up
code blocking on /dev/random. This wake up is done
unconditionally. Unfortunately this means all interrupts
take the wait queue spinlock, which can be rather expensive
on large systems processing lots of interrupts.

We saw 1% cpu time spinning on this on a large macro workload
running on a large system.

I believe it's a recent regression (?)

Always check if there is a waiter on the wait queue
before waking up. This check can be done without
taking a spinlock.

1.06% 10460 [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
|
---native_queued_spin_lock_slowpath
|
--0.57%--_raw_spin_lock_irqsave
|
--0.56%--__wake_up_common_lock
credit_entropy_bits
add_interrupt_randomness
handle_irq_event_percpu
handle_irq_event
handle_edge_irq
handle_irq
do_IRQ
common_interrupt

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9f886f4d 25-Feb-2017 Theodore Ts'o <tytso@mit.edu>

random: use a tighter cap in credit_entropy_bits_safe()

This fixes a harmless UBSAN where root could potentially end up
causing an overflow while bumping the entropy_total field (which is
ignored once the entropy pool has been initialized, and this generally
is completed during the boot sequence).

This is marginal for the stable kernel series, but it's a really
trivial patch, and it fixes UBSAN warning that might cause security
folks to get overly excited for no reason.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Chen Feng <puck.chen@hisilicon.com>
Cc: stable@vger.kernel.org


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9f480fae 22-Nov-2017 Eric Biggers <ebiggers@google.com>

crypto: chacha20 - Fix keystream alignment for chacha20_block()

When chacha20_block() outputs the keystream block, it uses 'u32' stores
directly. However, the callers (crypto/chacha20_generic.c and
drivers/char/random.c) declare the keystream buffer as a 'u8' array,
which is not guaranteed to have the needed alignment.

Fix it by having both callers declare the keystream as a 'u32' array.
For now this is preferable to switching over to the unaligned access
macros because chacha20_block() is only being used in cases where we can
easily control the alignment (stack buffers).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# afc9a42b 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

the rest of drivers/*: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 49502766 15-Nov-2017 Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>

kmemcheck: remove annotations

Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow). KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6aa7de05 23-Oct-2017 Mark Rutland <mark.rutland@arm.com>

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 51d96dc2 08-Aug-2017 Helge Deller <deller@gmx.de>

random: fix warning message on ia64 and parisc

Fix the warning message on the parisc and IA64 architectures to show the
correct function name of the caller by using %pS instead of %pF. The
message is printed with the value of _RET_IP_ which calls
__builtin_return_address(0) and as such returns the IP address caller
instead of pointer to a function descriptor of the caller.

The effect of this patch is visible on the parisc and ia64 architectures
only since those are the ones which use function descriptors while on
all others %pS and %pF will behave the same.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: eecabf567422 ("random: suppress spammy warnings about unseeded randomness")
Fixes: d06bfd1989fe ("random: warn when kernel uses unseeded randomness")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 72e5c740 30-Jun-2017 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

random: reorder READ_ONCE() in get_random_uXX

Avoid the READ_ONCE in commit 4a072c71f49b ("random: silence compiler
warnings and fix race") if we can leave the function after
arch_get_random_XXX().

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# eecabf56 08-Jun-2017 Theodore Ts'o <tytso@mit.edu>

random: suppress spammy warnings about unseeded randomness

Unfortunately, on some models of some architectures getting a fully
seeded CRNG is extremely difficult, and so this can result in dmesg
getting spammed for a surprisingly long time. This is really bad from
a security perspective, and so architecture maintainers really need to
do what they can to get the CRNG seeded sooner after the system is
booted. However, users can't do anything actionble to address this,
and spamming the kernel messages log will only just annoy people.

For developers who want to work on improving this situation,
CONFIG_WARN_UNSEEDED_RANDOM has been renamed to
CONFIG_WARN_ALL_UNSEEDED_RANDOM. By default the kernel will always
print the first use of unseeded randomness. This way, hopefully the
security obsessed will be happy that there is _some_ indication when
the kernel boots there may be a potential issue with that architecture
or subarchitecture. To see all uses of unseeded randomness,
developers can enable CONFIG_WARN_ALL_UNSEEDED_RANDOM.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# ee7998c5 12-Jul-2017 Kees Cook <keescook@chromium.org>

random: do not ignore early device randomness

The add_device_randomness() function would ignore incoming bytes if the
crng wasn't ready. This additionally makes sure to make an early enough
call to add_latent_entropy() to influence the initial stack canary,
which is especially important on non-x86 systems where it stays the same
through the life of the boot.

Link: http://lkml.kernel.org/r/20170626233038.GA48751@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d06bfd19 07-Jun-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: warn when kernel uses unseeded randomness

This enables an important dmesg notification about when drivers have
used the crng without it being seeded first. Prior, these errors would
occur silently, and so there hasn't been a great way of diagnosing these
types of bugs for obscure setups. By adding this as a config option, we
can leave it on by default, so that we learn where these issues happen,
in the field, will still allowing some people to turn it off, if they
really know what they're doing and do not want the log entries.

However, we don't leave it _completely_ by default. An earlier version
of this patch simply had `default y`. I'd really love that, but it turns
out, this problem with unseeded randomness being used is really quite
present and is going to take a long time to fix. Thus, as a compromise
between log-messages-for-all and nobody-knows, this is `default y`,
except it is also `depends on DEBUG_KERNEL`. This will ensure that the
curious see the messages while others don't have to.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# e297a783 07-Jun-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: add wait_for_random_bytes() API

This enables users of get_random_{bytes,u32,u64,int,long} to wait until
the pool is ready before using this function, in case they actually want
to have reliable randomness.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 4a072c71 14-Jun-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: silence compiler warnings and fix race

Odd versions of gcc for the sh4 architecture will actually warn about
flags being used while uninitialized, so we set them to zero. Non crazy
gccs will optimize that out again, so it doesn't make a difference.

Next, over aggressive gccs could inline the expression that defines
use_lock, which could then introduce a race resulting in a lock
imbalance. By using READ_ONCE, we prevent that fate. Finally, we make
that assignment const, so that gcc can still optimize a nice amount.

Finally, we fix a potential deadlock between primary_crng.lock and
batched_entropy_reset_lock, where they could be called in opposite
order. Moving the call to invalidate_batched_entropy to outside the lock
rectifies this issue.

Fixes: b169c13de473a85b3c859bb36216a4cb5f00a54a
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org


# b169c13d 07-Jun-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: invalidate batched entropy after crng init

It's possible that get_random_{u32,u64} is used before the crng has
initialized, in which case, its output might not be cryptographically
secure. For this problem, directly, this patch set is introducing the
*_wait variety of functions, but even with that, there's a subtle issue:
what happens to our batched entropy that was generated before
initialization. Prior to this commit, it'd stick around, supplying bad
numbers. After this commit, we force the entropy to be re-extracted
after each phase of the crng has initialized.

In order to avoid a race condition with the position counter, we
introduce a simple rwlock for this invalidation. Since it's only during
this awkward transition period, after things are all set up, we stop
using it, so that it doesn't have an impact on performance.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.11+


# 92e75428 07-Jun-2017 Theodore Ts'o <tytso@mit.edu>

random: use lockless method of accessing and updating f->reg_idx

Linus pointed out that there is a much more efficient way of avoiding
the problem that we were trying to address in commit 9dfa7bba35ac0:
"fix race in drivers/char/random.c:get_reg()".

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9dfa7bba 30-Apr-2017 Michael Schmitz <schmitzmic@gmail.com>

fix race in drivers/char/random.c:get_reg()

get_reg() can be reentered on architectures with prioritized interrupts
(m68k in this case), causing f->reg_index to be incremented after the
range check. Out of bounds memory access past the pt_regs struct results.
This will go mostly undetected unless access is beyond end of memory.

Prevent the race by disabling interrupts in get_reg().

Tested on m68k (Atari Falcon, and ARAnyM emulator).

Kudos to Geert Uytterhoeven for helping to trace this race.

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# db61ffe3 31-Jan-2017 Fabio Estevam <fabio.estevam@nxp.com>

random: move random_min_urandom_seed into CONFIG_SYSCTL ifdef block

Building arm allnodefconfig causes the following build warning:

drivers/char/random.c:318:12: warning: 'random_min_urandom_seed' defined but not used [-Wunused-variable]

Fix the warning by moving 'random_min_urandom_seed' declaration inside
the CONFIG_SYSCTL ifdef block, where it is actually used.

While at it, remove the comment prior to the variable declaration.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c440408c 22-Jan-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: convert get_random_int/long into get_random_u32/u64

Many times, when a user wants a random number, he wants a random number
of a guaranteed size. So, thinking of get_random_int and get_random_long
in terms of get_random_u32 and get_random_u64 makes it much easier to
achieve this. It also makes the code simpler.

On 32-bit platforms, get_random_int and get_random_long are both aliased
to get_random_u32. On 64-bit platforms, int->u32 and long->u64.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# f5b98461 06-Jan-2017 Jason A. Donenfeld <Jason@zx2c4.com>

random: use chacha20 for get_random_int/long

Now that our crng uses chacha20, we can rely on its speedy
characteristics for replacing MD5, while simultaneously achieving a
higher security guarantee. Before the idea was to use these functions if
you wanted random integers that aren't stupidly insecure but aren't
necessarily secure either, a vague gray zone, that hopefully was "good
enough" for its users. With chacha20, we can strengthen this claim,
since either we're using an rdrand-like instruction, or we're using the
same crng as /dev/urandom. And it's faster than what was before.

We could have chosen to replace this with a SipHash-derived function,
which might be slightly faster, but at the cost of having yet another
RNG construction in the kernel. By moving to chacha20, we have a single
RNG to analyze and verify, and we also already get good performance
improvements on all platforms.

Implementation-wise, rather than use a generic buffer for both
get_random_int/long and memcpy based on the size needs, we use a
specific buffer for 32-bit reads and for 64-bit reads. This way, we're
guaranteed to always have aligned accesses on all platforms. While
slightly more verbose in C, the assembly this generates is a lot
simpler than otherwise.

Finally, on 32-bit platforms where longs and ints are the same size,
we simply alias get_random_int to get_random_long.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 5d0e5ea3 27-Dec-2016 Stephan Müller <smueller@chronox.de>

random: fix comment for unused random_min_urandom_seed

The variable random_min_urandom_seed is not needed any more as it
defined the reseeding behavior of the nonblocking pool. Though it is not
needed any more, it is left in the code for user space interface
compatibility.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 43d8a72c 27-Dec-2016 Stephan Müller <smueller@chronox.de>

random: remove variable limit

The variable limit was used to identify the nonblocking pool's unlimited
random number generation. As the nonblocking pool is a thing of the
past, remove the limit variable and any conditions around it (i.e.
preserve the branches for limit == 1).

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 2e03c36f 27-Dec-2016 Stephan Müller <smueller@chronox.de>

random: remove stale urandom_init_wait

The urandom_init_wait wait queue is a left over from the pre-ChaCha20
times and can therefore be savely removed.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 3d071d8d 14-Dec-2016 Stephan Mueller <stephan.mueller@atsec.com>

random: remove stale maybe_reseed_primary_crng

The function maybe_reseed_primary_crng is not used anywhere and thus can
be removed.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7425154d 11-Oct-2016 Jason Cooper <jason@lakedaemon.net>

random: remove unused randomize_range()

All call sites for randomize_range have been updated to use the much
simpler and more robust randomize_addr(). Remove the now unnecessary
code.

Link: http://lkml.kernel.org/r/20160803233913.32511-8-jason@lakedaemon.net
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99fdafde 11-Oct-2016 Jason Cooper <jason@lakedaemon.net>

random: simplify API for random address requests

To date, all callers of randomize_range() have set the length to 0, and
check for a zero return value. For the current callers, the only way to
get zero returned is if end <= start. Since they are all adding a
constant to the start address, this is unnecessary.

We can remove a bunch of needless checks by simplifying the API to do just
what everyone wants, return an address between [start, start + range).

While we're here, s/get_random_int/get_random_long/. No current call site
is adversely affected by get_random_int(), since all current range
requests are < UINT_MAX. However, we should match caller expectations to
avoid coming up short (ha!) in the future.

All current callers to randomize_range() chose to use the start address if
randomize_range() failed. Therefore, we simplify things by just returning
the start address on error.

randomize_range() will be removed once all callers have been converted
over to randomize_addr().

Link: http://lkml.kernel.org/r/20160803233913.32511-2-jason@lakedaemon.net
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Roberts, William C" <william.c.roberts@intel.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeffrey Vander Stoep <jeffv@google.com>
Cc: Daniel Cashman <dcashman@android.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0766f788 20-Jun-2016 Emese Revfy <re.emese@gmail.com>

latent_entropy: Mark functions with __latent_entropy

The __latent_entropy gcc attribute can be used only on functions and
variables. If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents. The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.

Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>


# dd0f0cf5 30-Jul-2016 Michael Ellerman <mpe@ellerman.id.au>

random: Fix crashes with sparse node ids

On a system with sparse node ids, eg. a powerpc system with 4 nodes
numbered like so:

node 0: [mem 0x0000000000000000-0x00000007ffffffff]
node 1: [mem 0x0000000800000000-0x0000000fffffffff]
node 16: [mem 0x0000001000000000-0x00000017ffffffff]
node 17: [mem 0x0000001800000000-0x0000001fffffffff]

The code in rand_initialize() will allocate 4 pointers for the pool
array, and initialise them correctly.

However when go to use the pool, in eg. extract_crng(), we use the
numa_node_id() to index into the array. For the higher numbered node ids
this leads to random memory corruption, depending on what was kmalloc'ed
adjacent to the pool array.

Fix it by using nr_node_ids to size the pool array.

Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 59b8d4f1 27-Jul-2016 Theodore Ts'o <tytso@mit.edu>

random: use for_each_online_node() to iterate over NUMA nodes

This fixes a crash on s390 with fake NUMA enabled.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 86a574de 03-Jul-2016 Theodore Ts'o <tytso@mit.edu>

random: strengthen input validation for RNDADDTOENTCNT

Don't allow RNDADDTOENTCNT or RNDADDENTROPY to accept a negative
entropy value. It doesn't make any sense to subtract from the entropy
counter, and it can trigger a warning:

random: negative entropy/overflow: pool input count -40000
------------[ cut here ]------------
WARNING: CPU: 3 PID: 6828 at drivers/char/random.c:670[< none
>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
Modules linked in:
CPU: 3 PID: 6828 Comm: a.out Not tainted 4.7.0-rc4+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffffffff880b58e0 ffff88005dd9fcb0 ffffffff82cc838f ffffffff87158b40
fffffbfff1016b1c 0000000000000000 0000000000000000 ffffffff87158b40
ffffffff83283dae 0000000000000009 ffff88005dd9fcf8 ffffffff8136d27f
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82cc838f>] dump_stack+0x12e/0x18f lib/dump_stack.c:51
[<ffffffff8136d27f>] __warn+0x19f/0x1e0 kernel/panic.c:516
[<ffffffff8136d48c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:551
[<ffffffff83283dae>] credit_entropy_bits+0x21e/0xad0 drivers/char/random.c:670
[< inline >] credit_entropy_bits_safe drivers/char/random.c:734
[<ffffffff8328785d>] random_ioctl+0x21d/0x250 drivers/char/random.c:1546
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff8185316c>] do_vfs_ioctl+0x18c/0xff0 fs/ioctl.c:674
[< inline >] SYSC_ioctl fs/ioctl.c:689
[<ffffffff8185405f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680
[<ffffffff86a995c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:207
---[ end trace 5d4902b2ba842f1f ]---

This was triggered using the test program:

// autogenerated by syzkaller (http://github.com/google/syzkaller)

int main() {
int fd = open("/dev/random", O_RDWR);
int val = -5000;
ioctl(fd, RNDADDTOENTCNT, &val);
return 0;
}

It's harmless in that (a) only root can trigger it, and (b) after
complaining the code never does let the entropy count go negative, but
it's better to simply not allow this userspace from passing in a
negative entropy value altogether.

Google-Bug-Id: #29575089
Reported-By: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c92e040d 04-May-2016 Theodore Ts'o <tytso@mit.edu>

random: add backtracking protection to the CRNG

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 1e7f583a 02-May-2016 Theodore Ts'o <tytso@mit.edu>

random: make /dev/urandom scalable for silly userspace programs

On a system with a 4 socket (NUMA) system where a large number of
application threads were all trying to read from /dev/urandom, this
can result in the system spending 80% of its time contending on the
global urandom spinlock. The application should have used its own
PRNG, but let's try to help it from running, lemming-like, straight
over the locking cliff.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# e192be9d 12-Jun-2016 Theodore Ts'o <tytso@mit.edu>

random: replace non-blocking pool with a Chacha20-based CRNG

The CRNG is faster, and we don't pretend to track entropy usage in the
CRNG any more.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# b1132dea 04-May-2016 Eric Biggers <ebiggers3@gmail.com>

random: properly align get_random_int_hash

get_random_long() reads from the get_random_int_hash array using an
unsigned long pointer. For this code to be guaranteed correct on all
architectures, the array must be aligned to an unsigned long boundary.

Cc: stable@kernel.org
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 4b44f2d1 02-May-2016 Stephan Mueller <smueller@chronox.de>

random: add interrupt callback to VMBus IRQ handler

The Hyper-V Linux Integration Services use the VMBus implementation for
communication with the Hypervisor. VMBus registers its own interrupt
handler that completely bypasses the common Linux interrupt handling.
This implies that the interrupt entropy collector is not triggered.

This patch adds the interrupt entropy collection callback into the VMBus
interrupt handler function.

Cc: stable@kernel.org
Signed-off-by: Stephan Mueller <stephan.mueller@atsec.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9b4d0087 13-Jun-2016 Theodore Ts'o <tytso@mit.edu>

random: print a warning for the first ten uninitialized random users

Since systemd is consistently using /dev/urandom before it is
initialized, we can't see the other potentially dangerous users of
/dev/urandom immediately after boot. So print the first ten such
complaints instead.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 3371f3da 12-Jun-2016 Theodore Ts'o <tytso@mit.edu>

random: initialize the non-blocking pool via add_hwgenerator_randomness()

If we have a hardware RNG and are using the in-kernel rngd, we should
use this to initialize the non-blocking pool so that getrandom(2)
doesn't block unnecessarily.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 8da4b8c4 20-May-2016 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib/uuid.c: move generate_random_uuid() to uuid.c

Let's gather the UUID related functions under one hood.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ec9ee4ac 26-Feb-2016 Daniel Cashman <dcashman@android.com>

drivers: char: random: add get_random_long()

Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.

The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().

Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.

Finally, replace calls to get_random_int() with get_random_long() where
appropriate.

This patch (of 2):

Add get_random_long().

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c2719503 09-Jun-2015 Herbert Xu <herbert@gondor.apana.org.au>

random: Remove kernel blocking API

This patch removes the kernel blocking API as it has been completely
replaced by the callback API.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 205a525c 09-Jun-2015 Herbert Xu <herbert@gondor.apana.org.au>

random: Add callback API for random pool readiness

The get_blocking_random_bytes API is broken because the wait can
be arbitrarily long (potentially forever) so there is no safe way
of calling it from within the kernel.

This patch replaces it with a callback API instead. The callback
is invoked potentially from interrupt context so the user needs
to schedule their own work thread if necessary.

In addition to adding callbacks, they can also be removed as
otherwise this opens up a way for user-space to allocate kernel
memory with no bound (by opening algif_rng descriptors and then
closing them).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 16b369a9 25-May-2015 Stephan Mueller <smueller@chronox.de>

random: Blocking API for accessing nonblocking_pool

The added API calls provide a synchronous function call
get_blocking_random_bytes where the caller is blocked until
the nonblocking_pool is initialized.

CC: Andreas Steffen <andreas.steffen@strongswan.org>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Sandy Harris <sandyinchina@gmail.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 1d9de44e 21-May-2015 Herbert Xu <herbert@gondor.apana.org.au>

random: Wake up all getrandom(2) callers when pool is ready

If more than one application invokes getrandom(2) before the pool
is ready, then all bar one will be stuck forever because we use
wake_up_interruptible which wakes up a single task.

This patch replaces it with wake_up_all.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 19acc77a 06-Feb-2015 George Spelvin <linux@horizon.com>

random: Fix fast_mix() function

There was a bad typo in commit 43759d4f429c ("random: use an improved
fast_mix() function") and I didn't notice because it "looked right", so
I saw what I expected to see when I reviewed it.

Only months later did I look and notice it's not the Threefish-inspired
mix function that I had designed and optimized.

Mea Culpa. Each input bit still has a chance to affect each output bit,
and the fast pool is spilled *long* before it fills, so it's not a total
disaster, but it's definitely not the intended great improvement.

I'm still working on finding better rotation constants. These are good
enough, but since it's unrolled twice, it's possible to get better
mixing for free by using eight different constants rather than repeating
the same four.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d4c5efdb 26-Aug-2014 Daniel Borkmann <daniel@iogearbox.net>

random: add and use memzero_explicit() for clearing data

zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in random driver are being optimized away by gcc.

Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
that can be used in such cases where a variable with sensitive data is
being cleared out in the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]

Fixes kernel bugzilla: 82041

Reported-by: zatimend@hotmail.co.uk
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 1b2a1a7e 16-Aug-2014 Christoph Lameter <cl@linux.com>

drivers/char/random: Replace __get_cpu_var uses

A single case of using __get_cpu_var for address calculation.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 48d6be95 17-Jul-2014 Theodore Ts'o <tytso@mit.edu>

random: limit the contribution of the hw rng to at most half

For people who don't trust a hardware RNG which can not be audited,
the changes to add support for RDSEED can be troubling since 97% or
more of the entropy will be contributed from the in-CPU hardware RNG.

We now have a in-kernel khwrngd, so for those people who do want to
implicitly trust the CPU-based system, we could create an arch-rng
hw_random driver, and allow khwrng refill the entropy pool. This
allows system administrator whether or not they trust the CPU (I
assume the NSA will trust RDRAND/RDSEED implicitly :-), and if so,
what level of entropy derating they want to use.

The reason why this is a really good idea is that if different people
use different levels of entropy derating, it will make it much more
difficult to design a backdoor'ed hwrng that can be generally
exploited in terms of the output of /dev/random when different attack
targets are using differing levels of entropy derating.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c6e9d6f3 17-Jul-2014 Theodore Ts'o <tytso@mit.edu>

random: introduce getrandom(2) system call

The getrandom(2) system call was requested by the LibreSSL Portable
developers. It is analoguous to the getentropy(2) system call in
OpenBSD.

The rationale of this system call is to provide resiliance against
file descriptor exhaustion attacks, where the attacker consumes all
available file descriptors, forcing the use of the fallback code where
/dev/[u]random is not available. Since the fallback code is often not
well-tested, it is better to eliminate this potential failure mode
entirely.

The other feature provided by this new system call is the ability to
request randomness from the /dev/urandom entropy pool, but to block
until at least 128 bits of entropy has been accumulated in the
/dev/urandom entropy pool. Historically, the emphasis in the
/dev/urandom development has been to ensure that urandom pool is
initialized as quickly as possible after system boot, and preferably
before the init scripts start execution.

This is because changing /dev/urandom reads to block represents an
interface change that could potentially break userspace which is not
acceptable. In practice, on most x86 desktop and server systems, in
general the entropy pool can be initialized before it is needed (and
in modern kernels, we will printk a warning message if not). However,
on an embedded system, this may not be the case. And so with this new
interface, we can provide the functionality of blocking until the
urandom pool has been initialized. Any userspace program which uses
this new functionality must take care to assure that if it is used
during the boot process, that it will not cause the init scripts or
other portions of the system startup to hang indefinitely.

SYNOPSIS
#include <linux/random.h>

int getrandom(void *buf, size_t buflen, unsigned int flags);

DESCRIPTION
The system call getrandom() fills the buffer pointed to by buf
with up to buflen random bytes which can be used to seed user
space random number generators (i.e., DRBG's) or for other
cryptographic uses. It should not be used for Monte Carlo
simulations or other programs/algorithms which are doing
probabilistic sampling.

If the GRND_RANDOM flags bit is set, then draw from the
/dev/random pool instead of the /dev/urandom pool. The
/dev/random pool is limited based on the entropy that can be
obtained from environmental noise, so if there is insufficient
entropy, the requested number of bytes may not be returned.
If there is no entropy available at all, getrandom(2) will
either block, or return an error with errno set to EAGAIN if
the GRND_NONBLOCK bit is set in flags.

If the GRND_RANDOM bit is not set, then the /dev/urandom pool
will be used. Unlike using read(2) to fetch data from
/dev/urandom, if the urandom pool has not been sufficiently
initialized, getrandom(2) will block (or return -1 with the
errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).

The getentropy(2) system call in OpenBSD can be emulated using
the following function:

int getentropy(void *buf, size_t buflen)
{
int ret;

if (buflen > 256)
goto failure;
ret = getrandom(buf, buflen, 0);
if (ret < 0)
return ret;
if (ret == buflen)
return 0;
failure:
errno = EIO;
return -1;
}

RETURN VALUE
On success, the number of bytes that was filled in the buf is
returned. This may not be all the bytes requested by the
caller via buflen if insufficient entropy was present in the
/dev/random pool, or if the system call was interrupted by a
signal.

On error, -1 is returned, and errno is set appropriately.

ERRORS
EINVAL An invalid flag was passed to getrandom(2)

EFAULT buf is outside the accessible address space.

EAGAIN The requested entropy was not available, and
getentropy(2) would have blocked if the
GRND_NONBLOCK flag was not set.

EINTR While blocked waiting for entropy, the call was
interrupted by a signal handler; see the description
of how interrupted read(2) calls on "slow" devices
are handled with and without the SA_RESTART flag
in the signal(7) man page.

NOTES
For small requests (buflen <= 256) getrandom(2) will not
return EINTR when reading from the urandom pool once the
entropy pool has been initialized, and it will return all of
the bytes that have been requested. This is the recommended
way to use getrandom(2), and is designed for compatibility
with OpenBSD's getentropy() system call.

However, if you are using GRND_RANDOM, then getrandom(2) may
block until the entropy accounting determines that sufficient
environmental noise has been gathered such that getrandom(2)
will be operating as a NRBG instead of a DRBG for those people
who are working in the NIST SP 800-90 regime. Since it may
block for a long time, these guarantees do *not* apply. The
user may want to interrupt a hanging process using a signal,
so blocking until all of the requested bytes are returned
would be unfriendly.

For this reason, the user of getrandom(2) MUST always check
the return value, in case it returns some error, or if fewer
bytes than requested was returned. In the case of
!GRND_RANDOM and small request, the latter should never
happen, but the careful userspace code (and all crypto code
should be careful) should check for this anyway!

Finally, unless you are doing long-term key generation (and
perhaps not even then), you probably shouldn't be using
GRND_RANDOM. The cryptographic algorithms used for
/dev/urandom are quite conservative, and so should be
sufficient for all purposes. The disadvantage of GRND_RANDOM
is that it can block, and the increased complexity required to
deal with partially fulfilled getrandom(2) requests.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Zach Brown <zab@zabbo.net>


# 79a84687 18-Jul-2014 Hannes Frederic Sowa <hannes@stressinduktion.org>

random: check for increase of entropy_count because of signed conversion

The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
actually increase entropy_count if during assignment of the unsigned
expression on the RHS (mind the -=) we reduce the value modulo
2^width(int) and assign it to entropy_count. Trinity found this.

[ Commit modified by tytso to add an additional safety check for a
negative entropy_count -- which should never happen, and to also add
an additional paranoia check to prevent overly large count values to
be passed into urandom_read(). ]

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org


# ee3e00e9 15-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: use registers from interrupted code for CPU's w/o a cycle counter

For CPU's that don't have a cycle counter, or something equivalent
which can be used for random_get_entropy(), random_get_entropy() will
always return 0. In that case, substitute with the saved interrupt
registers to add a bit more unpredictability.

Some folks have suggested hashing all of the registers
unconditionally, but this would increase the overhead of
add_interrupt_randomness() by at least an order of magnitude, and this
would very likely be unacceptable.

The changes in this commit have been benchmarked as mostly unaffecting
the overhead of add_interrupt_randomness() if the entropy counter is
present, and doubling the overhead if it is not present.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Jörn Engel <joern@logfs.org>


# c84dbf61 14-Jun-2014 Torsten Duwe <duwe@lst.de>

random: add_hwgenerator_randomness() for feeding entropy from devices

This patch adds an interface to the random pool for feeding entropy
in-kernel.

Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: H. Peter Anvin <hpa@zytor.com>


# 43759d4f 14-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: use an improved fast_mix() function

Use more efficient fast_mix() function. Thanks to George Spelvin for
doing the leg work to find a more efficient mixing function.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>


# 840f9507 14-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: clean up interrupt entropy accounting for archs w/o cycle counters

For architectures that don't have cycle counters, the algorithm for
deciding when to avoid giving entropy credit due to back-to-back timer
interrupts didn't make any sense, since we were checking every 64
interrupts. Change it so that we only give an entropy credit if the
majority of the interrupts are not based on the timer.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>


# cff85031 10-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: only update the last_pulled time if we actually transferred entropy

In xfer_secondary_pull(), check to make sure we need to pull from the
secondary pool before checking and potentially updating the
last_pulled time.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>


# 85608f8e 10-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: remove unneeded hash of a portion of the entropy pool

We previously extracted a portion of the entropy pool in
mix_pool_bytes() and hashed it in to avoid racing CPU's from returning
duplicate random values. Now that we are using a spinlock to prevent
this from happening, this is no longer necessary. So remove it, to
simplify the code a bit.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>


# 91fcb532 10-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: always update the entropy pool under the spinlock

Instead of using lockless techniques introduced in commit
902c098a3663, use spin_trylock to try to grab entropy pool's lock. If
we can't get the lock, then just try again on the next interrupt.

Based on discussions with George Spelvin.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>


# e33ba5fa 15-Jun-2014 Theodore Ts'o <tytso@mit.edu>

random: fix nasty entropy accounting bug

Commit 0fb7a01af5b0 "random: simplify accounting code", introduced in
v3.15, has a very nasty accounting problem when the entropy pool has
has fewer bytes of entropy than the number of requested reserved
bytes. In that case, "have_bytes - reserved" goes negative, and since
size_t is unsigned, the expression:

ibytes = min_t(size_t, ibytes, have_bytes - reserved);

... does not do the right thing. This is rather bad, because it
defeats the catastrophic reseeding feature in the
xfer_secondary_pool() path.

It also can cause the "BUG: spinlock trylock failure on UP" for some
kernel configurations when prandom_reseed() calls get_random_bytes()
in the early init, since when the entropy count gets corrupted,
credit_entropy_bits() erroneously believes that the nonblocking pool
has been fully initialized (when in fact it is not), and so it calls
prandom_reseed(true) recursively leading to the spinlock BUG.

The logic is *not* the same it was originally, but in the cases where
it matters, the behavior is the same, and the resulting code is
hopefully easier to read and understand.

Fixes: 0fb7a01af5b0 "random: simplify accounting code"
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Price <price@mit.edu>
Cc: stable@vger.kernel.org #v3.15


# 5eb10d91 06-Jun-2014 Joe Perches <joe@perches.com>

random: convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f9c6d498 16-May-2014 Theodore Ts'o <tytso@mit.edu>

random: fix BUG_ON caused by accounting simplification

Commit ee1de406ba6eb1 ("random: simplify accounting logic") simplified
things too much, in that it allows the following to trigger an
overflow that results in a BUG_ON crash:

dd if=/dev/urandom of=/dev/zero bs=67108707 count=1

Thanks to Peter Zihlstra for discovering the crash, and Hannes
Frederic for analyizing the root cause.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Greg Price <price@mit.edu>


# bdcfa3e5 25-Apr-2014 Christoph Hellwig <hch@infradead.org>

random: export add_disk_randomness

This will be needed for pending changes to the scsi midlayer that now
calls lower level block APIs, as well as any blk-mq driver that wants to
contribute to the random pool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7b878d4b 17-Mar-2014 H. Peter Anvin <hpa@linux.intel.com>

random: Add arch_has_random[_seed]()

Add predicate functions for having arch_get_random[_seed]*(). The
only current use is to avoid the loop in arch_random_refill() when
arch_get_random_seed_long() is unavailable.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 331c6490 17-Mar-2014 H. Peter Anvin <hpa@linux.intel.com>

random: If we have arch_get_random_seed*(), try it before blocking

If we have arch_get_random_seed*(), try to use it for emergency refill
of the entropy pool before giving up and blocking on /dev/random. It
may or may not work in the moment, but if it does work, it will give
the user better service than blocking will.

Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 83664a69 17-Mar-2014 H. Peter Anvin <hpa@linux.intel.com>

random: Use arch_get_random_seed*() at init time and once a second

Use arch_get_random_seed*() in two places in the Linux random
driver (drivers/char/random.c):

1. During entropy pool initialization, use RDSEED in favor of RDRAND,
with a fallback to the latter. Entropy exhaustion is unlikely to
happen there on physical hardware as the machine is single-threaded
at that point, but could happen in a virtual machine. In that
case, the fallback to RDRAND will still provide more than adequate
entropy pool initialization.

2. Once a second, issue RDSEED and, if successful, feed it to the
entropy pool. To ensure an extra layer of security, only credit
half the entropy just in case.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 46884442 17-Dec-2013 Theodore Ts'o <tytso@mit.edu>

random: use the architectural HWRNG for the SHA's IV in extract_buf()

To help assuage the fears of those who think the NSA can introduce a
massive hack into the instruction decode and out of order execution
engine in the CPU without hundreds of Intel engineers knowing about
it (only one of which woud need to have the conscience and courage of
Edward Snowden to spill the beans to the public), use the HWRNG to
initialize the SHA starting value, instead of xor'ing it in
afterwards.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 2132a96f 06-Dec-2013 Greg Price <price@MIT.EDU>

random: clarify bits/bytes in wakeup thresholds

These are a recurring cause of confusion, so rename them to
hopefully be clearer.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 7d1b08c4 07-Dec-2013 Greg Price <price@MIT.EDU>

random: entropy_bytes is actually bits

The variable 'entropy_bytes' is set from an expression that actually
counts bits. Fortunately it's also only compared to values that also
count bits. Rename it accordingly.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 0fb7a01a 05-Dec-2013 Greg Price <price@MIT.EDU>

random: simplify accounting code

With this we handle "reserved" in just one place. As a bonus the
code becomes less nested, and the "wakeup_write" flag variable
becomes unnecessary. The variable "flags" was already unused.

This code behaves identically to the previous version except in
two pathological cases that don't occur. If the argument "nbytes"
is already less than "min", then we didn't previously enforce
"min". If r->limit is false while "reserved" is nonzero, then we
previously applied "reserved" in checking whether we had enough
bits, even though we don't apply it to actually limit how many we
take. The callers of account() never exercise either of these cases.

Before the previous commit, it was possible for "nbytes" to be less
than "min" if userspace chose a pathological configuration, but no
longer.

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 8c2aa339 05-Dec-2013 Greg Price <price@MIT.EDU>

random: tighten bound on random_read_wakeup_thresh

We use this value in a few places other than its literal meaning,
in particular in _xfer_secondary_pool() as a minimum number of
bits to pull from the input pool at a time into either output
pool. It doesn't make sense to pull more bits than the whole size
of an output pool.

We could and possibly should separate the quantities "how much
should the input pool have to have to wake up /dev/random readers"
and "how much should we transfer from the input to an output pool
at a time", but nobody is likely to be sad they can't set the first
quantity to more than 1024 bits, so for now just limit them both.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# a58aa4ed 29-Nov-2013 Greg Price <price@MIT.EDU>

random: forget lock in lockless accounting

The only mutable data accessed here is ->entropy_count, but since
10b3a32d2 ("random: fix accounting race condition") we use cmpxchg to
protect our accesses to ->entropy_count here. Drop the use of the
lock.

Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# ee1de406 29-Nov-2013 Greg Price <price@MIT.EDU>

random: simplify accounting logic

This logic is exactly equivalent to the old logic, but it should
be easier to see what it's doing.

The equivalence depends on one fact from outside this function:
when 'r->limit' is false, 'reserved' is zero. (Well, two facts;
the other is that 'reserved' is never negative.)

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 19fa5be1 29-Nov-2013 Greg Price <price@MIT.EDU>

random: fix comment on "account"

This comment didn't quite keep up as extract_entropy() was split into
four functions. Put each bit by the function it describes.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 12ff3a51 29-Nov-2013 Greg Price <price@MIT.EDU>

random: simplify loop in random_read

The loop condition never changes until just before a break, so we
might as well write it as a constant. Also since a996996dd75a
("random: drop weird m_time/a_time manipulation") we don't do anything
after the loop finishes, so the 'break's might as well return
directly. Some other simplifications.

There should be no change in behavior introduced by this commit.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 18e9cea7 29-Nov-2013 Greg Price <price@MIT.EDU>

random: fix description of get_random_bytes

After this remark was written, commit d2e7c96af added a use of
arch_get_random_long() inside the get_random_bytes codepath.
The main point stands, but it needs to be reworded.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# f22052b2 29-Nov-2013 Greg Price <price@MIT.EDU>

random: fix comment on proc_do_uuid

There's only one function here now, as uuid_strategy is long gone.
Also make the bit about "If accesses via ..." clearer.

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# dfd38750 29-Nov-2013 Greg Price <price@MIT.EDU>

random: fix typos / spelling errors in comments

Signed-off-by: Greg Price <price@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 4af712e8 10-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org>

random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized

The Tausworthe PRNG is initialized at late_initcall time. At that time the
entropy pool serving get_random_bytes is not filled sufficiently. This
patch adds an additional reseeding step as soon as the nonblocking pool
gets marked as initialized.

On some machines it might be possible that late_initcall gets called after
the pool has been initialized. In this situation we won't reseed again.

(A call to prandom_seed_late blocks later invocations of early reseed
attempts.)

Joint work with Daniel Borkmann.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 392a546d 03-Nov-2013 Theodore Ts'o <tytso@mit.edu>

random: add debugging code to detect early use of get_random_bytes()

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 644008df 03-Nov-2013 Theodore Ts'o <tytso@mit.edu>

random: initialize the last_time field in struct timer_rand_state

Since we initialize jiffies to wrap five minutes before boot (see
INITIAL_JIFFIES defined in include/linux/jiffies.h) it's important to
make sure the last_time field is initialized to INITIAL_JIFFIES.
Otherwise, the entropy estimator will overestimate the amount of
entropy resulting from the first call to add_timer_randomness(),
generally by about 8 bits.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# ae9ecd92 03-Nov-2013 Theodore Ts'o <tytso@mit.edu>

random: don't zap entropy count in rand_initialize()

The rand_initialize() function was being run fairly late in the kernel
boot sequence. This was unfortunate, since it zero'ed the entropy
counters, thus throwing away credit that was accumulated earlier in
the boot sequence, and it also meant that initcall functions run
before rand_initialize were using a minimally initialized pool.

To fix this, fix init_std_data() to no longer zap the entropy counter;
it wasn't necessary, and move rand_initialize() to be an early
initcall.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 301f0595 03-Nov-2013 Theodore Ts'o <tytso@mit.edu>

random: printk notifications for urandom pool initialization

Print a notification to the console when the nonblocking pool is
initialized. Also printk a warning when a process tries reading from
/dev/urandom before it is fully initialized.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 40db23e5 02-Nov-2013 Theodore Ts'o <tytso@mit.edu>

random: make add_timer_randomness() fill the nonblocking pool first

Change add_timer_randomness() so that it directs incoming entropy to
the nonblocking pool first if it hasn't been fully initialized yet.
This matches the strategy we use in add_interrupt_randomness(), which
allows us to push the randomness where we need it the most during when
the system is first booting up, so that get_random_bytes() and
/dev/urandom become safe to use as soon as possible.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# f80bbd8b 02-Oct-2013 Theodore Ts'o <tytso@mit.edu>

random: convert DEBUG_ENT to tracepoints

Instead of using the random driver's ad-hoc DEBUG_ENT() mechanism, use
tracepoints instead. This allows for a much more fine-grained control
of which debugging mechanism which a developer might need, and unifies
the debugging messages with all of the existing tracepoints.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 6265e169 02-Oct-2013 Theodore Ts'o <tytso@mit.edu>

random: push extra entropy to the output pools

As the input pool gets filled, start transfering entropy to the output
pools until they get filled. This allows us to use the output pools
to store more system entropy. Waste not, want not....

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 95b709b6 02-Oct-2013 Theodore Ts'o <tytso@mit.edu>

random: drop trickle mode

The add_timer_randomness() used to drop into trickle mode when entropy
pool was estimated to be 87.5% full. This was important when
add_timer_randomness() was used to sample interrupts. It's not used
for this any more --- add_interrupt_randomness() now uses fast_mix()
instead. By elimitating trickle mode, it allows us to fully utilize
entropy provided by add_input_randomness() and add_disk_randomness()
even when the input pool is above the old trickle threshold of 87.5%.

This helps to answer the criticism in [1] in their hypothetical
scenario where our entropy estimator was inaccurate, even though the
measurements in [2] seem to indicate that our entropy estimator given
real-life entropy collection is actually pretty good, albeit on the
conservative side (which was as it was designed).

[1] http://eprint.iacr.org/2013/338.pdf
[2] http://eprint.iacr.org/2012/251.pdf

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 6e9fa2c8 22-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: adjust the generator polynomials in the mixing function slightly

Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
Videau in their paper, "The Linux Pseudorandom Number Generator
Revisited" (see: http://eprint.iacr.org/2012/251.pdf).

They suggested a slight change to improve our mixing functions
slightly. I also adjusted the comments to better explain what is
going on, and to document why the polynomials were changed.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 655b2264 22-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: speed up the fast_mix function by a factor of four

By mixing the entropy in chunks of 32-bit words instead of byte by
byte, we can speed up the fast_mix function significantly. Since it
is called on every single interrupt, on systems with a very heavy
interrupt load, this can make a noticeable difference.

Also fix a compilation warning in add_interrupt_randomness() and avoid
xor'ing cycles and jiffies together just in case we have an
architecture which tries to define random_get_entropy() by returning
jiffies.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jörn Engel <joern@logfs.org>


# f5c2742c 22-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: cap the rate which the /dev/urandom pool gets reseeded

In order to avoid draining the input pool of its entropy at too high
of a rate, enforce a minimum time interval between reseedings of the
urandom pool. This is set to 60 seconds by default.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# c59974ae 21-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: optimize the entropy_store structure

Use smaller types to slightly shrink the size of the entropy store
structure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 3ef4cb2d 12-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: optimize spinlock use in add_device_randomness()

The add_device_randomness() function calls mix_pool_bytes() twice for
the input pool and the non-blocking pool, for a total of four times.
By using _mix_pool_byte() and taking the spinlock in
add_device_randomness(), we can halve the number of times we need
take each pool's spinlock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 5910895f 12-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: fix the tracepoint for get_random_bytes(_arch)

Fix a problem where get_random_bytes_arch() was calling the tracepoint
get_random_bytes(). So add a new tracepoint for
get_random_bytes_arch(), and make get_random_bytes() and
get_random_bytes_arch() call their correct tracepoint.

Also, add a new tracepoint for add_device_randomness()

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 30e37ec5 10-Sep-2013 H. Peter Anvin <hpa@linux.intel.com>

random: account for entropy loss due to overwrites

When we write entropy into a non-empty pool, we currently don't
account at all for the fact that we will probabilistically overwrite
some of the entropy in that pool. This means that unless the pool is
fully empty, we are currently *guaranteed* to overestimate the amount
of entropy in the pool!

Assuming Shannon entropy with zero correlations we end up with an
exponentally decaying value of new entropy added:

entropy <- entropy + (pool_size - entropy) *
(1 - exp(-add_entropy/pool_size))

However, calculations involving fractional exponentials are not
practical in the kernel, so apply a piecewise linearization:

For add_entropy <= pool_size/2 then

(1 - exp(-add_entropy/pool_size)) >= (add_entropy/pool_size)*0.7869...

... so we can approximate the exponential with
3/4*add_entropy/pool_size and still be on the
safe side by adding at most pool_size/2 at a time.

In order for the loop not to take arbitrary amounts of time if a bad
ioctl is received, terminate if we are within one bit of full. This
way the loop is guaranteed to terminate after no more than
log2(poolsize) iterations, no matter what the input value is. The
vast majority of the time the loop will be executed exactly once.

The piecewise linearization is very conservative, approaching 3/4 of
the usable input value for small inputs, however, our entropy
estimation is pretty weak at best, especially for small values; we
have no handle on correlation; and the Shannon entropy measure (Rényi
entropy of order 1) is not the correct one to use in the first place,
but rather the correct entropy measure is the min-entropy, the Rényi
entropy of infinite order.

As such, this conservatism seems more than justified.

This does introduce fractional bit values. I have left it to have 3
bits of fraction, so that with a pool of 2^12 bits the multiply in
credit_entropy_bits() can still fit into an int, as 2*(3+12) < 31. It
is definitely possible to allow for more fractional accounting, but
that multiply then would have to be turned into a 32*32 -> 64 multiply.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: DJ Johnston <dj.johnston@intel.com>


# a283b5c4 10-Sep-2013 H. Peter Anvin <hpa@zytor.com>

random: allow fractional bits to be tracked

Allow fractional bits of entropy to be tracked by scaling the entropy
counter (fixed point). This will be used in a subsequent patch that
accounts for entropy lost due to overwrites.

[ Modified by tytso to fix up a few missing places where the
entropy_count wasn't properly converted from fractional bits to
bits. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9ed17b70 10-Sep-2013 H. Peter Anvin <hpa@linux.intel.com>

random: statically compute poolbitshift, poolbytes, poolbits

Use a macro to statically compute poolbitshift (will be used in a
subsequent patch), poolbytes, and poolbits. On virtually all
architectures the cost of a memory load with an offset is the same as
the one of a memory load.

It is still possible for this to generate worse code since the C
compiler doesn't know the fixed relationship between these fields, but
that is somewhat unlikely.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 85a1f777 21-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: mix in architectural randomness earlier in extract_buf()

Previously if CPU chip had a built-in random number generator (i.e.,
RDRAND on newer x86 chips), we mixed it in at the very end of
extract_buf() using an XOR operation.

We now mix it in right after the calculate a hash across the entire
pool. This has the advantage that any contribution of entropy from
the CPU's HWRNG will get mixed back into the pool. In addition, it
means that if the HWRNG has any defects (either accidentally or
maliciously introduced), this will be mitigated via the non-linear
transform of the SHA-1 hash function before we hand out generated
output.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 61875f30 21-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: allow architectures to optionally define random_get_entropy()

Allow architectures which have a disabled get_cycles() function to
provide a random_get_entropy() function which provides a fine-grained,
rapidly changing counter that can be used by the /dev/random driver.

For example, an architecture might have a rapidly changing register
used to control random TLB cache eviction, or DRAM refresh that
doesn't meet the requirements of get_cycles(), but which is good
enough for the needs of the random driver.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 47d06e53 10-Sep-2013 Theodore Ts'o <tytso@mit.edu>

random: run random_int_secret_init() run after all late_initcalls

The some platforms (e.g., ARM) initializes their clocks as
late_initcalls for some unknown reason. So make sure
random_int_secret_init() is run after all of the late_initcalls are
run.

Cc: stable@vger.kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 0244ad00 30-Aug-2013 Martin Schwidefsky <schwidefsky@de.ibm.com>

Remove GENERIC_HARDIRQ config option

After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# a151427e 13-Jun-2013 Joe Perches <joe@perches.com>

char: Convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 10b3a32d 24-May-2013 Jiri Kosina <jkosina@suse.cz>

random: fix accounting race condition with lockless irq entropy_count update

Commit 902c098a3663 ("random: use lockless techniques in the interrupt
path") turned IRQ path from being spinlock protected into lockless
cmpxchg-retry update.

That commit removed r->lock serialization between crediting entropy bits
from IRQ context and accounting when extracting entropy on userspace
read path, but didn't turn the r->entropy_count reads/updates in
account() to use cmpxchg as well.

It has been observed, that under certain circumstances this leads to
read() on /dev/urandom to return 0 (EOF), as r->entropy_count gets
corrupted and becomes negative, which in turn results in propagating 0
all the way from account() to the actual read() call.

Convert the accounting code to be the proper lockless counterpart of
what has been partially done by 902c098a3663.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1e7e2e05 24-May-2013 Jarod Wilson <jarod@redhat.com>

drivers/char/random.c: fix priming of last_data

Commit ec8f02da9ea5 ("random: prime last_data value per fips
requirements") added priming of last_data per fips requirements.

Unfortuantely, it did so in a way that can lead to multiple threads all
incrementing nbytes, but only one actually doing anything with the extra
data, which leads to some fun random corruption and panics.

The fix is to simply do everything needed to prime last_data in a single
shot, so there's no window for multiple cpus to increment nbytes -- in
fact, we won't even increment or decrement nbytes anymore, we'll just
extract the needed EXTRACT_SIZE one time per pool and then carry on with
the normal routine.

All these changes have been tested across multiple hosts and
architectures where panics were previously encoutered. The code changes
are are strictly limited to areas only touched when when booted in fips
mode.

This change should also go into 3.8-stable, to make the myriads of fips
users on 3.8.x happy.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stodola <jstodola@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 16c7fa05 30-Apr-2013 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib/string_helpers: introduce generic string_unescape

There are several places in kernel where modules unescapes input to convert
C-Style Escape Sequences into byte codes.

The patch provides generic implementation of such approach. Test cases are
also included into the patch.

[akpm@linux-foundation.org: clarify comment]
[akpm@linux-foundation.org: export get_random_int() to modules]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: William Hubbs <w.d.hubbs@gmail.com>
Cc: Chris Brannon <chris@the-brannons.com>
Cc: Kirk Reiser <kirk@braille.uwo.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1f8e8ed0 08-Apr-2012 Kent Overstreet <koverstreet@google.com>

Export get_random_int()

Needed for bcache - need a cheap source of random numbers for perturbing
IO sizes, for rate limiting IO to the SSD.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: "Theodore Ts'o" <tytso@mit.edu>


# b9809552 04-Mar-2013 Theodore Ts'o <tytso@mit.edu>

random: fix locking dependency with the tasklist_lock

Commit 6133705494bb introduced a circular lock dependency because
posix_cpu_timers_exit() is called by release_task(), which is holding
a writer lock on tasklist_lock, and this can cause a deadlock since
kill_fasync() gets called with nonblocking_pool.lock taken.

There's no reason why kill_fasync() needs to be taken while the random
pool is locked, so move it out to fix this locking dependency.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Russ Dill <Russ.Dill@gmail.com>
Cc: stable@kernel.org


# eece09ec 17-Jul-2011 Thomas Gleixner <tglx@linutronix.de>

locking: Various static lock initializer fixes

The static lock initializers want to be fed the proper name of the
lock and not some random string. In mainline random strings are
obfuscating the readability of debug output, but for RT they prevent
the spinlock substitution. Fix it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# ec8f02da 06-Nov-2012 Jarod Wilson <jarod@redhat.com>

random: prime last_data value per fips requirements

The value stored in last_data must be primed for FIPS 140-2 purposes. Upon
first use, either on system startup or after an RNDCLEARPOOL ioctl, we
need to take an initial random sample, store it internally in last_data,
then pass along the value after that to the requester, so that consistency
checks aren't being run against stale and possibly known data.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
CC: Matt Mackall <mpm@selenic.com>
CC: linux-crypto@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 8eb2ffbf 15-Oct-2012 Jiri Kosina <jkosina@suse.cz>

random: fix debug format strings

Fix the following warnings in formatting debug output:

drivers/char/random.c: In function ‘xfer_secondary_pool’:
drivers/char/random.c:827: warning: format ‘%d’ expects type ‘int’, but argument 7 has type ‘size_t’
drivers/char/random.c: In function ‘account’:
drivers/char/random.c:859: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’
drivers/char/random.c:881: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’
drivers/char/random.c: In function ‘random_read’:
drivers/char/random.c:1141: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘ssize_t’
drivers/char/random.c:1145: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘ssize_t’
drivers/char/random.c:1145: warning: format ‘%d’ expects type ‘int’, but argument 6 has type ‘long unsigned int’

by using '%zd' instead of '%d' to properly denote ssize_t/size_t conversion.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# be5b779a 15-Oct-2012 Jiri Kosina <jkosina@suse.cz>

random: make it possible to enable debugging without rebuild

The module parameter that turns debugging mode (which basically means
printing a few extra lines during runtime) is in '#if 0' block. Forcing
everyone who would like to see how entropy is behaving on his system to
rebuild seems to be a little bit too harsh.

If we were concerned about speed, we could potentially turn 'debug' into a
static key, but I don't think it's necessary.

Drop the '#if 0' block to allow using the 'debug' parameter without rebuilding.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# d2e7c96a 27-Jul-2012 H. Peter Anvin <hpa@linux.intel.com>

random: mix in architectural randomness in extract_buf()

Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf(). This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.

[ Commit description modified by tytso to remove an extended
advertisement for the RDRAND instruction. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org


# cbc96b75 23-Jul-2012 Tony Luck <tony.luck@intel.com>

random: Add comment to random_initialize()

Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.

However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().

Add a big fat comment to rand_initialize() spelling out
this requirement.

Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# c5857ccf 14-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: remove rand_initialize_irq()

With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.

[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>


# 00ce1db1 04-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: add tracepoints for easier debugging and verification

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# c2557a30 05-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: add new get_random_bytes_arch() function

Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present. Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.

The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door. (For example, an
increasing counter encrypted by an AES key known to the NSA.)

It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
--- especially since Bull Mountain is documented to use AES as a
whitener. Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel. Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.

Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.

This way we get almost of the benefits of the HW RNG without any
potential liabilities. The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.

For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# e6d4947b 05-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: use the arch-specific rng in xfer_secondary_pool

If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.

Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# a2080a67 04-Jul-2012 Linus Torvalds <torvalds@linux-foundation.org>

random: create add_device_randomness() interface

Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot). This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).

[ Modified by tytso to mix in a timestamp, since there may be some
variability caused by the time needed to detect/configure the hardware
in question. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 902c098a 04-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: use lockless techniques in the interrupt path

The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 775f4b29 02-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: make 'add_interrupt_randomness()' do something sane

We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.

This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool. Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool. This
assures that /dev/urandom is returning unpredictable data as soon as
possible.

(Based on an original patch by Linus, but significantly modified by
tytso.)

Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org


# 74feec5d 06-Jul-2012 Theodore Ts'o <tytso@mit.edu>

random: fix up sparse warnings

Add extern and static declarations to suppress sparse warnings

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 44e4360f 12-Apr-2012 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

drivers/char/random.c: fix boot id uniqueness race

/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes. If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads. Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.

Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2dac8e54 16-Jan-2012 H. Peter Anvin <hpa@linux.intel.com>

random: Adjust the number of loops when initializing

When we are initializing using arch_get_random_long() we only need to
loop enough times to touch all the bytes in the buffer; using
poolwords for that does twice the number of operations necessary on a
64-bit machine, since in the random number generator code "word" means
32 bits.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu


# 3e88bdff 22-Dec-2011 Theodore Ts'o <tytso@mit.edu>

random: Use arch-specific RNG to initialize the entropy store

If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores. Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 90ab5ee9 12-Jan-2012 Rusty Russell <rusty@rustcorp.com.au>

module_param: make bool parameters really bool (drivers & misc)

module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# cf833d0b 22-Dec-2011 Linus Torvalds <torvalds@linux-foundation.org>

random: Use arch_get_random_int instead of cycle counter if avail

We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.

Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".

And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.

So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.

Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# bd29e568 16-Nov-2011 Tony Luck <tony.luck@intel.com>

fix typo/thinko in get_random_bytes()

If there is an architecture-specific random number generator we use it
to acquire randomness one "long" at a time. We should put these random
words into consecutive words in the result buffer - not just overwrite
the first word again and again.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0d2f096b 16-Nov-2011 Tony Luck <tony.luck@intel.com>

random: Fix handing of arch_get_random_long in get_random_bytes()

If there is an architecture-specific random number generator we use
it to acquire randomness one "long" at a time. We should put these
random words into consecutive words in the result buffer - not just
overwrite the first word again and again.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4ec4061010261a4cb0@agluck-desktop.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6e5714ea 03-Aug-2011 David S. Miller <davem@davemloft.net>

net: Compute protocol sequence numbers and fragment IDs using MD5.

Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63d77173 31-Jul-2011 H. Peter Anvin <hpa@zytor.com>

random: Add support for architectural random hooks

Add support for architecture-specific hooks into the kernel-directed
random number generator interfaces. This patchset does not use the
architecture random number generator interfaces for the
userspace-directed interfaces (/dev/random and /dev/urandom), thus
eliminating the need to distinguish between them based on a pool
pointer.

Changes in version 3:
- Moved the hooks from extract_entropy() to get_random_bytes().
- Changes the hooks to inlines.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>


# 87c48fa3 21-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com>

ipv6: make fragment identifications less predictable

IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont <fernando@gont.com.ar>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 442a4fff 21-Feb-2011 Jarod Wilson <jarod@redhat.com>

random: update interface comments to reflect reality

At present, the comment header in random.c makes no mention of
add_disk_randomness, and instead, suggests that disk activity adds to the
random pool by way of add_interrupt_randomness, which appears to not have
been the case since sometime prior to the existence of git, and even prior
to bitkeeper. Didn't look any further back. At least, as far as I can
tell, there are no storage drivers setting IRQF_SAMPLE_RANDOM, which is a
requirement for add_interrupt_randomness to trigger, so the only way for a
disk to contribute entropy is by way of add_disk_randomness. Update
comments accordingly, complete with special mention about solid state
drives being a crappy source of entropy (see e2e1a148bc for reference).

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# b29c617a 06-Dec-2010 Christoph Lameter <cl@linux.com>

random: Use this_cpu_inc_return

__this_cpu_inc can create a single instruction to do the same as
__get_cpu_var()++.

Cc: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Matt Mackall <mpm@selenic.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 6038f373 15-Aug-2010 Arnd Bergmann <arnd@arndb.de>

llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>


# 4015d9a8 31-Jul-2010 Richard Kennedy <richard@rsk.demon.co.uk>

random: Reorder struct entropy_store to remove padding on 64bits

Re-order structure entropy_store to remove 8 bytes of padding on
64 bit builds, so shrinking this structure from 72 to 64 bytes
and allowing it to fit into one cache line.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# e954bc91 20-May-2010 Matt Mackall <mpm@selenic.com>

random: simplify fips mode

Rather than dynamically allocate 10 bytes, move it to static allocation.
This saves space and avoids the need for error checking.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# c41b20e7 11-Dec-2009 Adam Buchbinder <adam.buchbinder@gmail.com>

Fix misspellings of "truly" in comments.

Some comments misspell "truly"; this fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# cd1510cb 01-Feb-2010 Herbert Xu <herbert@gondor.apana.org.au>

random: Remove unused inode variable

The previous changeset left behind an unused inode variable.
This patch removes it.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# a996996d 29-Jan-2010 Matt Mackall <mpm@selenic.com>

random: drop weird m_time/a_time manipulation

No other driver does anything remotely like this that I know of except
for the tty drivers, and I can't see any reason for random/urandom to do
it. In fact, it's a (trivial, harmless) timing information leak. And
obviously, it generates power- and flash-cycle wasting I/O, especially
if combined with something like hwrngd. Also, it breaks ubifs's
expectations.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 35900771 14-Dec-2009 Joe Perches <joe@perches.com>

random.c: use %pU to print UUIDs

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6d456111 16-Nov-2009 Eric W. Biederman <ebiederm@xmission.com>

sysctl: Drop & in front of every proc_handler.

For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 894d2491 05-Nov-2009 Eric W. Biederman <ebiederm@xmission.com>

sysctl drivers: Remove dead binary sysctl support

Now that sys_sysctl is a wrapper around /proc/sys all of
the binary sysctl support elsewhere in the tree is
dead code.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Brown <neilb@suse.de>
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Acked-by: Clemens Ladisch <clemens@ladisch.de> for drivers/char/hpet.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 8d65af78 23-Sep-2009 Alexey Dobriyan <adobriyan@gmail.com>

sysctl: remove "struct file *" argument of ->proc_handler

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5b739ef8 18-Jun-2009 Neil Horman <nhorman@tuxdriver.com>

random: Add optional continuous repetition test to entropy store based rngs

FIPS-140 requires that all random number generators implement continuous self
tests in which each extracted block of data is compared against the last block
for repetition. The ansi_cprng implements such a test, but it would be nice if
the hw rng's did the same thing. Obviously its not something thats always
needed, but it seems like it would be a nice feature to have on occasion. I've
written the below patch which allows individual entropy stores to be flagged as
desiring a continuous test to be run on them as is extracted. By default this
option is off, but is enabled in the event that fips mode is selected during
bootup.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 26a9a418 19-May-2009 Linus Torvalds <torvalds@linux-foundation.org>

Avoid ICE in get_random_int() with gcc-3.4.5

Martin Knoblauch reports that trying to build 2.6.30-rc6-git3 with
RHEL4.3 userspace (gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)) causes an
internal compiler error (ICE):

drivers/char/random.c: In function `get_random_int':
drivers/char/random.c:1672: error: unrecognizable insn:
(insn 202 148 150 0 /scratch/build/linux-2.6.30-rc6-git3/arch/x86/include/asm/tsc.h:23 (set (reg:SI 0 ax [91])
(subreg:SI (plus:DI (plus:DI (reg:DI 0 ax [88])
(subreg:DI (reg:SI 6 bp) 0))
(const_int -4 [0xfffffffffffffffc])) 0)) -1 (nil)
(nil))
drivers/char/random.c:1672: internal compiler error: in extract_insn, at recog.c:2083

and after some debugging it turns out that it's due to the code trying
to figure out the rough value of the current stack pointer by taking an
address of an uninitialized variable and casting that to an integer.

This is clearly a compiler bug, but it's not worth fighting - while the
current stack kernel pointer might be somewhat hard to predict in user
space, it's also not generally going to change for a lot of the call
chains for a particular process.

So just drop it, and mumble some incoherent curses at the compiler.

Tested-by: Martin Knoblauch <spamtrap@knobisoft.de>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8a0a9bd4 05-May-2009 Linus Torvalds <torvalds@linux-foundation.org>

random: make get_random_int() more random

It's a really simple patch that basically just open-codes the current
"secure_ip_id()" call, but when open-coding it we now use a _static_
hashing area, so that it gets updated every time.

And to make sure somebody can't just start from the same original seed of
all-zeroes, and then do the "half_md4_transform()" over and over until
they get the same sequence as the kernel has, each iteration also mixes in
the same old "current->pid + jiffies" we used - so we should now have a
regular strong pseudo-number generator, but we also have one that doesn't
have a single seed.

Note: the "pid + jiffies" is just meant to be a tiny tiny bit of noise. It
has no real meaning. It could be anything. I just picked the previous
seed, it's just that now we keep the state in between calls and that will
feed into the next result, and that should make all the difference.

I made that hash be a per-cpu data just to avoid cache-line ping-pong:
having multiple CPU's write to the same data would be fine for randomness,
and add yet another layer of chaos to it, but since get_random_int() is
supposed to be a fast interface I did it that way instead. I considered
using "__raw_get_cpu_var()" to avoid any preemption overhead while still
getting the hash be _mostly_ ping-pong free, but in the end good taste won
out.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 417b43d4 02-Apr-2009 Anton Blanchard <anton@samba.org>

random: align rekey_work's timer

Align rekey_work. Even though it's infrequent, we may as well line it up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d178a1eb 11-Jan-2009 Yinghai Lu <yinghai@kernel.org>

sparseirq: fix build with unknown irq_desc struct

Ingo Molnar wrote:
>
> tip/kernel/fork.c: In function 'copy_signal':
> tip/kernel/fork.c:825: warning: unused variable 'ret'
> tip/drivers/char/random.c: In function 'get_timer_rand_state':
> tip/drivers/char/random.c:584: error: dereferencing pointer to incomplete type
> tip/drivers/char/random.c: In function 'set_timer_rand_state':
> tip/drivers/char/random.c:594: error: dereferencing pointer to incomplete type
> make[3]: *** [drivers/char/random.o] Error 1

irq_desc is defined in linux/irq.h, so include it in the genirq case.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d7e51e66 07-Jan-2009 Yinghai Lu <yinghai@kernel.org>

sparseirq: make some func to be used with genirq

Impact: clean up sparseirq fallout on random.c

Ingo suggested to change some ifdef from SPARSE_IRQ to GENERIC_HARDIRQS
so we could some #ifdef later if all arch support genirq

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# cda796a3 06-Jan-2009 Matt Mackall <mpm@selenic.com>

random: don't try to look at entropy_count outside the lock

As a non-atomic value, it's only safe to look at entropy_count when the
pool lock is held, so we move the BUG_ON inside the lock for correctness.

Also remove the spurious comment. It's ok for entropy_count to
temporarily exceed POOLBITS so long as it's left in a consistent state
when the lock is released.

This is a more correct, simple, and idiomatic fix for the bug in
8b76f46a2db. I've left the reorderings introduced by that patch in place
as they're harmless, even though they don't properly deal with potential
atomicity issues.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2f983570 03-Jan-2009 Yinghai Lu <yinghai@kernel.org>

sparseirq: move set/get_timer_rand_state back to .c

those two functions only used in that C file

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0b8f1efa 05-Dec-2008 Yinghai Lu <yinghai@kernel.org>

sparse irq_desc[] array: core kernel and x86 changes

Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 233e70f4 31-Oct-2008 Al Viro <viro@ZenIV.linux.org.uk>

saner FASYNC handling on file close

As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f221e726 15-Oct-2008 Alexey Dobriyan <adobriyan@gmail.com>

sysctl: simplify ->strategy

name and nlen parameters passed to ->strategy hook are unused, remove
them. In general ->strategy hook should know what it's doing, and don't
do something tricky for which, say, pointer to original userspace array
may be needed (name).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ]
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d6c88a50 15-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

genirq: revert dynarray

Revert the dynarray changes. They need more thought and polishing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2cc21ef8 15-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

genirq: remove sparse irq code

This code is not ready, but we need to rip it out instead of rebasing
as we would lose the APIC/IO_APIC unification otherwise.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3060d6fe 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: put timer_rand_state pointer into irq_desc

irq_timer_state[] is a NR_IRQS sized array that is a side-by array to
the real irq_desc[] array.

Integrate that field into the (now dynamic) irq_desc dynamic array and
save some RAM.

v2: keep the old way to support arch not support irq_desc

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# eef1de76 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

irqs: make irq_timer_state to use dyn_array

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 1f45f562 19-Aug-2008 Yinghai Lu <yhlu.kernel@gmail.com>

drivers/char: use nr_irqs

convert them to nr_irqs.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f331c029 03-Sep-2008 Tejun Heo <tj@kernel.org>

block: don't depend on consecutive minor space

* Implement disk_devt() and part_devt() and use them to directly
access devt instead of computing it from ->major and ->first_minor.

Note that all references to ->major and ->first_minor outside of
block layer is used to determine devt of the disk (the part0) and as
->major and ->first_minor will continue to represent devt for the
disk, converting these users aren't strictly necessary. However,
convert them for consistency.

* Implement disk_max_parts() to avoid directly deferencing
genhd->minors.

* Update bdget_disk() such that it doesn't assume consecutive minor
space.

* Move devt computation from register_disk() to add_disk() and make it
the only one (all other usages use the initially determined value).

These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 8b76f46a 02-Sep-2008 Andrew Morton <akpm@linux-foundation.org>

drivers/char/random.c: fix a race which can lead to a bogus BUG()

Fix a bug reported by and diagnosed by Aaron Straus.

This is a regression intruduced into 2.6.26 by

commit adc782dae6c4c0f6fb679a48a544cfbcd79ae3dc
Author: Matt Mackall <mpm@selenic.com>
Date: Tue Apr 29 01:03:07 2008 -0700

random: simplify and rename credit_entropy_store

credit_entropy_bits() does:

spin_lock_irqsave(&r->lock, flags);
...
if (r->entropy_count > r->poolinfo->POOLBITS)
r->entropy_count = r->poolinfo->POOLBITS;

so there is a time window in which this BUG_ON():

static size_t account(struct entropy_store *r, size_t nbytes, int min,
int reserved)
{
unsigned long flags;

BUG_ON(r->entropy_count > r->poolinfo->POOLBITS);

/* Hold lock while accounting */
spin_lock_irqsave(&r->lock, flags);

can trigger.

We could fix this by moving the assertion inside the lock, but it seems
safer and saner to revert to the old behaviour wherein
entropy_store.entropy_count at no time exceeds
entropy_store.poolinfo->POOLBITS.

Reported-by: Aaron Straus <aaron@merfinllc.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@kernel.org> [2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9f593653 18-Aug-2008 Stephen Hemminger <shemminger@vyatta.com>

nf_nat: use secure_ipv4_port_ephemeral() for NAT port randomization

Use incoming network tuple as seed for NAT port randomization.
This avoids concerns of leaking net_random() bits, and also gives better
port distribution. Don't have NAT server, compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

[ added missing EXPORT_SYMBOL_GPL ]

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27ac792c 23-Jul-2008 Andrea Righi <righi.andrea@gmail.com>

PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures

On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:

u64 val = PAGE_ALIGN(size);

always returns a value < 4GB even if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9a6f70bb 29-Apr-2008 Jeff Dike <jdike@addtoit.com>

random: add async notification support to /dev/random

Add async notification support to /dev/random.

A little test case is below. Without this patch, you get:

$ ./async-random
Drained the pool
Found more randomness

With it, you get:

$ ./async-random
Drained the pool
SIGIO
Found more randomness

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <errno.h>
#include <fcntl.h>

static void handler(int sig)
{
printf("SIGIO\n");
}

int main(int argc, char **argv)
{
int fd, n, err, flags;

if(signal(SIGIO, handler) < 0){
perror("setting SIGIO handler");
exit(1);
}

fd = open("/dev/random", O_RDONLY);
if(fd < 0){
perror("open");
exit(1);
}

flags = fcntl(fd, F_GETFL);
if (flags < 0){
perror("getting flags");
exit(1);
}

flags |= O_NONBLOCK;
if (fcntl(fd, F_SETFL, flags) < 0){
perror("setting flags");
exit(1);
}

while((err = read(fd, &n, sizeof(n))) > 0) ;

if(err == 0){
printf("random returned 0\n");
exit(1);
}
else if(errno != EAGAIN){
perror("read");
exit(1);
}

flags |= O_ASYNC;
if (fcntl(fd, F_SETFL, flags) < 0){
perror("setting flags");
exit(1);
}

if (fcntl(fd, F_SETOWN, getpid()) < 0) {
perror("Setting SIGIO");
exit(1);
}

printf("Drained the pool\n");
read(fd, &n, sizeof(n));
printf("Found more randomness\n");

return(0);
}

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# adc782da 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: simplify and rename credit_entropy_store

- emphasize bits in the name
- make zero bits lock-free
- simplify logic

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e68e5b66 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: make mixing interface byte-oriented

Switch add_entropy_words to a byte-oriented interface, eliminating numerous
casts and byte/word size rounding issues. This also reduces the overall
bit/byte/word confusion in this code.

We now mix a byte at a time into the word-based pool. This takes four times
as many iterations, but should be negligible compared to hashing overhead.
This also increases our pool churn, which adds some depth against some
theoretical failure modes.

The function name is changed to emphasize pool mixing and deemphasize entropy
(the samples mixed in may not contain any). extract is added to the core
function to make it clear that it extracts from the pool.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 993ba211 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: simplify add_ptr logic

The add_ptr variable wasn't used in a sensible way, use only i instead.
i got reused later for a different purpose, use j instead.

While we're here, put tap0 first in the tap list and add a comment.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6d38b827 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: remove some prefetch logic

The urandom output pool (ie the fast path) fits in one cacheline, so
this is pretty unnecessary. Further, the output path has already
fetched the entire pool to hash it before calling in here.

(This was the only user of prefetch_range in the kernel, and it passed
in words rather than bytes!)

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# feee7697 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: eliminate redundant new_rotate variable

- eliminate new_rotate
- move input_rotate masking
- simplify input_rotate update
- move input_rotate update to end of inner loop for readability

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 43358209 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: remove cacheline alignment for locks

Earlier changes greatly reduce the number of times we grab the lock
per output byte, so we shouldn't need this particular hack any more.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1c0ad3d4 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: make backtracking attacks harder

At each extraction, we change (poolbits / 16) + 32 bits in the pool,
or 96 bits in the case of the secondary pools. Thus, a brute-force
backtracking attack on the pool state is less difficult than breaking
the hash. In certain cases, this difficulty may be is reduced to 2^64
iterations.

Instead, hash the entire pool in one go, then feedback the whole hash
(160 bits) in one go. This will make backtracking at least as hard as
inverting the hash.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ffd8d3fa 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: improve variable naming, clear extract buffer

- split the SHA variables apart into hash and workspace
- rename data to extract
- wipe extract and workspace after hashing

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 53c3f63e 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: reuse rand_initialize

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 43ae4860 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: use unlocked_ioctl

No locking actually needed.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 88c730da 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: consolidate wakeup logic

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 90b75ee5 29-Apr-2008 Matt Mackall <mpm@selenic.com>

random: clean up checkpatch complaints

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 91f3f1e3 06-Feb-2008 Matt Mackall <mpm@selenic.com>

drivers/char/random.c:write_pool() cond_resched() needed

Reduce latency for large writes to /dev/[u]random

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Sami Farin <safari-kernel@safari.iki.fi>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 640e248e 30-Jan-2008 Adrian Bunk <bunk@kernel.org>

unexport add_disk_randomness

This patch removes the no longer used EXPORT_SYMBOL(add_disk_randomness).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 6dd10a62 13-Nov-2007 Eric Dumazet <dada1@cosmosbay.com>

[NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_SCALAR

All 32 bits machines but i386 dont have CONFIG_KTIME_SCALAR. On these
machines, ktime.tv64 is more than 4 times the (correct) result given
by ktime_to_ns()

Again on these machines, using ktime_get_real().tv64 >> 6 give a
32bits rollover every 64 seconds, which is not wanted (less than the
120 s MSL)

Using ktime_to_ns() is the portable way to get nsecs from a ktime, and
have correct code.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c80544dc 18-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org>

sparse pointer use of zero as null

Get rid of sparse related warnings from places that use integer as NULL
pointer.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b42c336 01-Oct-2007 Eric Dumazet <dada1@cosmosbay.com>

[TCP]: secure_tcp_sequence_number() should not use a too fast clock

TCP V4 sequence numbers are 32bits, and RFC 793 assumed a 250 KHz clock.
In order to follow network speed increase, we can use a faster clock, but
we should limit this clock so that the delay between two rollovers is
greater than MSL (TCP Maximum Segment Lifetime : 2 minutes)

Choosing a 64 nsec clock should be OK, since the rollovers occur every
274 seconds.

Problem spotted by Denys Fedoryshchenko

[ This bug was introduced by f85958151900f9d30fa5ff941b0ce71eaa45a7de ]

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a021e9f 19-Jul-2007 Matt Mackall <mpm@selenic.com>

random: fix bound check ordering (CVE-2007-3105)

If root raised the default wakeup threshold over the size of the
output pool, the pool transfer function could overflow the stack with
RNG bytes, causing a DoS or potential privilege escalation.

(Bug reported by the PaX Team <pageexec@freemail.hu>)

Cc: Theodore Tso <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 679ce0ac 16-Jun-2007 Matt Mackall <mpm@selenic.com>

random: fix output buffer folding

(As reported by linux@horizon.com)

Folding is done to minimize the theoretical possibility of systematic
weakness in the particular bits of the SHA1 hash output. The result of
this bug is that 16 out of 80 bits are un-folded. Without a major new
vulnerability being found in SHA1, this is harmless, but still worth
fixing.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7f397dcd 29-May-2007 Matt Mackall <mpm@selenic.com>

random: fix seeding with zero entropy

Add data from zero-entropy random_writes directly to output pools to
avoid accounting difficulties on machines without entropy sources.

Tested on lguest with all entropy sources disabled.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 602b6aee 29-May-2007 Matt Mackall <mpm@selenic.com>

random: fix error in entropy extraction

Fix cast error in entropy extraction.
Add comments explaining the magic 16.
Remove extra confusing loop variable.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f8595815 28-Mar-2007 Eric Dumazet <dada1@cosmosbay.com>

[NET]: random functions can use nsec resolution instead of usec

In order to get more randomness for secure_tcpv6_sequence_number(),
secure_tcp_sequence_number(), secure_dccp_sequence_number() functions,
we can use the high resolution time services, providing nanosec
resolution.

I've also done two kmalloc()/kzalloc() conversions.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb69cc52 07-Mar-2007 Adrian Bunk <bunk@stusta.de>

[TCP/DCCP/RANDOM]: Remove unused exports.

This patch removes the following not or no longer used exports:
- drivers/char/random.c: secure_tcp_sequence_number
- net/dccp/options.c: sysctl_dccp_feat_sequence_window
- net/netlink/af_netlink.c: netlink_set_err

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b8693c0 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] mark struct file_operations const 3

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1f29bcd7 10-Dec-2006 Alexey Dobriyan <adobriyan@gmail.com>

[PATCH] sysctl: remove unused "context" param

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a7113a96 08-Dec-2006 Josef Sipek <jsipek@fsl.cs.sunysb.edu>

[PATCH] struct path: convert char-drivers

Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b09b845c 14-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[RANDOM]: Annotate random.h IP helpers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65f27f38 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>


# 52bad64d 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Separate delayable and non-delayable events.

Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 80fc9f53 10-Oct-2006 Dmitry Torokhov <dtor@insightbb.com>

Input: add missing exports to fix modular build

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>


# e9ff3990 02-Oct-2006 Serge E. Hallyn <serue@us.ibm.com>

[PATCH] namespaces: utsname: switch to using uts namespaces

Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9361401e 30-Sep-2006 David Howells <dhowells@redhat.com>

[PATCH] BLOCK: Make it possible to disable the block layer [try #6]

Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.

(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:

(*) Block I/O tracing.

(*) Disk partition code.

(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.

(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.

(*) MTD blockdev handling and FTL.

(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.

(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.

(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.

(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.

(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:

(*) Default blockdev file operations (to give error ENODEV on opening).

(*) Makes some /proc changes:

(*) /proc/devices does not list any blockdevs.

(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.

(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).

(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e4d91918 03-Jul-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] lockdep: locking init debugging improvement

Locking init improvement:

- introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
to pass in the name string of locks, used by debugging

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 30aaa154 09-Apr-2006 Adrian Bunk <bunk@stusta.de>

[IPV6]: Unexport secure_ipv6_port_ephemeral

This patch removes the unused EXPORT_SYMBOL(secure_ipv6_port_ephemeral).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d251575a 11-Jan-2006 Stephen Hemminger <shemminger@osdl.org>

[PATCH] random: get rid of sparse warning

Get rid of bogus extern attribute that causes sparse warning.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d8313f5c 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[INET6]: Generalise tcp_v6_hash_connect

Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7f5e7f1 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[INET]: Generalise tcp_v4_hash_connect

Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4365c92 09-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[RANDOM]: Introduce secure_dccp_sequence_number

Code contributed by Stephen Hemminger.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c036527 07-Jul-2005 Christoph Lameter <christoph@lameter.com>

[PATCH] mostly_read data section

Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.

If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated. In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.

The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9e95ce27 16-Apr-2005 Matt Mackall <mpm@selenic.com>

[PATCH] update maintainer for /dev/random

Ted has agreed to let me take over as maintainer of /dev/random and
friends. I've gone ahead and added a line to his entry in CREDITS.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!