History log of /linux-master/lib/idr.c
Revision Date Author Comments
# af73483f 21-Dec-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

ida: Fix crash in ida_free when the bitmap is empty

The IDA usually detects double-frees, but that detection failed to
consider the case when there are no nearby IDs allocated and so we have a
NULL bitmap rather than simply having a clear bit. Add some tests to the
test-suite to be sure we don't inadvertently reintroduce this problem.
Unfortunately they're quite noisy so include a message to disregard
the warnings.

Reported-by: Zhenghan Wang <wzhmmmmm@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2a15de80 26-Aug-2023 Ariel Marcovitch <arielmarcovitch@gmail.com>

idr: fix param name in idr_alloc_cyclic() doc

The relevant parameter is 'start' and not 'nextid'

Fixes: 460488c58ca8 ("idr: Remove idr_alloc_ext")
Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


# fc82bbf4 10-Jul-2022 Linus Torvalds <torvalds@linux-foundation.org>

ida: don't use BUG_ON() for debugging

This is another old BUG_ON() that just shouldn't exist (see also commit
a382f8fee42c: "signal handling: don't use BUG_ON() for debugging").

In fact, as Matthew Wilcox points out, this condition shouldn't really
even result in a warning, since a negative id allocation result is just
a normal allocation failure:

"I wonder if we should even warn here -- sure, the caller is trying to
free something that wasn't allocated, but we don't warn for
kfree(NULL)"

and goes on to point out how that current error check is only causing
people to unnecessarily do their own index range checking before freeing
it.

This was noted by Itay Iellin, because the bluetooth HCI socket cookie
code does *not* do that range checking, and ends up just freeing the
error case too, triggering the BUG_ON().

The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
splat, but there really is no reason to BUG_ON() here, and we have
generally striven for allocation models where it's always ok to just do

free(alloc());

even if the allocation were to fail for some random reason (usually
obviously that "random" reason being some resource limit).

Fixes: 88eca0207cf1 ("ida: simplified functions for id allocation")
Reported-by: Itay Iellin <ieitayie@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3b674261 15-Oct-2020 Stephen Boyd <swboyd@chromium.org>

lib/idr.c: document calling context for IDA APIs mustn't use locks

The documentation for these functions indicates that callers don't need to
hold a lock while calling them, but that documentation is only in one
place under "IDA Usage". Let's state the same information on each IDA
function so that it's clear what the calling context requires.
Furthermore, let's document ida_simple_get() with the same information so
that callers know how this API works.

Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tri Vo <trong@android.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20200910055246.2297797-1-swboyd@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a219b856 02-Apr-2020 Matthew Wilcox (Oracle) <willy@infradead.org>

ida: Free allocated bitmap in error path

If a bitmap needs to be allocated, and then by the time the thread
is scheduled to be run again all the indices which would satisfy the
allocation have been allocated then we would leak the allocation. Almost
impossible to hit in practice, but a trivial fix. Found by Coverity.

Fixes: f32f004cddf8 ("ida: Convert to XArray")
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


# 5a74ac4c 01-Nov-2019 Matthew Wilcox (Oracle) <willy@infradead.org>

idr: Fix idr_get_next_ul race with idr_remove

Commit 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove")
neglected to fix idr_get_next_ul(). As far as I can tell, nobody's
actually using this interface under the RCU read lock, but fix it now
before anybody decides to use it.

Fixes: 5c089fd0c734 ("idr: Fix idr_get_next race with idr_remove")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


# 5c089fd0 14-May-2019 Matthew Wilcox (Oracle) <willy@infradead.org>

idr: Fix idr_get_next race with idr_remove

If the entry is deleted from the IDR between the call to
radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next()
will return NULL, which will end the iteration prematurely. We should
instead continue to the next entry in the IDR. This only happens if the
iteration is protected by the RCU lock. Most IDR users use a spinlock
or semaphore to exclude simultaneous modifications. It was noticed once
the PID allocator was converted to use the IDR, as it uses the RCU lock,
but there may be other users elsewhere in the kernel.

We can't use the normal pattern of calling radix_tree_deref_retry()
(which catches both a retry entry in a leaf node and a node entry in
the root) as the IDR supports storing entries which are unaligned,
which will trigger an infinite loop if they are encountered. Instead,
we have to explicitly check whether the entry is a retry entry.

Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: Brendan Gregg <bgregg@netflix.com>
Tested-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>


# 457c8996 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Add SPDX license identifier for missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 1cf56f9d 09-Apr-2018 Matthew Wilcox <willy@infradead.org>

radix tree: Remove radix_tree_update_node_t

The only user of this functionality was the workingset code, and it's
now been converted to the XArray. Remove __radix_tree_delete_node()
entirely as it was also only used by the workingset code.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# f32f004c 04-Jul-2018 Matthew Wilcox <willy@infradead.org>

ida: Convert to XArray

Use the XA_TRACK_FREE ability to track which entries have a free bit,
similarly to how it uses the radix tree's IDR_FREE tag. This eliminates
the per-cpu ida_bitmap preload, and fixes the memory consumption
regression I introduced when making the IDR able to store any pointer.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# f8d5d0cc 07-Nov-2017 Matthew Wilcox <willy@infradead.org>

xarray: Add definition of struct xarray

This is a direct replacement for struct radix_tree_root. Some of the
struct members have changed name; convert those, and use a #define so
that radix_tree users continue to work without change.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>


# 3159f943 03-Nov-2017 Matthew Wilcox <willy@infradead.org>

xarray: Replace exceptional entries

Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>


# 66ee620f 25-Jun-2018 Matthew Wilcox <willy@infradead.org>

idr: Permit any valid kernel pointer to be stored

An upcoming change to the encoding of internal entries will set the bottom
two bits to 0b10. Unfortunately, m68k only aligns some data structures
to 2 bytes, so the IDR will interpret them as internal entries and things
will go badly wrong.

Change the radix tree so that it stops either when the node indicates
that it's the bottom of the tree (shift == 0) or when the entry is not an
internal entry. This means we cannot insert an arbitrary kernel pointer
as a multiorder entry, but the IDR does not permit multiorder entries.

Annoyingly, this means the IDR can no longer take advantage of the radix
tree's ability to store a single entry at offset 0 without allocating
memory. A pointer which is 2-byte aligned cannot be stored directly in
the root as it would be indistinguishable from a node, so we must allocate
a node in order to store a 2-byte pointer at index 0. The idr_replace()
function does not take a GFP flags argument, so cannot allocate memory.
If a user inserts a 4-byte aligned pointer at index 0 and then replaces
it with a 2-byte aligned pointer, we must be able to store it.

Arbitrary pointer values are still not permitted; pointers of the
form 2 + (i * 4) for values of i between 0 and 1023 are reserved for
the implementation. These are not valid kernel pointers as they would
point into the zero page.

This change does cause a runtime memory consumption regression for
the IDA. I will recover that later.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>


# 1df89519 18-Jun-2018 Matthew Wilcox <willy@infradead.org>

ida: Change ida_get_new_above to return the id

This calling convention makes more sense for the implementation as well
as the callers. It even shaves 32 bytes off the compiled code size.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# b03f8e43 18-Jun-2018 Matthew Wilcox <willy@infradead.org>

ida: Remove old API

Delete ida_pre_get(), ida_get_new(), ida_get_new_above() and ida_remove()
from the public API. Some of these functions still exist as internal
helpers, but they should not be called by consumers.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# 5ade60dd 20-Mar-2018 Matthew Wilcox <willy@infradead.org>

ida: Add new API

Add ida_alloc(), ida_alloc_min(), ida_alloc_max(), ida_alloc_range()
and ida_free(). The ida_alloc_max() and ida_alloc_range() functions
differ from ida_simple_get() in that they take an inclusive 'max'
parameter instead of an exclusive 'end' parameter. Callers are about
evenly split whether they'd like inclusive or exclusive parameters and
'max' is easier to document than 'end'.

Change the IDA allocation to first attempt to allocate a bit using
existing memory, and only allocate memory afterwards. Also change the
behaviour of 'min' > INT_MAX from being a BUG() to returning -ENOSPC.

Leave compatibility wrappers in place for ida_simple_get() and
ida_simple_remove() to avoid changing all callers.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# 50d97d50 21-Jun-2018 Matthew Wilcox <willy@infradead.org>

ida: Lock the IDA in ida_destroy

The user has no need to handle locking between ida_simple_get() and
ida_simple_remove(). They shouldn't be forced to think about whether
ida_destroy() might be called at the same time as any of their other
IDA manipulation calls. Improve the documnetation while I'm in here.

Signed-off-by: Matthew Wilcox <willy@infradead.org>


# b94078e6 07-Jun-2018 Matthew Wilcox <willy@infradead.org>

lib/idr.c: remove simple_ida_lock

Improve the scalability of the IDA by using the per-IDA xa_lock rather
than the global simple_ida_lock. IDAs are not typically used in
performance-sensitive locations, but since we have this lock anyway, we
can use it. It is also a step towards converting the IDA from the radix
tree to the XArray.

[akpm@linux-foundation.org: idr.c needs xarray.h]
Link: http://lkml.kernel.org/r/20180331125332.GF13332@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4b0ad076 26-Feb-2018 Matthew Wilcox <willy@infradead.org>

idr: Fix handling of IDs above INT_MAX

Khalid reported that the kernel selftests are currently failing:

selftests: test_bpf.sh
========================================
test_bpf: [FAIL]
not ok 1..8 selftests: test_bpf.sh [FAIL]

He bisected it to 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make
1-based IDRs more efficient").

The root cause is doing a signed comparison in idr_alloc_u32() instead
of an unsigned comparison. I went looking for any similar problems and
found a couple (which would each result in the failure to warn in two
situations that aren't supposed to happen).

I knocked up a few test-cases to prove that I was right and added them
to the test-suite.

Reported-by: Khalid Aziz <khalid.aziz@oracle.com>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# b1a8a7a7 21-Feb-2018 Rasmus Villemoes <linux@rasmusvillemoes.dk>

ida: do zeroing in ida_pre_get()

As far as I can tell, the only place the per-cpu ida_bitmap is populated
is in ida_pre_get. The pre-allocated element is stolen in two places in
ida_get_new_above, in both cases immediately followed by a memset(0).

Since ida_get_new_above is called with locks held, do the zeroing in
ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc
generates ~44 bytes of code to do a memset(, 0, 128):

$ scripts/bloat-o-meter vmlinux.{0,1}
add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83)
Function old new delta
ida_pre_get 115 119 +4
vermagic 27 28 +1
ida_get_new_above 715 627 -88

Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ce711f2 30-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Make 1-based IDRs more efficient

About 20% of the IDR users in the kernel want the allocated IDs to start
at 1. The implementation currently searches all the way down the left
hand side of the tree, finds no free ID other than ID 0, walks all the
way back up, and then all the way down again. This patch 'rebases' the
ID so we fill the entire radix tree, rather than leave a gap at 0.

Chris Wilson says: "I did the quick hack of allocating index 0 of the
idr and that eradicated idr_get_free() from being at the top of the
profiles for the many-object stress tests. This improvement will be
much appreciated."

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 72fd6c7b 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Warn if old iterators see large IDs

Now that the IDR can be used to store large IDs, it is possible somebody
might only partially convert their old code and use the iterators which
can only handle IDs up to INT_MAX. It's probably unwise to show them a
truncated ID, so settle for spewing warnings to dmesg, and terminating
the iteration.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 7a457577 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Rename idr_for_each_entry_ext

Most places in the kernel that we need to distinguish functions by the
type of their arguments, we use '_ul' as a suffix for the unsigned long
variant, not '_ext'. Also add kernel-doc.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 460488c5 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Remove idr_alloc_ext

It has no more users, so remove it. Move idr_alloc() back into idr.c,
move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
wrappers around idr_get_free_cmn() and rename it to idr_get_free().
While there is now no interface to allocate IDs larger than a u32,
the IDR internals remain ready to handle a larger ID should a need arise.

These changes make it possible to provide the guarantee that, if the
nextid pointer points into the object, the object's ID will be initialised
before a concurrent lookup can find the object.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# e096f6a7 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Add idr_alloc_u32 helper

All current users of idr_alloc_ext() actually want to allocate a u32
and idr_alloc_u32() fits their needs better.

Like idr_get_next(), it uses a 'nextid' argument which serves as both
a pointer to the start ID and the assigned ID (instead of a separate
minimum and pointer-to-assigned-ID argument). It uses a 'max' argument
rather than 'end' because the semantics that idr_alloc has for 'end'
don't work well for unsigned types.

Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
it as __must_check to help callers use it correctly. Include copious
kernel-doc. Chris Mi <chrism@mellanox.com> has promised to contribute
test-cases for idr_alloc_u32.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 234a4624 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Delete idr_replace_ext function

Changing idr_replace's 'id' argument to 'unsigned long' works for all
callers. Callers which passed a negative ID now get -ENOENT instead of
-EINVAL. No callers relied on this error value.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# c7df8ad2 15-Nov-2017 Mel Gorman <mgorman@techsingularity.net>

mm, truncate: do not check mapping for every page being truncated

During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.

This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called. The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.

sparsetruncate (tiny)
4.14.0-rc4 4.14.0-rc4
oneirq-v1r1 pickhelper-v1r1
Min Time 141.00 ( 0.00%) 140.00 ( 0.71%)
1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%)
2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%)
Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%)
Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%)
Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%)
Max Time 230.00 ( 0.00%) 205.00 ( 10.87%)
Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%)
Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%)
Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%)
Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%)
Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%)
Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%)
Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%)
Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%)
Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%)

As you'd expect, the gain is marginal but it can be detected. The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field. The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.

Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a70e43a5 03-Oct-2017 Eric Biggers <ebiggers@google.com>

lib/idr.c: fix comment for idr_replace()

idr_replace() returns the old value on success, not 0.

Link: http://lkml.kernel.org/r/20170918162642.37511-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a47f68d6 13-Sep-2017 Eric Biggers <ebiggers@google.com>

idr: remove WARN_ON_ONCE() when trying to replace negative ID

IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").

Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree"). However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).

So once again remove the bogus WARN_ON_ONCE().

This bug was found by syzkaller, which encountered the following
warning:

WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
Kernel panic - not syncing: panic_on_warn set ...

CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

int main(void)
{
int cardfd = open("/dev/dri/card0", O_RDONLY);

ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
&(struct drm_gem_close) { .handle = -1 } );
}

Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [v4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 388f79fd 30-Aug-2017 Chris Mi <chrism@mellanox.com>

idr: Add new APIs to support unsigned long

The following new APIs are added:

int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
unsigned long start, unsigned long end, gfp_t gfp);
void *idr_remove_ext(struct idr *idr, unsigned long id);
void *idr_find_ext(const struct idr *idr, unsigned long id);
void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e73eb0b 13-Feb-2017 Matthew Wilcox <willy@infradead.org>

idr: Add missing __rcu annotations

Where we use the radix tree iteration macros, we need to annotate 'slot'
with __rcu. Make sure we don't forget any new places in the future with
the same CFLAGS check used for radix-tree.c.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# d37cacc5 17-Dec-2016 Matthew Wilcox <willy@infradead.org>

ida: Use exceptional entries for small IDAs

We can use the root entry as a bitmap and save allocating a 128 byte
bitmap for an IDA that contains only a few entries (30 on a 32-bit
machine, 62 on a 64-bit machine). This costs about 300 bytes of kernel
text on x86-64, so as long as 3 IDAs fall into this category, this
is a net win for memory consumption.

Thanks to Rasmus Villemoes for his work documenting the problem and
collecting statistics on IDAs.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 7ad3d4d8 16-Dec-2016 Matthew Wilcox <willy@infradead.org>

ida: Move ida_bitmap to a percpu variable

When we preload the IDA, we allocate an IDA bitmap. Instead of storing
that preallocated bitmap in the IDA, we store it in a percpu variable.
Generally there are more IDAs in the system than CPUs, so this cuts down
on the number of preallocated bitmaps that are unused, and about half
of the IDA users did not call ida_destroy() so they were leaking IDA
bitmaps.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 0a835c4f 20-Dec-2016 Matthew Wilcox <willy@infradead.org>

Reimplement IDR and IDA using the radix tree

The IDR is very similar to the radix tree. It has some functionality that
the radix tree did not have (alloc next free, cyclic allocation, a
callback-based for_each, destroy tree), which is readily implementable on
top of the radix tree. A few small changes were needed in order to use a
tag to represent nodes with free space below them. More extensive
changes were needed to support storing NULL as a valid entry in an IDR.
Plain radix trees still interpret NULL as a not-present entry.

The IDA is reimplemented as a client of the newly enhanced radix tree. As
in the current implementation, it uses a bitmap at the last level of the
tree.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# a2ef9471 12-Dec-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

lib/ida: document locking requirements a bit better

I wanted to wrap a bunch of ida_simple_get calls into their own locking,
until I dug around and read the original commit message. Stuff like
this should imo be added to the kernel doc, let's do that.

Link: http://lkml.kernel.org/r/20161027072216.20411-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0164adc 06-Nov-2015 Mel Gorman <mgorman@techsingularity.net>

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 87d1d169 12-Feb-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

lib/idr.c: remove redundant include

idr.c doesn't seem to use anything from hardirq.h (or anything included
from that). Removing it produces identical objdump -d output, and gives
44 fewer lines in the .idr.o.cmd dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# da3dae54 08-Sep-2014 Masanari Iida <standby24x7@gmail.com>

Documentation: Docbook: Fix generated DocBook/kernel-api.xml

This patch fix spelling typo found in DocBook/kernel-api.xml.
It is because the file is generated from the source comments,
I have to fix the comments in source codes.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 93b7aca3 08-Aug-2014 Andrey Ryabinin <ryabinin.a.a@gmail.com>

lib/idr.c: fix out-of-bounds pointer dereference

I'm working on address sanitizer project for kernel. Recently we
started experiments with stack instrumentation, to detect out-of-bounds
read/write bugs on stack.

Just after booting I've hit out-of-bounds read on stack in idr_for_each
(and in __idr_remove_all as well):

struct idr_layer **paa = &pa[0];

while (id >= 0 && id <= max) {
...
while (n < fls(id)) {
n += IDR_BITS;
p = *--paa; <--- here we are reading pa[-1] value.
}
}

Despite the fact that after this dereference we are exiting out of loop
and never use p, such behaviour is undefined and should be avoided.

Fix this by moving pointer derference to the beggining of the loop,
right before we will use it.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Preobrazhensky <preobr@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 15f3ec3f 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: reduce the unneeded check in free_layer()

If "idr->hint == p" is true, it also implies "idr->hint" is true(not NULL).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aefb7682 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: don't need to shink the free list when idr_remove()

After idr subsystem is changed to RCU-awared, the free layer will not go
to the free list. The free list will not be filled up when
idr_remove(). So we don't need to shink it too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b93804b2 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: fix idr_replace()'s returned error code

When the smaller id is not found, idr_replace() returns -ENOENT. But
when the id is bigger enough, idr_replace() returns -EINVAL, actually
there is no difference between these two kinds of ids.

These are all unallocated id, the return values of the idr_replace() for
these ids should be the same: -ENOENT.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aef0f62e 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: fix NULL pointer dereference when ida_remove(unallocated_id)

If the ida has at least one existing id, and when an unallocated ID
which meets a certain condition is passed to the ida_remove(), the
system will crash because it hits NULL pointer dereference.

The condition is that the unallocated ID shares the same lowest idr
layer with the existing ID, but the idr slot would be different if the
unallocated ID were to be allocated.

In this case the matching idr slot for the unallocated_id is NULL,
causing @bitmap to be NULL which the function dereferences without
checking crashing the kernel.

See the test code:

static void test3(void)
{
int id;
DEFINE_IDA(test_ida);

printk(KERN_INFO "Start test3\n");
if (ida_pre_get(&test_ida, GFP_KERNEL) < 0) return;
if (ida_get_new(&test_ida, &id) < 0) return;
ida_remove(&test_ida, 4000); /* bug: null deference here */
printk(KERN_INFO "End of test3\n");
}

It happens only when the caller tries to free an unallocated ID which is
the caller's fault. It is not a bug. But it is better to add the
proper check and complain rather than crashing the kernel.

[tj@kernel.org: updated patch description]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8f9f665a 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: fix unexpected ID-removal when idr_remove(unallocated_id)

If unallocated_id = (ANY * idr_max(idp->layers) + existing_id) is passed
to idr_remove(). The existing_id will be removed unexpectedly.

The following test shows this unexpected id-removal:

static void test4(void)
{
int id;
DEFINE_IDR(test_idr);

printk(KERN_INFO "Start test4\n");
id = idr_alloc(&test_idr, (void *)1, 42, 43, GFP_KERNEL);
BUG_ON(id != 42);
idr_remove(&test_idr, 42 + IDR_SIZE);
TEST_BUG_ON(idr_find(&test_idr, 42) != (void *)1);
idr_destroy(&test_idr);
printk(KERN_INFO "End of test4\n");
}

ida_remove() shares the similar problem.

It happens only when the caller tries to free an unallocated ID which is
the caller's fault. It is not a bug. But it is better to add the
proper check and complain rather than removing an existing_id silently.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3afb69cb 06-Jun-2014 Lai Jiangshan <laijs@cn.fujitsu.com>

idr: fix overflow bug during maximum ID calculation at maximum height

idr_replace() open-codes the logic to calculate the maximum valid ID
given the height of the idr tree; unfortunately, the open-coded logic
doesn't account for the fact that the top layer may have unused slots
and over-shifts the limit to zero when the tree is at its maximum
height.

The following test code shows it fails to replace the value for
id=((1<<27)+42):

static void test5(void)
{
int id;
DEFINE_IDR(test_idr);
#define TEST5_START ((1<<27)+42) /* use the highest layer */

printk(KERN_INFO "Start test5\n");
id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
BUG_ON(id != TEST5_START);
TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
idr_destroy(&test_idr);
printk(KERN_INFO "End of test5\n");
}

Fix the bug by using idr_max() which correctly takes into account the
maximum allowed shift.

sub_alloc() shares the same problem and may incorrectly fail with
-EAGAIN; however, this bug doesn't affect correct operation because
idr_get_empty_slot(), which already uses idr_max(), retries with the
increased @id in such cases.

[tj@kernel.org: Updated patch description.]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3f59b067 07-Apr-2014 Monam Agarwal <monamagarwal123@gmail.com>

lib/idr.c: use RCU_INIT_POINTER(x, NULL)

Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure. And in the
case of the NULL pointer, there is no structure to initialize.

So, rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 90ae3ae5 07-Apr-2014 Stephen Hemminger <stephen@networkplumber.org>

idr: remove dead code

Remove no longer used deprecated code, and make local functions
static.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 05f7a7d6 08-Aug-2011 Andreas Gruenbacher <agruen@linbit.com>

idr: Add new function idr_is_empty()

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# dd04b452 03-Jul-2013 Jean Delvare <jdelvare@suse.de>

idr: print a stack dump after ida_remove warning

We print a dump stack after idr_remove warning. This is useful to find
the faulty piece of code. Let's do the same for ida_remove, as it would
be equally useful there.

[akpm@linux-foundation.org: convert the open-coded printk+dump_stack into WARN()]
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3e6628c4 29-Apr-2013 Jeff Layton <jlayton@kernel.org>

idr: introduce idr_alloc_cyclic()

As Tejun points out, there are several users of the IDR facility that
attempt to use it in a cyclic fashion. These users are likely to see
-ENOSPC errors after the counter wraps one or more times however.

This patchset adds a new idr_alloc_cyclic routine and converts several
of these users to it. Many of these users are in obscure parts of the
kernel, and I don't have a good way to test some of them. The change is
pretty straightforward though, so hopefully it won't be an issue.

There is one other cyclic user of idr_alloc that I didn't touch in
ipc/util.c. That one is doing some strange stuff that I didn't quite
understand, but it looks like it should probably be converted later
somehow.

This patch:

Thus spake Tejun Heo:

Ooh, BTW, the cyclic allocation is broken. It's prone to -ENOSPC
after the first wraparound. There are several cyclic users in the
kernel and I think it probably would be best to implement cyclic
support in idr.

This patch does that by adding new idr_alloc_cyclic function that such
users in the kernel can use. With this, there's no need for a caller to
keep track of the last value used as that's now tracked internally. This
should prevent the ENOSPC problems that can hit when the "last allocated"
counter exceeds INT_MAX.

Later patches will convert existing cyclic users to the new interface.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 59bfbcf0 13-Mar-2013 Tejun Heo <tj@kernel.org>

idr: idr_alloc() shouldn't trigger lowmem warning when preloaded

GFP_NOIO is often used for idr_alloc() inside preloaded section as the
allocation mask doesn't really matter. If the idr tree needs to be
expanded, idr_alloc() first tries to allocate using the specified
allocation mask and if it fails falls back to the preloaded buffer. This
order prevent non-preloading idr_alloc() users from taking advantage of
preloading ones by using preload buffer without filling it shifting the
burden of allocation to the preload users.

Unfortunately, this allowed/expected-to-fail kmem_cache allocation ends up
generating spurious slab lowmem warning before succeeding the request from
the preload buffer.

This patch makes idr_layer_alloc() add __GFP_NOWARN to the first
kmem_cache attempt and try kmem_cache again w/o __GFP_NOWARN after
allocation from preload_buffer fails so that lowmem warning is generated
if not suppressed by the original @gfp_mask.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c8615d37 13-Mar-2013 Tejun Heo <tj@kernel.org>

idr: deprecate idr_pre_get() and idr_get_new[_above]()

Now that all in-kernel users are converted to ues the new alloc
interface, mark the old interface deprecated. We should be able to
remove these in a few releases.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5857f70c 04-Mar-2013 Randy Dunlap <rdunlap@infradead.org>

idr: fix new kernel-doc warnings

Fix new kernel-doc warnings in idr:

Warning(include/linux/idr.h:113): No description found for parameter 'idr'
Warning(include/linux/idr.h:113): Excess function parameter 'idp' description in 'idr_find'
Warning(lib/idr.c:232): Excess function parameter 'id' description in 'sub_alloc'
Warning(lib/idr.c:232): Excess function parameter 'id' description in 'sub_alloc'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2e1c9b28 08-Mar-2013 Tejun Heo <tj@kernel.org>

idr: remove WARN_ON_ONCE() on negative IDs

idr_find(), idr_remove() and idr_replace() used to silently ignore the
sign bit and perform lookup with the rest of the bits. The weird behavior
has been changed such that negative IDs are treated as invalid. As the
behavior change was subtle, WARN_ON_ONCE() was added in the hope of
determining who's calling idr functions with negative IDs so that they can
be examined for problems.

Up until now, all two reported cases are ID number coming directly from
userland and getting fed into idr_find() and the warnings seem to cause
more problems than being helpful. Drop the WARN_ON_ONCE()s.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: <markus@trippelsdorf.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7175c61c 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: explain WARN_ON_ONCE() on negative IDs out-of-range ID

Until recently, when an negative ID is specified, idr functions used to
ignore the sign bit and proceeded with the operation with the rest of
bits, which is bizarre and error-prone. The behavior recently got changed
so that negative IDs are treated as invalid but we're triggering
WARN_ON_ONCE() on negative IDs just in case somebody was depending on the
sign bit being ignored, so that those can be detected and fixed easily.

We only need this for a while. Explain why WARN_ON_ONCE()s are there and
that they can be removed later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0ffc2a9c 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: implement lookup hint

While idr lookup isn't a particularly heavy operation, it still is too
substantial to use in hot paths without worrying about the performance
implications. With recent changes, each idr_layer covers 256 slots
which should be enough to cover most use cases with single idr_layer
making lookup hint very attractive.

This patch adds idr->hint which points to the idr_layer which
allocated an ID most recently and the fast path lookup becomes

if (look up target's prefix matches that of the hinted layer)
return hint->ary[ID's offset in the leaf layer];

which can be inlined.

idr->hint is set to the leaf node on idr_fill_slot() and cleared from
free_layer().

[andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 54616283 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: add idr_layer->prefix

Add a field which carries the prefix of ID the idr_layer covers. This
will be used to implement lookup hint.

This patch doesn't make use of the new field and doesn't introduce any
behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1d9b2e1e 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: remove length restriction from idr_layer->bitmap

Currently, idr->bitmap is declared as an unsigned long which restricts
the number of bits an idr_layer can contain. All bitops can handle
arbitrary positive integer bit number and there's no reason for this
restriction.

Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
unsigned long.

* idr_layer->bitmap is now an array. '&' dropped from params to
bitops.

* Replaced "== IDR_FULL" tests with bitmap_full() and removed
IDR_FULL.

* Replaced find_next_bit() on ~bitmap with find_next_zero_bit().

* Replaced "bitmap = 0" with bitmap_clear().

This patch doesn't (or at least shouldn't) introduce any behavior
changes.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e8c8d1bc 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c

MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.

The constant is visible in idr.h and there are several users in the
kernel.

* drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

Basically used to test if adap->nr is a negative number which isn't
-1 and returns -EINVAL if so. idr_alloc() already has negative
@start checking (w/ WARN_ON_ONCE), so this can go away.

* drivers/infiniband/core/cm.c:cm_alloc_id()
drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

Used to wrap cyclic @start. Can be replaced with max(next, 0).
Note that this type of cyclic allocation using idr is buggy. These
are prone to spurious -ENOSPC failure after the first wraparound.

* fs/super.c:get_anon_bdev()

The ID allocated from ida is masked off before being tested whether
it's inside valid range. ida allocated ID can never be a negative
number and the masking is unnecessary.

Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.

This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Wolfram Sang <wolfram@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 326cf0f0 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: fix top layer handling

Most functions in idr fail to deal with the high bits when the idr
tree grows to the maximum height.

* idr_get_empty_slot() stops growing idr tree once the depth reaches
MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to
cover the whole range. The function doesn't even notice that it
didn't grow the tree enough and ends up allocating the wrong ID
given sufficiently high @starting_id.

For example, on 64 bit, if the starting id is 0x7fffff01,
idr_get_empty_slot() will grow the tree 5 layer deep, which only
covers the 30 bits and then proceed to allocate as if the bit 30
wasn't specified. It ends up allocating 0x3fffff01 without the bit
30 but still returns 0x7fffff01.

* __idr_remove_all() will not remove anything if the tree is fully
grown.

* idr_find() can't find anything if the tree is fully grown.

* idr_for_each() and idr_get_next() can't iterate anything if the tree
is fully grown.

Fix it by introducing idr_max() which returns the maximum possible ID
given the depth of tree and replacing the id limit checks in all
affected places.

As the idr_layer pointer array pa[] needs to be 1 larger than the
maximum depth, enlarge pa[] arrays by one.

While this plugs the discovered issues, the whole code base is
horrible and in desparate need of rewrite. It's fragile like hell,

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: <stable@vger.kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d5c7409f 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: implement idr_preload[_end]() and idr_alloc()

The current idr interface is very cumbersome.

* For all allocations, two function calls - idr_pre_get() and
idr_get_new*() - should be made.

* idr_pre_get() doesn't guarantee that the following idr_get_new*()
will not fail from memory shortage. If idr_get_new*() returns
-EAGAIN, the caller is expected to retry pre_get and allocation.

* idr_get_new*() can't enforce upper limit. Upper limit can only be
enforced by allocating and then freeing if above limit.

* idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping
around MAX_IDR_FREE idr_layers. The memory consumed per idr is
under two pages but it makes it difficult to make idr_layer larger.

This patch implements the following new set of allocation functions.

* idr_preload[_end]() - Similar to radix preload but doesn't fail.
The first idr_alloc() inside preload section can be treated as if it
were called with @gfp_mask used for idr_preload().

* idr_alloc() - Allocate an ID w/ lower and upper limits. Takes
@gfp_flags and can be used w/o preloading. When used inside
preloaded section, the allocation mask of preloading can be assumed.

If idr_alloc() can be called from a context which allows sufficiently
relaxed @gfp_mask, it can be used by itself. If, for example,
idr_alloc() is called inside spinlock protected region, preloading can
be used like the following.

idr_preload(GFP_KERNEL);
spin_lock(lock);

id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);

spin_unlock(lock);
idr_preload_end();
if (id < 0)
error;

which is much simpler and less error-prone than idr_pre_get and
idr_get_new*() loop.

The new interface uses per-pcu idr_layer buffer and thus the number of
idr's in the system doesn't affect the amount of memory used for
preloading.

idr_layer_alloc() is introduced to handle idr_layer allocations for
both old and new ID allocation paths. This is a bit hairy now but the
new interface is expected to replace the old and the internal
implementation eventually will become simpler.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3594eb28 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: refactor idr_get_new_above()

Move slot filling to idr_fill_slot() from idr_get_new_above_int() and
make idr_get_new_above() directly call it. idr_get_new_above_int() is
no longer needed and removed.

This will be used to implement a new ID allocation interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 12d1b439 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: remove _idr_rc_to_errno() hack

idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
exception conditions internally. The return value is later translated
to errno values using _idr_rc_to_errno().

This is confusing. Drop the custom ones and consistently use -EAGAIN
for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
"ran out of ID space".

Due to the weird memory preloading mechanism, [ra]_get_new*() return
-EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
-EAGAIN on those interface functions. They'll eventually be cleaned
up and the translations will go away.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 49038ef4 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: relocate idr_for_each_entry() and reorganize id[r|a]_get_new()

* Move idr_for_each_entry() definition next to other idr related
definitions.

* Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().

This changes the implementation of idr_get_new() but the new
implementation is trivial. This patch doesn't introduce any
functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fe6e24ec 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: deprecate idr_remove_all()

There was only one legitimate use of idr_remove_all() and a lot more of
incorrect uses (or lack of it). Now that idr_destroy() implies
idr_remove_all() and all the in-kernel users updated not to use it,
there's no reason to keep it around. Mark it deprecated so that we can
later unexport it.

idr_remove_all() is made an inline function calling __idr_remove_all()
to avoid triggering deprecated warning on EXPORT_SYMBOL().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9bb26bc1 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: make idr_destroy() imply idr_remove_all()

idr is silly in quite a few ways, one of which is how it's supposed to
be destroyed - idr_destroy() doesn't release IDs and doesn't even whine
if the idr isn't empty. If the caller forgets idr_remove_all(), it
simply leaks memory.

Even ida gets this wrong and leaks memory on destruction. There is
absoltely no reason not to call idr_remove_all() from idr_destroy().
Nobody is abusing idr_destroy() for shrinking free layer buffer and
continues to use idr after idr_destroy(), so it's safe to do remove_all
from destroy.

In the whole kernel, there is only one place where idr_remove_all() is
legitimiately used without following idr_destroy() while there are quite
a few places where the caller forgets either idr_remove_all() or
idr_destroy() leaking memory.

This patch makes idr_destroy() call idr_destroy_all() and updates the
function description accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6cdae741 27-Feb-2013 Tejun Heo <tj@kernel.org>

idr: fix a subtle bug in idr_get_next()

The iteration logic of idr_get_next() is borrowed mostly verbatim from
idr_for_each(). It walks down the tree looking for the slot matching
the current ID. If the matching slot is not found, the ID is
incremented by the distance of single slot at the given level and
repeats.

The implementation assumes that during the whole iteration id is aligned
to the layer boundaries of the level closest to the leaf, which is true
for all iterations starting from zero or an existing element and thus is
fine for idr_for_each().

However, idr_get_next() may be given any point and if the starting id
hits in the middle of a non-existent layer, increment to the next layer
will end up skipping the same offset into it. For example, an IDR with
IDs filled between [64, 127] would look like the following.

[ 0 64 ... ]
/----/ |
| |
NULL [ 64 ... 127 ]

If idr_get_next() is called with 63 as the starting point, it will try
to follow down the pointer from 0. As it is NULL, it will then try to
proceed to the next slot in the same level by adding the slot distance
at that level which is 64 - making the next try 127. It goes around the
loop and finds and returns 127 skipping [64, 126].

Note that this bug also triggers in idr_for_each_entry() loop which
deletes during iteration as deletions can make layers go away leaving
the iteration with unaligned ID into missing layers.

Fix it by ensuring proceeding to the next slot doesn't carry over the
unaligned offset - ie. use round_up(id + 1, slot_distance) instead of
id += slot_distance.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 125c4c70 04-Oct-2012 Fengguang Wu <fengguang.wu@intel.com>

idr: rename MAX_LEVEL to MAX_IDR_LEVEL

To avoid name conflicts:

drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9f7de827 21-Mar-2012 Hugh Dickins <hughd@google.com>

idr: make idr_get_next() good for rcu_read_lock()

Make one small adjustment to idr_get_next(): take the height from the top
layer (stable under RCU) instead of from the root (unprotected by RCU), as
idr_find() does: so that it can be used with RCU locking. Copied comment
on RCU locking from idr_find().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8bc3bcc9 16-Nov-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

lib: reduce the use of module.h wherever possible

For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 46cbc1d3 02-Nov-2011 Tejun Heo <tj@kernel.org>

ida: make ida_simple_get/put() IRQ safe

It's often convenient to be able to release resource from IRQ context.
Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ
safe.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e3816c54 31-Oct-2011 Wang Sheng-Hui <shhuiw@gmail.com>

lib/idr.c: fix comment for ida_get_new_above()

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f5c3dd71 03-Aug-2011 Paul Bolle <pebolle@tiscali.nl>

Fix kernel-doc comment typo '@id'

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 88eca020 03-Aug-2011 Rusty Russell <rusty@rustcorp.com.au>

ida: simplified functions for id allocation

The current hyper-optimized functions are overkill if you simply want to
allocate an id for a device. Create versions which use an internal
lock.

In followup patches, numerous drivers are converted to use this
interface.

Thanks to Tejun for feedback.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 56083ab1 26-Oct-2010 Randy Dunlap <randy.dunlap@oracle.com>

docbook: add idr/ida to kernel-api docbook

Add idr/ida to kernel-api docbook.
Fix typos and kernel-doc notation.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Naohiro Aota <naota@elisp.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 066a9be6 26-Oct-2010 Naohiro Aota <naota@elisp.net>

idr: fix idr_pre_get() locking description

Despite the idr_pre_get() kernel-doc, there are some cases where you can
call idr_pre_get() from within locked regions. Add a description for such
cases.

See also: http://lkml.org/lkml/2010/9/16/462

[akpm@linux-foundation.org: cleanups, grammatical fixes]
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1458ce16 27-Aug-2010 Naohiro Aota <naota@elisp.net>

idr: describe how nextidp works in idr_get_next().

It was unclear in original kernel-doc how nextidp worked in
idr_get_next(). Let's describe it.

Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# ea24ea850 30-Aug-2010 Naohiro Aota <naota@elisp.net>

idr: fix kernel-doc warnings.

Fix the following kernel-doc warnings.

% perl scripts/kernel-doc lib/idr.c > /dev/null
Warning(lib/idr.c:300): No description found for parameter 'starting_id'
Warning(lib/idr.c:300): Excess function parameter 'start_id' description in 'idr_get_new_above'
Warning(lib/idr.c:485): No description found for parameter 'idp'
Warning(lib/idr.c:596): No description found for parameter 'nextidp'
Warning(lib/idr.c:596): Excess function parameter 'id' description in 'idr_get_next'
Warning(lib/idr.c:774): No description found for parameter 'starting_id'
Warning(lib/idr.c:774): Excess function parameter 'staring_id' description in 'ida_get_new_above'
Warning(lib/idr.c:918): No description found for parameter 'ida'

Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 94bfa3b6 07-Jun-2010 Paul E. McKenney <paulmck@kernel.org>

idr: fix RCU lockdep splat in idr_get_next()

Convert to rcu_dereference_raw() given that many callers may have many
different locking models.

Located-by: Miles Lane <miles.lane@gmail.com>
Tested-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 2dcb22b3 26-May-2010 Imre Deak <imre.deak@nokia.com>

idr: fix backtrack logic in idr_remove_all

Currently idr_remove_all will fail with a use after free error if
idr::layers is bigger than 2, which on 32 bit systems corresponds to items
more than 1024. This is due to stepping back too many levels during
backtracking. For simplicity let's assume that IDR_BITS=1 -> we have 2
nodes at each level below the root node and each leaf node stores two IDs.
(In reality for 32 bit systems IDR_BITS=5, with 32 nodes at each sub-root
level and 32 IDs in each leaf node). The sequence of freeing the nodes at
the moment is as follows:

layer
1 -> a(7)
2 -> b(3) c(5)
3 -> d(1) e(2) f(4) g(6)

Until step 4 things go fine, but then node c is freed, whereas node g
should be freed first. Since node c contains the pointer to node g we'll
have a use after free error at step 6.

How many levels we step back after visiting the leaf nodes is currently
determined by the msb of the id we are currently visiting:

Step
1. node d with IDs 0,1 is freed, current ID is advanced to 2.
msb of the current ID bit 1. This means we need to step back
1 level to node b and take the next sibling, node e.
2-3. node e with IDs 2,3 is freed, current ID is 4, msb is bit 2.
This means we need to step back 2 levels to node a, freeing
node b on the way.
4-5. node f with IDs 4,5 is freed, current ID is 6, msb is still
bit 2. This means we again need to step back 2 levels to node
a and free c on the way.
6. We should visit node g, but its pointer is not available as
node c was freed.

The fix changes how we determine the number of levels to step back.
Instead of deducting this merely from the msb of the current ID, we should
really check if advancing the ID causes an overflow to a bit position
corresponding to a given layer. In the above example overflow from bit 0
to bit 1 should mean stepping back 1 level. Overflow from bit 1 to bit 2
should mean stepping back 2 levels and so on.

The fix was tested with IDs up to 1 << 20, which corresponds to 4 layers
on 32 bit systems.

Signed-off-by: Imre Deak <imre.deak@nokia.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.34.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4d1ee80f 29-Jan-2010 Ben Hutchings <bhutchings@solarflare.com>

idr: export idr_get_next()

idr_get_next() was accidentally not exported when added. It is about
to be used by mtdcore, which may be built as a module.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 96be753a 22-Feb-2010 Paul E. McKenney <paulmck@kernel.org>

idr: Apply lockdep-based diagnostics to rcu_dereference() uses

Because idr can be used with any of a number of locks or with
any flavor of RCU, just disable the lockdep-based diagnostics.
If idr needs diagnostics, the check expression will need to be
passed into the relevant idr primitives as an additional
argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-11-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d2e7276b 22-Feb-2010 Tejun Heo <tj@kernel.org>

idr: fix a critical misallocation bug, take#2

This is retry of reverted 859ddf09743a8cc680af33f7259ccd0fd36bfe9d
("idr: fix a critical misallocation bug") which contained two bugs.

* pa[idp->layers] should be cleared even if it's not used by
sub_alloc() because it's used by mark idr_mark_full().

* The original condition check also assigned pa[l] to p which the new
code didn't do thus leaving p pointing at the wrong layer.

Both problems have been fixed and the idr code has received good amount
testing using userland testing setup where simple bitmap allocator is
run parallel to verify the result of idr allocation.

The bug this patch fixes is caused by sub_alloc() optimization path
bypassing out-of-room condition check and restarting allocation loop
with starting value higher than maximum allowed value. For detailed
description, please read commit message of 859ddf09.

Signed-off-by: Tejun Heo <tj@kernel.org>
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6f14a668 04-Feb-2010 Tejun Heo <tj@kernel.org>

idr: revert misallocation bug fix

Commit 859ddf09743a8cc680af33f7259ccd0fd36bfe9d tried to fix
misallocation bug but broke full bit marking by not clearing
pa[idp->layers] and also is causing X failures due to lookup failure
in drm code. The cause of the latter hasn't been found yet. Revert
the fix for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 859ddf09 02-Feb-2010 Tejun Heo <tj@kernel.org>

idr: fix a critical misallocation bug

Eric Paris located a bug in idr. With IDR_BITS of 6, it grows to three
layers when id 4096 is first allocated. When that happens, idr wraps
incorrectly and searches the idr array ignoring the high bits. The
following test code from Eric demonstrates the bug nicely.

#include <linux/idr.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_IDR(test_idr);

int init_module(void)
{
int ret, forty95, forty96;
void *addr;

/* add 2 entries both with 4095 as the start address */
again1:
if (!idr_pre_get(&test_idr, GFP_KERNEL))
return -ENOMEM;
ret = idr_get_new_above(&test_idr, (void *)4095, 4095, &forty95);
if (ret) {
if (ret == -EAGAIN)
goto again1;
return ret;
}
if (forty95 != 4095)
printk(KERN_ERR "hmmm, forty95=%d\n", forty95);

again2:
if (!idr_pre_get(&test_idr, GFP_KERNEL))
return -ENOMEM;
ret = idr_get_new_above(&test_idr, (void *)4096, 4095, &forty96);
if (ret) {
if (ret == -EAGAIN)
goto again2;
return ret;
}
if (forty96 != 4096)
printk(KERN_ERR "hmmm, forty96=%d\n", forty96);

/* try to find the 2 entries, noticing that 4096 broke */
addr = idr_find(&test_idr, forty95);
if ((int)addr != forty95)
printk(KERN_ERR "hmmm, after find forty95=%d addr=%d\n", forty95, (int)addr);
addr = idr_find(&test_idr, forty96);
if ((int)addr != forty96)
printk(KERN_ERR "hmmm, after find forty96=%d addr=%d\n", forty96, (int)addr);
/* really weird, the entry which should be at 4096 is actually at 0!! */
addr = idr_find(&test_idr, 0);
if ((int)addr)
printk(KERN_ERR "found an entry at id=0 for addr=%d\n", (int)addr);

idr_remove(&test_idr, forty95);
idr_remove(&test_idr, forty96);

return 0;
}

void cleanup_module(void)
{
}

MODULE_AUTHOR("Eric Paris <eparis@redhat.com>");
MODULE_DESCRIPTION("Simple idr test");
MODULE_LICENSE("GPL");

This happens because when sub_alloc() back tracks it doesn't always do it
step-by-step while the over-the-limit detection assumes step-by-step
backtracking. The logic in sub_alloc() looks like the following.

restart:
clear pa[top level + 1] for end cond detection
l = top level
while (true) {
search for empty slot at this level
if (not found) {
push id to the next possible value
l++
A: if (pa[l] is clear)
failed, return asking caller to grow the tree
if (going up 1 level gives more slots to search)
continue the while loop above with the incremented l
else
C: goto restart
}
adjust id accordingly to the found slot
if (l == 0)
return found id;
create lower level if not there yet
record pa[l] and l--
}

Test A is the fail exit condition but this assumes that failure is
propagated upwared one level at a time but the B optimization path breaks
the assumption and restarts the whole thing with a start value which is
above the possible limit with the current layers. sub_alloc() assumes the
start id value is inside the limit when called and test A is the only exit
condition check, so it ends up searching for empty slot while ignoring
high set bit.

So, for 4095->4096 test, level0 search fails but pa[1] contains a valid
pointer. However, going up 1 level wouldn't give any more empty slot so
it takes C and when the whole thing restarts nobody notices the high bit
set beyond the top level.

This patch fixes the bug by changing the fail exit condition check to full
id limit check.

Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 94e2bd68 16-Oct-2009 Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>

tree-wide: fix some typos and punctuation in comments

fix some typos and punctuation in comments

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 38460b48 02-Apr-2009 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

cgroup: CSS ID support

Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.

This patch attaches unique ID to each css and provides following.

- css_lookup(subsys, id)
returns pointer to struct cgroup_subysys_state of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next css under "root" by scanning

When cgroup_subsys->use_id is set, an id for css is maintained.

The cgroup framework only parepares
- css_id of root css for subsys
- id is automatically attached at creation of css.
- id is *not* freed automatically. Because the cgroup framework
don't know lifetime of cgroup_subsys_state.
free_css_id() function is provided. This must be called by subsys.

There are several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.

- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)

Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.

Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.

- When subsys->use_id is set, # of css in the system is limited to
65535.

[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1b23336a 10-Mar-2009 Paul E. McKenney <paulmck@kernel.org>

idr: make idr_remove_all() do removal -before- free_layer()

Fix a problem in the IDR system, where an idr_remove_all() hands a data
element to call_rcu() (via free_layer()) before making that data element
inaccessible to new readers. This is very bad, and results in readers
still having a reference to this data element at the end of the grace
period.

Tests on large machines that concurrently map and unmap user-space memory
within the same multithreaded process result in crashes within about five
minutes. Applying this patch increases the kernel's longevity to the
three-to-eight-hour range.

There appear to be other similar problems in idr_get_empty_slot() and
sub_remove(), but I fixed the easy one in idr_remove_all() first. It is
therefore no surprise that failures still occur.

Located-by: Milton Miller II <miltonm@austin.ibm.com>
Tested-by: Milton Miller II <miltonm@austin.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5b019e99 15-Jan-2009 Andrew Morton <akpm@linux-foundation.org>

lib/idr.c: use kmem_cache_zalloc() for the idr_layer cache

David points out that the idr_remove_all() function returns unused slabs
to the kmem cache, but needs to zero them first or else they will be
uninitialized upon next use. This causes crashes which have been observed
in the firewire subsystem.

He fixed this by zeroing the object before freeing it in idr_remove_all().

But we agree that simply removing the constructor and zeroing the object
at allocation time is simpler than relying upon slab constructor machinery
and might even be faster.

This problem was introduced by "idr: make idr_remove rcu-safe" (commit
cf481c20c476ad2c0febdace9ce23f5a4db19582), which was first released in
2.6.27.

There are no known codesites which trigger this bug in 2.6.27 or 2.6.28.
The post-2.6.28 firewire changes are the only known triggerer.

There might of course be not-yet-discovered triggerers in 2.6.27 and
2.6.28, and there might be out-of-tree triggerers which are added to those
kernel versions. I'll let the -stable guys decide whether they want to
backport this fix.

Reported-by: David Moore <dcm@acm.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kristian Hgsberg <krh@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b098161b 15-Jan-2009 Li Zefan <lizf@cn.fujitsu.com>

idr: fix wrong kernel-doc

idr_get_new_above() and ida_get_new_above() return an id in the range of
@staring_id ... 0x7fffffff, not 0 ... 0x7fffffff.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 711a49a0 10-Dec-2008 Manfred Spraul <manfred@colorfullife.com>

lib/idr.c: Fix bug introduced by RCU fix

The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.

Usually, nodes stay on the same layer. New layers are added to the top
of the tree.

The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards. p->layer
was not updated.

As usual: You shall never rely on the source code comments, they will
only mislead you.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ff2d39b 01-Dec-2008 Manfred Spraul <manfred@colorfullife.com>

lib/idr.c: fix rcu related race with idr_find

2nd part of the fixes needed for
http://bugzilla.kernel.org/show_bug.cgi?id=11796.

When the idr tree is either grown or shrunk, then the update to the number
of layers and the top pointer were not atomic. This race caused crashes.

The attached patch fixes that by replicating the layers counter in each
layer, thus idr_find doesn't need idp->layers anymore.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Clement Calmels <cboulte@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 51cc5068 25-Jul-2008 Alexey Dobriyan <adobriyan@gmail.com>

SL*B: drop kmem cache argument from constructor

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf481c20 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: make idr_remove rcu-safe

Introduce the free_layer() routine: it is the one that actually frees memory
after a grace period has elapsed.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f9c46d6e 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: make idr_find rcu-safe

Make idr_find rcu-safe: it can now be called inside an rcu_read critical
section.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3219b3b7 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: make idr_get_new* rcu-safe

Make the idr_get_new* routines rcu-safe.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 944ca05c 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: error checking factorization

Do some code factorization in the return code analysis.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f098ad65 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: fix a printk call

Fix the incomplete printk call.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4ae53789 25-Jul-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: rename some of the idr APIs internal routines

This is a trivial patch that renames:

. alloc_layer to get_from_free_list since it idr_pre_get that actually
allocates memory.
. free_layer to move_to_free_list since memory is not actually freed there.

This makes things more clear for the next patches.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# af8e2a4c 01-May-2008 Nadia Derbey <Nadia.Derbey@bull.net>

idr: fix idr_remove()

The return inside the loop makes us free only a single layer.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 199f0ca5 29-Apr-2008 Akinobu Mita <akinobu.mita@gmail.com>

idr: create idr_layer_cache at boot time

Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.

This change also enables us to eliminate the check every time idr_init() is
called.

[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4ba9b9d0 17-Oct-2007 Christoph Lameter <clameter@sgi.com>

Slab API: remove useless ctor parameter and reorder parameters

Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.

Convert

ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5ba25331 14-Oct-2007 Al Viro <viro@ftp.linux.org.uk>

more low-hanging fruits - kernel, fs, lib signedness

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ace06dc 31-Jul-2007 Oleg Nesterov <oleg@tv-sign.ru>

idr_remove_all: kill unused variable

"error" is always equal to 0.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 20c2df83 19-Jul-2007 Paul Mundt <lethal@linux-sh.org>

mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>


# 23936cc0 16-Jul-2007 Kristian Hoegsberg <krh@redhat.com>

lib: add idr_remove_all

Remove all ids from the given idr tree. idr_destroy() only frees up
unused, cached idp_layers, but this function will remove all id mappings
and leave all idp_layers unused.

A typical clean-up sequence for objects stored in an idr tree, will use
idr_for_each() to free all objects, if necessay, then idr_remove_all() to
remove all ids, and idr_destroy() to free up the cached idr_layers.

Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 96d7fa42 16-Jul-2007 Kristian Hoegsberg <krh@redhat.com>

lib: add idr_for_each()

This patch adds an iterator function for the idr data structure. Compared
to just iterating through the idr with an integer and idr_find, this
iterator is (almost, but not quite) linear in the number of elements, as
opposed to the number of integers in the range covered by the idr. This
makes a difference for sparse idrs, but more importantly, it's a nicer way
to iterate through the elements.

The drm subsystem is moving to idr for tracking contexts and drawables, and
with this change, we can use the idr exclusively for tracking these
resources.

[akpm@linux-foundation.org: fix comment]
Signed-off-by: Kristian Hoegsberg <krh@redhat.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 72dba584 13-Jun-2007 Tejun Heo <htejun@gmail.com>

ida: implement idr based id allocator

Implement idr based id allocator. ida is used the same way idr is
used but lacks id -> ptr translation and thus consumes much less
memory. struct ida_bitmap is attached as leaf nodes to idr tree which
is managed by the idr code. Each ida_bitmap is 128bytes long and
contains slightly less than a thousand slots.

ida is more aggressive with releasing extra resources acquired using
ida_pre_get(). After every successful id allocation, ida frees one
reserved idr_layer if possible. Reserved ida_bitmap is not freed
automatically but only one ida_bitmap is reserved and it's almost
always used right away. Under most circumstances, ida won't hold on
to memory for too long which isn't actively used.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e33ac8bd 13-Jun-2007 Tejun Heo <htejun@gmail.com>

idr: separate out idr_mark_full()

Separate out idr_mark_full() from sub_alloc() and make marking the
allocated slot full the responsibility of idr_get_new_above_int().

Allocation part of idr_get_new_above_int() is renamed to
idr_get_empty_slot(). New idr_get_new_above_int() allocates a slot
using the function, install the user pointer and marks it full using
idr_mark_full().

This change doesn't introduce any behavior change. This will be
used by ida.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 7aae6dd8 13-Jun-2007 Tejun Heo <htejun@gmail.com>

idr: fix obscure bug in allocation path

In sub_alloc(), when bitmap search fails, it goes up one level to
continue search. This is done by updating the id cursor and searching
the upper level again. If the cursor was at the end of the upper
level, we need to go further than that.

This wasn't implemented and when that happens the part of the cursor
which indexes into the upper level wraps and sub_alloc() ends up
searching the wrong bitmap. It allocates id which doesn't match the
actual slot.

This patch fixes this by restarting from the top if the search needs
to go higher than one level.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 72fd4a35 10-Feb-2007 Robert P. J. Day <rpjday@mindspring.com>

[PATCH] Numerous fixes to kernel-doc info in source files.

A variety of (mostly) innocuous fixes to the embedded kernel-doc content in
source files, including:

* make multi-line initial descriptions single line
* denote some function names, constants and structs as such
* change erroneous opening '/*' to '/**' in a few places
* reword some text for clarity

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e18b890b 06-Dec-2006 Christoph Lameter <clameter@sgi.com>

[PATCH] slab: remove kmem_cache_t

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c259cc28 14-Jul-2006 Roland Dreier <rdreier@cisco.com>

[PATCH] Convert idr's internal locking to _irqsave variant

Currently, the code in lib/idr.c uses a bare spin_lock(&idp->lock) to do
internal locking. This is a nasty trap for code that might call idr
functions from different contexts; for example, it seems perfectly
reasonable to call idr_get_new() from process context and idr_remove() from
interrupt context -- but with the current locking this would lead to a
potential deadlock.

The simplest fix for this is to just convert the idr locking to use
spin_lock_irqsave().

In particular, this fixes a very complicated locking issue detected by
lockdep, involving the ib_ipoib driver's priv->lock and dev->_xmit_lock,
which get involved with the ib_sa module's query_idr.lock.

Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Zach Brown <zach.brown@oracle.com>,
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5806f07c 26-Jun-2006 Jeff Mahoney <jeffm@suse.com>

[PATCH] lib: add idr_replace

This patch adds idr_replace() to replace an existing pointer in a single
operation.

Device-mapper will use this to update the pointer it stored against a given
id.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1eec0056 25-Jun-2006 Sonny Rao <sonny@burdell.org>

[PATCH] fix race in idr code

I ran into a bug where the kernel died in the idr code:

cpu 0x1d: Vector: 300 (Data Access) at [c000000b7096f710]
pc: c0000000001f8984: .idr_get_new_above_int+0x140/0x330
lr: c0000000001f89b4: .idr_get_new_above_int+0x170/0x330
sp: c000000b7096f990
msr: 800000000000b032
dar: 0
dsisr: 40010000
current = 0xc000000b70d43830
paca = 0xc000000000556900
pid = 2022, comm = hwup
1d:mon> t
[c000000b7096f990] c0000000000d2ad8 .expand_files+0x2e8/0x364 (unreliable)
[c000000b7096faa0] c0000000001f8bf8 .idr_get_new_above+0x18/0x68
[c000000b7096fb20] c00000000002a054 .init_new_context+0x5c/0xf0
[c000000b7096fbc0] c000000000049dc8 .copy_process+0x91c/0x1404
[c000000b7096fcd0] c00000000004a988 .do_fork+0xd8/0x224
[c000000b7096fdc0] c00000000000ebdc .sys_clone+0x5c/0x74
[c000000b7096fe30] c000000000008950 .ppc_clone+0x8/0xc


# e15ae2dd 30-Oct-2005 Jesper Juhl <jesper.juhl@gmail.com>

[PATCH] Whitespace and CodingStyle cleanup for lib/idr.c

Cleanup trailing whitespace, blank lines, CodingStyle issues etc, for
lib/idr.c

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# fd4f2df2 21-Oct-2005 Al Viro <viro@zeniv.linux.org.uk>

[PATCH] gfp_t: lib/*

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8d3b3591 23-Oct-2005 Andrew Morton <akpm@osdl.org>

[PATCH] inotify/idr leak fix

Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>

IDR trees include a cache of idr_layer objects. There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.

Add and use idr_destroy() for this. v9fs and infiniband also need to use
idr_destroy() to avoid leaks.

Or, we make the cache global, like radix_tree_preload(). Which is probably
better. Later.

Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7c657f2f 26-Aug-2005 John McCutchan <ttb@tentacle.dhs.org>

[PATCH] Document idr_get_new_above() semantics, update inotify

There is an off by one problem with idr_get_new_above.

The comment and function name suggest that it will return an id >
starting_id, but it actually returned an id >= starting_id, and kernel
callers other than inotify treated it as such.

The patch below fixes the comment, and fixes inotifys usage. The
function name still doesn't match the behaviour, but it never did.

Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 589777ea 21-Jun-2005 Zaur Kambarov <kambarov@berkeley.edu>

[PATCH] coverity: idr_get_new_above_int() overrun fix

This patch fixes overrun of array pa:
92 struct idr_layer *pa[MAX_LEVEL];

in

98 l = idp->layers;
99 pa[l--] = NULL;

by passing idp->layers, set in
202 idp->layers = layers;
to function sub_alloc in
203 v = sub_alloc(idp, ptr, &id);

Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!