History log of /linux-master/ipc/msg.c
Revision Date Author Comments
# 18319498 02-Sep-2021 Vasily Averin <vvs@virtuozzo.com>

memcg: enable accounting of ipc resources

When user creates IPC objects it forces kernel to allocate memory for
these long-living objects.

It makes sense to account them to restrict the host's memory consumption
from inside the memcg-limited container.

This patch enables accounting for IPC shared memory segments, messages
semaphores and semaphore's undo lists.

Link: https://lkml.kernel.org/r/d6507b06-4df6-78f8-6c54-3ae86e3b5339@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bc8136a5 30-Jun-2021 Vasily Averin <vvs@virtuozzo.com>

ipc: use kmalloc for msg_queue and shmid_kernel

msg_queue and shmid_kernel are quite small objects, no need to use
kvmalloc for them. mhocko@: "Both of them are 256B on most 64b systems."

Previously these objects was allocated via ipc_alloc/ipc_rcu_alloc(),
common function for several ipc objects. It had kvmalloc call inside().
Later, this function went away and was finally replaced by direct kvmalloc
call, and now we can use more suitable kmalloc/kfree for them.

Link: https://lkml.kernel.org/r/0d0b6c9b-8af3-29d8-34e2-a565c53780f3@virtuozzo.com
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a11ddb37 22-May-2021 Varad Gautam <varad.gautam@suse.com>

ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry

do_mq_timedreceive calls wq_sleep with a stack local address. The
sender (do_mq_timedsend) uses this address to later call pipelined_send.

This leads to a very hard to trigger race where a do_mq_timedreceive
call might return and leave do_mq_timedsend to rely on an invalid
address, causing the following crash:

RIP: 0010:wake_q_add_safe+0x13/0x60
Call Trace:
__x64_sys_mq_timedsend+0x2a9/0x490
do_syscall_64+0x80/0x680
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5928e40343

The race occurs as:

1. do_mq_timedreceive calls wq_sleep with the address of `struct
ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it
holds a valid `struct ext_wait_queue *` as long as the stack has not
been overwritten.

2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
__pipelined_op.

3. Sender calls __pipelined_op::smp_store_release(&this->state,
STATE_READY). Here is where the race window begins. (`this` is
`ewq_addr`.)

4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
will see `state == STATE_READY` and break.

5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
stack. (Although the address may not get overwritten until another
function happens to touch it, which means it can persist around for an
indefinite time.)

6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
`struct ext_wait_queue *`, and uses it to find a task_struct to pass to
the wake_q_add_safe call. In the lucky case where nothing has
overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
bogus address as the receiver's task_struct causing the crash.

do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add_safe on the receiver's
task_struct returned by get_task_struct, instead of dereferencing `this`
which sits on the receiver's stack.

As Manfred pointed out, the race potentially also exists in
ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix
those in the same way.

Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com
Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers")
Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers")
Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers")
Signed-off-by: Varad Gautam <varad.gautam@suse.com>
Reported-by: Matthias von Faber <matthias.vonfaber@aox-tech.de>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4b78e201 07-Jun-2020 Jules Irenge <jbi.octave@gmail.com>

ipc/msg: add missing annotation for freeque()

Sparse reports a warning at freeque()

warning: context imbalance in freeque() - unexpected unlock

The root cause is the missing annotation at freeque()

Add the missing __releases(RCU) annotation
Add the missing __releases(&msq->q_perm) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Lu Shuaibing <shuaibinglu@126.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/20200403160505.2832-2-jbi.octave@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 889b3317 03-Feb-2020 Lu Shuaibing <shuaibinglu@126.com>

ipc/msg.c: consolidate all xxxctl_down() functions

A use of uninitialized memory in msgctl_down() because msqid64 in
ksys_msgctl hasn't been initialized. The local | msqid64 | is created in
ksys_msgctl() and then passed into msgctl_down(). Along the way msqid64
is never initialized before msgctl_down() checks msqid64->msg_qbytes.

KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
reports:

==================================================================
BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022

CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
dump_stack+0x75/0xae
__kumsan_report+0x17c/0x3e6
kumsan_report+0xe/0x20
msgctl_down+0x94/0x300
ksys_msgctl.constprop.14+0xef/0x260
do_syscall_64+0x7e/0x1f0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4400e9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970
R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x100000000000000()
raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kumsan: bad access detected
==================================================================

Syzkaller reproducer:
msgctl$IPC_RMID(0x0, 0x0)

C reproducer:
// autogenerated by syzkaller (https://github.com/google/syzkaller)

int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
syscall(__NR_msgctl, 0, 0, 0);
return 0;
}

[natechancellor@gmail.com: adjust indentation in ksys_msgctl]
Link: https://github.com/ClangBuiltLinux/linux/issues/829
Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com
Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com
Signed-off-by: Lu Shuaibing <shuaibinglu@126.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
From: Andrew Morton <akpm@linux-foundation.org>
Subject: drivers/block/null_blk_main.c: fix layout

Each line here overflows 80 cols by exactly one character. Delete one tab
per line to fix.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0d97a82b 03-Feb-2020 Manfred Spraul <manfred@colorfullife.com>

ipc/msg.c: update and document memory barriers

Transfer findings from ipc/mqueue.c:

- A control barrier was missing for the lockless receive case So in
theory, not yet initialized data may have been copied to user space -
obviously only for architectures where control barriers are not NOP.

- use smp_store_release(). In theory, the refount may have been
decreased to 0 already when wake_q_add() tries to get a reference.

Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 275f2214 31-Dec-2018 Arnd Bergmann <arnd@arndb.de>

ipc: rename old-style shmctl/semctl/msgctl syscalls

The behavior of these system calls is slightly different between
architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
symbol. Most architectures that implement the split IPC syscalls don't set
that symbol and only get the modern version, but alpha, arm, microblaze,
mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.

For the architectures that so far only implement sys_ipc(), i.e. m68k,
mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
when adding the split syscalls, so we need to distinguish between the
two groups of architectures.

The method I picked for this distinction is to have a separate system call
entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
does not. The system call tables of the five architectures are changed
accordingly.

As an additional benefit, we no longer need the configuration specific
definition for ipc_parse_version(), it always does the same thing now,
but simply won't get called on architectures with the modern interface.

A small downside is that on architectures that do set
ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
that are never called. They only add a few bytes of bloat, so it seems
better to keep them compared to adding yet another Kconfig symbol.
I considered adding new syscall numbers for the IPC_64 variants for
consistency, but decided against that for now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 9afc5eee 12-Jul-2018 Arnd Bergmann <arnd@arndb.de>

y2038: globally rename compat_time to old_time32

Christoph Hellwig suggested a slightly different path for handling
backwards compatibility with the 32-bit time_t based system calls:

Rather than simply reusing the compat_sys_* entry points on 32-bit
architectures unchanged, we get rid of those entry points and the
compat_time types by renaming them to something that makes more sense
on 32-bit architectures (which don't have a compat mode otherwise),
and then share the entry points under the new name with the 64-bit
architectures that use them for implementing the compatibility.

The following types and interfaces are renamed here, and moved
from linux/compat_time.h to linux/time32.h:

old new
--- ---
compat_time_t old_time32_t
struct compat_timeval struct old_timeval32
struct compat_timespec struct old_timespec32
struct compat_itimerspec struct old_itimerspec32
ns_to_compat_timeval() ns_to_old_timeval32()
get_compat_itimerspec64() get_old_itimerspec32()
put_compat_itimerspec64() put_old_itimerspec32()
compat_get_timespec64() get_old_timespec32()
compat_put_timespec64() put_old_timespec32()

As we already have aliases in place, this patch addresses only the
instances that are relevant to the system call interface in particular,
not those that occur in device drivers and other modules. Those
will get handled separately, while providing the 64-bit version
of the respective interfaces.

I'm not renaming the timex, rusage and itimerval structures, as we are
still debating what the new interface will look like, and whether we
will need a replacement at all.

This also doesn't change the names of the syscall entry points, which can
be done more easily when we actually switch over the 32-bit architectures
to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 27c331a1 21-Aug-2018 Manfred Spraul <manfred@colorfullife.com>

ipc/util.c: further variable name cleanups

The varable names got a mess, thus standardize them again:

id: user space id. Called semid, shmid, msgid if the type is known.
Most functions use "id" already.
idx: "index" for the idr lookup
Right now, some functions use lid, ipc_addid() already uses idx as
the variable name.
seq: sequence number, to avoid quick collisions of the user space id
key: user space key, used for the rhash tree

Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eae04d25 21-Aug-2018 Davidlohr Bueso <dave@stgolabs.net>

ipc: simplify ipc initialization

Now that we know that rhashtable_init() will not fail, we can get rid of a
lot of the unnecessary cleanup paths when the call errored out.

[manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning]
Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4241c1a3 21-Aug-2018 Manfred Spraul <manfred@colorfullife.com>

ipc: rename ipcctl_pre_down_nolock()

Both the comment and the name of ipcctl_pre_down_nolock() are misleading:
The function must be called while holdling the rw semaphore.

Therefore the patch renames the function to ipcctl_obtain_check(): This
name matches the other names used in util.c:

- "obtain" function look up a pointer in the idr, without
acquiring the object lock.
- The caller is responsible for locking.
- _check means that the sequence number is checked.

Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 39cfffd7 21-Aug-2018 Manfred Spraul <manfred@colorfullife.com>

ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()

ipc_addid() is impossible to use:
- for certain failures, the caller must not use ipc_rcu_putref(),
because the reference counter is not yet initialized.
- for other failures, the caller must use ipc_rcu_putref(),
because parallel operations could be ongoing already.

The patch cleans that up, by initializing the refcount early, and by
modifying all callers.

The issues is related to the finding of
syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
issue with reading kern_ipc_perm.seq, here both read and write to already
released memory could happen.

Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 615c999c 21-Aug-2018 Manfred Spraul <manfred@colorfullife.com>

ipc: compute kern_ipc_perm.id under the ipc lock

ipc_addid() initializes kern_ipc_perm.id after having called
ipc_idr_alloc().

Thus a parallel semctl() or msgctl() that uses e.g. MSG_STAT may use this
unitialized value as the return code.

The patch moves all accesses to kern_ipc_perm.id under the spin_lock().

The issues is related to the finding of
syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
issue with kern_ipc_perm.seq

Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0eb71a9d 17-Jun-2018 NeilBrown <neilb@suse.com>

rhashtable: split rhashtable.h

Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small changes can required a large recompilation.
This makes development painful.

This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code. rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2ab975c 28-Apr-2015 Arnd Bergmann <arnd@arndb.de>

y2038: ipc: Report long times to user space

The shmid64_ds/semid64_ds/msqid64_ds data structures have been extended
to contain extra fields for storing the upper bits of the time stamps,
this patch does the other half of the job and and fills the new fields on
32-bit architectures as well as 32-bit tasks running on a 64-bit kernel
in compat mode.

There should be no change for native 64-bit tasks.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 2a70b787 12-Apr-2018 Arnd Bergmann <arnd@arndb.de>

y2038: ipc: Use ktime_get_real_seconds consistently

In some places, we still used get_seconds() instead of
ktime_get_real_seconds(), and I'm changing the remaining ones now to
all use ktime_get_real_seconds() so we use the full available range for
timestamps instead of overflowing the 'unsigned long' return value in
year 2106 on 32-bit kernels.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 23c8cec8 10-Apr-2018 Davidlohr Bueso <dave@stgolabs.net>

ipc/msg: introduce msgctl(MSG_STAT_ANY)

There is a permission discrepancy when consulting msq ipc object
metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl
command. The later does permission checks for the object vs S_IRUGO.
As such there can be cases where EACCESS is returned via syscall but the
info is displayed anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the msq metadata), this behavior goes way back and showing
all the objects regardless of the permissions was most likely an
overlook - so we are stuck with it. Furthermore, modifying either the
syscall or the procfs file can cause userspace programs to break (ie
ipcs). Some applications require getting the procfs info (without root
privileges) and can be rather slow in comparison with a syscall -- up to
500x in some reported cases for shm.

This patch introduces a new MSG_STAT_ANY command such that the msq ipc
object permissions are ignored, and only audited instead. In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Robert Kettler <robert.kettler@outlook.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 31c213f2 20-Mar-2018 Dominik Brodowski <linux@dominikbrodowski.net>

ipc: add msgsnd syscall/compat_syscall wrappers

Provide ksys_msgsnd() and compat_ksys_msgsnd() wrappers to avoid in-kernel
calls to these syscalls. The ksys_ prefix denotes that these functions are
meant as a drop-in replacement for the syscalls. In particular, they use
the same calling convention as sys_msgsnd() and compat_sys_msgsnd().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>


# 078faac9 20-Mar-2018 Dominik Brodowski <linux@dominikbrodowski.net>

ipc: add msgrcv syscall/compat_syscall wrappers

Provide ksys_msgrcv() and compat_ksys_msgrcv() wrappers to avoid in-kernel
calls to these syscalls. The ksys_ prefix denotes that these functions are
meant as a drop-in replacement for the syscalls. In particular, they use
the same calling convention as sys_msgrcv() and compat_sys_msgrcv().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>


# e340db56 20-Mar-2018 Dominik Brodowski <linux@dominikbrodowski.net>

ipc: add msgctl syscall/compat_syscall wrappers

Provide ksys_msgctl() and compat_ksys_msgctl() wrappers to avoid in-kernel
calls to these syscalls. The ksys_ prefix denotes that these functions are
meant as a drop-in replacement for the syscalls. In particular, they use
the same calling convention as sys_msgctl() and compat_sys_msgctl().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>


# 3d65661a 20-Mar-2018 Dominik Brodowski <linux@dominikbrodowski.net>

ipc: add msgget syscall wrapper

Provide ksys_msgget() wrapper to avoid in-kernel calls to this syscall.
The ksys_ prefix denotes that this function is meant as a drop-in
replacement for the syscall. In particular, it uses the same calling
convention as sys_msgget().

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>


# 50ab44b1 23-Mar-2018 Eric W. Biederman <ebiederm@xmission.com>

ipc: Directly call the security hook in ipc_ops.associate

After the last round of cleanups the shm, sem, and msg associate
operations just became trivial wrappers around the appropriate security
method. Simplify things further by just calling the security method
directly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 39a4940e 22-Mar-2018 Eric W. Biederman <ebiederm@xmission.com>

ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces

Today msg_lspid and msg_lrpid are remembered in the pid namespace of
the creator and the processes that last send or received a sysvipc
message. If you have processes in multiple pid namespaces that is
just wrong. The process ids reported will not make the least bit of
sense.

This fix is slightly more susceptible to a performance problem than
the related fix for System V shared memory. By definition the pids
are updated by msgsnd and msgrcv, the fast path of System V message
queues. The only concern over the previous implementation is the
incrementing and decrementing of the pid reference count. As that is
the only difference and multiple updates by of the task_tgid by
threads in the same process have been shown in af_unix sockets to
create a cache line ping-pong between cpus of the same processor.

In this case I don't expect cache lines holding pid reference counts
to ping pong between cpus. As senders and receivers update different
pids there is a natural separation there. Further if multiple threads
of the same process either send or receive messages the pid will be
updated to the same value and ipc_update_pid will avoid the reference
count update.

Which means in the common case I expect msg_lspid and msg_lrpid to
remain constant, and reference counts not to be updated when messages
are sent.

In rare cases it may be possible to trigger the issue which was
observed for af_unix sockets, but it will require multiple processes
with multiple threads to be either sending or receiving messages. It
just does not feel likely that anyone would do that in practice.

This change updates msgctl(..., IPC_STAT, ...) to return msg_lspid and
msg_lrpid in the pid namespace of the process calling stat.

This change also updates cat /proc/sysvipc/msg to return print msg_lspid
and msg_lrpid in the pid namespace of the process that opened the proc
file.

Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user")
Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 34b56df9 22-Mar-2018 Eric W. Biederman <ebiederm@xmission.com>

msg: Move struct msg_queue into ipc/msg.c

All of the users are now in ipc/msg.c so make the definition local to
that file to make code maintenance easier. AKA to prevent rebuilding
the entire kernel when struct msg_queue changes.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# d8c6e854 22-Mar-2018 Eric W. Biederman <ebiederm@xmission.com>

msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks

All of the implementations of security hooks that take msg_queue only
access q_perm the struct kern_ipc_perm member. This means the
dependencies of the msg_queue security hooks can be simplified by
passing the kern_ipc_perm member of msg_queue.

Making this change will allow struct msg_queue to become private to
ipc/msg.c.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 87ad4b0d 06-Feb-2018 Philippe Mikoyan <philippe.mikoyan@skat.systems>

ipc: fix ipc data structures inconsistency

As described in the title, this patch fixes <ipc>id_ds inconsistency when
<ipc>ctl_stat executes concurrently with some ds-changing function, e.g.
shmat, msgsnd or whatever.

For instance, if shmctl(IPC_STAT) is running concurrently
with shmat, following data structure can be returned:
{... shm_lpid = 0, shm_nattch = 1, ...}

Link: http://lkml.kernel.org/r/20171202153456.6514-1-philippe.mikoyan@skat.systems
Signed-off-by: Philippe Mikoyan <philippe.mikoyan@skat.systems>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6aa211e8 25-Sep-2017 Linus Torvalds <torvalds@linux-foundation.org>

fix address space warnings in ipc/

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0cfb6aee 08-Sep-2017 Guillaume Knispel <guillaume.knispel@supersonicimagine.com>

ipc: optimize semget/shmget/msgget for lots of keys

ipc_findkey() used to scan all objects to look for the wanted key. This
is slow when using a high number of keys. This change adds an rhashtable
of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

Other (more micro) benchmark results, by the author: On an i5 laptop, the
following loop executed right after a reboot took, without and with this
change:

for (int i = 0, k=0x424242; i < KEYS; ++i)
semget(k++, 1, IPC_CREAT | 0600);

total total max single max single
KEYS without with call without call with

1 3.5 4.9 µs 3.5 4.9
10 7.6 8.6 µs 3.7 4.7
32 16.2 15.9 µs 4.3 5.3
100 72.9 41.8 µs 3.7 4.7
1000 5,630.0 502.0 µs * *
10000 1,340,000.0 7,240.0 µs * *
31900 17,600,000.0 22,200.0 µs * *

*: unreliable measure: high variance

The duration for a lookup-only usage was obtained by the same loop once
the keys are present:

total total max single max single
KEYS without with call without call with

1 2.1 2.5 µs 2.1 2.5
10 4.5 4.8 µs 2.2 2.3
32 13.0 10.8 µs 2.3 2.8
100 82.9 25.1 µs * 2.3
1000 5,780.0 217.0 µs * *
10000 1,470,000.0 2,520.0 µs * *
31900 17,400,000.0 7,810.0 µs * *

Finally, executing each semget() in a new process gave, when still
summing only the durations of these syscalls:

creation:
total total
KEYS without with

1 3.7 5.0 µs
10 32.9 36.7 µs
32 125.0 109.0 µs
100 523.0 353.0 µs
1000 20,300.0 3,280.0 µs
10000 2,470,000.0 46,700.0 µs
31900 27,800,000.0 219,000.0 µs

lookup-only:
total total
KEYS without with

1 2.5 2.7 µs
10 25.4 24.4 µs
32 106.0 72.6 µs
100 591.0 352.0 µs
1000 22,400.0 2,250.0 µs
10000 2,510,000.0 25,700.0 µs
31900 28,200,000.0 115,000.0 µs

[1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
Signed-off-by: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Reviewed-by: Marc Pardo <marc.pardo@supersonicimagine.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Guillaume Knispel <guillaume.knispel@supersonicimagine.com>
Cc: Marc Pardo <marc.pardo@supersonicimagine.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 50578ea9 02-Aug-2017 Deepa Dinamani <deepa.kernel@gmail.com>

ipc: msg: Make msg_queue timestamps y2038 safe

time_t is not y2038 safe. Replace all uses of
time_t by y2038 safe time64_t.

Similarly, replace the calls to get_seconds() with
y2038 safe ktime_get_real_seconds().
Note that this preserves fast access on 64 bit systems,
but 32 bit systems need sequence counters.

The syscall interfaces themselves are not changed as part of
the patch. They will be part of a different series.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ade9f91b 02-Aug-2017 Kees Cook <keescook@chromium.org>

ipc: add missing container_of()s for randstruct

When building with the randstruct gcc plugin, the layout of the IPC
structs will be randomized, which requires any sub-structure accesses to
use container_of(). The proc display handlers were missing the needed
container_of()s since the iterator is passing in the top-level struct
kern_ipc_perm.

This would lead to crashes when running the "lsipc" program after the
system had IPC registered (e.g. after starting up Gnome):

general protection fault: 0000 [#1] PREEMPT SMP
...
RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
...
Call Trace:
sysvipc_shm_proc_show+0x5e/0x150
sysvipc_proc_show+0x1a/0x30
seq_read+0x2e9/0x3f0
...

Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b1404c2 09-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

msgrcv(2), msgsnd(2): move compat to native

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 28327fae 09-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

ipc: make use of compat ipc_perm helpers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 46939168 09-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

msgctl(): move compat to native

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 156d9ed1 09-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

msgctl(): split the actual work from copyin/copyout

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# fb259c31 12-Jul-2017 Kees Cook <keescook@chromium.org>

ipc/msg: remove special msg_alloc/free

There is nothing special about the msg_alloc/free routines any more, so
remove them to make code more readable.

[manfred@colorfullife.com: Rediff to keep rcu protection for security_msg_queue_alloc()]
Link: http://lkml.kernel.org/r/20170525185107.12869-19-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3d3653f9 12-Jul-2017 Kees Cook <keescook@chromium.org>

ipc: move atomic_set() to where it is needed

Only after ipc_addid() has succeeded will refcounting be used, so move
initialization into ipc_addid() and remove from open-coded *_alloc()
routines.

Link: http://lkml.kernel.org/r/20170525185107.12869-17-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 51c23b7b 12-Jul-2017 Manfred Spraul <manfred@colorfullife.com>

ipc/msg.c: avoid ipc_rcu_putref for failed ipc_addid()

Loosely based on a patch from Kees Cook <keescook@chromium.org>:
- id and retval can be merged
- if ipc_addid() fails, then use call_rcu() directly.

The difference is that call_rcu is used for failed ipc_addid() calls, to
continue to guaranteed an rcu delay for security_msg_queue_free().

Link: http://lkml.kernel.org/r/20170525185107.12869-16-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 52f90890 12-Jul-2017 Kees Cook <keescook@chromium.org>

ipc/msg: avoid ipc_rcu_alloc()

Instead of using ipc_rcu_alloc() which only performs the refcount bump,
open code it. This also allows for msg_queue structure layout to be
randomized in the future.

Link: http://lkml.kernel.org/r/20170525185107.12869-12-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9ef5932f 12-Jul-2017 Kees Cook <keescook@chromium.org>

ipc/msg: do not use ipc_rcu_free()

Avoid using ipc_rcu_free, since it just re-finds the original structure
pointer. For the pre-list-init failure path, there is no RCU needed,
since it was just allocated. It can be directly freed.

Link: http://lkml.kernel.org/r/20170525185107.12869-8-manfred@colorfullife.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dba4cdd3 12-Jul-2017 Manfred Spraul <manfred@colorfullife.com>

ipc: merge ipc_rcu and kern_ipc_perm

ipc has two management structures that exist for every id:
- struct kern_ipc_perm, it contains e.g. the permissions.
- struct ipc_rcu, it contains the rcu head for rcu handling and the
refcount.

The patch merges both structures.

As a bonus, we may save one cacheline, because both structures are
cacheline aligned. In addition, it reduces the number of casts, instead
most codepaths can use container_of.

To simplify code, the ipc_rcu_alloc initializes the allocation to 0.

[manfred@colorfullife.com: really include the memset() into ipc_alloc_rcu()]
Link: http://lkml.kernel.org/r/564f8612-0601-b267-514f-a9f650ec9b32@colorfullife.com
Link: http://lkml.kernel.org/r/20170525185107.12869-3-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eb61baf6 01-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Move the wake-queue types and interfaces from sched.h into <linux/sched/wake_q.h>

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 84f001e1 01-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/wake_q.h>

We are going to split <linux/sched/wake_q.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/wake_q.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 99989835 14-Dec-2016 Jiri Slaby <jirislaby@kernel.org>

ipc: msg, make msgrcv work with LONG_MIN

When LONG_MIN is passed to msgrcv, one would expect to recieve any
message. But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
undefined. In particular, with my gcc -LONG_MIN produces -LONG_MIN
again.

So handle this case properly by assigning LONG_MAX to *msgtyp if
LONG_MIN was specified as msgtyp to msgrcv.

This code:
long msg[] = { 100, 200 };
int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
msgsnd(m, &msg, sizeof(msg), 0);
msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);

produces currently nothing:

msgget(IPC_PRIVATE, IPC_CREAT|0644) = 65538
msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
msgrcv(65538, ...

Except a UBSAN warning:

UBSAN: Undefined behaviour in ipc/msg.c:745:13
negation of -9223372036854775808 cannot be represented in type 'long int':

With the patch, I see what I expect:

msgget(IPC_PRIVATE, IPC_CREAT|0644) = 0
msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16

Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 194a6b5b 17-Nov-2016 Waiman Long <longman@redhat.com>

sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q

Currently the wake_q data structure is defined by the WAKE_Q() macro.
This macro, however, looks like a function doing something as "wake" is
a verb. Even checkpatch.pl was confused as it reported warnings like

WARNING: Missing a blank line after declarations
#548: FILE: kernel/futex.c:3665:
+ int ret;
+ WAKE_Q(wake_q);

This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
what the macro is doing and eliminates the checkpatch.pl warnings.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
[ Resolved conflict and added missing rename. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ed27f912 11-Oct-2016 Davidlohr Bueso <dave@stgolabs.net>

ipc/msg: avoid waking sender upon full queue

Blocked tasks queued in q_senders waiting for their message to fit in the
queue are blindly awoken every time we think there's a remote chance this
might happen. This could cause numerous (and expensive -- thundering
herd-ish) bogus wakeups if the queue is still really full. Adding to the
scheduling cost/overhead, there's also the fact that we need to take the
ipc object lock and requeue ourselves in the q_senders list.

By keeping track of the blocked sender's message size, we can know
previously if the wakeup ought to occur or not. Otherwise, to maintain
the current wakeup order we just move it to the tail. This is exactly
what occurs right now if the sender needs to go back to sleep.

The case of EIDRM is left completely untouched, as we need to wakeup all
the tasks, and shouldn't be playing games in the first place.

This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
switches (avg out of ten runs). Although these tests are really about
functionality (as opposed to performance), is does show the direct
benefits of the optimization.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0d6a2a9 11-Oct-2016 Davidlohr Bueso <dave@stgolabs.net>

ipc/msg: make ss_wakeup() kill arg boolean

... 'tis annoying.

Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e3658538 11-Oct-2016 Davidlohr Bueso <dave@stgolabs.net>

ipc/msg: batch queue sender wakeups

Currently the use of wake_qs in sysv msg queues are only for the receiver
tasks that are blocked on the queue. But blocked sender tasks (due to
queue size constraints) still are awoken with the ipc object lock held,
which can be a problem particularly for small sized queues and far from
gracious for -rt (just like it was for the receiver side).

The paths that actually wakeup a sender are obviously related to when we
are either getting rid of the queue or after (some) space is freed-up
after a receiver takes the msg (msgrcv). Furthermore, with the exception
of msgrcv, we can always piggy-back on expunge_all that has its own tasks
lined-up for waking. Finally, upon unlinking the message, it should be no
problem delaying the wakeups a bit until after we've released the lock.

Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ee51636c 11-Oct-2016 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

ipc/msg: implement lockless pipelined wakeups

This patch moves the wakeup_process() invocation so it is not done under
the ipc global lock by making use of a lockless wake_q. With this change,
the waiter is woken up once the message has been assigned and it does not
need to loop on SMP if the message points to NULL. In the signal case we
still need to check the pointer under the lock to verify the state.

This change should also avoid the introduction of preempt_disable() in -RT
which avoids a busy-loop which pools for the NULL -> !NULL change if the
waiter has a higher priority compared to the waker.

By making use of wake_qs, the logic of sysv msg queues is greatly
simplified (and very well suited as we can batch lockless wakeups),
particularly around the lockless receive algorithm.

This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
Radeon R7, 12 Compute Cores 4C+8G":

test | before | after | diff
-----------------|------------|------------|----------
pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %

Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b24fef9 02-Aug-2016 Fabian Frederick <fabf@skynet.be>

sysv, ipc: fix security-layer leaking

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

cat /sys/kernel/debug/kmemleak

unreferenced object 0xffff88003c0a11f8 (size 8):
comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
hex dump (first 8 bytes):
1b 00 00 00 01 00 00 00 ........
backtrace:
kmemleak_alloc+0x23/0x40
kmem_cache_alloc_trace+0xe1/0x180
selinux_msg_queue_alloc_security+0x3f/0xd0
security_msg_queue_alloc+0x2e/0x40
newque+0x4e/0x150
ipcget+0x159/0x1b0
SyS_msgget+0x39/0x40
entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b9a53227 29-Sep-2015 Linus Torvalds <torvalds@linux-foundation.org>

Initialize msg/shm IPC objects before doing ipc_addid()

As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state. Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f0329:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 55b7ae50 30-Jun-2015 Davidlohr Bueso <dave@stgolabs.net>

ipc: rename ipc_obtain_object

... to ipc_obtain_object_idr, which is more meaningful and makes the code
slightly easier to follow.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ff35e5ef 30-Jun-2015 Davidlohr Bueso <dave@stgolabs.net>

ipc,msg: provide barrier pairings for lockless receive

We currently use a full barrier on the sender side to to avoid receiver
tasks disappearing on us while still performing on the sender side wakeup.
We lack however, the proper CPU-CPU interactions pairing on the receiver
side which busy-waits for the message. Similarly, we do not need a full
smp_mb, and can relax the semantics for the writer and reader sides of the
message. This is safe as we are only ordering loads and stores to r_msg.
And in both smp_wmb and smp_rmb, there are no stores after the calls
_anyway_.

This obviously applies for pipelined_send and expunge_all, for EIRDM when
destroying a queue.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7f032d6e 15-Apr-2015 Joe Perches <joe@perches.com>

ipc: remove use of seq_printf return value

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0050ee05 12-Dec-2014 Manfred Spraul <manfred@colorfullife.com>

ipc/msg: increase MSGMNI, remove scaling

SysV can be abused to allocate locked kernel memory. For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4bb6657d 06-Jun-2014 Davidlohr Bueso <davidlohr@hp.com>

ipc,msg: document volatile r_msg

The need for volatile is not obvious, document it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3440a6bd 06-Jun-2014 Davidlohr Bueso <davidlohr@hp.com>

ipc,msg: move some msgq ns code around

Nothing big and no logical changes, just get rid of some redundant
function declarations. Move msg_[init/exit]_ns down the end of the
file.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f75a2f35 06-Jun-2014 Davidlohr Bueso <davidlohr@hp.com>

ipc,msg: use current->state helpers

Call __set_current_state() instead of assigning the new state directly.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Manfred Spraul <manfred@colorfullif.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 46c0a8ca 06-Jun-2014 Paul McQuade <paulmcquad@gmail.com>

ipc, kernel: clear whitespace

trailing whitespace

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7153e402 06-Jun-2014 Paul McQuade <paulmcquad@gmail.com>

ipc, kernel: use Linux headers

Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eb66ec44 06-Jun-2014 Mathias Krause <minipli@googlemail.com>

ipc: constify ipc_ops

There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget. Just declare it static and be
done with it. While at it, constify it as we don't modify the structure
at runtime.

Found in the PaX patch, written by the PaX Team.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4f87dac3 10-Mar-2014 Michael Kerrisk <mtk.manpages@gmail.com>

ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation

While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue
copy feature" => kernel 3.8), I discovered a couple of bugs in the
implementation. The two bugs concern MSG_COPY interactions with other
msgrcv() flags, namely:

(A) MSG_COPY + MSG_EXCEPT
(B) MSG_COPY + !IPC_NOWAIT

The bugs are distinct (and the fix for the first one is obvious),
however my fix for both is a single-line patch, which is why I'm
combining them in a single mail, rather than writing two mails+patches.

===== (A) MSG_COPY + MSG_EXCEPT =====

With the addition of the MSG_COPY flag, there are now two msgrcv()
flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
argument in unrelated ways. Specifying both in the same call is a
logical error that is currently permitted, with the effect that MSG_COPY
has priority and MSG_EXCEPT is ignored. The call should give an error
if both flags are specified. The patch below implements that behavior.

===== (B) (B) MSG_COPY + !IPC_NOWAIT =====

The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC
message queue copy feature test") shows MSG_COPY being used in
conjunction with IPC_NOWAIT. In other words, if there is no message at
the position 'msgtyp'. return immediately with the error in ENOMSG.

What was not (fully) tested is the behavior if MSG_COPY is specified
*without* IPC_NOWAIT, and there is an odd behavior. If the queue
contains less than 'msgtyp' messages, then the call blocks until the
next message is written to the queue. At that point, the msgrcv() call
returns a copy of the newly added message, regardless of whether that
message is at the ordinal position 'msgtyp'. This is clearly bogus, and
problematic for applications that might want to make use of the MSG_COPY
flag.

I considered the following possible solutions to this problem:

(1) Force the call to block until a message *does* appear at the
position 'msgtyp'.

(2) If the MSG_COPY flag is specified, the kernel should implicitly add
IPC_NOWAIT, so that the call fails with ENOMSG for this case.

(3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
an error (probably, EINVAL is the right one).

I do not know if any application would really want to have the
functionality of solution (1), especially since an application can
determine in advance the number of messages in the queue using msgctl()
IPC_STAT. Obviously, this solution would be the most work to implement.

Solution (2) would have the effect of silently fixing any applications
that tried to employ broken behavior. However, it would mean that if we
later decided to implement solution (1), then user-space could not
easily detect what the kernel supports (but, since I'm somewhat doubtful
that solution (1) is needed, I'm not sure that this is much of a
problem).

Solution (3) would have the effect of informing broken applications that
they are doing something broken. The downside is that this would cause
a ABI breakage for any applications that are currently employing the
broken behavior. However:

a) Those applications are almost certainly not getting the results they
expect.
b) Possibly, those applications don't even exist, because MSG_COPY is
currently hidden behind CONFIG_CHECKPOINT_RESTORE.

The upside of solution (3) is that if we later decided to implement
solution (1), user-space could determine what the kernel supports, via
the error return.

In my view, solution (3) is mildly preferable to solution (2), and
solution (1) could still be done later if anyone really cares. The
patch below implements solution (3).

PS. For anyone out there still listening, it's the usual story:
documenting an API (and the thinking about, and the testing of the API,
that documentation entails) is the one of the single best ways of
finding bugs in the API, as I've learned from a lot of experience. Best
to do that documentation before releasing the API.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ffa571da 27-Jan-2014 Davidlohr Bueso <davidlohr@hp.com>

ipc,msg: document barriers

Both expunge_all() and pipeline_send() rely on both a nil msg value and
a full barrier to guarantee the correct ordering when waking up a task.

While its counterpart at the receiving end is well documented for the
lockless recv algorithm, we still need to document these specific
smp_mb() calls.

[akpm@linux-foundation.org: fix typo, per Mike]
[akpm@linux-foundation.org: mroe tpyos]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 239521f3 27-Jan-2014 Manfred Spraul <manfred@colorfullife.com>

ipc: whitespace cleanup

The ipc code does not adhere the typical linux coding style.
This patch fixes lots of simple whitespace errors.

- mostly autogenerated by
scripts/checkpatch.pl -f --fix \
--types=pointer_location,spacing,space_before_tab
- one manual fixup (keep structure members tab-aligned)
- removal of additional space_before_tab that were not found by --fix

Tested with some of my msg and sem test apps.

Andrew: Could you include it in -mm and move it towards Linus' tree?

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Li Bin <huawei.libin@huawei.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0f3d2b01 27-Jan-2014 Rafael Aquini <aquini@redhat.com>

ipc: introduce ipc_valid_object() helper to sort out IPC_RMID races

After the locking semantics for the SysV IPC API got improved, a couple
of IPC_RMID race windows were opened because we ended up dropping the
'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
spotted races got sorted out by re-introducing the old test within the
racy critical sections.

This patch introduces ipc_valid_object() to consolidate the way we cope
with IPC_RMID races by using the same abstraction across the API
implementation.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4271b05a 30-Sep-2013 Davidlohr Bueso <davidlohr@hp.com>

ipc,msg: prevent race with rmid in msgsnd,msgrcv

This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.

Manfred illustrates this nicely:

Assume a preemptible kernel that is preempted just after

msq = msq_obtain_object_check(ns, msqid)

in do_msgrcv(). The only lock that is held is rcu_read_lock().

Now the other thread processes IPC_RMID. When the first task is
resumed, then it will happily wait for messages on a deleted queue.

Fix this by checking for if the queue has been deleted after taking the
lock.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: <stable@vger.kernel.org> [3.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 53dad6d3 23-Sep-2013 Davidlohr Bueso <davidlohr@hp.com>

ipc: fix race with LSMs

Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition. Manfred illustrates this nicely,
for instance with shared mem and selinux:

-> do_shmat calls rcu_read_lock()
-> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
-> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
-> selinux_shm_shmat calls ipc_has_perm()
-> ipc_has_perm accesses ipc_perms->security

shm_close()
-> shm_close acquires rw_mutex & shm_lock
-> shm_close calls shm_destroy
-> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
-> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
-> ipc_free_security calls kfree(ipc_perms->security)

This patch delays the freeing of the security structures after all RCU
readers are done. Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept. Linus states:

"... the old behavior was suspect for another reason too: having the
security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server. In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4718787d 11-Sep-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: drop msg_unlock

There is only one user left, drop this function and just call
ipc_unlock_object() and rcu_read_unlock().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d9a605e4 11-Sep-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc: rename ids->rw_mutex

Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bebcb928 03-Sep-2013 Manfred Spraul <manfred@colorfullife.com>

ipc/msg.c: Fix lost wakeup in msgsnd().

The check if the queue is full and adding current to the wait queue of
pending msgsnd() operations (ss_add()) must be atomic.

Otherwise:
- the thread that performs msgsnd() finds a full queue and decides to
sleep.
- the thread that performs msgrcv() first reads all messages from the
queue and then sleeps, because the queue is empty.
- the msgrcv() calls do not perform any wakeups, because the msgsnd()
task has not yet called ss_add().
- then the msgsnd()-thread first calls ss_add() and then sleeps.

Net result: msgsnd() and msgrcv() both sleep forever.

Observed with msgctl08 from ltp with a preemptible kernel.

Fix: Call ipc_lock_object() before performing the check.

The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
- msgctl(IPC_SET) explicitely mentions that it tries to expunge any
pending operations that are not allowed anymore with the new
permissions. If security_msg_queue_msgsnd() is called without locks,
then there might be races.
- it makes the patch much simpler.

Reported-and-tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: stable@vger.kernel.org # for 3.11
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 368ae537 28-Aug-2013 Svenning Sørensen <sss@secomea.dk>

IPC: bugfix for msgrcv with msgtyp < 0

According to 'man msgrcv': "If msgtyp is less than 0, the first message of
the lowest type that is less than or equal to the absolute value of msgtyp
shall be received."

Bug: The kernel only returns a message if its type is 1; other messages
with type < abs(msgtype) will never get returned.

Fix: After having traversed the list to find the first message with the
lowest type, we need to actually return that message.

This regression was introduced by commit daaf74cf0867 ("ipc: refactor
msg list search into separate function")

Signed-off-by: Svenning Soerensen <sss@secomea.dk>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9ad66ae6 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc: remove unused functions

We can now drop the msg_lock and msg_lock_check functions along with a
bogus comment introduced previously in semctl_down.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 41a0d523 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: shorten critical region in msgrcv

do_msgrcv() is the last msg queue function that abuses the ipc lock Take
it only when needed when actually updating msq.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3dd1f784 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: shorten critical region in msgsnd

do_msgsnd() is another function that does too many things with the ipc
object lock acquired. Take it only when needed when actually updating
msq.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ac0ba20e 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: make msgctl_nolock lockless

While the INFO cmd doesn't take the ipc lock, the STAT commands do
acquire it unnecessarily. We can do the permissions and security checks
only holding the rcu lock.

This function now mimics semctl_nolock().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a5001a0d 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: introduce lockless functions to obtain the ipc object

Add msq_obtain_object() and msq_obtain_object_check(), which will allow
us to get the ipc object without acquiring the lock. Just as with
semaphores, these functions are basically wrappers around
ipc_obtain_object*().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2cafed30 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: introduce msgctl_nolock

Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
can be performed without acquiring the ipc object.

Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
out of msgctl(). This change still takes the lock and it will be
properly lockless in the next patch

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 15724ecb 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc,msg: shorten critical region in msgctl_down

Instead of holding the ipc lock for the entire function, use the
ipcctl_pre_down_nolock and only acquire the lock for specific commands:
RMID and SET.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7b4cc5d8 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc: move locking out of ipcctl_pre_down_nolock

This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf9d5d78 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc: close open coded spin lock calls

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dbfcd91f 08-Jul-2013 Davidlohr Bueso <davidlohr.bueso@hp.com>

ipc: move rcu lock out of ipc_addid

This patchset continues the work that began in the sysv ipc semaphore
scaling series, see

https://lkml.org/lkml/2013/3/20/546

Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks.

This patchset mostly deals with mqueues, and while shared mem can be
done in a very similar way, I want to get these patches out in the open
first. It also does some pending cleanups, mostly focused on the two
level locking we have in ipc code, taking care of ipc_addid() and
ipcctl_pre_down_nolock() - yes there are still functions that need to be
updated as well.

This patch:

Make all callers explicitly take and release the RCU read lock.

This addresses the two level locking seen in newary(), newseg() and
newqueue(). For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions. The next patch will deal with the open coded locking for
->perm.lock

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 41239fe8 30-Apr-2013 Nikola Pajkovsky <npajkovs@redhat.com>

ipc/msg.c: use list_for_each_entry_[safe] for list traversing

The ipc/msg.c code does its list operations by hand and it open-codes the
accesses, instead of using for_each_entry_[safe].

Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6062a8dc 30-Apr-2013 Rik van Riel <riel@surriel.com>

ipc,sem: fine grained locking for semtimedop

Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.

If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.

If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

vanilla Davidlohr's Davidlohr's + Davidlohr's +
threads patches rwlock patches v3 patches
10 610652 726325 1783589 2142206
20 341570 365699 1520453 1977878
30 288102 307037 1498167 2037995
40 290714 305955 1612665 2256484
50 288620 312890 1733453 2650292
60 289987 306043 1649360 2388008
70 291298 306347 1723167 2717486
80 290948 305662 1729545 2763582
90 290996 306680 1736021 2757524
100 292243 306700 1773700 3059159

[davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
[davidlohr.bueso@hp.com: make refcounter atomic]
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# daaf74cf 30-Apr-2013 Peter Hurley <peter@hurleysoftware.com>

ipc: refactor msg list search into separate function

[fengguang.wu@intel.com: find_msg can be static]
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d076ac91 30-Apr-2013 Peter Hurley <peter@hurleysoftware.com>

ipc: simplify msg list search

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8ac6ed58 30-Apr-2013 Peter Hurley <peter@hurleysoftware.com>

ipc: implement MSG_COPY as a new receive mode

Teach the helper routines about MSG_COPY so that msgtyp is preserved as
the message number to copy.

The security functions affected by this change were audited and no
additional changes are necessary.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 852028af 30-Apr-2013 Peter Hurley <peter@hurleysoftware.com>

ipc: remove msg handling from queue scan

In preparation for refactoring the queue scan into a separate
function, relocate msg copying.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2dc958fa 01-Apr-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: set msg back to -EAGAIN if copy wasn't performed

Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found. This
garantees that msg is either a error pointer or a copy address.

Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 88b9e456 08-Mar-2013 Peter Hurley <peter@hurleysoftware.com>

ipc: don't allocate a copy larger than max

When MSG_COPY is set, a duplicate message must be allocated for the copy
before locking the queue. However, the copy could not be larger than was
sent which is limited to msg_ctlmax.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3fcfe786 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: add more comments to message copying related code

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 51eeacaa 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: simplify message copying

Remove the redundant and confusing fill_copy(). Also add copy_msg()
check for error. In this case exit from the function have to be done
instead of break, because further code interprets any error as EAGAIN.

Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
disabled.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b30efe27 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: convert prepare_copy() from macro to function

This code works if CONFIG_CHECKPOINT_RESTORE is disabled.

[akpm@linux-foundation.org: remove __maybe_unused]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 85398aa8 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: simplify free_copy() call

Passing and checking of msgflg to free_copy() is redundant. This patch
sets copy to NULL on declaration instead and checks for non-NULL in
free_copy().

Note: in case of copy allocation failure, error is returned immediately.
So no need to check for IS_ERR() in free_copy().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4a674f34 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: introduce message queue copy feature

This patch is required for checkpoint/restore in userspace.

c/r requires some way to get all pending IPC messages without deleting
them from the queue (checkpoint can fail and in this case tasks will be
resumed, so queue have to be valid).

To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
was introduced. If this flag was specified, then mtype is interpreted as
number of the message to copy.

If MSG_COPY is set, then kernel will allocate dummy message with passed
size, and then use new copy_msg() helper function to copy desired message
(instead of unlinking it from the queue).

Notes:

1) Return -ENOSYS if MSG_COPY is specified, but
CONFIG_CHECKPOINT_RESTORE is not set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f9dd87f4 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: message queue receive cleanup

Move all message related manipulation into one function msg_fill().
Actually, two functions because of the compat one.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9afdacda 04-Jan-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

ipc: remove forced assignment of selected message

This is a cleanup patch. The assignment is redundant.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1efdb69b 07-Feb-2012 Eric W. Biederman <ebiederm@xmission.com>

userns: Convert ipc to use kuid and kgid where appropriate

- Store the ipc owner and creator with a kuid
- Store the ipc group and the crators group with a kgid.
- Add error handling to ipc_update_perms, allowing it to
fail if the uids and gids can not be converted to kuids
or kgids.
- Modify the proc files to display the ipc creator and
owner in the user namespace of the opener of the proc file.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# b0e77598 23-Mar-2011 Serge E. Hallyn <serge@hallyn.com>

userns: user namespaces: convert several capable() calls

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4be929be 24-May-2010 Alexey Dobriyan <adobriyan@gmail.com>

kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 7d6feeb2 15-Dec-2009 Serge E. Hallyn <serue@us.ibm.com>

ipc ns: fix memory leak (idr)

We have apparently had a memory leak since
7ca7e564e049d8b350ec9d958ff25eaa24226352 "ipc: store ipcs into IDRs" in
2007. The idr of which 3 exist for each ipc namespace is never freed.

This patch simply frees them when the ipcns is freed. I don't believe any
idr_remove() are done from rcu (and could therefore be delayed until after
this idr_destroy()), so the patch should be safe. Some quick testing
showed no harm, and the memory leak fixed.

Caught by kmemleak.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f1970c48 18-Oct-2009 Felipe Contreras <felipe.contreras@gmail.com>

ipc: fix unused variable warning

Commit a0d092f introduced the following warning:
ipc/msg.c: In function ?msgctl_down?:
ipc/msg.c:415: warning: ?msqid64? may be used uninitialized in this function

The gcc warning in this case is actually bogus, as msqid64 is touched only
iff cmd == IPC_SET, and in such case, copy_msqid_from_user() initializes
it properly.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# e48fbb69 14-Jan-2009 Heiko Carstens <hca@linux.ibm.com>

[CVE-2009-0029] System call wrappers part 24

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>


# dfcceb26 05-Jun-2008 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: only output msgmni value at boot time

When posting:
[PATCH 1/8] Scaling msgmni to the amount of lowmem
(see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message
that is output each time msgmni is recomputed.

In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this
message references an ipc namespace address that is useless.

I first thought of using an audit_log instead of a printk, as suggested by
Serge Hallyn. But unfortunately, we do not have any other information
than the namespace address to provide here too. So I chose to move the
message and output it only at boot time, removing the reference to the
namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 44f564a4 29-Apr-2008 Zhang, Yanmin <yanmin_zhang@linux.intel.com>

ipc: add definitions of USHORT_MAX and others

Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
implementation might also use it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Pierre Peiffer" <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a5f75e7f 29-Apr-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: consolidate all xxxctl_down() functions

semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
of commands for each kind of IPC. They all start to do the same job (they
retrieve the ipc and do some permission checks) before handling the commands
on their own.

This patch proposes to consolidate this by moving these same pieces of code
into one common function called ipcctl_pre_down().

It simplifies a little these xxxctl_down() functions and increases a little
the maintainability.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8f4a3809 29-Apr-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: introduce ipc_update_perm()

The IPC_SET command performs the same permission setting for all IPCs. This
patch introduces a common ipc_update_perm() function to update these
permissions and makes use of it for all IPCs.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 016d7132 29-Apr-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: get rid of the use *_setbuf structure.

All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
command. This is not really needed and, moreover, it complicates a little bit
the code.

This patch gets rid of the use of it and uses directly the semid64_ds/
msgid64_ds/shmid64_ds structure.

In addition of removing one struture declaration, it also simplifies and
improves a little bit the common 64-bits path.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0d092fc 29-Apr-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC/message queues: introduce msgctl_down

Currently, sys_msgctl is not easy to read.

This patch tries to improve that by introducing the msgctl_down function to
handle all commands requiring the rwmutex to be taken in write mode (ie
IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
for message queues.

This greatly changes the readability of sys_msgctl and also harmonizes the way
these commands are handled among all IPCs.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b6b337ad 29-Apr-2008 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: recompute msgmni on memory add / remove

Introduce the registration of a callback routine that recomputes msg_ctlmni
upon memory add / remove.

A single notifier block is registered in the hotplug memory chain for all the
ipc namespaces.

Since the ipc namespaces are not linked together, they have their own
notification chain: one notifier_block is defined per ipc namespace.

Each time an ipc namespace is created (removed) it registers (unregisters) its
notifier block in (from) the ipcns chain. The callback routine registered in
the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
Each callback routine registered in the ipcns namespace, in turn, recomputes
msgmni for the owning namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4d89dc6a 29-Apr-2008 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: scale msgmni to the number of ipc namespaces

Since all the namespaces see the same amount of memory (the total one) this
patch introduces a new variable that counts the ipc namespaces and divides
msg_ctlmni by this counter.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f7bf3df8 29-Apr-2008 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: scale msgmni to the amount of lowmem

On large systems we'd like to allow a larger number of message queues. In
some cases up to 32K. However simply setting MSGMNI to a larger value may
cause problems for smaller systems.

The first patch of this series introduces a default maximum number of message
queue ids that scales with the amount of lowmem.

Since msgmni is per namespace and there is no amount of memory dedicated to
each namespace so far, the second patch of this series scales msgmni to the
number of ipc namespaces too.

Since msgmni depends on the amount of memory, it becomes necessary to
recompute it upon memory add/remove. In the 4th patch, memory hotplug
management is added: a notifier block is registered into the memory hotplug
notifier chain for the ipc subsystem. Since the ipc namespaces are not linked
together, they have their own notification chain: one notifier_block is
defined per ipc namespace. Each time an ipc namespace is created (removed) it
registers (unregisters) its notifier block in (from) the ipcns chain. The
callback routine registered in the memory chain invokes the ipcns notifier
chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the
ipcns namespace, in turn, recomputes msgmni for the owning namespace.

The 5th patch makes it possible to keep the memory hotplug notifier chain's
lock for a lesser amount of time: instead of directly notifying the ipcns
notifier chain upon memory add/remove, a work item is added to the global
workqueue. When activated, this work item is the one who notifies the ipcns
notifier chain.

Since msgmni depends on the number of ipc namespaces, it becomes necessary to
recompute it upon ipc namespace creation / removal. The 6th patch uses the
ipc namespace notifier chain for that purpose: that chain is notified each
time an ipc namespace is created or removed. This makes it possible to
recompute msgmni for all the namespaces each time one of them is created or
removed.

When msgmni is explicitely set from userspace, we should avoid recomputing it
upon memory add/remove or ipcns creation/removal. This is what the 7th patch
does: it simply unregisters the ipcns callback routine as soon as msgmni has
been changed from procfs or sysctl().

Even if msgmni is set by hand, it should be possible to make it back
automatically recomputed upon memory add/remove or ipcns creation/removal.
This what is achieved in patch 8: if set to a negative value, msgmni is added
back to the ipcns notifier chain, making it automatically recomputed again.

This patch:

Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is
now set to make the message queues occupy 1/32 of the available lowmem.

Some cleaning has also been done for the MSGPOOL constant: the msgctl man page
says it's not used, but it also defines it as a size in bytes (the code
expresses it in Kbytes).

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 48dea404 29-Apr-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: use ipc_buildid() directly from ipc_addid()

By continuing to consolidate a little the IPC code, each id can be built
directly in ipc_addid() instead of having it built from each callers of
ipc_addid()

And I also remove shm_addid() in order to have, as much as possible, the
same code for shm/sem/msg.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 01b8b07a 08-Feb-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: consolidate sem_exit_ns(), msg_exit_ns() and shm_exit_ns()

sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type. But in fact, they
do the same thing: they loop around all ipcs to free them individually by
calling a specific routine.

This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that do the job. The specific routine to call on each
individual ipcs is passed as parameter. For this, these ipc-specific
'free' routines are reworked to take a generic 'struct ipc_perm' as
parameter.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ed2ddbf8 08-Feb-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: make struct ipc_ids static in ipc_namespace

Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
are dynamically allocated for each icp_namespace as the ipc_namespace
itself (for the init namespace, they are initialized with pointers to
static variables instead)

It is so for historical reason: in fact, before the use of idr to store the
ipcs, the ipcs were stored in tables of variable length, depending of the
maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
size. As they are allocated in any cases for each new ipc_namespace, there
is no gain of memory in having them allocated separately of the struct
ipc_namespace.

This patch proposes to make this table static in the struct ipc_namespace.
Thus, we can allocate all in once and get rid of all the code needed to
allocate and free these ipc_ids separately.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ae5e1b22 08-Feb-2008 Pavel Emelyanov <xemul@openvz.org>

namespaces: move the IPC namespace under IPC_NS option

Currently the IPC namespace management code is spread over the ipc/*.c files.
I moved this code into ipc/namespace.c file which is compiled out when needed.

The linux/ipc_namespace.h file is used to store the prototypes of the
functions in namespace.c and the stubs for NAMESPACES=n case. This is done
so, because the stub for copy_ipc_namespace requires the knowledge of the
CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
included into many many .c files via the sys.h->sem.h sequence so adding the
sched.h into it will make all these .c depend on sched.h which is not that
good. On the other hand the knowledge about the namespaces stuff is required
in 4 .c files only.

Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
msg.c and shm.c files. It turned out that moving these functions into
namespaces.c is not that easy because they use many other calls and macros
from the original file. Moving them would make this patch complicated. On
the other hand all these functions can be consolidated, so I will send a
separate patch doing this a bit later.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1ed88b4 06-Feb-2008 Pierre Peiffer <pierre.peiffer@bull.net>

IPC: fix error check in all new xxx_lock() and xxx_exit_ns() functions

In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we
use the return value of ipc_lock() in container_of() without any check.
But ipc_lock may return a errcode. The use of this errcode in
container_of() may alter this errcode, and we don't want this.

And in xxx_exit_ns, the pointer return by idr_find is of type 'struct
kern_ipc_per'...

Today, the code will work as is because the member used in these
container_of() is the first member of its container (offset == 0), the
errcode isn't changed then. But in the general case, we can't count on
this assumption and this may lead later to a real bug if we don't correct
this.

Again, the proposed solution is simple and correct. But, as pointed by
Nadia, with this solution, the same check will be done several times (in
all sub-callers...), what is not very funny/optimal...

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 283bb7fa 19-Oct-2007 Pierre Peiffer <Pierre.Peiffer@bull.net>

IPC: fix error case when idr-cache is empty in ipcget()

With the use of idr to store the ipc, the case where the idr cache is
empty, when idr_get_new is called (this may happen even if we call
idr_pre_get() before), is not well handled: it lets
semget()/shmget()/msgget() return ENOSPC when this cache is empty, what 1.
does not reflect the facts and 2. does not conform to the man(s).

This patch fixes this by retrying the whole process of allocation in this case.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3ac88a41 19-Oct-2007 Kirill Korotaev <dev@openvz.org>

virtualization of sysv msg queues is incomplete

Virtualization of sysv msg queues is incomplete: msg_hdrs and msg_bytes
variables visible from userspace are global. Let's make them
per-namespace.

Signed-off-by: Alexey Kuznetsov <alexey@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1b531f21 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: remove unneeded parameters

Remvoe the unneeded parameters from ipc_checkid() and ipc_buildid()
interfaces.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3e148c79 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

fix idr_find() locking

This is a patch that fixes the way idr_find() used to be called in ipc_lock():
in all the paths that don't imply an update of the ipcs idr, it was called
without the idr tree being locked.

The changes are:
. in ipc_ids, the mutex has been changed into a reader/writer semaphore.
. ipc_lock() now takes the mutex as a reader during the idr_find().
. a new routine ipc_lock_down() has been defined: it doesn't take the
mutex, assuming that it is being held by the caller. This is the routine
that is now called in all the update paths.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f4566f04 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: fix wrong comments

This patch fixes the wrong / obsolete comments in the ipc code. Also adds
a missing lock around ipc_get_maxid() in shm_get_stat().

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 03f02c76 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

Storing ipcs into IDRs

This patch converts casts of struct kern_ipc_perm to
. struct msg_queue
. struct sem_array
. struct shmid_kernel
into the equivalent container_of() macro. It improves code maintenance
because the code need not change if kern_ipc_perm is no longer at the
beginning of the containing struct.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 023a5355 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: integrate ipc_checkid() into ipc_lock()

This patch introduces a new ipc_lock_check() routine interface:
. each time ipc_checkid() is called, this is done after calling ipc_lock().
ipc_checkid() is now called from inside ipc_lock_check().

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix RCU locking]
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7748dbfa 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: unify the syscalls code

This patch introduces a change into the sys_msgget(), sys_semget() and
sys_shmget() routines: they now share a common code, which is better for
maintainability.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ca7e564 19-Oct-2007 Nadia Derbey <Nadia.Derbey@bull.net>

ipc: store ipcs into IDRs

This patch introduces ipcs storage into IDRs. The main changes are:
. This ipc_ids structure is changed: the entries array is changed into a
root idr structure.
. The grow_ary() routine is removed: it is not needed anymore when adding
an ipc structure, since we are now using the IDR facility.
. The ipc_rmid() routine interface is changed:
. there is no need for this routine to return the pointer passed in as
argument: it is now declared as a void
. since the id is now part of the kern_ipc_perm structure, no need to
have it as an argument to the routine

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b488893a 19-Oct-2007 Pavel Emelyanov <xemul@openvz.org>

pid namespaces: changes to show virtual ids to user

This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8e1c091c 17-Jul-2007 Jeff Garzik <jeff@garzik.org>

arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()

Mark variables with uninitialized_var() if such a warning appears,
and analysis proves that the var is initialized properly on all paths
it is used.

Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 7d69a1f4 16-Jul-2007 Cedric Le Goater <clg@fr.ibm.com>

remove CONFIG_UTS_NS and CONFIG_IPC_NS

CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 651971cb 06-Dec-2006 suzuki <suzuki@linux.vnet.ibm.com>

[PATCH] Fix the size limit of compat space msgsize

Currently we allocate 64k space on the user stack and use it the msgbuf for
sys_{msgrcv,msgsnd} for compat and the results are later copied in user [
by copy_in_user]. This patch introduces helper routines for
sys_{msgrcv,msgsnd} as below:

do_msgsnd() : Accepts the mtype and user space ptr to the buffer along with
the msqid and msgflg.

do_msgrcv() : Accepts a kernel space ptr to mtype and a userspace ptr to
the buffer. The mtype has to be copied back the user space msgbuf by the
caller.

These changes avoid the need to allocate the msgsize on the userspace (
thus removing the size limt ) and the overhead of an extra copy_in_user().

Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 80491eb9 04-Nov-2006 Linus Torvalds <torvalds@g5.osdl.org>

Revert unintentional "volatile" changes in ipc/msg.c

Commit 5a06a363ef48444186f18095ae1b932dddbbfa89 ("[PATCH] ipc/msg.c:
clean up coding style") breaks fakeroot on Alpha (variously hangs or
oopses), according to a report by Falk Hueffner.

The fact that the code seems to rely on compiler access ordering through
the use of "volatile" is a pretty certain sign that the code has locking
problems, and we should fix those properly and then remove the whole
"volatile" entirely.

But in the meantime, the movement of "volatile" was unintentional, and
should be reverted.

Cc: Falk Hueffner <falk@debian.org>
Cc: Andrew Morton <akpm@osdl.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c7e12b83 02-Nov-2006 Pavel Emelianov <xemul@openvz.org>

[PATCH] Fix ipc entries removal

Fix two issuses related to ipc_ids->entries freeing.

1. When freeing ipc namespace we need to free entries allocated
with ipc_init_ids().

2. When removing old entries in grow_ary() ipc_rcu_putref()
may be called on entries set to &ids->nullentry earlier in
ipc_init_ids().
This is almost impossible without namespaces, but with
them this situation becomes possible.

Found during OpenVZ testing after obvious leaks in beancounters.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1e786937 02-Oct-2006 Kirill Korotaev <dev@openvz.org>

[PATCH] IPC namespace - msg

IPC namespace support for IPC msg code.

Signed-off-by: Pavel Emelianiov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5a06a363 30-Jul-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] ipc/msg.c: clean up coding style

Clean up ipc/msg.c to conform to Documentation/CodingStyle. (before it was
an inconsistent hodepodge of various coding styles)

Verified that the before/after .o's are identical.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# ac03221a 16-May-2006 Linda Knippers <linda.knippers@hp.com>

[PATCH] update of IPC audit record cleanup

The following patch addresses most of the issues with the IPC_SET_PERM
records as described in:
https://www.redhat.com/archives/linux-audit/2006-May/msg00010.html
and addresses the comments I received on the record field names.

To summarize, I made the following changes:

1. Changed sys_msgctl() and semctl_down() so that an IPC_SET_PERM
record is emitted in the failure case as well as the success case.
This matches the behavior in sys_shmctl(). I could simplify the
code in sys_msgctl() and semctl_down() slightly but it would mean
that in some error cases we could get an IPC_SET_PERM record
without an IPC record and that seemed odd.

2. No change to the IPC record type, given no feedback on the backward
compatibility question.

3. Removed the qbytes field from the IPC record. It wasn't being
set and when audit_ipc_obj() is called from ipcperms(), the
information isn't available. If we want the information in the IPC
record, more extensive changes will be necessary. Since it only
applies to message queues and it isn't really permission related, it
doesn't seem worth it.

4. Removed the obj field from the IPC_SET_PERM record. This means that
the kern_ipc_perm argument is no longer needed.

5. Removed the spaces and renamed the IPC_SET_PERM field names. Replaced iuid and
igid fields with ouid and ogid in the IPC record.

I tested this with the lspp.22 kernel on an x86_64 box. I believe it
applies cleanly on the latest kernel.

-- ljk

Signed-off-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 073115d6 02-Apr-2006 Steve Grubb <sgrubb@redhat.com>

[PATCH] Rework of IPC auditing

1) The audit_ipc_perms() function has been split into two different
functions:
- audit_ipc_obj()
- audit_ipc_set_perm()

There's a key shift here... The audit_ipc_obj() collects the uid, gid,
mode, and SElinux context label of the current ipc object. This
audit_ipc_obj() hook is now found in several places. Most notably, it
is hooked in ipcperms(), which is called in various places around the
ipc code permforming a MAC check. Additionally there are several places
where *checkid() is used to validate that an operation is being
performed on a valid object while not necessarily having a nearby
ipcperms() call. In these locations, audit_ipc_obj() is called to
ensure that the information is captured by the audit system.

The audit_set_new_perm() function is called any time the permissions on
the ipc object changes. In this case, the NEW permissions are recorded
(and note that an audit_ipc_obj() call exists just a few lines before
each instance).

2) Support for an AUDIT_IPC_SET_PERM audit message type. This allows
for separate auxiliary audit records for normal operations on an IPC
object and permissions changes. Note that the same struct
audit_aux_data_ipcctl is used and populated, however there are separate
audit_log_format statements based on the type of the message. Finally,
the AUDIT_IPC block of code in audit_free_aux() was extended to handle
aux messages of this new type. No more mem leaks I hope ;-)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5f921ae9 26-Mar-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] sem2mutex: ipc, id.sem

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8cd5283b 24-Mar-2006 Eric Sesterhenn <snakebyte@gmx.de>

BUG_ON() Conversion in ipc/msg.c

this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 8c8570fb 03-Nov-2005 Dustin Kirkland <dustin.kirkland@us.ibm.com>

[PATCH] Capture selinux subject/object context information.

This patch extends existing audit records with subject/object context
information. Audit records associated with filesystem inodes, ipc, and
tasks now contain SELinux label information in the field "subj" if the
item is performing the action, or in "obj" if the item is the receiver
of an action.

These labels are collected via hooks in SELinux and appended to the
appropriate record in the audit code.

This additional information is required for Common Criteria Labeled
Security Protection Profile (LSPP).

[AV: fixed kmalloc flags use]
[folded leak fixes]
[folded cleanup from akpm (kfree(NULL)]
[folded audit_inode_context() leak fix]
[folded akpm's fix for audit_ipc_perm() definition in case of !CONFIG_AUDIT]

Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 624dffcb 14-Jan-2006 Christian Kujau <evil@g-house.de>

correct email address of Manfred Spraul

I tried to send the forcedeth maintainer an email, but it came back with:

"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."

This patch fixes this.

Signed-off-by: Adrian Bunk <bunk@stusta.de>


# c59ede7b 11-Jan-2006 Randy Dunlap <rdunlap@infradead.org>

[PATCH] move capable() to capability.h

- Move capable() from sched.h to capability.h;

- Use <linux/capability.h> where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 19b4946c 06-Sep-2005 Mike Waychison <mikew@google.com>

[PATCH] ipc: convert /proc/sysvipc/* to generic seq_file interface

Change the /proc/sysvipc/shm|sem|msg files to use the generic seq_file
implementation for struct ipc_ids.

Signed-off-by: Mike Waychison <mikew@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!