History log of /linux-master/drivers/connector/cn_proc.c
Revision Date Author Comments
# 8929f95b 09-Feb-2024 Keqi Wang <wangkeqi_chris@163.com>

connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"

This reverts commit c46bfba1337d ("connector: Fix proc_event_num_listeners
count not cleared").

It is not accurate to reset proc_event_num_listeners according to
cn_netlink_send_mult() return value -ESRCH.

In the case of stress-ng netlink-proc, -ESRCH will always be returned,
because netlink_broadcast_filtered will return -ESRCH,
which may cause stress-ng netlink-proc performance degradation.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401112259.b23a1567-oliver.sang@intel.com
Fixes: c46bfba1337d ("connector: Fix proc_event_num_listeners count not cleared")
Signed-off-by: Keqi Wang <wangkeqi_chris@163.com>
Link: https://lore.kernel.org/r/20240209091659.68723-1-wangkeqi_chris@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# c46bfba1 22-Dec-2023 wangkeqi <wangkeqiwang@didiglobal.com>

connector: Fix proc_event_num_listeners count not cleared

When we register a cn_proc listening event, the proc_event_num_listener
variable will be incremented by one, but if PROC_CN_MCAST_IGNORE is
not called, the count will not decrease.
This will cause the proc_*_connector function to take the wrong path.
It will reappear when the forkstat tool exits via ctrl + c.
We solve this problem by determining whether
there are still listeners to clear proc_event_num_listener.

Signed-off-by: wangkeqi <wangkeqiwang@didiglobal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9644bc49 20-Oct-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

Fix NULL pointer dereference in cn_filter()

Check that sk_user_data is not NULL, else return from cn_filter().
Could not reproduce this issue, but Oliver Sang verified it has fixed
the "Closes" problem below.

Fixes: 2aa1f7a1f47c ("connector/cn_proc: Add filtering to fix some bugs")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202309201456.84c19e27-oliver.sang@intel.com/
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Link: https://lore.kernel.org/r/20231020234058.2232347-1-anjali.k.kulkarni@oracle.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# bfdfdc2f 19-Jul-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

connector/cn_proc: Allow non-root users access

There were a couple of reasons for not allowing non-root users access
initially - one is there was some point no proper receive buffer
management in place for netlink multicast. But that should be long
fixed. See link below for more context.

Second is that some of the messages may contain data that is root only. But
this should be handled with a finer granularity, which is being done at the
protocol layer. The only problematic protocols are nf_queue and the
firewall netlink. Hence, this restriction for non-root access was relaxed
for NETLINK_ROUTE initially:
https://lore.kernel.org/all/20020612013101.A22399@wotan.suse.de/

This restriction has also been removed for following protocols:
NETLINK_KOBJECT_UEVENT, NETLINK_AUDIT, NETLINK_SOCK_DIAG,
NETLINK_GENERIC, NETLINK_SELINUX.

Since process connector messages are not sensitive (process fork, exit
notifications etc.), and anyone can read /proc data, we can allow non-root
access here. However, since process event notification is not the only
consumer of NETLINK_CONNECTOR, we can make this change even more
fine grained than the protocol level, by checking for multicast group
within the protocol.

Allow non-root access for NETLINK_CONNECTOR via NL_CFG_F_NONROOT_RECV
but add new bind function cn_bind(), which allows non-root access only
for CN_IDX_PROC multicast group.

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 743acf35 19-Jul-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

connector/cn_proc: Performance improvements

This patch adds the capability to filter messages sent by the proc
connector on the event type supplied in the message from the client
to the connector. The client can register to listen for an event type
given in struct proc_input.

This event based filteting will greatly enhance performance - handling
8K exits takes about 70ms, whereas 8K-forks + 8K-exits takes about 150ms
& handling 8K-forks + 8K-exits + 8K-execs takes 200ms. There are currently
9 different types of events, and we need to listen to all of them. Also,
measuring the time using pidfds for monitoring 8K process exits took
much longer - 200ms, as compared to 70ms using only exit notifications of
proc connector.

We also add a new event type - PROC_EVENT_NONZERO_EXIT, which is
only sent by kernel to a listening application when any process exiting,
has a non-zero exit status. This will help the clients like Oracle DB,
where a monitoring process wants notfications for non-zero process exits
so it can cleanup after them.

This kind of a new event could also be useful to other applications like
Google's lmkd daemon, which needs a killed process's exit notification.

The patch takes care that existing clients using old mechanism of not
sending the event type work without any changes.

cn_filter function checks to see if the event type being notified via
proc connector matches the event type requested by client, before
sending(matches) or dropping(does not match) a packet.

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2aa1f7a1 19-Jul-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

connector/cn_proc: Add filtering to fix some bugs

The current proc connector code has the foll. bugs - if there are more
than one listeners for the proc connector messages, and one of them
deregisters for listening using PROC_CN_MCAST_IGNORE, they will still get
all proc connector messages, as long as there is another listener.

Another issue is if one client calls PROC_CN_MCAST_LISTEN, and another one
calls PROC_CN_MCAST_IGNORE, then both will end up not getting any messages.

This patch adds filtering and drops packet if client has sent
PROC_CN_MCAST_IGNORE. This data is stored in the client socket's
sk_user_data. In addition, we only increment or decrement
proc_event_num_listeners once per client. This fixes the above issues.

cn_release is the release function added for NETLINK_CONNECTOR. It uses
the newly added netlink_release function added to netlink_sock. It will
free sk_user_data.

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42c66d16 25-Jan-2022 Leo Yan <leo.yan@linaro.org>

connector/cn_proc: Use task_is_in_init_pid_ns()

This patch replaces open code with task_is_in_init_pid_ns() to check if
a task is in root PID namespace.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3e92fd7b 27-May-2020 Mike Galbraith <umgwanakikbuti@gmail.com>

connector/cn_proc: Protect send_msg() with a local lock

send_msg() disables preemption to avoid out-of-order messages. As the
code inside the preempt disabled section acquires regular spinlocks,
which are converted to 'sleeping' spinlocks on a PREEMPT_RT kernel and
eventually calls into a memory allocator, this conflicts with the RT
semantics.

Convert it to a local_lock which allows RT kernels to substitute them with
a real per CPU lock. On non RT kernels this maps to preempt_disable() as
before. No functional change.

[bigeasy: Patch description]

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200527201119.1692513-6-bigeasy@linutronix.de


# 1a59d1b8 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1334 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6d2b0f02 05-Mar-2019 Li RongQing <lirongqing@baidu.com>

connector: fix unsafe usage of ->real_parent

proc_exit_connector() uses ->real_parent lockless. This is not
safe that its parent can go away at any moment, so use RCU to
protect it, and ensure that this task is not released.

[ 747.624551] ==================================================================
[ 747.632946] BUG: KASAN: use-after-free in proc_exit_connector+0x1f7/0x310
[ 747.640686] Read of size 4 at addr ffff88a0276988e0 by task sshd/2882
[ 747.648032]
[ 747.649804] CPU: 11 PID: 2882 Comm: sshd Tainted: G E 4.19.26-rc2 #11
[ 747.658629] Hardware name: IBM x3550M4 -[7914OFV]-/00AM544, BIOS -[D7E142BUS-1.71]- 07/31/2014
[ 747.668419] Call Trace:
[ 747.671269] dump_stack+0xf0/0x19b
[ 747.675186] ? show_regs_print_info+0x5/0x5
[ 747.679988] ? kmsg_dump_rewind_nolock+0x59/0x59
[ 747.685302] print_address_description+0x6a/0x270
[ 747.691162] kasan_report+0x258/0x380
[ 747.695835] ? proc_exit_connector+0x1f7/0x310
[ 747.701402] proc_exit_connector+0x1f7/0x310
[ 747.706767] ? proc_coredump_connector+0x2d0/0x2d0
[ 747.712715] ? _raw_write_unlock_irq+0x29/0x50
[ 747.718270] ? _raw_write_unlock_irq+0x29/0x50
[ 747.723820] ? ___preempt_schedule+0x16/0x18
[ 747.729193] ? ___preempt_schedule+0x16/0x18
[ 747.734574] do_exit+0xa11/0x14f0
[ 747.738880] ? mm_update_next_owner+0x590/0x590
[ 747.744525] ? debug_show_all_locks+0x3c0/0x3c0
[ 747.761448] ? ktime_get_coarse_real_ts64+0xeb/0x1c0
[ 747.767589] ? lockdep_hardirqs_on+0x1a6/0x290
[ 747.773154] ? check_chain_key+0x139/0x1f0
[ 747.778345] ? check_flags.part.35+0x240/0x240
[ 747.783908] ? __lock_acquire+0x2300/0x2300
[ 747.789171] ? _raw_spin_unlock_irqrestore+0x59/0x70
[ 747.795316] ? _raw_spin_unlock_irqrestore+0x59/0x70
[ 747.801457] ? do_raw_spin_unlock+0x10f/0x1e0
[ 747.806914] ? do_raw_spin_trylock+0x120/0x120
[ 747.812481] ? preempt_count_sub+0x14/0xc0
[ 747.817645] ? _raw_spin_unlock+0x2e/0x50
[ 747.822708] ? __handle_mm_fault+0x12db/0x1fa0
[ 747.828367] ? __pmd_alloc+0x2d0/0x2d0
[ 747.833143] ? check_noncircular+0x50/0x50
[ 747.838309] ? match_held_lock+0x7f/0x340
[ 747.843380] ? check_noncircular+0x50/0x50
[ 747.848561] ? handle_mm_fault+0x21a/0x5f0
[ 747.853730] ? check_flags.part.35+0x240/0x240
[ 747.859290] ? check_chain_key+0x139/0x1f0
[ 747.864474] ? __do_page_fault+0x40f/0x760
[ 747.869655] ? __audit_syscall_entry+0x4b/0x1f0
[ 747.875319] ? syscall_trace_enter+0x1d5/0x7b0
[ 747.880877] ? trace_raw_output_preemptirq_template+0x90/0x90
[ 747.887895] ? trace_raw_output_sys_exit+0x80/0x80
[ 747.893860] ? up_read+0x3b/0x90
[ 747.898142] ? stop_critical_timings+0x260/0x260
[ 747.903909] do_group_exit+0xe0/0x1c0
[ 747.908591] ? __x64_sys_exit+0x30/0x30
[ 747.913460] ? trace_raw_output_preemptirq_template+0x90/0x90
[ 747.920485] ? tracer_hardirqs_on+0x270/0x270
[ 747.925956] __x64_sys_exit_group+0x28/0x30
[ 747.931214] do_syscall_64+0x117/0x400
[ 747.935988] ? syscall_return_slowpath+0x2f0/0x2f0
[ 747.941931] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 747.947788] ? trace_hardirqs_on_caller+0x1d0/0x1d0
[ 747.953838] ? lockdep_sys_exit+0x16/0x8e
[ 747.958915] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 747.964784] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 747.971021] RIP: 0033:0x7f572f154c68
[ 747.975606] Code: Bad RIP value.
[ 747.979791] RSP: 002b:00007ffed2dfaa58 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 747.989324] RAX: ffffffffffffffda RBX: 00007f572f431840 RCX: 00007f572f154c68
[ 747.997910] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[ 748.006495] RBP: 0000000000000001 R08: 00000000000000e7 R09: fffffffffffffee0
[ 748.015079] R10: 00007f572f4387e8 R11: 0000000000000246 R12: 00007f572f431840
[ 748.023664] R13: 000055a7f90f2c50 R14: 000055a7f96e2310 R15: 000055a7f96e2310
[ 748.032287]
[ 748.034509] Allocated by task 2300:
[ 748.038982] kasan_kmalloc+0xa0/0xd0
[ 748.043562] kmem_cache_alloc_node+0xf5/0x2e0
[ 748.049018] copy_process+0x1781/0x4790
[ 748.053884] _do_fork+0x166/0x9a0
[ 748.058163] do_syscall_64+0x117/0x400
[ 748.062943] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 748.069180]
[ 748.071405] Freed by task 15395:
[ 748.075591] __kasan_slab_free+0x130/0x180
[ 748.080752] kmem_cache_free+0xc2/0x310
[ 748.085619] free_task+0xea/0x130
[ 748.089901] __put_task_struct+0x177/0x230
[ 748.095063] finish_task_switch+0x51b/0x5d0
[ 748.100315] __schedule+0x506/0xfa0
[ 748.104791] schedule+0xca/0x260
[ 748.108978] futex_wait_queue_me+0x27e/0x420
[ 748.114333] futex_wait+0x251/0x550
[ 748.118814] do_futex+0x75b/0xf80
[ 748.123097] __x64_sys_futex+0x231/0x2a0
[ 748.128065] do_syscall_64+0x117/0x400
[ 748.132835] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 748.139066]
[ 748.141289] The buggy address belongs to the object at ffff88a027698000
[ 748.141289] which belongs to the cache task_struct of size 12160
[ 748.156589] The buggy address is located 2272 bytes inside of
[ 748.156589] 12160-byte region [ffff88a027698000, ffff88a02769af80)
[ 748.171114] The buggy address belongs to the page:
[ 748.177055] page:ffffea00809da600 count:1 mapcount:0 mapping:ffff888107d01e00 index:0x0 compound_mapcount: 0
[ 748.189136] flags: 0x57ffffc0008100(slab|head)
[ 748.194688] raw: 0057ffffc0008100 ffffea00809a3200 0000000300000003 ffff888107d01e00
[ 748.204424] raw: 0000000000000000 0000000000020002 00000001ffffffff 0000000000000000
[ 748.214146] page dumped because: kasan: bad access detected
[ 748.220976]
[ 748.223197] Memory state around the buggy address:
[ 748.229128] ffff88a027698780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 748.238271] ffff88a027698800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 748.247414] >ffff88a027698880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 748.256564] ^
[ 748.264267] ffff88a027698900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 748.273493] ffff88a027698980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 748.282630] ==================================================================

Fixes: b086ff87251b4a4 ("connector: add parent pid and tgid to coredump and exit events")
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b086ff87 30-Apr-2018 Stefan Strogin <sstrogin@cisco.com>

connector: add parent pid and tgid to coredump and exit events

The intention is to get notified of process failures as soon
as possible, before a possible core dumping (which could be very long)
(e.g. in some process-manager). Coredump and exit process events
are perfect for such use cases (see 2b5faa4c553f "connector: Added
coredumping event to the process connector").

The problem is that for now the process-manager cannot know the parent
of a dying process using connectors. This could be useful if the
process-manager should monitor for failures only children of certain
parents, so we could filter the coredump and exit events by parent
process and/or thread ID.

Add parent pid and tgid to coredump and exit process connectors event
data.

Signed-off-by: Stefan Strogin <sstrogin@cisco.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8297f2d9 04-Jul-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

connector: make cn_proc explicitly non-modular

The Kconfig controlling build of this code is currently:

drivers/connector/Kconfig:config PROC_EVENTS
drivers/connector/Kconfig: bool "Report process events to userspace"

...meaning that it currently is not being built as a module by anyone.
Lets remove the two modular references, so that when reading the driver
there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab8ed951 24-Jun-2016 Aaron Campbell <aaron@monkey.org>

connector: fix out-of-order cn_proc netlink message delivery

The proc connector messages include a sequence number, allowing userspace
programs to detect lost messages. However, performing this detection is
currently more difficult than necessary, since netlink messages can be
delivered to the application out-of-order. To fix this, leave pre-emption
disabled during cn_netlink_send(), and use GFP_NOWAIT.

The following was written as a test case. Building the kernel w/ make -j32
proved a reliable way to generate out-of-order cn_proc messages.

int
main(int argc, char *argv[])
{
static uint32_t last_seq[CPU_SETSIZE], seq;
int cpu, fd;
struct sockaddr_nl sa;
struct __attribute__((aligned(NLMSG_ALIGNTO))) {
struct nlmsghdr nl_hdr;
struct __attribute__((__packed__)) {
struct cn_msg cn_msg;
struct proc_event cn_proc;
};
} rmsg;
struct __attribute__((aligned(NLMSG_ALIGNTO))) {
struct nlmsghdr nl_hdr;
struct __attribute__((__packed__)) {
struct cn_msg cn_msg;
enum proc_cn_mcast_op cn_mcast;
};
} smsg;

fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
if (fd < 0) {
perror("socket");
}

sa.nl_family = AF_NETLINK;
sa.nl_groups = CN_IDX_PROC;
sa.nl_pid = getpid();
if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
perror("bind");
}

memset(&smsg, 0, sizeof(smsg));
smsg.nl_hdr.nlmsg_len = sizeof(smsg);
smsg.nl_hdr.nlmsg_pid = getpid();
smsg.nl_hdr.nlmsg_type = NLMSG_DONE;
smsg.cn_msg.id.idx = CN_IDX_PROC;
smsg.cn_msg.id.val = CN_VAL_PROC;
smsg.cn_msg.len = sizeof(enum proc_cn_mcast_op);
smsg.cn_mcast = PROC_CN_MCAST_LISTEN;
if (send(fd, &smsg, sizeof(smsg), 0) != sizeof(smsg)) {
perror("send");
}

while (recv(fd, &rmsg, sizeof(rmsg), 0) == sizeof(rmsg)) {
cpu = rmsg.cn_proc.cpu;
if (cpu < 0) {
continue;
}
seq = rmsg.cn_msg.seq;
if ((last_seq[cpu] != 0) && (seq != last_seq[cpu] + 1)) {
printf("out-of-order seq=%d on cpu=%d\n", seq, cpu);
}
last_seq[cpu] = seq;
}

/* NOTREACHED */

perror("recv");

return -1;
}

Signed-off-by: Aaron Campbell <aaron@monkey.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e93f21b 16-Jul-2014 Thomas Gleixner <tglx@linutronix.de>

connector: Use ktime_get_ns()

Replace the ever recurring:
ts = ktime_get_ts();
ns = timespec_to_ns(&ts);
with
ns = ktime_get_ns();

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>


# 90f62cf3 23-Apr-2014 Eric W. Biederman <ebiederm@xmission.com>

net: Use netlink_ns_capable to verify the permisions of netlink messages

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac8f7330 15-Jan-2014 David Fries <David@Fries.net>

connector: add portid to unicast in addition to broadcasting

This allows replying only to the requestor portid while still
supporting broadcasting. Pass 0 to portid for the previous behavior.

Signed-off-by: David Fries <David@Fries.net>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 1ca1a4cf 13-Nov-2013 Chris Metcalf <cmetcalf@tilera.com>

connector: improved unaligned access error fix

In af3e095a1fb4, Erik Jacobsen fixed one type of unaligned access
bug for ia64 by converting a 64-bit write to use put_unaligned().
Unfortunately, since gcc will convert a short memset() to a series
of appropriately-aligned stores, the problem is now visible again
on tilegx, where the memset that zeros out proc_event is converted
to three 64-bit stores, causing an unaligned access panic.

A better fix for the original problem is to ensure that proc_event
is aligned to 8 bytes here. We can do that relatively easily by
arranging to start the struct cn_msg aligned to 8 bytes and then
offset by 4 bytes. Doing so means that the immediately following
proc_event structure is then correctly aligned to 8 bytes.

The result is that the memset() stores are now aligned, and as an
added benefit, we can remove the put_unaligned() calls in the code.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e727ca82 30-Sep-2013 Mathias Krause <minipli@googlemail.com>

proc connector: fix info leaks

Initialize event_data for all possible message types to prevent leaking
kernel stack contents to userland (up to 20 bytes). Also set the flags
member of the connector message to 0 to prevent leaking two more stack
bytes this way.

Cc: stable@vger.kernel.org # v2.6.15+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b5faa4c 19-Mar-2013 Jesper Derehag <jderehag@hotmail.com>

connector: Added coredumping event to the process connector

Process connector can now also detect coredumping events.

Main aim of patch is get notified at start of coredumping, instead of
having to wait for it to finish and then being notified through EXIT
event.

Could be used for instance by process-managers that want to get
notified as soon as possible about process failures, and not
necessarily beeing notified after coredump, which could be in the
order of minutes depending on size of coredump, piping and so on.

Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e70ab977 25-Feb-2013 Kees Cook <keescook@chromium.org>

proc connector: reject unprivileged listener bumps

While PROC_CN_MCAST_LISTEN/IGNORE is entirely advisory, it was possible
for an unprivileged user to turn off notifications for all listeners by
sending PROC_CN_MCAST_IGNORE. Instead, require the same privileges as
required for a multicast bind.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: stable@vger.kernel.org
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9582d901 07-Feb-2012 Eric W. Biederman <ebiederm@xmission.com>

userns: Convert process event connector to handle kuids and kgids

- Only allow asking for events from the initial user and pid namespace,
where we generate the events in.

- Convert kuids and kgids into the initial user namespace to report
them via the process event connector.

Cc: David Miller <davem@davemloft.net>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# f3c48ecc 14-Jul-2012 Valentin Ilie <valentin.ilie@gmail.com>

drivers: connector: fixed coding style issues

V2: Replaced assignment in if statement.
Fixed coding style issues.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f786ecba 21-Sep-2011 Vladimir Zapolskiy <vzapolskiy@gmail.com>

connector: add comm change event report to proc connector

Add an event to monitor comm value changes of tasks. Such an event
becomes vital, if someone desires to control threads of a process in
different manner.

A natural characteristic of threads is its comm value, and helpfully
application developers have an opportunity to change it in runtime.
Reporting about such events via proc connector allows to fine-grain
monitoring and control potentials, for instance a process control daemon
listening to proc connector and following comm value policies can place
specific threads to assigned cgroup partitions.

It might be possible to achieve a pale partial one-shot likeness without
this update, if an application changes comm value of a thread generator
task beforehand, then a new thread is cloned, and after that proc
connector listener gets the fork event and reads new thread's comm value
from procfs stat file, but this change visibly simplifies and extends the
matter.

Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e8f90df 28-Jul-2011 Oleg Nesterov <oleg@redhat.com>

proc_fork_connector: a lockless ->real_parent usage is not safe

proc_fork_connector() uses ->real_parent lockless. This is not safe if
copy_process() was called with CLONE_THREAD or CLONE_PARENT, in this case
the parent != current can go away at any moment.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f19da2ce 25-Jul-2011 Andrew Morton <akpm@linux-foundation.org>

drivers/connector/cn_proc.c: remove unused local

Fix the warning

drivers/connector/cn_proc.c: In function 'proc_ptrace_connector':
drivers/connector/cn_proc.c:176: warning: unused variable 'tracer'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5ec9e63c 23-Jul-2011 Wanlong Gao <gaowanlong@cn.fujitsu.com>

drivers:connector:remove an unused variable *tracer*

The variable 'tracer' never be used, so remove it.
Added by f701e5b73a1a79ea62ffd45d9e2bed4c7d5c1fd2.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f701e5b7 15-Jul-2011 Vladimir Zapolskiy <vzapolskiy@gmail.com>

connector: add an event for monitoring process tracers

This change adds a procfs connector event, which is emitted on every
successful process tracer attach or detach.

If some process connects to other one, kernelspace connector reports
process id and thread group id of both these involved processes. On
disconnection null process id is returned.

Such an event allows to create a simple automated userspace mechanism
to be aware about processes connecting to others, therefore predefined
process policies can be applied to them if needed.

Note, a detach signal is emitted only in case, if a tracer process
explicitly executes PTRACE_DETACH request. In other cases like tracee
or tracer exit detach event from proc connector is not reported.

Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>


# 3ea9f683 08-Dec-2010 Christoph Lameter <cl@linux.com>

connector: Use this_cpu operations

The patch was originally in the use cpuops patchset but it needs an
inc_return and is therefore dependent on an extension of the cpu ops.
Fixed up and verified that it compiles.

get_seq can benefit from this_cpu_operations. Address calculation is
avoided and the increment is done using an xadd.

Cc: Scott James Remnant <scott@ubuntu.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# f0b25932 06-Oct-2009 Stephen Boyd <bebarino@gmail.com>

connector: Fix incompatible pointer type warning

Commit 7069331 (connector: Provide the sender's credentials to the
callback, 2009-10-02) changed callbacks to take two arguments but missed
this one.

drivers/connector/cn_proc.c: In function ‘cn_proc_init’:
drivers/connector/cn_proc.c:263: warning: passing argument 3 of
‘cn_add_callback’ from incompatible pointer type

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02b51df1 22-Sep-2009 Scott James Remnant <scott@ubuntu.com>

proc connector: add event for process becoming session leader

The act of a process becoming a session leader is a useful signal to a
supervising init daemon such as Upstart.

While a daemon will normally do this as part of the process of becoming a
daemon, it is rare for its children to do so. When the children do, it is
nearly always a sign that the child should be considered detached from the
parent and not supervised along with it.

The poster-child example is OpenSSH; the per-login children call setsid()
so that they may control the pty connected to them. If the primary daemon
dies or is restarted, we do not want to consider the per-login children
and want to respawn the primary daemon without killing the children.

This patch adds a new PROC_SID_EVENT and associated structure to the
proc_event event_data union, it arranges for this to be emitted when the
special PIDTYPE_SID pid is set.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Scott James Remnant <scott@ubuntu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0741241c 17-Jul-2009 Mike Frysinger <vapier@gentoo.org>

connector: make callback argument type explicit

The connector documentation states that the argument to the callback
function is always a pointer to a struct cn_msg, but rather than encode it
in the API itself, it uses a void pointer everywhere. This doesn't make
much sense to encode the pointer in documentation as it prevents proper C
type checking from occurring and can easily allow people to use the wrong
pointer type. So convert the argument type to an explicit struct cn_msg
pointer.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c69e8d9c 13-Nov-2008 David Howells <dhowells@redhat.com>

CRED: Use RCU to access another task's creds and to release a task's own creds

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>


# b6dff3ec 13-Nov-2008 David Howells <dhowells@redhat.com>

CRED: Separate task security context from task_struct

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>


# af3e095a 05-Jan-2007 Erik Jacobson <erikj@sgi.com>

[PATCH] connector: some fixes for ia64 unaligned access errors

On ia64, the various functions that make up cn_proc.c cause kernel
unaligned access errors.

If you are using these, for example, to get notification about all tasks
forking and exiting, you get multiple unaligned access errors per process.

Use put_unaligned() in the appropriate palces to fix this.

Signed-off-by: Erik Jacobson <erikj@sgi.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 822cfbff 30-Jul-2006 Chandra Seetharaman <sekharan@us.ibm.com>

[PATCH] Process Events: Fix biarch compatibility issue. use __u64 timestamp

Events sent by Process Events Connector from a 64-bit kernel are not binary
compatible with a 32-bit userspace program because the "timestamp" field
(struct timespec) is not arch independent. This affects the fields that
follow "timestamp" as they will be be off by 8 bytes.

This is a problem for 32-bit userspace programs running with 64-bit kernels
on ppc64, s390, x86-64.. any "biarch" system.

Matt had submitted a different solution to lkml as an RFC earlier. We have
since switched to a solution recommended by Evgeniy Polyakov.

This patch fixes the problem by changing the timestamp to be a __u64, which
stores the number of nanoseconds.

Tested on a x86_64 system with both 32 bit application and 64 bit
application and on a i386 system.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1d31a4ea 23-Jun-2006 Matt Helsley <matthltc@us.ibm.com>

[PATCH] Process Events - Header Cleanup

Move connector header include to precisely where it's needed.

Remove unused time.h header file as well. This was leftover from previous
iterations of the process events patches.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# caf3c9dc 09-Jan-2006 Matt Helsley <matthltc@us.ibm.com>

[PATCH] Switch getnstimestamp() calls to ktime_get_ts()

Use ktime_get_ts() to take the timestamp instead of getnstimestamp(). This
patch prepares to remove getnstimestamp() by switching its only user to a
different function with almost exactly the same code.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# cc398c2e 08-Jan-2006 David S. Miller <davem@davemloft.net>

[PATCH] drivers/connector/cn_proc.c typos

The parameter to put_cpu_var() is unreferenced by the implementation, and
the compiler doesn't try to comprehend comments, so this wouldn't cause any
problem, but if bugged me enough to post a fix :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5650b736 12-Dec-2005 Matt Helsley <matthltc@us.ibm.com>

[PATCH] Add timestamp field to process events

This adds a timestamp field to the events sent via the process event
connector. The timestamp allows listeners to accurately account the
duration(s) between a process' events and offers strong means with which
to determine the order of events with respect to a given task while also
avoiding the addition of per-task data.

This alters the size and layout of the event structure and hence would
break compatibility if process events connector as it stands in 2.6.15-rc2
were released as a mainline kernel.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9f46080c 07-Nov-2005 Matt Helsley <matthltc@us.ibm.com>

[PATCH] Process Events Connector

This patch adds a connector that reports fork, exec, id change, and exit
events for all processes to userspace. It replaces the fork_advisor patch
that ELSA is currently using. Applications that may find these events
useful include accounting/auditing (e.g. ELSA), system activity monitoring
(e.g. top), security, and resource management (e.g. CKRM).

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>