History log of /linux-master/net/netlink/genetlink.c
Revision Date Author Comments
# 87d38197 02-Mar-2024 Jakub Kicinski <kuba@kernel.org>

genetlink: fit NLMSG_DONE into same read() as families

Make sure ctrl_fill_info() returns sensible error codes and
propagate them out to netlink core. Let netlink core decide
when to return skb->len and when to treat the exit as an
error. Netlink core does better job at it, if we always
return skb->len the core doesn't know when we're done
dumping and NLMSG_DONE ends up in a separate read().

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3de21a89 12-Feb-2024 Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>

genetlink: Add per family bind/unbind callbacks

Add genetlink family bind()/unbind() callbacks when adding/removing
multicast group to/from netlink client socket via setsockopt() or
bind() syscall.

They can be used to track if consumers of netlink multicast messages
emerge or disappear. Thus, a client implementing callbacks, can now
send events only when there are active consumers, preventing unnecessary
work when none exist.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cd4d7263 20-Dec-2023 Ido Schimmel <idosch@nvidia.com>

genetlink: Use internal flags for multicast groups

As explained in commit e03781879a0d ("drop_monitor: Require
'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the
multicast group structure reuses uAPI flags despite the field not being
exposed to user space. This makes it impossible to extend its use
without adding new uAPI flags, which is inappropriate for internal
kernel checks.

Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert
the existing users to use them instead of the uAPI flags.

Tested using the reproducers in commit 44ec98ea5ea9 ("psample: Require
'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a0d
("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group").

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7311324 16-Dec-2023 Jiri Pirko <jiri@resnulli.us>

genetlink: introduce per-sock family private storage

Introduce an xarray for Generic netlink family to store per-socket
private. Initialize this xarray only if family uses per-socket privs.

Introduce genl_sk_priv_get() to get the socket priv pointer for a family
and initialize it in case it does not exist.
Introduce __genl_sk_priv_get() to obtain socket priv pointer for a
family under RCU read lock.

Allow family to specify the priv size, init() and destroy() callbacks.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# e0378187 06-Dec-2023 Ido Schimmel <idosch@nvidia.com>

drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group

The "NET_DM" generic netlink family notifies drop locations over the
"events" multicast group. This is problematic since by default generic
netlink allows non-root users to listen to these notifications.

Fix by adding a new field to the generic netlink multicast group
structure that when set prevents non-root users or root without the
'CAP_SYS_ADMIN' capability (in the user namespace owning the network
namespace) from joining the group. Set this field for the "events"
group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
nature of the information that is shared over this group.

Note that the capability check in this case will always be performed
against the initial user namespace since the family is not netns aware
and only operates in the initial network namespace.

A new field is added to the structure rather than using the "flags"
field because the existing field uses uAPI flags and it is inappropriate
to add a new uAPI flag for an internal kernel check. In net-next we can
rework the "flags" field to use internal flags and fold the new field
into it. But for now, in order to reduce the amount of changes, add a
new field.

Since the information can only be consumed by root, mark the control
plane operations that start and stop the tracing as root-only using the
'GENL_ADMIN_PERM' flag.

Tested using [1].

Before:

# capsh -- -c ./dm_repo
# capsh --drop=cap_sys_admin -- -c ./dm_repo

After:

# capsh -- -c ./dm_repo
# capsh --drop=cap_sys_admin -- -c ./dm_repo
Failed to join "events" multicast group

[1]
$ cat dm.c
#include <stdio.h>
#include <netlink/genl/ctrl.h>
#include <netlink/genl/genl.h>
#include <netlink/socket.h>

int main(int argc, char **argv)
{
struct nl_sock *sk;
int grp, err;

sk = nl_socket_alloc();
if (!sk) {
fprintf(stderr, "Failed to allocate socket\n");
return -1;
}

err = genl_connect(sk);
if (err) {
fprintf(stderr, "Failed to connect socket\n");
return err;
}

grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events");
if (grp < 0) {
fprintf(stderr,
"Failed to resolve \"events\" multicast group\n");
return grp;
}

err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE);
if (err) {
fprintf(stderr, "Failed to join \"events\" multicast group\n");
return err;
}

return 0;
}
$ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c

Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f862ed2d 21-Oct-2023 Jiri Pirko <jiri@resnulli.us>

genetlink: don't merge dumpit split op for different cmds into single iter

Currently, split ops of doit and dumpit are merged into a single iter
item when they are subsequent. However, there is no guarantee that the
dumpit op is for the same cmd as doit op.

Fix this by checking if cmd is the same for both.
This problem does not occur in existing families.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231021112711.660606-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5c670a01 14-Aug-2023 Jakub Kicinski <kuba@kernel.org>

genetlink: add a family pointer to struct genl_info

Having family in struct genl_info is quite useful. It cuts
down the number of arguments which need to be passed to
helpers which already take struct genl_info.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7288dd2f 14-Aug-2023 Jakub Kicinski <kuba@kernel.org>

genetlink: use attrs from struct genl_info

Since dumps carry struct genl_info now, use the attrs pointer
from genl_info and remove the one in struct genl_dumpit_info.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9272af10 14-Aug-2023 Jakub Kicinski <kuba@kernel.org>

genetlink: add struct genl_info to struct genl_dumpit_info

Netlink GET implementations must currently juggle struct genl_info
and struct netlink_callback, depending on whether they were called
from doit or dumpit.

Add genl_info to the dump state and populate the fields.
This way implementations can simply pass struct genl_info around.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bffcc688 14-Aug-2023 Jakub Kicinski <kuba@kernel.org>

genetlink: remove userhdr from struct genl_info

Only three families use info->userhdr today and going forward
we discourage using fixed headers in new families.
So having the pointer to user header in struct genl_info
is an overkill. Compute the header pointer at runtime.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 84817d8c 14-Aug-2023 Jakub Kicinski <kuba@kernel.org>

genetlink: push conditional locking into dumpit/done

Add helpers which take/release the genl mutex based
on family->parallel_ops. Remove the separation between
handling of ops in locked and parallel families.

Future patches would make the duplicated code grow even more.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5766946e 20-Jul-2023 Jiri Pirko <jiri@resnulli.us>

genetlink: add explicit ordering break check for split ops

Currently, if cmd in the split ops array is of lower value than the
previous one, genl_validate_ops() continues to do the checks as if
the values are equal. This may result in non-obvious WARN_ON() hit in
these check.

Instead, check the incorrect ordering explicitly and put a WARN_ON()
in case it is broken.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230720111354.562242-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5ab8c41c 09-Jun-2023 Jakub Kicinski <kuba@kernel.org>

netlink: support extack in dump ->start()

Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers")
added extack support to netlink dumps. It was focused on rtnl
and since rtnl does not use ->start(), ->done() callbacks
it ignored those. Genetlink on the other hand uses ->start()
extensively, for parsing and input validation.

Pass the extact in via struct netlink_dump_control and link
it to cb for the time of ->start(). Both struct netlink_dump_control
and extack itself live on the stack so we can't keep the same
extack for the duration of the dump. This means that the extack
visible in ->start() and each ->dump() callbacks will be different.
Corner cases like reporting a warning message in DONE across dump
calls are still not supported.

We could put the extack (for dumps) in the socket struct,
but layering makes it slightly awkward (extack pointer is decided
before the DO / DUMP split).

The genetlink dump error extacks are now surfaced:

$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'request header missing'}

Previously extack was missing:

$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
error: -22

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4545bf9 08-Feb-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

genetlink: Use string_is_terminated() helper

Use string_is_terminated() helper instead of cpecific memchr() call.
This shows better the intention of the call.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c1b05105 09-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: fix single op policy dump when do is present

Jonathan reports crashes when running net-next in Meta's fleet.
Stats collection uses ethtool -I which does a per-op policy dump
to check if stats are supported. We don't initialize the dumpit
information if doit succeeds due to evaluation short-circuiting.

The crash may look like this:

BUG: kernel NULL pointer dereference, address: 0000000000000cc0
RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
ctrl_dumppolicy_start+0x19f/0x2f0
genl_start+0xe7/0x140

Or we may trigger a warning:

WARNING: CPU: 1 PID: 785 at net/netlink/policy.c:87 netlink_policy_dump_get_policy_idx+0x79/0x80
RIP: 0010:netlink_policy_dump_get_policy_idx+0x79/0x80
ctrl_dumppolicy_put_op+0x214/0x360

depending on what garbage we pick up from the stack.

Reported-by: Jonathan Lemon <bsd@meta.com>
Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy_put_op()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221109183254.554051-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 154ba79c 08-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: correctly begin the iteration over policies

The return value from genl_op_iter_init() only tells us if
there are any policies but to begin the iteration (and therefore
load the first entry) we need to call genl_op_iter_next().
Note that it's safe to call genl_op_iter_next() on a family
with no ops, it will just return false.

This may lead to various crashes, a warning in
netlink_policy_dump_get_policy_idx() when policy is not found
or.. no problem at all if the kmalloc'ed memory happens to be
zeroed.

Fixes: b502b3185cd6 ("genetlink: use iterator in the op to policy map dumping")
Link: https://lore.kernel.org/r/20221108204128.330287-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# aba22ca8 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: convert control family to split ops

Prove that the split ops work.
Sadly we need to keep bug-wards compatibility and specify
the same policy for dump as do, even tho we don't parse
inputs for the dump.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8fd60c3 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: allow families to use split ops directly

Let families to hook in the new split ops.

They are more flexible and should not be much larger than
full ops. Each split op is 40B while full op is 48B.
Devlink for example has 54 dos and 19 dumps, 2 of the dumps
do not have a do -> 56 full commands = 2688B.
Split ops would have taken 2920B, so 9% more space while
allowing individual per/post doit and per-type policies.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7acfbbe1 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: inline old iteration helpers

All dumpers use the iterators now, inline the cmd by index
stuff into iterator code.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b502b318 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: use iterator in the op to policy map dumping

We can't put the full iterator in the struct ctrl_dump_policy_ctx
because dump context is statically sized by netlink core.
Allocate it dynamically.

Rename policy to dump_map to make the logic a little easier to follow.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6557461c 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: add iterator for walking family ops

Subsequent changes will expose split op structures to users,
so walking the family ops with just an index will get harder.
Add a structured iterator, convert the simple cases.
Policy dumping needs more careful conversion.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8d84322a 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: inline genl_get_cmd()

All callers go via genl_get_cmd_split() now, so rename it
to genl_get_cmd() remove the original.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26588edb 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: support split policies in ctrl_dumppolicy_put_op()

Pass do and dump versions of the op to ctrl_dumppolicy_put_op()
so that it can provide a different policy index for the two.

Since we now look at policies, and those are set appropriately
there's no need to look at the GENL_DONT_VALIDATE_DUMP flag.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92d3d9ba 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: add policies for both doit and dumpit in ctrl_dumppolicy_start()

Separate adding doit and dumpit policies for CTRL_CMD_GETPOLICY.
This has no effect until we actually allow do and dump to come
from different sources as netlink_policy_dump_add_policy()
does deduplication.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1a24891 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: check for callback type at op load time

Now that genl_get_cmd_split() is informed what type of callback
user is trying to access (do or dump) we can check that this
callback is indeed available and return an error early.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7747eb75 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: load policy based on validation flags

Set the policy and maxattr pointers based on validation flags.
genl_family_rcv_msg_attrs_parse() will do nothing and return NULL
if maxattrs is zero, so no behavior change is expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 20b0b53a 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: introduce split op representation

We currently have two forms of operations - small ops and "full" ops
(or just ops). The former does not have pointers for some of the less
commonly used features (namely dump start/done and policy).

The "full" ops, however, still don't contain all the necessary
information. In particular the policy is per command ID, while
do and dump often accept different attributes. It's also not
possible to define different pre_doit and post_doit callbacks
for different commands within the family.

At the same time a lot of commands do not support dumping and
therefore all the dump-related information is wasted space.

Create a new command representation which can hold info about
a do implementation or a dump implementation, but not both at
the same time.

Use this new representation on the command execution path
(genl_family_rcv_msg) as we either run a do or a dump and
don't have to create a "full" op there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff14adbd 04-Nov-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: refactor the cmd <> policy mapping dump

The code at the top of ctrl_dumppolicy() dumps mappings between
ops and policies. It supports dumping both the entire family and
single op if dump is filtered. But both of those cases are handled
inside a loop, which makes the logic harder to follow and change.
Refactor to split the two cases more clearly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce48ebdd 25-Oct-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: limit the use of validation workarounds to old ops

During review of previous change another thing came up - we should
limit the use of validation workarounds to old commands.
Don't list the workarounds one by one, as we're rejecting all existing
ones. We can deal with the masking in the unlikely event that new flag
is added.

Link: https://lore.kernel.org/all/6ba9f727e555fd376623a298d5d305ad408c3d47.camel@sipsolutions.net/
Link: https://lore.kernel.org/r/20221026001524.1892202-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4fa86555 21-Oct-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: piggy back on resv_op to default to a reject policy

To keep backward compatibility we used to leave attribute parsing
to the family if no policy is specified. This becomes tedious as
we move to more strict validation. Families must define reject all
policies if they don't want any attributes accepted.

Piggy back on the resv_start_op field as the switchover point.
AFAICT only ethtool has added new commands since the resv_start_op
was defined, and it has per-op policies so this should be a no-op.

Nonetheless the patch should still go into v6.1 for consistency.

Link: https://lore.kernel.org/all/20221019125745.3f2e7659@kernel.org/
Link: https://lore.kernel.org/r/20221021193532.1511293-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cff2d762 29-Sep-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: reject use of nlmsg_flags for new commands

Commit 9c5d03d36251 ("genetlink: start to validate reserved header bytes")
introduced extra validation for genetlink headers. We had to gate it
to only apply to new commands, to maintain bug-wards compatibility.
Use this opportunity (before the new checks make it to Linus's tree)
to add more conditions.

Validate that Generic Netlink families do not use nlmsg_flags outside
of the well-understood set.

Link: https://lore.kernel.org/all/20220928073709.1b93b74a@kernel.org/
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20220929142809.1167546-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9c5d03d3 24-Aug-2022 Jakub Kicinski <kuba@kernel.org>

genetlink: start to validate reserved header bytes

We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f1948bd 25-Aug-2022 Jiri Pirko <jiri@nvidia.com>

genetlink: hold read cb_lock during iteration of genl_fam_idr in genl_bind()

In genl_bind(), currently genl_lock and write cb_lock are taken
for iteration of genl_fam_idr and processing of static values
stored in struct genl_family. Take just read cb_lock for this task
as it is sufficient to guard the idr and the struct against
concurrent genl_register/unregister_family() calls.

This will allow to run genl command processing in genl_rcv() and
mnl_socket_setsockopt(.., NETLINK_ADD_MEMBERSHIP, ..) in parallel.

Reported-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220825081940.1283335-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 24980136 16-Aug-2022 Jakub Kicinski <kuba@kernel.org>

net: genl: fix error path memory leak in policy dumping

If construction of the array of policies fails when recording
non-first policy we need to unwind.

netlink_policy_dump_add_policy() itself also needs fixing as
it currently gives up on error without recording the allocated
pointer in the pstate pointer.

Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com
Fixes: 50a896cf2d6f ("genetlink: properly support per-op policy dumping")
Link: https://lore.kernel.org/r/20220816161939.577583-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bc830525 29-Jul-2021 Yajun Deng <yajun.deng@linux.dev>

net: netlink: Remove unused function

lockdep_genl_is_held() and its caller arm not used now, just remove them.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Link: https://lore.kernel.org/r/20210729074854.8968-1-yajun.deng@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f9b282b3 26-Jul-2021 Yajun Deng <yajun.deng@linux.dev>

net: netlink: add the case when nlh is NULL

Add the case when nlh is NULL in nlmsg_report(),
so that the caller doesn't need to deal with this case.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4d54cc32 12-Feb-2021 Florian Westphal <fw@strlen.de>

mptcp: avoid lock_fast usage in accept path

Once event support is added this may need to allocate memory while msk
lock is held with softirqs disabled.

Not using lock_fast also allows to do the allocation with GFP_KERNEL.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e992a6ed 03-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: allow dumping command-specific policy

Right now CTRL_CMD_GETPOLICY can only dump the family-wide
policy. Support dumping policy of a specific op.

v3:
- rebase after per-op policy export and handle that
v2:
- make cmd U32, just in case.
v1:
- don't echo op in the output in a naive way, this should
make it cleaner to extend the output format for dumping
policies for all the commands at once in the future.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20201001225933.1373426-11-kuba@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50a896cf 03-Oct-2020 Johannes Berg <johannes.berg@intel.com>

genetlink: properly support per-op policy dumping

Add support for per-op policy dumping. The data is pretty much
as before, except that now the assumption that the policy with
index 0 is "the" policy no longer holds - you now need to look
at the new CTRL_ATTR_OP_POLICY attribute which is a nested attr
(indexed by op) containing attributes for do and dump policies.

When a single op is requested, the CTRL_ATTR_OP_POLICY will be
added in the same way, since do and dump policies may differ.

v2:
- conditionally advertise per-command policies only if there
actually is a policy being used for the do/dump and it's
present at all

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa85ee5f 03-Oct-2020 Johannes Berg <johannes.berg@intel.com>

genetlink: factor skb preparation out of ctrl_dumppolicy()

We'll need this later for the per-op policy index dump.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04a351a6 03-Oct-2020 Johannes Berg <johannes.berg@intel.com>

netlink: rework policy dump to support multiple policies

Rework the policy dump code a bit to support adding multiple
policies to a single dump, in order to e.g. support per-op
policies in generic netlink.

v2:
- move kernel-doc to implementation [Jakub]
- squash the first patch to not flip-flop on the prototype
[Jakub]
- merge netlink_policy_dump_get_policy_idx() with the old
get_policy_idx() we already had
- rebase without Jakub's patch to have per-op dump

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4bb4f5f 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: switch control commands to per-op policies

In preparation for adding a new attribute to CTRL_CMD_GETPOLICY
split the policies for getpolicy and getfamily apart.

This will cause a slight user-visible change in that dumping
the policies will switch from per family to per op, but
supposedly sniffer-type applications (which are the main use
case for policy dumping thus far) should support both, anyway.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e1ed28f 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: use parsed attrs in dumppolicy

Attributes are already parsed based on the policy specified
in the family and ready-to-use in info->attrs. No need to
call genlmsg_parse() again.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48526a0f 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: bring back per op policy

Add policy to the struct genl_ops structure, this time
with maxattr, so it can be used properly.

Propagate .policy and .maxattr from the family
in genl_get_cmd() if needed, this way the rest of the
code does not have to worry if the policy is per op
or global.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78ade619 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: use .start callback for dumppolicy

The structure of ctrl_dumppolicy() is clearly split into
init and dumping. Move the init to a .start callback
for clarity, it's a more idiomatic netlink dump code structure.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# adc84845 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: add a structure for dump state

Whenever netlink dump uses more than 2 cb->args[] entries
code gets hard to read. We're about to add more state to
ctrl_dumppolicy() so create a structure.

Since the structure is typed and clearly named we can remove
the local fam_id variable and use ctx->fam_id directly.

v3:
- rebase onto explicit free fix
v1:
- s/nl_policy_dump/netlink_policy_dump_state/
- forward declare struct netlink_policy_dump_state,
and move from passing unsigned long to actual pointer type
- add build bug on
- u16 fam_id
- s/args/ctx/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b588afd 02-Oct-2020 Jakub Kicinski <kuba@kernel.org>

genetlink: add small version of ops

We want to add maxattr and policy back to genl_ops, to enable
dumping per command policy to user space. This, however, would
cause bloat for all the families with global policies. Introduce
smaller version of ops (half the size of genl_ops). Translate
these smaller ops into a full blown struct before use in the
core.

v1:
- use struct assignment
- put a full copy of the op in struct genl_dumpit_info
- s/light/small/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 949ca6b8 02-Oct-2020 Johannes Berg <johannes.berg@intel.com>

netlink: fix policy dump leak

[ Upstream commit a95bc734e60449e7b073ff7ff70c35083b290ae9 ]

If userspace doesn't complete the policy dump, we leak the
allocated state. Fix this.

Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a95bc734 02-Oct-2020 Johannes Berg <johannes.berg@intel.com>

netlink: fix policy dump leak

If userspace doesn't complete the policy dump, we leak the
allocated state. Fix this.

Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85405918 22-Aug-2020 Randy Dunlap <rdunlap@infradead.org>

net: netlink: delete repeated words

Drop duplicated words in net/netlink/.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c62e7ac3 15-Jul-2020 Daniel Lezcano <daniel.lezcano@linaro.org>

net: genetlink: Move initialization to core_initcall

The generic netlink is initialized far after the netlink protocol
itself at subsys_initcall. The devlink is initialized at the same
level, but after, as shown by a disassembly of the vmlinux:

[ ... ]
374 ffff8000115f22c0 <__initcall_devlink_init4>:
375 ffff8000115f22c4 <__initcall_genl_init4>:
[ ... ]

The function devlink_init() calls genl_register_family() before the
generic netlink subsystem is initialized.

As the generic netlink initcall level is set since 2005, it seems that
was not a problem, but now we have the thermal framework initialized
at the core_initcall level which creates the generic netlink family
and sends a notification which leads to a subtle memory corruption
only detectable when the CONFIG_INIT_ON_ALLOC_DEFAULT_ON option is set
with the earlycon at init time.

The thermal framework needs to be initialized early in order to begin
the mitigation as soon as possible. Moving it to postcore_initcall is
acceptable.

This patch changes the initialization level for the generic netlink
family to the core_initcall and comes after the netlink protocol
initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200715074120.8768-1-daniel.lezcano@linaro.org


# 1e82a62f 30-Jun-2020 Sean Tranchetti <stranche@codeaurora.org>

genetlink: remove genl_bind

A potential deadlock can occur during registering or unregistering a
new generic netlink family between the main nl_table_lock and the
cb_lock where each thread wants the lock held by the other, as
demonstrated below.

1) Thread 1 is performing a netlink_bind() operation on a socket. As part
of this call, it will call netlink_lock_table(), incrementing the
nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
genl_(un)register_family() API. The cb_lock semaphore will be taken for
writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
subscribing to GENL multicast groups at the request of the user. It will
attempt to take the cb_lock semaphore for reading, but it will fail and
be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
call. However, as Thread 1 has incremented nl_table_users, it will not
be able to proceed, and both threads will be stuck waiting for the
other.

genl_bind() is a noop, unless a genl_family implements the mcast_bind()
function to handle setting up family-specific multicast operations. Since
no one in-tree uses this functionality as Cong pointed out, simply removing
the genl_bind() function will remove the possibility for deadlock, as there
is no attempt by Thread 1 above to take the cb_lock semaphore.

Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf64ff4c 27-Jun-2020 Cong Wang <xiyou.wangcong@gmail.com>

genetlink: get rid of family->attrbuf

genl_family_rcv_msg_attrs_parse() reuses the global family->attrbuf
when family->parallel_ops is false. However, family->attrbuf is not
protected by any lock on the genl_family_rcv_msg_doit() code path.

This leads to several different consequences, one of them is UAF,
like the following:

genl_family_rcv_msg_doit(): genl_start():
genl_family_rcv_msg_attrs_parse()
attrbuf = family->attrbuf
__nlmsg_parse(attrbuf);
genl_family_rcv_msg_attrs_parse()
attrbuf = family->attrbuf
__nlmsg_parse(attrbuf);
info->attrs = attrs;
cb->data = info;

netlink_unicast_kernel():
consume_skb()
genl_lock_dumpit():
genl_dumpit_info(cb)->attrs

Note family->attrbuf is an array of pointers to the skb data, once
the skb is freed, any dereference of family->attrbuf will be a UAF.

Maybe we could serialize the family->attrbuf with genl_mutex too, but
that would make the locking more complicated. Instead, we can just get
rid of family->attrbuf and always allocate attrbuf from heap like the
family->parallel_ops==true code path. This may add some performance
overhead but comparing with taking the global genl_mutex, it still
looks better.

Fixes: 75cdbdd08900 ("net: ieee802154: have genetlink code to parse the attrs during dumpit")
Fixes: 057af7071344 ("net: tipc: have genetlink code to parse the attrs during dumpit")
Reported-and-tested-by: syzbot+3039ddf6d7b13daf3787@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+80cad1e3cb4c41cde6ff@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+736bcbcb11b60d0c0792@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+520f8704db2b68091d44@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+c96e4dfb32f8987fdeed@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b65ce380 12-Jun-2020 Cong Wang <xiyou.wangcong@gmail.com>

genetlink: clean up family attributes allocations

genl_family_rcv_msg_attrs_parse() and genl_family_rcv_msg_attrs_free()
take a boolean parameter to determine whether allocate/free the family
attrs. This is unnecessary as we can just check family->parallel_ops.
More importantly, callers would not need to worry about pairing these
parameters correctly after this patch.

And this fixes a memory leak, as after commit c36f05559104
("genetlink: fix memory leaks in genl_family_rcv_msg_dumpit()")
we call genl_family_rcv_msg_attrs_parse() for both parallel and
non-parallel cases.

Fixes: c36f05559104 ("genetlink: fix memory leaks in genl_family_rcv_msg_dumpit()")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c36f0555 02-Jun-2020 Cong Wang <xiyou.wangcong@gmail.com>

genetlink: fix memory leaks in genl_family_rcv_msg_dumpit()

There are two kinds of memory leaks in genl_family_rcv_msg_dumpit():

1. Before we call ops->start(), whenever an error happens, we forget
to free the memory allocated in genl_family_rcv_msg_dumpit().

2. When ops->start() fails, the 'info' has been already installed on
the per socket control block, so we should not free it here. More
importantly, nlk->cb_running is still false at this point, so
netlink_sock_destruct() cannot free it either.

The first kind of memory leaks is easier to resolve, but the second
one requires some deeper thoughts.

After reviewing how netfilter handles this, the most elegant solution
I find is just to use a similar way to allocate the memory, that is,
moving memory allocations from caller into ops->start(). With this,
we can solve both kinds of memory leaks: for 1), no memory allocation
happens before ops->start(); for 2), ops->start() handles its own
failures and 'info' is installed to the socket control block only
when success. The only ugliness here is we have to pass all local
variables on stack via a struct, but this is not hard to understand.

Alternatively, we can introduce a ops->free() to solve this too,
but it is overkill as only genetlink has this problem so far.

Fixes: 1927f41a22a0 ("net: genetlink: introduce dump info struct to be available during dumpit op")
Reported-by: syzbot+21f04f481f449c8db840@syzkaller.appspotmail.com
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Shaochun Chen <cscnull@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d07dcf9a 30-Apr-2020 Johannes Berg <johannes.berg@intel.com>

netlink: add infrastructure to expose policies to userspace

Add, and use in generic netlink, helpers to dump out a netlink
policy to userspace, including all the range validation data,
nested policies etc.

This lets userspace discover what the kernel understands.

For families/commands other than generic netlink, the helpers
need to be used directly in an appropriate command, or we can
add some infrastructure (a new netlink family) that those can
register their policies with for introspection. I'm not that
familiar with non-generic netlink, so that's left out for now.

The data exposed to userspace also includes min and max length
for binary/string data, I've done that instead of letting the
userspace tools figure out whether min/max is intended based
on the type so that we can extend this later in the kernel, we
might want to just use the range data for example.

Because of this, I opted to not directly expose the NLA_*
values, even if some of them are already exposed via BPF, as
with min/max length we don't need to have different types here
for NLA_BINARY/NLA_MIN_LEN/NLA_EXACT_LEN, we just make them
all NL_ATTR_TYPE_BINARY with min/max length optionally set.

Similarly, we don't really need NLA_MSECS, and perhaps can
remove it in the future - but not if we encode it into the
userspace API now. It gets mapped to NL_ATTR_TYPE_U64 here.

Note that the exposing here corresponds to the strict policy
interpretation, and NLA_UNSPEC items are omitted entirely.
To get those, change them to NLA_MIN_LEN which behaves in
exactly the same way, but is exposed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 39f3b41a 21-Feb-2020 Paolo Abeni <pabeni@redhat.com>

net: genetlink: return the error code when attribute parsing fails.

Currently if attribute parsing fails and the genl family
does not support parallel operation, the error code returned
by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse().

Be sure to report the error for all genl families.

Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function")
Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb0ce18a 11-Oct-2019 Michal Kubecek <mkubecek@suse.cz>

genetlink: do not parse attributes for families with zero maxattr

Commit c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing
to a separate function") moved attribute buffer allocation and attribute
parsing from genl_family_rcv_msg_doit() into a separate function
genl_family_rcv_msg_attrs_parse() which, unlike the previous code, calls
__nlmsg_parse() even if family->maxattr is 0 (i.e. the family does its own
parsing). The parser error is ignored and does not propagate out of
genl_family_rcv_msg_attrs_parse() but an error message ("Unknown attribute
type") is set in extack and if further processing generates no error or
warning, it stays there and is interpreted as a warning by userspace.

Dumpit requests are not affected as genl_family_rcv_msg_dumpit() bypasses
the call of genl_family_rcv_msg_attrs_parse() if family->maxattr is zero.
Move this logic inside genl_family_rcv_msg_attrs_parse() so that we don't
have to handle it in each caller.

v3: put the check inside genl_family_rcv_msg_attrs_parse()
v2: adjust also argument of genl_family_rcv_msg_attrs_free()

Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab5b526d 07-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: always allocate separate attrs for dumpit ops

Individual dumpit ops (start, dumpit, done) are locked by genl_lock
if !family->parallel_ops. However, multiple
genl_family_rcv_msg_dumpit() calls may in in flight in parallel.
Each has a separate struct genl_dumpit_info allocated
but they share the same family->attrbuf. Fix this by allocating separate
memory for attrs for dumpit ops, for non-parallel_ops (for parallel_ops
it is done already).

Reported-by: syzbot+495688b736534bb6c6ad@syzkaller.appspotmail.com
Reported-by: syzbot+ff59dc711f2cff879a05@syzkaller.appspotmail.com
Reported-by: syzbot+dbe02e13bcce52bcf182@syzkaller.appspotmail.com
Reported-by: syzbot+9cb7edb2906ea1e83006@syzkaller.appspotmail.com
Fixes: bf813b0afeae ("net: genetlink: parse attrs and store in contect info struct during dumpit")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# 265ecd4f 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: remove unused genl_family_attrbuf()

genl_family_attrbuf() function is no longer used by anyone, so remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf813b0a 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: parse attrs and store in contect info struct during dumpit

Extend the dumpit info struct for attrs. Instead of existing attribute
validation do parse them and save in the info struct. Caller can benefit
from this and does not have to do parse itself. In order to properly
free attrs, genl_family pointer needs to be added to dumpit info struct
as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c10e6cf8 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: push attrbuf allocation and parsing to a separate function

To be re-usable by dumpit as well, push the code that is taking care of
attrbuf allocation and parting from doit into separate function.
Introduce a helper to free the buffer too.

Check family->maxattr too before calling kfree() to be symmetrical with
the allocation check.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1927f41a 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: introduce dump info struct to be available during dumpit op

Currently the cb->data is taken by ops during non-parallel dumping.
Introduce a new structure genl_dumpit_info and store the ops there.
Distribute the info to both non-parallel and parallel dumping. Also add
a helper genl_dumpit_info() to easily get the info structure in the
dumpit callback from cb.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be064def 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: genetlink: push doit/dumpit code from genl_family_rcv_msg

Currently the function genl_family_rcv_msg() is quite big. Since it is
quite convenient, push code that is related to doit and dumpit ops into
separate functions.

Do small changes on the way, like rc/err unification, NULL check etc.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05d7f547 02-May-2019 Michal Kubecek <mkubecek@suse.cz>

genetlink: do not validate dump requests if there is no policy

Unlike do requests, dump genetlink requests now perform strict validation
by default even if the genetlink family does not set policy and maxtype
because it does validation and parsing on its own (e.g. because it wants to
allow different message format for different commands). While the null
policy will be ignored, maxtype (which would be zero) is still checked so
that any attribute will fail validation.

The solution is to only call __nla_validate() from genl_family_rcv_msg()
if family->maxtype is set.

Fixes: ef6243acb478 ("genetlink: optionally validate strictly/dumps")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef6243ac 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

genetlink: optionally validate strictly/dumps

Add options to strictly validate messages and dump messages,
sometimes perhaps validating dump messages non-strictly may
be required, so add an option for that as well.

Since none of this can really be applied to existing commands,
set the options everwhere using the following spatch:

@@
identifier ops;
expression X;
@@
struct genl_ops ops[] = {
...,
{
.cmd = X,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
...
},
...
};

For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e43df38 24-Apr-2019 Marcel Holtmann <marcel@holtmann.org>

genetlink: use idr_alloc_cyclic for family->id assignment

When allocating the next family->id it makes more sense to use
idr_alloc_cyclic to avoid re-using a previously used family->id as much
as possible.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b0f31f2 21-Mar-2019 Johannes Berg <johannes.berg@intel.com>

genetlink: make policy common to family

Since maxattr is common, the policy can't really differ sanely,
so make it common as well.

The only user that did in fact manage to make a non-common policy
is taskstats, which has to be really careful about it (since it's
still using a common maxattr!). This is no longer supported, but
we can fake it using pre_doit.

This reduces the size of e.g. nl80211.o (which has lots of commands):

text data bss dec hex filename
398745 14323 2240 415308 6564c net/wireless/nl80211.o (before)
397913 14331 2240 414484 65314 net/wireless/nl80211.o (after)
--------------------------------
-832 +8 0 -824

Which is obviously just 8 bytes for each command, and an added 8
bytes for the new policy pointer. I'm not sure why the ops list is
counted as .text though.

Most of the code transformations were done using the following spatch:
@ops@
identifier OPS;
expression POLICY;
@@
struct genl_ops OPS[] = {
...,
{
- .policy = POLICY,
},
...
};

@@
identifier ops.OPS;
expression ops.POLICY;
identifier fam;
expression M;
@@
struct genl_family fam = {
.ops = OPS,
.maxattr = M,
+ .policy = POLICY,
...
};

This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
the cb->data as ops, which we want to change in a later genl patch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ceabee6c 21-Mar-2019 YueHaibing <yuehaibing@huawei.com>

genetlink: Fix a memory leak on error path

In genl_register_family(), when idr_alloc() fails,
we forget to free the memory we possibly allocate for
family->attrbuf.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 2ae0f17df1cd ("genetlink: use idr to track families")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a * b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02a2385f 14-Mar-2018 Nicolas Dichtel <nicolas.dichtel@6wind.com>

netlink: avoid a double skb free in genlmsg_mcast()

nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.

Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 83caf62c 12-Feb-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert genl_pernet_ops

This pernet_operations create and destroy net::genl_sock.
Foreign pernet_operations don't touch it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb9f7a9a 06-Feb-2018 Nicolas Dichtel <nicolas.dichtel@6wind.com>

netlink: ensure to loop over all netns in genlmsg_multicast_allns()

Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
case when commit 134e63756d5f was pushed.
However, there was no reason to stop the loop if a netns does not have
listeners.
Returns -ESRCH only if there was no listeners in all netns.

To avoid having the same problem in the future, I didn't take the
assumption that nlmsg_multicast() returns only 0 or -ESRCH.

Fixes: 134e63756d5f ("genetlink: make netns aware")
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# fe52145f 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct where available

This is an add-on to the previous patch that passes the extended ACK
structure where it's already available by existing genl_info or extack
function arguments.

This was done with this spatch (with some manual adjustment of
indentation):

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, info->extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse(A, B, C, D, E, NULL)
+nla_parse(A, B, C, D, E, extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_parse_nested(A, B, C, D, NULL)
+nla_parse_nested(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nlmsg_validate(A, B, C, D, NULL)
+nlmsg_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate(A, B, C, D, NULL)
+nla_validate(A, B, C, D, extack)
...>
}

@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
<...
-nla_validate_nested(A, B, C, NULL)
+nla_validate_nested(A, B, C, extack)
...>
}

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ab606d1 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

genetlink: pass extended ACK report down

Pass the extended ACK reporting struct down from generic netlink to
the families, using the existing struct genl_info for simplicity.

Also add support to set the extended ACK information from generic
netlink users.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2d4bc933 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: extended ACK reporting

Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d2a6a5e 22-Mar-2017 Stanislaw Gruszka <sgruszka@redhat.com>

genetlink: fix counting regression on ctrl_dumpfamily()

Commit 2ae0f17df1cd ("genetlink: use idr to track families") replaced

if (++n < fams_to_skip)
continue;
into:

if (n++ < fams_to_skip)
continue;

This subtle change cause that on retry ctrl_dumpfamily() call we omit
one family that failed to do ctrl_fill_info() on previous call, because
cb->args[0] = n number counts also family that failed to do
ctrl_fill_info().

Patch fixes the problem and avoid confusion in the future just decrease
n counter when ctrl_fill_info() fail.

User visible problem caused by this bug is failure to get access to
some genetlink family i.e. nl80211. However problem is reproducible
only if number of registered genetlink families is big enough to
cause second call of ctrl_dumpfamily().

Cc: Xose Vazquez Perez <xose.vazquez@gmail.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Fixes: 2ae0f17df1cd ("genetlink: use idr to track families")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00ffc1ba 03-Nov-2016 WANG Cong <xiyou.wangcong@gmail.com>

genetlink: fix a memory leak on error path

In __genl_register_family(), when genl_validate_assign_mc_groups()
fails, we forget to free the memory we possibly allocate for
family->attrbuf.

Note, some callers call genl_unregister_family() to clean up
on error path, it doesn't work because the family is inserted
to the global list in the nearly last step.

Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22ca904a 01-Nov-2016 Wei Yongjun <weiyongjun1@huawei.com>

genetlink: fix error return code in genl_register_family()

Fix to return a negative error code from the idr_alloc() error handling
case instead of 0, as done elsewhere in this function.

Also fix the return value check of idr_alloc() since idr_alloc return
negative errors on failure, not zero.

Fixes: 2ae0f17df1cd ("genetlink: use idr to track families")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e82c763 28-Oct-2016 pravin shelar <pshelar@ovn.org>

genetlink: Fix generic netlink family unregister

This patch fixes a typo in unregister operation.

Following crash is fixed by this patch. It can be easily reproduced
by repeating modprobe and rmmod module that uses genetlink.

[ 261.446686] BUG: unable to handle kernel paging request at ffffffffa0264088
[ 261.448921] IP: [<ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.450494] PGD 1c09067
[ 261.451266] PUD 1c0a063
[ 261.452091] PMD 8068d5067
[ 261.452525] PTE 0
[ 261.453164]
[ 261.453618] Oops: 0000 [#1] SMP
[ 261.454577] Modules linked in: openvswitch(+) ...
[ 261.480753] RIP: 0010:[<ffffffff813cb70e>] [<ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.483069] RSP: 0018:ffffc90003c0bc28 EFLAGS: 00010282
[ 261.510145] Call Trace:
[ 261.510896] [<ffffffff816f10ca>] genl_family_find_byname+0x5a/0x70
[ 261.512819] [<ffffffff816f2319>] genl_register_family+0xb9/0x630
[ 261.514805] [<ffffffffa02840bc>] dp_init+0xbc/0x120 [openvswitch]
[ 261.518268] [<ffffffff8100217d>] do_one_initcall+0x3d/0x160
[ 261.525041] [<ffffffff811808a9>] do_init_module+0x60/0x1f1
[ 261.526754] [<ffffffff8110687f>] load_module+0x22af/0x2860
[ 261.530144] [<ffffffff81107026>] SYSC_finit_module+0x96/0xd0
[ 261.531901] [<ffffffff8110707e>] SyS_finit_module+0xe/0x10
[ 261.533605] [<ffffffff8100391e>] do_syscall_64+0x6e/0x180
[ 261.535284] [<ffffffff817c2faf>] entry_SYSCALL64_slow_path+0x25/0x25
[ 261.546512] RIP [<ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.550198] ---[ end trace 76505a814dd68770 ]---

Fixes: 2ae0f17df1c ("genetlink: use idr to track families").

Reported-by: Jarno Rajahalme <jarno@ovn.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 56989f6d 24-Oct-2016 Johannes Berg <johannes.berg@intel.com>

genetlink: mark families as __ro_after_init

Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.

In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.

This protects the data structure from accidental corruption.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ae0f17d 24-Oct-2016 Johannes Berg <johannes.berg@intel.com>

genetlink: use idr to track families

Since generic netlink family IDs are small integers, allocated
densely, IDR is an ideal match for lookups. Replace the existing
hand-written hash-table with IDR for allocation and lookup.

This lets the families only be written to once, during register,
since the list_head can be removed and removal of a family won't
cause any writes.

It also slightly reduces the code size (by about 1.3k on x86-64).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 489111e5 24-Oct-2016 Johannes Berg <johannes.berg@intel.com>

genetlink: statically initialize families

Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.

This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a07ea4d9 24-Oct-2016 Johannes Berg <johannes.berg@intel.com>

genetlink: no longer support using static family IDs

Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.

Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)

Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c90c39da 24-Oct-2016 Johannes Berg <johannes.berg@intel.com>

genetlink: introduce and use genl_family_attrbuf()

This helper function allows family implementations to access
their family's attrbuf. This gets rid of the attrbuf usage
in families, and also adds locking validation, since it's not
valid to use the attrbuf with parallel_ops or outside of the
dumpit callback.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 12d8de6d 31-Aug-2016 stephen hemminger <stephen@networkplumber.org>

net: make genetlink ctrl ops const

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 263ea090 18-Feb-2016 Florian Westphal <fw@strlen.de>

Revert "genl: Add genlmsg_new_unicast() for unicast message allocation"

This reverts commit bb9b18fb55b0 ("genl: Add genlmsg_new_unicast() for
unicast message allocation")'.

Nothing wrong with it; its no longer needed since this was only for
mmapped netlink support.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4a92602a 05-Feb-2016 Tycho Andersen <tycho.andersen@canonical.com>

openvswitch: allow management from inside user namespaces

Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
massive one

Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Pravin Shelar <pshelar@ovn.org>
CC: Justin Pettit <jpettit@nicira.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8e429a2 13-Jan-2016 David S. Miller <davem@davemloft.net>

genetlink: Fix off-by-one in genl_allocate_reserve_groups()

The bug fix for adding n_groups to the computation forgot
to adjust ">=" to ">" to keep the condition correct.

Signed-off-by: David S. Miller <davem@davemloft.net>


# ccdf6ce6 11-Jan-2016 Matti Vaittinen <matti.vaittinen@nokia.com>

net: netlink: Fix multicast group storage allocation for families with more than one groups

Multicast groups are stored in global buffer. Check for needed buffer size
incorrectly compares buffer size to first id for family. This means that
for families with more than one mcast id one may allocate too small buffer
and end up writing rest of the groups to some unallocated memory. Fix the
buffer size check to compare allocated space to last mcast id for the
family.

Tested on ARM using kernel 3.14

Signed-off-by: Matti Vaittinen <matti.vaittinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc9e50f5 15-Dec-2015 Tom Herbert <tom@herbertland.com>

netlink: add a start callback for starting a netlink dump

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61d03535 08-Oct-2015 Yaowei Bai <bywxiaobai@163.com>

net/netlink: lockdep_genl_is_held can be boolean

This patch makes lockdep_genl_is_held return bool to improve
readability due to this particular function only using either
one or zero as its return value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92c14d9b 22-Sep-2015 Jiri Benc <jbenc@redhat.com>

genetlink: simplify genl_notify

The genl_notify function has too many arguments for no real reason - all
callers use genl_info to get them anyway. Just pass the genl_info down to
genl_notify.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 053c095a 16-Jan-2015 Johannes Berg <johannes.berg@intel.com>

netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

if (my_function(...))
/* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee1c2442 16-Jan-2015 Johannes Berg <johannes.berg@intel.com>

genetlink: synchronize socket closing and family removal

In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ad63005 16-Jan-2015 Johannes Berg <johannes.berg@intel.com>

genetlink: disallow subscribing to unknown mcast groups

Jeff Layton reported that he could trigger the multicast unbind warning
in generic netlink using trinity. I originally thought it was a race
condition between unregistering the generic netlink family and closing
the socket, but there's a far simpler explanation: genetlink currently
allows subscribing to groups that don't (yet) exist, and the warning is
triggered when unsubscribing again while the group still doesn't exist.

Originally, I had a warning in the subscribe case and accepted it out of
userspace API concerns, but the warning was of course wrong and removed
later.

However, I now think that allowing userspace to subscribe to groups that
don't exist is wrong and could possibly become a security problem:
Consider a (new) genetlink family implementing a permission check in
the mcast_bind() function similar to the like the audit code does today;
it would be possible to bypass the permission check by guessing the ID
and subscribing to the group it exists. This is only possible in case a
family like that would be dynamically loaded, but it doesn't seem like a
huge stretch, for example wireless may be loaded when you plug in a USB
device.

To avoid this reject such subscription attempts.

If this ends up causing userspace issues we may need to add a workaround
in af_netlink to deny such requests but not return an error.

Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc97a1a9 29-Dec-2014 David S. Miller <davem@davemloft.net>

genetlink: A genl_bind() to an out-of-range multicast group should not WARN().

Users can request to bind to arbitrary multicast groups, so warning
when the requested group number is out of range is not appropriate.

And with the warning removed, and the 'err' variable properly given
an initial value, we can remove 'found' altogether.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 023e2cfa 23-Dec-2014 Johannes Berg <johannes.berg@intel.com>

netlink/genetlink: pass network namespace to bind/unbind

Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c380d9a7 23-Dec-2014 Johannes Berg <johannes.berg@intel.com>

genetlink: pass multicast bind/unbind to families

In order to make the newly fixed multicast bind/unbind
functionality in generic netlink, pass them down to the
appropriate family.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f91abd4 02-Jun-2014 Denis ChengRq <crquan@gmail.com>

genetlink: remove superfluous assignment

the local variable ops and n_ops were just read out from family,
and not changed, hence no need to assign back.

Validation functions should operate on const parameters and not
change anything.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90f62cf3 23-Apr-2014 Eric W. Biederman <ebiederm@xmission.com>

net: Use netlink_ns_capable to verify the permisions of netlink messages

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb9b18fb 30-Nov-2013 Thomas Graf <tgraf@suug.ch>

genl: Add genlmsg_new_unicast() for unicast message allocation

Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>


# 5e53e689 24-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink/pmcraid: use proper genetlink multicast API

The pmcraid driver is abusing the genetlink API and is using its
family ID as the multicast group ID, which is invalid and may
belong to somebody else (and likely will.)

Make it use the correct API, but since this may already be used
as-is by userspace, reserve a family ID for this code and also
reserve that group ID to not break userspace assumptions.

My previous patch broke event delivery in the driver as I missed
that it wasn't using the right API and forgot to update it later
in my series.

While changing this, I noticed that the genetlink code could use
the static group ID instead of a strcmp(), so also do that for
the VFS_DQUOT family.

Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0f0e2159 23-Nov-2013 Geert Uytterhoeven <geert@linux-m68k.org>

genetlink: Fix uninitialized variable in genl_validate_assign_mc_groups()

net/netlink/genetlink.c: In function ‘genl_validate_assign_mc_groups’:
net/netlink/genetlink.c:217: warning: ‘err’ may be used uninitialized in this
function

Commit 2a94fe48f32ccf7321450a2cc07f2b724a444e5b ("genetlink: make multicast
groups const, prevent abuse") split genl_register_mc_group() in multiple
functions, but dropped the initialization of err.

Initialize err to zero to fix this.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 220815a9 21-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: fix genlmsg_multicast() bug

Unfortunately, I introduced a tremendously stupid bug into
genlmsg_multicast() when doing all those multicast group
changes: it adjusts the group number, but then passes it
to genlmsg_multicast_netns() which does that again.

Somehow, my tests failed to catch this, so add a warning
into genlmsg_multicast_netns() and remove the offending
group ID adjustment.

Also add a warning to the similar code in other functions
so people who misuse them are more loudly warned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a94fe48 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: make multicast groups const, prevent abuse

Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 68eb5503 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: pass family to functions using groups

This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2ebb908 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: remove family pointer from genl_multicast_group

There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 06fb555a 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: remove genl_unregister_mc_group()

There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ecf7536 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

quota/genetlink: use proper genetlink multicast APIs

The quota code is abusing the genetlink API and is using
its family ID as the multicast group ID, which is invalid
and may belong to somebody else (and likely will.)

Make the quota code use the correct API, but since this
is already used as-is by userspace, reserve a family ID
for this code and also reserve that group ID to not break
userspace assumptions.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e5dcecba 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

drop_monitor/genetlink: use proper genetlink multicast APIs

The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)

Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c53ed742 19-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: only pass array to genl_register_family_with_ops()

As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 029b234f 18-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: rename shadowed variable

Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 568508aa 15-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: unify registration functions

Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f84f771d 14-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: allow making ops const

Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.

No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d91824c0 14-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: register family ops as array

Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.

Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3686ec5e 14-Nov-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: remove genl_register_ops/genl_unregister_ops

genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33c6b1f6 23-Aug-2013 Pravin B Shelar <pshelar@nicira.com>

genl: Hold reference on correct module while netlink-dump.

netlink dump operations take module as parameter to hold
reference for entire netlink dump duration.
Currently it holds ref only on genl module which is not correct
when we use ops registered to genl from another module.
Following patch adds module pointer to genl_ops so that netlink
can hold ref count on it.

CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b96309c 23-Aug-2013 Pravin B Shelar <pshelar@nicira.com>

genl: Fix genl dumpit() locking.

In case of genl-family with parallel ops off, dumpif() callback
is expected to run under genl_lock, But commit def3117493eafd9df
(genl: Allow concurrent genl callbacks.) changed this behaviour
where only first dumpit() op was called under genl-lock.
For subsequent dump, only nlk->cb_lock was taken.
Following patch fixes it by defining locked dumpit() and done()
callback which takes care of genl-locking.

CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d47b380 21-Aug-2013 Johannes Berg <johannes.berg@intel.com>

Revert "genetlink: fix family dump race"

This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db.

It turns out that the change introduced a potential deadlock
by causing a locking dependency with netlink's cb_mutex. I
can't seem to find a way to resolve this without doing major
changes to the locking, so revert this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58ad436f 13-Aug-2013 Johannes Berg <johannes.berg@intel.com>

genetlink: fix family dump race

When dumping generic netlink families, only the first dump call
is locked with genl_lock(), which protects the list of families,
and thus subsequent calls can access the data without locking,
racing against family addition/removal. This can cause a crash.
Fix it - the locking needs to be conditional because the first
time around it's already locked.

A similar bug was reported to me on an old kernel (3.4.47) but
the exact scenario that happened there is no longer possible,
on those kernels the first round wasn't locked either. Looking
at the current code I found the race described above, which had
also existed on the old kernel.

Cc: stable@vger.kernel.org
Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1ee3673 28-Jul-2013 Pablo Neira <pablo@netfilter.org>

genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE

Currently, it is not possible to use neither NLM_F_EXCL nor
NLM_F_REPLACE from genetlink. This is due to this checking in
genl_family_rcv_msg:

if (nlh->nlmsg_flags & NLM_F_DUMP)

NLM_F_DUMP is NLM_F_MATCH|NLM_F_ROOT. Thus, if NLM_F_EXCL or
NLM_F_REPLACE flag is set, genetlink believes that you're
requesting a dump and it calls the .dumpit callback.

The solution that I propose is to refine this checking to
make it stricter:

if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP)

And given the combination NLM_F_REPLACE and NLM_F_EXCL does
not make sense to me, it removes the ambiguity.

There was a patch that tried to fix this some time ago (0ab03c2
netlink: test for all flags of the NLM_F_DUMP composite) but it
tried to resolve this ambiguity in *all* existing netlink subsystems,
not only genetlink. That patch was reverted since it broke iproute2,
which is using NLM_F_ROOT to request the dump of the routing cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c74f2b26 26-Jul-2013 Stanislaw Gruszka <sgruszka@redhat.com>

genetlink: release cb_lock before requesting additional module

Requesting external module with cb_lock taken can result in
the deadlock like showed below:

[ 2458.111347] Showing all locks held in the system:
[ 2458.111347] 1 lock held by NetworkManager/582:
[ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
[ 2458.111347] 1 lock held by modprobe/603:
[ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30

[ 2461.579457] SysRq : Show Blocked State
[ 2461.580103] task PC stack pid father
[ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080
[ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
[ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
[ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
[ 2461.580103] Call Trace:
[ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
[ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
[ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
[ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
[ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
[ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
[ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
[ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370
[ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
[ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
[ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
[ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
[ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
[ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40
[ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
[ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
[ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
[ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
[ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
[ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
[ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10
[ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
[ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
[ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
[ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
[ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
[ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
[ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
[ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
[ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080
[ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
[ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
[ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
[ 2461.580103] Call Trace:
[ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
[ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
[ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
[ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2
[ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
[ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
[ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
[ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
[ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
[ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
[ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
[ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
[ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
[ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
[ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
[ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1

Problem start to happen after adding net-pf-16-proto-16-family-nl80211
alias name to cfg80211 module by below commit (though that commit
itself is perfectly fine):

commit fb4e156886ce6e8309e912d8b370d192330d19d3
Author: Marcel Holtmann <marcel@holtmann.org>
Date: Sun Apr 28 16:22:06 2013 -0700

nl80211: Add generic netlink module alias for cfg80211/nl80211

Reported-and-tested-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reviewed-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50754d21 26-Apr-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn>

genetlink: fix possible memory leak in genl_family_rcv_msg()

'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr &&
family->parallel_ops, thus should be freed before leaving from the error
handling cases, otherwise it will cause memory leak.

Introduced by commit def3117493eafd9dfa1f809d861e0031b2cc8a07
(genl: Allow concurrent genl callbacks.)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# def31174 23-Apr-2013 Pravin B Shelar <pshelar@nicira.com>

genl: Allow concurrent genl callbacks.

All genl callbacks are serialized by genl-mutex. This can become
bottleneck in multi threaded case.
Following patch adds an parameter to genl_family so that a
particular family can get concurrent netlink callback without
genl_lock held.
New rw-sem is used to protect genl callback from genl family unregister.
in case of parallel_ops genl-family read-lock is taken for callbacks and
write lock is taken for register or unregistration for any family.
In case of locked genl family semaphore and gel-mutex is locked for
any openration.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f1e79e20 18-Mar-2013 Masatake YAMATO <yamato@redhat.com>

genetlink: trigger BUG_ON if a group name is too long

Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 15e47304 07-Sep-2012 Eric W. Biederman <ebiederm@xmission.com>

netlink: Rename pid to portid to avoid confusion

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f00d977 07-Sep-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netlink: hide struct module parameter in netlink_kernel_create

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9785e10a 07-Sep-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netlink: kill netlink_set_nonroot

Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 320f5ea0 23-Jul-2012 WANG Cong <xiyou.wangcong@gmail.com>

genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP

lockdep_is_held() is defined when CONFIG_LOCKDEP, not CONFIG_PROVE_LOCKING.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c53040f 10-Jul-2012 Ben Hutchings <bhutchings@solarflare.com>

net: Fix (nearly-)kernel-doc comments for various functions

Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a31f2d17 29-Jun-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netlink: add netlink_kernel_cfg parameter to netlink_kernel_create

This patch adds the following structure:

struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9412c37 29-May-2012 Neil Horman <nhorman@tuxdriver.com>

genetlink: Build a generic netlink family module alias

Generic netlink searches for -type- formatted aliases when requesting a module to
fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type
value). However generic netlink protocols have no well defined type numbers,
they have string names. Modify genl_ctrl_getfamily to request an alias in the
format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and
add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME
macro to allow modules to specifify those generic strings.

Note, l2tp previously hacked together an net-pf-16-proto-16-type-l2tp alias
using the MODULE_ALIAS macro, with these updates we can convert that to use the
PROTO_NAME macro.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: James Chapman <jchapman@katalix.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 444653f6 29-Mar-2012 David S. Miller <davem@davemloft.net>

genetlink: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 80d326fa 24-Feb-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netlink: add netlink_dump_control structure for netlink_dump_start()

Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a46621a3 30-Jan-2012 Denys Vlasenko <vda.linux@googlemail.com>

net: Deinline __nlmsg_put and genlmsg_put. -7k code on i386 defconfig.

text data bss dec hex filename
8455963 532732 1810804 10799499 a4c98b vmlinux.o.before
8448899 532732 1810804 10792435 a4adf3 vmlinux.o

This change also removes commented-out copy of __nlmsg_put
which was last touched in 2005 with "Enable once all users
have been converted" comment on top.

Changes in v2: rediffed against net-next.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd778461 02-Jan-2012 Eric Paris <eparis@redhat.com>

security: remove the security_netlink_recv hook as it is equivalent to capable()

Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received. Today we instead get
the capabilities from the current task. This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.

Signed-off-by: Eric Paris <eparis@redhat.com>


# fa843095 28-Dec-2011 Stephen Hemminger <shemminger@vyatta.com>

genetlink: add auto module loading

When testing L2TP support, I discovered that the l2tp module is not autoloaded
as are other netlink interfaces. There is because of lack of hook in genetlink to call
request_module and load the module.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b57ef81f 22-Dec-2011 stephen hemminger <shemminger@vyatta.com>

netlink: af_netlink cleanup (v2)

Don't inline functions that cover several lines, and do inline
the trivial ones. Also make some arguments const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86b1309c 10-Nov-2011 Pravin B Shelar <pshelar@nicira.com>

genetlink: Add lockdep_genl_is_held().

Open vSwitch uses genl_mutex locking to protect datapath
data-structures like flow-table, flow-actions. Following patch adds
lockdep_genl_is_held() which is used for rcu annotation to prove
locking.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>


# 263ba61d 10-Nov-2011 Pravin B Shelar <pshelar@nicira.com>

genetlink: Add genl_notify()

Open vSwitch uses Generic Netlink interface for communication
between userspace and kernel module. genl_notify() is used
for sending notification back to userspace.

genl_notify() is analogous to rtnl_notify() but uses genl_sock
instead of rtnl.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>


# c7ac8679 09-Jun-2011 Greg Rose <gregory.v.rose@intel.com>

rtnetlink: Compute and store minimum ifinfo dump size

The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# b8f3ab42 18-Jan-2011 David S. Miller <davem@davemloft.net>

Revert "netlink: test for all flags of the NLM_F_DUMP composite"

This reverts commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf.

It breaks several things including the avahi daemon.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ab03c2b 06-Jan-2011 Jan Engelhardt <jengelh@medozas.de>

netlink: test for all flags of the NLM_F_DUMP composite

Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff4c92d8 04-Oct-2010 Johannes Berg <johannes.berg@intel.com>

genetlink: introduce pre_doit/post_doit hooks

Each family may have some amount of boilerplate
locking code that applies to most, or even all,
commands.

This allows a family to handle such things in
a more generic way, by allowing it to
a) include private flags in each operation
b) specify a pre_doit hook that is called,
before an operation's doit() callback and
may return an error directly,
c) specify a post_doit hook that can undo
locking or similar things done by pre_doit,
and finally
d) include two private pointers in each info
struct passed between all these operations
including doit(). (It's two because I'll
need two in nl80211 -- can be extended.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>


# 652c6717 25-Jul-2010 Changli Gao <xiaosuo@gmail.com>

genetlink: use genl_register_family_with_ops()

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 416c2f9c 25-Jul-2010 Changli Gao <xiaosuo@gmail.com>

genetlink: cleanup code according to CodingStyle

If the function is exported, the EXPORT* macro for it should follow immediately
after the closing function brace line.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/netlink/genetlink.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>


# f408e0ce 02-Apr-2010 James Chapman <jchapman@katalix.com>

netlink: Export genl_lock() API for use by modules

This lets kernel modules which use genl netlink APIs serialize netlink
processing.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# e1d5a010 07-Jan-2010 Samir Bellabes <sam@synack.fr>

genetlink: optimize ctrl_dumpfamily()

there is a unnecessary test which can be replaced by a good initialization in
the 'for' statement

Noticed by Serge E. Hallyn <serue@us.ibm.com>

Signed-off-by: Samir Bellabes <sam@synack.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 988ade6b 14-Oct-2009 Krishna Kumar <krkumar2@in.ibm.com>

genetlink: Optimize and one bug fix in genl_generate_id()

1. GENL_MIN_ID is a valid id -> no need to start at
GENL_MIN_ID + 1.
2. Avoid going through the ids two times: If we start at
GENL_MIN_ID+1 (*or bigger*) and all ids are over!, the
code iterates through the list twice (*or lesser*).
3. Simplify code - no need to start at idx=0 which gets
reset to GENL_MIN_ID.

Patch on net-next-2.6. Reboot test shows that first id
passed to genl_register_family was 16, next two were
GENL_ID_GENERATE and genl_generate_id returned 17 & 18
(user level testing of same code shows expected values
across entire range of MIN/MAX).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93860b08 14-Oct-2009 Krishna Kumar <krkumar2@in.ibm.com>

genetlink: Optimize genl_register_family()

genl_register_family() doesn't need to call genl_family_find_byid
when GENL_ID_GENERATE is passed during register.

Patch on net-next-2.6, compile and reboot testing only.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8273570 24-Sep-2009 Johannes Berg <johannes@sipsolutions.net>

genetlink: fix netns vs. netlink table locking (2)

Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:

BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
2 locks held by rmmod/1510:
#0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
Call Trace:
[<ffffffff81044ff9>] __might_sleep+0x119/0x150
[<ffffffff81380501>] netlink_table_grab+0x21/0x100
[<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
[<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
[<ffffffff81382866>] genl_unregister_family+0x56/0x130
[<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
[<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]

Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d136f1bd 11-Sep-2009 Johannes Berg <johannes@sipsolutions.net>

genetlink: fix netns vs. netlink table locking

Since my commits introducing netns awareness into
genetlink we can get this problem:

BUG: scheduling while atomic: modprobe/1178/0x00000002
2 locks held by modprobe/1178:
#0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou
#1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g
Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
Call Trace:
[<ffffffff8103e285>] __schedule_bug+0x85/0x90
[<ffffffff81403138>] schedule+0x108/0x588
[<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0
[<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100
[<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290

because I overlooked that netlink_table_grab() will
schedule, thinking it was just the rwlock. However,
in the contention case, that isn't actually true.

Fix this by letting the code grab the netlink table
lock first and then the RCU for netns protection.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b1f57195 04-Sep-2009 Brian Haley <brian.haley@hp.com>

netlink: silence compiler warning

CC net/netlink/genetlink.o
net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

From following the code 'err' is initialized, but set it to zero to
silence the warning.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 134e6375 10-Jul-2009 Johannes Berg <johannes@sipsolutions.net>

genetlink: make netns aware

This makes generic netlink network namespace aware. No
generic netlink families except for the controller family
are made namespace aware, they need to be checked one by
one and then set the family->netnsok member to true.

A new function genlmsg_multicast_netns() is introduced to
allow sending a multicast message in a given namespace,
for example when it applies to an object that lives in
that namespace, a new function genlmsg_multicast_allns()
to send a message to all network namespaces (for objects
that do not have an associated netns).

The function genlmsg_multicast() is changed to multicast
the message in just init_net, which is currently correct
for all generic netlink families since they only work in
init_net right now. Some will later want to work in all
net namespaces because they do not care about the netns
at all -- those will have to be converted to use one of
the new functions genlmsg_multicast_allns() or
genlmsg_multicast_netns() whenever they are made netns
aware in some way.

After this patch families can easily decide whether or
not they should be available in all net namespaces. Many
genl families us it for objects not related to networking
and should therefore be available in all namespaces, but
that will have to be done on a per family basis.

Note that this doesn't touch on the checkpoint/restart
problem where network namespaces could be used, genl
families and multicast groups are numbered globally and
I see no easy way of changing that, especially since it
must be possible to multicast to all network namespaces
for those families that do not care about netns.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7b11d73 21-May-2009 Michał Mirosław <mirq-linux@rere.qmqm.pl>

genetlink: Introduce genl_register_family_with_ops()

This introduces genl_register_family_with_ops() that registers a genetlink
family along with operations from a table. This is used to kill copy'n'paste
occurrences in following patches.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3efb40c2 20-Dec-2008 Inaky Perez-Gonzalez <inaky@linux.intel.com>

genetlink: export genl_unregister_mc_group()

Add an EXPORT_SYMBOL() to genl_unregister_mc_group(), to allow
unregistering groups on the run. EXPORT_SYMBOL_GPL() is not used as
the rest of the functions exported by this module (eg:
genl_register_mc_group) are also not _GPL().

Cleanup is currently done when unregistering a family, but there is
no way to unregister a single multicast group due to that function not
being exported. Seems to be a mistake as it is documented as for
external consumption.

This is needed by the WiMAX stack to be able to cleanup unused mc
groups.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 6d1a3fb5 18-Jun-2008 Patrick McHardy <kaber@trash.net>

netlink: genl: fix circular locking

genetlink has a circular locking dependency when dumping the registered
families:

- dump start:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time

- dump continuance:
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex

Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc3ed28c 03-Jun-2008 Thomas Graf <tgraf@suug.ch>

netlink: Improve returned error codes

Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 910d6c32 12-Feb-2008 Pavel Emelyanov <xemul@openvz.org>

[GENETLINK]: Relax dances with genl_lock.

The genl_unregister_family() calls the genl_unregister_mc_groups(),
which takes and releases the genl_lock and then locks and releases
this lock itself.

Relax this behavior, all the more so the genl_unregister_mc_groups()
is called from genl_unregister_family() only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd40b7d3 10-Oct-2007 Denis V. Lunev <den@openvz.org>

[NET]: make netlink user -> kernel interface synchronious

This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b71535f 10-Oct-2007 Denis V. Lunev <den@openvz.org>

[NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2

The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks
like outdated copy/paste from rtnetlink.c. Push them into sync with the
original.

Changes from v1:
- deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0cfad075 16-Sep-2007 Herbert Xu <herbert@gondor.apana.org.au>

[NETLINK]: Avoid pointer in netlink_run_queue

I was looking at Patrick's fix to inet_diag and it occured
to me that we're using a pointer argument to return values
unnecessarily in netlink_run_queue. Changing it to return
the value will allow the compiler to generate better code
since the value won't have to be memory-backed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4b51029 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com>

[NET]: Support multiple network namespaces with netlink

Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.

This patch updates all of the existing netlink protocols
to only support the initial network namespace. Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.

As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.

The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79d310d0 24-Jul-2007 Thomas Graf <tgraf@suug.ch>

[GENETLINK]: Correctly report errors while registering a multicast group

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c04ddb7 24-Jul-2007 Thomas Graf <tgraf@suug.ch>

[GENETLINK]: Fix adjustment of number of multicast groups

The current calculation of the maximum number of genetlink
multicast groups seems odd, fix it.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79dc4386 24-Jul-2007 Thomas Graf <tgraf@suug.ch>

[GENETLINK]: Fix race in genl_unregister_mc_groups()

family->mcast_groups is protected by genl_lock so it must
be held while accessing the list in genl_unregister_mc_groups().
Requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2dbba6f7 18-Jul-2007 Johannes Berg <johannes@sipsolutions.net>

[GENETLINK]: Dynamic multicast groups.

Introduce API to dynamically register and unregister multicast groups.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef7c79ed 05-Jun-2007 Patrick McHardy <kaber@trash.net>

[NETLINK]: Mark netlink policies const

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# af65bdfc 20-Apr-2007 Patrick McHardy <kaber@trash.net>

[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it

Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c702e804 23-Mar-2007 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Directly return -EINTR from netlink_dump_start()

Now that all users of netlink_dump_start() use netlink_run_queue()
to process the receive queue, it is possible to return -EINTR from
netlink_dump_start() directly, therefore simplying the callers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d00a4eb 23-Mar-2007 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Remove error pointer from netlink message handler

The error pointer argument in netlink message handlers is used
to signal the special case where processing has to be interrupted
because a dump was started but no error happened. Instead it is
simpler and more clear to return -EINTR and have netlink_run_queue()
deal with getting the queue right.

nfnetlink passed on this error pointer to its subsystem handlers
but only uses it to signal the start of a netlink dump. Therefore
it can be removed there as well.

This patch also cleans up the error handling in the affected
message handlers to be consistent since it had to be touched anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45e7ae7f 23-Mar-2007 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Ignore control messages directly in netlink_run_queue()

Changes netlink_rcv_skb() to skip netlink controll messages and don't
pass them on to the message handler.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d35b6856 23-Mar-2007 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Ignore !NLM_F_REQUEST messages directly in netlink_run_queue()

netlink_rcv_skb() is changed to skip messages which don't have the
NLM_F_REQUEST bit to avoid every netlink family having to perform this
check on their own.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 746fac4d 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] NETLINK: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48d4ed7a 06-Dec-2006 Jamal Hadi Salim <hadi@cyberus.ca>

[GENETLINK]: Fix misplaced command flags.

The command flags for dump and do were swapped..

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 334c29a6 04-Dec-2006 Jamal Hadi Salim <hadi@cyberus.ca>

[GENETLINK]: Move command capabilities to flags.

This patch moves command capabilities to command flags. Other than
being cleaner, saves several bytes.
We increment the nlctrl version so as to signal to user space that
to not expect the attributes. We will try to be careful
not to do this too often ;->

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4d1366d 01-Dec-2006 Jamal Hadi Salim <hadi@cyberus.ca>

[GENETLINK]: Add cmd dump completion.

Remove assumption that generic netlink commands cannot have dump
completion callbacks.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e94ef682 23-Nov-2006 Thomas Graf <tgraf@suug.ch>

[GENETLINK] ctrl: Avoid empty CTRL_ATTR_OPS attribute when dumping

Based on Jamal's patch but compiled and even tested. :-)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 17c157c8 14-Nov-2006 Thomas Graf <tgraf@suug.ch>

[GENL]: Add genlmsg_put_reply() to simplify building reply headers

By modyfing genlmsg_put() to take a genl_family and by adding
genlmsg_put_reply() the process of constructing the netlink
and generic netlink headers is simplified.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81878d27 14-Nov-2006 Thomas Graf <tgraf@suug.ch>

[GENL]: Add genlmsg_reply() to simply unicast replies to requests

A generic netlink user has no interest in knowing how to
address the source of the original request.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 339bf98f 10-Nov-2006 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Do precise netlink message allocations where possible

Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.

Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb328111 18-Sep-2006 Thomas Graf <tgraf@suug.ch>

[GENL]: Provide more information to userspace about registered genl families

Additionaly exports the following information when providing
the list of registered generic netlink families:
- protocol version
- header size
- maximum number of attributes
- list of available operations including
- id
- flags
- avaiability of policy and doit/dumpit function

libnl HEAD provides a utility to read this new information:

0x0010 nlctrl version 1
hdrsize 0 maxattr 6
op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT]
0x0011 NLBL_MGMT version 1
hdrsize 0 maxattr 0
op unknown (0x02) [DOIT]
op unknown (0x03) [DOIT]
....

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5176f91e 26-Aug-2006 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validation

Converts existing NLA_STRING attributes to use the new
validation features, saving a couple of temporary buffers.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d387f6ad 15-Aug-2006 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Add notification message sending interface

Adds nlmsg_notify() implementing proper notification logic. The
message is multicasted to all listeners in the group. The
applications the requests orignates from can request a unicast
back report in which case said socket will be excluded from the
multicast to avoid duplicated notifications.

nlmsg_multicast() is extended to take allocation flags to
allow notification in atomic contexts.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe4944e5 05-Aug-2006 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Extend netlink messaging interface

Adds:
nlmsg_get_pos() return current position in message
nlmsg_trim() trim part of message
nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr
nla_put_nohdr(skb, len, data) add attribute w/o hdr
nla_find_nested() find attribute in nested attributes

Fixes nlmsg_new() to take allocation flags and consider size.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# c7bdb545 27-Jun-2006 Darrel Goeddel <dgoeddel@trustedcs.com>

[NETLINK]: Encapsulate eff_cap usage within security framework.

This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
the security framework by extending security_netlink_recv to include a required
capability parameter and converting all direct usage of eff_caps outside
of the lsm modules to use the interface. It also updates the SELinux
implementation of the security_netlink_send and security_netlink_recv
hooks to take advantage of the sid in the netlink_skb_params struct.
This also enables SELinux to perform auditing of netlink capability checks.
Please apply, for 2.6.18 if possible.

Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14cc3e2b 26-Mar-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] sem2mutex: misc static one-file mutexes

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e200bd80 13-Feb-2006 Jamal Hadi Salim <hadi@cyberus.ca>

[NETLINK] genetlink: Fix bugs spotted by Andrew Morton.

- panic() doesn't return.

- Don't forget to unlock on genl_register_family() error path

- genl_rcv_msg() is called via pointer so there's no point in declaring it
`inline'.

Notes:

genl_ctrl_event() ignores the genlmsg_multicast() return value.

lots of things ignore the genl_ctrl_event() return value.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23b0ca5b 13-Jan-2006 Per Liden <per.liden@ericsson.com>

[PATCH] genetlink: don't touch module ref count

Increasing the module ref count at registration will block the module from
ever being unloaded. In fact, genetlink should not care about the owner at
all. This patch removes the owner field from the struct registered with
genetlink.

Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b461d2f2 03-Jan-2006 Per Liden <per.liden@ericsson.com>

[NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8

Signed-off-by: Per Liden <per.liden@ericsson.com>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 482a8524 09-Nov-2005 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Generic netlink family

The generic netlink family builds on top of netlink and provides
simplifies access for the less demanding netlink users. It solves
the problem of protocol numbers running out by introducing a so
called controller taking care of id management and name resolving.

Generic netlink modules register themself after filling out their
id card (struct genl_family), after successful registration the
modules are able to register callbacks to command numbers by
filling out a struct genl_ops and calling genl_register_op(). The
registered callbacks are invoked with attributes parsed making
life of simple modules a lot easier.

Although generic netlink modules can request static identifiers,
it is recommended to use GENL_ID_GENERATE and to let the controller
assign a unique identifier to the module. Userspace applications
will then ask the controller and lookup the idenfier by the module
name.

Due to the current multicast implementation of netlink, the number
of generic netlink modules is restricted to 1024 to avoid wasting
memory for the per socket multiacst subscription bitmask.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>