History log of /linux-master/net/netfilter/nf_tables_api.c
Revision Date Author Comments
# 86a1471d 17-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix memleak in map from abort path

The delete set command does not rely on the transaction object for
element removal, therefore, a combination of delete element + delete set
from the abort path could result in restoring twice the refcount of the
mapping.

Check for inactive element in the next generation for the delete element
command in the abort path, skip restoring state if next generation bit
has been already cleared. This is similar to the activate logic using
the set walk iterator.
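
The check described above can be sketched in user-space C. This is only a simplified model, not the kernel code: the `elem` struct, the `NEXT_GEN` bit and all helper names are hypothetical stand-ins for the set element, the next-generation bit of its generation mask, and the abort handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NEXT_GEN 0x2u /* hypothetical bit: "inactive in the next generation" */

struct elem {
	uint8_t genmask; /* a set bit means inactive in that generation */
	int *target_use; /* refcount of the chain/object the mapping refers to */
};

/* An element is active in the next generation while NEXT_GEN is clear. */
static bool elem_active_next(const struct elem *e)
{
	return !(e->genmask & NEXT_GEN);
}

/* Preparation phase of "delete element": mark inactive, drop the reference. */
static void delelem_prepare(struct elem *e)
{
	assert(elem_active_next(e));
	e->genmask |= NEXT_GEN;
	(*e->target_use)--;
}

/* Abort of "delete set": the set walk reactivates inactive elements. */
static void delset_abort_one(struct elem *e)
{
	if (elem_active_next(e))
		return;
	e->genmask &= ~NEXT_GEN;
	(*e->target_use)++;
}

/* Abort of "delete element": skip if the bit has already been cleared. */
static void delelem_abort(struct elem *e)
{
	if (elem_active_next(e))
		return; /* already restored, do not take a second reference */
	e->genmask &= ~NEXT_GEN;
	(*e->target_use)++;
}
```

In this model, running `delelem_prepare()`, then `delset_abort_one()`, then `delelem_abort()` on the same element leaves the target refcount at its original value, because the last call finds the bit already cleared and returns early.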

[ 6170.286929] ------------[ cut here ]------------
[ 6170.286939] WARNING: CPU: 6 PID: 790302 at net/netfilter/nf_tables_api.c:2086 nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287071] Modules linked in: [...]
[ 6170.287633] CPU: 6 PID: 790302 Comm: kworker/6:2 Not tainted 6.9.0-rc3+ #365
[ 6170.287768] RIP: 0010:nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287886] Code: df 48 8d 7d 58 e8 69 2e 3b df 48 8b 7d 58 e8 80 1b 37 df 48 8d 7d 68 e8 57 2e 3b df 48 8b 7d 68 e8 6e 1b 37 df 48 89 ef eb c4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f
[ 6170.287895] RSP: 0018:ffff888134b8fd08 EFLAGS: 00010202
[ 6170.287904] RAX: 0000000000000001 RBX: ffff888125bffb28 RCX: dffffc0000000000
[ 6170.287912] RDX: 0000000000000003 RSI: ffffffffa20298ab RDI: ffff88811ebe4750
[ 6170.287919] RBP: ffff88811ebe4700 R08: ffff88838e812650 R09: fffffbfff0623a55
[ 6170.287926] R10: ffffffff8311d2af R11: 0000000000000001 R12: ffff888125bffb10
[ 6170.287933] R13: ffff888125bffb10 R14: dead000000000122 R15: dead000000000100
[ 6170.287940] FS: 0000000000000000(0000) GS:ffff888390b00000(0000) knlGS:0000000000000000
[ 6170.287948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6170.287955] CR2: 00007fd31fc00710 CR3: 0000000133f60004 CR4: 00000000001706f0
[ 6170.287962] Call Trace:
[ 6170.287967] <TASK>
[ 6170.287973] ? __warn+0x9f/0x1a0
[ 6170.287986] ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092] ? report_bug+0x1b1/0x1e0
[ 6170.288104] ? handle_bug+0x3c/0x70
[ 6170.288112] ? exc_invalid_op+0x17/0x40
[ 6170.288120] ? asm_exc_invalid_op+0x1a/0x20
[ 6170.288132] ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288243] ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288366] ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288483] nf_tables_trans_destroy_work+0x588/0x590 [nf_tables]

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e79b47a8 17-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: restore set elements when delete set fails

From abort path, nft_mapelem_activate() needs to restore refcounters to
the original state. Currently, it uses the set->ops->walk() to iterate
over these set elements. The existing set iterator skips inactive
elements in the next generation, this does not work from the abort path
to restore the original state since it has to skip active elements
instead (not inactive ones).

This patch moves the check for inactive elements to the set iterator
callback, then it reverses the logic for the .activate case which
needs to skip active elements.

Toggle next generation bit for elements when delete set command is
invoked and call nft_clear() from .activate (abort) path to restore the
next generation bit.
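
A minimal sketch of the iterator/callback split described above, written as stand-alone C rather than kernel code. The `set_walk()` helper, the `NEXT_GEN` bit and the two callbacks are hypothetical names; the point is only that the iterator visits every element and each callback applies its own activity check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEXT_GEN 0x2u /* hypothetical bit: "inactive in the next generation" */

struct elem {
	uint8_t genmask; /* a set bit means inactive in that generation */
	int refs;        /* reference held on behalf of this mapping */
};

typedef void (*walk_fn)(struct elem *e);

/* The iterator visits every element; filtering is left to the callback. */
static void set_walk(struct elem *elems, size_t n, walk_fn fn)
{
	for (size_t i = 0; i < n; i++)
		fn(&elems[i]);
}

static bool active_next(const struct elem *e)
{
	return !(e->genmask & NEXT_GEN);
}

/* "delete set": skip elements that are already inactive, toggle the rest. */
static void mapelem_deactivate(struct elem *e)
{
	if (!active_next(e))
		return;
	e->genmask |= NEXT_GEN;
	e->refs--;
}

/* Abort path: reversed check, skip elements that are still active. */
static void mapelem_activate(struct elem *e)
{
	if (active_next(e))
		return;
	e->genmask &= ~NEXT_GEN; /* restore the next generation bit */
	e->refs++;
}
```

Walking with `mapelem_deactivate` and then with `mapelem_activate` returns every element to the active state with its reference restored, including elements that were already inactive before the first walk.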

The splat below shows an object in mappings memleak:

[43929.457523] ------------[ cut here ]------------
[43929.457532] WARNING: CPU: 0 PID: 1139 at include/net/netfilter/nf_tables.h:1237 nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[...]
[43929.458014] RIP: 0010:nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458076] Code: 83 f8 01 77 ab 49 8d 7c 24 08 e8 37 5e d0 de 49 8b 6c 24 08 48 8d 7d 50 e8 e9 5c d0 de 8b 45 50 8d 50 ff 89 55 50 85 c0 75 86 <0f> 0b eb 82 0f 0b eb b3 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
[43929.458081] RSP: 0018:ffff888140f9f4b0 EFLAGS: 00010246
[43929.458086] RAX: 0000000000000000 RBX: ffff8881434f5288 RCX: dffffc0000000000
[43929.458090] RDX: 00000000ffffffff RSI: ffffffffa26d28a7 RDI: ffff88810ecc9550
[43929.458093] RBP: ffff88810ecc9500 R08: 0000000000000001 R09: ffffed10281f3e8f
[43929.458096] R10: 0000000000000003 R11: ffff0000ffff0000 R12: ffff8881434f52a0
[43929.458100] R13: ffff888140f9f5f4 R14: ffff888151c7a800 R15: 0000000000000002
[43929.458103] FS: 00007f0c687c4740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[43929.458107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43929.458111] CR2: 00007f58dbe5b008 CR3: 0000000123602005 CR4: 00000000001706f0
[43929.458114] Call Trace:
[43929.458118] <TASK>
[43929.458121] ? __warn+0x9f/0x1a0
[43929.458127] ? nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458188] ? report_bug+0x1b1/0x1e0
[43929.458196] ? handle_bug+0x3c/0x70
[43929.458200] ? exc_invalid_op+0x17/0x40
[43929.458211] ? nft_setelem_data_deactivate+0xd7/0xf0 [nf_tables]
[43929.458271] ? nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458332] nft_mapelem_deactivate+0x24/0x30 [nf_tables]
[43929.458392] nft_rhash_walk+0xdd/0x180 [nf_tables]
[43929.458453] ? __pfx_nft_rhash_walk+0x10/0x10 [nf_tables]
[43929.458512] ? rb_insert_color+0x2e/0x280
[43929.458520] nft_map_deactivate+0xdc/0x1e0 [nf_tables]
[43929.458582] ? __pfx_nft_map_deactivate+0x10/0x10 [nf_tables]
[43929.458642] ? __pfx_nft_mapelem_deactivate+0x10/0x10 [nf_tables]
[43929.458701] ? __rcu_read_unlock+0x46/0x70
[43929.458709] nft_delset+0xff/0x110 [nf_tables]
[43929.458769] nft_flush_table+0x16f/0x460 [nf_tables]
[43929.458830] nf_tables_deltable+0x501/0x580 [nf_tables]

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 29b359cf 10-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_set_pipapo: walk over current view on netlink dump

The generation mask can be updated while netlink dump is in progress.
The pipapo set backend walk iterator cannot rely on it to infer what
view of the datastructure is to be used. Add notation to specify if user
wants to read/update the set.

Based on patch from Florian Westphal.

Fixes: 2b84e215f874 ("netfilter: nft_set_pipapo: .walk does not deal with generations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d78d867d 07-Apr-2024 Ziyang Xuan <william.xuanziyang@huawei.com>

netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get()

nft_unregister_obj() can run concurrently with __nft_obj_type_get(),
and nothing protects the iteration over the nf_tables_objects list
in __nft_obj_type_get(). Therefore, there is a potential data race
on nf_tables_objects list entries.

Use list_for_each_entry_rcu() to iterate over nf_tables_objects
list in __nft_obj_type_get(), and use rcu_read_lock() in the caller
nft_obj_type_get() to protect the entire type query process.

Fixes: e50092404c1b ("netfilter: nf_tables: add stateful objects")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f969eb84 07-Apr-2024 Ziyang Xuan <william.xuanziyang@huawei.com>

netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get()

nft_unregister_expr() can run concurrently with __nft_expr_type_get(),
and nothing protects the iteration over the nf_tables_expressions list
in __nft_expr_type_get(). Therefore, there is a potential data race
on nf_tables_expressions list entries.

Use list_for_each_entry_rcu() to iterate over nf_tables_expressions
list in __nft_expr_type_get(), and use rcu_read_lock() in the caller
nft_expr_type_get() to protect the entire type query process.

Fixes: ef1f7df9170d ("netfilter: nf_tables: expression ops overloading")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1bc83a01 03-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: discard table flag update with pending basechain deletion

Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.

Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 24225011 03-Apr-2024 Ziyang Xuan <william.xuanziyang@huawei.com>

netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()

nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
run concurrently with __nft_flowtable_type_get() within
nf_tables_newflowtable(), and nothing protects the iteration over the
nf_tables_flowtables list in __nft_flowtable_type_get(). Therefore, there
is a potential data race on nf_tables_flowtables list entries.

Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
nft_flowtable_type_get() to protect the entire type query process.

Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 994209dd 31-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject new basechain after table flag update

When dormant flag is toggled, hooks are disabled in the commit phase by
iterating over current chains in table (existing and new).

The following configuration allows for an inconsistent state:

add table x
add chain x y { type filter hook input priority 0; }
add table x { flags dormant; }
add chain x w { type filter hook input priority 1; }

which triggers the following warning when trying to unregister chain w
which is already unregistered.

[ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[ 127.322519] Call Trace:
[ 127.322521] <TASK>
[ 127.322524] ? __warn+0x9f/0x1a0
[ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322537] ? report_bug+0x1b1/0x1e0
[ 127.322545] ? handle_bug+0x3c/0x70
[ 127.322552] ? exc_invalid_op+0x17/0x40
[ 127.322556] ? asm_exc_invalid_op+0x1a/0x20
[ 127.322563] ? kasan_save_free_info+0x3b/0x60
[ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
[ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables]
[ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables]

Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 24cea967 02-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: flush pending destroy work before exit_net release

Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
work before netlink notifier") to address a race between exit_net and
the destroy workqueue.

The trace below shows an element to be released via destroy workqueue
while exit_net path (triggered via module removal) has already released
the set that is used in such transaction.

[ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
[ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
[ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 1360.547984] Call Trace:
[ 1360.547991] <TASK>
[ 1360.547998] dump_stack_lvl+0x53/0x70
[ 1360.548014] print_report+0xc4/0x610
[ 1360.548026] ? __virt_addr_valid+0xba/0x160
[ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548176] kasan_report+0xae/0xe0
[ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
[ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30
[ 1360.548591] process_one_work+0x2f1/0x670
[ 1360.548610] worker_thread+0x4d3/0x760
[ 1360.548627] ? __pfx_worker_thread+0x10/0x10
[ 1360.548640] kthread+0x16b/0x1b0
[ 1360.548653] ? __pfx_kthread+0x10/0x10
[ 1360.548665] ret_from_fork+0x2f/0x50
[ 1360.548679] ? __pfx_kthread+0x10/0x10
[ 1360.548690] ret_from_fork_asm+0x1a/0x30
[ 1360.548707] </TASK>

[ 1360.548719] Allocated by task 192061:
[ 1360.548726] kasan_save_stack+0x20/0x40
[ 1360.548739] kasan_save_track+0x14/0x30
[ 1360.548750] __kasan_kmalloc+0x8f/0xa0
[ 1360.548760] __kmalloc_node+0x1f1/0x450
[ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables]
[ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
[ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.548927] netlink_unicast+0x367/0x4f0
[ 1360.548935] netlink_sendmsg+0x34b/0x610
[ 1360.548944] ____sys_sendmsg+0x4d4/0x510
[ 1360.548953] ___sys_sendmsg+0xc9/0x120
[ 1360.548961] __sys_sendmsg+0xbe/0x140
[ 1360.548971] do_syscall_64+0x55/0x120
[ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d

[ 1360.548994] Freed by task 192222:
[ 1360.548999] kasan_save_stack+0x20/0x40
[ 1360.549009] kasan_save_track+0x14/0x30
[ 1360.549019] kasan_save_free_info+0x3b/0x60
[ 1360.549028] poison_slab_object+0x100/0x180
[ 1360.549036] __kasan_slab_free+0x14/0x30
[ 1360.549042] kfree+0xb6/0x260
[ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables]
[ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables]
[ 1360.549221] ops_exit_list+0x50/0xa0
[ 1360.549229] free_exit_list+0x101/0x140
[ 1360.549236] unregister_pernet_operations+0x107/0x160
[ 1360.549245] unregister_pernet_subsys+0x1c/0x30
[ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables]
[ 1360.549345] __do_sys_delete_module+0x253/0x370
[ 1360.549352] do_syscall_64+0x55/0x120
[ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d

(gdb) list *__nft_release_table+0x473
0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
11350 list_del(&flowtable->list);
11351 nft_use_dec(&table->use);
11352 nf_tables_flowtable_destroy(flowtable);
11353 }
11354 list_for_each_entry_safe(set, ns, &table->sets, list) {
11355 list_del(&set->list);
11356 nft_use_dec(&table->use);
11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
11358 nft_map_deactivate(&ctx, set);
(gdb)

[ 1360.549372] Last potentially related work creation:
[ 1360.549376] kasan_save_stack+0x20/0x40
[ 1360.549384] __kasan_record_aux_stack+0x9b/0xb0
[ 1360.549392] __queue_work+0x3fb/0x780
[ 1360.549399] queue_work_on+0x4f/0x60
[ 1360.549407] nft_rhash_remove+0x33b/0x340 [nf_tables]
[ 1360.549516] nf_tables_commit+0x1c6a/0x2620 [nf_tables]
[ 1360.549625] nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink]
[ 1360.549647] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.549671] netlink_unicast+0x367/0x4f0
[ 1360.549680] netlink_sendmsg+0x34b/0x610
[ 1360.549690] ____sys_sendmsg+0x4d4/0x510
[ 1360.549697] ___sys_sendmsg+0xc9/0x120
[ 1360.549706] __sys_sendmsg+0xbe/0x140
[ 1360.549715] do_syscall_64+0x55/0x120
[ 1360.549725] entry_SYSCALL_64_after_hwframe+0x55/0x5d

Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0d459e2f 28-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path

The commit mutex should not be released during the critical section
between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC
worker could collect expired objects and get the released commit lock
within the same GC sequence.

nf_tables_module_autoload() temporarily releases the mutex to load
module dependencies, then it goes back to replay the transaction again.
Move it at the end of the abort phase after nft_gc_seq_end() is called.
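
The role of the GC sequence can be illustrated with a small stand-alone model. It is a sketch under assumptions: `struct gc_state`, `gc_seq_begin()`, `gc_seq_end()` and `gc_batch_valid()` are hypothetical names that mimic the idea of a counter bumped on entry and exit, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

struct gc_state {
	unsigned int gc_seq; /* bumped when entering and when leaving the section */
};

/* Enter the critical section: GC snapshots taken earlier become stale. */
static unsigned int gc_seq_begin(struct gc_state *st)
{
	return ++st->gc_seq;
}

/* Leave the critical section, bumping the sequence once more. */
static void gc_seq_end(struct gc_state *st, unsigned int seq)
{
	assert(st->gc_seq == seq); /* nobody else may bump it in between */
	st->gc_seq = seq + 1;
}

/* GC worker side: a batch collected under snapshot `seq` is usable only
 * if the sequence has not moved since the snapshot was taken. */
static bool gc_batch_valid(const struct gc_state *st, unsigned int seq)
{
	return st->gc_seq == seq;
}
```

A snapshot taken before `gc_seq_begin()` is rejected afterwards, but a snapshot taken and validated between begin and end is accepted by this model. That window is why the lock must stay held until after the end call.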

Cc: stable@vger.kernel.org
Fixes: 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
Reported-by: Kuan-Ting Chen <hexrabbit@devco.re>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a45e6889 28-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release batch on table validation from abort path

Unlike early commit path stage which triggers a call to abort, an
explicit release of the batch is required on abort, otherwise mutex is
released and commit_list remains in place.

Add WARN_ON_ONCE to ensure commit_list is empty from the abort path
before releasing the mutex.

After this patch, commit_list is always assumed to be empty before
grabbing the mutex, therefore

03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()")

only needs to release the pending modules for registration.

Cc: stable@vger.kernel.org
Fixes: c0391b6ab810 ("netfilter: nf_tables: missing validation from the abort path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 216e7bf7 20-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip netdev hook unregistration if table is dormant

Skip hook unregistration when adding or deleting devices from an
existing netdev basechain. Otherwise, the commit/abort path tries to
unregister hooks which are not enabled.

Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1e1fb6f0 20-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject table flag and netdev basechain updates

netdev basechain updates are stored in the transaction object hook list.
Setting the table dormant flag iterates over the existing hooks in the
basechain, thus skipping the hooks that are being added/deleted in this
transaction, which leaves hook registration in an inconsistent state.

Reject table flag updates in combination with netdev basechain updates
in the same batch:

- Update table flags and add/delete basechain: Check from basechain update
path if there are pending flag updates for this table.
- add/delete basechain and update table flags: Iterate over the transaction
list to search for basechain updates from the table update path.

In both cases, the batch is rejected. Based on suggestion from Florian Westphal.

Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b32ca27f 20-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject destroy command to remove basechain hooks

Report EOPNOTSUPP if NFT_MSG_DESTROYCHAIN is used to delete hooks in an
existing netdev basechain, thus, only NFT_MSG_DELCHAIN is allowed.

Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7eaf837a 06-Mar-2024 Quan Tian <tianquan23@gmail.com>

netfilter: nf_tables: Fix a memory leak in nf_tables_updchain

If nft_netdev_register_hooks() fails, the memory associated with
nft_stats is not freed, causing a memory leak.

This patch fixes it by moving nft_stats_alloc() down after
nft_netdev_register_hooks() succeeds.
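
The reordering can be sketched in plain C as follows. This is not the kernel function: `register_hooks()`, `stats_alloc()`, `update_chain()` and the `live_allocs` counter are hypothetical stand-ins used only to show why running the failable step first leaves nothing to free on the early return.

```c
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs; /* allocations not yet freed, for illustration only */

struct stats {
	long pkts;
	long bytes;
};

static struct stats *stats_alloc(void)
{
	struct stats *s = calloc(1, sizeof(*s));

	if (s)
		live_allocs++;
	return s;
}

static void stats_free(struct stats *s)
{
	if (!s)
		return;
	live_allocs--;
	free(s);
}

/* Stand-in for a registration step that may fail. */
static int register_hooks(bool fail)
{
	return fail ? -1 : 0;
}

/* Run the step that can fail first and allocate only after it succeeded,
 * so the early-return path has no allocation to release. */
static int update_chain(bool fail_register, struct stats **out)
{
	int err = register_hooks(fail_register);

	if (err < 0)
		return err;

	*out = stats_alloc();
	/* A real implementation would also undo the registration on failure. */
	return *out ? 0 : -1;
}
```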

Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Quan Tian <tianquan23@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4a0e7f2d 14-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not compare internal table flags on updates

Restore skipping transaction if table update does not modify flags.

Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a128885a 04-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass flags to set backend selection routine

No need to refetch the flag from the netlink attribute, pass the
existing flags variable which already provides validated flags.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 31bf508b 21-Dec-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Implement table adoption support

Allow a new process to take ownership of a previously owned table,
useful mostly for firewall management services restarting or suspending
when idle.

By extending __NFT_TABLE_F_UPDATE, the on/off/on check in
nf_tables_updtable() also covers table adoption, although it is actually
not needed: Table adoption is irreversible because nf_tables_updtable()
rejects attempts to drop NFT_TABLE_F_OWNER so table->nlpid setting can
happen just once within the transaction.

If the transaction commits, table's nlpid and flags fields are already
set and no further action is required. If it aborts, the table returns
to orphaned state.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# da5141bb 21-Dec-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Introduce NFT_TABLE_F_PERSIST

This companion flag to NFT_TABLE_F_OWNER requests the kernel to keep the
table around after the process has exited. It marks such table as
orphaned (by dropping OWNER flag but keeping PERSIST flag in place),
which opens it for other processes to manipulate. For the sake of
simplicity, PERSIST flag may not be altered though.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 552705a3 04-Mar-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout

While the rhashtable set gc runs asynchronously, a race allows it to
collect elements from anonymous sets with timeouts while it is being
released from the commit path.

Mingi Cho originally reported this issue in a different path in 6.1.x
with a pipapo set with low timeouts which is not possible upstream since
7395dfacfff6 ("netfilter: nf_tables: use timestamp to check for set
element timeout").

Fix this by setting the dead flag for anonymous sets to skip async gc
in this case.

According to 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on
transaction abort"), Florian plans to accelerate abort path by releasing
objects via workqueue, therefore, this sets the dead flag for abort
path too.

Cc: stable@vger.kernel.org
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5f4fc4bd 29-Feb-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject constant set with timeout

This set combination is weird: it allows for elements to be
added/deleted, but once bound to the rule it cannot be updated anymore.
Eventually, all elements expire, leading to an empty set which cannot
be updated anymore. Reject this flags combination.

Cc: stable@vger.kernel.org
Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 16603605 29-Feb-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow anonymous set with timeout flag

Anonymous sets are never used with timeout from userspace, reject this.
Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.
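
A minimal sketch of such a flag check, assuming illustrative flag names and bit values and assuming `-EOPNOTSUPP` as the error code. None of these are taken from the patch itself.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions of this sketch. */
#define SET_ANONYMOUS 0x01u
#define SET_TIMEOUT   0x10u
#define SET_EVAL      0x20u

/* Reject anonymous + timeout unless the eval flag is set as well. */
static int check_anon_timeout(uint32_t flags)
{
	const uint32_t mask = SET_ANONYMOUS | SET_TIMEOUT | SET_EVAL;

	if ((flags & mask) == (SET_ANONYMOUS | SET_TIMEOUT))
		return -EOPNOTSUPP;
	return 0;
}
```

Masking with all three bits and comparing against only two of them expresses "both set and the exception not set" in a single test.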

Cc: stable@vger.kernel.org
Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 195e5f88 21-Feb-2024 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: use kzalloc for hook allocation

KMSAN reports uninitialized variable when registering the hook,
reg->hook_ops_type == NF_HOOK_OP_BPF)
~~~~~~~~~~~ undefined

This is a small structure, just use kzalloc to make sure this
won't happen again when new fields get added to nf_hook_ops.
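
The same idea in user-space terms, as a sketch only: `calloc()` plays the role of a zeroing allocator here, and `struct hook_ops` with its `ops_type` field is a hypothetical stand-in for a small structure that gains fields over time.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a small ops structure that grows over time. */
struct hook_ops {
	void *hook;
	int hooknum;
	int priority;
	int ops_type; /* added later; older init code never sets it */
};

/* Zeroing allocation: fields the caller does not set read as zero
 * instead of holding indeterminate values. */
static struct hook_ops *hook_alloc(int hooknum, int priority)
{
	struct hook_ops *ops = calloc(1, sizeof(*ops));

	if (!ops)
		return NULL;
	ops->hooknum = hooknum;
	ops->priority = priority;
	return ops;
}
```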

Fixes: 7b4b2fa37587 ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d472e985 19-Feb-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: register hooks last when adding new chain/flowtable

Register hooks last when adding chain/flowtable to ensure that packets do
not walk over datastructure that is being released in the error path
without waiting for the rcu grace period.

Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bccebf64 19-Feb-2024 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: set dormant flag on hook register failure

We need to set the dormant flag again if we fail to register
the hooks.

During memory pressure hook registration can fail and we end up
with a table marked as active but no registered hooks.

On table/base chain deletion, nf_tables will attempt to unregister
the hook again which yields a warn splat from the nftables core.

Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7395dfac 05-Feb-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use timestamp to check for set element timeout

Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.

Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.

.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read side lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.
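
The split between the two time sources can be sketched like this. It is a simplified model with hypothetical names (`struct txn`, `ctrl_elem_expired()`, `pkt_elem_expired()`) and plain integer ticks instead of kernel time helpers.

```c
#include <stdbool.h>
#include <stdint.h>

struct elem {
	uint64_t expiration; /* absolute expiry time, arbitrary ticks */
};

struct txn {
	uint64_t tstamp; /* captured once when the transaction starts */
};

/* Generic check against a caller-supplied point in time. */
static bool elem_expired_at(const struct elem *e, uint64_t when)
{
	return when >= e->expiration;
}

/* Control plane: judge expiry against the transaction timestamp, so the
 * answer cannot change while the transaction is still in progress. */
static bool ctrl_elem_expired(const struct txn *t, const struct elem *e)
{
	return elem_expired_at(e, t->tstamp);
}

/* Packet path and asynchronous gc: judge expiry against the current time. */
static bool pkt_elem_expired(const struct elem *e, uint64_t now)
{
	return elem_expired_at(e, now);
}
```

With an element expiring at tick 100 and a transaction started at tick 90, the control plane keeps treating the element as alive even if the clock has moved to tick 110, while the packet path already sees it as expired.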

Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 776d4516 23-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV

Bail out on using the tunnel dst template from other than netdev family.
Add the infrastructure to check for the family in objects.

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f342de4e 20-Jan-2024 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: reject QUEUE/DROP verdict parameters

This reverts commit e0abdadcc6e1.

core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP
verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar,
or 0.

Due to the reverted commit, it's possible to provide a positive
value, e.g. NF_ACCEPT (1), which results in use-after-free.
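
A sketch of a strict verdict check in stand-alone C. The `V_*` constants mirror the numeric values of NF_DROP, NF_ACCEPT and NF_QUEUE, but `verdict_code_ok()` is a hypothetical helper, and nftables-specific verdicts such as jump or goto are outside the scope of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as NF_DROP, NF_ACCEPT and NF_QUEUE. */
#define V_DROP   0u
#define V_ACCEPT 1u
#define V_QUEUE  3u

/* Accept only the bare verdict codes. Any bit set outside the code itself,
 * for instance in the upper 16 "parameter" bits, makes the value invalid. */
static bool verdict_code_ok(uint32_t code)
{
	switch (code) {
	case V_ACCEPT:
	case V_DROP:
	case V_QUEUE:
		return true;
	default:
		return false;
	}
}
```

Comparing the whole 32-bit value, rather than masking off the low byte first, is what rejects a drop verdict that carries a userspace-supplied parameter.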

It's not clear to me why this commit was made.

NF_QUEUE is not used by nftables; "queue" rules in nftables
will result in use of "nft_queue" expression.

If we later need to allow specifying errno values from userspace
(do not know why), this has to call NF_DROP_GETERR and check that
"err <= 0" holds true.

Fixes: e0abdadcc6e1 ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters")
Cc: stable@vger.kernel.org
Reported-by: Notselwyn <notselwyn@pwning.tech>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b462579b 19-Jan-2024 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: restrict anonymous set and map names to 16 bytes

nftables has two types of sets/maps, one where userspace defines the
name, and anonymous sets/maps, where userspace defines a template name.

For the latter, kernel requires presence of exactly one "%d".
nftables uses "__set%d" and "__map%d" for this. The kernel will
expand the format specifier and replaces it with the smallest unused
number.

As-is, userspace could define a template name that allows to move
the set name past the 256 bytes upper limit (post-expansion).

I don't see how this could be a problem, but I would prefer if userspace
cannot do this, so add a limit of 16 bytes for the '%d' template name.

16 bytes is the old total upper limit for set names that existed when
nf_tables was merged initially.
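
A minimal sketch of such a template check, assuming the 16-byte limit includes the terminating NUL. `ANON_NAME_MAX` and `anon_template_ok()` are hypothetical names, not the kernel's.

```c
#include <stdbool.h>
#include <string.h>

#define ANON_NAME_MAX 16 /* assumed to include the terminating NUL */

/* Accept a template only if it contains exactly one '%', immediately
 * followed by 'd', and is short enough. */
static bool anon_template_ok(const char *name)
{
	const char *p = strchr(name, '%');

	if (!p || p[1] != 'd' || strchr(p + 2, '%'))
		return false;
	return strlen(name) < ANON_NAME_MAX;
}
```

Templates such as "__set%d" and "__map%d" pass this check, while names with a second '%', a different conversion, or more than 15 characters are rejected.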

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 113661e0 14-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description

It is still possible to set the NFT_SET_CONCAT flag by specifying a
set size and no field description, report EINVAL in such case.

Fixes: 1b6345d4160e ("netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6b1ca88e 14-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip dead set elements in netlink dump

Delete from packet path relies on the garbage collector to purge
elements with NFT_SET_ELEM_DEAD_BIT on.

Skip these dead elements from nf_tables_dump_setelem() path, I very
rarely see tests/shell/testcases/maps/typeof_maps_add_delete reports
[DUMP FAILED] showing a mismatch in the expected output with an element
that should not be there.

If the netlink dump happens before GC worker run, it might show dead
elements in the ruleset listing.

nft_rhash_get() already skips dead elements in nft_rhash_cmp(),
therefore, it already does not show the element when getting a single
element via netlink control plane.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3ce67e37 14-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not allow mismatch field size and set key length

The set description provides the size of each field in the set whose sum
should not mismatch the set key length, bail out otherwise.
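
The consistency rule can be sketched as a plain sum check. `field_lens_match_klen()` is a hypothetical helper, and any per-field rounding or alignment the real code may apply is deliberately left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when the per-field sizes add up to the declared key length. */
static bool field_lens_match_klen(const uint8_t *field_len, size_t count,
				  uint32_t klen)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < count; i++)
		sum += field_len[i];
	return sum == klen;
}
```

For a description with fields of 4 and 2 bytes, a key length of 6 is accepted and a key length of 8 is rejected.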

I did not manage to crash nft_set_pipapo with mismatch fields and set key
length so far, but this is UB which must be disallowed.

Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b1db244f 12-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: check if catch-all set element is active in next generation

When deactivating the catch-all set element, check the state in the next
generation that represents this transaction.

This bug was uncovered after the recent removal of the element busy mark
a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API").

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3c13725f 07-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bail out if stateful expression provides no .clone

All existing NFT_EXPR_STATEFUL provide a .clone interface, remove
fallback to copy content of stateful expression since this is never
exercised and bail out if .clone interface is not defined.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 65b3bd60 06-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate .maxattr at expression registration

struct nft_expr_info allows storing up to NFT_EXPR_MAXATTR (16)
attributes when parsing netlink attributes.

Raise a warning in case there is ever an nft expression whose .maxattr
goes beyond this number of attributes, in such case, struct nft_expr_info
needs to be updated.
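A rough userspace sketch of a registration-time bound check.
demo_register_expr() and struct demo_expr_type are hypothetical, the
limit mirrors the value 16 quoted above, and the error code is an
assumption made for the example only.

```c
#include <errno.h>
#include <stdio.h>

#define DEMO_EXPR_MAXATTR 16	/* same value as quoted for NFT_EXPR_MAXATTR */

/* Hypothetical, reduced expression type descriptor. */
struct demo_expr_type {
	const char *name;
	unsigned int maxattr;
};

/* Refuse to register a type whose attribute count exceeds the parser's storage. */
static int demo_register_expr(const struct demo_expr_type *type)
{
	if (type->maxattr > DEMO_EXPR_MAXATTR) {
		fprintf(stderr, "expression %s: maxattr %u exceeds %d\n",
			type->name, type->maxattr, DEMO_EXPR_MAXATTR);
		return -EINVAL;	/* error code chosen for the sketch */
	}
	return 0;
}
```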

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0617c3de 03-Jan-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject invalid set policy

Report -EINVAL in case userspace provides an unsupported set backend
policy.
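A small sketch of this kind of validation, assuming two known policies.
The enum and demo_parse_set_policy() are hypothetical stand-ins, not the
kernel definitions.

```c
#include <errno.h>

/* Hypothetical policy values for the sketch. */
enum demo_set_policy {
	DEMO_SET_POL_PERFORMANCE,
	DEMO_SET_POL_MEMORY,
};

/* Accept only known policies, report -EINVAL for anything else. */
static int demo_parse_set_policy(unsigned int value, enum demo_set_policy *out)
{
	switch (value) {
	case DEMO_SET_POL_PERFORMANCE:
	case DEMO_SET_POL_MEMORY:
		*out = (enum demo_set_policy)value;
		return 0;
	default:
		return -EINVAL;
	}
}
```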

Fixes: c50b960ccc59 ("netfilter: nf_tables: implement proper set selection")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# aaba7ddc 14-Dec-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate chain type update if available

Parse netlink attribute containing the chain type in this update, to
bail out if this is different from the existing type.

Otherwise, it is possible to define a chain with the same name, hook and
priority but different type, which is silently ignored.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 08e4c8c5 27-Nov-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: mark newset as dead on transaction abort

If a transaction is aborted, we should mark the to-be-released NEWSET dead,
just like commit path does for DEL and DESTROYSET commands.

In both cases all remaining elements will be released via
set->ops->destroy().

The existing abort code does NOT post the actual release to the work queue.
Also the entire __nf_tables_abort() function is wrapped in gc_seq
begin/end pair.

Therefore, async gc worker will never try to release the pending set
elements, as gc sequence is always stale.

It might be possible to speed up transaction aborts via work queue too,
but this would result in a race and a possible use-after-free.

So fix this before it becomes an issue.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3d483faa 09-Nov-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests

Set expressions' dump callbacks are not concurrency-safe per-se with
reset bit set. If two CPUs reset the same element at the same time,
values may underrun at least with element-attached counters and quotas.

Prevent this by introducing dedicated callbacks for nfnetlink and the
asynchronous dump handling to serialize access.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f649be6d 09-Nov-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Introduce nft_set_dump_ctx_init()

This is a wrapper around nft_ctx_init() for use in
nf_tables_getsetelem() and a resetting equivalent introduced later.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5896e861 09-Nov-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Pass const set to nft_get_set_elem

The function is not supposed to alter the set, passing the pointer as
const is fine and merely requires adjusting the signatures of two called
functions as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7315dc1e 19-Dec-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip set commit for deleted/destroyed sets

NFT_MSG_DELSET deactivates all elements in the set, skip
set->ops->commit() to avoid the unnecessary clone (for the pipapo case)
as well as the sync GC cycle, which could deactivate again expired
elements in such set.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f6e1532a 04-Dec-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate family when identifying table via handle

Validate the table family when looking it up via NFTA_TABLE_HANDLE.
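An illustrative sketch of a handle lookup that also matches the family.
The table array, struct demo_table and the family numbers are
hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced table object. */
struct demo_table {
	uint64_t handle;
	int family;
};

/* Return the table only if both the handle and the family match. */
static const struct demo_table *
demo_table_lookup_byhandle(const struct demo_table *tables, size_t n,
			   uint64_t handle, int family)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tables[i].handle == handle && tables[i].family == family)
			return &tables[i];
	}
	return NULL;
}
```

Without the family comparison, a request for one family could resolve to
a table that belongs to another family and merely shares the handle.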

Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8837ba3e 13-Nov-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: split async and sync catchall in two functions

list_for_each_entry_safe() does not work for the async case which runs
under RCU, therefore, split GC logic for catchall in two functions
instead, one for each of the sync and async GC variants.

The catchall sync GC variant never sees the _DEAD bit set, thus, this
handling is removed in such case. Moreover, allocate the GC sync batch
via GFP_KERNEL.

Fixes: 93995bf4af2c ("netfilter: nf_tables: remove catchall element in GC sync path")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a7d5a955 13-Nov-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bogus ENOENT when destroying element which does not exist

The destroy element command bogusly reports ENOENT in case a set element
does not exist. ENOENT errors are skipped, however, err is still set
and propagated to userspace.

# nft destroy element ip raw BLACKLIST { 1.2.3.4 }
Error: Could not process rule: No such file or directory
destroy element ip raw BLACKLIST { 1.2.3.4 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
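A simplified sketch of the intended behaviour, assuming the per-element
results are given as an array. demo_destroy_elems() is hypothetical. The
point is that err has to be cleared when ENOENT is skipped, otherwise
the stale value is returned after the loop.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Process per-element results of a "destroy" request. A missing element
 * (-ENOENT) is tolerated, so err is reset before moving on. Any other
 * error stops the loop and is reported.
 */
static int demo_destroy_elems(const int *results, size_t n)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		err = results[i];
		if (err == -ENOENT) {
			err = 0;	/* do not leak the skipped error */
			continue;
		}
		if (err < 0)
			break;
	}
	return err;
}
```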

Fixes: f80a612dd77c ("netfilter: nf_tables: add support to destroy operation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 93995bf4 06-Nov-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove catchall element in GC sync path

The expired catchall element is neither deactivated nor removed in the GC
sync path. This path holds the mutex, so just call
nft_setelem_data_deactivate() and nft_setelem_catchall_remove() before
queueing the GC work.

Fixes: 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 94090b23 04-Nov-2023 Florian Westphal <fw@strlen.de>

netfilter: add missing module descriptions

W=1 builds warn on missing MODULE_DESCRIPTION, add them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9cdee063 24-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx

Relieve the dump callback from having to check nlmsg_type upon each
call. Prep work for set element reset locking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 078996fc 18-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST

Return struct nft_elem_priv instead of struct nft_set_ext for
consistency with ("netfilter: nf_tables: expose opaque set element as
struct nft_elem_priv") and to prepare the introduction of element
timeout updates from control path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0e1ea651 16-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: shrink memory consumption of set elements

Instead of copying struct nft_set_elem into struct nft_trans_elem, store
the pointer to the opaque set element object in the transaction. Adapt
set backend API (and set backend implementations) to take the pointer to
opaque set element representation whenever required.

This patch deconstifies the .remove() and .activate() set backend API
since these modify the opaque set element object. It also constifies
nft_set_elem_ext(), which provides access to the nft_set_ext struct
without updating the object.

According to pahole on x86_64, this patch shrinks struct nft_trans_elem
size from 216 to 24 bytes.

This patch also reduces stack memory consumption by removing the
template struct nft_set_elem object, using the opaque set element object
instead such as from the set iterator API, catchall elements and the get
element command.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9dad402b 18-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: expose opaque set element as struct nft_elem_priv

Add a placeholder structure and place it at the beginning of each struct
nft_*_elem for each existing set backend, instead of exposing elements
as void type to the frontend, which defeats compiler type checks. Use
a pointer to this new type to replace void *.

This patch updates the following set backend API to use this new struct
nft_elem_priv placeholder structure:

- update
- deactivate
- flush
- get

as well as the following helper functions:

- nft_set_elem_ext()
- nft_set_elem_init()
- nft_set_elem_destroy()
- nf_tables_set_elem_destroy()

This patch adds nft_elem_priv_cast() to cast struct nft_elem_priv to
native element representation from the corresponding set backend.
BUILD_BUG_ON() makes sure this .priv placeholder is always at the top
of the opaque set element representation.
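The layout idea can be sketched in plain C11 as follows. The demo_*
names are hypothetical, static_assert stands in for BUILD_BUG_ON(), and
the placeholder carries a dummy member only because an empty struct is
not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque placeholder handed to the frontend instead of void *. */
struct demo_elem_priv {
	char unused;	/* dummy member, the sketch avoids an empty struct */
};

/* Hypothetical backend-native element embedding the placeholder first. */
struct demo_hash_elem {
	struct demo_elem_priv priv;	/* must stay the first member */
	unsigned int key;
};

/* Build-time check that the cast below is valid. */
static_assert(offsetof(struct demo_hash_elem, priv) == 0,
	      "priv must be the first member");

/* Convert the opaque pointer back to the backend representation. */
static struct demo_hash_elem *demo_hash_elem_cast(struct demo_elem_priv *priv)
{
	return (struct demo_hash_elem *)priv;
}
```

Because the placeholder sits at offset zero, a pointer to it and a
pointer to the enclosing element refer to the same address, so the cast
is well defined while the frontend still gets a distinct type to check.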

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6509a2e4 18-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set backend .flush always succeeds

.flush is always successful since this results from iterating over the
set elements to mark each element as inactive in the next generation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a5523390 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx

Relieve the dump callback from having to inspect nlmsg_type upon each
call, just do it once at start of the dump.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5a893b9c 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: nft_obj_filter fits into cb->ctx

No need to allocate it if one may just use struct netlink_callback's
scratch area for it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2eda95cf 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx

Prep work for moving the context into struct netlink_callback scratch
area.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ecf49cad 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: A better name for nft_obj_filter

Name it for what it is supposed to become, a real nft_obj_dump_ctx. No
functional change intended.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4279cc60 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Unconditionally allocate nft_obj_filter

Prep work for moving the filter into struct netlink_callback's scratch
area.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ff16111c 20-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj

The code does not make use of cb->args fields past the first one, no
need to zero them.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3cb03edb 19-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests

Rule reset is not concurrency-safe per-se, so multiple CPUs may reset
the same rule at the same time. At least counter and quota expressions
will suffer from value underruns in this case.

Prevent this by introducing dedicated locking callbacks for nfnetlink
and the asynchronous dump handling to serialize access.
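To illustrate the underrun, the sketch below replays the problematic
interleaving sequentially in userspace, assuming a reset is a snapshot
followed by a subtraction of the snapshot. All demo_* names are
hypothetical and the model is far simpler than the real expressions.

```c
#include <stdint.h>

/* Step 1 of a reset: snapshot the value that gets reported. */
static uint64_t demo_counter_snapshot(const uint64_t *counter)
{
	return *counter;
}

/* Step 2 of a reset: subtract the reported value from the counter. */
static void demo_counter_subtract(uint64_t *counter, uint64_t reported)
{
	*counter -= reported;
}

/* Two resets interleaved: both snapshot before either one subtracts. */
static uint64_t demo_racy_double_reset(uint64_t start)
{
	uint64_t counter = start;
	uint64_t seen_a = demo_counter_snapshot(&counter);
	uint64_t seen_b = demo_counter_snapshot(&counter);

	demo_counter_subtract(&counter, seen_a);
	demo_counter_subtract(&counter, seen_b);	/* wraps below zero */
	return counter;
}

/* Two resets serialized, as a lock around each reset would enforce. */
static uint64_t demo_serialized_double_reset(uint64_t start)
{
	uint64_t counter = start;
	uint64_t seen;

	seen = demo_counter_snapshot(&counter);
	demo_counter_subtract(&counter, seen);
	seen = demo_counter_snapshot(&counter);
	demo_counter_subtract(&counter, seen);
	return counter;
}
```

Starting from 10, the interleaved variant subtracts 10 twice and wraps
around, while the serialized variant ends at zero.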

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1578c328 19-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Introduce nf_tables_getrule_single()

Outsource the reply skb preparation for non-dump getrule requests into a
distinct function. Prep work for rule reset locking.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 88773930 19-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()

The table lookup will be dropped from that function, so remove that
dependency from audit logging code. Using whatever is in
nla[NFTA_RULE_TABLE] is sufficient as long as the previous rule info
filling succeeded.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 99ab9f84 29-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Don't allocate nft_rule_dump_ctx

Since struct netlink_callback::args is not used by rule dumpers anymore,
use it to hold nft_rule_dump_ctx. Add a build-time check to make sure it
won't ever exceed the available space.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 8194d599 29-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Carry s_idx in nft_rule_dump_ctx

In order to move the context into struct netlink_callback's scratch
area, the latter must be unused first.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 405c8fd6 29-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Carry reset flag in nft_rule_dump_ctx

This relieves the dump callback from having to check nlmsg_type upon
each call and instead performs the check once in .start callback.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 30fa41a0 29-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Drop pointless memset when dumping rules

None of the dump callbacks uses netlink_callback::args beyond the first
element, no need to zero the data.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# afed2b54 29-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Always allocate nft_rule_dump_ctx

It will move into struct netlink_callback's scratch area later, just put
nf_tables_dump_rules_start in shape to reduce churn later.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 013714bf 01-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY

Mark attributes which are supposed to be arrays of nested attributes
with known content as such. Originally suggested for
NFTA_RULE_EXPRESSIONS only, but does apply to others as well.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# aee1f692 22-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: missing extended netlink error in lookup functions

Set netlink extended error reporting for several lookup functions, which
allows userspace to infer the cause of the error.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# f86fb940 18-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: revert do not remove elements if set backend implements .abort

nf_tables_abort_release() path calls nft_set_elem_destroy() for
NFT_MSG_NEWSETELEM which releases the element, however, a reference to
the element still remains in the working copy.

Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 1baf0152 11-Oct-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: audit log object reset once per table

When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>


# 505ce063 09-Oct-2023 Xingyuan Mo <hdthky0@gmail.com>

nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()

We should check whether the NFTA_EXPR_NAME netlink attribute is present
before accessing it, otherwise a NULL pointer dereference will occur.

Call Trace:
<TASK>
dump_stack_lvl+0x4f/0x90
print_report+0x3f0/0x620
kasan_report+0xcd/0x110
__asan_load2+0x7d/0xa0
nla_strcmp+0x2f/0x90
__nft_expr_type_get+0x41/0xb0
nft_expr_inner_parse+0xe3/0x200
nft_inner_init+0x1be/0x2e0
nf_tables_newrule+0x813/0x1230
nfnetlink_rcv_batch+0xec3/0x1170
nfnetlink_rcv+0x1e4/0x220
netlink_unicast+0x34e/0x4b0
netlink_sendmsg+0x45c/0x7e0
__sys_sendto+0x355/0x370
__x64_sys_sendto+0x84/0xa0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
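A minimal sketch of the presence check, assuming the parsed attributes
are modelled as an array of string pointers. The DEMO_* indices and
demo_expr_parse() are hypothetical, and the error code is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute indices for the sketch. */
enum {
	DEMO_EXPR_UNSPEC,
	DEMO_EXPR_NAME,
	DEMO_EXPR_DATA,
	DEMO_EXPR_MAX,
};

/*
 * tb[] models a parsed attribute table: absent attributes are NULL.
 * The name must be checked before it is handed to any comparison.
 */
static int demo_expr_parse(const char *const tb[DEMO_EXPR_MAX],
			   const char **name)
{
	if (!tb[DEMO_EXPR_NAME])
		return -EINVAL;	/* error code chosen for the sketch */

	*name = tb[DEMO_EXPR_NAME];
	return 0;
}
```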

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 4c90bba6 02-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not refresh timeout when resetting element

The dump and reset command should not refresh the timeout. This command
is intended to allow users to list existing stateful objects and reset
them. Element expiration should instead be refreshed via a transaction,
with a specific command to achieve this. Otherwise this enters combo
semantics that will be hard to undo later (e.g. a user asking to
retrieve counters but _not_ requiring an expiration refresh).

Fixes: 079cd633219d ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# ebd032fa 04-Oct-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not remove elements if set backend implements .abort

The pipapo set backend maintains two copies of the datastructure. Removing
the elements from the copy that is going to be discarded slows down
the abort path significantly. With this patch, the abort path goes from
several minutes to a few seconds.

Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 0d880dc6 22-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Deduplicate nft_register_obj audit logs

When adding/updating an object, the transaction handler emits suitable
audit log entries already, the one in nft_obj_notify() is redundant. To
fix that (and retain the audit logging from objects' 'update' callback),
introduce an "audit log free" variant for internal use.

Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>


# cf5000a7 19-Sep-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix memleak when more than 255 elements expired

When more than 255 elements expired we're supposed to switch to a new gc
container structure.

This never happens: u8 type will wrap before reaching the boundary
and nft_trans_gc_space() always returns true.

This means we recycle the initial gc container structure and
lose track of the elements that came before.

While at it, don't deref 'gc' after we've passed it to call_rcu.
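The wrap can be reproduced in userspace with the sketch below. The
batch size of 256, the shape of the space helper and the u16 variant are
assumptions made for illustration, and the demo_* names are
hypothetical.

```c
#include <stdint.h>

#define DEMO_GC_BATCHCOUNT 256	/* assumed container capacity */

/* With a u8 counter the result is always in 1..256, never zero. */
static unsigned int demo_gc_space_u8(uint8_t count)
{
	return DEMO_GC_BATCHCOUNT - count;
}

/* With a wider counter the result reaches zero once the batch is full. */
static unsigned int demo_gc_space_u16(uint16_t count)
{
	return DEMO_GC_BATCHCOUNT - count;
}
```

Incrementing a u8 that holds 255 yields 0, so the narrow variant reports
a completely free container again instead of a full one.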

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# c9bd2651 15-Sep-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: disable toggling dormant table state more than once

nft -f -<<EOF
add table ip t
add table ip t { flags dormant; }
add chain ip t c { type filter hook input priority 0; }
add table ip t
EOF

Triggers a splat from nf core on next table delete because we lose
track of right hook register state:

WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook
RIP: 0010:__nf_unregister_net_hook+0x41b/0x570
nf_unregister_net_hook+0xb4/0xf0
__nf_tables_unregister_hook+0x160/0x1d0
[..]

The above should have table in *active* state, but in fact no
hooks were registered.

Reject on/off/on games rather than attempting to fix this.

Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg>
Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Cc: info@starlabs.sg
Signed-off-by: Florian Westphal <fw@strlen.de>


# 7fb818f2 13-Sep-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Fix entries val in rule reset audit log

The value in idx and the number of rules handled in that particular
__nf_tables_dump_rules() call are not identical. The former is a cursor
to pick up from if multiple netlink messages are needed, so its value is
ever increasing. Fixing this is not just a matter of subtracting s_idx
from it, though: When resetting rules in multiple chains,
__nf_tables_dump_rules() is called for each and cb->args[0] is not
adjusted in between. Introduce a dedicated counter to record the number
of rules reset in this call in a less confusing way.

While at it, prevent the direct return upon buffer exhaustion: Any
rules previously dumped into that skb would evade audit logging
otherwise.
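The difference between the resume cursor and the per-call count can be
sketched as follows. demo_dump_rules() and its "room" parameter, which
models how many rules fit into one message, are hypothetical.

```c
/* Hypothetical dump context: s_idx is the cursor to resume from. */
struct demo_dump_ctx {
	unsigned int s_idx;
};

/*
 * Dump up to "room" rules starting at the cursor. The return value is
 * the number of rules handled in this call, which is what an audit
 * record should carry. The cursor only ever grows across calls.
 */
static unsigned int demo_dump_rules(struct demo_dump_ctx *ctx,
				    unsigned int nrules, unsigned int room)
{
	unsigned int idx, entries = 0;

	for (idx = 0; idx < nrules; idx++) {
		if (idx < ctx->s_idx)
			continue;	/* already dumped by an earlier call */
		if (entries == room)
			break;		/* message full, resume here later */
		entries++;
	}
	ctx->s_idx = idx;
	return entries;
}
```

With five rules and room for two per message, the second call leaves the
cursor at 4 although only 2 rules were handled in that call.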

Fixes: 9b5ba5c9c5109 ("netfilter: nf_tables: Unbreak audit log reset")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 23a3bfd4 10-Sep-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow element removal on anonymous sets

Anonymous sets need to be populated once at creation and then they are
bound to rule since 938154b93be8 ("netfilter: nf_tables: reject unbound
anonymous set before commit phase"), otherwise transaction reports
EINVAL.

Userspace does not need to delete elements of anonymous sets that are
not yet bound, reject this with EOPNOTSUPP.

From flush command path, skip anonymous sets, they are expected to be
bound already. Otherwise, EINVAL is hit at the end of this transaction
for unbound sets.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4a9e12ea 06-Sep-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC

pipapo needs to enqueue GC transactions for catchall elements through
nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and
nft_trans_gc_catchall_async() to handle GC transaction queueing
accordingly.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f15f29fd 07-Sep-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow rule removal from chain binding

Chain binding only requires the rule addition/insertion command within
the same transaction. Removal of rules from chain bindings within the
same transaction makes no sense, userspace does not utilize this
feature. Replace the nft_chain_is_bound() check with nft_chain_binding()
in rule deletion commands. The replace command implies a rule deletion,
reject this command too.

Rule flush command can also safely rely on this nft_chain_binding()
check because unbound chains are not allowed since 62e1e94b246e
("netfilter: nf_tables: reject unbound chain set before commit phase").

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9b5ba5c9 06-Sep-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Unbreak audit log reset

Deliver audit log from __nf_tables_dump_rules(), table dereference at
the end of the table list loop might point to the list head, leading to
this crash.

[ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50
[ 4137.407357] #PF: supervisor read access in kernel mode
[ 4137.407359] #PF: error_code(0x0000) - not-present page
[ 4137.407360] PGD 0 P4D 0
[ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277
[ 4137.407369] RIP: 0010:string+0x49/0xd0
[ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81
[ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286
[ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04
[ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000
[ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000
[ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709
[ 4137.407385] FS: 00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000
[ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0
[ 4137.407390] Call Trace:
[ 4137.407392] <TASK>
[ 4137.407393] ? __die+0x1b/0x60
[ 4137.407397] ? page_fault_oops+0x6b/0xa0
[ 4137.407399] ? exc_page_fault+0x60/0x120
[ 4137.407403] ? asm_exc_page_fault+0x22/0x30
[ 4137.407408] ? string+0x49/0xd0
[ 4137.407410] vsnprintf+0x257/0x4f0
[ 4137.407414] kvasprintf+0x3e/0xb0
[ 4137.407417] kasprintf+0x3e/0x50
[ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables]
[ 4137.407439] ? __alloc_skb+0xc3/0x170
[ 4137.407442] netlink_dump+0x170/0x330
[ 4137.407447] __netlink_dump_start+0x227/0x300
[ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables]

Deliver audit log only once at the end of the rule dump+reset for
consistency with the set dump+reset.

Ensure the audit reset code accesses the table under the RCU read side
lock. The table list iteration holds the RCU read side lock, but recent
audit code dereferences the table object outside of it.

Fixes: ea078ae9108e ("netfilter: nf_tables: Audit log rule reset")
Fixes: 7e9be1124dbe ("netfilter: nf_tables: Audit log setelem reset")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>


# ea078ae9 29-Aug-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Audit log rule reset

Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.

Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7e9be112 29-Aug-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Audit log setelem reset

Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.

For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.

Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 169384fb 21-Jun-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: allow loop termination for pending fatal signal

abort early so task can exit faster if a fatal signal is pending,
no need to continue validation in that case.

Signed-off-by: Florian Westphal <fw@strlen.de>


# 8357bc94 21-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use correct lock to protect gc_list

Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to
protect the gc list.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 72034434 17-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: GC transaction race with abort path

The abort path is missing a synchronization point with GC transactions.
Add a GC sequence number so that any GC transaction losing the race is
discarded.
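A simplified, single-threaded sketch of the sequence number idea. The
demo_* structures and helpers are hypothetical and leave out the memory
ordering and locking that real concurrent code needs.

```c
#include <stdbool.h>

/* Hypothetical per-namespace state holding the current GC sequence. */
struct demo_pernet {
	unsigned int gc_seq;
};

/* Hypothetical GC transaction remembering the sequence it started at. */
struct demo_gc_trans {
	unsigned int seq;
};

/* GC side: record the sequence when the transaction is created. */
static void demo_gc_trans_begin(const struct demo_pernet *net,
				struct demo_gc_trans *trans)
{
	trans->seq = net->gc_seq;
}

/* Control plane side (commit or abort): invalidate in-flight GC work. */
static void demo_gc_seq_bump(struct demo_pernet *net)
{
	net->gc_seq++;
}

/* GC worker: only act if nothing changed the sequence in the meantime. */
static bool demo_gc_trans_valid(const struct demo_pernet *net,
				const struct demo_gc_trans *trans)
{
	return trans->seq == net->gc_seq;
}
```

A GC transaction created before the bump is detected as stale and can be
dropped instead of releasing elements the control plane is working on.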

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 2c9f0293 17-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: flush pending destroy work before netlink notifier

Destroy work waits for the RCU grace period, then it releases the objects
with no mutex held. All released objects follow this path for
transactions, therefore, order is guaranteed and references to top-level
objects in the hierarchy remain valid.

However, the netlink notifier might interfere with pending destroy work.
rcu_barrier() is not correct because objects are not released via RCU
callback. Flush destroy work before releasing objects from the netlink
notifier path.

Fixes: d4bc8271db21 ("netfilter: nf_tables: netlink notifier might race to release objects")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 4b80ced9 17-Aug-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: validate all pending tables

We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.

Moreover, we can't init table->validate to _SKIP when a table object
is allocated.

If we do, then if a transaction creates a new table and then
fails, nfnetlink will loop and nft will hang until the
user cancels the command.

Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).

Fixes: 00c320f9b755 ("netfilter: nf_tables: make validation state per table")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 02c6c244 15-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: GC transaction race with netns dismantle

Use maybe_get_net() since GC workqueue might race with netns exit path.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 6a33d8b7 15-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path

The netlink event path is missing a synchronization point with GC
transactions. Add a GC sequence number update to the netns release path
and the netlink event path, any GC transaction losing the race will be
discarded.

Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 90e5b346 12-Aug-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: deactivate catchall elements in next generation

When flushing, individual set elements are disabled in the next
generation via the ->flush callback.

Catchall elements are not disabled. This is incorrect and may lead to
double-deactivations of catchall elements which then results in memory
leaks:

WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730
CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60
RIP: 0010:nft_map_deactivate+0x549/0x730
[..]
? nft_map_deactivate+0x549/0x730
nf_tables_delset+0xb66/0xeb0

(the warn is due to nft_use_dec() detecting underflow).

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# a2dd0233 09-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove busy mark and gc batch API

Ditch it: it has been replaced by the GC transaction API and it has no
clients anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f6c383b8 09-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: adapt set backend to use GC transaction API

Use the GC transaction API to replace the old and buggy gc API and the
busy mark approach.

No set elements are removed from async garbage collection anymore,
instead the _DEAD bit is set so the set element is not visible from
lookup path anymore. Async GC enqueues transaction work that might be
aborted and retried later.

rbtree and pipapo set backends do not set the _DEAD bit from the
sync GC path since this runs in control plane path where mutex is held.
In this case, set elements are deactivated, removed and then released
via RCU callback, sync GC never fails.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5f68718b 09-Aug-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: GC transaction API to avoid race with control plane

The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.

The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.

We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:

cpu 1                            cpu 2
GC work                          transaction comes in, lock nft mutex
 `acquire nft mutex // BLOCKS
                                 transaction asks to remove the set
                                 set destruction calls cancel_work_sync()

cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.

This patch adds a new API that deals with garbage collection in two
steps:

1) Lockless GC of expired elements sets the NFT_SET_ELEM_DEAD_BIT
so they are not visible via lookup. Annotate current GC sequence in
the GC transaction. Enqueue GC transaction work as soon as it is
full. If ruleset is updated, then GC transaction is aborted and
retried later.

2) GC work grabs the mutex. If GC sequence has changed then this GC
transaction lost race with control plane, abort it as it contains
stale references to objects and let GC try again later. If the
ruleset is intact, then this GC transaction deactivates and removes
the elements and it uses call_rcu() to destroy elements.
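
The sequence check in the two steps above can be sketched as follows. This
is a single-threaded illustration only: struct gc_trans and both helpers are
hypothetical names, and the memory ordering and locking that the real code
needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical GC batch: remembers the sequence seen while collecting. */
struct gc_trans {
	unsigned int seq;
	unsigned int count;	/* number of collected element pointers */
};

/* Step 1: the lockless side annotates the current GC sequence. */
static void gc_trans_begin(struct gc_trans *t, unsigned int cur_seq)
{
	assert(t != NULL);
	t->seq = cur_seq;
	t->count = 0;
}

/*
 * Step 2: called with the mutex held. If the sequence moved on, the
 * control plane changed the ruleset in between and the batch may hold
 * stale pointers, so it has to be dropped and retried later.
 */
static bool gc_trans_may_run(const struct gc_trans *t, unsigned int cur_seq)
{
	assert(t != NULL);
	return t->seq == cur_seq;
}
```

Any path that bumps the sequence (a ruleset update, and per the related
fixes also netns release and the netlink event path) invalidates batches
that were collected before the bump.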

Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore either. There is a new set->dead flag that is set to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.

To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.

Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.

We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
it is called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.

This gives us the choice of ABBA deadlock or UaF.

To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.

Set backends are adapted to use the GC transaction API in a follow up
patch entitled:

("netfilter: nf_tables: use gc transaction API in set backends")

This is joint work with Florian Westphal.

Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 24138933 09-Aug-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't skip expired elements during walk

There is an asymmetry between commit/abort and preparation phase if the
following conditions are met:

1. set is a verdict map ("1.2.3.4 : jump foo")
2. timeouts are enabled

In this case, following sequence is problematic:

1. element E in set S refers to chain C
2. userspace requests removal of set S
3. kernel does a set walk to decrement chain->use count for all elements
from preparation phase
4. kernel does another set walk to remove elements from the commit phase
(or another walk to do a chain->use increment for all elements from
abort phase)

If E has already expired in 1), it will be ignored during list walk, so its use count
won't have been changed.

Then, when set is culled, ->destroy callback will zap the element via
nf_tables_set_elem_destroy(), but this function is only safe for
elements that have been deactivated earlier from the preparation phase:
lack of earlier deactivate removes the element but leaks the chain use
count, which results in a WARN splat when the chain gets removed later,
plus a leak of the nft_chain structure.
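
The asymmetry can be reproduced with a toy model. The types and
walk_deactivate() below are hypothetical and only count references; they
assume one chain reference per element, as in the verdict map case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct chain {
	unsigned int use;	/* references held by map elements */
};

struct elem {
	struct chain *target;	/* chain named by the verdict */
	bool expired;
};

/* Preparation-phase walk: drop one chain reference per visited element. */
static void walk_deactivate(struct elem *e, size_t n, bool skip_expired)
{
	for (size_t i = 0; i < n; i++) {
		if (skip_expired && e[i].expired)
			continue;	/* old behaviour: reference is never dropped */
		assert(e[i].target->use > 0);
		e[i].target->use--;
	}
}
```

With two elements pointing at the same chain and one of them expired, the
skipping walk leaves the use count at 1 although the set is going away,
while the non-skipping walk brings it down to 0.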

Update pipapo_get() not to skip expired elements, otherwise flush
command reports bogus ENOENT errors.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0ebc1064 23-Jul-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID

Bail out with EOPNOTSUPP when adding rule to bound chain via
NFTA_RULE_CHAIN_ID. The following warning splat is shown when
adding a rule to a deleted bound chain:

WARNING: CPU: 2 PID: 13692 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
CPU: 2 PID: 13692 Comm: chain-bound-rul Not tainted 6.1.39 #1
RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 6eaf41e8 20-Jul-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip bound chain on rule flush

Skip bound chain when flushing table rules, the rule that owns this
chain releases these objects.

Otherwise, the following warning is triggered:

WARNING: CPU: 2 PID: 1217 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]
CPU: 2 PID: 1217 Comm: chain-flush Not tainted 6.1.39 #1
RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables]

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Kevin Rich <kevinrich1337@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 751d460c 19-Jul-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip bound chain in netns release path

Skip bound chain from netns release path, the rule that owns this chain
releases these objects.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 314c8284 17-Jul-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: can't schedule in nft_chain_validate

Can be called via nft set element list iteration, which may acquire
rcu and/or bh read lock (depends on set type).

BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by nft/1232:
#0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
#1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
Call Trace:
nft_chain_validate
nft_lookup_validate_setelem
nft_pipapo_walk
nft_lookup_validate
nft_chain_validate
nft_immediate_validate
nft_chain_validate
nf_tables_validate
nf_tables_abort

No choice but to move it to nf_tables_validate().

Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>


# ddbd8be6 19-Jul-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix spurious set element insertion failure

On some platforms there is a padding hole in the nft_verdict
structure, between the verdict code and the chain pointer.

On element insertion, if the new element clashes with an existing one and
NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
the data associated with duplicated element is the same as the existing
one. The data equality check uses memcmp.

For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
padding area leads to spurious failure even if the verdict data is the
same.

This then makes the insertion fail with 'already exists' error, even
though the new "key : data" matches an existing entry and userspace
told the kernel that it doesn't want to receive an error indication.
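
To illustrate why memcmp is fragile here, consider a struct with an int
followed by a pointer: on common 64-bit ABIs there are padding bytes between
the two members, and their content is not part of the value. The sketch
below uses a hypothetical struct verdict (not the kernel's nft_verdict) and
compares member by member. Zeroing the whole structure before filling it in
is another way to make a bytewise comparison reliable; which approach the
actual patch takes is not stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct chain;	/* opaque, only the pointer value matters here */

/* Hypothetical verdict: int followed by a pointer, padding in between. */
struct verdict {
	int code;
	struct chain *chain;
};

/* Compare the members only, so padding bytes never influence the result. */
static bool verdict_equal(const struct verdict *a, const struct verdict *b)
{
	assert(a != NULL && b != NULL);
	return a->code == b->code && a->chain == b->chain;
}

/* Bytewise comparison: also looks at padding, shown for contrast only. */
static bool verdict_equal_raw(const struct verdict *a, const struct verdict *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Two verdicts with identical code and chain always compare equal through
verdict_equal(), whereas verdict_equal_raw() can report a mismatch when the
padding bytes happen to differ.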

Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Florian Westphal <fw@strlen.de>


# 515ad530 05-Jul-2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

netfilter: nf_tables: do not ignore genmask when looking up chain by id

When adding a rule to a chain referring to its ID, if that chain had been
deleted on the same batch, the rule might end up referring to a deleted
chain.

This will lead to a WARNING like following:

[ 33.098431] ------------[ cut here ]------------
[ 33.098678] WARNING: CPU: 5 PID: 69 at net/netfilter/nf_tables_api.c:2037 nf_tables_chain_destroy+0x23d/0x260
[ 33.099217] Modules linked in:
[ 33.099388] CPU: 5 PID: 69 Comm: kworker/5:1 Not tainted 6.4.0+ #409
[ 33.099726] Workqueue: events nf_tables_trans_destroy_work
[ 33.100018] RIP: 0010:nf_tables_chain_destroy+0x23d/0x260
[ 33.100306] Code: 8b 7c 24 68 e8 64 9c ed fe 4c 89 e7 e8 5c 9c ed fe 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 c3 cc cc cc cc <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7
[ 33.101271] RSP: 0018:ffffc900004ffc48 EFLAGS: 00010202
[ 33.101546] RAX: 0000000000000001 RBX: ffff888006fc0a28 RCX: 0000000000000000
[ 33.101920] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 33.102649] RBP: ffffc900004ffc78 R08: 0000000000000000 R09: 0000000000000000
[ 33.103018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880135ef500
[ 33.103385] R13: 0000000000000000 R14: dead000000000122 R15: ffff888006fc0a10
[ 33.103762] FS: 0000000000000000(0000) GS:ffff888024c80000(0000) knlGS:0000000000000000
[ 33.104184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.104493] CR2: 00007fe863b56a50 CR3: 00000000124b0001 CR4: 0000000000770ee0
[ 33.104872] PKRU: 55555554
[ 33.104999] Call Trace:
[ 33.105113] <TASK>
[ 33.105214] ? show_regs+0x72/0x90
[ 33.105371] ? __warn+0xa5/0x210
[ 33.105520] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.105732] ? report_bug+0x1f2/0x200
[ 33.105902] ? handle_bug+0x46/0x90
[ 33.106546] ? exc_invalid_op+0x19/0x50
[ 33.106762] ? asm_exc_invalid_op+0x1b/0x20
[ 33.106995] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.107249] ? nf_tables_chain_destroy+0x30/0x260
[ 33.107506] nf_tables_trans_destroy_work+0x669/0x680
[ 33.107782] ? mark_held_locks+0x28/0xa0
[ 33.107996] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10
[ 33.108294] ? _raw_spin_unlock_irq+0x28/0x70
[ 33.108538] process_one_work+0x68c/0xb70
[ 33.108755] ? lock_acquire+0x17f/0x420
[ 33.108977] ? __pfx_process_one_work+0x10/0x10
[ 33.109218] ? do_raw_spin_lock+0x128/0x1d0
[ 33.109435] ? _raw_spin_lock_irq+0x71/0x80
[ 33.109634] worker_thread+0x2bd/0x700
[ 33.109817] ? __pfx_worker_thread+0x10/0x10
[ 33.110254] kthread+0x18b/0x1d0
[ 33.110410] ? __pfx_kthread+0x10/0x10
[ 33.110581] ret_from_fork+0x29/0x50
[ 33.110757] </TASK>
[ 33.110866] irq event stamp: 1651
[ 33.111017] hardirqs last enabled at (1659): [<ffffffffa206a209>] __up_console_sem+0x79/0xa0
[ 33.111379] hardirqs last disabled at (1666): [<ffffffffa206a1ee>] __up_console_sem+0x5e/0xa0
[ 33.111740] softirqs last enabled at (1616): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112094] softirqs last disabled at (1367): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112453] ---[ end trace 0000000000000000 ]---

This is due to the nft_chain_lookup_byid ignoring the genmask. After this
change, adding the new rule will fail as it will not find the chain.
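
A rough sketch of a generation-aware lookup by ID follows. It assumes the
usual convention that a bit set in an object's genmask marks the object as
inactive in that generation; struct chain and the helpers are hypothetical
and much simpler than the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chain {
	uint32_t id;
	uint8_t genmask;	/* bit set: inactive in that generation */
};

static int chain_active(const struct chain *c, uint8_t genmask)
{
	return (c->genmask & genmask) == 0;
}

/* Look up by ID, but only return chains active in the given generation. */
static struct chain *chain_lookup_byid(struct chain *chains, size_t n,
				       uint32_t id, uint8_t genmask)
{
	assert(chains != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (chains[i].id == id && chain_active(&chains[i], genmask))
			return &chains[i];
	}
	return NULL;	/* deleted earlier in this batch, or never existed */
}
```

A chain deleted earlier in the same batch is inactive in the next
generation, so a lookup using the next-generation mask no longer finds it
and the rule addition fails.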

Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Cc: stable@vger.kernel.org
Reported-by: Mingi Cho of Theori working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1689f259 28-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: report use refcount overflow

Overflow use refcount checks are not complete.

Add helper function to deal with object reference counter tracking.
Report -EMFILE in case UINT_MAX is reached.

nft_use_dec() splats in case that reference counter underflows,
which should not ever happen.

Add nft_use_inc_restore() and nft_use_dec_restore() which are used
to restore reference counter from error and abort paths.

Use u32 in nft_flowtable and nft_object since helper functions cannot
work on bitfields.

Remove the few early incomplete checks now that the helper functions
are in place and used to check for refcount overflow.
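
A userspace sketch of such helpers is shown below. The names use_inc(),
use_dec() and obj_hold() are hypothetical. Unlike the kernel helper
described above, which splats on underflow, this version reports underflow
through its return value so that it can be tested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Increment unless the counter is already saturated. */
static bool use_inc(uint32_t *use)
{
	assert(use != NULL);
	if (*use == UINT32_MAX)
		return false;
	(*use)++;
	return true;
}

/* Decrement; an underflow indicates a refcounting bug in the caller. */
static bool use_dec(uint32_t *use)
{
	assert(use != NULL);
	if (*use == 0)
		return false;
	(*use)--;
	return true;
}

/* Callers translate a failed increment into -EMFILE. */
static int obj_hold(uint32_t *use)
{
	return use_inc(use) ? 0 : -EMFILE;
}
```

The counter is a plain u32-sized integer here, which matches the remark
above that such helpers cannot take the address of a bitfield.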

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b389139f 25-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix underflow in chain reference counter

Set element addition error path decrements reference counter on chains
twice: once on element release and again via nft_data_release().

Then, d6b478666ffa ("netfilter: nf_tables: fix underflow in object
reference counter") incorrectly fixed this by removing the stateful
object reference count decrement.

Restore the stateful object decrement as in b91d90368837 ("netfilter:
nf_tables: fix leaking object reference count") and let
nft_data_release() decrement the chain reference counter, so this is
done only once.

Fixes: d6b478666ffa ("netfilter: nf_tables: fix underflow in object reference counter")
Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3e704897 25-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: unbind non-anonymous set if rule construction fails

Otherwise a dangling reference to a rule object that is gone remains
in the set binding list.

Fixes: 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 079cd633 15-Jun-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET

Analogous to NFT_MSG_GETOBJ_RESET, but for set elements with a timeout
or attached stateful expressions like counters or quotas - reset them
all at once. Respect a per element timeout value if present to reset the
'expires' value to.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 96b2ef9b 06-Jun-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: permit update of set size

Now that set->nelems is always updated permit update of the sets max size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 42e344f0 16-Jun-2023 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Fix for deleting base chains with payload

When deleting a base chain, iptables-nft simply submits the whole chain
to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code
added by fixed commit then turned this into a chain update, destroying
the hook but not the chain itself. Detect the situation by checking if
the chain type is either netdev or inet/ingress.

Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 043d2acf 14-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: drop module reference after updating chain

Otherwise the module reference counter is leaked.

Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e26d3009 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow timeout for anonymous sets

Never used from userspace, disallow these parameters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b770283c 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow updates of anonymous sets

Disallow updates of set timeout and garbage collection parameters for
anonymous sets.

Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 62e1e94b 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject unbound chain set before commit phase

Use binding list to track set transaction and to check for unbound
chains before entering the commit phase.

Bail out if a chain binding remains unused before entering the commit
step.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 938154b9 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject unbound anonymous set before commit phase

Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.

Bail out at the end of the transaction handling if an anonymous set
remains unbound.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c88c535b 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow element updates of bound anonymous sets

Anonymous sets come with NFT_SET_CONSTANT from userspace. Although the API
allows creating anonymous sets without NFT_SET_CONSTANT, it makes no
sense to allow adding and deleting elements for bound anonymous sets.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d6b47866 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix underflow in object reference counter

Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that b91d90368837 ("netfilter: nf_tables:
fix leaking object reference count") provides.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 628bd3e4 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: drop map element references from preparation phase

set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.

Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 26b5a571 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain

Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.

Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4bedf9ee 16-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix chain binding transaction logic

Add bound flag to rule and chain transactions as in 6a0a8d10a366
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.

This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.

This patch also disallows nested chain bindings, which is not
supported from userspace.

The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by 1240eb93f061 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").

The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.

When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1240eb93 07-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE

In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a89942e ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.

This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.

Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.

[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132] <TASK>
[1009.379135] dump_stack_lvl+0x33/0x50
[1009.379146] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191] print_address_description.constprop.0+0x27/0x300
[1009.379201] kasan_report+0x107/0x120
[1009.379210] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255] nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302] nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350] nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397] ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441] ? kasan_unpoison+0x23/0x50
[1009.379450] nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485] ? __alloc_skb+0xb8/0x1e0
[1009.379493] ? __alloc_skb+0xb8/0x1e0
[1009.379502] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509] ? unwind_get_return_address+0x2a/0x40
[1009.379517] ? write_profile+0xc0/0xc0
[1009.379524] ? avc_lookup+0x8f/0xc0
[1009.379532] ? __rcu_read_unlock+0x43/0x60

Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 212ed75d 07-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: integrate pipapo into commit protocol

The pipapo set backend follows copy-on-update approach, maintaining one
clone of the existing datastructure that is being updated. The clone
and current datastructures are swapped via rcu from the commit step.

The existing integration with the commit protocol is flawed because
there is no operation to clean up the clone if the transaction is
aborted. Moreover, the datastructure swap happens on set element
activation.

This patch adds two new operations for sets: commit and abort, these new
operations are invoked from the commit and abort steps, after the
transactions have been digested, and it updates the pipapo set backend
to use it.

This patch adds a new ->pending_update field to sets to maintain a list
of sets that require this new commit and abort operations.
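
The commit and abort operations can be illustrated with a toy
copy-on-update structure. Everything below is hypothetical: the real backend
swaps pointers via RCU, while this sketch copies a small array, and the
dirty flag merely plays the role of the pending update marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NELEMS 4

struct cow_set {
	int live[NELEMS];	/* what lookups see */
	int clone[NELEMS];	/* private working copy of the transaction */
	bool dirty;		/* clone differs from live */
};

/* Updates only ever touch the clone. */
static void cow_update(struct cow_set *s, size_t idx, int val)
{
	assert(idx < NELEMS);
	s->clone[idx] = val;
	s->dirty = true;
}

/* Commit step: publish the clone. */
static void cow_commit(struct cow_set *s)
{
	if (!s->dirty)
		return;
	memcpy(s->live, s->clone, sizeof(s->live));
	s->dirty = false;
}

/* Abort step: throw the pending changes away. */
static void cow_abort(struct cow_set *s)
{
	if (!s->dirty)
		return;
	memcpy(s->clone, s->live, sizeof(s->clone));
	s->dirty = false;
}
```

Without an abort operation the clone would keep changes from a failed
transaction, and the next commit would publish them.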

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 08e42a0d 06-Jun-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: out-of-bound check in chain blob

Add current size of rule expressions to the boundary check.
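
As a generic illustration of this kind of check (not the kernel code;
blob_reserve() and its parameters are hypothetical), a bounds test has to
account for the bytes already used plus the size about to be added, and it
should be written so that the addition itself cannot overflow:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reserve rule_size bytes in a blob of the given capacity.
 * Returns false, leaving *used untouched, if the rule does not fit.
 */
static bool blob_reserve(size_t capacity, size_t *used, size_t rule_size)
{
	assert(used != NULL);

	/* Subtract instead of add so the check cannot wrap around. */
	if (*used > capacity || rule_size > capacity - *used)
		return false;

	*used += rule_size;
	assert(*used <= capacity);
	return true;
}
```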

Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bd058763 23-May-2023 Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>

netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()

The nla_nest_start_noflag() function may fail and return NULL;
the return value needs to be checked.

Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.

Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d4b7f29e 11-May-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: always increment set element count

At this time, set->nelems counter only increments when the set has
a maximum size.

All set elements decrement the counter unconditionally, this is
confusing.

Increment the counter unconditionally to make this symmetrical.
This would also allow changing the set maximum size after set creation
in a later patch.

Signed-off-by: Florian Westphal <fw@strlen.de>


# e3c361b8 11-May-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix nft_trans type confusion

nft_trans_FOO objects all share a common nft_trans base structure, but
trailing fields depend on the real object size. Access is only safe after
trans->msg_type check.

Check for rule type first. Found by code inspection.
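
The pattern can be sketched with a common base structure embedded as the
first member of each specific object. The types below are hypothetical and
much smaller than the kernel's; the point is that the type tag is checked
before any trailing field is read.

```c
#include <assert.h>
#include <stddef.h>

enum msg_type { MSG_NEWRULE, MSG_NEWSET };

struct trans {
	enum msg_type msg_type;	/* tells which object surrounds the base */
};

struct trans_rule {
	struct trans base;	/* must stay the first member */
	unsigned int rule_id;
};

struct trans_set {
	struct trans base;	/* smaller object: there is no rule_id */
};

/* Downcast only after the type tag has been checked. */
static const struct trans_rule *trans_as_rule(const struct trans *t)
{
	assert(t != NULL);
	if (t->msg_type != MSG_NEWRULE)
		return NULL;
	return (const struct trans_rule *)t;
}

static int trans_rule_id_matches(const struct trans *t, unsigned int id)
{
	const struct trans_rule *r = trans_as_rule(t);

	return r != NULL && r->rule_id == id;
}
```

Reading rule_id through a set transaction would access memory past the end
of the smaller object, which is why the tag test has to come first.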

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Signed-off-by: Florian Westphal <fw@strlen.de>


# c1592a89 02-May-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: deactivate anonymous set from preparation phase

Toggle deleted anonymous sets as inactive in the next generation, so
users cannot perform any update on it. Clear the generation bitmask
in case the transaction is aborted.

The following KASAN splat shows a set element deletion for a bound
anonymous set that has been already removed in the same transaction.

[ 64.921510] ==================================================================
[ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.924745] Write of size 8 at addr dead000000000122 by task test/890
[ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253
[ 64.931120] Call Trace:
[ 64.932699] <TASK>
[ 64.934292] dump_stack_lvl+0x33/0x50
[ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.937551] kasan_report+0xda/0x120
[ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.942452] ? __kasan_slab_alloc+0x2d/0x60
[ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables]
[ 64.945710] ? kasan_set_track+0x21/0x30
[ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink]
[ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8509f62b 25-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: hit ENOENT on unexisting chain/flowtable update with missing attributes

If user does not specify hook number and priority, then assume this is
a chain/flowtable update. Therefore, report ENOENT which provides a
better hint than EINVAL. Set on extended netlink error report to refer
to the chain name.

Fixes: 5b6743fb2c2a ("netfilter: nf_tables: skip flowtable hooknum and priority on device updates")
Fixes: 5efe72698a97 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 207296f1 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to create netdev chain without device

Relax netdev chain creation to allow for loading the ruleset, then
adding/deleting devices at a later stage. Hardware offload does not
support this feature yet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7d937b10 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: support for deleting devices in an existing netdev chain

This patch allows for deleting devices in an existing netdev chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b9703ed4 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: support for adding new devices to an existing netdev chain

This patch allows users to add devices to an existing netdev chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cdc32546 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: rename function to destroy hook list

Rename nft_flowtable_hooks_destroy() to nft_hooks_destroy() to prepare
for netdev chain device updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 28339b21 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not send complete notification of deletions

In most cases, table, name and handle are sufficient for userspace to
identify an object that has been deleted. Skipping unneeded fields in
the netlink attributes in the message saves bandwidth (i.e. fewer chances
of hitting ENOBUFS).

Rules are an exception: the existing userspace monitor code relies on
the rule definition. This exception can be removed by implementing a
rule cache in userspace, this is already supported by the tracing
infrastructure.

Regarding flowtables, incremental deletion of devices is possible.
Skipping a full notification allows userspace to differentiate between
flowtable removal and incremental removal of devices.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c3c060ad 20-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: extended netlink error reporting for netdevice

Flowtable and netdev chains are bound to one or several netdevices;
extend netlink error reporting to specify the netdevice that
triggers the error.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 00c320f9 13-Apr-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: make validation state per table

We only need to validate tables that saw changes in the current
transaction.

The existing code revalidates all tables, but this isn't needed as
cross-table jumps are not allowed (chains have table scope).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9a32e985 13-Apr-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't write table validation state without mutex

The ->cleanup callback needs to be removed, this doesn't work anymore as
the transaction mutex is already released in the ->abort function.

Just do it after a successful validation pass, this either happens
from commit or abort phases where transaction mutex is held.

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 63e9bbbc 11-Apr-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't store chain address on jump

Now that the rule trailer/end marker and the rcu head reside in the
same structure, we no longer need to save/restore the chain pointer
when performing/returning from a jump.

We can simply let the trace infra walk the evaluated rule until it
hits the end marker and then fetch the chain pointer from there.

When the rule is NULL (policy tracing), then chain and basechain
pointers were already identical, so just use the basechain.

This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit,
scripts/stackusage says:

nf_tables_core.c:251 nft_do_chain 328 static

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e38fbfa9 11-Apr-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker

In order to free the rules in a chain via call_rcu, the rule array used
to stash a rcu_head and space for a pointer at the end of the rule array.

When the current nft_rule_dp blob format got added in
2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout"), this resulted
in a double-trailer:

  size (unsigned long)
  struct nft_rule_dp
    struct nft_expr
    ...
  struct nft_rule_dp
    struct nft_expr
    ...
  struct nft_rule_dp (is_last=1) // Trailer

The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size,
so it can be located via start_addr + size.

Because the rcu_head is stored after 'start+size' as well this means the
is_last trailer is *aliased* to the rcu_head (struct nft_rules_old).

This is harmless, because at this time the nft_do_chain function never
evaluates/accesses the trailer, it only checks the address boundary:

for (; rule < last_rule; rule = nft_rule_next(rule)) {
...

But this way the last_rule address has to be stashed in the jump
structure to restore it after returning from a chain.

nft_do_chain stack usage has become way too big, so put it on a diet.

Without this patch it is impossible to use
for (; !rule->is_last; rule = nft_rule_next(rule)) {

... because on free, the needed update of the rcu_head will clobber the
nft_rule_dp is_last bit.

Furthermore, also stash the chain pointer in the trailer, this allows
to recover the original chain structure from nf_tables_trace infra
without a need to place them in the jump struct.

After this patch it is trivial to diet the jump stack structure,
done in the next two patches.
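
The trailer-based walk described above can be sketched in userspace C.
This is a minimal model under stated assumptions, not the kernel code:
struct demo_rule, demo_rule_next() and demo_count_rules() are
hypothetical names, and the two-field header only stands in for
struct nft_rule_dp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nft_rule_dp. */
struct demo_rule {
	unsigned int is_last;	/* non-zero only on the trailer */
	unsigned int dlen;	/* bytes of expression data after the header */
};

/* Step over the header and the expression data that follows it. */
static const struct demo_rule *demo_rule_next(const struct demo_rule *r)
{
	/* Assumption: payloads are padded to keep the next header aligned. */
	assert(r->dlen % sizeof(unsigned int) == 0);
	return (const struct demo_rule *)((const unsigned char *)(r + 1) + r->dlen);
}

/*
 * Walk until the trailer is reached. No end address is needed, so the
 * caller does not have to save a "last rule" pointer per jump.
 */
static size_t demo_count_rules(const struct demo_rule *rule)
{
	size_t n = 0;

	for (; !rule->is_last; rule = demo_rule_next(rule))
		n++;
	return n;
}
```

The loop only works if nothing else overwrites the trailer, which is
why the trailer and the rcu_head have to stop aliasing each other.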

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d4eb7e39 17-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements

If NFT_SET_ELEM_CATCHALL is set on, then userspace provides no set element
key. Otherwise, bail out with -EINVAL.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d46fc894 16-Apr-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate catch-all set elements

A catch-all set element might jump/goto to a chain that uses expressions
that require validation.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# af0acf22 05-Apr-2023 Chen Aotian <chenaotian2@163.com>

netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT

For memory allocations that store user data from nla[NFTA_OBJ_USERDATA],
GFP_KERNEL_ACCOUNT is more suitable.

Fixes: 33758c891479 ("memcg: enable accounting for nft objects")
Signed-off-by: Chen Aotian <chenaotian2@163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 92f3e96d 08-Feb-2023 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to fetch set elements when table has an owner

NFT_MSG_GETSETELEM returns -EPERM when fetching set elements that belong
to a table that has an owner. This results in empty set/map listing from
userspace.

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1fb7696a 19-Jan-2023 Yang Yingliang <yangyingliang@huawei.com>

netfilter: nf_tables: fix wrong pointer passed to PTR_ERR()

It should be 'chain' passed to PTR_ERR() in the error path
after calling nft_chain_lookup() in nf_tables_delrule().

Fixes: f80a612dd77c ("netfilter: nf_tables: add support to destroy operation")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dac7f50a 17-Jan-2023 Alok Tiwari <alok.a.tiwari@oracle.com>

netfilter: nf_tables: NULL pointer dereference in nf_tables_updobj()

A static analyzer detected a possible NULL pointer dereference of 'type':
__nft_obj_type_get() can return NULL, which needs to be handled.
If type is a NULL pointer, return -ENOENT.

This is a theoretical issue, since an existing object has a type, but
better add this failsafe check.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f80a612d 02-Jan-2023 Fernando Fernandez Mancera <ffmancera@riseup.net>

netfilter: nf_tables: add support to destroy operation

Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a
delete operation but ignoring the ENOENT errors.

This is useful for the transaction semantics, where failing to delete an
object which does not exist results in aborting the transaction.

This new command allows the transaction to proceed in case the object
does not exist.
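
The difference between delete and destroy can be sketched as follows.
This is a userspace model with hypothetical names (demo_table,
demo_remove(), DEMO_OP_*), not the nf_tables implementation; it only
illustrates that destroy treats a missing object as success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_OBJS 4

enum demo_op { DEMO_OP_DELETE, DEMO_OP_DESTROY };

/* Hypothetical object container, keyed by name. */
struct demo_table {
	const char *names[DEMO_MAX_OBJS];
	size_t count;
};

/*
 * Remove 'name' from the table. A missing object is an error for
 * DEMO_OP_DELETE (the batch would be aborted) and is silently
 * accepted for DEMO_OP_DESTROY (the batch can proceed).
 */
static int demo_remove(struct demo_table *t, const char *name, enum demo_op op)
{
	assert(t->count <= DEMO_MAX_OBJS);

	for (size_t i = 0; i < t->count; i++) {
		if (strcmp(t->names[i], name) == 0) {
			/* Fill the hole with the last entry. */
			t->names[i] = t->names[--t->count];
			return 0;
		}
	}
	return op == DEMO_OP_DESTROY ? 0 : -ENOENT;
}
```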

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 123b9961 19-Dec-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: honor set timeout and garbage collection updates

Set timeout and garbage collection interval changes are ignored on set
updates. Add a transaction to update the global set element timeout and
garbage collection interval.

Fixes: 96518518cc41 ("netfilter: add nftables")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f6594c37 19-Dec-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: perform type checking for existing sets

If a ruleset declares a set name that matches an existing set in the
kernel, then validate that this declaration really refers to the same
set, otherwise bail out with EEXIST.

Currently, the kernel reports success when adding a set that already
exists in the kernel. This usually results in EINVAL errors at a later
stage, when the user adds elements to the set, if the set declaration
mismatches the existing set representation in the kernel.

Add a new function to check that the set declaration really refers to
the same existing set in the kernel.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a8fe4154 19-Dec-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add function to create set stateful expressions

Add a helper function to allocate and initialize the stateful expressions
that are defined in a set.

This patch allows reusing this code from the set update path, to check
that the type of the update matches the existing set in the kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bed4a63e 19-Dec-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: consolidate set description

Add the following fields to the set description:

- key type
- data type
- object type
- policy
- gc_int: garbage collection interval
- timeout: element timeout

This prepares for stricter set type checks on updates in a follow up
patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 98cbc40e 15-Nov-2022 Dan Carpenter <error27@gmail.com>

netfilter: nft_inner: fix IS_ERR() vs NULL check

The __nft_expr_type_get() function returns NULL on error. It never
returns error pointers.
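
Why the check matters can be shown with a small userspace model of the
error-pointer convention. The demo_* helpers below are hypothetical
stand-ins written for this sketch; they imitate ERR_PTR(), IS_ERR() and
PTR_ERR() but are not the kernel macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. */
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct demo_type { const char *name; };

static struct demo_type demo_types[] = { { "payload" }, { "meta" } };

/* Lookup that reports failure as NULL, never as an error pointer. */
static struct demo_type *demo_type_find(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(demo_types) / sizeof(demo_types[0]); i++)
		if (strcmp(demo_types[i].name, name) == 0)
			return &demo_types[i];
	return NULL;
}

/* Wrapper that converts NULL into an error pointer for its callers. */
static struct demo_type *demo_type_get(const char *name)
{
	struct demo_type *type = demo_type_find(name);

	if (!type)
		return demo_err_ptr(-ENOENT);
	return type;
}
```

An IS_ERR()-style test on the result of the NULL-returning lookup never
fires, so the failure goes unnoticed until the pointer is dereferenced.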

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8daa8fde 14-Oct-2022 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET

Analogous to NFT_MSG_GETOBJ_RESET, but for rules: Reset stateful
expressions like counters or quotas. The latter two are the only
consumers, adjust their 'dump' callbacks to respect the parameter
introduced earlier.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7d34aa3e 14-Oct-2022 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters

Add a 'reset' flag just like with nft_object_ops::dump. This will be
useful to reset "anonymous stateful objects", e.g. simple rule counters.

No functional change intended.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d120d1a6 26-Oct-2022 Thomas Gleixner <tglx@linutronix.de>

net: Remove the obsolte u64_stats_fetch_*_irq() users (net).

Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3a07327d 25-Oct-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_inner: support for inner tunnel header matching

This new expression allows you to match on the inner headers that are
encapsulated by any of the existing tunneling protocols.

This expression parses the inner packet to set the link, network and
transport offsets, so the existing expressions (with a few updates) can
be reused to match on the inner headers.

The inner expression supports different tunnel combinations such as:

- ethernet frame over IPv4/IPv6 packet, eg. VxLAN.
- IPv4/IPv6 packet over IPv4/IPv6 packet, eg. IPIP.
- IPv4/IPv6 packet over IPv4/IPv6 + transport header, eg. GRE.
- transport header (ESP or SCTP) over transport header (usually UDP)

The following fields are used to describe the tunnel protocol:

- flags, which describe how to parse the inner headers:

NFT_PAYLOAD_CTX_INNER_TUN, the tunnel provides its own header.
NFT_PAYLOAD_CTX_INNER_ETHER, the ethernet frame is available as inner header.
NFT_PAYLOAD_CTX_INNER_NH, the network header is available as inner header.
NFT_PAYLOAD_CTX_INNER_TH, the transport header is available as inner header.

For example, VxLAN sets all of these flags, while GRE only sets
NFT_PAYLOAD_CTX_INNER_NH and NFT_PAYLOAD_CTX_INNER_TH. ESP over
UDP only sets NFT_PAYLOAD_CTX_INNER_TH.

The tunnel description is composed of the following attributes:

- header size: in case the tunnel comes with its own header, eg. VxLAN.

- type: this provides a hint to userspace on how to delinearize the rule.
This is useful for VxLAN and Geneve since they run over UDP and the
transport header does not provide a hint. This is also useful in case
hardware offload is ever supported. The type is not currently interpreted
by the kernel.

- expression: currently only payload is supported. A follow-up patch also
adds inner meta support, which is required by autogenerated
dependencies. The exthdr expression should be supported too
at some point. There is a new inner_ops operation that needs to be
set to allow using an existing expression from the inner expression.

This patch adds a new NFT_PAYLOAD_TUN_HEADER base which allows to match
on the tunnel header fields, eg. vxlan vni.

The payload expression is embedded into nft_inner private area and this
private data area is passed to the payload inner eval function via
direct call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 33c7aba0 14-Nov-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not set up extensions for end interval

Elements with an end interval flag set on do not store extensions. The
global set definition is currently setting on the timeout and stateful
expression for end interval elements.

This leads to skipping end interval elements from the set->ops->walk()
path as the expired check bogusly reports true.

Moreover, do not set up stateful expressions for elements with end
interval flag set on since this is never used.

Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 03c1f1ef 03-Nov-2022 Shigeru Yoshida <syoshida@redhat.com>

netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()

syzbot reported a warning like below [1]:

WARNING: CPU: 3 PID: 9 at net/netfilter/nf_tables_api.c:10096 nf_tables_exit_net+0x71c/0x840
Modules linked in:
CPU: 2 PID: 9 Comm: kworker/u8:0 Tainted: G W 6.1.0-rc3-00072-g8e5423e991e8 #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:nf_tables_exit_net+0x71c/0x840
...
Call Trace:
<TASK>
? __nft_release_table+0xfc0/0xfc0
ops_exit_list+0xb5/0x180
cleanup_net+0x506/0xb10
? unregister_pernet_device+0x80/0x80
process_one_work+0xa38/0x1730
? pwq_dec_nr_in_flight+0x2b0/0x2b0
? rwlock_bug.part.0+0x90/0x90
? _raw_spin_lock_irq+0x46/0x50
worker_thread+0x67e/0x10e0
? process_one_work+0x1730/0x1730
kthread+0x2e5/0x3a0
? kthread_complete_and_exit+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>

In nf_tables_exit_net(), there is a case where nft_net->commit_list is
empty but nft_net->module_list is not empty. Such a case occurs with
the following scenario:

1. nfnetlink_rcv_batch() is called
2. nf_tables_newset() returns -EAGAIN and NFNL_BATCH_FAILURE bit is
set to status
3. nf_tables_abort() is called with NFNL_ABORT_AUTOLOAD
(nft_net->commit_list is released, but nft_net->module_list is not
because of NFNL_ABORT_AUTOLOAD flag)
4. Jump to replay label
5. netlink_skb_clone() fails and returns from the function (this is
caused by fault injection in the reproducer of syzbot)

This patch fixes this issue by calling __nf_tables_abort() when
nft_net->module_list is not empty in nf_tables_exit_net().

Fixes: eb014de4fd41 ("netfilter: nf_tables: autoload modules from the abort path")
Link: https://syzkaller.appspot.com/bug?id=802aba2422de4218ad0c01b46c9525cc9d4e4aa3 [1]
Reported-by: syzbot+178efee9e2d7f87f5103@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 26b5934f 26-Oct-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release flow rule object from commit path

No need to postpone this to the commit release path, since no packets
are walking over this object, this is accessed from control plane only.
This helped uncover a UAF triggered by races with the netlink notifier.

Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d4bc8271 26-Oct-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: netlink notifier might race to release objects

The commit release path is invoked via call_rcu and it runs lockless to
release the objects after the rcu grace period. The netlink notifier
handler might win the race to remove objects that the transaction context
is still referencing from the commit release path.

Call rcu_barrier() to ensure pending rcu callbacks run to completion
if the list of transactions to be destroyed is not empty.

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 96df8360 17-Oct-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements

Otherwise EINVAL is bogusly reported to userspace when deleting a set
element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:

- insertion: if not present, start key is used as end key.
- deletion: only start key needs to be specified, end key is ignored.

Hence, relax the sanity check.

Fixes: 88cccd908d51 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8556bceb 18-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

netfilter: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 9a4d6dd5 12-Sep-2022 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()

It seems to me that percpu memory for chain stats started leaking since
commit 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to
hardware priority") when nft_chain_offload_priority() returned an error.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to hardware priority")
Signed-off-by: Florian Westphal <fw@strlen.de>


# 921ebde3 12-Sep-2022 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()

syzbot is reporting underflow of nft_counters_enabled counter at
nf_tables_addchain() [1], for commit 43eb8949cfdffa76 ("netfilter:
nf_tables: do not leave chain stats enabled on error") missed that
nf_tables_chain_destroy() after nft_basechain_init() in the error path of
nf_tables_addchain() decrements the counter because nft_basechain_init()
makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.

Increment the counter immediately after returning from
nft_basechain_init().

Link: https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1]
Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Fixes: 43eb8949cfdffa76 ("netfilter: nf_tables: do not leave chain stats enabled on error")
Signed-off-by: Florian Westphal <fw@strlen.de>


# 77972a36 31-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: clean up hook list when offload flags check fails

splice back the hook list so nft_chain_release_hook() has a chance to
release the hooks.

BUG: memory leak
unreferenced object 0xffff88810180b100 (size 96):
comm "syz-executor133", pid 3619, jiffies 4294945714 (age 12.690s)
hex dump (first 32 bytes):
28 64 23 02 81 88 ff ff 28 64 23 02 81 88 ff ff (d#.....(d#.....
90 a8 aa 83 ff ff ff ff 00 00 b5 0f 81 88 ff ff ................
backtrace:
[<ffffffff83a8c59b>] kmalloc include/linux/slab.h:600 [inline]
[<ffffffff83a8c59b>] nft_netdev_hook_alloc+0x3b/0xc0 net/netfilter/nf_tables_api.c:1901
[<ffffffff83a9239a>] nft_chain_parse_netdev net/netfilter/nf_tables_api.c:1998 [inline]
[<ffffffff83a9239a>] nft_chain_parse_hook+0x33a/0x530 net/netfilter/nf_tables_api.c:2073
[<ffffffff83a9b14b>] nf_tables_addchain.constprop.0+0x10b/0x950 net/netfilter/nf_tables_api.c:2218
[<ffffffff83a9c41b>] nf_tables_newchain+0xa8b/0xc60 net/netfilter/nf_tables_api.c:2593
[<ffffffff83a3d6a6>] nfnetlink_rcv_batch+0xa46/0xd20 net/netfilter/nfnetlink.c:517
[<ffffffff83a3db79>] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:638 [inline]
[<ffffffff83a3db79>] nfnetlink_rcv+0x1f9/0x220 net/netfilter/nfnetlink.c:656
[<ffffffff83a13b17>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff83a13b17>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff83a13fd6>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff83865ab6>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff83865ab6>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8386601c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2482
[<ffffffff8386a918>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
[<ffffffff8386aaa8>] __sys_sendmsg+0x88/0x100 net/socket.c:2565
[<ffffffff845e5955>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff845e5955>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Reported-by: syzbot+5fcdbfab6d6744c57418@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e02f0d39 22-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow binding to already bound chain

Update nft_data_init() to report EINVAL if chain is already bound.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 43eb8949 20-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not leave chain stats enabled on error

Error might occur later in the nf_tables_addchain() codepath, enable
static key only after transaction has been created.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ab482c6b 21-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: make table handle allocation per-netns friendly

mutex is per-netns, move table_netns to the pernet area.

*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5dc52d83 21-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow updates of implicit chain

Updates on existing implicit chain make no sense, disallow this.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1b6345d4 15-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified

Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with
multiple ranged fields"), it is possible to combine intervals and
concatenations. Later on, ef516e8625dd ("netfilter: nf_tables:
reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag
for userspace to report that the set stores a concatenation.

Make sure NFT_SET_CONCAT is set on if field_count is specified for
consistency. Otherwise, if NFT_SET_CONCAT is specified with no
field_count, bail out with EINVAL.
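
The consistency rule can be sketched as a small check. This is an
illustration only: DEMO_SET_CONCAT, DEMO_MAX_FIELDS and
demo_check_concat() are hypothetical names and values, and treating
"field_count is specified" as "more than one field" is an assumption of
this sketch.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_SET_CONCAT	0x80u	/* hypothetical flag value */
#define DEMO_MAX_FIELDS	16u	/* hypothetical upper bound */

/*
 * Accept the set description only if the concatenation flag and the
 * field count agree with each other; otherwise report -EINVAL.
 */
static int demo_check_concat(unsigned int flags, unsigned int field_count)
{
	/* Assumption: the caller has already bounded field_count. */
	assert(field_count <= DEMO_MAX_FIELDS);

	int has_fields = field_count > 1;
	int has_flag = (flags & DEMO_SET_CONCAT) != 0;

	return has_fields == has_flag ? 0 : -EINVAL;
}
```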

Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fc0ae524 13-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END

These flags are mutually exclusive, report EINVAL in this case.

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 88cccd90 12-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags

If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the
netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
NFTA_SET_ELEM_KEY_END should not be present.

For catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
The NFT_SET_ELEM_INTERVAL_END is never used with this set flags
combination.

Fixes: 7b225d0b5c6d ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5a2f3dc3 12-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag

If the NFTA_SET_ELEM_OBJREF netlink attribute is present and the
NFT_SET_OBJECT flag is not set, report EINVAL.

Move existing sanity check earlier to validate that NFT_SET_OBJECT
requires NFTA_SET_ELEM_OBJREF.

Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 271c5ca8 09-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: really skip inactive sets when allocating name

While looping to build the bitmap of used anonymous set names, check the
current set in the iteration, instead of the one that is being created.

Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2024439b 11-Aug-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix scheduling-while-atomic splat

nf_tables_check_loops() can be called from rhashtable list
walk so cond_resched() cannot be used here.

Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c485c35f 09-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: possible module reference underflow in error path

dst->ops is set on when nft_expr_clone() fails, but module refcount has
not been bumped yet, therefore nft_expr_destroy() leads to module
reference underflow.

Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4963674c 09-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag

These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces
the flag notation.

Fixes: 7b225d0b5c6d ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 34002783 09-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access

The generation ID is bumped from the commit path while holding the
mutex, however, netlink dump operations rely on RCU.

This patch also adds missing cb->base_eq initialization in
nf_tables_dump_set().

Fixes: 38e029f14a97 ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 58007785 09-Aug-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix null deref due to zeroed list head

In nf_tables_updtable, if nf_tables_table_enable returns an error,
nft_trans_destroy is called to free the transaction object.

nft_trans_destroy() calls list_del(), but the transaction was never
placed on a list -- the list head is all zeroes, this results in
a null dereference:

BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
Call Trace:
nft_trans_destroy+0x26/0x59
nf_tables_newtable+0x4bc/0x9bc
[..]

It's sane to assume that nft_trans_destroy() can be called
on the transaction object returned by nft_trans_alloc(), so
make sure the list head is initialised.
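
The failure mode and the fix can be modelled in userspace. The
demo_* list helpers and struct demo_trans below are hypothetical
stand-ins for the kernel list API and the transaction object; the
sketch only shows why an initialised, self-linked list head makes
destroy safe for an object that was never queued.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_list_head { struct demo_list_head *next, *prev; };

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void demo_list_add_tail(struct demo_list_head *e,
			       struct demo_list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Dereferences both neighbours: a zeroed head would crash here. */
static void demo_list_del(struct demo_list_head *e)
{
	assert(e->next != NULL && e->prev != NULL);
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

struct demo_trans {
	struct demo_list_head list;
	int msg_type;
};

static struct demo_trans *demo_trans_alloc(int msg_type)
{
	struct demo_trans *trans = calloc(1, sizeof(*trans));

	if (!trans)
		return NULL;
	/* Self-link the head so destroy is valid before queueing. */
	demo_init_list_head(&trans->list);
	trans->msg_type = msg_type;
	return trans;
}

static void demo_trans_destroy(struct demo_trans *trans)
{
	demo_list_del(&trans->list);
	free(trans);
}
```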

Fixes: 55dd6f93076b ("netfilter: nf_tables: use new transaction infrastructure to handle table")
Reported-by: mingi cho <mgcho.minic@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f323ef3a 08-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow jump to implicit chain from set element

Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 341b6941 08-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: upfront validation of data via nft_data_init()

Instead of parsing the data and then validating that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If the .len field is not specified, then the length obtained from
the netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 36d5b291 09-Aug-2022 Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

netfilter: nf_tables: do not allow RULE_ID to refer to another chain

When doing lookups for rules on the same batch by using its ID, a rule from
a different chain can be used. If a rule is added to a chain but tries to
be positioned next to a rule from a different chain, it will be linked to
chain2, but the use counter on chain1 would be the one to be incremented.

When looking for rules by ID, use the chain that was used for the lookup by
name. The chain used in the context copied to the transaction needs to
match that same chain. That way, struct nft_rule does not need to get
enlarged with another member.

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Fixes: 75dd48e2e420 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 95f466d2 09-Aug-2022 Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

netfilter: nf_tables: do not allow CHAIN_ID to refer to another table

When doing lookups for chains on the same batch by using its ID, a chain
from a different table can be used. If a rule is added to a table but
refers to a chain in a different table, it will be linked to the chain in
table2, but would have expressions referring to objects in table1.

Then, when table1 is removed, the rule will not be removed as it is linked to
a chain in table2. When expressions in the rule are processed or removed,
that will lead to a use-after-free.

When looking for chains by ID, use the table that was used for the lookup
by name, and only return chains belonging to that same table.

Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 470ee20e 09-Aug-2022 Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

netfilter: nf_tables: do not allow SET_ID to refer to another table

When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.

Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.

When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.
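
The table-scoped lookup described above can be sketched as follows. This is
an illustration under stated assumptions, not the kernel code: struct table,
struct pending_set and set_lookup_byid() are hypothetical names, and the
pending sets of a batch are modelled as a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table object; identity is its address. */
struct table {
	const char *name;
};

/* Hypothetical set created earlier in the same batch. */
struct pending_set {
	const struct table *table; /* owning table */
	uint32_t id;               /* batch-local identifier */
};

/*
 * Look up a set by ID, but only among sets that belong to the table
 * that was resolved by name. A matching ID in another table is ignored.
 */
const struct pending_set *
set_lookup_byid(const struct pending_set *sets, size_t n,
		const struct table *table, uint32_t id)
{
	size_t i;

	assert(sets != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (sets[i].id == id && sets[i].table == table)
			return &sets[i];
	}
	return NULL;
}
```

Filtering on the owning table inside the lookup keeps a cross-table
reference from ever being created, rather than detecting it afterwards.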

This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.

Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 34aae2c2 09-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate variable length element extension

Update template to validate variable length extensions. This patch adds
a new .ext_len[id] field to the template to store the expected extension
length. This is used to sanity check the initialization of the variable
length extension.

Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
update, there are two reasons why this might fail: either ENOMEM or
insufficient room in the extension field (EINVAL).

Kernels up until 7e6bc1f6cabc ("netfilter: nf_tables: stricter
validation of element data") allowed copying more data to the extension
than was allocated. This ext_len field allows validating that the
destination has the correct size as an additional check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 81ea0106 25-Jul-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: add rescheduling points during loop detection walks

Add explicit rescheduling points during ruleset walk.

Switching to a faster algorithm is possible but this is a much
smaller change, suitable for nf tree.

Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c39ba4de 05-Jul-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: replace BUG_ON by element length check

BUG_ON can be triggered from userspace with an element with a large
userdata area. Replace it by length check and return EINVAL instead.
Over time extensions have been growing in size.
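
A rough sketch of replacing an assertion with a recoverable length check.
The limit EXT_MAX_LEN and the helper ext_total_len() are hypothetical; the
assumption made here is that extension sizes are accumulated into a bounded
total, and that input exceeding the bound must be rejected with EINVAL
instead of crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EXT_MAX_LEN 255u /* hypothetical upper bound for the total length */

/*
 * Sum the extension lengths. Returns 0 and stores the total on success,
 * or -EINVAL when the total would exceed EXT_MAX_LEN. The comparison is
 * written so that the running sum itself can never overflow.
 */
int ext_total_len(const unsigned int *ext_len, size_t n, unsigned int *total)
{
	unsigned int sum = 0;
	size_t i;

	assert(ext_len != NULL && total != NULL);
	for (i = 0; i < n; i++) {
		if (ext_len[i] > EXT_MAX_LEN - sum)
			return -EINVAL;
		sum += ext_len[i];
	}
	*total = sum;
	return 0;
}
```

The internal assert() guards only programmer errors (NULL arguments); the
size limit is driven by user input and is therefore reported as an error
code.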

Pick a sufficiently old Fixes: tag to propagate this fix.

Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7e6bc1f6 01-Jul-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: stricter validation of element data

Make sure element data type and length do not mismatch the one specified
by the set declaration.

Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data")
Reported-by: Hugues ANGUELKOV <hanguelkov@randorisec.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3a41c64d 06-Jun-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bail out early if hardware offload is not supported

If the user requests NFT_CHAIN_HW_OFFLOAD, then check if either the device
provides the .ndo_setup_tc interface or there is an indirect flow block
that has been registered. Otherwise, bail out early from the preparation
phase. Moreover, validate that family == NFPROTO_NETDEV and hook is
NF_NETDEV_INGRESS.

Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9dd732e0 06-Jun-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: memleak flow rule from commit path

The abort path releases the flow rule object; the commit path does not.
Update the code to destroy these objects before releasing the transaction.

Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c271cc9f 05-Jun-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release new hooks on unsupported flowtable flags

Release the list of new hooks that are pending to be registered in case
that unsupported flowtable flags are provided.

Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2c9e4559 01-Jun-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: always initialize flowtable hook list in transaction

The hook list is used if nft_trans_flowtable_update(trans) == true. However,
initialize this list for other cases for safety reasons.

Fixes: 78d9f48f7f44 ("netfilter: nf_tables: add devices to existing flowtable")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b6d9014a 30-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: delete flowtable hooks via transaction list

Remove inactive bool field in nft_hook object that was introduced in
abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable").
Move stale flowtable hooks to transaction list instead.

Deleting twice the same device does not result in ENOENT.

Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ab5e5c06 01-Jun-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path

Use kfree_rcu(ptr, rcu) variant instead as described by ae089831ff28
("netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant").

Fixes: f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f9a43007 30-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: double hook unregistration in netns path

__nft_release_hooks() is called from pre_netns exit path which
unregisters the hooks, then the NETDEV_UNREGISTER event is triggered
which unregisters the hooks again.

[ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270
[...]
[ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27
[ 565.253682] Workqueue: netns cleanup_net
[ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270
[...]
[ 565.297120] Call Trace:
[ 565.300900] <TASK>
[ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables]
[ 565.308518] raw_notifier_call_chain+0x63/0x80
[ 565.312386] unregister_netdevice_many+0x54f/0xb50

Unregister and destroy the netdev hook from netns pre_exit via kfree_rcu
so the NETDEV_UNREGISTER path sees unregistered hooks.

Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3923b1e4 30-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: hold mutex on netns pre_exit path

clean_net() runs in workqueue while walking over the lists, grab mutex.

Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fecf31ee 27-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: sanitize nft_set_desc_concat_parse()

Add several sanity checks for nft_set_desc_concat_parse():

- validate desc->field_count not larger than desc->field_len array.
- field length cannot be larger than desc->field_len (ie. U8_MAX)
- total length of the concatenation cannot be larger than register array.

Joint work with Florian Westphal.

Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reported-by: <zhangziming.zzm@antgroup.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b53c1166 19-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set element extended ACK reporting support

Report the element that causes problems via netlink extended ACK for set
element commands.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 52077804 25-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disallow non-stateful expression in sets earlier

Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression
instantiation"), it is possible to attach stateful expressions to set
elements.

cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate
and destroy phase") introduces conditional destruction on the object to
accommodate transaction semantics.

nft_expr_init() calls expr->ops->init() first and only then checks for
NFT_STATEFUL_EXPR. This still allows initializing a non-stateful
lookup expression which points to a set, which might lead to UAF since
the set is not properly detached from the set->binding in this case.
Anyway, this combination is nonsense from the nf_tables perspective.

This patch fixes this problem by checking for NFT_STATEFUL_EXPR before
expr->ops->init() is called.
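
A minimal sketch of the reordering, assuming a reduced expression type
descriptor. EXPR_STATEFUL, struct expr_type, lookup_init(), counter_init()
and set_elem_expr_init() are hypothetical stand-ins; the side effect of
binding to a set is modelled by a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EXPR_STATEFUL 0x1u /* hypothetical flag value */

/* Hypothetical expression type descriptor. */
struct expr_type {
	unsigned int flags;
	int (*init)(int *bound);
};

/*
 * Stand-in for a lookup expression's init(): it has the side effect of
 * binding to a set, modelled here by incrementing a counter.
 */
int lookup_init(int *bound)
{
	assert(bound != NULL);
	(*bound)++;
	return 0;
}

/* Stand-in for a stateful expression's init(): no binding is created. */
int counter_init(int *bound)
{
	(void)bound;
	return 0;
}

/*
 * Check the flag first, so init() never runs for an expression that is
 * going to be rejected and no side effect needs to be undone.
 */
int set_elem_expr_init(const struct expr_type *type, int *bound)
{
	assert(type != NULL && type->init != NULL);
	if (!(type->flags & EXPR_STATEFUL))
		return -EOPNOTSUPP;
	return type->init(bound);
}
```

Validating before initialising means the error path has nothing to tear
down, which is simpler than trying to reverse a binding after the fact.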

The reporter provides a KASAN splat and a poc reproducer (similar to
those autogenerated by syzbot to report use-after-free errors). It is
unknown to me if they are using syzbot or if they use similar automated
tool to locate the bug that they are reporting.

For the record, this is the KASAN splat.

[ 85.431824] ==================================================================
[ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20
[ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776
[ 85.434756]
[ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2
[ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014

Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling")
Reported-and-tested-by: Aaron Adams <edg-e@nccgroup.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 10377d42 22-Mar-2022 Jakob Koschel <jakobkoschel@gmail.com>

netfilter: nf_tables: replace unnecessary use of list_for_each_entry_continue()

Since there is no way for list_for_each_entry_continue() to start
iterating in the middle of the list, these calls can be replaced with
list_for_each_entry().

In preparation to limit the scope of the list iterator to the list
traversal loop, the list iterator variable 'rule' should not be used
past the loop.

v1->v2:
- also replace first usage of list_for_each_entry_continue() (Florian
Westphal)

Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9e539c5b 18-May-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disable expression reduction infra

Either userspace or kernelspace needs to pre-fetch keys unconditionally
before comparisons for this to work. Otherwise, register tracking data
is misleading and it might result in reducing expressions which are not
yet registers.

First expression is also guaranteed to be evaluated always, however,
certain expressions break before writing data to registers, before
comparing the data, leaving the register in undetermined state.

This patch disables this infrastructure for now.

Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6c6f9f31 12-Apr-2022 Antoine Tenart <atenart@kernel.org>

netfilter: nf_tables: nft_parse_register can return a negative value

Since commit 6e1acfa387b9 ("netfilter: nf_tables: validate registers
coming from userspace.") nft_parse_register can return a negative value,
but the function prototype is still returning an unsigned int.
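
The effect of the mismatched prototype can be shown with a small sketch.
REG_MAX and the three functions below are hypothetical; they only model a
parser that signals failure with a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define REG_MAX 20u /* hypothetical highest valid register number */

/*
 * Old-style prototype: the negative errno is converted to a very large
 * unsigned value, so a caller testing "< 0" can never see the error.
 */
unsigned int parse_register_unsigned(uint32_t reg)
{
	if (reg > REG_MAX)
		return (unsigned int)-ERANGE;
	return reg;
}

/* Signed prototype: the error is preserved as a negative value. */
int parse_register_signed(uint32_t reg)
{
	if (reg > REG_MAX)
		return -ERANGE;
	return (int)reg;
}

/* Caller pattern that only works with the signed prototype. */
int load_register(uint32_t reg, uint8_t *dreg)
{
	int err = parse_register_signed(reg);

	if (err < 0)
		return err;
	assert(err <= UINT8_MAX);
	*dreg = (uint8_t)err;
	return 0;
}
```

With the unsigned variant, an out-of-range register comes back as a huge
"register number" instead of an error, which is why the return type has to
be signed.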

Fixes: 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 42193ffd 01-Apr-2022 Vasily Averin <vasily.averin@linux.dev>

netfilter: nf_tables: memcg accounting for dynamically allocated objects

Expressions in nft_*.c files that set the NFT_EXPR_STATEFUL flag need to
use the __GFP_ACCOUNT flag for objects that are dynamically
allocated from the packet path.

Such objects are allocated inside nft_expr_ops->init() callbacks
executed in task context while processing netlink messages.

In addition, this patch adds accounting to nft_set_elem_expr_clone()
used for the same purposes.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 33758c89 24-Mar-2022 Vasily Averin <vasily.averin@linux.dev>

memcg: enable accounting for nft objects

nftables replaces iptables, but it lacks memcg accounting.

This patch accounts for most of the memory allocations associated with nft
and should protect the host from misuse of nft inside a memcg restricted
container.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 34cc9e52 14-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: cancel tracking for clobbered destination registers

The output of an expression might be larger than one single register; this
might clobber existing data. Reset tracking for all destination registers
that are required to store the expression output.

This patch adds three new helper functions:

- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.

Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of registers that are used.

This patch updates the following expressions:

- meta_bridge
- bitwise
- byteorder
- meta
- payload

to use these helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b2d30654 14-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not reduce read-only expressions

Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.

This patch re-enables register tracking which was disabled in ed5f85d42290
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6e1acfa3 17-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate registers coming from userspace.

Bail out in case userspace uses unsupported registers.

Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f1082dd3 16-Feb-2022 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Reject tables of unsupported family

An nftables family is merely a hollow container, its family just a
number and as such not reliant on compile-time options other than nftables
support itself. Add an artificial check so attempts at using a family
the kernel can't support fail as early as possible. This helps user
space detect kernels which lack e.g. NFPROTO_INET.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ed5f85d4 12-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disable register tracking

The register tracking infrastructure is incomplete and might lead to
generating incorrect ruleset bytecode. Disable it for now, given we are
late in the release process.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ae089831 22-Feb-2022 Eric Dumazet <edumazet@google.com>

netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant

While kfree_rcu(ptr) _is_ supported, it has some limitations.

Given that 99.99% of kfree_rcu() users [1] use the legacy
two parameters variant, and @catchall objects do have an rcu head,
simply use it.

Choice of kfree_rcu(ptr) variant was probably not intentional.

[1] including calls from net/netfilter/nf_tables_api.c

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dad3bdee 21-Feb-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix memory leak during stateful obj update

stateful objects can be updated from the control plane.
The transaction logic allocates a temporary object for this purpose.

The ->init function was called for this object, so plain kfree() leaks
resources. We must call ->destroy function of the object.

nft_obj_destroy does this, but it also decrements the module refcount,
but the update path doesn't increment it.

To avoid special-casing the update object release, do module_get for
the update case too and release it via nft_obj_destroy().

Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6069da44 17-Feb-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: unregister flowtable hooks on netns exit

Unregister flowtable hooks before they are released via
nf_tables_flowtable_destroy(), otherwise the hook core reports UAF.

BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142
Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666

CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
print_address_description+0x65/0x380 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report+0x19a/0x1f0 mm/kasan/report.c:450
nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142
__nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429
nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571
nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232
nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652

__nft_release_hook() calls nft_unregister_flowtable_net_hooks() which
only unregisters the hooks, then after RCU grace period, it is
guaranteed that no packets add new entries to the flowtable (no flow
offload rules and flowtable hooks are reachable from packet path), so it
is safe to call nf_flow_table_free() which cleans up the remaining
entries from the flowtable (both software and hardware) and it unbinds
the flow_block.

Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b07f4137 27-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove assignment with no effect in chain blob builder

cppcheck possible warnings:

>> net/netfilter/nf_tables_api.c:2014:2: warning: Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? [uselessAssignmentPtrArg]
ptr += offsetof(struct nft_rule_dp, data);
^

Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fe75e84a 11-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set last expression in register tracking area

nft_rule_for_each_expr() sets last to nft_rule_last(); however, this
happens after the track.last field has been set.

Use nft_expr_last() to set track.last accordingly.

Fixes: 12e4ecfa244b ("netfilter: nf_tables: add register tracking infrastructure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cf46eacb 11-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove unused variable

Remove unused variable and fix missing initialization.

>> net/netfilter/nf_tables_api.c:8266:6: warning: variable 'i' set but not used [-Wunused-but-set-variable]
int i;
^

Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 63045bfd 10-Jan-2022 Linus Torvalds <torvalds@linux-foundation.org>

netfilter: nf_tables: don't use 'data_size' uninitialized

Commit 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") never
initialized the new 'data_size' variable.

I'm not sure how it ever worked, but it might have worked almost by
accident - gcc seems to occasionally miss these kinds of 'variable used
uninitialized' situations, but I've seen it do so because it ended up
zero-initializing them due to some other simplification.

But clang is very unhappy about it all, and correctly reports

net/netfilter/nf_tables_api.c:8278:4: error: variable 'data_size' is uninitialized when used here [-Werror,-Wuninitialized]
data_size += sizeof(*prule) + rule->dlen;
^~~~~~~~~
net/netfilter/nf_tables_api.c:8263:30: note: initialize the variable 'data_size' to silence this warning
unsigned int size, data_size;
^
= 0
1 error generated.

and this fix just initializes 'data_size' to zero before the loop.

Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 12e4ecfa 09-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add register tracking infrastructure

This patch adds new infrastructure to skip redundant selector store
operations on the same register to achieve a performance boost from
the packet path.

This is particularly noticeable in pure linear rulesets but it also
helps in rulesets which are already heavily relying on maps to avoid
linear ruleset inspection.

The idea is to keep data of the most recurrent store operations on
register to reuse them with cmp and lookup expressions.

This infrastructure allows for dynamic ruleset updates since the ruleset
blob reduction happens from the kernel.

Userspace still needs to be updated to maximize register utilization to
cooperate to improve register data reuse / reduce number of store on
register operations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2c865a8a 09-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add rule blob layout

This patch adds a blob layout per chain to represent the ruleset in the
packet datapath.

size (unsigned long)
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp (is_last=1)

The new structure nft_rule_dp represents the rule in a more compact way
(smaller memory footprint) compared to the control-plane nft_rule
structure.

The ruleset blob is a read-only data structure. The first field contains
the blob size, then the rules containing expressions. There is a trailing
rule which is used by the tracing infrastructure which is equivalent to
the NULL rule marker in the previous representation. The blob size field
does not include the size of this trailing rule marker.
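
The layout described above can be sketched in userspace as follows. The
structures are simplified, hypothetical stand-ins (struct rule_dp here is
not the kernel's struct nft_rule_dp), and expression data is treated as
opaque bytes. Note that the size accumulator starts at zero before the
loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compact rule header. */
struct rule_dp {
	uint32_t is_last; /* 1 only for the trailing marker */
	uint32_t dlen;    /* bytes of expression data that follow */
};

/* Hypothetical control-plane view of a rule. */
struct rule_in {
	const void *exprs;
	uint32_t dlen;
};

/* Blob: size field first, then rules, then the trailing marker. */
struct blob {
	unsigned long size; /* excludes the trailing marker */
	unsigned char data[];
};

struct blob *blob_build(const struct rule_in *rules, size_t n)
{
	unsigned long data_size = 0; /* must be zero before summing */
	struct rule_dp hdr;
	struct blob *b;
	unsigned char *p;
	size_t i;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++)
		data_size += sizeof(hdr) + rules[i].dlen;

	/* Reserve room for the trailing marker on top of data_size. */
	b = malloc(sizeof(*b) + data_size + sizeof(hdr));
	if (b == NULL)
		return NULL;
	b->size = data_size;

	p = b->data;
	for (i = 0; i < n; i++) {
		hdr.is_last = 0;
		hdr.dlen = rules[i].dlen;
		memcpy(p, &hdr, sizeof(hdr));
		p += sizeof(hdr);
		if (rules[i].dlen > 0)
			memcpy(p, rules[i].exprs, rules[i].dlen);
		p += rules[i].dlen;
	}
	hdr.is_last = 1;
	hdr.dlen = 0;
	memcpy(p, &hdr, sizeof(hdr));
	return b;
}

/* Walk the blob until the trailing marker and count the rules. */
size_t blob_count_rules(const struct blob *b)
{
	const unsigned char *p;
	struct rule_dp hdr;
	size_t n = 0;

	assert(b != NULL);
	p = b->data;
	for (;;) {
		memcpy(&hdr, p, sizeof(hdr));
		if (hdr.is_last)
			break;
		p += sizeof(hdr) + hdr.dlen;
		n++;
	}
	return n;
}
```

Headers are copied with memcpy() because expression lengths in this sketch
are arbitrary, so a header may not be naturally aligned inside the byte
array.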

The ruleset blob is generated from the commit path.

This patch reuses the infrastructure available since 0cbc06b3faba
("netfilter: nf_tables: remove synchronize_rcu in commit phase") to
build the array of rules per chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0f7d9b31 13-Dec-2021 Eric Dumazet <edumazet@google.com>

netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy()

We need to use list_for_each_entry_safe() iterator
because we can not access @catchall after kfree_rcu() call.
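
The safe-iteration pattern can be sketched on a plain singly linked list.
This is an illustration only: struct node, push() and destroy_all() are
hypothetical, and free() stands in for kfree_rcu(). The point is that the
next pointer is read before the current node is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node. */
struct node {
	struct node *next;
	int value;
};

/* Prepend a node and return the new head. */
struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->next = head;
	n->value = value;
	return n;
}

/*
 * Free every node and return how many were freed. The successor is
 * saved first, because cur must not be touched once it is released.
 */
unsigned int destroy_all(struct node *head)
{
	struct node *cur, *next;
	unsigned int freed = 0;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next; /* read before the node is released */
		free(cur);
		freed++;
	}
	return freed;
}
```

A loop that advances with "cur = cur->next" after releasing cur reads
freed memory on every step, which is the pattern the _safe iterator
avoids.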

syzbot reported:

BUG: KASAN: use-after-free in nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline]
BUG: KASAN: use-after-free in nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
BUG: KASAN: use-after-free in nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493
Read of size 8 at addr ffff8880716e5b80 by task syz-executor.3/8871

CPU: 1 PID: 8871 Comm: syz-executor.3 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x2ed mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline]
nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493
__nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626
nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
blocking_notifier_call_chain kernel/notifier.c:318 [inline]
blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306
netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788
__sock_release+0xcd/0x280 net/socket.c:649
sock_close+0x18/0x20 net/socket.c:1314
__fput+0x286/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f75fbf28adb
Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44
RSP: 002b:00007ffd8da7ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f75fbf28adb
RDX: 00007f75fc08e828 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 00007f75fc08a960 R08: 0000000000000000 R09: 00007f75fc08e830
R10: 00007ffd8da7ed10 R11: 0000000000000293 R12: 00000000002067c3
R13: 00007ffd8da7ed10 R14: 00007f75fc088f60 R15: 0000000000000032
</TASK>

Allocated by task 8886:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
____kasan_kmalloc mm/kasan/common.c:513 [inline]
____kasan_kmalloc mm/kasan/common.c:472 [inline]
__kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:522
kasan_kmalloc include/linux/kasan.h:269 [inline]
kmem_cache_alloc_trace+0x1ea/0x4a0 mm/slab.c:3575
kmalloc include/linux/slab.h:590 [inline]
nft_setelem_catchall_insert net/netfilter/nf_tables_api.c:5544 [inline]
nft_setelem_insert net/netfilter/nf_tables_api.c:5562 [inline]
nft_add_set_elem+0x232e/0x2f40 net/netfilter/nf_tables_api.c:5936
nf_tables_newsetelem+0x6ff/0xbb0 net/netfilter/nf_tables_api.c:6032
nfnetlink_rcv_batch+0x1710/0x25f0 net/netfilter/nfnetlink.c:513
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 15335:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xd1/0x110 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:235 [inline]
__cache_free mm/slab.c:3445 [inline]
kmem_cache_free_bulk+0x67/0x1e0 mm/slab.c:3766
kfree_bulk include/linux/slab.h:446 [inline]
kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3273
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Last potentially related work creation:
kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
__kasan_record_aux_stack+0xb5/0xe0 mm/kasan/generic.c:348
kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3550
nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4489 [inline]
nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline]
nft_set_destroy+0x34a/0x4f0 net/netfilter/nf_tables_api.c:4493
__nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626
nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
blocking_notifier_call_chain kernel/notifier.c:318 [inline]
blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306
netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788
__sock_release+0xcd/0x280 net/socket.c:649
sock_close+0x18/0x20 net/socket.c:1314
__fput+0x286/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8880716e5b80
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
64-byte region [ffff8880716e5b80, ffff8880716e5bc0)
The buggy address belongs to the page:
page:ffffea0001c5b940 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880716e5c00 pfn:0x716e5
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffffea0000911848 ffffea00007c4d48 ffff888010c40200
raw: ffff8880716e5c00 ffff8880716e5000 000000010000001e 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x242040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 3638, ts 211086074437, free_ts 211031029429
prep_new_page mm/page_alloc.c:2418 [inline]
get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
__alloc_pages_node include/linux/gfp.h:570 [inline]
kmem_getpages mm/slab.c:1377 [inline]
cache_grow_begin+0x75/0x470 mm/slab.c:2593
cache_alloc_refill+0x27f/0x380 mm/slab.c:2965
____cache_alloc mm/slab.c:3048 [inline]
____cache_alloc mm/slab.c:3031 [inline]
__do_cache_alloc mm/slab.c:3275 [inline]
slab_alloc mm/slab.c:3316 [inline]
__do_kmalloc mm/slab.c:3700 [inline]
__kmalloc+0x3b3/0x4d0 mm/slab.c:3711
kmalloc include/linux/slab.h:595 [inline]
kzalloc include/linux/slab.h:724 [inline]
tomoyo_get_name+0x234/0x480 security/tomoyo/memory.c:173
tomoyo_parse_name_union+0xbc/0x160 security/tomoyo/util.c:260
tomoyo_update_path_number_acl security/tomoyo/file.c:687 [inline]
tomoyo_write_file+0x629/0x7f0 security/tomoyo/file.c:1034
tomoyo_write_domain2+0x116/0x1d0 security/tomoyo/common.c:1152
tomoyo_add_entry security/tomoyo/common.c:2042 [inline]
tomoyo_supervisor+0xbc7/0xf00 security/tomoyo/common.c:2103
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x419/0x590 security/tomoyo/file.c:734
security_file_ioctl+0x50/0xb0 security/security.c:1541
__do_sys_ioctl fs/ioctl.c:868 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0xb3/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1338 [inline]
free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
free_unref_page_prepare mm/page_alloc.c:3309 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3388
slab_destroy mm/slab.c:1627 [inline]
slabs_destroy+0x89/0xc0 mm/slab.c:1647
cache_flusharray mm/slab.c:3418 [inline]
___cache_free+0x4cc/0x610 mm/slab.c:3480
qlink_free mm/kasan/quarantine.c:146 [inline]
qlist_free_all+0x4e/0x110 mm/kasan/quarantine.c:165
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
__kasan_slab_alloc+0x97/0xb0 mm/kasan/common.c:444
kasan_slab_alloc include/linux/kasan.h:259 [inline]
slab_post_alloc_hook mm/slab.h:519 [inline]
slab_alloc_node mm/slab.c:3261 [inline]
kmem_cache_alloc_node+0x2ea/0x590 mm/slab.c:3599
__alloc_skb+0x215/0x340 net/core/skbuff.c:414
alloc_skb include/linux/skbuff.h:1126 [inline]
nlmsg_new include/net/netlink.h:953 [inline]
rtmsg_ifinfo_build_skb+0x72/0x1a0 net/core/rtnetlink.c:3808
rtmsg_ifinfo_event net/core/rtnetlink.c:3844 [inline]
rtmsg_ifinfo_event net/core/rtnetlink.c:3835 [inline]
rtmsg_ifinfo+0x83/0x120 net/core/rtnetlink.c:3853
netdev_state_change net/core/dev.c:1395 [inline]
netdev_state_change+0x114/0x130 net/core/dev.c:1386
linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:167
__linkwatch_run_queue+0x233/0x6a0 net/core/link_watch.c:213
linkwatch_event+0x4a/0x60 net/core/link_watch.c:252
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298

Memory state around the buggy address:
ffff8880716e5a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8880716e5b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
>ffff8880716e5b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff8880716e5c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8880716e5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6fb721cf 26-Sep-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification

Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event
notifications, otherwise userspace cannot distinguish between create and
add commands.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2c964c55 24-Sep-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reverse order in rule replacement expansion

Deactivate the old rule first, then append the new rule, so that the rule
replacement notification via netlink first reports the deletion of the
old rule with handle X, and then the addition of the new rule (which
reuses handle X of the replaced rule).

Note that the abort path releases the transaction that has been created
by nft_delrule() on error.

Fixes: ca08987885a1 ("netfilter: nf_tables: deactivate expressions in rule replecement routine")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e189ae16 20-Sep-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add position handle in event notification

Add the position handle so that the rule location can be identified from netlink
events. Otherwise, userspace cannot incrementally update a userspace
cache through monitoring events.

Skip handle dump if the rule has been either inserted (at the beginning
of the ruleset) or appended (at the end of the ruleset), the
NLM_F_APPEND netlink flag is sufficient in these two cases.

Handle NLM_F_REPLACE as NLM_F_APPEND since the rule replacement
expansion appends it after the specified rule handle.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 45928afe 13-Sep-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Fix oversized kvmalloc() calls

The commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
limits the max allocatable memory via kvmalloc() to INT_MAX.
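
A minimal userspace sketch of the kind of guard this implies, not the
kernel patch itself. checked_alloc() is a hypothetical helper and the
choice of -ENOMEM is an assumption made for illustration; the point is
only that a request above INT_MAX is rejected before it reaches the
allocator:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): allocate `size` bytes, but
 * refuse anything above INT_MAX up front. Returns 0 and stores the
 * zeroed buffer in *out on success, or a negative errno value on
 * failure (in which case *out is NULL).
 */
static int checked_alloc(uint64_t size, void **out)
{
	assert(out != NULL);
	*out = NULL;

	if (size > INT_MAX)
		return -ENOMEM;	/* assumed error code */

	*out = calloc(1, (size_t)size);
	return *out ? 0 : -ENOMEM;
}
```

With a size of 64 the helper returns 0 and a usable buffer; with
(uint64_t)INT_MAX + 1 it returns -ENOMEM without calling the allocator.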

Reported-by: syzbot+cd43695a64bcd21b8596@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a499b03b 13-Sep-2021 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: unlink table before deleting it

syzbot reports following UAF:
BUG: KASAN: use-after-free in memcmp+0x18f/0x1c0 lib/string.c:955
nla_strcmp+0xf2/0x130 lib/nlattr.c:836
nft_table_lookup.part.0+0x1a2/0x460 net/netfilter/nf_tables_api.c:570
nft_table_lookup net/netfilter/nf_tables_api.c:4064 [inline]
nf_tables_getset+0x1b3/0x860 net/netfilter/nf_tables_api.c:4064
nfnetlink_rcv_msg+0x659/0x13f0 net/netfilter/nfnetlink.c:285
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504

Problem is that all get operations are lockless, so the commit_mutex
held by nft_rcv_nl_event() isn't enough to stop a parallel GET request
from doing read-accesses to the table object even after synchronize_rcu().

To avoid this, unlink the table first and store the table objects in
on-stack scratch space.
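
The unlink-then-release ordering can be sketched in userspace C as
follows. This is an illustration under simplifying assumptions, not the
kernel code: struct toy_table, the owner field and all toy_* functions
are hypothetical, and a plain singly linked list stands in for the RCU
protected table list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a table object on a singly linked list. */
struct toy_table {
	struct toy_table *next;
	unsigned int owner;	/* hypothetical owner id */
	char name[16];
};

/* Reader-side lookup: only sees tables that are still on the list. */
static struct toy_table *toy_lookup(struct toy_table *head, const char *name)
{
	for (; head; head = head->next)
		if (strcmp(head->name, name) == 0)
			return head;
	return NULL;
}

/*
 * Phase 1: unlink every table owned by `owner` and park the pointers
 * in caller-provided scratch space. Nothing is freed yet. Returns the
 * number of tables unlinked (at most `max`).
 */
static size_t toy_unlink_owned(struct toy_table **head, unsigned int owner,
			       struct toy_table **scratch, size_t max)
{
	size_t n = 0;

	assert(head != NULL && scratch != NULL);
	while (*head && n < max) {
		struct toy_table *t = *head;

		if (t->owner == owner) {
			*head = t->next;	/* no longer reachable */
			scratch[n++] = t;
		} else {
			head = &t->next;
		}
	}
	return n;
}

/* Phase 2: free the parked tables once readers can no longer hold them. */
static void toy_release(struct toy_table **scratch, size_t n)
{
	while (n--)
		free(scratch[n]);
}
```

In the kernel the wait for readers (synchronize_rcu()) would sit between
the two phases; the sketch only shows that a lookup cannot find a table
once phase 1 has run, so phase 2 never frees something a new reader can
still reach.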

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-and-tested-by: syzbot+f31660cf279b0557160c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cfbe3650 13-Jul-2021 Dongliang Mu <mudongliangabcd@gmail.com>

netfilter: nf_tables: fix audit memory leak in nf_tables_commit

In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not
free the adp variable.

Fix this by adding nf_tables_commit_audit_free which frees
the linked list with the head node adl.
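
A small userspace sketch of the pattern, assuming a toy record type;
struct toy_audit and the toy_* functions are hypothetical and the
fail_at parameter exists only to simulate an allocation failure:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy per-table audit record kept on a singly linked list. */
struct toy_audit {
	struct toy_audit *next;
	int table_id;
};

/* Free every record on the list and reset the head to NULL. */
static void toy_audit_free(struct toy_audit **head)
{
	assert(head != NULL);
	while (*head) {
		struct toy_audit *rec = *head;

		*head = rec->next;
		free(rec);
	}
}

/*
 * Add one record per table id. `fail_at` simulates an allocation
 * failure at that index (pass -1 for no failure). On failure the
 * records collected so far are released before returning -1, which is
 * the step whose absence caused the leak.
 */
static int toy_audit_collect(struct toy_audit **head, const int *ids, int n,
			     int fail_at)
{
	for (int i = 0; i < n; i++) {
		struct toy_audit *rec = NULL;

		if (i != fail_at)
			rec = malloc(sizeof(*rec));
		if (!rec) {
			toy_audit_free(head);
			return -1;
		}
		rec->table_id = ids[i];
		rec->next = *head;
		*head = rec;
	}
	return 0;
}
```

After a simulated failure the head is NULL again, so the caller has
nothing left to clean up.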

backtrace:
kmalloc include/linux/slab.h:591 [inline]
kzalloc include/linux/slab.h:721 [inline]
nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline]
nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508
nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:702 [inline]
sock_sendmsg+0x56/0x80 net/socket.c:722

Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4ca041f9 24-Jun-2021 Colin Ian King <colin.king@canonical.com>

netfilter: nf_tables: Fix dereference of null pointer flow

In the case where chain->flags & NFT_CHAIN_HW_OFFLOAD is false then
nft_flow_rule_create is not called and flow is NULL. The subsequent
error handling execution via label err_destroy_flow_rule will lead
to a null pointer dereference on flow when calling nft_flow_rule_destroy.
Since the error path to err_destroy_flow_rule has to cater for null
and non-null flows, only call nft_flow_rule_destroy if flow is non-null
to fix this issue.
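
The shape of the fix can be sketched in userspace C. Everything here is
a hypothetical stand-in (struct toy_flow, toy_new_rule(), the label
name, the counter used for checking); only the guarded call on the
shared error path mirrors the description:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_flow { int nactions; };

static int toy_flow_destroyed;	/* counts destroy calls, for the demo */

/* Like the destroy routine in the text, this dereferences its argument. */
static void toy_flow_destroy(struct toy_flow *flow)
{
	assert(flow != NULL);
	flow->nactions = 0;
	free(flow);
	toy_flow_destroyed++;
}

/*
 * `offload` mirrors the chain flag test: the flow is only created when
 * it is set. `fail` forces the shared error path, which has to cope
 * with both a NULL and a non-NULL flow.
 */
static int toy_new_rule(bool offload, bool fail)
{
	struct toy_flow *flow = NULL;

	if (offload) {
		flow = calloc(1, sizeof(*flow));
		if (!flow)
			return -1;
	}

	if (fail)
		goto err_destroy_flow;

	free(flow);	/* demo only: a real caller would keep the flow */
	return 0;

err_destroy_flow:
	if (flow)	/* the guard the fix adds */
		toy_flow_destroy(flow);
	return -1;
}
```

Without the guard, toy_new_rule(false, true) would pass NULL to the
destroy routine.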

Addresses-Coverity: ("Explicity null dereference")
Fixes: 3c5e44622011 ("netfilter: nf_tables: memleak in hw offload abort path")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e31f072f 21-Jun-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not allow to delete table with owner by handle

nft_table_lookup_byhandle() also needs to validate the netlink PortID
owner when deleting a table by handle.

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 53479909 22-Jun-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip netlink portID validation if zero

nft_table_lookup() allows us to obtain the table object by the name and
the family. The netlink portID validation needs to be skipped for the
dump path, since the ownership only applies to commands to update the
given table. Skip validation if the specified netlink PortID is zero
when calling nft_table_lookup().
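
A simplified sketch of a lookup with this behaviour, written as
userspace C. The struct layout, the TOY_TABLE_F_OWNER value and
toy_table_lookup() are assumptions for illustration and do not claim to
match the kernel definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TABLE_F_OWNER 0x2	/* hypothetical "has owner" flag value */

struct toy_table {
	const char *name;
	uint32_t flags;
	uint32_t nlpid;		/* owner's netlink portID */
};

/*
 * Look up a table by name. Returns 0 and sets *out on success, -ENOENT
 * if there is no such table, -EPERM if the table is owned by a
 * different portID. A zero `nlpid` means "skip the ownership check",
 * as on the dump path.
 */
static int toy_table_lookup(struct toy_table *tables, size_t n,
			    const char *name, uint32_t nlpid,
			    struct toy_table **out)
{
	assert(out != NULL);
	*out = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_table *t = &tables[i];

		if (strcmp(t->name, name) != 0)
			continue;
		if ((t->flags & TOY_TABLE_F_OWNER) && nlpid &&
		    t->nlpid != nlpid)
			return -EPERM;
		*out = t;
		return 0;
	}
	return -ENOENT;
}
```

A dump-style caller passes 0 and always gets the table; an update-style
caller passes its own portID and is refused if another process owns the
table.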

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3c5e4462 18-Jun-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: memleak in hw offload abort path

Release flow from the abort path, this is easy to reproduce since
b72920f6e4a9 ("netfilter: nftables: counter hardware offload support").
If the preparation phase fails, then the abort path is exercised without
releasing the flow rule object.

unreferenced object 0xffff8881f0fa7700 (size 128):
comm "nft", pid 1335, jiffies 4294931120 (age 4163.740s)
hex dump (first 32 bytes):
08 e4 de 13 82 88 ff ff 98 e4 de 13 82 88 ff ff ................
48 e4 de 13 82 88 ff ff 01 00 00 00 00 00 00 00 H...............
backtrace:
[<00000000634547e7>] flow_rule_alloc+0x26/0x80
[<00000000c8426156>] nft_flow_rule_create+0xc9/0x3f0 [nf_tables]
[<0000000075ff8e46>] nf_tables_newrule+0xc79/0x10a0 [nf_tables]
[<00000000ba65e40e>] nfnetlink_rcv_batch+0xaac/0xf90 [nfnetlink]
[<00000000505c614a>] nfnetlink_rcv+0x1bb/0x1f0 [nfnetlink]
[<00000000eb78e1fe>] netlink_unicast+0x34b/0x480
[<00000000a8f72c94>] netlink_sendmsg+0x3af/0x690
[<000000009cb1ddf4>] sock_sendmsg+0x96/0xa0
[<0000000039d06e44>] ____sys_sendmsg+0x3fe/0x440
[<00000000137e82ca>] ___sys_sendmsg+0xd8/0x140
[<000000000c6bf6a6>] __sys_sendmsg+0xb3/0x130
[<0000000043bd6268>] do_syscall_64+0x40/0xb0
[<00000000afdebc2d>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Remove flow rule release from the offload commit path, otherwise error
from the offload commit phase might trigger a double-free due to the
execution of the abort_offload -> abort. After this patch, the abort
path takes care of releasing the flow rule.

This fix also needs to move the nft_flow_rule_create() call before the
transaction object is added otherwise the abort path might find a NULL
pointer to the flow rule object for the NFT_CHAIN_HW_OFFLOAD case.

While at it, rename BASIC-like goto tags to slightly more meaningful
names rather than adding a new "err3" tag.

Fixes: 63b48c73ff56 ("netfilter: nf_tables_offload: undo updates if transaction fails")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c5c6accd 08-Jun-2021 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: move base hook annotation to init helper

coverity scanner says:
2187 if (nft_is_base_chain(chain)) {
vvv CID 1505166: Memory - corruptions (UNINIT)
vvv Using uninitialized value "basechain".
2188 basechain->ops.hook_ops_type = NF_HOOK_OP_NF_TABLES;

... I don't see how nft_is_base_chain() can evaluate to true
while basechain pointer is garbage.

However, it seems better to place the NF_HOOK_OP_NF_TABLES annotation
in nft_basechain_hook_init() instead.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505166 ("Memory - corruptions")
Fixes: 65b8b7bfc5284f ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7b4b2fa3 03-Jun-2021 Florian Westphal <fw@strlen.de>

netfilter: annotate nf_tables base hook ops

This will allow a followup patch to treat the 'ops->priv' pointer
as nft_chain argument without having to first walk the table/chains
to check if there is a matching base chain pointer.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 67086651 30-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove nft_ctx_init_from_setattr()

Replace nft_ctx_init_from_setattr() by nft_table_lookup().

This patch also disentangles nf_tables_delset() where NFTA_SET_TABLE is
required while nft_ctx_init_from_setattr() allows it to be optional.

From the nf_tables_delset() path, this also allows to set up the context
structure when it is needed.

Removing this helper function saves us 14 LoC, so it is not helping to
consolidate code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e2b750d7 30-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove nft_ctx_init_from_elemattr()

Replace nft_ctx_init_from_elemattr() by nft_table_lookup() and set up
the context structure right before it is really needed.

Moreover, nft_ctx_init_from_elemattr() is setting up the context
structure for codepaths where this is not really needed at all.

This helper function is also not helping to consolidate code, removing
it saves us 4 LoC.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ef4b65e5 30-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: add struct nfgenmsg to struct nfnl_info and use it

Update the nfnl_info structure to add a pointer to the nfnetlink header.
This simplifies the existing codebase since this header is usually
accessed. Update existing clients to use this new field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ad9f151e 03-Jun-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: initialize set before expression setup

nft_set_elem_expr_alloc() needs an initialized set if the expression sets
the NFT_EXPR_GC flag. Move the set fields initialization before expression
setup.

[4512935.019450] ==================================================================
[4512935.019456] BUG: KASAN: null-ptr-deref in nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019487] Read of size 8 at addr 0000000000000070 by task nft/23532
[4512935.019494] CPU: 1 PID: 23532 Comm: nft Not tainted 5.12.0-rc4+ #48
[...]
[4512935.019502] Call Trace:
[4512935.019505] dump_stack+0x89/0xb4
[4512935.019512] ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019536] ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019560] kasan_report.cold.12+0x5f/0xd8
[4512935.019566] ? nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019590] nft_set_elem_expr_alloc+0x84/0xd0 [nf_tables]
[4512935.019615] nf_tables_newset+0xc7f/0x1460 [nf_tables]

Reported-by: syzbot+ce96ca2b1d0b37c6422d@syzkaller.appspotmail.com
Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 179d9ba5 24-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix table flag updates

The dormant flag needs to be updated from the preparation phase,
otherwise, two consecutive requests to dorm a table in the same batch
might try to remove the same hooks twice, resulting in the following
warning:

hook not found, pf 3 num 0
WARNING: CPU: 0 PID: 334 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
Modules linked in:
CPU: 0 PID: 334 Comm: kworker/u4:5 Not tainted 5.12.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:__nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480

This patch is a partial revert of 0ce7cf4127f1 ("netfilter: nftables:
update table flags from the commit phase") to restore the previous
behaviour.

However, there is still another problem: a batch containing a
dorm-wakeup-dorm sequence on a table (or vice versa) also triggers the
warning above, since hook unregistration happens from the preparation
phase, while hook registration occurs from the commit phase.

To fix this problem, this patch adds two internal flags to annotate the
original dormant flag status which are __NFT_TABLE_F_WAS_DORMANT and
__NFT_TABLE_F_WAS_AWAKEN, to restore it from the abort path.

The __NFT_TABLE_F_UPDATE bitmask allows handling the dormant flag update
with a single transaction.

Reported-by: syzbot+7ad5cd1615f2d89c6e7e@syzkaller.appspotmail.com
Fixes: 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 983c4fcb 19-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: extended netlink error reporting for chain type

Users that forget to select the NAT chain type in netfilter's Kconfig
hit ENOENT when adding the basechain.

This report is however sparse since it might be the table, the chain
or the kernel module that is missing/does not exist.

This patch provides extended netlink error reporting for the
NFTA_CHAIN_TYPE netlink attribute, which conveys the basechain type.
If the user selects a basechain that his custom kernel does not support,
the netlink extended error provides a more accurate hint on the
described issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c781471d 19-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: missing error reporting for not selected expressions

Sometimes users forget to turn on nftables extensions from Kconfig that
they need. In such case, the error reporting from userspace is
misleading:

$ sudo nft add rule x y counter
Error: Could not process rule: No such file or directory
add rule x y counter
^^^^^^^^^^^^^^^^^^^^

Add missing NL_SET_BAD_ATTR() to provide a hint:

$ nft add rule x y counter
Error: Could not process rule: No such file or directory
add rule x y counter
^^^^^^^

Fixes: 83d9dcba06c5 ("netfilter: nf_tables: extended netlink error reporting for expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6c8774a9 06-May-2021 Eric Dumazet <edumazet@google.com>

netfilter: nftables: avoid potential overflows on 32bit arches

User space could ask for very large hash tables, we need to make sure
our size computations won't overflow.

nf_tables_newset() needs to double check the u64 size
will fit into size_t field.
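
A minimal sketch of such a check in userspace C. toy_total_size() and
its parameters are hypothetical; the patch itself may perform the test
differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add a 64-bit private-data size to a fixed
 * header size and report whether the total still fits in size_t.
 * On a 32-bit build SIZE_MAX is about 4 GiB, so large requests are
 * rejected here instead of being silently truncated.
 */
static bool toy_total_size(uint64_t priv_size, size_t header, size_t *total)
{
	assert(total != NULL);

	if (priv_size > SIZE_MAX - header)
		return false;	/* would overflow or not fit */

	*total = header + (size_t)priv_size;
	return true;
}
```

On failure *total is left untouched, so a caller cannot accidentally use
a truncated value.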

Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 85dfd816 05-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: Fix a memleak from userdata error path in new objects

Release object name if userdata allocation fails.

Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# aaa31047 27-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add catch-all set element support

This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.

This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.
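
The lookup fallback can be sketched as follows. This is a simplified
userspace model, not the kernel data structures: struct toy_set and
toy_set_lookup() are hypothetical, a linear scan stands in for the set
backend, and the catchall list is collapsed to a single optional
element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_elem { int key; int value; };

struct toy_set {
	const struct toy_elem *elems;	/* regular elements */
	size_t nelems;
	bool has_catchall;
	int catchall_value;		/* the catch-all carries no key */
};

/*
 * Returns true and stores the mapped value if `key` matches a regular
 * element; otherwise falls back to the catch-all element when one is
 * present. Returns false only when both lookups miss.
 */
static bool toy_set_lookup(const struct toy_set *set, int key, int *value)
{
	assert(set != NULL && value != NULL);

	for (size_t i = 0; i < set->nelems; i++) {
		if (set->elems[i].key == key) {
			*value = set->elems[i].value;
			return true;
		}
	}
	if (set->has_catchall) {
		*value = set->catchall_value;
		return true;
	}
	return false;
}
```

A key that matches a regular element never reaches the catch-all, so the
catch-all acts purely as a default.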

The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same as the set
element type to reuse the existing set element codebase.

The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.

The check for valid set element flags has been updated to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 97c976d6 27-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add helper function to validate set element data

When binding sets to rule, validate set element data according to
set definition. This patch adds a helper function to be reused by
the catch-all set element support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e6ba7cb6 27-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add helper function to flush set elements

This patch adds nft_set_flush() which prepares for the catch-all
element support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6387aa6e 27-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add loop check helper function

This patch adds nft_check_loops() to reuse it in the new catch-all
element codebase.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f8bb7889 27-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: rename set element data activation/deactivation functions

Rename:

- nft_set_elem_activate() to nft_set_elem_data_activate().
- nft_set_elem_deactivate() to nft_set_elem_data_deactivate().

To prepare for updates in the set element infrastructure to add support
for the special catch-all element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 50f2db9e 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: consolidate callback types

Add enum nfnl_callback_type to identify the callback type to provide one
single callback.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7dab8ee3 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass struct nfnl_info to batch callbacks

Update batch callbacks to use the nfnl_info structure. Rename one
clashing info variable to expr_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 797d4980 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass struct nfnl_info to rcu callbacks

Update rcu callbacks to use the nfnl_info structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d59d2f82 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add nft_pernet() helper function

Consolidate call to net_generic(net, nf_tables_net_id) in this
wrapper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b72920f6 15-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: counter hardware offload support

This patch adds the .offload_stats operation to synchronize hardware
stats with the expression data. Update the counter expression to use
this new interface. The hardware stats are retrieved from the netlink
dump path via FLOW_CLS_STATS command to the driver.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0854db2a 01-Apr-2021 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: use net_generic infra for transaction data

This moves all nf_tables pernet data from struct net to a net_generic
extension, with the exception of the gencursor.

The latter is used in the data path and also outside of the nf_tables
core. All others are only used from the configuration plane.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8c56049f 31-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: remove documentation on static functions

Since 4f16d25c68ec ("netfilter: nftables: add nft_parse_register_load()
and use it") and 345023b0db31 ("netfilter: nftables: add
nft_parse_register_store() and use it"), the following functions are not
exported symbols anymore:

- nft_parse_register()
- nft_validate_register_load()
- nft_validate_register_store()

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dadf33c9 02-Apr-2021 Dan Carpenter <dan.carpenter@oracle.com>

netfilter: nftables: fix a warning message in nf_tables_commit_audit_collect()

The first argument of a WARN_ONCE() is a condition. This WARN_ONCE()
will only print the table name, and is potentially problematic if the
table name has a %s in it.
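
The difference can be illustrated with a toy macro in userspace C.
TOY_WARN() and toy_warn_table() are hypothetical stand-ins that write
into a buffer; they only mimic the "condition first, then format, then
arguments" calling convention:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a WARN-style macro: condition first, then a format. */
#define TOY_WARN(buf, len, cond, ...) \
	((cond) ? snprintf((buf), (len), __VA_ARGS__) : 0)

/*
 * Format a warning about table `name` into `buf`. The condition is
 * passed explicitly and the name is data for a fixed "%s" format,
 * never the format itself, so '%' sequences in it are copied literally.
 */
static int toy_warn_table(char *buf, size_t len, const char *name)
{
	assert(buf != NULL && name != NULL);
	return TOY_WARN(buf, len, 1, "unexpected table=%s", name);
}
```

A table named "evil%s%n" is printed verbatim here; had the name been
passed in the format position, the same characters would have been
interpreted as conversion specifiers.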

Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 19c28b13 30-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: add helper function to set up the nfnetlink header and use it

This patch adds a helper function to set up the netlink and nfnetlink headers.
Update existing codebase to use it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 802b8051 30-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add helper function to set the base sequence number

This patch adds a helper function to calculate the base sequence number
field that is stored in the nfnetlink header. Use the helper function
whenever possible.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7726c9ce 29-Mar-2021 Yang Yingliang <yangyingliang@huawei.com>

netfilter: nftables: remove unnecessary spin_lock_init()

The spinlock nf_tables_destroy_list_lock is initialized statically.
It is unnecessary to initialize by spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c520292f 26-Mar-2021 Richard Guy Briggs <rgb@redhat.com>

audit: log nftables configuration change events once per table

Reduce logging of nftables events to a level similar to iptables.
Restore the table field to list the table, adding the generation.

Indicate the op as the most significant operation in the event.

A couple of sample events:

type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

The issue was originally documented in
https://github.com/linux-audit/audit-kernel/issues/124

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cefa31a9 25-Mar-2021 Florian Westphal <fw@strlen.de>

netfilter: nft_log: perform module load from nf_tables

modprobe calls from the nf_logger_find_get() API cause deadlock in very
special cases because they occur with the nf_tables transaction mutex held.

In the specific case of nf_log, deadlock is via:

A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \
-> pernet_ops rwsem -> wait for C
B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A
C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B

Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load
the backend module during a transaction.

For nft_log we would have to add a softdep for both nfnetlink_log and
nf_log_syslog, since we do not know in advance which of the two backends
is going to be configured.

This defers the modprobe op until after the transaction mutex is released.
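
The deferral described above can be sketched in userspace C. This is only
an illustration under stated assumptions: struct txn, load_module(),
request_module_deferred() and txn_unlock_and_load() are hypothetical names,
and a boolean stands in for the transaction mutex. While the lock is held
the request is only recorded; the load runs after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4

/* Hypothetical transaction context; none of these names come from the kernel. */
struct txn {
	bool mutex_held;                   /* stands in for the transaction mutex */
	const char *pending[MAX_PENDING];  /* modules requested while locked */
	size_t npending;
	size_t nloaded;
};

/* Stand-in for modprobe: must never run with the mutex held. */
static void load_module(struct txn *t, const char *name)
{
	(void)name;
	assert(!t->mutex_held);
	t->nloaded++;
}

/* Called with the mutex held: only record the request. */
int request_module_deferred(struct txn *t, const char *name)
{
	assert(t->mutex_held);
	if (t->npending == MAX_PENDING)
		return -1;
	t->pending[t->npending++] = name;
	return 0;
}

/* Release the mutex first, then perform the recorded loads. */
void txn_unlock_and_load(struct txn *t)
{
	size_t i;

	t->mutex_held = false;
	for (i = 0; i < t->npending; i++)
		load_module(t, t->pending[i]);
	t->npending = 0;
}
```

A caller would invoke request_module_deferred() from the locked section and
txn_unlock_and_load() at the point where the lock is given up.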

Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0ce7cf41 17-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: update table flags from the commit phase

Do not update table flags from the preparation phase. Store the flags
update into the transaction, then update the flags from the commit
phase.
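
As a rough illustration of the prepare/commit split described above, the
following userspace sketch keeps the requested flags in a transaction
record and touches the live table only on commit. struct table,
struct table_update and the three functions are hypothetical names chosen
for this example, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

struct table {
	uint32_t flags;
};

/* Hypothetical transaction record carrying the requested flags. */
struct table_update {
	struct table *table;
	uint32_t new_flags;
};

/* Preparation phase: record the request, leave the table untouched. */
void table_update_prepare(struct table_update *u, struct table *t,
			  uint32_t flags)
{
	assert(u && t);
	u->table = t;
	u->new_flags = flags;
}

/* Commit phase: the only place where the live flags change. */
void table_update_commit(const struct table_update *u)
{
	u->table->flags = u->new_flags;
}

/* Abort phase: nothing to undo because nothing was modified. */
void table_update_abort(const struct table_update *u)
{
	(void)u;
}
```

With this shape, an aborted batch never exposes a half-applied flags
change, because the live flags are written in exactly one place.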

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c2168e6b 05-Mar-2021 Gustavo A. R. Silva <gustavoars@kernel.org>

netfilter: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4d8f9065 10-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: clone set element expression template

memcpy() breaks when using connlimit in set elements. Use
nft_expr_clone() to initialize the connlimit expression list, otherwise
connlimit garbage collector crashes when walking on the list head copy.

[ 493.064656] Workqueue: events_power_efficient nft_rhash_gc [nf_tables]
[ 493.064685] RIP: 0010:find_or_evict+0x5a/0x90 [nf_conncount]
[ 493.064694] Code: 2b 43 40 83 f8 01 77 0d 48 c7 c0 f5 ff ff ff 44 39 63 3c 75 df 83 6d 18 01 48 8b 43 08 48 89 de 48 8b 13 48 8b 3d ee 2f 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[ 493.064699] RSP: 0018:ffffc90000417dc0 EFLAGS: 00010297
[ 493.064704] RAX: 0000000000000000 RBX: ffff888134f38410 RCX: 0000000000000000
[ 493.064708] RDX: 0000000000000000 RSI: ffff888134f38410 RDI: ffff888100060cc0
[ 493.064711] RBP: ffff88812ce594a8 R08: ffff888134f38438 R09: 00000000ebb9025c
[ 493.064714] R10: ffffffff8219f838 R11: 0000000000000017 R12: 0000000000000001
[ 493.064718] R13: ffffffff82146740 R14: ffff888134f38410 R15: 0000000000000000
[ 493.064721] FS: 0000000000000000(0000) GS:ffff88840e440000(0000) knlGS:0000000000000000
[ 493.064725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 493.064729] CR2: 0000000000000008 CR3: 00000001330aa002 CR4: 00000000001706e0
[ 493.064733] Call Trace:
[ 493.064737] nf_conncount_gc_list+0x8f/0x150 [nf_conncount]
[ 493.064746] nft_rhash_gc+0x106/0x390 [nf_tables]
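
The memcpy() problem described above can be reproduced in a small
userspace sketch. It assumes a circular list head like the kernel's;
struct limit_expr and the limit_expr_*() helpers are hypothetical
stand-ins, not the connlimit code. A byte copy leaves the list pointers
aimed at the template's head, while a clone builds fresh per-instance
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal circular doubly linked list head, in the style of the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An empty list is one whose head points back at itself. */
bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical stateful expression that embeds a list head. */
struct limit_expr {
	unsigned int limit;
	struct list_head conns;
};

void limit_expr_init(struct limit_expr *e, unsigned int limit)
{
	assert(e);
	e->limit = limit;
	list_init(&e->conns);
}

/* Byte copy: the list pointers still refer to the template's head. */
void limit_expr_memcpy(struct limit_expr *dst, const struct limit_expr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Clone: copy the configuration, then build fresh per-instance state. */
void limit_expr_clone(struct limit_expr *dst, const struct limit_expr *src)
{
	dst->limit = src->limit;
	list_init(&dst->conns);
}
```

After limit_expr_memcpy(), the copy's list looks non-empty even though it
holds no entries, which is the kind of inconsistency a list walker trips
over.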

Reported-by: Laura Garcia Liebana <nevola@gmail.com>
Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 86fe2c19 17-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: skip hook overlap logic if flowtable is stale

If the flowtable has been previously removed in this batch, skip the
hook overlap checks. This fixes spurious EEXIST errors when removing and
adding the flowtable in the same batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7b35582c 16-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: allow to update flowtable flags

Honor flowtable flags from the control update path. However, do not
allow toggling hardware offload support.

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7e6136f1 17-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: report EOPNOTSUPP on unsupported flowtable flags

Error was not set accordingly.

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bd1777b3 03-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: bogus check for netlink portID with table owner

The existing branch checks for 0 != table->nlpid which always evaluates
true for tables that have an owner.

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2888b080 03-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: fix possible double hook unregistration with table owner

Skip hook unregistration of owner tables from the netns exit path,
nft_rcv_nl_event() unregisters the table hooks before tearing down
the table content.

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9cc0001a 27-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: disallow updates on table ownership

Disallow updating the ownership bit on an existing table: Do not allow
to grab ownership on an existing table. Do not allow to drop ownership
on an existing table.
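
A minimal sketch of such a check, assuming illustrative flag values and a
hypothetical struct table; the error code is also an assumption made for
this example. XOR-ing old and new flags leaves the ownership bit set only
if the update would grab or drop ownership.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_F_DORMANT 0x1u  /* illustrative flag values */
#define TABLE_F_OWNER   0x2u

struct table {
	uint32_t flags;
};

/*
 * Reject an update that would grab or drop ownership of an existing
 * table. The XOR leaves a bit set only where old and new flags differ.
 */
int table_check_owner_update(const struct table *t, uint32_t new_flags)
{
	assert(t != NULL);
	if ((t->flags ^ new_flags) & TABLE_F_OWNER)
		return -EOPNOTSUPP;  /* error code chosen for illustration */
	return 0;
}
```

Other flag bits may still change freely as long as the ownership bit keeps
its current value.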

Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6001a930 14-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: introduce table ownership

A userspace daemon like firewalld might need to monitor for netlink
updates to detect its ruleset removal by the (global) flush ruleset
command to ensure ruleset persistency. This adds extra complexity from
userspace and, for some little time, the firewall policy is not in
place.

This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace
program to exclusively own the table that it creates.

Tables that are owned...

- can only be updated and removed by the owner, non-owners hit EPERM if
they try to update it or remove it.
- are destroyed when the owner closes the netlink socket or the process
is gone (implicit netlink socket closure).
- are skipped by the global flush ruleset command.
- are listed in the global ruleset.
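
The rules in the list above can be modelled in a few lines of userspace C.
This is a sketch under assumptions: struct table, the flag value and all
function names are hypothetical, and an array stands in for the table
list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_F_OWNER 0x2u  /* illustrative value */

struct table {
	uint32_t flags;
	uint32_t nlpid;  /* netlink port ID of the owner, if any */
	bool alive;
};

static bool table_has_owner(const struct table *t)
{
	return (t->flags & TABLE_F_OWNER) != 0;
}

/* Updates and removals: non-owners hit EPERM. */
int table_may_modify(const struct table *t, uint32_t nlpid)
{
	assert(t != NULL);
	if (table_has_owner(t) && t->nlpid != nlpid)
		return -EPERM;
	return 0;
}

/* Global flush: owned tables are skipped. Returns the number removed. */
size_t flush_ruleset(struct table *tables, size_t n)
{
	size_t i, removed = 0;

	for (i = 0; i < n; i++) {
		if (!tables[i].alive || table_has_owner(&tables[i]))
			continue;
		tables[i].alive = false;
		removed++;
	}
	return removed;
}

/* Owner socket closed: destroy every table owned by that port ID. */
size_t release_owner(struct table *tables, size_t n, uint32_t nlpid)
{
	size_t i, removed = 0;

	for (i = 0; i < n; i++) {
		if (tables[i].alive && table_has_owner(&tables[i]) &&
		    tables[i].nlpid == nlpid) {
			tables[i].alive = false;
			removed++;
		}
	}
	return removed;
}
```

Listing is left out on purpose: owned tables are listed like any other, so
no extra check is needed there.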

The userspace process that sets the NFT_TABLE_F_OWNER flag needs to
keep the netlink socket open.

A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID
to identify the owner from userspace.

This patch also updates error reporting when an unknown table flag is
specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is
usually reserved for reporting malformed netlink messages to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 00dfe9be 14-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add helper function to release hooks of one single table

Add a function to release the hooks of one single table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fd020332 15-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add helper function to release one table

Add a function to release one table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 08a01c11 25-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: statify nft_parse_register()

This function is not used anymore by any extension, statify it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 345023b0 25-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add nft_parse_register_store() and use it

This new function combines the netlink register attribute parser
and the store validation function.

This update requires replacing:

enum nft_registers dreg:8;

in many of the expression private areas otherwise compiler complains
with:

error: cannot take address of bit-field ‘dreg’

when passing the register field as reference.
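
The compiler restriction mentioned above is easy to show in isolation. In
this sketch, struct expr_priv, REG_MAX and parse_register_store() are
hypothetical simplifications; the real helper takes more arguments. Once
the member is a plain byte instead of a bit-field, it can be passed by
reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Before: a bit-field has no address, so &priv->dreg does not compile.
 *
 *	struct expr_priv { enum nft_registers dreg:8; };
 */

/* After: a plain byte-sized member can be passed by reference. */
struct expr_priv {
	uint8_t dreg;
};

#define REG_MAX 19u  /* hypothetical upper bound for this sketch */

/* Hypothetical combined helper: parse the register number and validate it. */
int parse_register_store(uint32_t attr_value, uint8_t *dreg)
{
	assert(dreg != NULL);
	if (attr_value > REG_MAX)
		return -1;
	*dreg = (uint8_t)attr_value;  /* written only after validation */
	return 0;
}
```

A call such as parse_register_store(value, &priv->dreg) then parses and
validates in one step.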

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4f16d25c 25-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add nft_parse_register_load() and use it

This new function combines the netlink register attribute parser
and the load validation function.

This update requires replacing:

enum nft_registers sreg:8;

in many of the expression private areas otherwise compiler complains
with:

error: cannot take address of bit-field ‘sreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 664899e8 08-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: relax check for stateful expressions in set definition

Restore the original behaviour where users are allowed to add an element
with any stateful expression if the set definition specifies no stateful
expressions. Make sure the upper limit of NFT_SET_EXPR_MAX stateful
expressions is not exceeded.

Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support")
Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 767d1216 02-Feb-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: fix possible UAF over chains from packet path in netns

Although hooks are released via call_rcu(), chain and rule objects are
immediately released while packets are still walking over these bits.

This patch adds the .pre_exit callback which is invoked before
synchronize_rcu() in the netns framework to stay safe.

Remove a comment which is no longer valid, since the core has not used
synchronize_net() since 8c873e219970 ("netfilter: core: free
hooks with call_rcu").

Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: df05ef874b28 ("netfilter: nf_tables: release objects on netns destruction")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fca05d4d 15-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_dynset: honor stateful expressions in set definition

If the set definition contains stateful expressions, allocate them for
the newly added entries from the packet path.

Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b4e70d8d 26-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add set expression flags

The set flag NFT_SET_EXPR provides a hint to the kernel that userspace
supports multiple expressions per set element. In the same direction,
NFT_DYNSET_F_EXPR specifies that the dynset expression defines
multiple expressions per set element.

This allows new userspace software with old kernels to bail out with
EOPNOTSUPP. This update is similar to ef516e8625dd ("netfilter:
nf_tables: reintroduce the NFT_SET_CONCAT flag"). The NFT_SET_EXPR flag
must be set when the NFTA_SET_EXPRESSIONS attribute is specified.
The NFT_SET_EXPR flag is not set with NFTA_SET_EXPR to retain
backward compatibility with old userspace binaries.

Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 161b838e 14-Dec-2020 Colin Ian King <colin.king@canonical.com>

netfilter: nftables: fix incorrect increment of loop counter

The intention of the err_expr cleanup path is to iterate over the
allocated expr_array objects and free them, starting from i - 1 and
working down to the start of the array. Currently the loop counter
is being incremented instead of decremented and also the index i is
being used instead of k, repeatedly destroying the same expr_array
element. Fix this by decrementing k and using k as the index into
expr_array.
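
The corrected unwind loop has the following general shape. This is a
userspace sketch: expr_alloc(), expr_array_init() and the fail_at knob are
hypothetical, introduced only to exercise the error path.

```c
#include <assert.h>
#include <stdlib.h>

#define EXPR_MAX 4

/* Hypothetical allocator that fails at a chosen index to exercise the unwind. */
static int *expr_alloc(int idx, int fail_at)
{
	return idx == fail_at ? NULL : malloc(sizeof(int));
}

/* Returns 0 on success, or -1 after freeing everything allocated so far. */
int expr_array_init(int *expr_array[EXPR_MAX], int fail_at, int *nfreed)
{
	int i, k;

	assert(expr_array && nfreed);
	*nfreed = 0;
	for (i = 0; i < EXPR_MAX; i++) {
		expr_array[i] = expr_alloc(i, fail_at);
		if (!expr_array[i])
			goto err_expr;
	}
	return 0;

err_expr:
	/* Walk down from i - 1, decrementing k and indexing with k, not i. */
	for (k = i - 1; k >= 0; k--) {
		free(expr_array[k]);
		expr_array[k] = NULL;
		(*nfreed)++;
	}
	return -1;
}
```

Incrementing k here would never reach the k >= 0 exit condition in the
intended way, and indexing with i would release the same slot on every
pass.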

Addresses-Coverity: ("Infinite loop")
Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 48b0ae04 07-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: netlink support for several set element expressions

This patch adds three new netlink attributes to encapsulate a list of
expressions per set elements:

- NFTA_SET_EXPRESSIONS: this attribute provides the set definition in
terms of expressions. New set elements get attached the list of
expressions that is specified by this new netlink attribute.
- NFTA_SET_ELEM_EXPRESSIONS: this attribute allows users to restore (or
initialize) the stateful information of set elements when adding an
element to the set.
- NFTA_DYNSET_EXPRESSIONS: this attribute specifies the list of
expressions that the set element gets when it is inserted from the
packet path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 563125a7 09-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: generalize set extension to support for several expressions

This patch replaces NFT_SET_EXPR by NFT_SET_EXT_EXPRESSIONS. This new
extension allows to attach several expressions to one set element (not
only one single expression as NFT_SET_EXPR provides). This patch
prepares for support for several expressions per set element in the
netlink userspace API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8cfd9b0f 07-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: generalize set expressions support

Currently, the set infrastructure allows for one single expression per
element. This patch extends the existing infrastructure to allow for up
to two expressions. This is not updating the netlink API yet, this is
coming as an initial preparation patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 42f1c271 08-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex

Add an explicit comment in the code to describe the indirect
serialization of the holders of the commit_mutex with the rtnl_mutex.
Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
netdevice from preparation phase") already describes this, but a comment
in this case is better for reference.

Reported-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 917d80d3 08-Dec-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_dynset: fix timeouts later than 23 days

Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
days"), otherwise ruleset listing breaks.

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c0700dfa 19-Nov-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: avoid false-positive lockdep splat

There are reports wrt lockdep splat in nftables, e.g.:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 31416 at net/netfilter/nf_tables_api.c:622
lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables]
...

These are caused by an earlier, unrelated bug such as an ABBA deadlock
in a different subsystem.
In such an event, lockdep is disabled and lockdep_is_held returns true
unconditionally. This then causes the WARN() in nf_tables.

Make the WARN conditional on lockdep still active to avoid this.
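
The effect can be modelled with a tiny simulation. struct lockdep_sim and
both functions are hypothetical; they only mimic the behaviour described
above, where a disabled lockdep answers "held" to every query.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated lockdep state; in the kernel this is owned by lockdep itself. */
struct lockdep_sim {
	bool debug_locks;  /* cleared once lockdep has hit an earlier bug */
	bool mutex_held;
};

/* Once lockdep is off, "is held" queries answer true unconditionally. */
bool sim_lockdep_is_held(const struct lockdep_sim *s)
{
	assert(s);
	return !s->debug_locks || s->mutex_held;
}

/* Returns true if a warning would fire: the mutex must NOT be held here. */
bool warn_if_mutex_held(const struct lockdep_sim *s)
{
	/* Trust the answer only while lockdep is still active. */
	return s->debug_locks && sim_lockdep_is_held(s);
}
```

With debug_locks cleared, the held query still says true, but the warning
no longer fires.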

Fixes: f102d66b335a417 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/linux-kselftest/CA+G9fYvFUpODs+NkSYcnwKnXm62tmP=ksLeBPmB+KFrB2rvCtQ@mail.gmail.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 872f6903 15-Nov-2020 Francis Laniel <laniel_francis@privacyrequired.com>

treewide: rename nla_strlcpy to nla_strscpy.

Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 35b7ee34 31-Oct-2020 Andrew Lunn <andrew@lunn.ch>

netfilter: nftables: Add __printf() attribute

nft_request_module calls vsnprintf() using parameters passed to it.
Mark the function with the __printf() attribute so the compiler can check
the format and arguments.
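
For illustration, a userspace equivalent of such an annotation might look
as follows. PRINTF_LIKE() and format_module_name() are hypothetical names
for this sketch; the attribute itself is the GCC/Clang format(printf, ...)
extension, which is what the kernel's __printf() macro wraps.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Wraps the GCC/Clang format attribute; argument indexes start at 1. */
#define PRINTF_LIKE(fmt_idx, arg_idx) \
	__attribute__((format(printf, fmt_idx, arg_idx)))

/* Hypothetical helper that builds a module name from a format string. */
PRINTF_LIKE(3, 4)
int format_module_name(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int ret;

	assert(buf != NULL && fmt != NULL);
	va_start(args, fmt);
	ret = vsnprintf(buf, len, fmt, args);
	va_end(args);

	/* Treat truncation as an error, like an encoding failure. */
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	return 0;
}
```

With the attribute in place, a call that passes a string where the format
expects %u is flagged at compile time when format warnings are enabled.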

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c0391b6a 29-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: missing validation from the abort path

If userspace does not include the trailing end of batch message, then
nfnetlink aborts the transaction. This allows to check that ruleset
updates trigger no errors.

After this patch, invoking this command from the prerouting chain:

# nft -c add rule x y fib saddr . oif type local

fails since oif is not supported there.

This patch fixes the lack of rule validation from the abort/check path
to catch configuration errors such as the one above.

Fixes: a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dceababa 22-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: fix netlink report logic in flowtable and genid

The netlink report should be sent regardless of the available listeners.

Fixes: 84d7fce69388 ("netfilter: nf_tables: export rule-set generation ID")
Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 31cc578a 20-Oct-2020 Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>

netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create

This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed before the boundary check is performed, after nft_expr_next() has moved expr out of bounds.

This patch checks the boundary condition before accessing expr->ops, which fixes the slab-out-of-bounds read.

Add nft_expr_more() and use it to fix this problem.
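
The ordering that matters here, boundary check first and dereference
second, can be shown with a userspace analogue. This is not the kernel
helper: struct rule, the rec_*() functions and the record layout (first
byte holds the record's total size) are assumptions made for the sketch,
and the buffer is assumed to be well formed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: records packed back to back in a byte buffer. */
struct rule {
	const uint8_t *buf;  /* each record starts with its total size */
	size_t len;          /* number of valid bytes in buf */
};

static const uint8_t *rec_first(const struct rule *r)
{
	return r->buf;
}

static const uint8_t *rec_last(const struct rule *r)
{
	return r->buf + r->len;
}

static const uint8_t *rec_next(const uint8_t *rec)
{
	return rec + rec[0];
}

/* Check the boundary first; read the record only if it lies inside. */
static bool rec_more(const struct rule *r, const uint8_t *rec)
{
	return rec != rec_last(r) && rec[0] != 0;
}

size_t rule_count_records(const struct rule *r)
{
	const uint8_t *rec;
	size_t n = 0;

	assert(r != NULL);
	for (rec = rec_first(r); rec_more(r, rec); rec = rec_next(rec))
		n++;
	return n;
}
```

Because && short-circuits, rec[0] is never read once rec has reached the
end of the buffer.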

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d25e2e93 14-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: restore NF_INET_NUMHOOKS

This definition is used by the iptables legacy UAPI, restore it.

Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 98a381a7 12-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: extend error reporting for chain updates

The initial support for netlink extended ACK is missing the chain update
path, which results in misleading error reporting in case of EEXIST.

Fixes: 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d3519cb8 07-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add inet ingress support

This patch adds a new ingress hook for the inet family. The inet ingress
hook emulates the IP receive path code, therefore, unclean packets are
dropped before walking over the ruleset in this basechain.

This patch also introduces the nft_base_chain_netdev() helper function
to check if this hook is bound to one or more devices (through the hook
list infrastructure). This check allows the control plane to handle the
inet ingress chain in the same way as a netdev ingress chain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 002f2176 28-Sep-2020 Jose M. Guisado Gomez <guigom@riseup.net>

netfilter: nf_tables: add userdata attributes to nft_chain

Enables storing userdata for nft_chain. Field udata points to user data
and udlen stores its length.

Adds new attribute flag NFTA_CHAIN_USERDATA.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 85db827a 28-Sep-2020 Jose M. Guisado Gomez <guigom@riseup.net>

netfilter: nf_tables: use nla_memdup to copy udata

When userdata support was added to tables and objects, user data coming
from user space was allocated and copied using kzalloc + nla_memcpy.

Use nla_memdup to copy userdata of tables and objects.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bc7a7082 27-Sep-2020 Jose M. Guisado Gomez <guigom@riseup.net>

netfilter: nf_tables: fix userdata memleak

When userdata was introduced for tables and objects its allocation was
only freed inside the error path of the new{table, object} functions.

Free user data inside corresponding destroy functions for tables and
objects.

Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object")
Fixes: 7a81575b806e ("netfilter: nf_tables: add userdata attributes to nft_table")
Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b131c964 08-Sep-2020 Jose M. Guisado Gomez <guigom@riseup.net>

netfilter: nf_tables: add userdata support for nft_object

Enables storing userdata for nft_object. Initially this will store an
optional comment but can be extended in the future as needed.

Adds new attribute NFTA_OBJ_USERDATA to nft_object.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 67cc570e 27-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: coalesce multiple notifications into one skbuff

On x86_64, each notification results in one skbuff allocation which
consumes at least 768 bytes due to the skbuff overhead.

This patch coalesces several notifications into one single skbuff, so
each notification consumes at least ~211 bytes, that is, ~3.5 times less
memory consumption. As a result, this reduces the chances of exhausting
the netlink socket receive buffer.

Rule of thumb is that each notification batch only contains netlink
messages whose report flag is the same, nfnetlink_send() requires this
to do appropriate delivery to userspace, either via unicast (echo
mode) or multicast (monitor mode).

The skbuff control buffer is used to annotate the report flag for later
handling at the new coalescing routine.

The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
skbuff would allow for more socket receiver buffer savings (to amortize
the cost of the skbuff even more), however, going over that size might
break userspace applications, so let's be conservative and stick to
NLMSG_GOODSIZE.

Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ee921183 23-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS

Frontend callback reports EAGAIN to nfnetlink to retry a command, this
is used to signal that module autoloading is required. Unfortunately,
nlmsg_unicast() reports EAGAIN in case the receiver socket buffer gets
full, so it enters a busy-loop.

This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and
to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast()
since this is always MSG_DONTWAIT in the existing code which is exactly
what nlmsg_unicast() passes to netlink_unicast() as parameter.
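
The error translation amounts to a thin wrapper. In this sketch,
unicast_no_eagain() and the two sample senders are hypothetical stand-ins
for the real send path; only the errno constants are real.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical low-level send: returns 0 or a negative errno value. */
typedef int (*send_fn)(void *ctx);

/* Sample senders used to exercise the wrapper. */
int send_buffer_full(void *ctx)
{
	(void)ctx;
	return -EAGAIN;  /* receiver socket buffer is full */
}

int send_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

/*
 * Wrapper that keeps -EAGAIN reserved for "retry the command": a full
 * receive buffer is reported to the caller as -ENOBUFS instead.
 */
int unicast_no_eagain(send_fn fn, void *ctx)
{
	int err;

	assert(fn != NULL);
	err = fn(ctx);
	if (err == -EAGAIN)
		err = -ENOBUFS;
	return err;
}
```

A caller that retries on -EAGAIN therefore no longer spins when the
receiver is merely slow.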

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7a81575b 20-Aug-2020 Jose M. Guisado Gomez <guigom@riseup.net>

netfilter: nf_tables: add userdata attributes to nft_table

Enables storing userdata for nft_table. Field udata points to user data
and udlen stores its length.

Adds new attribute flag NFTA_TABLE_USERDATA.

Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6f03bf43 20-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_SET_USERDATA if not null

Kernel sends an empty NFTA_SET_USERDATA attribute with no value if
userspace adds a set with no NFTA_SET_USERDATA attribute.

Fixes: e6d8ecac9e68 ("netfilter: nf_tables: Add new attributes into nft_set to store user data.")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 59136aa3 11-Aug-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: free chain context when BINDING flag is missing

syzbot found a memory leak in nf_tables_addchain() because the chain
object is not free'd correctly on error.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: syzbot+c99868fde67014f7e9f5@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 77a92189 01-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: report EEXIST on overlaps

Replace EBUSY by EEXIST in the following cases:

- If the user adds a chain with a different configuration such as different
type, hook and priority.

- If the user adds a non-base chain that clashes with an existing basechain.

- If the user adds a { key : value } mapping element and the key exists
but the value differs.

- If the device already belongs to an existing flowtable.

Users report that this error reporting is confusing:

- https://bugzilla.netfilter.org/show_bug.cgi?id=1176
- https://bugzilla.netfilter.org/show_bug.cgi?id=1413

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 83d9dcba 01-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: extended netlink error reporting for expressions

This patch extends 36dd1bcc07e5 ("netfilter: nf_tables: initial support
for extended ACK reporting") to include netlink extended error reporting
for expressions. This allows userspace to identify what rule expression
is triggering the error.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ffe8923f 24-Jul-2020 Florian Westphal <fw@strlen.de>

netfilter: nft_compat: make sure xtables destructors have run

Pablo Neira found that after recent update of xt_IDLETIMER the
iptables-nft tests sometimes show an error.

He tracked this down to the delayed cleanup used by nf_tables core:
del rule (transaction A)
add rule (transaction B)

It's possible that by the time transaction B (both in same netns) runs,
the xt target destructor has not been invoked yet.

For native nft expressions this is no problem because all expressions
that have such side effects make sure these are handled from the commit
phase, rather than async cleanup.

For nft_compat however this isn't true.

Instead of forcing synchronous behaviour for nft_compat, keep track
of the number of outstanding destructor calls.

When we attempt to create a new expression, flush the cleanup worker
to make sure destructors have completed.

With lots of help from Pablo Neira.

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 954d8297 08-Jul-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

netfilter: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.
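
A userspace sketch of how such a pseudo-keyword can be defined and used.
The macro follows the usual attribute-based pattern with a no-op fallback
for compilers that lack the attribute; enum hook and stages_from() are
hypothetical and exist only to give the switch something to do.

```c
#include <assert.h>

/* Use the statement attribute where the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

enum hook { HOOK_INGRESS, HOOK_PREROUTING, HOOK_INPUT };  /* illustrative */

/* Count how many stages a packet entering at 'h' still has to traverse. */
int stages_from(enum hook h)
{
	int stages = 0;

	switch (h) {
	case HOOK_INGRESS:
		stages++;
		fallthrough;
	case HOOK_PREROUTING:
		stages++;
		fallthrough;
	case HOOK_INPUT:
		stages++;
		break;
	default:
		assert(!"unknown hook");
	}
	return stages;
}
```

Unlike a comment, the macro is visible to the compiler, so an accidental
fall-through elsewhere in the switch can still be warned about.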

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1e9451cb 14-Jul-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix nat hook table deletion

syzbot came up with the following transaction:
add table ip syz0
add chain ip syz0 syz2 { type nat hook prerouting priority 0; policy accept; }
add table ip syz0 { flags dormant; }
delete chain ip syz0 syz2
delete table ip syz0

which yields:
hook not found, pf 2 num 0
WARNING: CPU: 0 PID: 6775 at net/netfilter/core.c:413 __nf_unregister_net_hook+0x3e6/0x4a0 net/netfilter/core.c:413
[..]
nft_unregister_basechain_hooks net/netfilter/nf_tables_api.c:206 [inline]
nft_table_disable net/netfilter/nf_tables_api.c:835 [inline]
nf_tables_table_disable net/netfilter/nf_tables_api.c:868 [inline]
nf_tables_commit+0x32d3/0x4d70 net/netfilter/nf_tables_api.c:7550
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:486 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:544 [inline]
nfnetlink_rcv+0x14a5/0x1e50 net/netfilter/nfnetlink.c:562
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]

Problem is that when I added ability to override base hook registration
to make nat basechains register with the nat core instead of netfilter
core, I forgot to update nft_table_disable() to use that instead of
the 'raw' hook register interface.

In the syzbot transaction, the basechain is of 'nat' type. It's registered
with the nat core. The switch to 'dormant mode' attempts to delete from
netfilter core instead.

After updating nft_table_disable/enable to use the correct helper,
nft_(un)register_basechain_hooks can be folded into the only remaining
caller.

Because nft_trans_table_enable() won't do anything when the DORMANT flag
is set, remove the flag first, then re-add it in case re-enablement
fails, otherwise this patch breaks the sequence:

add table ip x { flags dormant; }
/* add base chains */
add table ip x

The last 'add' will remove the dormant flags, but won't have any other
effect -- base chains are not registered.
Then, next 'set dormant flag' will create another 'hook not found'
splat.

Reported-by: syzbot+2570f2c036e3da5db176@syzkaller.appspotmail.com
Fixes: 4e25ceb80b58 ("netfilter: nf_tables: allow chain type to override hook register")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3db86c39 12-Jul-2020 Andrew Lunn <andrew@lunn.ch>

net: netfilter: kerneldoc fixes

Simple fixes which require no deep knowledge of the code.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 68df2ed5 03-Jul-2020 Paul Moore <paul@paul-moore.com>

audit: use the proper gfp flags in the audit_log_nfcfg() calls

Commit 142240398e50 ("audit: add gfp parameter to audit_log_nfcfg")
incorrectly passed gfp flags to audit_log_nfcfg() which were not
consistent with the calling function, this commit fixes that.

Fixes: 142240398e50 ("audit: add gfp parameter to audit_log_nfcfg")
Reported-by: Jones Desougi <jones.desougi+netfilter@gmail.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>


# c1f79a2e 03-Jul-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject unsupported chain flags

Bail out if userspace sends unsupported chain flags.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d0e2c7de 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFT_CHAIN_BINDING

This new chain flag specifies that:

* the kernel dynamically allocates the chain name, if no chain name
is specified.

* If the immediate expression that refers to this chain is removed,
then this bound chain (and its content) is destroyed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 04b7db41 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_chain_add()

This patch adds a helper function to add the chain to the hashtable and
the chain list.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 67c49de4 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: expose enum nft_chain_flags through UAPI

This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 51d70f18 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_VERDICT_CHAIN_ID attribute

This netlink attribute allows you to identify the chain to jump/goto by
means of the chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 837830a4 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute

This new netlink attribute allows you to add rules to chains by the
chain ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 74cccc3d 30-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_CHAIN_ID attribute

This netlink attribute allows you to refer to chains inside a
transaction as an alternative to the name and the handle. The chain
binding support requires this new chain ID approach.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 14224039 27-Jun-2020 Richard Guy Briggs <rgb@redhat.com>

audit: add gfp parameter to audit_log_nfcfg

Fixed an inconsistent use of GFP flags in nft_obj_notify() that used
GFP_KERNEL when a GFP flag was passed in to that function. Given this
allocated memory was then used in audit_log_nfcfg() it led to an audit
of all other GFP allocations in net/netfilter/nf_tables_api.c and a
modification of audit_log_nfcfg() to accept a GFP parameter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>


# 8e6cf365 04-Jun-2020 Richard Guy Briggs <rgb@redhat.com>

audit: log nftables configuration change events

iptables, ip6tables, arptables and ebtables table registration,
replacement and unregistration configuration events are logged for the
native (legacy) iptables setsockopt api, but not for the
nftables netlink api which is used by the nft-variant of iptables in
addition to nftables itself.

Add calls to log the configuration actions in the nftables netlink api.

This uses the same NETFILTER_CFG record format but overloads the table
field.

type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=?:0;?:0 family=unspecified entries=2 op=nft_register_gen pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
...
type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.878:162) : table=firewalld:1;?:0 family=inet entries=0 op=nft_register_table pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
...
type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=8 op=nft_register_chain pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
...
type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;filter_FORWARD:85 family=inet entries=101 op=nft_register_rule pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
...
type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=87 op=nft_register_setelem pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
...
type=NETFILTER_CFG msg=audit(2020-05-28 17:46:41.911:163) : table=firewalld:1;__set0:87 family=inet entries=0 op=nft_register_set pid=396 subj=system_u:system_r:firewalld_t:s0 comm=firewalld

For further information please see issue
https://github.com/linux-audit/audit-kernel/issues/124

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>


# 3003055f 10-Jun-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: hook list memleak in flowtable deletion

After looking up the flowtable hooks that need to be removed,
release the hook objects in the deletion list. The error path needs to
release these hook objects too.

Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable")
Reported-by: syzbot+eb9d5924c51d6d59e094@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5b6743fb 22-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip flowtable hooknum and priority on device updates

On device updates, the hooknum and priority attributes are not required.
This patch makes these two netlink attributes optional.

Moreover, bail out with EOPNOTSUPP if userspace tries to update the
hooknum and priority for existing flowtables.

While at it, turn EINVAL into EOPNOTSUPP in case the hooknum is not
ingress. EINVAL is reserved for missing netlink attribute / malformed
netlink messages.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 05abe445 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to register flowtable with no devices

A flowtable might be composed of dynamic interfaces only. Such dynamic
interfaces might show up at a later stage. This patch allows users to
register a flowtable with no devices. Once the dynamic interface becomes
available, the user adds the dynamic devices to the flowtable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# abadb2f8 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: delete devices from flowtable

This patch allows users to delete devices from existing flowtables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 78d9f48f 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add devices to existing flowtable

This patch allows users to add devices to an existing flowtable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c42d8bda 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass hook list to flowtable event notifier

Update the flowtable netlink notifier to take the list of hooks as input.
This allows reusing this function in incremental flowtable hook updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 389a2cbc 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_flowtable_hooks_destroy()

This patch adds a helper function to destroy the flowtable hooks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f9382669 18-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass hook list to nft_{un,}register_flowtable_net_hooks()

This patch prepares for incremental flowtable hook updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d9246a53 20-May-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: generalise flowtable hook parsing

Update nft_flowtable_parse_hook() to take the flowtable hook list as
parameter. This allows reusing this function to update the hooks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fdb9c405 24-Apr-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow up to 64 bytes in the set element data area

So far, set elements could store up to 128 bits in the data area.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e6abef61 26-Mar-2020 Jason A. Donenfeld <Jason@zx2c4.com>

x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2

Now that the kernel specifies binutils 2.23 as the minimum version, we
can remove ifdefs for AVX2 and ADX throughout.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# ef516e86 07-Apr-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag

Stefano originally proposed to introduce this flag, users hit EOPNOTSUPP
in new binaries with old kernels when defining a set with ranges in
a concatenation.

Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d9583cdf 07-Apr-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type

EINVAL should be used for malformed netlink messages. New userspace
utility and old kernels might easily result in EINVAL when exercising
new set features, which is misleading.
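
As a rough userspace illustration of this convention (not the kernel code),
the split between the two error codes can be sketched as below. The flag
names and demo_check_flags() are hypothetical and exist only for this
example:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature flags known to "this kernel". */
#define DEMO_FLAG_A		0x1u
#define DEMO_FLAG_B		0x2u
#define DEMO_FLAGS_SUPPORTED	(DEMO_FLAG_A | DEMO_FLAG_B)

/*
 * Validate a flags attribute sent by userspace.
 * -EINVAL is kept for malformed messages (here: a missing attribute),
 * -EOPNOTSUPP tells userspace the message was fine but asks for a
 * feature this side does not implement.
 */
int demo_check_flags(int attr_present, uint32_t flags)
{
	if (!attr_present)
		return -EINVAL;
	if (flags & ~DEMO_FLAGS_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}
```

With this split, a newer userspace tool talking to an older kernel can tell
"feature missing" apart from "I built a broken message".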

Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7fb6f78d 01-Apr-2020 Eric Dumazet <edumazet@google.com>

netfilter: nf_tables: do not leave dangling pointer in nf_tables_set_alloc_name

If nf_tables_set_alloc_name() frees set->name, we better
clear set->name to avoid a future use-after-free or invalid-free.
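
The pattern can be sketched in plain userspace C. This is only an
illustration under simplifying assumptions: struct demo_set and both
helpers are hypothetical stand-ins, and malloc()/free() replace the kernel
allocators:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct nft_set; only the name matters here. */
struct demo_set {
	char *name;
};

/*
 * Duplicate the requested name, then pretend a later validation step
 * fails when 'fail' is non-zero. On failure the name is freed AND the
 * pointer is cleared, so the caller's own cleanup cannot free it twice.
 */
int demo_set_alloc_name(struct demo_set *set, const char *name, int fail)
{
	size_t len = strlen(name) + 1;

	set->name = malloc(len);
	if (!set->name)
		return -1;
	memcpy(set->name, name, len);

	if (fail) {
		free(set->name);
		set->name = NULL;	/* do not leave a dangling pointer */
		return -1;
	}
	return 0;
}

/* Caller-side error/release path. */
void demo_set_release(struct demo_set *set)
{
	free(set->name);	/* free(NULL) is a no-op */
	set->name = NULL;
}
```

Without the "set->name = NULL" line in the failure branch, the release
helper would free the same 32-byte object a second time, which is the
shape of the KASAN report below.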

BUG: KASAN: double-free or invalid-free in nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148

CPU: 0 PID: 28233 Comm: syz-executor.0 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:374
kasan_report_invalid_free+0x61/0xa0 mm/kasan/report.c:468
__kasan_slab_free+0x129/0x140 mm/kasan/common.c:455
__cache_free mm/slab.c:3426 [inline]
kfree+0x109/0x2b0 mm/slab.c:3757
nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148
nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
___sys_sendmsg+0x100/0x170 net/socket.c:2399
__sys_sendmsg+0xec/0x1b0 net/socket.c:2432
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45c849
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe5ca21dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fe5ca21e6d4 RCX: 000000000045c849
RDX: 0000000000000000 RSI: 0000000020000c40 RDI: 0000000000000003
RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000095b R14: 00000000004cc0e9 R15: 000000000076bf0c

Allocated by task 28233:
save_stack+0x1b/0x80 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc mm/kasan/common.c:515 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:488
__do_kmalloc mm/slab.c:3656 [inline]
__kmalloc_track_caller+0x159/0x790 mm/slab.c:3671
kvasprintf+0xb5/0x150 lib/kasprintf.c:25
kasprintf+0xbb/0xf0 lib/kasprintf.c:59
nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3536 [inline]
nf_tables_newset+0x1543/0x2560 net/netfilter/nf_tables_api.c:4088
nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
___sys_sendmsg+0x100/0x170 net/socket.c:2399
__sys_sendmsg+0xec/0x1b0 net/socket.c:2432
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 28233:
save_stack+0x1b/0x80 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:337 [inline]
__kasan_slab_free+0xf7/0x140 mm/kasan/common.c:476
__cache_free mm/slab.c:3426 [inline]
kfree+0x109/0x2b0 mm/slab.c:3757
nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3544 [inline]
nf_tables_newset+0x1f73/0x2560 net/netfilter/nf_tables_api.c:4088
nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
___sys_sendmsg+0x100/0x170 net/socket.c:2399
__sys_sendmsg+0xec/0x1b0 net/socket.c:2432
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a6032d00
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 0 bytes inside of
32-byte region [ffff8880a6032d00, ffff8880a6032d20)
The buggy address belongs to the page:
page:ffffea0002980c80 refcount:1 mapcount:0 mapping:ffff8880aa0001c0 index:0xffff8880a6032fc1
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002a3be88 ffffea00029b1908 ffff8880aa0001c0
raw: ffff8880a6032fc1 ffff8880a6032000 000000010000003e 0000000000000000
page dumped because: kasan: bad access detected

Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d56aab26 27-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip set types that do not support for expressions

The bitmap set does not support expressions, so skip it in the
estimation step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0a6a9515 25-Mar-2020 Qian Cai <cai@lca.pw>

netfilter: nf_tables: silence a RCU-list warning in nft_table_lookup()

It is safe to traverse &net->nft.tables with &net->nft.commit_mutex
held using list_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false
positive,

WARNING: suspicious RCU usage
net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by iptables/1384:
#0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables]

Call Trace:
dump_stack+0xa1/0xea
lockdep_rcu_suspicious+0x103/0x10d
nft_table_lookup.part.0+0x116/0x120 [nf_tables]
nf_tables_newtable+0x12c/0x7d0 [nf_tables]
nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink]
nfnetlink_rcv+0x1da/0x210 [nfnetlink]
netlink_unicast+0x306/0x460
netlink_sendmsg+0x44b/0x770
____sys_sendmsg+0x46b/0x4a0
___sys_sendmsg+0x138/0x1a0
__sys_sendmsg+0xb6/0x130
__x64_sys_sendmsg+0x48/0x50
do_syscall_64+0x69/0xf4
entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cfbd1125 23-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add enum nft_flowtable_flags to uapi

Expose the NFT_FLOWTABLE_HW_OFFLOAD flag through uapi.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8c2d45b2 21-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Allow set back-ends to report partial overlaps on insertion

Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it
might indicate that a given element (including intervals) already exists as
such, or that the new element would clash with existing ones.

If identical elements already exist, the front-end is ignoring this without
returning error, in case NLM_F_EXCL is not set. However, if the new element
can't be inserted due to an overlap, we should report this to the user.

To this purpose, allow set back-ends to return -ENOTEMPTY on collision with
existing elements, translate that to -EEXIST, and return that to userspace,
no matter if NLM_F_EXCL was set.
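
The translation described above can be sketched as follows. This is a
userspace illustration, not the actual front-end code; the helper name
demo_insert_result() is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Map the return value of a back-end ->insert() callback to the error
 * reported to userspace.
 */
int demo_insert_result(int backend_err, bool nlm_f_excl)
{
	if (backend_err == -EEXIST) {
		/* Identical element: an error only if NLM_F_EXCL was set. */
		return nlm_f_excl ? -EEXIST : 0;
	}
	if (backend_err == -ENOTEMPTY) {
		/* Partial overlap: always reported, as -EEXIST. */
		return -EEXIST;
	}
	return backend_err;
}
```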

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 475beb9c 18-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_set_elem_expr_destroy() and use it

This patch adds nft_set_elem_expr_destroy() to destroy stateful
expressions in set elements.

This patch also updates the commit path to call this function to invoke
expr->ops->destroy_clone when required.

This is implicitly fixing up a module reference counter leak and
a memory leak in expressions that allocated internal state, e.g.
nft_counter.

Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 772f4e82 17-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix double-free on set expression from the error path

After copying the expression to the set element extension, release the
expression and reset the pointer to avoid a double-free from the error
path.

Fixes: 409444522976 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 65038428 17-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to specify stateful expression in set definition

This patch allows users to specify the stateful expression for the
elements in this set via NFTA_SET_EXPR. This new feature allows you to
turn on counters for all of the elements in this set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0c2a85ed 17-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass context to nft_set_destroy()

The patch that adds support for stateful expressions in set definitions
requires this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c604cc691 17-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: move nft_expr_clone() to nf_tables_api.c

Move the nft_expr_clone() helper function to the core.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 40944452 11-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add elements with stateful expressions

Update nft_add_set_elem() to handle the NFTA_SET_ELEM_EXPR netlink
attribute. This patch allows users to add elements with stateful
expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 795a6d6b 11-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: statify nft_expr_init()

Not exposed anymore to modules, statify this function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a7fc9368 11-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_set_elem_expr_alloc()

Add helper function to create stateful expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7400b063 07-Mar-2020 Stefano Brivio <sbrivio@redhat.com>

nft_set_pipapo: Introduce AVX2-based lookup implementation

If the AVX2 set is available, we can exploit the repetitive
characteristic of this algorithm to provide a fast, vectorised
version by using 256-bit wide AVX2 operations for bucket loads and
bitwise intersections.

In most cases, this implementation consistently outperforms rbtree
set instances despite the fact they are configured to use a given,
single, ranged data type out of the ones used for performance
measurements by the nft_concat_range.sh kselftest.

That script, injecting packets directly on the ingoing device path
with pktgen, reports, averaged over five runs on a single AMD Epyc
7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below.
CONFIG_RETPOLINE was not set here.

Note that this is not a fair comparison over hash and rbtree set
types: non-ranged entries (used to have a reference for hash types)
would be matched faster than this, and matching on a single field
only (which is the case for rbtree) is also significantly faster.

However, it's not possible at the moment to choose this set type
for non-ranged entries, and the current implementation also needs
a few minor adjustments in order to match on less than two fields.

---------------.-----------------------------------.------------.
AMD Epyc 7402  |          baselines, Mpps          | this patch |
 1 thread      |___________________________________|____________|
 3.35GHz       |        |        |        |        |            |
 768KiB L1D$   | netdev |  hash  | rbtree |        |            |
---------------|  hook  |   no   | single |        |   pipapo   |
type   entries |  drop  | ranges | field  | pipapo |    AVX2    |
---------------|--------|--------|--------|--------|------------|
net,port       |        |        |        |        |            |
         1000  |   19.0 |   10.4 |    3.8 |    4.0 | 7.5   +87% |
---------------|--------|--------|--------|--------|------------|
port,net       |        |        |        |        |            |
          100  |   18.8 |   10.3 |    5.8 |    6.3 | 8.1   +29% |
---------------|--------|--------|--------|--------|------------|
net6,port      |        |        |        |        |            |
         1000  |   16.4 |    7.6 |    1.8 |    2.1 | 4.8  +128% |
---------------|--------|--------|--------|--------|------------|
port,proto     |        |        |        |        |            |
        30000  |   19.6 |   11.6 |    3.9 |    0.5 | 2.6  +420% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac  |        |        |        |        |            |
           10  |   16.5 |    5.4 |    4.3 |    3.4 | 4.7   +38% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac, |        |        |        |        |            |
 proto   1000  |   16.5 |    5.7 |    1.9 |    1.4 | 3.6   +26% |
---------------|--------|--------|--------|--------|------------|
net,mac        |        |        |        |        |            |
         1000  |   19.0 |    8.4 |    3.9 |    2.5 | 6.4  +156% |
---------------'--------'--------'--------'--------'------------'

A similar strategy could be easily reused to implement specialised
versions for other SIMD sets, and I plan to post at least a NEON
version at a later time.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 24d19826 18-Feb-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: make all set structs const

They do not need to be writeable anymore.

v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e32a4dc6 18-Feb-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: make sets built-in

Placing nftables set support in an extra module is pointless:

1. nf_tables needs dynamic registration interface for sake of one module
2. nft heavily relies on sets, e.g. even simple rule like
"nft ... tcp dport { 80, 443 }" will not work with _SETS=n.

IOW, either nftables isn't used or both nf_tables and nf_tables_set
modules are needed anyway.

With extra module:
307K net/netfilter/nf_tables.ko
 79K net/netfilter/nf_tables_set.ko

   text    data     bss     dec filename
 146416    3072     545  150033 nf_tables.ko
  35496    1817       0   37313 nf_tables_set.ko

This patch:
373K net/netfilter/nf_tables.ko

 178563    4049     545  183157 nf_tables.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1d305ba4 05-Mar-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix infinite loop when expr is not available

nft will loop forever if the kernel doesn't support an expression:

1. nft_expr_type_get() appends the family specific name to the module list.
2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path.
3. abort path sets ->done to true and calls request_module for the
expression.
4. nfnetlink replays the batch, we end up in nft_expr_type_get() again.
5. nft_expr_type_get attempts to append family-specific name. This
one already exists on the list, so we continue
6. nft_expr_type_get adds the generic expression name to the module
list. -EAGAIN is returned, nfnetlink calls abort path.
7. abort path encounters the family-specific expression which
has 'done' set, so it gets removed.
8. abort path requests the generic expression name, sets done to true.
9. batch is replayed.

If the expression could not be loaded, then we will end up back at 1),
because the family-specific name got removed and the cycle starts again.

Note that userspace can SIGKILL the nft process to stop the cycle, but
the desired behaviour is to return an error after the generic expr name
fails to load the expression.

Fixes: eb014de4fd418 ("netfilter: nf_tables: autoload modules from the abort path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d78008de 03-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: dump NFTA_CHAIN_FLAGS attribute

Missing NFTA_CHAIN_FLAGS netlink attribute when dumping basechain
definitions.

Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2d285f26 02-Mar-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: free flowtable hooks on hook register error

If hook registration fails, the hooks allocated via nft_netdev_hook_alloc
need to be freed.

We can't change the goto label to 'goto 5' -- while it does fix the memleak
it does cause a warning splat from the netfilter core (the hooks were not
registered).

Fixes: 3f0465a9ef02 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
Reported-by: syzbot+a2ff6fa45162a5ed4dd3@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f3a2181e 21-Jan-2020 Stefano Brivio <sbrivio@redhat.com>

netfilter: nf_tables: Support for sets with multiple ranged fields

Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

NFTA_SET_ELEM_KEY: 192.0.2.0 . 22
NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 4
NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 2
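
To make the example concrete, a field-by-field range check over such a
concatenated key might look like the sketch below. It is a userspace
illustration only: demo_concat_in_range() is a hypothetical name, fields
are assumed to be packed back to back in network byte order, and any
padding of fields to 32-bit registers is ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if every field of 'key' lies within the matching field of
 * [start, end]. field_len[] plays the role of the NFTA_SET_FIELD_LEN
 * values. With big-endian fields, memcmp() orders them numerically.
 */
bool demo_concat_in_range(const uint8_t *key, const uint8_t *start,
			  const uint8_t *end, const uint8_t *field_len,
			  size_t field_count)
{
	size_t off = 0;

	for (size_t i = 0; i < field_count; i++) {
		size_t len = field_len[i];

		if (memcmp(key + off, start + off, len) < 0 ||
		    memcmp(key + off, end + off, len) > 0)
			return false;
		off += len;
	}
	return true;
}
```

With field lengths { 4, 2 }, start 192.0.2.0 . 22 and end 192.0.2.42 . 25,
the key 192.0.2.7 . 23 matches, while 192.0.2.7 . 80 and 192.0.2.43 . 22
do not.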

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7b225d0b 21-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute

Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 20a1452c 21-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_setelem_parse_key()

Add helper function to parse the set element key netlink attribute.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN
instead of sizeof(*key) in helper, value can be longer than that;
rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# eb014de4 21-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: autoload modules from the abort path

This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).

In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fully aborting the
transaction) and the commit list is empty.

On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.
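
The two passes can be modelled in userspace roughly as follows. This is a
simplified sketch, not the kernel implementation: the demo_* names and the
fixed-size array are assumptions made for the example, and the actual
request_module() call is reduced to a comment:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_REQS 8

/* Hypothetical model of one pending module request. */
struct demo_module_request {
	char name[64];
	bool done;	/* set once the abort path tried to load it */
};

struct demo_module_list {
	struct demo_module_request reqs[DEMO_MAX_REQS];
	int count;
};

/*
 * Preparation phase. Returns 0 if the module was already tried, so the
 * caller proceeds to the object lookup. Otherwise the request is queued
 * (if not queued yet) and -EAGAIN asks the caller to abort and replay.
 */
int demo_request_module(struct demo_module_list *l, const char *name)
{
	for (int i = 0; i < l->count; i++) {
		if (strcmp(l->reqs[i].name, name) == 0)
			return l->reqs[i].done ? 0 : -EAGAIN;
	}
	if (l->count == DEMO_MAX_REQS ||
	    strlen(name) >= sizeof(l->reqs[0].name))
		return -ENOMEM;

	strcpy(l->reqs[l->count].name, name);
	l->reqs[l->count].done = false;
	l->count++;
	return -EAGAIN;
}

/*
 * Abort path with autoload enabled: attempt every request that is not
 * done yet. Returns the number of load attempts made.
 */
int demo_abort_autoload(struct demo_module_list *l)
{
	int attempts = 0;

	for (int i = 0; i < l->count; i++) {
		if (l->reqs[i].done)
			continue;
		/* the kernel would call request_module() here */
		l->reqs[i].done = true;
		attempts++;
	}
	return attempts;
}
```

The first call for a given name returns -EAGAIN, the abort step marks the
request done, and the replayed call returns 0 whether or not the load
succeeded; the subsequent object lookup decides between success and
-ENOENT.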

This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.

Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 82603549 21-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add __nft_chain_type_get()

This new helper function validates that unknown family and chain type
coming from userspace do not trigger an out-of-bound array access. Bail
out in case __nft_chain_type_get() returns NULL from
nft_chain_parse_hook().
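
The kind of check this helper performs can be sketched as below. The table
dimensions, struct demo_chain_type and demo_chain_type_get() are
hypothetical and only illustrate the bounds-check-then-index pattern:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NPROTO	4	/* hypothetical number of families */
#define DEMO_NTYPES	3	/* hypothetical number of chain types */

struct demo_chain_type {
	const char *name;
};

static const struct demo_chain_type demo_filter = { "filter" };

/* Sparse table: slots with no registered type stay NULL. */
static const struct demo_chain_type *demo_types[DEMO_NPROTO][DEMO_NTYPES] = {
	[1][0] = &demo_filter,
};

/*
 * Both indices come from userspace, so validate them before touching
 * the array. NULL means "unknown family/type" or "nothing registered".
 */
const struct demo_chain_type *demo_chain_type_get(unsigned int family,
						  unsigned int type)
{
	if (family >= DEMO_NPROTO || type >= DEMO_NTYPES)
		return NULL;
	return demo_types[family][type];
}
```

The caller then only has to handle a NULL return, instead of trusting
values taken straight from a netlink message.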

Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types")
Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 335178d5 15-Jan-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix flowtable list del corruption

syzbot reported the following crash:

list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
[..]
Call Trace:
__list_del_entry include/linux/list.h:131 [inline]
list_del_rcu include/linux/rculist.h:148 [inline]
nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
[..]

The commit transaction list has:

NFT_MSG_NEWTABLE
NFT_MSG_NEWFLOWTABLE
NFT_MSG_DELFLOWTABLE
NFT_MSG_DELTABLE

A missing generation check during DELTABLE processing causes it to queue
the DELFLOWTABLE operation a second time, so we corrupt the list here:

case NFT_MSG_DELFLOWTABLE:
list_del_rcu(&nft_trans_flowtable(trans)->list);
nf_tables_flowtable_notify(&trans->ctx,

because we have two different DELFLOWTABLE transactions for the same
flowtable. We then call list_del_rcu() twice for the same flowtable->list.

The object handling seems to suffer from the same bug so add a generation
check too and only queue delete transactions for flowtables/objects that
are still active in the next generation.
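
A simplified userspace model of that check is sketched below. The bit
layout and all demo_* names are assumptions made for the illustration and
do not mirror the kernel's genmask helpers exactly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit: object is inactive in the next generation. */
#define DEMO_GEN_NEXT 0x2u

struct demo_flowtable {
	unsigned int genmask;	/* bit set = inactive in that generation */
	int del_queued;		/* delete transactions referencing it */
};

bool demo_active_next(const struct demo_flowtable *ft)
{
	return !(ft->genmask & DEMO_GEN_NEXT);
}

/* Queue one delete transaction and deactivate the object. */
void demo_queue_delete(struct demo_flowtable *ft)
{
	ft->genmask |= DEMO_GEN_NEXT;
	ft->del_queued++;
}

/*
 * Table deletion: queue deletes only for flowtables that are still
 * active in the next generation. Returns the number queued.
 */
int demo_flush_table(struct demo_flowtable *fts, size_t n)
{
	int queued = 0;

	for (size_t i = 0; i < n; i++) {
		if (!demo_active_next(&fts[i]))
			continue;	/* already deleted in this batch */
		demo_queue_delete(&fts[i]);
		queued++;
	}
	return queued;
}
```

If one flowtable was already deleted earlier in the batch, flushing the
table skips it, so each object ends up with exactly one delete transaction
and list_del_rcu() would run once per object.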

Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
Fixes: 3b49e2e94e6eb ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cd77e75b 16-Jan-2020 Dan Carpenter <dan.carpenter@oracle.com>

netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()

Syzbot detected a leak in nf_tables_parse_netdev_hooks(). If the hook
already exists, then the error handling doesn't free the newest "hook".

Reported-by: syzbot+f9d4095107fc8749c69c@syzkaller.appspotmail.com
Fixes: b75a3e8371bc ("netfilter: nf_tables: allow netdevice to be used only once per flowtable")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9332d27d 16-Jan-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: remove WARN and add NLA_STRING upper limits

This WARN can trigger because some of the names fed to the module
autoload function can be of arbitrary length.

Remove the WARN and add limits for all NLA_STRING attributes.

Reported-by: syzbot+0e63ae76d117ae1c3a01@syzkaller.appspotmail.com
Fixes: 452238e8d5ffd8 ("netfilter: nf_tables: add and use helper for module autoload")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ec7470b8 13-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: store transaction list locally while requesting module

This patch fixes a WARN_ON in nft_set_destroy() due to missing
set reference count drop from the preparation phase. This is triggered
by the module autoload path. Do not exercise the abort path from
nft_request_module() while preparation phase cleaning up is still
pending.

WARNING: CPU: 3 PID: 3456 at net/netfilter/nf_tables_api.c:3740 nft_set_destroy+0x45/0x50 [nf_tables]
[...]
CPU: 3 PID: 3456 Comm: nft Not tainted 5.4.6-arch3-1 #1
RIP: 0010:nft_set_destroy+0x45/0x50 [nf_tables]
Code: e8 30 eb 83 c6 48 8b 85 80 00 00 00 48 8b b8 90 00 00 00 e8 dd 6b d7 c5 48 8b 7d 30 e8 24 dd eb c5 48 89 ef 5d e9 6b c6 e5 c5 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 10 e9 52
RSP: 0018:ffffac4f43e53700 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff99d63a154d80 RCX: 0000000001f88e03
RDX: 0000000001f88c03 RSI: ffff99d6560ef0c0 RDI: ffff99d63a101200
RBP: ffff99d617721de0 R08: 0000000000000000 R09: 0000000000000318
R10: 00000000f0000000 R11: 0000000000000001 R12: ffffffff880fabf0
R13: dead000000000122 R14: dead000000000100 R15: ffff99d63a154d80
FS: 00007ff3dbd5b740(0000) GS:ffff99d6560c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00001cb5de6a9000 CR3: 000000016eb6a004 CR4: 00000000001606e0
Call Trace:
__nf_tables_abort+0x3e3/0x6d0 [nf_tables]
nft_request_module+0x6f/0x110 [nf_tables]
nft_expr_type_request_module+0x28/0x50 [nf_tables]
nf_tables_expr_parse+0x198/0x1f0 [nf_tables]
nft_expr_init+0x3b/0xf0 [nf_tables]
nft_dynset_init+0x1e2/0x410 [nf_tables]
nf_tables_newrule+0x30a/0x930 [nf_tables]
nfnetlink_rcv_batch+0x2a0/0x640 [nfnetlink]
nfnetlink_rcv+0x125/0x171 [nfnetlink]
netlink_unicast+0x179/0x210
netlink_sendmsg+0x208/0x3d0
sock_sendmsg+0x5e/0x60
____sys_sendmsg+0x21b/0x290

Update comment on the code to describe the new behaviour.

Reported-by: Marco Oliverio <marco.oliverio@tanaza.com>
Fixes: 452238e8d5ff ("netfilter: nf_tables: add and use helper for module autoload")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5acab914 03-Jan-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: unbind callbacks from flowtable destroy path

Callback unbinding needs to be done after nf_flow_table_free(),
otherwise entries are not removed from the hardware.

Update nft_unregister_flowtable_net_hooks() to call
nf_unregister_net_hook() instead since the commit/abort paths do not
deal with the callback unbinding anymore.

Add a comment to nft_flowtable_event() to clarify that
flow_offload_netdev_event() already removes the entries before the
callback unbinding.

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>


# c593642c 09-Dec-2019 Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>

treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net


# fd57d0cb 06-Dec-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip module reference count bump on object updates

Use __nft_obj_type_get() instead, otherwise there is a module reference
counter leak.

Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0d2c96af 06-Dec-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init()

Userspace might bogusly send NFT_DATA_VERDICT in several netlink
attributes that assume NFT_DATA_VALUE. Moreover, make sure that error
path invokes nft_data_release() to decrement the reference count on the
chain object.

Fixes: 96518518cc41 ("netfilter: add nftables")
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bffc124b 06-Dec-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END

Only NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_FLAGS make sense for elements
whose NFT_SET_ELEM_INTERVAL_END flag is set on.
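
The rule above could look roughly like this. The attribute indexes, the
flag value and the sketch_* names are assumptions made for illustration
and are not the real uapi definitions.

```c
#include <stdbool.h>

/* Hypothetical attribute indexes and flag; not the real uapi values. */
enum sketch_attr {
	SKETCH_ATTR_KEY,
	SKETCH_ATTR_DATA,
	SKETCH_ATTR_FLAGS,
	SKETCH_ATTR_TIMEOUT,
	SKETCH_ATTR_MAX
};

#define SKETCH_INTERVAL_END 0x1u

/* present[i] tells whether attribute i was sent by userspace. */
static bool sketch_elem_attrs_valid(const bool *present, unsigned int flags)
{
	int i;

	if (!(flags & SKETCH_INTERVAL_END))
		return true;

	/* An interval end element may only carry the key and the flags. */
	for (i = 0; i < SKETCH_ATTR_MAX; i++) {
		if (present[i] && i != SKETCH_ATTR_KEY && i != SKETCH_ATTR_FLAGS)
			return false;
	}
	return true;
}
```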

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ff4bf2f4 15-Nov-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_unregister_flowtable_hook()

Unbind flowtable callback if hook is unregistered.

This patch is implicitly fixing the error path of
nf_tables_newflowtable() and nft_flowtable_event().

Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane")
Reported-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d7c03a9f 15-Nov-2019 wenxu <wenxu@ucloud.cn>

netfilter: nf_tables: check if bind callback fails and unbind if hook registration fails

Undo the callback binding before unregistering the existing hooks. This
should also check for error of the bind setup call.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 63b48c73 14-Nov-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_offload: undo updates if transaction fails

The nft_flow_rule_offload_commit() function might fail after several
successful commands, thus, leaving the hardware filtering policy in
inconsistent state.

This patch adds nft_flow_rule_offload_abort() function which undoes the
updates that have been already processed if one command in this
transaction fails. Hence, the hardware ruleset is left as it was before
this aborted transaction.

The deletion path needs to create the flow_rule object too, in case that
an existing rule needs to be re-added from the abort path.
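
The undo-on-failure idea can be sketched as below. The command
structure and the sketch_* helpers are hypothetical, and undoing in
reverse order is a choice of this sketch rather than a statement about
nft_flow_rule_offload_abort().

```c
#include <stddef.h>

/* Hypothetical command: 'apply' returns 0 on success, 'undo' reverts it. */
struct sketch_cmd {
	int (*apply)(void *hw);
	void (*undo)(void *hw);
};

/* Toy commands operating on an int that stands in for the hardware state. */
static int sketch_add_one(void *hw) { ++*(int *)hw; return 0; }
static void sketch_del_one(void *hw) { --*(int *)hw; }
static int sketch_fail(void *hw) { (void)hw; return -1; }

/* Apply all commands; on the first failure revert the ones already applied. */
static int sketch_commit(const struct sketch_cmd *cmds, size_t n, void *hw)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = cmds[i].apply(hw);
		if (err) {
			/* Undo commands i-1 .. 0, newest first. */
			while (i-- > 0)
				cmds[i].undo(hw);
			return err;
		}
	}
	return 0;
}
```

After a failed sketch_commit() the state pointed to by 'hw' is the same
as before the call.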

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8bb69f3b 11-Nov-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add flowtable offload control plane

This patch adds the NFTA_FLOWTABLE_FLAGS attribute that allows users to
specify the NF_FLOWTABLE_HW_OFFLOAD flag. This patch also adds a new
setup interface for the flowtable type to perform the flowtable offload
block callback configuration.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ed012f6 04-Nov-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bogus EOPNOTSUPP on basechain update

Userspace never includes the NFT_BASE_CHAIN flag, this flag is inferred
from the NFTA_CHAIN_HOOK attribute. The chain update path does not allow
to update flags at this stage, the existing sanity check bogusly hits
EOPNOTSUPP in the basechain case if the offload flag is set on.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9fedd894 02-Nov-2019 Fernando Fernandez Mancera <ffmancera@riseup.net>

netfilter: nf_tables: fix unexpected EOPNOTSUPP error

If the object type doesn't implement an update operation and the user tries to
update it, silently ignore the update operation.

Fixes: aa4095a156b5 ("netfilter: nf_tables: fix possible null-pointer dereference in object update")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b685b534 23-Sep-2019 Paul E. McKenney <paulmck@kernel.org>

net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().

Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netfilter-devel@vger.kernel.org>
Cc: <coreteam@netfilter.org>
Cc: <netdev@vger.kernel.org>


# d54725cd 16-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: support for multiple devices per netdev hook

This patch allows you to register one netdev basechain to multiple
devices. This adds a new NFTA_HOOK_DEVS netlink attribute to specify
the list of netdevices. Basechains store a list of hooks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cb662ac6 16-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: increase maximum devices number per flowtable

Raise the maximum limit of devices per flowtable up to 256. Rename
NFT_FLOWTABLE_DEVICE_MAX to NFT_NETDEVICE_MAX in preparation to reuse
the netdev hook parser for ingress basechain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b75a3e83 16-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow netdevice to be used only once per flowtable

Allow netdevice only once per flowtable, otherwise hit EEXIST.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3f0465a9 16-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables

Use a list of hooks per device instead of an array.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 71a8a63b 16-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_flow_table: move priority to struct nf_flowtable

Hardware offload needs access to the priority field, store this field in
the nf_flowtable object.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9b05b6e1 24-Sep-2019 Laura Garcia Liebana <nevola@gmail.com>

netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush

The deletion of a flowtable after a flush in the same transaction
results in EBUSY. This patch adds an activation and deactivation of
flowtables in order to update the _use_ counter.

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# acab7131 19-Sep-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: allow lookups in dynamic sets

This un-breaks lookups in sets that have the 'dynamic' flag set.
Given this active example configuration:

table filter {
set set1 {
type ipv4_addr
size 64
flags dynamic,timeout
timeout 1m
}

chain input {
type filter hook input priority 0; policy accept;
}
}

... this works:
nft add rule ip filter input add @set1 { ip saddr }

-> whenever rule is triggered, the source ip address is inserted
into the set (if it did not exist).

This won't work:
nft add rule ip filter input ip saddr @set1 counter
Error: Could not process rule: Operation not supported

In other words, we can add entries to the set, but then can't make
matching decision based on that set.

That is just wrong -- all set backends support lookups (else they would
not be very useful).
The failure comes from an explicit rejection in nft_lookup.c.

Looking at the history, it seems like NFT_SET_EVAL used to mean
'set contains expressions' (aka. "is a meter"), for instance something like

nft add rule ip filter input meter example { ip saddr limit rate 10/second }
or
nft add rule ip filter input meter example { ip saddr counter }

The actual meaning of NFT_SET_EVAL however, is
'set can be updated from the packet path'.

'meters' and packet-path insertions into sets, such as
'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c)
and thus require a set backend that provides the ->update() function.

The only set that provides this also is the only one that has the
NFT_SET_EVAL feature flag.

Removing the wrong check makes the above example work.
While at it, also fix the flag check during set instantiation to
allow supported combinations only.

Fixes: 8aeff920dcc9b3f ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ad652f38 16-Sep-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFT_CHAIN_POLICY_UNSET and use it

Default policy is defined as an unsigned 8-bit field, do not use a
negative value to leave it unset, use this new NFT_CHAIN_POLICY_UNSET
instead.
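
A small sketch of why a negative marker does not survive in an unsigned
8-bit field. The sentinel value and the sketch_* names are assumptions
for illustration; the real NFT_CHAIN_POLICY_UNSET may be defined
differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel; assumed here to be the largest 8-bit value. */
#define SKETCH_POLICY_UNSET 0xffu

/* Store 'value' in an 8-bit unsigned field and read it back as an int. */
static int sketch_store_and_load(int value)
{
	uint8_t field = (uint8_t)value;	/* -1 is truncated to 255 */

	return field;
}

/* Compare against an in-range sentinel instead of a negative number. */
static bool sketch_policy_is_unset(uint8_t policy)
{
	return policy == SKETCH_POLICY_UNSET;
}
```

Reading the field back yields 255, not -1, so a test against a negative
value cannot detect the unset state while the explicit sentinel can.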

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 06d392cb 10-Sep-2019 wenxu <wenxu@ucloud.cn>

netfilter: nf_tables_offload: remove rules when the device unregisters

If the net_device unregisters, clean up the offload rules before the
chain is destroyed.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# be2861dc 08-Sep-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_{fwd,dup}_netdev: add offload support

This patch adds support for packet mirroring and redirection. The
nft_fwd_dup_netdev_offload() function configures the flow_action object
for the fwd and the dup actions.

Extend nft_flow_rule_destroy() to release the net_device object when the
flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps
the net_device reference counter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>


# 3474a2c6 02-Sep-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_offload: move indirect flow_block callback logic to core

Add nft_offload_init() and nft_offload_exit() functions to deal with the
init and the exit path of the offload infrastructure.

Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b74ae961 06-Sep-2019 Dan Carpenter <dan.carpenter@oracle.com>

netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handling

The "newobj" is an error pointer so we can't pass it to kfree(). It
doesn't need to be freed so we can remove that and I also renamed the
error label.
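
For readers unfamiliar with error pointers, here is a userspace sketch
of the pattern. The sketch_* helpers only imitate the kernel's
ERR_PTR()/IS_ERR()/PTR_ERR() idea and are not the kernel macros; the
allocator is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the last SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that reports failure through an error pointer. */
static void *sketch_obj_init(int fail)
{
	if (fail)
		return sketch_err_ptr(-ENOMEM);
	return malloc(16);
}

/* Return 0 on success or a negative errno; never free an error pointer. */
static long sketch_update(int fail)
{
	void *newobj = sketch_obj_init(fail);

	if (sketch_is_err(newobj))
		return sketch_ptr_err(newobj);	/* nothing was allocated */

	free(newobj);
	return 0;
}
```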

Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# aa4095a1 04-Sep-2019 Fernando Fernandez Mancera <ffmancera@riseup.net>

netfilter: nf_tables: fix possible null-pointer dereference in object update

Not all objects have an update operation. If the object type doesn't
implement an update operation and the user tries to update it, this will hit
EOPNOTSUPP.

Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d62d0ba9 26-Aug-2019 Fernando Fernandez Mancera <ffmancera@riseup.net>

netfilter: nf_tables: Introduce stateful object update operation

This patch adds the infrastructure needed for the stateful object update
support.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3bc158f8 15-Aug-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: map basechain priority to hardware priority

This patch adds initial support for offloading basechains using the
priority range from 1 to 65535. This is restricting the netfilter
priority range to 16-bit integer since this is what most drivers assume
so far from tc. It should be possible to extend this range of supported
priorities later on once drivers are updated to support 32-bit
integer priorities.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a0a8d10 09-Aug-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use-after-free in failing rule with bound set

If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.

[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[ 24.657858] Mem abort info:
[ 24.660686] ESR = 0x96000004
[ 24.663769] Exception class = DABT (current EL), IL = 32 bits
[ 24.669725] SET = 0, FnV = 0
[ 24.672804] EA = 0, S1PTW = 0
[ 24.675975] Data abort info:
[ 24.678880] ISV = 0, ISS = 0x00000004
[ 24.682743] CM = 0, WnR = 0
[ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[ 24.692207] [0000000000040434] pgd=0000000000000000
[ 24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 24.889414] Call trace:
[ 24.891870] __nf_tables_abort+0x3f0/0x7a0
[ 24.895984] nf_tables_abort+0x20/0x40
[ 24.899750] nfnetlink_rcv_batch+0x17c/0x588
[ 24.904037] nfnetlink_rcv+0x13c/0x190
[ 24.907803] netlink_unicast+0x18c/0x208
[ 24.911742] netlink_sendmsg+0x1b0/0x350
[ 24.915682] sock_sendmsg+0x4c/0x68
[ 24.919185] ___sys_sendmsg+0x288/0x2c8
[ 24.923037] __sys_sendmsg+0x7c/0xd0
[ 24.926628] __arm64_sys_sendmsg+0x2c/0x38
[ 24.930744] el0_svc_common.constprop.0+0x94/0x158
[ 24.935556] el0_svc_handler+0x34/0x90
[ 24.939322] el0_svc+0x8/0xc
[ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---

Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9a32669f 06-Aug-2019 wenxu <wenxu@ucloud.cn>

netfilter: nf_tables_offload: support indr block call

nftables supports the indr-block call. This allows nftables to offload
rules for vlan and tunnel devices.

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14bfb13f 19-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

net: flow_offload: add flow_block structure and use it

This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.

This patch restores the block sharing feature.

Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b717273d 13-Jul-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't fail when updating base chain policy

The following nftables test case fails on nf-next:

tests/shell/run-tests.sh tests/shell/testcases/transactions/0011chain_0

The test case contains:
add chain x y { type filter hook input priority 0; }
add chain x y { policy drop; }

The new test
if (chain->flags ^ flags)
return -EOPNOTSUPP;

triggers here, because chain->flags has NFT_BASE_CHAIN set, but flags
is 0 because no flag attribute was present in the policy update.

Just fetch the current flag settings of a pre-existing chain in case
userspace did not provide any.
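
The fallback can be sketched like this. The flag values and the
sketch_* names are hypothetical; only the XOR test is taken from the
text above.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_BASE_CHAIN 0x1u	/* hypothetical flag values */
#define SKETCH_HW_OFFLOAD 0x2u

/* 'has_flags' is false when userspace sent no flags attribute. */
static int sketch_chain_update(unsigned int chain_flags, bool has_flags,
			       unsigned int new_flags)
{
	/* Without a flags attribute, reuse the flags of the existing chain. */
	unsigned int flags = has_flags ? new_flags : chain_flags;

	if (chain_flags ^ flags)
		return -EOPNOTSUPP;

	return 0;
}
```

A policy-only update therefore compares the chain flags with themselves
and passes, while an explicit attempt to change flags is still rejected.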

Fixes: c9626a2cbdb20 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c9626a2c 09-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add hardware offload support

This patch adds hardware offload support for nftables through the
existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
classifier and the flow rule API. This hardware offload support is
available for the NFPROTO_NETDEV family and the ingress hook.

Each nftables expression has a new ->offload interface, that is used to
populate the flow rule object that is attached to the transaction
object.

There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload
an entire table, including all of its chains.

This patch supports basic metadata (layer 3 and 4 protocol numbers),
5-tuple payload matching and the accept/drop actions; this also includes
basechain hardware offload only.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ef1efd1 05-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: force module load in case select_ops() returns -EAGAIN

nft_meta needs to pull in the nft_meta_bridge module in case that this
is a bridge family rule from the select_ops() path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9cff126f 05-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: __nft_expr_type_get() selects specific family type

In case that there are two types, prefer the family specific extension.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b9c04ae7 05-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_expr_type_request_module()

This helper function makes sure the family specific extension is loaded.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 79ebb5bb 18-Jun-2019 Laura Garcia Liebana <nevola@gmail.com>

netfilter: nf_tables: enable set expiration time for set elements

Currently, the expiration of every element in a set or map
is a read-only parameter generated at kernel side.

This change permits setting an expiration time per element, which
is required, for example, during stateful replication among
several nodes.

This patch handles the NFTA_SET_ELEM_EXPIRATION in order
to configure the expiration parameter per element, or
will use the timeout in the case that the expiration
is not set.

Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 53315ac6 22-May-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: free base chain counters from worker

No need to use synchronize_rcu() here, just swap the two pointers
and have the release occur from work queue after commit has completed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2c82c7e7 30-Apr-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix oops during rule dump

We can oops in nf_tables_fill_rule_info().

It's not possible to fetch previous element in rcu-protected lists
when deletions are not prevented somehow: list_del_rcu poisons
the ->prev pointer value.

Before rcu-conversion this was safe as dump operations did hold
nfnetlink mutex.

Pass previous rule as argument, obtained by keeping a pointer to
the previous rule during traversal.
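
The traversal pattern can be sketched with an ordinary singly linked
list. The types and sketch_* names are hypothetical, and no RCU is
involved here; the point is only that the previous element is
remembered by the loop and never read back through a ->prev pointer.

```c
#include <stddef.h>

struct sketch_rule {
	unsigned long handle;
	struct sketch_rule *next;
};

/*
 * For each rule, record the handle of the rule that precedes it
 * (0 for the first rule). Returns the number of rules visited.
 */
static size_t sketch_dump(const struct sketch_rule *head,
			  unsigned long *pos, size_t max)
{
	const struct sketch_rule *prev = NULL;
	const struct sketch_rule *rule;
	size_t n = 0;

	for (rule = head; rule && n < max; rule = rule->next) {
		pos[n++] = prev ? prev->handle : 0;
		prev = rule;	/* kept by the walker, not fetched from the list */
	}
	return n;
}
```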

Fixes: d9adf22a291883 ("netfilter: nf_tables: use call_rcu in netlink dumps")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# edbd82c5 30-Apr-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix base chain stat rcu_dereference usage

Following splat gets triggered when nfnetlink monitor is running while
xtables-nft selftests are running:

net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
other info that might help us debug this:

1 lock held by xtables-nft-mul/27006:
#0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
Call Trace:
nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
nf_tables_chain_notify+0xf8/0x1a0
nf_tables_commit+0x165c/0x1740

nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
or from the transaction path if a userspace process subscribed to nftables
notifications.

In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
hold transaction mutex so the pointer can be NULLed right after the check.
Just unconditionally fetch the value, then have the helper return
immediately if it's NULL.

In the notification case we don't hold the rcu read lock, but updates are
prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
aware of this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 66293c46 12-Apr-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: delay chain policy update until transaction is complete

When we process a long ruleset of the form

chain input {
type filter hook input priority filter; policy drop;
...
}

Then the base chain gets registered early on, we then continue to
process/validate the next messages coming in the same transaction.

Problem is that if the base chain policy is 'drop', it will take effect
immediately, which causes all traffic to get blocked until the
transaction completes or is aborted.

Fix this by deferring the policy until the transaction has been
processed and all of the rules have been flagged as active.

Reported-by: Jann Haber <jann.haber@selfnet.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
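
The wrapper arrangement described above can be sketched in userspace as follows. This is a minimal illustration and not the kernel code: the demo_* names and the reduced attribute struct are assumptions made for the example, while the flag values mirror include/uapi/linux/netlink.h.

```c
#include <stdint.h>

/* Flag bits carried in the upper bits of nla_type. */
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Hypothetical stand-in for struct nlattr: header fields only. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Old behaviour: store the attribute type without any flag. */
static void demo_nest_start_noflag(struct demo_nlattr *nla, int attrtype)
{
	nla->nla_len = sizeof(*nla);
	nla->nla_type = (uint16_t)attrtype;
}

/* New behaviour: thin wrapper that ORs in NLA_F_NESTED. */
static void demo_nest_start(struct demo_nlattr *nla, int attrtype)
{
	demo_nest_start_noflag(nla, attrtype | NLA_F_NESTED);
}

/* What a flag-aware accessor returns: the type with the flags masked out. */
static int demo_nla_type(const struct demo_nlattr *nla)
{
	return nla->nla_type & NLA_TYPE_MASK;
}
```

A parser that compares nla_type directly against the attribute number stops matching once the flag is set, whereas one that goes through the masking accessor keeps working. That is why the flag could not simply be added everywhere.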

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33d1c018 05-Apr-2019 Dan Carpenter <dan.carpenter@oracle.com>

netfilter: nf_tables: prevent shift wrap in nft_chain_parse_hook()

I believe that "hook->num" can be up to UINT_MAX. Shifting by more than
31 bits is undefined in C, but in practice it would lead to shift
wrapping. That would lead to an array overflow in nf_tables_addchain():

ops->hook = hook.type->hooks[ops->hooknum];
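
A hedged sketch of the kind of guard that avoids the wrap: bounds-check the hook number before using it as a shift count. DEMO_MAX_HOOKS and demo_check_hook() are hypothetical names chosen for illustration; the actual patch may test against a different constant.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_HOOKS 8	/* hypothetical upper bound on hook numbers */

/* Returns 0 if hook_num is in range and enabled in hook_mask. */
static int demo_check_hook(uint32_t hook_num, uint32_t hook_mask)
{
	/* Range check first, so the shift count below is always < 32. */
	if (hook_num >= DEMO_MAX_HOOKS)
		return -EOPNOTSUPP;
	if (!(hook_mask & (1u << hook_num)))
		return -EOPNOTSUPP;
	return 0;
}
```

Without the first test, a value such as 33 may behave like a shift by 1 on common hardware, pass the mask test, and then be used as an out-of-range array index.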

Fixes: fe19c04ca137 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3b0a081d 04-Apr-2019 Florian Westphal <fw@strlen.de>

netfilter: make two functions static

They have no external callers anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c1deb065 27-Mar-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: merge route type into core

This is very little code, so it really doesn't make sense to have extra
modules or even a kconfig knob for this.

Merge them and make functionality available unconditionally.
The merge makes inet family route support trivial, so add it
as well here.

Before:
   text    data     bss     dec     hex filename
    835     832       0    1667     683 nft_chain_route_ipv4.ko
    870     832       0    1702     6a6 nft_chain_route_ipv6.ko
 111568    2556     529  114653   1bfdd nf_tables.ko

After:
   text    data     bss     dec     hex filename
 113133    2556     529  116218   1c5fa nf_tables.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f7e840ee 17-Mar-2019 Colin Ian King <colin.king@canonical.com>

netfilter: nf_tables: remove unused parameter ctx

Function nf_tables_set_desc_parse parameter ctx is not being used
so remove it as it is redundant.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3b15d09f 27-Feb-2019 Li RongQing <lirongqing@baidu.com>

time: Introduce jiffies64_to_msecs()

There is a similar helper in net/netfilter/nf_tables_api.c; this may
become a common request someday, so move it to time.c.
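
For illustration, a userspace sketch of such a conversion is shown below. It assumes the tick rate is passed as a runtime parameter (in the kernel, HZ is a build-time constant), and the demo_ names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MSEC_PER_SEC 1000ULL

/*
 * Convert a 64-bit jiffies count to milliseconds for a given tick rate.
 * hz must be non-zero. Sketch only: the generic branch can overflow for
 * very large j.
 */
static uint64_t demo_jiffies64_to_msecs(uint64_t j, uint64_t hz)
{
	/* Fast path: one tick is a whole number of milliseconds. */
	if (hz <= DEMO_MSEC_PER_SEC && DEMO_MSEC_PER_SEC % hz == 0)
		return (DEMO_MSEC_PER_SEC / hz) * j;
	/* Generic path: scale, then divide (truncating). */
	return j * DEMO_MSEC_PER_SEC / hz;
}
```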

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8f0db018 01-Apr-2019 NeilBrown <neilb@suse.com>

rhashtable: use bit_spin_locks to protect hash bucket.

This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
- no need to allocate a separate array of locks.
- no need to have a configuration option to guide the
choice of the size of this array
- locking cost is often a single test-and-set in a cache line
that will have to be loaded anyway. When inserting at, or removing
from, the head of the chain, the unlock is free - writing the new
address in the bucket head implicitly clears the lock bit.
For __rhashtable_insert_fast() we ensure this always happens
when adding a new key.
- even when locking costs 2 updates (lock and unlock), they are
in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend for the same lock, one CPU can
easily be starved. This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.

To enhance type checking, a new struct is introduced to represent the
pointer plus lock-bit that is stored in the bucket-table. This is
"struct rhash_lock_head" and is empty. A pointer to this needs to be
cast to either an unsigned long, or a "struct rhash_head *" to be
useful. Variables of this type are most often called "bkt".

Previously "pprev" would sometimes point to a bucket, and sometimes a
->next pointer in an rhash_head. As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'bkt' is used, together with correct locking protocol.
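
The idea of keeping the lock bit inside the bucket pointer can be sketched with C11 atomics. This is a simplified userspace model and not the rhashtable implementation: the demo_* names are hypothetical, there is no RCU and no fairness handling, and the lock bit position simply follows the BIT(1) mentioned above.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_LOCK_BIT ((uintptr_t)1 << 1)	/* bit reserved inside the pointer */

struct demo_node {
	struct demo_node *next;
	int key;
};

/* Bucket word: pointer to the first node, plus the lock bit. */
typedef atomic_uintptr_t demo_bucket;

static void demo_bucket_lock(demo_bucket *bkt)
{
	/* Spin until this caller is the one that flipped the bit from 0 to 1. */
	while (atomic_fetch_or_explicit(bkt, DEMO_LOCK_BIT,
					memory_order_acquire) & DEMO_LOCK_BIT)
		;
}

static void demo_bucket_unlock(demo_bucket *bkt)
{
	atomic_fetch_and_explicit(bkt, ~DEMO_LOCK_BIT, memory_order_release);
}

/* Head of the chain with the lock bit masked out. */
static struct demo_node *demo_bucket_head(demo_bucket *bkt)
{
	return (struct demo_node *)(atomic_load(bkt) & ~DEMO_LOCK_BIT);
}

/*
 * Insert at the head while holding the lock. Storing the new, aligned
 * pointer also clears the lock bit, so no separate unlock is needed.
 */
static void demo_insert_head_unlock(demo_bucket *bkt, struct demo_node *n)
{
	n->next = demo_bucket_head(bkt);
	atomic_store_explicit(bkt, (uintptr_t)n, memory_order_release);
}
```

The sketch relies on nodes being at least 4-byte aligned, so that bit 1 of a valid node address is always zero.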

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b25a31bf 18-Mar-2019 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: add missing ->release_ops() in error path of newrule()

->release_ops() callback releases resources and this is used in error path.
If nf_tables_newrule() fails after ->select_ops(), it should release
resources, but it cannot call ->destroy() because that should only be
called after ->init().
At this point, ->release_ops() should be used for releasing resources.

Test commands:
modprobe -rv xt_tcpudp
iptables-nft -I INPUT -m tcp <-- error command
lsmod

Result:
Module Size Used by
xt_tcpudp 20480 2 <-- it should be 0

Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b8b27498 07-Mar-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: return immediately on empty commit

When running 'nft flush ruleset' while no rules exist, we will increment
the generation counter and announce a new genid to userspace, yet
nothing had changed in the first place.
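
A minimal sketch of the early return, assuming a hypothetical per-namespace struct that tracks the number of pending transactions (the real code inspects its commit list instead):

```c
/* Hypothetical model of per-netns state: generation id plus pending work. */
struct demo_net {
	unsigned int genid;
	unsigned int pending;
};

/* Returns 1 if a new generation was announced, 0 if there was nothing to do. */
static int demo_commit(struct demo_net *net)
{
	if (net->pending == 0)
		return 0;	/* empty transaction: leave genid untouched */

	net->pending = 0;
	net->genid++;		/* only bump the generation on real changes */
	return 1;
}
```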

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3f3a390d 11-Mar-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use-after-free in dynamic operations

Smatch reports:

net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy()
error: dereferencing freed memory 'expr->ops'

net/netfilter/nf_tables_api.c
2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
2163 struct nft_expr *expr)
2164 {
2165 if (expr->ops->destroy)
2166 expr->ops->destroy(ctx, expr);
^^^^
--> 2167 module_put(expr->ops->type->owner);
^^^^^^^^^
2168 }

Smatch says there are three functions which free expr->ops.
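
One way to avoid this class of bug is to read what is needed from expr->ops before calling the destructor. The sketch below models that with hypothetical demo_* types; the refs field stands in for the owning module's reference count and is an assumption of the example.

```c
#include <stdlib.h>

struct demo_type {
	int refs;		/* stands in for the owning module's refcount */
};

struct demo_expr;

struct demo_ops {
	struct demo_type *type;
	void (*destroy)(struct demo_expr *expr);
};

struct demo_expr {
	struct demo_ops *ops;
};

/* A destructor that frees dynamically allocated ops. */
static void demo_destroy_dynamic(struct demo_expr *expr)
{
	free(expr->ops);
	expr->ops = NULL;
}

static void demo_expr_release(struct demo_expr *expr)
{
	/* Cache the type pointer: ->destroy() may free expr->ops. */
	struct demo_type *type = expr->ops->type;

	if (expr->ops->destroy)
		expr->ops->destroy(expr);
	type->refs--;		/* analogous to module_put(type->owner) */
}
```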

Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 273fe3f1 08-Mar-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bogus EBUSY when deleting set after flush

Set deletion after flush coming in the same batch results in EBUSY. Add
set use counter to track the number of references to this set from
rules. We cannot rely on the list of bindings for this since such list
is still populated from the preparation phase.

Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 40ba1d9b 07-Mar-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix set double-free in abort path

The abort path can cause a double-free of an anonymous set.
Added-and-to-be-aborted rule looks like this:

udp dport { 137, 138 } drop

The to-be-aborted transaction list looks like this:

newset
newsetelem
newsetelem
rule

This gets walked in reverse order, so first pass disables the rule, the
set elements, then the set.

After synchronize_rcu(), we then destroy those in same order: rule, set
element, set element, newset.

Problem is that the anonymous set has already been bound to the rule, so
the rule (lookup expression destructor) already frees the set, which then
causes a use-after-free when trying to delete the elements from this set,
and then tries to free the set again when handling the newset expression.

Rule releases the bound set in first place from the abort path, this
causes the use-after-free on set element removal when undoing the new
element transactions. To handle this, skip new element transaction if
set is bound from the abort path.

This still causes the use-after-free on set element removal. To
handle this, remove transaction from the list when the set is already
bound.

Joint work with Florian Westphal.

Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b8e20400 13-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_compat: use .release_ops and remove list of extension

Add .release_ops, which is called in case of an error at a later stage in
the expression initialization path, i.e. .select_ops() has already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, i.e. release the dynamic operations for
this extension.

Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 23b7ca4f 14-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix flush after rule deletion in the same batch

Flush after rule deletion bogusly hits -ENOENT. Skip rules that have
already been deleted from nft_delrule_by_chain(), which is always called
from the flush path.

Fixes: cf9dc09d0949 ("netfilter: nf_tables: fix missing rules flushing per table")
Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f6ac8585 02-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: unbind set in rule from commit path

Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.

This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.

Moreover, this patch adds the unbind step to deliver the event from the
commit path. This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.

This patch removes the assumption that both activate and deactivate
callbacks need to be provided.

Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0604628b 29-Jan-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: add NFTA_RULE_POSITION_ID to nla_policy

Fixes: 75dd48e2e420a ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 75dd48e2 14-Jan-2019 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Support RULE_ID reference in new rule

To allow for a batch to contain rules in arbitrary ordering, introduce
NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION
but contains the ID of another rule within the same batch. This helps
iptables-nft-restore handling dumps with mixed insert/append commands
correctly.

Note that NFTA_RULE_POSITION takes precedence over
NFTA_RULE_POSITION_ID, so if the former is present, the latter is
ignored.
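
The precedence rule can be modelled as below. Attributes are represented as optional pointers (NULL meaning "not present"), and the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_pos_source {
	DEMO_POS_NONE,		/* neither attribute given */
	DEMO_POS_HANDLE,	/* use NFTA_RULE_POSITION */
	DEMO_POS_ID,		/* use NFTA_RULE_POSITION_ID */
};

/* Decide which attribute determines where the new rule is placed. */
static enum demo_pos_source demo_pick_position(const uint64_t *position,
					       const uint32_t *position_id)
{
	if (position)
		return DEMO_POS_HANDLE;	/* takes precedence when both are set */
	if (position_id)
		return DEMO_POS_ID;
	return DEMO_POS_NONE;
}
```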

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4d44175a 08-Jan-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: handle nft_object lookups via rhltable

Instead of linear search, use rhlist interface to look up the objects.
This fixes rulesets with thousands of named objects (quota, counters and
the like).

We only use a single table for this and consider the address of the
table we're doing the lookup in as a part of the key.

This reduces restore time of a sample ruleset with ~20k named counters
from 37 seconds to 0.8 seconds.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d152159b 08-Jan-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: prepare nft_object for lookups via hashtable

Add a 'key' structure for object, so we can look them up by name + table
combination (the name can be the same in each table).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 715849ab 08-Jan-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: selective rule dump needs table to be specified

Table needs to be specified for selective rule dumps per chain.

Fixes: 241faeceb849c ("netfilter: nf_tables: Speed up selective rule dumps")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b91d9036 04-Jan-2019 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: fix leaking object reference count

There is no code that decreases the reference count of stateful objects
in the error path of nft_add_set_elem(). This causes a leak of the
reference count of stateful objects.

Test commands:
$nft add table ip filter
$nft add counter ip filter c1
$nft add map ip filter m1 { type ipv4_addr : counter \;}
$nft add element ip filter m1 { 1 : c1 }
$nft add element ip filter m1 { 1 : c1 }
$nft delete element ip filter m1 { 1 }
$nft delete counter ip filter c1

Result:
Error: Could not process rule: Device or resource busy
delete counter ip filter c1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At the second 'nft add element ip filter m1 { 1 : c1 }', the reference
count of 'c1' is increased, then it tries to insert into 'm1'. But
'm1' already has the same element, so it returns -EEXIST.
However, it doesn't decrease the reference count of 'c1' in the error path.
Due to the leaked reference count, 'c1' can't be
removed by 'nft delete counter ip filter c1'.
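
The pattern behind the fix can be sketched as follows: take the reference, attempt the insert, and drop the reference again if the insert fails. The tiny fixed-size map and all demo_* names are assumptions for illustration only.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAP_SLOTS 4

struct demo_obj {
	int use;			/* reference count of the stateful object */
};

struct demo_map {
	int keys[DEMO_MAP_SLOTS];
	struct demo_obj *objs[DEMO_MAP_SLOTS];
	size_t count;
};

/* Insert key -> obj; fails with -EEXIST if the key is already present. */
static int demo_map_insert(struct demo_map *m, int key, struct demo_obj *obj)
{
	for (size_t i = 0; i < m->count; i++)
		if (m->keys[i] == key)
			return -EEXIST;
	if (m->count == DEMO_MAP_SLOTS)
		return -ENOSPC;
	m->keys[m->count] = key;
	m->objs[m->count] = obj;
	m->count++;
	return 0;
}

static int demo_add_elem(struct demo_map *m, int key, struct demo_obj *obj)
{
	int err;

	obj->use++;			/* take the reference up front */
	err = demo_map_insert(m, key, obj);
	if (err < 0)
		obj->use--;		/* error path: give the reference back */
	return err;
}
```

With the decrement in place, adding the same element twice leaves the use count at 1, so deleting the element later allows the object to be removed.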

Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 310529e6 30-Dec-2018 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Fix for endless loop when dumping ruleset

__nf_tables_dump_rules() stores the current idx value into cb->args[0]
before returning to caller. With multiple chains present, cb->args[0] is
therefore updated after each chain's rules have been traversed. This
though causes the final nf_tables_dump_rules() run (which should return
an skb->len of zero since no rules are left to dump) to continue dumping
rules for each but the first chain. Fix this by moving the cb->args[0]
update to nf_tables_dump_rules().

With no final action to be performed anymore in
__nf_tables_dump_rules(), drop 'out_unfinished' jump label and 'rc'
variable - instead return the appropriate value directly.

Fixes: 241faeceb849c ("netfilter: nf_tables: Speed up selective rule dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# eb895086 20-Dec-2018 Kangjie Lu <kjlu@umn.edu>

netfilter: nf_tables: fix a missing check of nla_put_failure

nla_nest_start() may fail. The fix checks its return value and goes
to nla_put_failure if it fails.

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 241faece 12-Dec-2018 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Speed up selective rule dumps

If just a table name was given, nf_tables_dump_rules() continued over
the list of tables even after a match was found. The simple fix is to
exit the loop if it reached the bottom and ctx->table was not NULL.

When iterating over the table's chains, the same problem as above
existed. But worse than that, if a chain name was given the hash table
wasn't used to find the corresponding chain. Fix this by introducing a
helper function iterating over a chain's rules (and taking care of the
cb->args handling), then introduce a shortcut to it if a chain name was
given.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4c05ec47 26-Nov-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()

basechain->stats is rcu protected data which is updated from
nft_chain_stats_replace(). This function is executed from the commit
phase which holds the pernet nf_tables commit mutex - not the global
nfnetlink subsystem mutex.

Test commands to reproduce the problem are:
%iptables-nft -I INPUT
%iptables-nft -Z
%iptables-nft -Z

This patch uses RCU calls to handle basechain->stats updates to fix a
splat that looks like:

[89279.358755] =============================
[89279.363656] WARNING: suspicious RCU usage
[89279.368458] 4.20.0-rc2+ #44 Tainted: G W L
[89279.374661] -----------------------------
[89279.379542] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage!
[...]
[89279.406556] 1 lock held by iptables-nft/5225:
[89279.411728] #0: 00000000bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables]
[89279.424022] stack backtrace:
[89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: G W L 4.20.0-rc2+ #44
[89279.430135] Call Trace:
[89279.430135] dump_stack+0xc9/0x16b
[89279.430135] ? show_regs_print_info+0x5/0x5
[89279.430135] ? lockdep_rcu_suspicious+0x117/0x160
[89279.430135] nft_chain_commit_update+0x4ea/0x640 [nf_tables]
[89279.430135] ? sched_clock_local+0xd4/0x140
[89279.430135] ? check_flags.part.35+0x440/0x440
[89279.430135] ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables]
[89279.430135] ? sched_clock_cpu+0x126/0x170
[89279.430135] ? find_held_lock+0x39/0x1c0
[89279.430135] ? hlock_class+0x140/0x140
[89279.430135] ? is_bpf_text_address+0x5/0xf0
[89279.430135] ? check_flags.part.35+0x440/0x440
[89279.430135] ? __lock_is_held+0xb4/0x140
[89279.430135] nf_tables_commit+0x2555/0x39c0 [nf_tables]

Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ca089878 27-Nov-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: deactivate expressions in rule replecement routine

There is no expression deactivation call from the rule replacement path,
hence, chain counter is not decremented. A few steps to reproduce the
problem:

%nft add table ip filter
%nft add chain ip filter c1
%nft add chain ip filter c2
%nft add rule ip filter c1 jump c2
%nft replace rule ip filter c1 handle 3 accept
%nft flush ruleset

<jump c2> expression means immediate NFT_JUMP to chain c2.
Reference count of chain c2 is increased when the rule is added.

When rule is deleted or replaced, the reference counter of c2 should be
decreased via nft_rule_expr_deactivate() which calls
nft_immediate_deactivate().
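
A simplified model of the accounting, with hypothetical demo_* names: a rule that jumps to a chain holds one reference on it, and replacing the rule has to drop that reference just as deleting it would.

```c
#include <stddef.h>

struct demo_chain {
	int use;			/* number of rules jumping to this chain */
};

struct demo_rule {
	struct demo_chain *jump_target;	/* NULL for e.g. a plain accept */
};

static void demo_rule_activate(struct demo_rule *r)
{
	if (r->jump_target)
		r->jump_target->use++;
}

static void demo_rule_deactivate(struct demo_rule *r)
{
	if (r->jump_target)
		r->jump_target->use--;
}

/* Replacement must deactivate the old rule, not only activate the new one. */
static void demo_rule_replace(struct demo_rule *old, struct demo_rule *repl)
{
	demo_rule_deactivate(old);
	demo_rule_activate(repl);
}

/* A chain may only be destroyed once nothing references it. */
static int demo_chain_can_destroy(const struct demo_chain *c)
{
	return c->use == 0;
}
```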

Splat looks like:
[ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[ 214.398983] Modules linked in: nf_tables nfnetlink
[ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
[ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
[ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
[ 214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
[ 214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
[ 214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
[ 214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
[ 214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
[ 214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
[ 214.398983] FS: 0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
[ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
[ 214.398983] Call Trace:
[ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
[ 214.398983] ? __kasan_slab_free+0x145/0x180
[ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
[ 214.398983] ? kfree+0xdb/0x280
[ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
[ ... ]

Fixes: bb7b40aecbf7 ("netfilter: nf_tables: bogus EBUSY in chain deletions")
Reported-by: Christoph Anton Mitterer <calestyo@scientia.net>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 29e38801 12-Nov-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix use-after-free when deleting compat expressions

nft_compat ops do not have static storage duration, unlike all other
expressions.

When nf_tables_expr_destroy() returns, expr->ops might have been
free'd already, so we need to store next address before calling
expression destructor.
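
The ordering constraint can be illustrated with a small loop over expressions whose size is recorded in their ops. The types and names below are hypothetical and the expressions are laid out in a plain array for simplicity; the point is only that the next address is computed before the destructor runs.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_ops {
	size_t size;			/* size of one expression in bytes */
};

struct demo_expr {
	struct demo_ops *ops;
};

/* The next expression starts ops->size bytes after the current one. */
static struct demo_expr *demo_expr_next(struct demo_expr *expr)
{
	return (struct demo_expr *)((char *)expr + expr->ops->size);
}

/* Destructor that frees dynamically allocated ops. */
static void demo_expr_free(struct demo_expr *expr)
{
	free(expr->ops);
	expr->ops = NULL;
}

/* Destroy all expressions in [first, last); returns how many were destroyed. */
static int demo_rule_destroy(struct demo_expr *first, struct demo_expr *last)
{
	struct demo_expr *expr = first, *next;
	int n = 0;

	while (expr != last) {
		next = demo_expr_next(expr);	/* must precede the destructor */
		demo_expr_free(expr);
		expr = next;
		n++;
	}
	return n;
}
```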

For same reason, we can't deref match pointer after nft_xt_put().

This can be easily reproduced by adding msleep() before
nft_match_destroy() returns.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 447750f2 03-Nov-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't use position attribute on rule replacement

It's possible to set both HANDLE and POSITION when replacing a rule.
In this case, the rule at POSITION gets replaced using the
userspace-provided handle. Rule handles are supposed to be generated
by the kernel only.

Duplicate handles should be harmless, however better disable this "feature"
by only checking for the POSITION attribute on insert operations.

Fixes: 5e94846686d0 ("netfilter: nf_tables: add insert operation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0fb39bbe 31-Oct-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't skip inactive chains during update

There is no synchronization between packet path and the configuration plane.

The packet path uses two arrays with rules, one contains the current (active)
generation. The other either contains the last (obsolete) generation or
the future one.

Consider:
cpu1                                    cpu2
                                        nft_do_chain(c);
delete c
net->gen++;
                                        genbit = !!net->gen;
                                        rules = c->rg[genbit];

cpu1 ignores c when updating if c is not active anymore in the new
generation.

On cpu2, we now use rules from wrong generation, as c->rg[old]
contains the rules matching 'c' whereas c->rg[new] was not updated and
can even point to rules that have been free'd already, causing a crash.

To fix this, make sure that the 'current' and the 'next' generation are
identical for chains that are going away so that c->rg[new] will just
use the matching rules even if genbit was incremented already.
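
A rough model of the two-generation scheme, under the assumption that a chain keeps one rule array per generation bit. The struct and function names are hypothetical, and passing NULL for the new rules stands for a chain that receives no update, such as one being deleted.

```c
#include <stddef.h>

struct demo_chain {
	const int *rules_gen[2];	/* one rule array per generation bit */
};

/*
 * Prepare the slot for the next generation. next_genbit must be 0 or 1.
 * If there are no new rules, mirror the current array so that a packet
 * path reader sees valid rules whichever generation bit it samples.
 */
static void demo_commit_chain(struct demo_chain *c, unsigned int next_genbit,
			      const int *new_rules)
{
	unsigned int cur = !next_genbit;

	if (new_rules == NULL) {
		c->rules_gen[next_genbit] = c->rules_gen[cur];
		return;
	}
	c->rules_gen[next_genbit] = new_rules;
}

/* Packet path: pick the rule array for the sampled generation bit. */
static const int *demo_do_chain(const struct demo_chain *c, unsigned int genbit)
{
	return c->rules_gen[genbit];
}
```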

Fixes: 0cbc06b3faba7 ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b7f1a16d 01-Oct-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine

When a device is unregistered, the flowtable flush routine is called
by notifier_call(nf_tables_flowtable_event), and the exit callback of the
nftables pernet_operation(nf_tables_exit_net) also has a flowtable flush
routine. But when a network namespace is destroyed, both notifier_call
and pernet_operation are called. Hence the flowtable flush routine in
pernet_operation is unnecessary.

test commands:
%ip netns add vm1
%ip netns exec vm1 nft add table ip filter
%ip netns exec vm1 nft add flowtable ip filter w \
{ hook ingress priority 0\; devices = { lo }\; }
%ip netns del vm1

splat looks like:
[ 265.187019] WARNING: CPU: 0 PID: 87 at net/netfilter/core.c:309 nf_hook_entry_head+0xc7/0xf0
[ 265.187112] Modules linked in: nf_flow_table_ipv4 nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables
[ 265.187390] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc3+ #5
[ 265.187453] Workqueue: netns cleanup_net
[ 265.187514] RIP: 0010:nf_hook_entry_head+0xc7/0xf0
[ 265.187546] Code: 8d 81 68 03 00 00 5b c3 89 d0 83 fa 04 48 8d 84 c7 e8 11 00 00 76 81 0f 0b 31 c0 e9 78 ff ff ff 0f 0b 48 83 c4 08 31 c0 5b c3 <0f> 0b 31 c0 e9 65 ff ff ff 0f 0b 31 c0 e9 5c ff ff ff 48 89 0c 24
[ 265.187573] RSP: 0018:ffff88011546f098 EFLAGS: 00010246
[ 265.187624] RAX: ffffffff8d90e135 RBX: 1ffff10022a8de1c RCX: 0000000000000000
[ 265.187645] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffff880116298040
[ 265.187645] RBP: ffff88010ea4c1a8 R08: 0000000000000000 R09: 0000000000000000
[ 265.187645] R10: ffff88011546f1d8 R11: ffffed0022c532c1 R12: ffff88010ea4c1d0
[ 265.187645] R13: 0000000000000005 R14: dffffc0000000000 R15: ffff88010ea4c1c4
[ 265.187645] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[ 265.187645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 265.187645] CR2: 00007fdfb8d00000 CR3: 0000000057a16000 CR4: 00000000001006f0
[ 265.187645] Call Trace:
[ 265.187645] __nf_unregister_net_hook+0xca/0x5d0
[ 265.187645] ? nf_hook_entries_free.part.3+0x80/0x80
[ 265.187645] ? save_trace+0x300/0x300
[ 265.187645] nf_unregister_net_hooks+0x2e/0x40
[ 265.187645] nf_tables_exit_net+0x479/0x1340 [nf_tables]
[ 265.187645] ? find_held_lock+0x39/0x1c0
[ 265.187645] ? nf_tables_abort+0x30/0x30 [nf_tables]
[ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0
[ 265.187645] ? trace_hardirqs_on+0x93/0x210
[ 265.187645] ? __bpf_trace_preemptirq_template+0x10/0x10
[ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0
[ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0
[ 265.187645] ? __mutex_unlock_slowpath+0x17f/0x740
[ 265.187645] ? wait_for_completion+0x710/0x710
[ 265.187645] ? bucket_table_free+0xb2/0x1f0
[ 265.187645] ? nested_table_free+0x130/0x130
[ 265.187645] ? __lock_is_held+0xb4/0x140
[ 265.187645] ops_exit_list.isra.10+0x94/0x140
[ 265.187645] cleanup_net+0x45b/0x900
[ ... ]

This WARNING means that hook unregistration failed because
all flowtable hooks were already unregistered by notifier_call.

The network namespace exit routine guarantees that all devices will be
unregistered first. Then the other exit callbacks of pernet_operations
are called, so removing the flowtable flush routine in the exit callback of
pernet_operation(nf_tables_exit_net) doesn't leak flowtables.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fa5950e4 04-Sep-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: avoid BUG_ON usage

None of these spots really needs to crash the kernel.
In one or two cases we can just report an error to userspace; in the other
cases we can just use WARN_ON (and leak memory instead).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0935d558 29-Aug-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: asynchronous release

Release the committed transaction log from a work queue, moving
expensive synchronize_rcu out of the locked section and providing
opportunity to batch this.

On my test machine this cuts runtime of nft-test.py in half.
Based on earlier patch from Pablo Neira Ayuso.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0ef235c7 30-Aug-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: warn when expr implements only one of activate/deactivate

->destroy is only allowed to free data, or do other cleanups that do not
have side effects on other state, such as visibility to other netlink
requests.

Such things need to be done in ->deactivate.
As a transaction can fail, we need to make sure we can undo such
operations, therefore ->activate() has to be provided too.

So print a warning and refuse registration if expr->ops provides
only one of the two operations.
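
The registration-time check can be sketched as below. The ops struct is reduced to the two callbacks, and the names and the -EINVAL return value are assumptions of this example.

```c
#include <errno.h>
#include <stddef.h>

struct demo_expr_ops {
	void (*activate)(void);
	void (*deactivate)(void);
};

/* Placeholder callback used to populate the ops in examples. */
static void demo_noop(void)
{
}

/* Accept ops that provide both callbacks or neither; reject a lone one. */
static int demo_expr_check_ops(const struct demo_expr_ops *ops)
{
	if (!ops->activate != !ops->deactivate)
		return -EINVAL;
	return 0;
}
```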

v2: fix nft_expr_check_ops to not repeat same check twice (Jones Desougi)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cd5125d8 29-Aug-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: split set destruction in deactivate and destroy phase

Splits unbind_set into destroy_set and unbinding operation.

Unbinding removes set from lists (so new transaction would not
find it anymore) but keeps memory allocated (so packet path continues
to work).

Rebind function is added to allow unrolling in case transaction
that wants to remove set is aborted.

Destroy function is added to free the memory, but this could occur
outside of transaction in the future.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7acfda53 25-Aug-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: release chain in flushing set

When an element of a verdict map is deleted, the delete routine should
release the chain. However, the routine that flushes the elements of a
verdict map doesn't release the chain.

test commands:
%nft add table ip filter
%nft add chain ip filter c1
%nft add map ip filter map1 { type ipv4_addr : verdict \; }
%nft add element ip filter map1 { 1 : jump c1 }
%nft flush map ip filter map1
%nft flush ruleset

splat looks like:
[ 4895.170899] kernel BUG at net/netfilter/nf_tables_api.c:1415!
[ 4895.178114] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 4895.178880] CPU: 0 PID: 1670 Comm: nft Not tainted 4.18.0+ #55
[ 4895.178880] RIP: 0010:nf_tables_chain_destroy.isra.28+0x39/0x220 [nf_tables]
[ 4895.178880] Code: fc ff df 53 48 89 fb 48 83 c7 50 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 09 3c 03 7f 05 e8 3e 4c 25 e1 8b 43 50 85 c0 74 02 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02
[ 4895.228342] RSP: 0018:ffff88010b98f4c0 EFLAGS: 00010202
[ 4895.234841] RAX: 0000000000000001 RBX: ffff8801131c6968 RCX: ffff8801146585b0
[ 4895.234841] RDX: 1ffff10022638d37 RSI: ffff8801191a9348 RDI: ffff8801131c69b8
[ 4895.234841] RBP: ffff8801146585a8 R08: 1ffff1002323526a R09: 0000000000000000
[ 4895.234841] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
[ 4895.234841] R13: dead000000000100 R14: ffffffffa3638af8 R15: dffffc0000000000
[ 4895.234841] FS: 00007f6d188e6700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 4895.234841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4895.234841] CR2: 00007ffe72b8df88 CR3: 000000010e2d4000 CR4: 00000000001006f0
[ 4895.234841] Call Trace:
[ 4895.234841] nf_tables_commit+0x2704/0x2c70 [nf_tables]
[ 4895.234841] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.234841] ? nf_tables_setelem_notify.constprop.48+0x1a0/0x1a0 [nf_tables]
[ 4895.323824] ? __lock_is_held+0x9d/0x130
[ 4895.323824] ? kasan_unpoison_shadow+0x30/0x40
[ 4895.333299] ? kasan_kmalloc+0xa9/0xc0
[ 4895.333299] ? kmem_cache_alloc_trace+0x2c0/0x310
[ 4895.333299] ? nfnetlink_rcv_batch+0xa4f/0x11b0 [nfnetlink]
[ 4895.333299] nfnetlink_rcv_batch+0xdb9/0x11b0 [nfnetlink]
[ 4895.333299] ? debug_show_all_locks+0x290/0x290
[ 4895.333299] ? nfnetlink_net_init+0x150/0x150 [nfnetlink]
[ 4895.333299] ? sched_clock_cpu+0xe5/0x170
[ 4895.333299] ? sched_clock_local+0xff/0x130
[ 4895.333299] ? sched_clock_cpu+0xe5/0x170
[ 4895.333299] ? find_held_lock+0x39/0x1b0
[ 4895.333299] ? sched_clock_local+0xff/0x130
[ 4895.333299] ? memset+0x1f/0x40
[ 4895.333299] ? nla_parse+0x33/0x260
[ 4895.333299] ? ns_capable_common+0x6e/0x110
[ 4895.333299] nfnetlink_rcv+0x2c0/0x310 [nfnetlink]
[ ... ]

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6a48de01 02-Aug-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't prevent event handler from device cleanup on netns exit

When a net namespace exits, the nf_tables pernet_ops will remove all rules.
However, there is one caveat:

Base chains that register ingress hooks will cause use-after-free:
device is already gone at that point.

The device event handlers prevent this from happening:
netns exit synthesizes unregister events for all devices.

However, an improper fix for a race condition made the notifiers a no-op
in case they get called from netns exit path, so revert that part.

This is safe now as the previous patch fixed nf_tables pernet ops
and device notifier initialisation ordering.

Fixes: 0a2cf5ee432c2 ("netfilter: nf_tables: close race between netns exit and rmmod")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d209df3e 02-Aug-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix register ordering

We must register nfnetlink ops last, as that exposes nf_tables to
userspace. Without this, we could theoretically get nfnetlink request
before net->nft state has been initialized.

Fixes: 99633ab29b213 ("netfilter: nf_tables: complete net namespace support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4ef360dd 25-Jul-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nft_set: fix allocation size overflow in privsize callback.

To determine the allocation size of a set, ->privsize is invoked.
At this point, both desc->size and the size of each set data structure
are used. desc->size is the number of elements given by the user.
desc->size is of type u32, so the upper limit of set elements is 4294967295.
But the return type of ->privsize is also u32, hence an overflow can occur.
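
The wrap-around can be reproduced in user space. The sketch below is a model, not the kernel callback: ELEM_SIZE is a hypothetical per-element footprint, and privsize_u32() and privsize_u64() are illustrative names. It shows how a 32-bit product wraps for the largest element count, and how widening to 64 bits before multiplying avoids that.

```c
#include <stdint.h>

/* Hypothetical per-element footprint; the real value depends on the set backend. */
#define ELEM_SIZE 16u

/* Model of a size callback returning u32: the product wraps modulo 2^32. */
static uint32_t privsize_u32(uint32_t nelems)
{
	return nelems * ELEM_SIZE;
}

/* Same computation widened to 64 bits before multiplying: no wrap-around. */
static uint64_t privsize_u64(uint32_t nelems)
{
	return (uint64_t)nelems * ELEM_SIZE;
}
```

With nelems = 4294967295 the 32-bit variant yields a value smaller than the true requirement, so a buffer sized from it would be too short.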

test commands:
%nft add table ip filter
%nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; }
%nft list ruleset

splat looks like:
[ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled
[ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7
[ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
[ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
[ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
[ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
[ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
[ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
[ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
[ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
[ 1239.229091] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 1239.229091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
[ 1239.229091] Call Trace:
[ 1239.229091] ? nft_hash_remove+0xf0/0xf0 [nf_tables_set]
[ 1239.229091] ? memset+0x1f/0x40
[ 1239.229091] ? __nla_reserve+0x9f/0xb0
[ 1239.229091] ? memcpy+0x34/0x50
[ 1239.229091] nf_tables_dump_set+0x9a1/0xda0 [nf_tables]
[ 1239.229091] ? __kmalloc_reserve.isra.29+0x2e/0xa0
[ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
[ 1239.229091] ? nf_tables_commit+0x2c60/0x2c60 [nf_tables]
[ 1239.229091] netlink_dump+0x470/0xa20
[ 1239.229091] __netlink_dump_start+0x5ae/0x690
[ 1239.229091] nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables]
[ 1239.229091] nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables]
[ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables]
[ 1239.229091] ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
[ 1239.229091] ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables]
[ 1239.229091] ? nla_parse+0xab/0x230
[ 1239.229091] ? nft_get_set_elem+0x440/0x440 [nf_tables]
[ 1239.229091] nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink]
[ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 1239.229091] ? debug_show_all_locks+0x290/0x290
[ 1239.229091] ? sched_clock_cpu+0x132/0x170
[ 1239.229091] ? find_held_lock+0x39/0x1b0
[ 1239.229091] ? sched_clock_local+0x10d/0x130
[ 1239.229091] netlink_rcv_skb+0x211/0x320
[ 1239.229091] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 1239.229091] ? netlink_ack+0x7b0/0x7b0
[ 1239.229091] ? ns_capable_common+0x6e/0x110
[ 1239.229091] nfnetlink_rcv+0x2d1/0x310 [nfnetlink]
[ 1239.229091] ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink]
[ 1239.229091] ? netlink_deliver_tap+0x829/0x930
[ 1239.229091] ? lock_acquire+0x265/0x2e0
[ 1239.229091] netlink_unicast+0x406/0x520
[ 1239.509725] ? netlink_attachskb+0x5b0/0x5b0
[ 1239.509725] ? find_held_lock+0x39/0x1b0
[ 1239.509725] netlink_sendmsg+0x987/0xa20
[ 1239.509725] ? netlink_unicast+0x520/0x520
[ 1239.509725] ? _copy_from_user+0xa9/0xc0
[ 1239.509725] __sys_sendto+0x21a/0x2c0
[ 1239.509725] ? __ia32_sys_getpeername+0xa0/0xa0
[ 1239.509725] ? retint_kernel+0x10/0x10
[ 1239.509725] ? sched_clock_cpu+0x132/0x170
[ 1239.509725] ? find_held_lock+0x39/0x1b0
[ 1239.509725] ? lock_downgrade+0x540/0x540
[ 1239.509725] ? up_read+0x1c/0x100
[ 1239.509725] ? __do_page_fault+0x763/0x970
[ 1239.509725] ? retint_user+0x18/0x18
[ 1239.509725] __x64_sys_sendto+0x177/0x180
[ 1239.509725] do_syscall_64+0xaa/0x360
[ 1239.509725] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1239.509725] RIP: 0033:0x7f5a8f468e03
[ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
[ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03
[ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003
[ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c
[ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0
[ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0
[ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables
[ 1239.670713] ---[ end trace 39375adcda140f11 ]---
[ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
[ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
[ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
[ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
[ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
[ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
[ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
[ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
[ 1239.751785] FS: 00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 1239.760993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
[ 1239.775679] Kernel panic - not syncing: Fatal exception
[ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1239.776630] Rebooting in 5 seconds..

Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 445509eb 03-Aug-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: simplify NLM_F_CREATE handling

* From nf_tables_newchain(), codepath provides context that allows us to
infer if we are updating a chain (in that case, no module autoload is
required) or adding a new one (then, module autoload is indeed
needed).
* We only need it in one single spot in nf_tables_newrule().
* Not needed for nf_tables_newset() at all.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1974d245 31-Jul-2018 YueHaibing <yuehaibing@huawei.com>

netfilter: nf_tables: remove unused variable

Variable 'ext' is assigned but never used, hence it can be removed.

Cleans up clang warnings:
net/netfilter/nf_tables_api.c:4032:28: warning: variable ‘ext’ set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9e619d87 31-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: flow event notifier must use transaction mutex

Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 90fd131a 22-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: move dumper state allocation into ->start

Shaochun Chen points out we leak dumper filter state allocations
stored in dump_control->data in case there is an error before netlink sets
cb_running (after which ->done will be called at some point).

In order to fix this, add .start functions and do the allocations
there.

->done is going to clean up, and in case error occurs before
->start invocation no cleanups need to be done anymore.
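
The pairing of ->start and ->done can be sketched with a small user-space model. `struct dump_ctx`, dump_start() and dump_done() are hypothetical names used only for illustration, and the 64-byte size is arbitrary; this is not the netlink dump API itself.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a netlink dump callback context. */
struct dump_ctx {
	void *filter;	/* dumper-private state */
};

/* ->start: allocate the state only once the dump is actually going to run. */
static int dump_start(struct dump_ctx *ctx)
{
	ctx->filter = calloc(1, 64);	/* 64 bytes is an arbitrary size */
	return ctx->filter ? 0 : -1;
}

/* ->done: the single place where the state is released. */
static int dump_done(struct dump_ctx *ctx)
{
	free(ctx->filter);
	ctx->filter = NULL;
	return 0;
}
```

Because nothing is allocated before dump_start() runs, an error that happens earlier leaves nothing to clean up.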

Reported-by: shaochun chen <cscnull@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c6cc94df 16-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: don't allow to rename to already-pending name

It's possible to rename two chains to the same name in one
transaction:

nft add chain t c1
nft add chain t c2
nft 'rename chain t c1 c3;rename chain t c2 c3'

This creates two chains named 'c3'.

This appears to be harmless, since both chains can still be deleted
by name or by handle. It is nevertheless a bug: we don't allow creating
chains with identical names, so we should prevent this from happening
via rename too.

Walk the transaction log and also compare against the pending renames.
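
The check against pending renames can be sketched as a linear walk over the queued entries. `struct pending_rename` and rename_is_pending() are hypothetical, simplified stand-ins for the transaction log, used here only to illustrate the idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one pending rename in a transaction log. */
struct pending_rename {
	const char *new_name;
};

/* Return true if new_name collides with a rename already queued in this batch. */
static bool rename_is_pending(const struct pending_rename *log, size_t n,
			      const char *new_name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(log[i].new_name, new_name) == 0)
			return true;
	}
	return false;
}
```

In the reproducer above, the second rename to 'c3' would find the first one in the log and could be rejected.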

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9f8aac0b 16-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix memory leaks on chain rename

The new name is stored in the transaction metadata, on commit,
the pointers to the old and new names are swapped.

Therefore in abort and commit case we have to free the
pointer in the chain_trans container.

In commit case, the pointer can be used by another cpu that
is currently dumping the renamed chain, thus kfree needs to
happen after waiting for rcu readers to complete.

Fixes: b7263e071a ("netfilter: nf_tables: Allow chain name of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a12486eb 16-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: free flow table struct too

Fixes: 3b49e2e94e6ebb ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b8088dda 16-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: use dev->name directly

no need to store the name in separate area.

Furthermore, it uses kmalloc but not kfree and most accesses seem to treat
it as char[IFNAMSIZ] not char *.

Remove this and use dev->name instead.

In case event zeroed dev, just omit the name in the dump.

Fixes: d92191aa84e5f1 ("netfilter: nf_tables: cache device name in flowtable object")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f102d66b 11-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: use dedicated mutex to guard transactions

Continue to use nftnl subsys mutex to protect (un)registration of hook types,
expressions and so on, but force batch operations to do their own
locking.

This allows distinct net namespaces to perform transactions in parallel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2a43ecf9 11-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: avoid global info storage

This works because all accesses are currently serialized by nfnl
nf_tables subsys mutex.

If we want to have per-netns locking, we need to make this scratch
area pernetns or allocate it on demand.

This does the latter. It's ~28 kbyte, but we can fall back to vmalloc,
so it should be fine.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# be2ab5b4 11-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: take module reference when starting a batch

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ca2f18be 11-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: make valid_genid callback mandatory

Always call this function. A followup patch can use this to
acquire a per-netns transaction log to guard the entire batch
instead of using the nfnl subsys mutex (which is shared among all
namespaces).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 452238e8 11-Jul-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: add and use helper for module autoload

module autoload is problematic, it requires dropping the mutex that
protects the transaction. Once the mutex has been dropped, another
client can start a new transaction before we had a chance to abort
current transaction log.

This helper makes sure we first zap the transaction log, then
drop mutex for module autoload.

In case autoload is successful, the caller has to replay the entire
message anyway.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 26b2f552 12-Jul-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: fix jumpstack depth validation

The level field of struct nft_ctx is updated by nf_tables_check_loops() and
is used to validate the jumpstack depth. However, the jumpstack validation
routine doesn't update and validate it recursively, so in some cases the
chain depth can exceed NFT_JUMP_STACK_SIZE.

After this patch, jumpstack validation is done in nft_chain_validate().
When new rules or new set elements are added, nf_tables_newrule() and
nf_tables_newsetelem() call nft_table_validate(), which in turn calls
nft_chain_validate() to visit all child chains recursively. This way the
chain depth is tracked reliably.
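
A recursive depth check of this kind can be sketched in user space as follows. This is a simplified model, not the kernel code: `struct chain`, MAX_JUMPS and chain_validate() are hypothetical, JUMP_STACK_SIZE is assumed to mirror NFT_JUMP_STACK_SIZE, and the exact comparison used by the kernel may differ.

```c
#include <stddef.h>

#define JUMP_STACK_SIZE 16	/* assumed to mirror NFT_JUMP_STACK_SIZE */
#define MAX_JUMPS 4		/* hypothetical cap for this model */

/* Hypothetical, simplified chain: only the jump targets of its rules. */
struct chain {
	const struct chain *jumps[MAX_JUMPS];
	size_t njumps;
};

/* Depth-first walk: returns 0 if no path exceeds the limit, -1 otherwise.
 * A cycle would also run into the limit and be rejected. */
static int chain_validate(const struct chain *c, unsigned int level)
{
	if (level > JUMP_STACK_SIZE)
		return -1;
	for (size_t i = 0; i < c->njumps; i++) {
		if (chain_validate(c->jumps[i], level + 1) < 0)
			return -1;
	}
	return 0;
}
```

Because every jump target is visited with the accumulated level, a long path assembled out of order, as in the reproducer below, is still measured end to end.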

Reproducer:
%cat ./test.sh
#!/bin/bash
nft add table ip filter
nft add chain ip filter input { type filter hook input priority 0\; }
for ((i=0;i<20;i++)); do
nft add chain ip filter a$i
done

nft add rule ip filter input jump a1

for ((i=0;i<10;i++)); do
nft add rule ip filter a$i jump a$((i+1))
done

for ((i=11;i<19;i++)); do
nft add rule ip filter a$i jump a$((i+1))
done

nft add rule ip filter a10 jump a11

Result:
[ 253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables]
[ 253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[ 253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48
[ 253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables]
[ 253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd
[ 253.933807] RSP: 0018:ffff88011b807570 EFLAGS: 00010212
[ 253.933807] RAX: 00000000fffffffd RBX: ffff88011b807660 RCX: 0000000000000000
[ 253.933807] RDX: 0000000000000010 RSI: ffff880112b39d78 RDI: ffff88011b807670
[ 253.933807] RBP: ffff88011b807850 R08: ffffed0023700ece R09: ffffed0023700ecd
[ 253.933807] R10: ffff88011b80766f R11: ffffed0023700ece R12: ffff88011b807898
[ 253.933807] R13: ffff880112b39d80 R14: ffff880112b39d60 R15: dffffc0000000000
[ 253.933807] FS: 0000000000000000(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
[ 253.933807] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 253.933807] CR2: 00000000014f1008 CR3: 000000006b216000 CR4: 00000000001006e0
[ 253.933807] Call Trace:
[ 253.933807] <IRQ>
[ 253.933807] ? sched_clock_cpu+0x132/0x170
[ 253.933807] ? __nft_trace_packet+0x180/0x180 [nf_tables]
[ 253.933807] ? sched_clock_cpu+0x132/0x170
[ 253.933807] ? debug_show_all_locks+0x290/0x290
[ 253.933807] ? __lock_acquire+0x4835/0x4af0
[ 253.933807] ? inet_ehash_locks_alloc+0x1a0/0x1a0
[ 253.933807] ? unwind_next_frame+0x159e/0x1840
[ 253.933807] ? __read_once_size_nocheck.constprop.4+0x5/0x10
[ 253.933807] ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[ 253.933807] ? nft_do_chain+0x5/0xdf0 [nf_tables]
[ 253.933807] nft_do_chain_ipv4+0x197/0x1e0 [nf_tables]
[ 253.933807] ? nft_do_chain_arp+0xb0/0xb0 [nf_tables]
[ 253.933807] ? __lock_is_held+0x9d/0x130
[ 253.933807] nf_hook_slow+0xc4/0x150
[ 253.933807] ip_local_deliver+0x28b/0x380
[ 253.933807] ? ip_call_ra_chain+0x3e0/0x3e0
[ 253.933807] ? ip_rcv_finish+0x1610/0x1610
[ 253.933807] ip_rcv+0xbcc/0xcc0
[ 253.933807] ? debug_show_all_locks+0x290/0x290
[ 253.933807] ? ip_local_deliver+0x380/0x380
[ 253.933807] ? __lock_is_held+0x9d/0x130
[ 253.933807] ? ip_local_deliver+0x380/0x380
[ 253.933807] __netif_receive_skb_core+0x1c9c/0x2240

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0eb71a9d 17-Jun-2018 NeilBrown <neilb@suse.com>

rhashtable: split rhashtable.h

Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small change can require a large recompilation.
This makes development painful.

This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code. rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a, b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
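
The benefit of the two-factor form is that the allocator can notice when the product of count and size overflows. The user-space sketch below models that idea with a hypothetical helper, checked_zalloc_array(); it is not the kernel's kcalloc() implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space model of a two-factor zeroing allocator: refuse the request
 * when n * size does not fit in size_t instead of allocating a short buffer. */
static void *checked_zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* product would overflow */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

With an open-coded n * size passed to a single-argument allocator, the same overflow would silently produce a buffer that is too small.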

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a, b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().
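
For the three-factor case, array3_size() saturates to SIZE_MAX on overflow, so the subsequent allocation fails rather than returning a short buffer. The sketch below is a user-space model of that behaviour: array3_size_model() is a hypothetical name, and __builtin_mul_overflow is a GCC/Clang builtin assumed to be available.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space model of a saturating three-factor size computation:
 * returns SIZE_MAX if a * b * c does not fit in size_t. */
static size_t array3_size_model(size_t a, size_t b, size_t c)
{
	size_t ab, abc;

	if (__builtin_mul_overflow(a, b, &ab) ||
	    __builtin_mul_overflow(ab, c, &abc))
		return SIZE_MAX;
	return abc;
}
```

A request for SIZE_MAX bytes cannot be satisfied, which turns an arithmetic overflow into an ordinary allocation failure.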

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)
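
As a rough illustration of why the conversion matters, the following is a
user-space sketch of the overflow checks that array3_size() and
kmalloc_array() add over an open-coded product. The sketch_* names are
hypothetical stand-ins, not the kernel implementations; they only mirror
the documented behaviour (SIZE_MAX on overflow, NULL from the allocator).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for array3_size(): a * b * c, or SIZE_MAX if the
 * product does not fit in size_t. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t ab;

	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	ab = a * b;
	if (ab != 0 && c > SIZE_MAX / ab)
		return SIZE_MAX;
	return ab * c;
}

/* Hypothetical stand-in for kmalloc_array(): refuse the request instead of
 * letting n * size wrap around to a small allocation. */
static void *sketch_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

With the open-coded form, a wrapped product would yield a buffer smaller
than the caller expects; here the caller sees an allocation failure instead.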

Signed-off-by: Kees Cook <keescook@chromium.org>


# 0a2cf5ee 11-Jun-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: close race between netns exit and rmmod

If net namespace is exiting while nf_tables module is being removed
we can oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
Modules linked in: nf_tables(-) nfnetlink [..]
unregister_netdevice_notifier+0xdd/0x130
nf_tables_module_exit+0x24/0x3a [nf_tables]
SyS_delete_module+0x1c5/0x240
do_syscall_64+0x74/0x190

Avoid this by attempting to take reference on the net namespace from
the notifiers. If it fails the namespace is exiting already, and nft
core is taking care of cleanup work.
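
The "take a reference, bail out if that fails" pattern can be sketched in
user space as follows. This is only an illustration with C11 atomics; the
sketch_ns type and helpers are hypothetical and do not reflect the actual
struct net or notifier code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical namespace-like object. A count of 0 means teardown began. */
struct sketch_ns {
	atomic_int refs;
};

/* Take a reference only while the count is still non-zero. */
static bool sketch_ns_tryget(struct sketch_ns *ns)
{
	int old = atomic_load(&ns->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ns->refs, &old, old + 1))
			return true;
	}
	return false;
}

static void sketch_ns_put(struct sketch_ns *ns)
{
	atomic_fetch_sub(&ns->refs, 1);
}

/* Notifier-style handler: returns 1 if the event was handled, 0 if it was
 * skipped because the namespace is already exiting. */
static int sketch_event(struct sketch_ns *ns)
{
	if (!sketch_ns_tryget(ns))
		return 0;	/* owner of the namespace does the cleanup */
	/* ... handle the device event here ... */
	sketch_ns_put(ns);
	return 1;
}
```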

We also need to make sure the netdev hook type gets removed
before netns ops removal, else notifier might be invoked with device
event for a netns where net->nft was never initialised (because
pernet ops was removed beforehand).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 71ad00c5 11-Jun-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix module unload race

We must first remove the nfnetlink protocol handler when nf_tables module
is unloaded -- we don't want userspace to submit new change requests once
we've started to tear down nft state.

Furthermore, nfnetlink must not call any subsystem function after
call_batch returned -EAGAIN.

EAGAIN means the subsys mutex was dropped, so it's unlikely but possible that
nf_tables subsystem was removed due to 'rmmod nf_tables' on another cpu.

Therefore, we must abort batch completely and not move on to next part of
the batch.

Last, we can't invoke ->abort unless we've checked that the subsystem is
still registered.

Change netns exit path of nf_tables to make sure any incomplete
transaction gets removed on exit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1b2470e5 02-Jun-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: handle chain name lookups via rhltable

If there is a significant number of chains, list search is too slow, so
add an rhlist table for this.

This speeds up ruleset loading: for every new rule we have to check if
the name already exists in current generation.

We need to be able to cope with duplicate chain names in case a transaction
drops the nfnl mutex (for request_module) and the abort of this old
transaction is still pending.

The list is kept -- we need a way to iterate chains even if hash resize is
in progress without missing an entry.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 371ebcbb 02-Jun-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add destroy_clone expression

Before this patch, cloned expressions are released via ->destroy. This
is a problem for the new connlimit expression since the ->destroy path
drops a reference on the conntrack modules and unregisters hooks. The
new ->destroy_clone provides context that this expression is being
released from the packet path, so it mirrors ->clone(), where neither
the module reference is dropped nor hooks need to be unregistered,
because this is done from the control plane path, from the ->init() path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3453c927 02-Jun-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()

nft_set_elem_destroy() can be called from call_rcu context. Annotate
netns and table in set object so we can populate the context object.
Moreover, pass context object to nf_tables_set_elem_destroy() from the
commit phase, since it is already available from there.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 00bfb320 02-Jun-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass context to object destroy indirection

The new connlimit object needs this to properly deal with conntrack
dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9c7f96fd 31-May-2018 Alexey Kodanev <alexey.kodanev@oracle.com>

netfilter: nf_tables: check msg_type before nft_trans_set(trans)

The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before
using nft_trans_set(trans). Otherwise we can get out of bounds read.
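
A simplified user-space sketch of the ordering issue follows. It assumes a
transaction header followed by a payload whose size depends on the message
type; the sketch_* types are hypothetical and much smaller than the real
nft_trans layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum sketch_msg_type { SKETCH_NEWTABLE, SKETCH_NEWSET };

struct sketch_trans {
	enum sketch_msg_type msg_type;
	unsigned char data[];	/* payload size depends on msg_type */
};

/* Payload that exists only for SKETCH_NEWSET transactions. */
struct sketch_trans_set {
	int set_id;
};

static struct sketch_trans *sketch_trans_alloc(enum sketch_msg_type type,
					       size_t payload_len)
{
	struct sketch_trans *t = malloc(sizeof(*t) + payload_len);

	if (t)
		t->msg_type = type;
	return t;
}

static struct sketch_trans_set *sketch_trans_set(struct sketch_trans *t)
{
	return (struct sketch_trans_set *)t->data;
}

/* Check the type first: only then is the payload known to be a set.
 * Evaluating the payload first would read past a smaller allocation. */
static bool sketch_is_new_set(struct sketch_trans *t, int id)
{
	return t->msg_type == SKETCH_NEWSET &&
	       sketch_trans_set(t)->set_id == id;
}
```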

For example, KASAN reported one when running the 0001_cache_handling_0 nft
test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE:

[75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356
...
[75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1
[75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2
[75517.618129] Call Trace:
[75517.648821] dump_stack+0xd1/0x13b
[75517.691040] ? show_regs_print_info+0x5/0x5
[75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5
[75517.799300] ? lock_acquire+0x143/0x310
[75517.846738] print_address_description+0x85/0x3a0
[75517.904547] kasan_report+0x18d/0x4b0
[75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables]
[75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables]
[75518.357154] ? nla_parse+0x1a5/0x300
[75518.401455] ? kasan_kmalloc+0xa6/0xd0
[75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
[75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink]
[75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink]
[75518.631711] ? lock_acquire+0x143/0x310
[75518.679133] ? netlink_deliver_tap+0x9b/0x1070
[75518.733840] ? kasan_unpoison_shadow+0x31/0x40
[75518.788542] netlink_unicast+0x45d/0x680
[75518.837111] ? __isolate_free_page+0x890/0x890
[75518.891913] ? netlink_attachskb+0x6b0/0x6b0
[75518.944542] netlink_sendmsg+0x6fa/0xd30
[75518.993107] ? netlink_unicast+0x680/0x680
[75519.043758] ? netlink_unicast+0x680/0x680
[75519.094402] sock_sendmsg+0xd9/0x160
[75519.138810] ___sys_sendmsg+0x64d/0x980
[75519.186234] ? copy_msghdr_from_user+0x350/0x350
[75519.243118] ? lock_downgrade+0x650/0x650
[75519.292738] ? do_raw_spin_unlock+0x5d/0x250
[75519.345456] ? _raw_spin_unlock+0x24/0x30
[75519.395065] ? __handle_mm_fault+0xbde/0x3410
[75519.448830] ? sock_setsockopt+0x3d2/0x1940
[75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0
[75519.558448] ? lock_downgrade+0x650/0x650
[75519.608057] ? __audit_syscall_entry+0x317/0x720
[75519.664960] ? __fget_light+0x58/0x250
[75519.711325] ? __sys_sendmsg+0xde/0x170
[75519.758850] __sys_sendmsg+0xde/0x170
[75519.804193] ? __ia32_sys_shutdown+0x90/0x90
[75519.856725] ? syscall_trace_enter+0x897/0x10e0
[75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920
[75519.979432] ? __audit_syscall_entry+0x720/0x720
[75520.036118] do_syscall_64+0xa3/0x3d0
[75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0
[75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[75520.201680] RIP: 0033:0x7fc153320ba0
[75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0
[75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003
[75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090
[75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660
[75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001

[75520.790946] Allocated by task 7356:
[75520.833994] kasan_kmalloc+0xa6/0xd0
[75520.878088] __kmalloc+0x189/0x450
[75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables]
[75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables]
[75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
[75521.108655] netlink_unicast+0x45d/0x680
[75521.157013] netlink_sendmsg+0x6fa/0xd30
[75521.205271] sock_sendmsg+0xd9/0x160
[75521.249365] ___sys_sendmsg+0x64d/0x980
[75521.296686] __sys_sendmsg+0xde/0x170
[75521.341822] do_syscall_64+0xa3/0x3d0
[75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[75521.467867] Freed by task 23454:
[75521.507804] __kasan_slab_free+0x132/0x180
[75521.558137] kfree+0x14d/0x4d0
[75521.596005] free_rt_sched_group+0x153/0x280
[75521.648410] sched_autogroup_create_attach+0x19a/0x520
[75521.711330] ksys_setsid+0x2ba/0x400
[75521.755529] __ia32_sys_setsid+0xa/0x10
[75521.802850] do_syscall_64+0xa3/0x3d0
[75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[75521.929000] The buggy address belongs to the object at ffff881bdb643f80
which belongs to the cache kmalloc-96 of size 96
[75522.079797] The buggy address is located 72 bytes inside of
96-byte region [ffff881bdb643f80, ffff881bdb643fe0)
[75522.221234] The buggy address belongs to the page:
[75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[75522.377443] flags: 0x2fffff80000100(slab)
[75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020
[75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000
[75522.615601] page dumped because: kasan: bad access detected

Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a654de8f 30-May-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix chain dependency validation

The following ruleset:

add table ip filter
add chain ip filter input { type filter hook input priority 4; }
add chain ip filter ap
add rule ip filter input jump ap
add rule ip filter ap masquerade

results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.

This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.
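
The graph walk described above can be sketched roughly as follows. This is
a toy model, not the kernel code: each rule carries an optional jump target
and a bitmask of base chain types its expression accepts, and the depth
bound SKETCH_MAX_DEPTH is a hypothetical value.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_chain_type { SKETCH_FILTER = 1 << 0, SKETCH_NAT = 1 << 1 };

struct sketch_rule {
	int jump_to;		/* index of the target chain, or -1 */
	unsigned int allowed_in;/* base chain types this rule accepts */
};

struct sketch_chain {
	bool is_base;
	enum sketch_chain_type type;	/* meaningful for base chains only */
	const struct sketch_rule *rules;
	size_t nrules;
};

#define SKETCH_MAX_DEPTH 16

/* Validate every rule reachable from chain idx against the type of the
 * base chain the walk started from. */
static bool sketch_validate_from(const struct sketch_chain *chains, size_t idx,
				 enum sketch_chain_type base_type, int depth)
{
	const struct sketch_chain *c = &chains[idx];
	size_t i;

	if (depth > SKETCH_MAX_DEPTH)
		return false;	/* too deep; this also stops on jump loops */
	for (i = 0; i < c->nrules; i++) {
		const struct sketch_rule *r = &c->rules[i];

		if (!(r->allowed_in & base_type))
			return false;
		if (r->jump_to >= 0 &&
		    !sketch_validate_from(chains, (size_t)r->jump_to,
					  base_type, depth + 1))
			return false;
	}
	return true;
}

/* Start one walk per base chain so the whole ruleset graph is covered. */
static bool sketch_validate_ruleset(const struct sketch_chain *chains, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (chains[i].is_base &&
		    !sketch_validate_from(chains, i, chains[i].type, 0))
			return false;
	return true;
}
```

Modelling the ruleset from this message (a filter base chain jumping to a
chain that holds a NAT-only rule), the walk rejects the ruleset even though
the offending rule sits in a non-base chain.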

This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.

As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.

Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d9adf22a 27-May-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: use call_rcu in netlink dumps

We can make all dumps and lookups lockless.

Dumps currently only hold the nfnl mutex on the dump request itself.
Dumps can span multiple syscalls, dump continuation doesn't acquire the
nfnl mutex anywhere, i.e. the dump callbacks in nf_tables already use
rcu and never rely on nfnl mutex being held.

So, just switch all dumpers to rcu.

This requires taking a module reference before dropping the rcu lock
so rmmod is blocked, we also need to hold module reference over
the entire dump operation sequence. netlink already supports this
via the .module member in the netlink_dump_control struct.

For the non-dump case (i.e. lookup of a specific table, chain, etc.),
we need to switch to the _rcu list iteration primitive and make sure we
use GFP_ATOMIC.

This patch also adds the new nft_netlink_dump_start_rcu() helper that
takes care of the get-ref, drop-rcu-lock, start-dump,
get-rcu-lock, put-ref sequence.

The helper will be reused for all dumps.

Rationale in all dump requests is:

- use the nft_netlink_dump_start_rcu helper added in first patch
- use GFP_ATOMIC and rcu list iteration
- switch to .call_rcu

... thus making all dumps in nf_tables not depend on the
nfnl mutex anymore.

In nf_tables_getgen(): this callback just fetches the current base
sequence, there is no need to serialize this with nfnl nft mutex.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d6501de8 27-May-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix endian mismatch in return type

harmless, but it avoids sparse warnings:

nf_tables_api.c:2813:16: warning: incorrect type in return expression (different base types)
nf_tables_api.c:2863:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3524:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3538:55: warning: incorrect type in argument 3 (different base types)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0cbc06b3 24-May-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: remove synchronize_rcu in commit phase

synchronize_rcu() is expensive.

The commit phase currently enforces an unconditional
synchronize_rcu() after incrementing the generation counter.

This is to make sure that a packet always sees a consistent chain, either
nft_do_chain is still using old generation (it will skip the newly added
rules), or the new one (it will skip old ones that might still be linked
into the list).

We could just remove the synchronize_rcu(), it would not cause a crash but
it could cause us to evaluate a rule that was removed and a new rule for the
same packet, instead of either-or.

To resolve this, add rule pointer array holding two generations, the
current one and the future generation.

In commit phase, allocate the rule blob and populate it with the rules that
will be active in the new generation.

Then, make this rule blob public, replacing the old generation pointer.

Then the generation counter can be incremented.

nft_do_chain() will either continue to use the current generation
(in case loop was invoked right before increment), or the new one.
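
The publish-then-switch ordering can be sketched in user space with C11
atomics. The sketch_* names and the zero-terminated int array standing in
for the rule blob are assumptions for illustration; freeing the old blob
after a grace period is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-netns state: the low bit of gencursor selects the slot
 * that the packet path reads. */
struct sketch_net {
	atomic_uint gencursor;
};

/* Hypothetical chain with one rule blob pointer per generation. */
struct sketch_chain {
	_Atomic(const int *) rules_gen[2];
};

static void sketch_init(struct sketch_net *net, struct sketch_chain *chain,
			const int *rules)
{
	atomic_init(&net->gencursor, 0u);
	atomic_init(&chain->rules_gen[0], rules);
	atomic_init(&chain->rules_gen[1], NULL);
}

/* Packet path: pick the blob that belongs to the current generation. */
static const int *sketch_rules_for_packet(struct sketch_net *net,
					  struct sketch_chain *chain)
{
	unsigned int bit = atomic_load(&net->gencursor) & 1u;

	return atomic_load(&chain->rules_gen[bit]);
}

/* Commit path: publish the blob for the next generation first, and only
 * then bump the generation counter. */
static void sketch_commit(struct sketch_net *net, struct sketch_chain *chain,
			  const int *new_rules)
{
	unsigned int next = (atomic_load(&net->gencursor) + 1u) & 1u;

	atomic_store(&chain->rules_gen[next], new_rules);
	atomic_fetch_add(&net->gencursor, 1u);
}
```

A reader that sampled the counter just before the increment keeps using the
old blob in full; a later reader sees only the new one.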

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bbb8c61f 28-May-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace()

When a chain is updated, a counter can be attached. If so,
nft_counters_enabled should be increased.

Test commands:

%nft add table ip filter
%nft add chain ip filter input { type filter hook input priority 4\; }
%iptables-compat -Z input
%nft delete chain ip filter input

We can see the messages below.

[ 286.443720] jump label: negative count!
[ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12
[ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
[ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
[ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
[ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
[ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
[ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
[ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
[ 286.449144] Call Trace:
[ 286.449144] static_key_slow_dec+0x6a/0x70
[ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
[ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables]
[ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
[ ... ]

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 360cc79d 28-May-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj()

The table field in nft_obj_filter is not an array. In order to check
tablename, we should check if the pointer is set.
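
A minimal sketch of the corrected check, assuming a filter structure whose
table member is a pointer that stays NULL when no table name was given (the
sketch_obj_filter type is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_obj_filter {
	const char *table;	/* NULL when the dump is not table-filtered */
};

/* Returns true if an object in table_name should be skipped by the dump. */
static bool sketch_filter_skips(const struct sketch_obj_filter *f,
				const char *table_name)
{
	/* Test the pointer itself; f->table[0] would dereference NULL. */
	return f->table != NULL && strcmp(f->table, table_name) != 0;
}
```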

Test commands:

%nft add table ip filter
%nft add counter ip filter ct1
%nft reset counters

Splat looks like:

[ 306.510504] kasan: CONFIG_KASAN_INLINE enabled
[ 306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
[ 306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
[ 306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
[ 306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246
[ 306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000
[ 306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a
[ 306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b
[ 306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108
[ 306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860
[ 306.528284] FS: 00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 306.528284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0
[ 306.528284] Call Trace:
[ 306.528284] netlink_dump+0x470/0xa20
[ 306.528284] __netlink_dump_start+0x5ae/0x690
[ 306.528284] ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
[ 306.528284] nf_tables_getobj+0x2f5/0x740 [nf_tables]
[ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
[ 306.528284] ? nf_tables_getobj+0x740/0x740 [nf_tables]
[ 306.528284] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
[ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
[ 306.528284] nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
[ 306.528284] ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
[ 306.528284] netlink_rcv_skb+0x1c9/0x2f0
[ 306.528284] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 306.528284] ? debug_check_no_locks_freed+0x270/0x270
[ 306.528284] ? netlink_ack+0x7a0/0x7a0
[ 306.528284] ? ns_capable_common+0x6e/0x110
[ ... ]

Fixes: e46abbcc05aa8 ("netfilter: nf_tables: Allow table names of up to 255 chars")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a37061a6 14-May-2018 Florian Westphal <fw@strlen.de>

netfilter: lift one-nat-hook-only restriction

This reverts commit f92b40a8b2645
("netfilter: core: only allow one nat hook per hook point"), this
limitation is no longer needed. The nat core now invokes these
functions and makes sure that hook evaluation stops after a mapping is
created and a null binding is created otherwise.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4e25ceb8 14-May-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: allow chain type to override hook register

Will be used in followup patch when nat types no longer
use nf_register_net_hook() but will instead register with the nat core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f0dfd7a2 09-May-2018 Colin Ian King <colin.king@canonical.com>

netfilter: nf_tables: fix memory leak on error exit return

Currently the -EBUSY error return path is not free'ing resources
allocated earlier, leaving a memory leak. Fix this by exiting via the
error exit label err5 that performs the necessary resource clean
up.

Detected by CoverityScan, CID#1432975 ("Resource leak")

Fixes: 9744a6fcefcb ("netfilter: nf_tables: check if same extensions are set when adding elements")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bb7b40ae 07-May-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bogus EBUSY in chain deletions

When removing a rule that jumps to a chain and that chain in the same
batch, this bogusly hits EBUSY. Add activate and deactivate operations
to expressions that can be called from the preparation and the
commit/abort phases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2f99aa31 25-Apr-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: skip synchronize_rcu if transaction log is empty

After processing the transaction log, the remaining entries of the log
need to be released.

However, in some cases no entries remain, e.g. because the transaction
did not remove anything.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8e1102d5 16-Apr-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: support timeouts larger than 23 days

Marco De Benedetto says:
I would like to use a timeout of 30 days for elements in a set but it
seems there is a some kind of problem above 24d20h31m23s.

Fix this by using 'jiffies64' for timeout handling to get same behaviour
on 32 and 64bit systems.

nftables passes timeouts as u64 in milliseconds to the kernel,
but on kernel side we used a mixture of 'long' and jiffies conversions
rather than u64 and jiffies64.
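
The reported limit of 24d20h31m23s coincides with 2^31 - 1 milliseconds,
which is consistent with a signed 32-bit quantity holding the timeout
somewhere along the way. The arithmetic can be checked with this small
user-space sketch (the sketch_* names are hypothetical; where exactly the
kernel value wrapped is not stated here):

```c
#include <stdint.h>

struct sketch_duration {
	uint64_t days;
	unsigned int hours, minutes, seconds;
};

/* Split a millisecond count into days/hours/minutes/seconds. */
static struct sketch_duration sketch_split_ms(uint64_t ms)
{
	uint64_t s = ms / 1000;
	struct sketch_duration d;

	d.days = s / 86400;
	s %= 86400;
	d.hours = (unsigned int)(s / 3600);
	s %= 3600;
	d.minutes = (unsigned int)(s / 60);
	d.seconds = (unsigned int)(s % 60);
	return d;
}
```

sketch_split_ms(INT32_MAX) yields 24 days, 20 hours, 31 minutes and 23
seconds, while a 30 day timeout (2592000000 ms) no longer fits in 31 bits.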

Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1237
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 71cc0873 03-Apr-2018 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Simplify set backend selection

Drop nft_set_type's ability to act as a container of multiple backend
implementations it chooses from. Instead consolidate the whole selection
logic in nft_select_set_ops() and the actual backend provided estimate()
callback.

This turns nf_tables_set_types into a list containing all available
backends which is traversed when selecting one matching userspace
requested criteria.

Also, this change allows embedding the nft_set_ops structure into
nft_set_type and pulling the flags field into the latter, as it's only used
during the selection phase.

A crucial part of this change is to make sure the new layout respects
hash backend constraints formerly enforced by nft_hash_select_ops()
function: This is achieved by introduction of a specific estimate()
callback for nft_hash_fast_ops which returns false for key lengths != 4.
In turn, nft_hash_estimate() is changed to return false for key lengths
== 4 so it won't be chosen by accident. Also, both callbacks must return
false for unbounded sets as their size estimate depends on a known
maximum element count.
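
The estimate() constraints above can be sketched as follows. This is a toy
model: the sketch_* names are hypothetical, the fallback backend is an
assumption, and selection here simply takes the first backend whose
estimate() accepts the description rather than comparing cost estimates.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_set_desc {
	unsigned int klen;	/* key length in bytes */
	unsigned int size;	/* maximum element count, 0 = unbounded */
};

struct sketch_backend {
	const char *name;
	bool (*estimate)(const struct sketch_set_desc *desc);
};

/* Fast hash variant: bounded sets with 4-byte keys only. */
static bool sketch_hash_fast_estimate(const struct sketch_set_desc *desc)
{
	return desc->size != 0 && desc->klen == 4;
}

/* Generic hash variant: bounded sets with any other key length. */
static bool sketch_hash_estimate(const struct sketch_set_desc *desc)
{
	return desc->size != 0 && desc->klen != 4;
}

/* Hypothetical fallback that also copes with unbounded sets. */
static bool sketch_fallback_estimate(const struct sketch_set_desc *desc)
{
	(void)desc;
	return true;
}

static const struct sketch_backend sketch_backends[] = {
	{ "hash_fast", sketch_hash_fast_estimate },
	{ "hash", sketch_hash_estimate },
	{ "fallback", sketch_fallback_estimate },
};

/* Walk the single list of backends and return the first match. */
static const struct sketch_backend *
sketch_select(const struct sketch_set_desc *desc)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_backends) / sizeof(sketch_backends[0]); i++)
		if (sketch_backends[i].estimate(desc))
			return &sketch_backends[i];
	return NULL;
}
```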

Note that this patch partially reverts commit 4f2921ca21b71 ("netfilter:
nf_tables: meter: pick a set backend that supports updates") by making
nft_set_ops_candidate() not explicitly look for an update callback but
make NFT_SET_EVAL a regular backend feature flag which is checked along
with the others. This way all feature requirements are checked in one
go.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 36dd1bcc 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: initial support for extended ACK reporting

Keep it simple to start with, just report attribute offsets that can be
useful to userspace when presenting errors to users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cac20fcd 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: simplify lookup functions

Replace the nf_tables_ prefix by nft_ and merge code into a single lookup
function whenever possible. In many cases function names push lines over
the 80-char boundary; this saves us ~50 LoC.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 84453a90 26-Feb-2018 Felix Fietkau <nbd@nbd.name>

netfilter: nf_flow_table: track flow tables in nf_flow_table directly

Avoids having nf_flow_table depend on nftables (useful for future
iptables backport work)

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 17857d92 26-Feb-2018 Felix Fietkau <nbd@nbd.name>

netfilter: nf_flow_table: fix priv pointer for netdev hook

The offload ip hook expects a pointer to the flowtable, not to the
rhashtable. Since the rhashtable is the first member, this is safe for
the moment, but breaks as soon as the structure layout changes

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a268de77 26-Feb-2018 Felix Fietkau <nbd@nbd.name>

netfilter: nf_flow_table: move init code to nf_flow_table_core.c

Reduces duplication of .gc and .params in flowtable type definitions and
makes the API clearer

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d71efb59 18-Apr-2018 Taehee Yoo <ap420073@gmail.com>

netfilter: nf_tables: fix out-of-bounds in nft_chain_commit_update

When chain name is changed, nft_chain_commit_update is called.
In the nft_chain_commit_update, trans->ctx.chain->name has old chain name
and nft_trans_chain_name(trans) has new chain name.
If new chain name is longer than old chain name, KASAN warns
slab-out-of-bounds.
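
This message does not spell out the fix. One way to avoid copying a longer
string into the old buffer is to exchange the name pointers at commit time
and release the old name afterwards; the user-space sketch below assumes
that approach, and the sketch_* names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_chain {
	char *name;	/* heap-allocated, sized for the current name only */
};

/* Duplicate a string with malloc(); returns NULL on allocation failure. */
static char *sketch_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Commit a rename by exchanging pointers instead of strcpy() into the old
 * buffer. Returns the old name, which the caller frees later. */
static char *sketch_commit_rename(struct sketch_chain *chain, char *new_name)
{
	char *old = chain->name;

	chain->name = new_name;
	return old;
}
```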

[ 175.015012] BUG: KASAN: slab-out-of-bounds in strcpy+0x9e/0xb0
[ 175.022735] Write of size 1 at addr ffff880114e022da by task iptables-compat/1458

[ 175.031353] CPU: 0 PID: 1458 Comm: iptables-compat Not tainted 4.16.0-rc7+ #146
[ 175.031353] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 175.031353] Call Trace:
[ 175.031353] dump_stack+0x68/0xa0
[ 175.031353] print_address_description+0xd0/0x260
[ 175.031353] ? strcpy+0x9e/0xb0
[ 175.031353] kasan_report+0x234/0x350
[ 175.031353] __asan_report_store1_noabort+0x1c/0x20
[ 175.031353] strcpy+0x9e/0xb0
[ 175.031353] nf_tables_commit+0x1ccc/0x2990
[ 175.031353] nfnetlink_rcv+0x141e/0x16c0
[ 175.031353] ? nfnetlink_net_init+0x150/0x150
[ 175.031353] ? lock_acquire+0x370/0x370
[ 175.031353] ? lock_acquire+0x370/0x370
[ 175.031353] netlink_unicast+0x444/0x640
[ 175.031353] ? netlink_attachskb+0x700/0x700
[ 175.031353] ? _copy_from_iter_full+0x180/0x740
[ 175.031353] ? kasan_check_write+0x14/0x20
[ 175.031353] ? _copy_from_user+0x9b/0xd0
[ 175.031353] netlink_sendmsg+0x845/0xc70
[ ... ]

Steps to reproduce:
iptables-compat -N 1
iptables-compat -E 1 aaaaaaaaa

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2f6adf48 10-Apr-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: free set name in error path

set->name must be free'd here in case ops->init fails.

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 569ccae6 10-Apr-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: can't fail after linking rule into active rule list

Rules in nftables are free'd using kfree, but protected by rcu, i.e. we
must wait for a grace period to elapse.

The normal removal path does this, but nf_tables_newrule() doesn't obey
this rule during error handling.

It calls nft_trans_rule_add() *after* linking rule, and, if that
fails to allocate memory, it unlinks the rule and then kfree()s it --
this is unsafe.

Switch order -- first add rule to transaction list, THEN link it
to public list.

Note: nft_trans_rule_add() uses GFP_KERNEL; it will not fail so this
is not a problem in practice (spotted only during code review).

Fixes: 0628b123c96d12 ("netfilter: nfnetlink: add batch support and use it from nf_tables")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a3073c17 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use nft_set_lookup_global from nf_tables_newsetelem()

Replace opencoded implementation of nft_set_lookup_global() by call to
this function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 10659cba 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: rename to nft_set_lookup_global()

To prepare for the introduction of a shorter function prefix.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 43a605f2 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: enable conntrack if NAT chain is registered

Register conntrack hooks if the user adds NAT chains. Users get confused
with the existing behaviour since they will see no packets hitting this
chain until they add the first rule that refers to conntrack.

This patch adds new ->init() and ->free() indirections to chain types
that can be used by NAT chains to invoke the conntrack dependency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 02c7b25e 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: build-in filter chain type

One module per supported filter chain family type takes too much memory
for very little code - too much modularization - place all chain filter
definitions in one single file.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cc07eeb0 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: nft_register_chain_type() returns void

Use WARN_ON() instead, since it should happen neither that the family
goes over NFPROTO_NUMPROTO nor that a chain of this type is already
registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 32537e91 27-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: rename struct nf_chain_type

Use nft_ prefix. Back when I added chain types, I forgot to use the
nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too,
otherwise there is an overlap.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90d2723c 20-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not hold reference on netdevice from preparation phase

The netfilter netdevice event handler holds the nfnl_lock mutex; this
avoids races with a device going away while such device is being
attached to hooks from the netlink control plane. Therefore, either
control plane bails out with ENOENT or netdevice event path waits until
the hook that is attached to net_device is registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d92191aa 21-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: cache device name in flowtable object

Devices going away have to grab the nfnl_lock from the netdev event path
to avoid races with control plane updates.

However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache
the device name into the objects to avoid a use-after-free situation
for a device that is going away.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 467697d2 20-Mar-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: add missing netlink attrs to policies

Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements")
Fixes: f25ad2e907f1 ("netfilter: nf_tables: prepare for expressions associated to set elements")
Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ae6153b5 18-Mar-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: permit second nat hook if colliding hook is going away

Sergei Trofimovich reported that restoring an nft ruleset doesn't work
anymore unless old rule content is flushed first.

The problem stems from the interaction between a recent change designed
to prevent multiple nat hooks at the same hook point and the nftables
transaction model.

A 'flush ruleset' won't take effect until the entire transaction has
completed.

So, if one has a nft.rules file that contains a 'flush ruleset',
followed by a nat hook register request, then 'nft -f file' will work,
but running 'nft -f file' again will fail with -EBUSY.

Reason is that nftables will place the flush/removal requests in the
transaction list, but it will not act on the removal until after all new
rules are in place.

The netfilter core will therefore get a request to register a new nat
hook before the old one is removed -- this now fails as the netfilter
core can't know the existing hook is staged for removal.

To fix this, we can search the transaction log when a hook collision
is detected. The collision is okay if

1. there is a delete request pending for the nat hook that is already
registered.
2. there is no second add request for a matching nat hook.
This is required to only apply the exception once.
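
The two conditions above can be sketched as a scan over the pending
transaction list. This is a simplified userspace model, not the kernel
code: struct trans, TRANS_ADD_HOOK, TRANS_DEL_HOOK and collision_is_ok()
are hypothetical names used only for illustration, and the log is assumed
to hold only requests queued before the one being checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a pending transaction entry. */
enum trans_type { TRANS_ADD_HOOK, TRANS_DEL_HOOK };

struct trans {
	enum trans_type type;
	int hooknum;		/* hook point, e.g. prerouting */
	bool is_nat;		/* entry refers to a nat hook */
};

/*
 * Return true if registering a second nat hook at 'hooknum' is acceptable:
 * 1. a delete request for a nat hook at that hook point is pending, and
 * 2. no other add request for a matching nat hook is already queued.
 */
static bool collision_is_ok(const struct trans *log, size_t n, int hooknum)
{
	bool del_pending = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!log[i].is_nat || log[i].hooknum != hooknum)
			continue;
		if (log[i].type == TRANS_DEL_HOOK)
			del_pending = true;
		else
			return false;	/* second add: exception already used */
	}
	return del_pending;
}
```

Without a pending delete the function reports a real collision, which
corresponds to the -EBUSY case described above.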

Fixes: f92b40a8b2645 ("netfilter: core: only allow one nat hook per hook point")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4f2921ca 14-Mar-2018 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: meter: pick a set backend that supports updates

In nftables, 'meter' can be used to instantiate a hash-table at run
time:

rule add filter forward iif "internal" meter hostacct { ip saddr counter}
nft list meter ip filter hostacct
table ip filter {
meter hostacct {
type ipv4_addr
elements = { 192.168.0.1 : counter packets 8 bytes 2672, ..

Because elements get added on the fly, the kernel must choose a set
backend type that implements the ->update() function, otherwise
rule insertion fails with EOPNOTSUPP.

Therefore, skip set types that lack ->update, and also
make sure we do not discard a (bad) candidate when we have not yet
found any candidate at all. This could happen when userspace prefers
low memory footprint -- the set implementation currently checked might
not be a fit at all. Make sure we pick it anyway (!bops). In
case the next candidate is a better fit, it will be chosen instead.

But in case nothing else is found we at least have a non-ideal
match rather than no match at all.
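
A minimal sketch of this selection logic, assuming a simplified model in
which each backend only advertises whether it implements ->update() and a
memory estimate. struct backend, its fields and select_backend() are
hypothetical and do not mirror the kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a set backend description. */
struct backend {
	const char *name;
	bool has_update;	/* implements ->update() */
	unsigned int space;	/* estimated memory footprint, lower is better */
};

/*
 * Pick the backend with the smallest footprint. Backends without
 * ->update() are skipped when the set needs runtime updates. The first
 * usable candidate is always kept (best == NULL), so a non-ideal match
 * wins over no match at all.
 */
static const struct backend *
select_backend(const struct backend *v, size_t n, bool need_update)
{
	const struct backend *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (need_update && !v[i].has_update)
			continue;
		if (!best || v[i].space < best->space)
			best = &v[i];
	}
	return best;
}
```

A NULL return models the EOPNOTSUPP case: no backend that supports
updates was found at all.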

Fixes: 6c03ae210ce3 ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5b4c6e38 12-Mar-2018 Gustavo A. R. Silva <gustavo@embeddedor.com>

netfilter: nf_tables: remove VLA usage

In preparation for enabling -Wvla, remove VLA and replace it
with dynamic memory allocation.

From a security viewpoint, the use of Variable Length Arrays can be
a vector for stack overflow attacks. Also, in general, as the code
evolves it is easy to lose track of how big a VLA can get. Thus, we
can end up having segfaults that are hard to debug.

Also, fixed as part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621
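
The general shape of such a conversion can be sketched in userspace C,
with malloc()/free() standing in for the kernel allocator. The function
sum_bytes() is hypothetical and only illustrates the pattern; it is not
the code touched by this patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue: malloc()/free() stand in for the kernel allocator.
 * Before such a change the buffer would be declared as
 * 'unsigned char buf[len];', with its size bounded only by the caller.
 */
static int sum_bytes(const unsigned char *src, size_t len, unsigned long *out)
{
	unsigned char *buf;
	unsigned long sum = 0;
	size_t i;

	buf = malloc(len ? len : 1);
	if (!buf)
		return -1;	/* allocation failure is now an explicit error */

	memcpy(buf, src, len);
	for (i = 0; i < len; i++)
		sum += buf[i];

	free(buf);
	*out = sum;
	return 0;
}
```

The cost of the change is an extra error path, in exchange for stack
usage that no longer depends on a runtime value.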

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c04a3f73 09-Mar-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release flowtable hooks

Otherwise we leak this array.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c7c5e435 06-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert nf_tables_net_ops

These pernet_operations look nicely separated per-net.
Exit method unregisters net's nf tables objects.
We allow them to be executed in parallel.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0662f8 19-Jan-2018 kbuild test robot <fengguang.wu@intel.com>

netfilter: nf_tables: nf_tables_obj_lookup_byhandle() can be static

Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0e0d5002 27-Feb-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use the right index from flowtable error path

Use the right loop index, not the number of devices in the array that we
need to remove. The following message uncovered the problem:

[ 5437.044119] hook not found, pf 5 num 0
[ 5437.044140] WARNING: CPU: 2 PID: 24983 at net/netfilter/core.c:376 __nf_unregister_net_hook+0x250/0x280
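
This class of bug can be sketched as follows, assuming a simplified model
of per-device hook registration. struct dev_hook, hook_register() and
register_all() are hypothetical; the point is only that the unwind loop
must index with its own counter rather than with the device count.

```c
#include <stddef.h>

/* Hypothetical device slot; 'registered' models a registered hook. */
struct dev_hook { int registered; };

/* Hypothetical registration helper that fails at index 'fail_at'. */
static int hook_register(struct dev_hook *h, size_t idx, size_t fail_at)
{
	if (idx == fail_at)
		return -1;
	h->registered = 1;
	return 0;
}

static int register_all(struct dev_hook *hooks, size_t n, size_t fail_at)
{
	size_t i, k;

	for (i = 0; i < n; i++) {
		if (hook_register(&hooks[i], i, fail_at) < 0)
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* Undo only slots [0, i): index with k, never with the count i or n. */
	for (k = 0; k < i; k++)
		hooks[k].registered = 0;
	return -1;
}
```

Indexing the unwind loop with the count would try to unregister a hook
that was never registered, which is what the "hook not found" warning
above reports.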

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e603ea4b 26-Feb-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: missing attribute validation in nf_tables_delflowtable()

Return -EINVAL if mandatory attributes are missing.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 32fc7187 26-Feb-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: return EBUSY if device already belongs to flowtable

If the netdevice is already part of a flowtable, return EBUSY. I cannot
find a valid usecase for having two flowtables bound to the same
netdevice. We can still have two flowtables whose device sets are
disjoint.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b408c5b0 06-Feb-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix flowtable free

Every flow_offload entry is added into the table twice. Because of this,
rhashtable_free_and_destroy can't be used, since it would call kfree for
each flow_offload object twice.

This patch cleans up the flowtable via nf_flow_table_iterate() to
schedule removal of entries by setting the dying bit, followed by an
explicit invocation of the garbage collector to release resources.

Based on patch from Felix Fietkau.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c7f0030b 01-Feb-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_flow_offload: wait for garbage collector to run after cleanup

If netdevice goes down, then flowtable entries are scheduled to be
removed. Wait for garbage collector to have a chance to run so it can
delete them from the hashtable.

The flush call might sleep, so hold the nfnl mutex from
nft_flow_table_iterate() instead of rcu read side lock. The use of the
nfnl mutex is also implicitly fixing races between updates via nfnetlink
and netdevice event.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e5531166 19-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: remove messages print and boot/module load time

Several reasons for this:

* Several modules maintain internal version numbers, that they print at
boot/module load time, that are not exposed to userspace, as a
primitive mechanism to make revision number control from the earlier
days of Netfilter.

* IPset shows the protocol version at boot/module load time, instead
display this via module description, as Jozsef suggested.

* Remove copyright notice at boot/module load time in two spots, the
Netfilter codebase is a collective development effort, if we would
have to display copyrights for each contributor at boot/module load
time for each extension we have, we would probably fill up logs with
lots of useless information - from a technical standpoint.

So let's be consistent and remove them all.

Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0e839dfa 18-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set flowtable priority and hooknum field

Otherwise netlink dump sends uninitialized fields to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3ecbfd65 26-Dec-2017 Harsha Sharma <harshasharmaiitr@gmail.com>

netfilter: nf_tables: allocate handle and delete objects via handle

This patch allows deletion of objects via unique handle which can be
listed via '-a' option.

Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 03a0120f 10-Jan-2018 Wei Yongjun <weiyongjun1@huawei.com>

netfilter: nf_tables: fix a typo in nf_tables_getflowtable()

Fix a typo, we should check 'flowtable' instead of 'table'.

Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 98319cb9 08-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: get rid of struct nft_af_info abstraction

Remove the infrastructure to register/unregister nft_af_info structure,
this structure stores no useful information anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dd4cbef7 08-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: get rid of pernet families

Now that we have a single table list for each netns, we can get rid of
one pointer per family and the global afinfo list, thus, shrinking
struct netns for nftables that now becomes 64 bytes smaller.

And call __nft_release_afinfo() from __net_exit path accordingly to
release netnamespace objects on removal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 36596dad 08-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add single table list for all families

Place all existing user defined tables in struct net *, instead of
having one list per family. This saves us from one level of indentation
in netlink dump functions.

Place a pointer to struct nft_af_info in struct nft_table temporarily, as
we still need this to put back the module reference counter on
table removal.

This patch comes in preparation for the removal of struct nft_af_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ea26cca 19-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove struct nft_af_info parameter in nf_tables_chain_type_lookup()

Pass family number instead, this comes in preparation for the removal of
struct nft_af_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c9c17211 18-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: no need for struct nft_af_info to enable/disable table

nf_tables_table_enable() and nf_tables_table_disable() take a pointer to
struct nft_af_info that is never used, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e7bb5c71 19-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove flag field from struct nft_af_info

Replace it by a direct check for the netdev protocol family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fe19c04c 19-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove nhooks field from struct nft_af_info

We already validate the hook through bitmask, so this check is
superfluous. When removing this, this patch is also fixing a bug in the
new flowtable codebase, since ctx->afi points to the table family
instead of the netdev family which is where the flowtable is really
hooked in.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3b49e2e9 06-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add flow table netlink frontend

This patch introduces a netlink control plane to create, delete and dump
flow tables. Flow tables are identified by name, this name is used from
rules to refer to a specific flow table. Flow tables use the rhashtable
class and a generic garbage collector to remove expired entries.

This also adds the infrastructure to add different flow table types, so
we can add one for each layer 3 protocol family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0befd061 01-Jan-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove nft_dereference()

This macro is unnecessary, it just hides details for one single caller.
nfnl_dereference() is just enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c2f9eafe 09-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove hooks from family definition

They don't belong to the family definition, move them to the filter
chain type definition instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c974a3a3 09-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove multihook chains and families

Since NFPROTO_INET is handled from the core, we don't need to maintain
extra infrastructure in nf_tables to handle the double hook
registration, one for IPv4 and another for IPv6.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 408070d6 24-Nov-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_set_is_anonymous() helper

Add helper function to test for the NFT_SET_ANONYMOUS flag.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 84ba7dd7 08-Dec-2017 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: reject nat hook registration if prio is before conntrack

No problem for iptables as priorities are fixed values defined in the
nat modules, but in nftables the priority comes from userspace.

Reject in case we see that such a hook would not work.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f92b40a8 08-Dec-2017 Florian Westphal <fw@strlen.de>

netfilter: core: only allow one nat hook per hook point

The netfilter NAT core cannot deal with more than one NAT hook per hook
location (prerouting, input ...), because the NAT hooks install a NAT null
binding in case the iptables nat table (iptable_nat hooks) or the
corresponding nftables chain (nft nat hooks) doesn't specify a nat
transformation.

Null bindings are needed to detect port collisions between NAT-ed and
non-NAT-ed connections.

This causes nftables NAT rules to not work when iptable_nat module is
loaded, and vice versa because nat binding has already been attached
when the second nat hook is consulted.

The netfilter core is not really the correct location to handle this
(hooks are just hooks, the core has no notion of what kinds of side
effects a hook implements), but it's the only place where we can check
for conflicts between both iptables hooks and nftables hooks without
adding dependencies.

So add nat annotation to hook_ops to describe those hooks that will
add NAT bindings and then make core reject if such a hook already exists.
The annotation fills a padding hole, in case further restrictions appear
we might change this to a 'u8 type' instead of bool.

iptables error if nft nat hook active:
iptables -t nat -A POSTROUTING -j MASQUERADE
iptables v1.4.21: can't initialize iptables table `nat': File exists
Perhaps iptables or your kernel needs to be upgraded.

nftables error if iptables nat table present:
nft -f /etc/nftables/ipv4-nat
/usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists
table nat {
^^

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8bea728d 24-Dec-2017 Hangbin Liu <liuhangbin@gmail.com>

netfilter: nf_tables: fix potential NULL-ptr deref in nf_tables_dump_obj_done()

If there is no NFTA_OBJ_TABLE and NFTA_OBJ_TYPE, the c.data will be NULL in
nf_tables_getobj(). So before freeing filter->table in nf_tables_dump_obj_done(),
we need to check if filter is NULL first.

Fixes: e46abbcc05aa ("netfilter: nf_tables: Allow table names of up to 255 chars")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 24c0df82 18-Dec-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix chain filter in nf_tables_dump_rules()

ctx->chain may be null now that we have very large object names,
so we cannot check for ctx->chain[0] here.
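
The difference between the two checks can be sketched in userspace C.
chain_matches() is a hypothetical helper, not the kernel function; it
assumes that a NULL filter means "no chain filter was requested".

```c
#include <stdbool.h>
#include <string.h>

/*
 * With fixed-size name buffers, an unset filter was an empty string and
 * 'filter[0]' was a valid test. With dynamically allocated names an unset
 * filter is a NULL pointer, so it has to be tested before any dereference.
 */
static bool chain_matches(const char *filter, const char *name)
{
	if (!filter)
		return true;	/* no filter: every chain matches */
	return strcmp(filter, name) == 0;
}
```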

Fixes: b7263e071aba7 ("netfilter: nf_tables: Allow table names of up to 255 chars")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>


# 613d0776 12-Nov-2017 Vasily Averin <vvs@virtuozzo.com>

netfilter: exit_net cleanup check added

Make sure that lists initialized in the net_init hook were returned to
their initial state.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ba0e4d99 09-Oct-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: get set elements via netlink

This patch adds a new get operation to look up specific elements in
a set via netlink interface. You can also use it to check if an interval
already exists.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 644e334e 05-Nov-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: performance set policy skips size description in selection

Use the complexity and space notations if policy is performance, this
results in placing the bitmap set representation over the hashtable for
key <= 16 for better performance as we discussed during the last NFWS in
Faro, Portugal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5f9bfe0e 04-Oct-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not dump chain counters if not enabled

Chain counters are only enabled on demand since 9f08ea848117, skip them
when dumping them via netlink.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Reported-by: Johny Mattsson <johny.mattsson+kernel@gmail.com>
Tested-by: Johny Mattsson <johny.mattsson+kernel@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e63aaaa6 19-Sep-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

netfilter: nf_tables: Release memory obtained by kasprintf

Free memory region, if nf_tables_set_alloc_name is not successful.

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0d18779b 23-Sep-2017 JingPiao Chen <chenjingpiao@gmail.com>

netfilter: nf_tables: fix update chain error

# nft add table filter
# nft add chain filter c1
# nft rename chain filter c1 c2

Error: Could not process rule: No such file or directory
rename chain filter c1 c2
^^^^^^^^^^^^^^^^^^^^^^^^^^

# nft add chain filter c2
# nft rename chain filter c1 c2
# nft list table filter

table ip filter {
chain c2 {
}

chain c2 {
}
}

Fixes: 664b0f8cd8 ("netfilter: nf_tables: add generation mask to chains")
Signed-off-by: JingPiao Chen <chenjingpiao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9dee1474 03-Sep-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: support for recursive chain deletion

This patch sorts out an asymmetry in deletions. Currently, table and set
deletion commands come with an implicit content flush on deletion.
However, chain deletion results in -EBUSY if there is content in this
chain, so no implicit flush happens. So you have to send a flush command
in first place to delete chains, this is inconsistent and it can be
annoying in terms of user experience.

This patch uses the new NLM_F_NONREC flag to request non-recursive chain
deletion, ie. if the chain to be removed contains rules, then this
returns EBUSY. This problem was discussed during the NFWS'17 in Faro,
Portugal. In iptables, you hit -EBUSY if you try to delete a chain that
contains rules, so you have to flush first before you can remove
anything. Since iptables-compat uses the nf_tables netlink interface, it
has to use the NLM_F_NONREC flag from userspace to retain the original
iptables semantics, ie. bail out on removing chains that contain rules.
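
The resulting semantics can be sketched as follows. This is a simplified
model: SKETCH_F_NONREC is a local stand-in for NLM_F_NONREC with an
arbitrary value, and struct chain_model and chain_delete() are
hypothetical.

```c
#include <errno.h>

#define SKETCH_F_NONREC 0x1	/* stand-in for NLM_F_NONREC, value arbitrary */

/* Hypothetical chain model that only tracks how many rules it holds. */
struct chain_model { unsigned int nrules; };

/*
 * Non-recursive deletion refuses to remove a chain that still has rules.
 * Without the flag, the rules are flushed implicitly and deletion succeeds.
 */
static int chain_delete(struct chain_model *c, unsigned int nlmsg_flags)
{
	if ((nlmsg_flags & SKETCH_F_NONREC) && c->nrules > 0)
		return -EBUSY;

	c->nrules = 0;	/* implicit flush */
	return 0;
}
```

In this model a caller that wants the original iptables behaviour passes
the flag, while a native caller omits it and gets the implicit flush.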

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a8278400 03-Sep-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use NLM_F_NONREC for deletion requests

Bail out if user requests non-recursive deletion for tables and sets.
This new flag tells the nf_tables netlink interface to reject deletions if
tables and sets have content.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4035285f 03-Sep-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nf_tables_addchain()

Wrap the chain addition path in a function to make it more maintainable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2c4a488a 03-Sep-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nf_tables_updchain()

nf_tables_newchain() is too large, wrap the chain update path in a
function to make it more maintainable.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dfc46034 23-Aug-2017 Pablo M. Bermudo Garay <pablombg@gmail.com>

netfilter: nf_tables: add select_ops for stateful objects

This patch adds support for overloading stateful objects operations
through the select_ops() callback, just as it is implemented for
expressions.

This change is needed for upcoming additions to the stateful objects
infrastructure.

Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 61509575 27-Jul-2017 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Allow object names of up to 255 chars

Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 38745490 27-Jul-2017 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Allow set names of up to 255 chars

Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b7263e07 27-Jul-2017 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Allow chain name of up to 255 chars

Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e46abbcc 27-Jul-2017 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Allow table names of up to 255 chars

Allocate all table names dynamically to allow for arbitrary lengths but
introduce NFT_NAME_MAXLEN as an upper sanity boundary. Its value was
chosen to allow using a domain name as per RFC 1035.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 784b4e61 19-Jul-2017 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Attach process info to NFT_MSG_NEWGEN notifications

This is helpful for 'nft monitor' to track which process caused a given
change to the ruleset.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9f08ea84 18-Jul-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: keep chain counters away from hot path

These chain counters are only used by the iptables-compat tool, which
allows users to use the x_tables extensions from the existing nf_tables
framework. This patch makes nf_tables faster by ~5% for the general
usecase, ie. native nft users, where no chain counters are used at all.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 04ba724b 19-Jun-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: extended ACK reporting

Pass down struct netlink_ext_ack as parameter to all of our nfnetlink
subsystem callbacks, so we can work on follow up patches to provide
finer grain error reporting using the new infrastructure that
2d4bc93368f5 ("netlink: extended ACK reporting") provides.

No functional change, just pass down this new object to callbacks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d8297d4f 14-Jun-2017 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: reduce chain type table size

text data bss dec hex filename
old: 151590 2240 1152 154982 25d66 net/netfilter/nf_tables_api.o
new: 151666 2240 416 154322 25ad2 net/netfilter/nf_tables_api.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ff75a3e 22-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow large allocations for new sets

The new fixed size hashtable backend implementation may result in a
large array of buckets that would spew splats from mm. Update this code
to fall back on vmalloc in case the memory allocation order is too
costly.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 347b408d 22-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass set description to ->privsize

The new non-resizable hashtable variant needs this to calculate the
size of the bucket array.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2b664957 22-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: select set backend flavour depending on description

This patch adds the infrastructure to support several implementations of
the same set type. This selection will be based on the set description
and the features available for this set. This allows us to select the set
backend implementation that will result in better performance numbers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 59105446 15-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: revisit chain/object refcounting from elements

Andreas reports that the following incremental update using our commit
protocol doesn't work.

# nft -f incremental-update.nft
delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
delete chain ip filter CIn_1
... Error: Could not process rule: Device or resource busy

The existing code is not well-integrated into the commit phase protocol,
since element deletions do not result in refcount decrement from the
preparation phase. This results in bogus EBUSY errors like the one
above.

Two new functions come with this patch:

* nft_set_elem_activate() function is used from the abort path, to
restore the set element refcounting on objects that occurred from
the preparation phase.

* nft_set_elem_deactivate() that is called from nft_del_setelem() to
decrement set element refcounting on objects from the preparation
phase in the commit protocol.

The nft_data_uninit() has been renamed to nft_data_release() since this
function does not uninitialize any data stored in the data register,
instead it just releases the references to objects. Moreover, a new
function nft_data_hold() has been introduced to be used from
nft_set_elem_activate().
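
The refcounting protocol described above can be sketched with a small
model. struct goto_chain, struct map_elem and the three helpers are
hypothetical and far simpler than the kernel code; they only show why the
decrement has to happen in the preparation phase and be undone on abort.

```c
#include <errno.h>

/* Hypothetical chain that counts references from set elements. */
struct goto_chain { unsigned int use; };

/* Hypothetical map element whose verdict jumps to a chain. */
struct map_elem { struct goto_chain *target; };

/* Preparation phase of "delete element": drop the reference right away. */
static void elem_deactivate(struct map_elem *e)
{
	e->target->use--;
}

/* Abort path: the deletion is undone, so take the reference back. */
static void elem_activate(struct map_elem *e)
{
	e->target->use++;
}

/* "delete chain" fails while any element still refers to the chain. */
static int chain_can_delete(const struct goto_chain *c)
{
	return c->use ? -EBUSY : 0;
}
```

With the decrement done at preparation time, a later "delete chain" in
the same batch sees a zero use count instead of a bogus EBUSY.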

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fa803605 14-May-2017 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: can't assume lock is acquired when dumping set elems

When dumping the elements related to a specified set, we may invoke the
nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So
we should use the proper rcu operation to avoid race condition, just
like other nft dump operations.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9744a6fc 30-Apr-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: check if same extensions are set when adding elements

If no NLM_F_EXCL is set and the element already exists in the set, make
sure that both elements have the same extensions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 79250568 12-Apr-2017 Aaron Conole <aconole@bytheb.org>

netfilter: nf_tables: remove double return statement

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cbbb40e2 28-Mar-2017 simran singhal <singhalsimran0@gmail.com>

net: netfilter: Use list_{next/prev}_entry instead of list_entry

This patch replaces list_entry with list_prev_entry as it makes the
code clearer to read.

Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dedb67c4 28-Mar-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Add nfnl_msg_type() helper function

Add and use nfnl_msg_type() function to replace opencoded nfnetlink
message type. I suggested this change, Arushi Singhal made an initial
patch to address this but was missing several spots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f323d954 20-Mar-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_is_base_chain() helper

This new helper function allows us to check if this is a basechain.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 04166f48 13-Mar-2017 Pablo Neira Ayuso <pablo@netfilter.org>

Revert "netfilter: nf_tables: add flush field to struct nft_set_iter"

This reverts commit 1f48ff6c5393aa7fe290faf5d633164f105b0aa7.

This patch is not required anymore now that we keep a dummy list of
set elements in the bitmap set implementation, so revert this before
we forget this code has no clients.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 84fba055 08-Mar-2017 Florian Westphal <fw@strlen.de>

netfilter: provide nft_ctx in object init function

This is needed by the upcoming ct helper object type --
we'd like to be able to use the table family (ip, ip6, inet) to figure
out which helper has to be requested.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c7a72e3f 06-Mar-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_set_lookup()

This new function consolidates set lookup via either name or ID by
introducing a new nft_set_lookup() function. Replace existing spots
where we can use this too.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c56e3956 05-Mar-2017 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: validate the expr explicitly after init successfully

When we want to validate the expr's dependency or hooks, we must do two
things to accomplish it. First, write a X_validate callback function
and point ->validate to it. Second, call X_validate in init routine.
This is very common, such as fib, nat, reject expr and so on ...

This is a little ugly, since every expression has to call X_validate
from its init routine. It is better to do it once in nf_tables_newexpr,
so we can avoid repeating it. After doing this, the second step listed
above is not needed anymore, so remove those calls now.

Patch was tested by nftables/tests/py/nft-test.py and
nftables/tests/shell/run-tests.sh.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 25e94a99 28-Feb-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails

The underlying nlmsg_multicast() already sets sk->sk_err for us to
notify socket overruns, so we should not do anything with this return
value. So we just call nfnetlink_set_err() if:

1) We fail to allocate the netlink message.

or

2) We don't have enough space in the netlink message to place attributes,
which means that we likely need to allocate a larger message.

Before this patch, the internal ESRCH netlink error code was propagated
to userspace, which is quite misleading. Netlink semantics mandate that
listeners just hit ENOBUFS if the socket buffer overruns.

Reported-by: Alexander Alemayhu <alexander@alemayhu.com>
Tested-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7286ff7f 10-Feb-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: honor NFT_SET_OBJECT in set backend selection

Check for NFT_SET_OBJECT feature flag, otherwise we may end up selecting
the wrong set backend.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1a94e38d 09-Feb-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add NFTA_RULE_ID attribute

This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 74e8bcd2 09-Feb-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add check_genid to the nfnetlink subsystem

This patch implements the check generation id as provided by nfnetlink.
This allows us to reject ruleset updates against stale baseline, so
userspace can retry update with a fresh ruleset cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0b5a7874 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add space notation to sets

The space notation allows us to classify the set backend implementation
based on the amount of required memory. This provides an order of the
set representation scalability in terms of memory. The size field is
still left in place; it is used when userspace provides no explicit
number of elements, since in that case we cannot calculate the real
memory that this set needs. This also helps us break ties in the set
backend selection routine, e.g. when two backend implementations provide
the same performance.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 55af753c 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: rename struct nft_set_estimate class field

Use lookup as field name instead, to prepare the introduction of the
memory class in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1f48ff6c 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add flush field to struct nft_set_iter

This provides context to walk callback iterator, thus, we know if the
walk happens from the set flush path. This is required by the new bitmap
set type coming in a follow up patch which has no real struct
nft_set_ext, so it has to allocate it based on the two bit compact
element representation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ba1c414 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: rename deactivate_one() to flush()

Although semantics are similar to deactivate() with no implicit element
lookup, this is only called from the set flush path, so better rename
this to flush().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# baa2d42c 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use struct nft_set_iter in set element flush

Instead of struct nft_set_dump_args, remove unnecessary wrapper
structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5cb82a38 18-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass netns to set->ops->remove()

This new parameter is required by the new bitmap set type that comes in a
follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 10435c11 20-Jan-2017 Feng <fgao@ikuai8.com>

netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()

If something fails in nf_tables_table_enable(), it unregisters the
chains. The rollback code is almost the same as
nf_tables_table_disable(), except for one counter check. Create a
wrapper function to eliminate the duplicated code.

Signed-off-by: Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b2c11e4b 23-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: bump set->ndeact on set flush

Add missing set->ndeact update on each deactivated element from the set
flush path. Otherwise, sets with fixed size break after flush since
accounting breaks.

# nft add set x y { type ipv4_addr\; size 2\; }
# nft add element x y { 1.1.1.1 }
# nft add element x y { 1.1.1.2 }
# nft flush set x y
# nft add element x y { 1.1.1.1 }
<cmdline>:1:1-28: Error: Could not process rule: Too many open files in system

Fixes: 8411b6442e59 ("netfilter: nf_tables: support for set flushing")
Reported-by: Elise Lennion <elise.lennion@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# de70185d 23-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: deconstify walk callback function

The flush operation needs to modify set and element objects, so let's
deconstify this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 35d0ac90 23-Jan-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL

If the element exists and no NLM_F_EXCL is specified, do not bump
set->nelems, otherwise we leak one set element slot. This problem
amplifies if the set is full since the abort path always decrements the
counter for the -ENFILE case too, giving one spare extra slot.

Fix this by moving set->nelems update to nft_add_set_elem() after
successful element insertion. Moreover, remove the element if the set is
full so there is no need to rely on the abort path to undo things
anymore.
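
A minimal userspace sketch of the accounting described above, under
stated assumptions: struct toy_set and toy_add_elem() are hypothetical
names, the array stands in for a real set backend, and a plain counter
replaces the kernel's atomic set->nelems.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8

/* Toy set: hypothetical stand-in for struct nft_set. */
struct toy_set {
	unsigned int size;	/* 0 means no user-defined limit */
	unsigned int nelems;	/* accounted elements */
	int keys[TOY_MAX];
	unsigned int used;	/* slots occupied in keys[] */
};

static bool toy_exists(const struct toy_set *s, int key)
{
	for (unsigned int i = 0; i < s->used; i++)
		if (s->keys[i] == key)
			return true;
	return false;
}

int toy_add_elem(struct toy_set *s, int key, bool excl)
{
	/* Existing element: never bump the counter. */
	if (toy_exists(s, key))
		return excl ? -EEXIST : 0;
	if (s->used == TOY_MAX)
		return -ENOMEM;

	s->keys[s->used++] = key;	/* insert first */

	/* Account only after a successful insertion. */
	if (s->size && s->nelems + 1 > s->size) {
		s->used--;		/* undo here, not in the abort path */
		return -ENFILE;
	}
	s->nelems++;
	return 0;
}

void toy_add_example(void)
{
	struct toy_set s = { .size = 2 };

	assert(toy_add_elem(&s, 1, false) == 0);
	assert(toy_add_elem(&s, 2, false) == 0);
	assert(toy_add_elem(&s, 1, false) == 0);	/* exists, no bump */
	assert(s.nelems == 2);
	assert(toy_add_elem(&s, 3, false) == -ENFILE);	/* full, undone */
	assert(s.nelems == 2 && s.used == 2);
}
```

Because the full-set case removes the element on the spot, the counter
never needs a compensating decrement later.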

Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b2fbd044 20-Jan-2017 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: validate the name size when possible

Currently, if the user adds a stateful object whose name is longer than
NFT_OBJ_MAXNAMELEN - 1 (i.e. 31) characters, we silently truncate it to
31. This is unfriendly and, furthermore, causes duplicated stateful
objects when the first 31 characters of two names are the same. So limit
the stateful object's name size to NFT_OBJ_MAXNAMELEN - 1.

After applying this patch, an error message is printed like this:
# name_32=$(printf "%0.sQ" {1..32})
# nft add counter filter $name_32
<cmdline>:1:1-52: Error: Could not process rule: Numerical result out
of range
add counter filter QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This patch also cleans up the code that was missing the name size limit
validation in nftables.

Fixes: e50092404c1b ("netfilter: nf_tables: add stateful objects")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1a28ad74 16-Jan-2017 Gao Feng <fgao@ikuai8.com>

netfilter: nf_tables: eliminate useless condition checks

The return value of nf_tables_table_lookup() is either a valid pointer
or an error pointer. There are two cases:

1) IS_ERR(table) is true: we either return the error or reset the
table to NULL, so the later check "table != NULL" is unnecessary.

2) IS_ERR(table) is false: the table is a valid pointer, so that check
is unnecessary as well.

nf_tables_newset() and nf_tables_newobj() follow the same logic.

In summary, we can move the block guarded by "table != NULL" into the
else branch and drop the original condition checks.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d21e540b 07-Jan-2017 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: fix possible oops when dumping stateful objects

When dumping nft stateful objects, if neither the NFTA_OBJ_TABLE nor the
NFTA_OBJ_TYPE attribute is specified, filter will be NULL and an oops
will happen (the nft utility always sets the NFTA_OBJ_TABLE attr, so I
wrote a test program to trigger this):

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: nf_tables_dump_obj+0x17c/0x330 [nf_tables]
[...]
Call Trace:
? nf_tables_dump_obj+0x5/0x330 [nf_tables]
? __kmalloc_reserve.isra.35+0x31/0x90
? __alloc_skb+0x5b/0x1e0
netlink_dump+0x124/0x2a0
__netlink_dump_start+0x161/0x190
nf_tables_getobj+0xe8/0x280 [nf_tables]

Fixes: a9fea2a3c3cf ("netfilter: nf_tables: allow to filter stateful object dumps by type")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3e38df13 13-Dec-2016 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix oob access

BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8411b644 05-Dec-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: support for set flushing

This patch adds support for set flushing, that consists of walking over
the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set.
This patch requires the following changes:

1) Add set->ops->deactivate_one() operation: This allows us to
deactivate an element from the set element walk path, given we can
skip the lookup that happens in ->deactivate().

2) Add a new nft_trans_alloc_gfp() function since we need to allocate
transactions using GFP_ATOMIC given the set walk path happens with
held rcu_read_lock.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1a37ef76 05-Dec-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc()

Context is not modified by nft_trans_alloc(), so constify it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a9fea2a3 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to filter stateful object dumps by type

This patch adds the netlink code to filter out dump of stateful objects,
through the NFTA_OBJ_TYPE netlink attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 63aea290 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_objref: support for stateful object maps

This patch allows us to refer to stateful object dictionaries, the
source register indicates the key data to be used to look up for the
corresponding state object. We can refer to these maps through names or,
alternatively, the map transaction id. This allows us to refer to both
anonymous and named maps.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8aeff920 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add stateful object reference to set elements

This patch allows you to refer to stateful objects from set elements.
This provides the infrastructure to create maps where the right hand
side of the mapping is a stateful object.

This allows us to build dictionaries of stateful objects, that you can
use to perform fast lookups using any arbitrary key combination.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 18965317 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_quota: add depleted flag for objects

Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag
indicates we have reached overquota.

Add pointer to table from nft_object, so we can use it when sending the
depletion notification to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2599e989 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: notify internal updates of stateful objects

Introduce nf_tables_obj_notify() to notify internal state changes in
stateful objects. This is used by the quota object to report depletion
in a follow up patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 43da04a5 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: atomic dump and reset for stateful objects

This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic
dump-and-reset of the stateful object. This also adds support for atomic
dump and reset for counter and quota objects.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e5009240 27-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add stateful objects

This patch augments nf_tables to support stateful objects. This new
infrastructure allows you to create, dump and delete stateful objects,
that are identified by a user-defined name.

This patch adds the generic infrastructure, follow up patches add
support for two stateful objects: counters and quotas.

This patch provides a native infrastructure for nf_tables to replace
nfacct, the extended accounting infrastructure for iptables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d3e2a111 20-Nov-2016 Anders K. Pedersen <akp@cohaesio.com>

netfilter: nf_tables: fix inconsistent element expiration calculation

As Liping Zhang reports, after commit a8b1e36d0d1d ("netfilter: nft_dynset:
fix element timeout for HZ != 1000"), priv->timeout was stored in jiffies,
while set->timeout was stored in milliseconds. This is inconsistent and
incorrect.

Firstly, we already call msecs_to_jiffies in nft_set_elem_init, so
priv->timeout will be converted to jiffies twice.

Secondly, if the user did not specify the NFTA_DYNSET_TIMEOUT attr,
set->timeout will be used, but we forget to call msecs_to_jiffies
when do update elements.

Fix this by using jiffies internally for traditional sets and doing the
conversions to/from msec when interacting with userspace - as dynset
already does.

This is preferable to doing the conversions, when elements are inserted or
updated, because this can happen very frequently on busy dynsets.
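
A small sketch of "jiffies internally, msec only at the userspace
boundary". Everything here is an assumption for illustration: TOY_HZ is
an arbitrary tick rate, the toy_* helpers are hypothetical and only
approximate the kernel's msecs_to_jiffies()/jiffies_to_msecs().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HZ 250ULL	/* assumed tick rate, HZ != 1000 on purpose */

/* Simplified conversions; rounding up on the way in. */
uint64_t toy_msecs_to_jiffies(uint64_t ms)
{
	return (ms * TOY_HZ + 999) / 1000;
}

uint64_t toy_jiffies_to_msecs(uint64_t j)
{
	return j * 1000 / TOY_HZ;
}

struct toy_set {
	uint64_t timeout;	/* always stored in jiffies */
};

/* Convert exactly once, when the value arrives from userspace. */
void toy_set_timeout_from_user(struct toy_set *s, uint64_t ms)
{
	s->timeout = toy_msecs_to_jiffies(ms);
}

/* Convert back only when dumping to userspace. */
uint64_t toy_set_timeout_to_user(const struct toy_set *s)
{
	return toy_jiffies_to_msecs(s->timeout);
}

/* Hot path: no conversion at all on element insert or update. */
uint64_t toy_elem_expiration(const struct toy_set *s, uint64_t now)
{
	return now + s->timeout;
}

void toy_timeout_example(void)
{
	struct toy_set s;

	toy_set_timeout_from_user(&s, 1000);
	assert(s.timeout == 250);
	assert(toy_elem_expiration(&s, 100) == 350);
	assert(toy_set_timeout_to_user(&s) == 1000);

	/* Converting an already-converted value gives a wrong timeout. */
	assert(toy_msecs_to_jiffies(s.timeout) == 63);
}
```

The last assertion illustrates the double conversion described above: a
one second timeout shrinks to 63 ticks with this assumed tick rate.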

Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
Reported-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Acked-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 58c78e10 06-Nov-2016 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: fix oops when inserting an element into a verdict map

Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
map sourcemap {
type ipv4_addr : verdict;
}

chain postrouting {
ip saddr vmap @sourcemap accept
}
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----

results in a kernel oops:
BUG: unable to handle kernel paging request at 0000000000001344
IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables]
[...]
Call Trace:
[<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables]
[<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables]
[<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables]
[<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables]
[<ffffffff8132c630>] ? nla_strcmp+0x40/0x50
[<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables]
[<ffffffff8132c400>] ? nla_validate+0x60/0x80
[<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink]

This happens because we forget to fill the net pointer in bind_ctx, so
dereferencing it may crash the kernel.

Reported-by: Dalegaard <dalegaard@gmail.com>
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c17c3cdf 29-Oct-2016 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: destroy the set if fail to add transaction

When memory is exhausted, we fail to add the NFT_MSG_NEWSET transaction.
In that case, we should destroy the set before we free it.

Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f1d505bb 25-Oct-2016 John W. Linville <linville@tuxdriver.com>

netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check

Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".

This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.

Found by Coverity, CID 1373930.

Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.
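
To illustrate why the return type matters, here is a userspace sketch.
It is an assumption-laden model: toy_parse_u32_check() is a hypothetical
name, it takes the raw big-endian value instead of a struct nlattr, and
the maximum is passed as an unsigned value.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 and stores the host-order value in *dest, or a negative
 * errno. The return type must be signed so callers can test "< 0".
 */
int toy_parse_u32_check(uint32_t attr_be, uint32_t max, uint32_t *dest)
{
	uint32_t val = ntohl(attr_be);

	if (val > max)
		return -ERANGE;

	*dest = val;
	return 0;
}

void toy_parse_example(void)
{
	uint32_t v = 0;

	/* Value that fits in a u8-sized field. */
	assert(toy_parse_u32_check(htonl(200), 255, &v) == 0 && v == 200);

	/* Value that does not fit: error, *dest is left untouched. */
	assert(toy_parse_u32_check(htonl(256), 255, &v) == -ERANGE);
	assert(v == 200);
}
```

If the function returned "unsigned int" instead, -ERANGE would be
converted to a large positive number and a caller's "err < 0" test on
that unsigned value could never be true.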

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Laura Garcia Liebana <nevola@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 61f9e292 22-Oct-2016 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: fix *leak* when expr clone fail

When nft_expr_clone fails, a series of problems happens:

1. The module refcnt leaks: we call __module_get at the beginning but
forget to put it back if ops->clone fails.
2. Memory leaks: if the clone fails, we just return NULL and forget to
free the allocated element.
3. set->nelems becomes incorrect when set->size is specified: if the
clone fails, we should decrease set->nelems.

This patch fixes these problems. Fortunately, a clone failure can only
happen on the counter expression when memory is exhausted.

Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 09525a09 11-Oct-2016 Dan Carpenter <dan.carpenter@oracle.com>

netfilter: nf_tables: underflow in nft_parse_u32_check()

We don't want to allow negatives here.

Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 36b701fa 14-Sep-2016 Laura Garcia Liebana <nevola@gmail.com>

netfilter: nf_tables: validate maximum value of u32 netlink attributes

Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.

This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: 96518518cc41 ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c016c7e4 23-Aug-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion

If the NLM_F_EXCL flag is set, then new elements that clash with an
existing one return EEXIST. In case you try to add an element whose
data area differs from what we have, then this returns EBUSY. If no
flag is specified at all, then this returns success to userspace.
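
One possible reading of these rules as a decision function, for a
candidate element that clashes with an existing one. This is only a
sketch: toy_clash_result() is a hypothetical helper, and giving the data
mismatch precedence over the NLM_F_EXCL check is an assumption, not
something the text above states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * excl:         the request carried NLM_F_EXCL
 * data_differs: the new element's data area differs from the existing one
 */
int toy_clash_result(bool excl, bool data_differs)
{
	if (data_differs)
		return -EBUSY;	/* same key, different data (assumed order) */
	if (excl)
		return -EEXIST;	/* exclusive creation requested */
	return 0;		/* identical element, report success */
}

void toy_clash_example(void)
{
	assert(toy_clash_result(true, false) == -EEXIST);
	assert(toy_clash_result(false, true) == -EBUSY);
	assert(toy_clash_result(false, false) == 0);
}
```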

This patch also updates the set insert operation so we can fetch the
existing element that clashes with the one you want to add. We need
this to make sure the element data doesn't differ.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6133740d 01-Aug-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject hook configuration updates on existing chains

Currently, if you add a base chain whose name clashes with an existing
non-base chain, nf_tables doesn't complain about this. The same happens
if you update the chain type, the hook number or the priority.

With this patch, nf_tables bails out by returning EBUSY if any of these
unsupported operations occurs.

# nft add table x
# nft add chain x y
# nft add chain x y { type nat hook input priority 0\; }
<cmdline>:1:1-49: Error: Could not process rule: Device or resource busy
add chain x y { type nat hook input priority 0; }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 508f8ccd 01-Aug-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: introduce nft_chain_parse_hook()

Introduce a new function to wrap the code that parses the chain hook
configuration so we can reuse this code to validate chain updates.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6e1f760e 18-Jul-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to filter out rules by table and chain

If the table and/or chain attributes are set in a rule dump request,
we filter out the rules based on this selection.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 42a55769 08-Jul-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: get rid of possible_net_t from set and basechain

We can pass the netns pointer as parameter to the functions that need to
gain access to it. From basechains, I didn't find any client for this
field anymore so let's remove this too.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 82bec71d 22-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLED

This flag was introduced to restore rulesets from the new netdev
family, but since 5ebe0b0eec9d6f7 ("netfilter: nf_tables: destroy
basechain and rules on netdevice removal") the ruleset is released
once the netdev is gone.

This also removes nft_register_basechain() and
nft_unregister_basechain() since they have no clients anymore after
this rework.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 37a9cc52 12-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add generation mask to sets

Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 664b0f8c 12-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add generation mask to chains

Similar to ("netfilter: nf_tables: add generation mask to tables").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f2a6d766 14-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add generation mask to tables

This patch addresses two problems:

1) The netlink dump is inconsistent when interfering with an ongoing
transaction update for several reasons:

1.a) We don't honor the internal NFT_TABLE_INACTIVE flag, and we should
be skipping these inactive objects in the dump.

1.b) We perform speculative deletion during the preparation phase, that
may result in skipping active objects.

1.c) The listing order changes, which generates noise when tracking
incremental ruleset update via tools like git or our own
testsuite.

2) We don't allow adding and updating the object in the same batch,
eg. add table x; add table x { flags dormant\; }.

In order to resolve these problems:

1) If the user requests a deletion, the object becomes inactive in the
next generation. Then, ignore objects that are scheduled to be deleted
from the lookup path, as they will be effectively removed in the
next generation.

2) From the get/dump path, if the object is not currently active, we
skip it.

3) Support 'add X -> update X' sequence from a transaction.

After this update, we obtain a consistent list as long as we stay
in the same generation. The userspace side can detect interferences
through the generation counter so it can restart the dumping.
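
The two-generation scheme can be modelled with a two-bit mask, where a
set bit means "inactive in that generation". The sketch below is a
simplified userspace model under that assumption; struct toy_net,
struct toy_obj and the toy_* helpers are hypothetical names and are not
the kernel's macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_net { unsigned int gencursor; };	/* 0 or 1 */
struct toy_obj { uint8_t genmask; };		/* bit set = inactive */

static uint8_t toy_genmask_cur(const struct toy_net *n)
{
	return (uint8_t)(1u << n->gencursor);
}

static uint8_t toy_genmask_next(const struct toy_net *n)
{
	return (uint8_t)(1u << (n->gencursor ^ 1u));
}

bool toy_is_active(const struct toy_net *n, const struct toy_obj *o)
{
	return !(o->genmask & toy_genmask_cur(n));
}

bool toy_is_active_next(const struct toy_net *n, const struct toy_obj *o)
{
	return !(o->genmask & toy_genmask_next(n));
}

/* New object: invisible now, visible once the batch commits. */
void toy_activate_next(const struct toy_net *n, struct toy_obj *o)
{
	o->genmask = toy_genmask_cur(n);
}

/* Deleted object: still visible now, gone once the batch commits. */
void toy_deactivate_next(const struct toy_net *n, struct toy_obj *o)
{
	o->genmask = toy_genmask_next(n);
}

/* Commit: flip the generation cursor. */
void toy_commit(struct toy_net *n)
{
	n->gencursor ^= 1u;
}

/* After commit, clear the stale bit of the now-old generation. */
void toy_clear(const struct toy_net *n, struct toy_obj *o)
{
	o->genmask &= (uint8_t)~toy_genmask_next(n);
}

void toy_genmask_example(void)
{
	struct toy_net net = { .gencursor = 0 };
	struct toy_obj obj = { 0 };

	toy_activate_next(&net, &obj);		/* "add table x" */
	assert(!toy_is_active(&net, &obj));	/* dump skips it */
	assert(toy_is_active_next(&net, &obj));	/* batch lookup sees it */

	toy_commit(&net);
	toy_clear(&net, &obj);
	assert(toy_is_active(&net, &obj));

	toy_deactivate_next(&net, &obj);	/* "delete table x" */
	assert(toy_is_active(&net, &obj));	/* dump still lists it */
	assert(!toy_is_active_next(&net, &obj));

	toy_commit(&net);
	assert(!toy_is_active(&net, &obj));
}
```

In this model a dump that only consults the current generation gives
the same answer before and during a transaction, which matches the
consistency property described above.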

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 889f7ee7 12-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add generic macros to check for generation mask

Thus, we can reuse these to check the genmask of any object type, not
only rules. This is required now that tables, chain and sets will get a
generation mask field too in follow up patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6cafaf47 20-Jun-2016 Liping Zhang <liping.zhang@spreadtrum.com>

netfilter: nf_tables: fix memory leak if expr init fails

If expr init fails then we need to free it.

So when the user adds an nft rule as follows:

# nft add rule filter input tcp dport 22 flow table ssh \
{ ip saddr limit rate 0/second }

a memory leak happens.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a02f4248 10-Jun-2016 Liping Zhang <liping.zhang@spreadtrum.com>

netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails

When we add an nft rule as follows:
# nft add rule filter test tcp dport vmap {1: jump test}
an -ELOOP error is returned, and the anonymous set is destroyed.

But after that, nf_tables_abort also tries to remove the element and
destroy the set, which was already destroyed and freed.

If we add a wrong nft rule, nf_tables_abort already does the cleanup
work correctly, so the nf_tables_set_destroy call here is redundant and
wrong. Remove it.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8588ac09 10-Jun-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject loops from set element jump to chain

Liping Zhang says:

"Users may successfully add a wrong nft rule like the following, which
causes an endless jump loop:

# nft add rule filter test tcp dport vmap {1: jump test}

This is because before we commit, the element in the current anonymous
set is inactive, so ops->walk will skip this element and miss the
validate check."

To resolve this problem, this patch passes the generation mask to the
walk function through the iter container structure depending on the code
path:

1) If we're dumping the elements, then we have to check if the element
is active in the current generation. Thus, we check for the current
bit in the genmask.

2) If we're checking for loops, then we have to check if the element is
active in the next generation, as we're in the middle of a
transaction. Thus, we check for the next bit in the genmask.

Based on original patch from Liping Zhang.

Reported-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>


# a4684402 10-Jun-2016 Liping Zhang <liping.zhang@spreadtrum.com>

netfilter: nf_tables: fix wrong check of NFT_SET_MAP in nf_tables_bind_set

We should check whether "i" is used as a dictionary; "binding" has
already been checked before.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# eaa2bcd6 27-May-2016 Phil Turnbull <phil.turnbull@oracle.com>

netfilter: nf_tables: validate NFTA_SET_TABLE parameter

If the NFTA_SET_TABLE parameter is missing and the NLM_F_DUMP flag is
not set, then a NULL pointer dereference is triggered in
nf_tables_set_lookup because ctx.table is NULL.

Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cb39ad8b 04-May-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow set names up to 32 bytes

Currently, we support set names of up to 16 bytes, get this aligned
with the maximum length we can use in ipset to make it easier when
considering migration to nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3971ca14 12-Apr-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: parse element flags from nft_del_setelem()

Parse flags and pass them to the set via ->deactivate() to check if we
remove the right element from the intervals.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0e9091d6 12-Apr-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: introduce nft_setelem_parse_flags() helper

This function parses the set element flags, thus, we can reuse the same
handling when deleting elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b46f6ded 22-Apr-2016 Nicolas Dichtel <nicolas.dichtel@6wind.com>

libnl: nla_put_be64(): align on a 64-bit area

nla_data() is now aligned on a 64-bit area.

A temporary version (nla_put_be64_32bit()) is added for nla_put_net64().
This function is removed in the next patch.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6d8ecac 05-Jan-2016 Carlos Falgueras García <carlosfg@riseup.net>

netfilter: nf_tables: Add new attributes into nft_set to store user data.

User data is stored after the 'nft_set_ops' private data in the 'data[]'
flexible array. The field 'udata' points to the user data and 'udlen'
stores its length.

Add new flag NFTA_SET_USERDATA.

Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5913beaf 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass down netns pointer to commit() and abort() callbacks

Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7b8002a1 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass down netns pointer to call() and call_rcu()

Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f4c756b4 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove check against removal of inactive objects

The following sequence inside a batch, although not very useful, is
valid:

add table foo
...
delete table foo

This may be generated by some robot while applying some incremental
upgrade, so remove the defensive checks against this.

This patch keeps the check on the get/dump path for now; we have to
replace the inactive flag by introducing object generations.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5ebe0b0e 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: destroy basechain and rules on netdevice removal

If the netdevice is destroyed, the resources that are attached should
be released too as they belong to the device that is now gone.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# df05ef87 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release objects on netns destruction

We have to release the existing objects on netns removal otherwise we
leak them. Chains are unregistered in first place to make sure no
packets are walking on our rules and sets anymore.

The object release happens when we unregister the family via
nft_release_afinfo() which is called from nft_unregister_afinfo() from
the corresponding __net_exit path in every family.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a907e36d 07-Dec-2015 Xin Long <lucien.xin@gmail.com>

netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abort

When we use 'nft -f' to submit rules, it builds multiple rules into one
netlink skb and sends it to the kernel, which processes them one by one.
Meanwhile, the kernel adds each trans to commit_list to record every
commit. If one of them returns -EAGAIN, status |= NFNL_BATCH_REPLAY is
set. After all of them have been processed, all the commits are rolled
back.

Currently the kernel uses list_add_tail to add a trans to commit_list and
list_for_each_entry_safe to roll back, which means that the order of
adding and of rollback is the same. This breaks some cases and can even
trigger a call trace, for example:

1. add a set into table foo [return -EAGAIN]:
commit_list = 'add set trans'
2. del foo:
commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans'
then nf_tables_abort will be called to roll back:
firstly process 'add set trans':
case NFT_MSG_NEWSET:
trans->ctx.table->use--;
list_del_rcu(&nft_trans_set(trans)->list);

This deletes the set from table foo, but the set was already removed when
table foo was deleted [step 2], so the kernel panics.

The right order of rollback should be:
'del tab trans' -> 'del set trans' -> 'add set trans',
which is the opposite of the commit_list order.

So fix it by rolling back commits in reverse order in nf_tables_abort.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
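
A minimal sketch of why the rollback order matters, assuming a toy model
with one table that may hold one set. The enum, the struct and the
abort_forward()/abort_reverse() helpers are hypothetical illustrations and
are not kernel code; the real fix walks commit_list in reverse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction types; not the kernel's NFT_MSG_* values. */
enum op { ADD_SET, DEL_SET, DEL_TABLE };

/* Toy ruleset state: one table that may hold one set. */
struct state {
	int table_present;
	int set_present;
	int bad_undo;	/* undo steps that touched an object that was gone */
};

/* Undo one logged operation, counting accesses to missing objects. */
static void undo(struct state *s, enum op op)
{
	switch (op) {
	case ADD_SET:
		/* Unlinking the set needs both the table and the set. */
		if (!s->table_present || !s->set_present)
			s->bad_undo++;
		s->set_present = 0;
		break;
	case DEL_SET:
		/* Relinking the set needs the table to be back already. */
		if (!s->table_present)
			s->bad_undo++;
		s->set_present = 1;
		break;
	case DEL_TABLE:
		s->table_present = 1;
		break;
	}
}

/* Old behaviour: roll back in the same order the log was built. */
int abort_forward(struct state *s, const enum op *txlog, size_t n)
{
	assert(s != NULL && txlog != NULL);
	for (size_t i = 0; i < n; i++)
		undo(s, txlog[i]);
	return s->bad_undo;
}

/* Fixed behaviour: roll back from the newest entry to the oldest. */
int abort_reverse(struct state *s, const enum op *txlog, size_t n)
{
	assert(s != NULL && txlog != NULL);
	for (size_t i = n; i-- > 0; )
		undo(s, txlog[i]);
	return s->bad_undo;
}
```

With the log { ADD_SET, DEL_SET, DEL_TABLE } and a state in which both
objects are already gone, the forward walk touches missing objects, while
the reverse walk restores the table first and ends with the table present
and the set absent.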


# 633c9a84 08-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: avoid recurrent netns lookups in call_batch

Pass the net pointer to the call_batch callback functions so we can skip
recurrent lookups.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>


# 33d5a7b1 28-Nov-2015 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: extend tracing infrastructure

nft monitor mode can then decode and display this trace data.

Parts of LL/Network/Transport headers are provided as separate
attributes.

Otherwise, printing IP address data becomes virtually impossible
for userspace since in the case of the netdev family we really don't
want userspace to have to know all the possible link layer types
and/or sizes just to display/print an ip address.

We also don't want userspace to have to follow ipv6 header chains
to get the s/dport info, the kernel already did this work for us.

To avoid bloating nft_do_chain all data required for tracing is
encapsulated in nft_traceinfo.

The structure is initialized unconditionally(!) for each nft_do_chain
invocation.

This unconditional call will be moved under a static key in a
followup patch.

With lots of help from Patrick McHardy and Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2ffbceb2 13-Oct-2015 Florian Westphal <fw@strlen.de>

netfilter: remove hook owner refcounting

since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on
nf_unregister_hook") all pending queued entries are discarded.

So we can simply remove all of the owner handling -- when module is
removed it also needs to unregister all its hooks.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fd2ecda0 10-Jul-2015 Eric W. Biederman <ebiederm@xmission.com>

netfilter: nftables: Only run the nftables chains in the proper netns

- Register the nftables chains in the network namespace that they need
to run in.

- Remove the hacks that stopped chains running in the wrong network
namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 835b8033 14-Jun-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_netdev: unregister hooks on net_device removal

In case the net_device is gone, we have to unregister the hooks and put back
the reference on the net_device object. Once it comes back, register them
again. This also covers the device rename case.

This patch also adds a new flag to indicate that the basechain is disabled, so
their hooks are not registered. This flag is used by the netdev family to
handle the case where the net_device object is gone. Currently this flag is not
exposed to userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d8ee8f7c 14-Jun-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add nft_register_basechain() and nft_unregister_basechain()

These wrapper functions take care of hook registration for basechains.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2cbce139 12-Jun-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: attach net_device to basechain

The device is part of the hook configuration, so instead of a global
configuration per table, set it to each of the basechain that we create.

This patch reworks ebddf1a8d78a ("netfilter: nf_tables: allow to bind table to
net_device").

Note that this adds a dev_name field in the nft_base_chain structure, which is
required by the netdev notification subscription that follows up in a later
patch to handle gone net_devices.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ebddf1a8 26-May-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to bind table to net_device

This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you must
attach this table to a net_device.

This change is required by the follow up patch that introduces the new netdev
table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 960bd2c2 15-May-2015 Mirek Kratochvil <exa.exa@gmail.com>

netfilter: nf_tables: fix bogus warning in nft_data_uninit()

The values 0x00000000-0xfffffeff are reserved for userspace datatypes. When
deleting set elements with maps, a bogus warning is triggered.

WARNING: CPU: 0 PID: 11133 at net/netfilter/nf_tables_api.c:4481 nft_data_uninit+0x35/0x40 [nf_tables]()

This fixes the check according to the enum definition in
include/linux/netfilter/nf_tables.h

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=1013
Signed-off-by: Mirek Kratochvil <exa.exa@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4c4ed074 14-Apr-2015 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: fix wrong length for jump/goto verdicts

NFT_JUMP/GOTO erroneously sets length to sizeof(void *).

We then allocate insufficient memory when such element is added to a vmap.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7c6c6e95 11-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add flag to indicate set contains expressions

Add a set flag to indicate that the set is used as a state table and
contains expressions for evaluation. This operation is mutually
exclusive with the mapping operation, so sets specifying both are
rejected. The lookup expression also rejects binding to state tables
since it only deals with lookup and map operations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f25ad2e9 11-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: prepare for expressions associated to set elements

Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0b2d8a7b 11-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add helper functions for expression handling

Add helper functions for initializing, cloning, dumping and destroying
a single expression that is not part of a rule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7d740264 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: variable sized set element keys / data

This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
using concatenations with bigger data items such as IPv6 addresses.

As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d0a11fc3 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: support variable sized data in nft_data_init()

Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 49499c3e 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: switch registers to 32 bit addressing

Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.

The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.

To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.

Userspace fully passes the testsuite using both old and new register
addressing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
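
The translation between old and new register numbers described above can
be sketched as follows. This is an illustration under stated assumptions,
not the kernel helpers themselves: the macro values are modelled on the
128 bit / 32 bit sizes named in the message, REG32_FIRST (the first
new-style register number) is an assumed value, and parse_register() and
dump_register() are hypothetical names.

```c
#include <assert.h>

#define REG_SIZE	16u	/* bytes in an old 128 bit register */
#define REG32_SIZE	4u	/* bytes in a new 32 bit register */
#define REG_OLD_LAST	4u	/* old numbers: 0 = verdict, 1..4 = data */
#define REG32_FIRST	8u	/* assumed first new-style register number */
#define WORDS_PER_OLD	(REG_SIZE / REG32_SIZE)

/* Map a register number from userspace to a 32 bit word index. */
unsigned int parse_register(unsigned int reg)
{
	/* Old numbers address the start of a 128 bit area. */
	if (reg <= REG_OLD_LAST)
		return reg * WORDS_PER_OLD;

	/* New numbers address single words after the verdict area. */
	assert(reg >= REG32_FIRST);
	return reg - REG32_FIRST + WORDS_PER_OLD;
}

/* Map a 32 bit word index back to the number shown to userspace. */
unsigned int dump_register(unsigned int idx)
{
	/* Word indexes at the start of a 128 bit area use the old number. */
	if (idx % WORDS_PER_OLD == 0)
		return idx / WORDS_PER_OLD;

	return idx - WORDS_PER_OLD + REG32_FIRST;
}
```

Under these assumptions old register 1 and new register 8 name the same
word, so old userspace keeps working, while word indexes that do not fall
on a 128 bit boundary can only be expressed with the new numbering.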


# b1c96ed3 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add register parsing/dumping helpers

Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ca2e170 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: use struct nft_verdict within struct nft_data

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d07db988 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: introduce nft_validate_register_load()

Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 27e6d201 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: kill nft_validate_output_register()

All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 58f40ab6 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nft_lookup: use nft_validate_register_store() to validate types

In preparation of validating the length of a register store, use
nft_validate_register_store() in nft_lookup instead of open coding the
validation.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ec10212 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: rename nft_validate_data_load()

The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 45d9bcda 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: validate len in nft_validate_data_load()

For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 68e942e8 05-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: support optional userdata for set elements

Add an userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 22fe54d5 05-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add support for dynamic set updates

Add a new "dynset" expression for dynamic set updates.

A new set op ->update() is added which, for nonexistent elements,
invokes an initialization callback and inserts the new element.
For both new and existing elements the extension pointer is returned
to the caller to optionally perform timer updates or other actions.

Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 11113e19 05-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: support different set binding types

Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.

In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3dd0673a 05-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: prepare set element accounting for async updates

Use atomic operations for the element count to avoid races with async
updates.

To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for separately and
are treated as being already removed. This means that for the duration of
a netlink transaction, the limit might be exceeded by the number of
elements deleted. Set implementations must be prepared to handle this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4a8678ef 05-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix set selection when timeouts are requested

The NFT_SET_TIMEOUT flag is ignored in nft_select_set_ops, which may
lead to selection of a set implementation that doesn't actually
support timeouts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 69086658 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add GC synchronization helpers

GC is expected to happen asynchronously to the netlink interface. In the
netlink path, both insertion and removal of elements consist of two
steps, insertion followed by activation or deactivation followed by
removal, during which the element must not be freed by GC.

The synchronization helpers use an unused bit in the genmask field to
atomically mark an element as "busy", meaning it is either currently
being handled through the netlink API or by GC.

Elements being processed by GC will never survive, netlink will simply
ignore them. Elements being currently processed through netlink will be
skipped by GC and reprocessed during the next run.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cfed7e1b 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add set garbage collection helpers

Add helpers for GC batch destruction: since element destruction needs
an RCU grace period for all set implementations, add some helper functions
for asynchronous batch destruction. Elements are collected in a batch
structure, which is asynchronously released using RCU once it's full.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c3e1b005 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add set element timeout support

Add API support for set element timeouts. Elements can have an individual
timeout value specified, overriding the set's default.

Two new extension types are used for timeouts - the timeout value and
the expiration time. The timeout value only exists if it differs from
the default value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 761da293 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add set timeout API support

Add set timeout support to the netlink API. Sets with timeout support
enabled can have a default timeout value and garbage collection interval
specified.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cc02e457 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: implement set transaction support

Set elements are the last object type not supporting transactions.
Implement this similarly to the existing rule transactions:

The global transaction counter keeps track of two generations, current
and next. Each element contains a bitmask specifying in which generations
it is inactive.

New elements start out as inactive in the current generation and active
in the next. On commit, the previous next generation becomes the current
generation and the element becomes active. The bitmask is then cleared
to indicate that the element is active in all future generations. If the
transaction is aborted, the element is removed from the set before it
becomes active.

When removing an element, it gets marked as inactive in the next generation.
On commit the next generation becomes active and therefore the element
inactive. It is then taken out of the set and released. On abort, the
element is marked as active for the next generation again.

Lookups ignore elements not active in the current generation.

The current set types (hash/rbtree) both use a field in the extension area
to store the generation mask. This (currently) does not require any
additional memory since we have some free space in there.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ea4bd995 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add transaction helper functions

Add some helper functions for building the genmask as preparation for
set transactions.

Also add a little documentation how this stuff actually works.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 61edafbb 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: consolide set element destruction

With the conversion to set extensions, it is now possible to consolidate
the different set element destruction functions.

The set implementations' ->remove() functions are changed to only take
the element out of their internal data structures. Elements will be freed
in a batched fashion after the global transaction's completion RCU grace
period.

This reduces the amount of grace periods required for nft_hash from N
to zero additional ones, additionally this guarantees that the set
elements' extensions of all implementations can be used under RCU
protection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fe2811eb 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: convert hash and rbtree to set extensions

The set implementations' private struct will only contain the elements
needed to maintain the search structure, all other elements are moved
to the set extensions.

Element allocation and initialization is performed centrally by
nf_tables_api instead of by the different set implementations'
->insert() functions. A new "elemsize" member in the set ops specifies
the amount of memory to reserve for internal usage. Destruction
will also be moved out of the set implementations by a following patch.

Except for element allocation, the patch is a simple conversion to
using data from the extension area.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3ac4c07a 25-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add set extensions

Add simple set extension infrastructure for maintaining variable sized
and optional per element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5ebb335d 21-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: move struct net pointer to base chain

The network namespace is only needed for base chains to get at the
gencursor. Also convert to possible_net_t.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 55df35d2 21-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ffdb210e 17-Mar-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: consolidate error path of nf_tables_newtable()

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d6b6cb1d 17-Mar-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to change chain policy without hook if it exists

If there's an existing base chain, we have to allow to change the
default policy without indicating the hook information.

However, if the chain doesn't exist, we have to enforce the presence of
the hook attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1cae565e 05-Mar-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: limit maximum table name length to 32 bytes

Set the same as we use for chain names, it should be enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 59900e0a 04-Mar-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix error handling of rule replacement

In general, if a transaction object is added to the list successfully,
we can rely on the abort path to undo what we've done. This allows us to
simplify the error handling of the rule replacement path in
nf_tables_newrule().

This implicitly fixes an unnecessary removal of the old rule, which
needs to be left in place if we fail to replace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 86f1ec32 03-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix userdata length overflow

The NFT_USERDATA_MAXLEN is defined to 256, however we only have a u8
to store its size. Introduce a struct nft_userdata which contains a
length field and indicate its presence using a single bit in the rule.

The length field of struct nft_userdata is also a u8, however we don't
store zero sized data, so the actual length is udata->len + 1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
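
A minimal sketch of the "length minus one" encoding described above. The
maximum of 256 comes from the commit message; struct userdata_hdr and the
two helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USERDATA_MAXLEN	256u	/* maximum stated in the commit message */

/* Hypothetical header: len stores the actual length minus one. */
struct userdata_hdr {
	uint8_t len;
};

/* Store a length of 1..256 bytes in a u8; zero sized data is rejected. */
int userdata_set_len(struct userdata_hdr *u, size_t actual)
{
	assert(u != NULL);
	if (actual == 0 || actual > USERDATA_MAXLEN)
		return -1;
	u->len = (uint8_t)(actual - 1);
	return 0;
}

/* Recover the actual length. */
size_t userdata_get_len(const struct userdata_hdr *u)
{
	assert(u != NULL);
	return (size_t)u->len + 1;
}
```

Storing 256 directly in a u8 would wrap to 0. Since zero sized data is
never stored, shifting the range by one lets all 256 valid lengths fit.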


# 9889840f 03-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: check for overflow of rule dlen field

Check that the space required for the expressions doesn't exceed the
size of the dlen field, which would lead to the iterators crashing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8670c3a5 03-Mar-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix transaction race condition

A race condition exists in the rule transaction code for rules that
get added and removed within the same transaction.

The new rule starts out as inactive in the current and active in the
next generation and is inserted into the ruleset. When it is deleted,
it is additionally set to inactive in the next generation as well.

On commit the next generation is begun, then the actions are finalized.
For the new rule this would mean clearing out the inactive bit for
the previously current, now next generation.

However nft_rule_clear() clears out the bits for *both* generations,
activating the rule in the current generation, where it should be
deactivated due to being deleted. The rule will thus be active until
the deletion is finalized, removing the rule from the ruleset.

Similarly, when aborting a transaction for the same case, the undo
of insertion will remove it from the RCU protected rule list, the
deletion will clear out all bits. However until the next RCU
synchronization after all operations have been undone, the rule is
active on CPUs which can still see the rule on the list.

Generally, there may never be any modifications of the current
generation's inactive bit since this defeats the entire purpose of
atomicity. Change nft_rule_clear() to only touch the next generation's
bit to fix this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 02263db0 20-Feb-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix addition/deletion of elements from commit/abort

We have several problems in this path:

1) There is a use-after-free when removing individual elements from
the commit path.

2) We have to uninit() the data part of the element from the abort
path to avoid a chain refcount leak.

3) We have to check for set->flags to see if there's a mapping, instead
of the element flags.

4) We have to check for !(flags & NFT_SET_ELEM_INTERVAL_END) to skip
elements that are part of the interval that have no data part, so
they don't need to be uninit().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f5553c19 29-Jan-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix leaks in error path of nf_tables_newchain()

Release statistics and module refcount on memory allocation problems.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e8781f70 21-Jan-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disable preemption when restoring chain counters

With CONFIG_DEBUG_PREEMPT=y

[22144.496057] BUG: using smp_processor_id() in preemptible [00000000] code: iptables-compat/10406
[22144.496061] caller is debug_smp_processor_id+0x17/0x1b
[22144.496065] CPU: 2 PID: 10406 Comm: iptables-compat Not tainted 3.19.0-rc4+ #
[...]
[22144.496092] Call Trace:
[22144.496098] [<ffffffff8145b9fa>] dump_stack+0x4f/0x7b
[22144.496104] [<ffffffff81244f52>] check_preemption_disabled+0xd6/0xe8
[22144.496110] [<ffffffff81244f90>] debug_smp_processor_id+0x17/0x1b
[22144.496120] [<ffffffffa07c557e>] nft_stats_alloc+0x94/0xc7 [nf_tables]
[22144.496130] [<ffffffffa07c73d2>] nf_tables_newchain+0x471/0x6d8 [nf_tables]
[22144.496140] [<ffffffffa07c5ef6>] ? nft_trans_alloc+0x18/0x34 [nf_tables]
[22144.496154] [<ffffffffa063c8da>] nfnetlink_rcv_batch+0x2b4/0x457 [nfnetlink]

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 75e8d06d 14-Jan-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate hooks in NAT expressions

The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.

This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 053c095a 16-Jan-2015 Johannes Berg <johannes.berg@intel.com>

netlink: make nlmsg_end() and genlmsg_end() void

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

if (my_function(...))
/* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a2f18db0 04-Jan-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix flush ruleset chain dependencies

Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[ 353.373791] ------------[ cut here ]------------
[ 353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[ 353.373896] invalid opcode: 0000 [#1] SMP
[ 353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[ 353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[ 353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[ 353.375018] Call Trace:
[ 353.375046] [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540
[ 353.375101] [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0
[ 353.375150] [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0
[ 353.375200] [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790
[ 353.375253] [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0
[ 353.375300] [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70
[ 353.375357] [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30
[ 353.375410] [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0
[ 353.375459] [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400
[ 353.375510] [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90
[ 353.375563] [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20
[ 353.375616] [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0
[ 353.375667] [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80
[ 353.375719] [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d
[ 353.375776] [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20
[ 353.375823] [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 982f4051 18-Nov-2014 Markus Elfring <elfring@users.sourceforge.net>

netfilter: Deletion of unnecessary checks before two function calls

The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b326dd37 10-Nov-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: restore synchronous object release from commit/abort

The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slower, but it is simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 01cfa0a4 29-Oct-2014 stephen hemminger <stephen@networkplumber.org>

netfilter: fix spelling errors

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c123bb71 21-Oct-2014 Sabrina Dubroca <sd@queasysnail.net>

netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocation

alloc_percpu returns NULL on failure, not a negative error code.
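
A userspace sketch of why the distinction matters, assuming the old code tested the result with an IS_ERR()-style check (an assumption; the message above only states what alloc_percpu returns). is_err_sketch() and MAX_ERRNO_SKETCH are stand-ins modelled on the kernel's IS_ERR() and MAX_ERRNO, not the real macros:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MAX_ERRNO: error pointers occupy the top 4095 addresses. */
#define MAX_ERRNO_SKETCH 4095

/* Stand-in for IS_ERR(): true only for pointers that encode a negative errno. */
static bool is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* The check that a NULL-returning allocator actually needs. */
static bool alloc_failed(const void *ptr)
{
	return ptr == NULL;
}
```

A NULL result is not in the error-pointer range, so is_err_sketch(NULL) is false and the failure would go unnoticed; alloc_failed(NULL) catches it.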

Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7210e4e3 13-Oct-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: restrict nat/masq expressions to nat chain type

This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:

1) Use of nat from base chain that is not of nat type. Reject this
configuration from the nft_*_init() path of the expression.

2) Use of nat from non-base chain. In this case, we have to wait until
the non-base chain is referenced by at least one base chain via
jump/goto. This is resolved from the nft_*_validate() path which is
called from nf_tables_check_loops().

The user gets an -EOPNOTSUPP in both cases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1b1bc49c 01-Oct-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: wait for call_rcu completion on module removal

Make sure the objects have been released before the nf_tables module
is removed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9363dc4b 23-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: store and dump set policy

We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 84d7fce6 04-Sep-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: export rule-set generation ID

This patch exposes the ruleset generation ID in three ways:

1) The new command NFT_MSG_GETGEN that exposes the 32-bits ruleset
generation ID. This ID is incremented in every commit and it
should be large enough to avoid wraparound problems.

2) The less significant 16-bits of the generation ID are exposed through
the nfgenmsg->res_id header field. This allows us to quickly catch
if the ruleset has changed between two consecutive list dumps from
different object lists (in this specific case I think the risk of
wraparound is unlikely).

3) Userspace subscribers may receive notifications of new rule-set
generation after every commit. This also provides an alternative
way to monitor the generation ID. If the events are lost, the
userspace process hits an overrun error, so it knows that it is
working with a stale ruleset anyway.

Patrick spotted that rule-set transformations in userspace may take
quite some time. In that case, it annotates the 32-bits generation ID
before fetching the rule-set, then:

1) it compares it to what we obtain after the transformation to
make sure it is not working with a stale rule-set and no wraparound
has occurred.

2) it subscribes to ruleset notifications, so it can watch for new
generation ID.

This is complementary to the NLM_F_DUMP_INTR approach, which allows
us to detect an interference in the middle of a single list dump.
There is no way to explicitly check that an interference has occurred
between two list dumps from the kernel, since it doesn't know how
many lists the userspace client is actually going to dump.
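
A minimal userspace sketch of the two comparisons described above. The helper names are hypothetical and byte-order conversion of res_id is deliberately left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of the 32-bit generation ID, as carried in res_id. */
static uint16_t genid_to_res_id(uint32_t genid)
{
	return (uint16_t)(genid & 0xffff);
}

/* Two consecutive list dumps saw the same generation only if res_id matches. */
static bool dumps_consistent(uint16_t first_res_id, uint16_t second_res_id)
{
	return first_res_id == second_res_id;
}

/* Full 32-bit comparison around a long userspace transformation. */
static bool ruleset_unchanged(uint32_t genid_before, uint32_t genid_after)
{
	return genid_before == genid_after;
}
```

The 16-bit view aliases after exactly 65536 commits, which is why it only suits back-to-back dumps; the 32-bit comparison does not have that blind spot.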

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b9ac12ef 02-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: extend NFT_MSG_DELTABLE to support flushing the ruleset

This patch extends the NFT_MSG_DELTABLE call to support flushing the entire
ruleset.

The options now are:
* No family specified, no table specified: flush all the ruleset.
* Family specified, no table specified: flush all tables in the AF.
* Family specified, table specified: flush the given table.
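
The three options can be sketched as a small decision function. All names are hypothetical, and the combination the list does not mention (table without family) is reported as such rather than guessed:

```c
#include <stdbool.h>

enum flush_scope_sketch {
	FLUSH_RULESET,		/* no family, no table */
	FLUSH_FAMILY,		/* family only */
	FLUSH_TABLE,		/* family and table */
	FLUSH_NOT_LISTED	/* table without family: not covered by the list */
};

static enum flush_scope_sketch flush_scope(bool have_family, bool have_table)
{
	if (!have_family)
		return have_table ? FLUSH_NOT_LISTED : FLUSH_RULESET;
	return have_table ? FLUSH_TABLE : FLUSH_FAMILY;
}
```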

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ee01d542 02-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: add helpers to schedule objects deletion

This patch refactors the code to schedule objects deletion.
They are useful in follow-up patches.

In order to be able to use these new helper functions in all the code,
they are placed at the top of the file, with all the dependent functions
and symbols.

nft_rule_disactivate_next has been renamed to nft_rule_deactivate.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ce24b721 02-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: rename nf_table_delrule_by_chain()

For the sake of homogenizing the function naming scheme, let's rename
nf_table_delrule_by_chain() to nft_delrule_by_chain().

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c5598794 02-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: add helper to unregister chain hooks

This patch adds a helper function to unregister chain hooks in the chain
deletion path. Basically, a code factorization.

The new function is useful in follow-up patches.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5e266fe7 02-Sep-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: refactor rule deletion helper

This helper function always schedules the rule to be removed in the following
transaction.
In follow-up patches, it is useful to handle the logic of rule
activation/deactivation separately from the transaction mechanism.

So, this patch simply splits the original nf_tables_delrule_one() in two
functions, allowing further control.

While at it, for the sake of homogenizing the function naming scheme, let's
rename nf_tables_delrule_one() to nft_delrule().

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 609ccf08 07-Aug-2014 Julia Lawall <Julia.Lawall@lip6.fr>

netfilter: nf_tables: fix error return code

Convert a zero return value on error to a negative one, as returned
elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b88825de 05-Aug-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: don't update chain with unset counters

Fix possible replacement of the per-cpu chain counters by null
pointer when updating an existing chain in the commit path.

Reported-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a3716e70 01-Aug-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: uninitialize element key/data from the commit path

This should happen once the element has been effectively released in
the commit path, not before. This fixes a possible chain refcount leak
if the transaction is aborted.

Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0dc13625 01-Aug-2014 Thomas Graf <tgraf@suug.ch>

netfilter: nf_tables: Avoid duplicate call to nft_data_uninit() for same key

nft_del_setelem() currently calls nft_data_uninit() twice on the same
key. Once to release the key which is guaranteed to be NFT_DATA_VALUE
and a second time in the error path to which it falls through.

The second call has been harmless so far though because the type
passed is always NFT_DATA_VALUE which is currently a no-op.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7d5570ca 25-Jul-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: check for unset NFTA_SET_ELEM_LIST_ELEMENTS attribute

Otherwise, the kernel oopses in nla_for_each_nested when iterating over
the unset attribute NFTA_SET_ELEM_LIST_ELEMENTS in the
nf_tables_{new,del}setelem() path.

netlink: 65524 bytes leftover after parsing attributes in process `nft'.
[...]
Oops: 0000 [#1] SMP
[...]
CPU: 2 PID: 6287 Comm: nft Not tainted 3.16.0-rc2+ #169
RIP: 0010:[<ffffffffa0526e61>] [<ffffffffa0526e61>] nf_tables_newsetelem+0x82/0xec [nf_tables]
[...]
Call Trace:
[<ffffffffa05178c4>] nfnetlink_rcv+0x2e7/0x3d7 [nfnetlink]
[<ffffffffa0517939>] ? nfnetlink_rcv+0x35c/0x3d7 [nfnetlink]
[<ffffffff8137d300>] netlink_unicast+0xf8/0x17a
[<ffffffff8137d6a5>] netlink_sendmsg+0x323/0x351
[...]

Fix this by returning -EINVAL if this attribute is not set. A request
without it makes no sense, since those commands are there to add and to
delete elements from the set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5b96af77 16-Jul-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: simplify set dump through netlink

This patch uses the cb->data pointer that allows us to store the
context when dumping the set list. Thus, we don't need to parse the
original netlink message containing the dump request for each recvmsg()
call when dumping the set list. The different function flavours
depending on the dump criteria have also been merged into one single
generic function. This saves us ~100 lines of code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ce355e20 09-Jul-2014 Eric Dumazet <edumazet@google.com>

netfilter: nf_tables: 64bit stats need some extra synchronization

Use generic u64_stats_sync infrastructure to get proper 64bit stats,
even on 32bit arches, at no extra cost for 64bit arches.

Without this fix, 32bit arches can have some wrong counters at the time
the carry is propagated into upper word.
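
A single-threaded model of the problem and of the sequence-counter idea behind it. This is only a sketch: the names are hypothetical, there are no memory barriers, and it is not the kernel's u64_stats_sync implementation. It shows a 64-bit counter held as two 32-bit words and a reader that can tell when it looked mid-update:

```c
#include <stdbool.h>
#include <stdint.h>

/* A 64-bit counter stored as two words, as a 32-bit CPU would update it. */
struct stats_sketch {
	uint32_t seq;	/* even: stable, odd: update in progress */
	uint32_t lo;
	uint32_t hi;
};

static void write_begin(struct stats_sketch *s) { s->seq++; }
static void write_end(struct stats_sketch *s)   { s->seq++; }

static uint32_t read_begin(const struct stats_sketch *s) { return s->seq; }

/* True if a snapshot taken since read_begin() may be torn and must be redone. */
static bool read_retry(const struct stats_sketch *s, uint32_t start)
{
	return (start & 1) || s->seq != start;
}

static uint64_t read_words(const struct stats_sketch *s)
{
	return ((uint64_t)s->hi << 32) | s->lo;
}
```

Incrementing 0x00000000ffffffff wraps the low word to 0 before the carry reaches the high word; a reader arriving in between would see 0 unless read_retry() tells it to try again.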

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 38e029f1 30-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale

An updater may interfere with the dumping of any of the object lists.
Fix this by using a per-net generation counter and use the
nl_dump_check_consistent() interface so the NLM_F_DUMP_INTR flag is set
to notify userspace that it has to restart the dump since an updater
has interfered.

This patch also replaces the existing consistency checking code in the
rule dumping path since it is broken. Basically, the value that the
dump callback returns is not propagated to userspace via
netlink_dump_start().
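
A userspace model of the consistency check, loosely following what nl_dump_check_consistent() does as far as I understand it. The struct, function and constant names are hypothetical; DUMP_INTR_SKETCH is assumed to mirror the value of NLM_F_DUMP_INTR:

```c
#include <stdint.h>

#define DUMP_INTR_SKETCH 0x10	/* assumed to match NLM_F_DUMP_INTR */

struct dump_cb_sketch {
	unsigned int seq;	/* generation seen by the current dump chunk */
	unsigned int prev_seq;	/* generation seen by the previous chunk, 0 if none */
};

/* Flag the message if the generation moved since the previous chunk. */
static void dump_check_consistent(struct dump_cb_sketch *cb, uint16_t *nlmsg_flags)
{
	if (cb->prev_seq && cb->seq != cb->prev_seq)
		*nlmsg_flags |= DUMP_INTR_SKETCH;
	cb->prev_seq = cb->seq;
}
```

The flag travels in the message header, so userspace sees it even though the dump callback's return value never reaches it.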

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e688a7f8 01-Jul-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: safe RCU iteration on list when dumping

The dump operation through netlink is not protected by the nfnl_lock.
Thus, a reader process can be dumping any of the existing object
lists while another process can be updating the list content.

This patch resolves this situation by protecting all the object
lists with RCU in the netlink dump path which is the reader side.
The updater path is already protected via nfnl_lock, so use list
manipulation RCU-safe operations.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 63283dd2 27-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip transaction if no update flags in tables

Skip transaction handling for table updates with no changes in
the flags. This fixes a crash when passing the table flag with all
bits unset.

Reported-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6403d962 11-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: indicate family when dumping set elements

Set the nfnetlink header that indicates the family of this element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ac904ac8 10-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix wrong type in transaction when replacing rules

In b380e5c ("netfilter: nf_tables: add message type to transactions"),
I used the wrong message type in the rule replacement case. The rule
that is replaced needs to be handled as a deleted rule.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ac34b861 10-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: decrement chain use counter when replacing rules

Thus, the chain use counter remains with the same value after the
rule replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a0a7379e 10-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use u32 for chain use counter

Since 4fefee5 ("netfilter: nf_tables: allow to delete several objects
from a batch"), every new rule bumps the chain use counter. However,
this is limited to 16 bits, which means that it will overrun after
2^16 rules.

Use a u32 chain counter and check for overflows (just like we do for
table objects).
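
A small sketch of both halves of this change: the silent wrap of a 16-bit counter and a checked 32-bit increment. The function names are hypothetical and the -EOVERFLOW error code is an assumption, not taken from the message above:

```c
#include <errno.h>
#include <stdint.h>

/* Old behaviour: value of a 16-bit counter after n increments from zero. */
static uint16_t u16_use_after(uint32_t n)
{
	uint16_t use = 0;

	for (uint32_t i = 0; i < n; i++)
		use++;		/* wraps to 0 at 2^16 */
	return use;
}

/* New behaviour: 32-bit counter with an explicit overflow check. */
static int chain_use_inc(uint32_t *use)
{
	if (*use == UINT32_MAX)
		return -EOVERFLOW;	/* error code is an assumption */
	(*use)++;
	return 0;
}
```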

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5bc5c307 10-Jun-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use RCU-safe list insertion when replacing rules

The patch 5e94846 ("netfilter: nf_tables: add insert operation") did
not include RCU-safe list insertion when replacing rules.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 31f8441c 29-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: atomic allocation in set notifications from rcu callback

Use GFP_ATOMIC allocations when sending removal notifications of
anonymous sets from rcu callback context. Sleeping in that context
is illegal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4fefee57 22-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: allow to delete several objects from a batch

Three changes to allow the deletion of several objects with dependencies
in one transaction, they are:

1) Introduce speculative counter increment/decrement that is undone in
the abort path if required, thus we avoid hitting -EBUSY when deleting
the chain. The counter updates are reverted in the abort path.

2) Increment/decrement table/chain use counter for each set/rule. We need
this to fully rely on the use counters instead of the list content,
eg. !list_empty(&chain->rules) which evaluates to true in the middle of the
transaction.

3) Decrement table use counter when an anonymous set is bound to the
rule in the commit path. This avoids hitting -EBUSY when deleting
the table that contains anonymous sets. The anonymous sets are released
in the nf_tables_rule_destroy path. This should not be a problem since
the rule already bumped the use counter of the chain, so the bound
anonymous set reflects dependencies through the rule object, which
already increases the chain use counter.

So the general assumption after this patch is that the use counters are
bumped by direct object dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a1cee076 23-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: release objects in reverse order in the abort path

The patch c7c32e7 ("netfilter: nf_tables: defer all object release via
rcu") indicates that we always release deleted objects in the reverse
order, but that is only needed in the abort path. These are the two
possible scenarios when releasing objects:

1) Deletion scenario in the commit path: no need to release objects in
the reverse order since userspace already ensures that dependencies are
fulfilled, ie. userspace tells us to delete rule -> ... -> rule ->
chain -> table. In this case, we have to release the objects in the
*same order* as userspace provided.

2) Deletion scenario in the abort path: we have to iterate in the reverse
order to undo what cannot be added, ie. userspace sent us a batch
that includes: table -> chain -> rule -> ... -> rule, and that needs to
be partially undone. In this case, we have to release objects in the
reverse order to ensure that the set and chain objects point to valid
rule and table objects.
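
The two scenarios can be sketched with a plain array standing in for the transaction list. The enum and function names are hypothetical; the real code walks a linked list:

```c
#include <stddef.h>

enum obj_sketch { OBJ_TABLE, OBJ_CHAIN, OBJ_RULE };

/* Commit path: release deleted objects in the order userspace supplied. */
static void commit_release_order(const enum obj_sketch *batch, size_t n,
				 enum obj_sketch *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = batch[i];
}

/* Abort path: undo additions newest-first, so dependents go before owners. */
static void abort_release_order(const enum obj_sketch *batch, size_t n,
				enum obj_sketch *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = batch[n - 1 - i];
}
```

A deletion batch rule -> chain -> table is released as given, while an aborted addition batch table -> chain -> rule is released as rule -> chain -> table.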

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 46bbafce 21-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix wrong transaction ordering in set elements

The transaction needs to be placed at the end of the commit list,
otherwise event notifications are reordered and we may crash when
releasing object via call_rcu.

This problem was introduced in 60319eb ("netfilter: nf_tables: use new
transaction infrastructure to handle elements").

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c7c32e72 09-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: defer all object release via rcu

Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 128ad332 09-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: remove skb and nlh from context structure

Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 35151d84 05-May-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: simplify nf_tables_*_notify

Now that all these functions are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 60319eb1 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use new transaction infrastructure to handle elements

Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 55dd6f93 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use new transaction infrastructure to handle table

This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e1aaca93 30-Mar-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: pass context to nf_tables_updtable()

So nf_tables_updtable() only takes one single parameter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# f75edf5e 30-Mar-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: disabling table hooks always succeeds

nf_tables_table_disable() always succeeds, make this function void.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 91c7b38d 09-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use new transaction infrastructure to handle chain

This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ff3cd7b3 09-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: refactor chain statistic routines

Add new routines to encapsulate chain statistics allocation and
replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 958bee14 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use new transaction infrastructure to handle sets

This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:

1) create the set and send netlink message to the kernel
2) process the response from the kernel that contains the allocated name.
3) add the set elements and send netlink message to the kernel.
4) process the response from the kernel (to check for errors).

To:

1) add the set to the batch.
2) add the set elements to the batch.
3) add the rule that points to the set.
4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b380e5c7 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add message type to transactions

The patch adds message type to the transaction to simplify the
commit and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 37082f93 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: relocate commit and abort routines in the source file

Move the commit and abort routines to the bottom of the source code
file. This change is required by the follow up patches that add the
set, chain and table transaction support.

This patch is just a cleanup to access several functions without
having to declare their prototypes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1081d11b 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: generalise transaction infrastructure

This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7c95f6d8 03-Apr-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: deconstify table and chain in context structure

The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4c1f7818 31-Mar-2014 Pablo Neira <pablo@netfilter.org>

netfilter: nf_tables: relax string validation of NFTA_CHAIN_TYPE

Use NLA_STRING for consistency with other string attributes in
nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 758dbcec 14-Apr-2014 Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

netfilter: nf_tables: Stack expression type depending on their family

This ensures that family-specific expressions get selected in preference
to family-agnostic ones.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 60eb1894 06-Mar-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: handle more than 8 * PAGE_SIZE set name allocations

We currently have a limit of 8 * PAGE_SIZE anonymous sets. Lift that limit
by continuing the scan if the entire page is exhausted.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2fec6bb6 30-Mar-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix wrong format in request_module()

The intended format in request_module is %.*s instead of %*.s.
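
The difference between the two formats can be seen with plain snprintf() in userspace. The wrapper names below are hypothetical; the format strings are the ones named above:

```c
#include <stdio.h>

/* "%.*s": precision taken from the argument, prints at most len bytes of s. */
static int fmt_precision(char *out, size_t outlen, int len, const char *s)
{
	return snprintf(out, outlen, "%.*s", len, s);
}

/* "%*.s": width taken from the argument, precision zero, so none of s is
 * printed and the field is filled with padding only. */
static int fmt_width(char *out, size_t outlen, int len, const char *s)
{
	return snprintf(out, outlen, "%*.s", len, s);
}
```

With len = 3 and s = "counterXYZ", the first yields "cou" and the second yields three spaces, so the swapped form would never produce the intended name.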

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a9bdd836 24-Mar-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: set names cannot be larger than 15 bytes

Currently, nf_tables trims off the set name if it exceeds 15
bytes, so explicitly reject set names that are too large.
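
A minimal sketch of rejecting instead of trimming, assuming a 16-byte buffer (15 bytes of name plus the terminating NUL). The names and the -ENAMETOOLONG error code are assumptions, not taken from the patch:

```c
#include <errno.h>
#include <string.h>

#define SET_NAME_BUFLEN_SKETCH 16	/* 15 bytes of name plus NUL */

/* Copy the name only if it fits; never truncate silently. */
static int set_name_copy(char dst[SET_NAME_BUFLEN_SKETCH], const char *name)
{
	if (strlen(name) >= SET_NAME_BUFLEN_SKETCH)
		return -ENAMETOOLONG;	/* error code is an assumption */
	strcpy(dst, name);
	return 0;
}
```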

Reported-by: Giuseppe Longo <giuseppelng@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d60ce62f 01-Apr-2014 Arturo Borrero <arturo.borrero.glez@gmail.com>

netfilter: nf_tables: add set_elem notifications

This patch adds set_elems notifications. When a set_elem is
added/deleted, all listening peers in userspace will receive the
corresponding notification.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>


# c50b960c 28-Mar-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: implement proper set selection

The current set selection simply chooses the first set type that provides
the requested features, which always results in the rbtree being chosen
by virtue of being the first set in the list.

What we actually want to do is choose the implementation that can provide
the requested features and is optimal from either a performance or memory
perspective depending on the characteristics of the elements and the
preferences specified by the user.

The elements are not known when creating a set. Even if we would provide
them for anonymous (literal) sets, we'd still have standalone sets where
the elements are not known in advance. We therefore need an abstract
description of the data characteristics.

The kernel already knows the size of the key, this patch starts by
introducing a nested set description which so far contains only the maximum
amount of elements. Based on this the set implementations are changed to
provide an estimate of the required amount of memory and the lookup
complexity class.

The set ops have a new callback ->estimate() that is invoked during set
selection. It receives a structure containing the attributes known to the
kernel and is supposed to populate a struct nft_set_estimate with the
complexity class and, in case the size is known, the complete amount of
memory required, or the amount of memory required per element otherwise.

Based on the policy specified by the user (performance/memory, defaulting
to performance) the kernel will then select the best suited implementation.

Even if the set implementation would allow to add more than the specified
maximum amount of elements, they are enforced since new implementations
might not be able to add more than maximum based on which they were
selected.
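
The selection step can be sketched as follows. This is an illustration under assumptions, not the patch's code: the type and function names are hypothetical, a lower class value is taken to mean cheaper lookups, and ties on the primary criterion are broken by the other one:

```c
#include <stddef.h>
#include <stdint.h>

enum policy_sketch { POL_PERFORMANCE, POL_MEMORY };

/* What each implementation's ->estimate() would report in this model. */
struct estimate_sketch {
	unsigned int cls;	/* lookup complexity class, lower is faster */
	uint64_t size;		/* estimated memory in bytes */
};

/* Return the index of the best candidate for the policy, or -1 if none. */
static int select_set_impl(const struct estimate_sketch *est, size_t n,
			   enum policy_sketch pol)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best >= 0) {
			const struct estimate_sketch *b = &est[best];
			int better;

			if (pol == POL_PERFORMANCE)
				better = est[i].cls < b->cls ||
					 (est[i].cls == b->cls && est[i].size < b->size);
			else
				better = est[i].size < b->size ||
					 (est[i].size == b->size && est[i].cls < b->cls);
			if (!better)
				continue;
		}
		best = (int)i;
	}
	return best;
}
```

Given one candidate that is small but slower and another that is fast but larger, the performance policy picks the fast one and the memory policy picks the small one, regardless of their order in the list.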

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ab9da5c1 07-Mar-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: restore notifications for anonymous set destruction

Since we have the context available again, we can restore notifications
for destruction of anonymous sets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 62472bce 07-Mar-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: restore context for expression destructors

In order to fix set destruction notifications and get rid of unnecessary
members in private data structures, pass the context to expressions'
destructor functions again.

In order to do so, replace various members in the nft_rule_trans structure
by the full context.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a36e901c 07-Mar-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: clean up nf_tables_trans_add() argument order

The context argument logically comes first, and this is what every other
function dealing with contexts does.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0768b3b3 19-Feb-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add optional user data area to rules

This allows us to store user comment strings, but it could be also
used to store any kind of information that the user application needs
to link to the rule.

Scratch 8 bits for the new ulen field that indicates the length of the
user data area. 4 bits from the handle (so it's 42 bits long, according
to Patrick, it would last 139 years with 1000 new rules per second)
and 4 bits from dlen (so the expression data area is 4K, which seems
sufficient by now even considering the compatibility layer).
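
The figures above can be checked with a few lines of integer arithmetic. The macro and function names are hypothetical; the bit widths are the ones stated in the message, and a year is taken as 365 days:

```c
#include <stdint.h>

#define HANDLE_BITS_SKETCH 42
#define DLEN_BITS_SKETCH   12
#define ULEN_BITS_SKETCH    8

/* Whole years until a 42-bit handle space is exhausted at a given rate. */
static uint64_t handle_years(uint64_t rules_per_second)
{
	const uint64_t seconds_per_year = 365ULL * 24 * 60 * 60;

	return (UINT64_C(1) << HANDLE_BITS_SKETCH) / rules_per_second /
	       seconds_per_year;
}

/* Number of distinct dlen values, i.e. the "4K" figure. */
static uint32_t dlen_range(void)
{
	return UINT32_C(1) << DLEN_BITS_SKETCH;
}

/* Largest user data length that an 8-bit ulen can express. */
static uint32_t ulen_max(void)
{
	return (UINT32_C(1) << ULEN_BITS_SKETCH) - 1;
}
```

2^42 handles at 1000 new rules per second last 4398046511 seconds, which is a little over 139 years.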

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>


# e0abdadc 18-Feb-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: accept QUEUE/DROP verdict parameters

Allow userspace to specify the queue number or the errno code for QUEUE
and DROP verdicts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 67a8fc27 18-Feb-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add nft_dereference() macro

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 62f9c8b4 07-Feb-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix loop checking with end interval elements

Fix access to uninitialized data for end interval elements. The
element data part is uninitialized in interval end elements.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bd7fc645 06-Feb-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: do not allow NFT_SET_ELEM_INTERVAL_END flag and data

This combination is not allowed since end interval elements cannot
contain data.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>


# 0165d932 25-Jan-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix racy rule deletion

We may lose a race if we flush the rule-set (which happens asynchronously
via call_rcu) and we try to remove the table (that userspace assumes
to be empty).

Fix this by restoring synchronous rule and chain deletion. Asynchronous
deletion was introduced some time ago, before we had batch support, when
synchronous rule deletion performance was not good. Now that we have batch
support, we can just postpone the purge of old rules to a second step
in the commit phase. All object deletions are synchronous after this
patch.

As a side effect, we save memory as we don't need rcu_head per rule
anymore.

Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 64d46806 05-Feb-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add AF specific expression support

For the reject module, we need to add AF-specific implementations to
get rid of incorrect module dependencies. Try to load an AF-specific
module first and fall back to generic modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ec2c9935 05-Feb-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix potential oops when dumping sets

Commit c9c8e48597 (netfilter: nf_tables: dump sets in all existing families)
changed nft_ctx_init_from_setattr() to only look up the address family if it
is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute
is given, nftables_afinfo_lookup() will dereference the NULL afi pointer.

Fix by checking for non-NULL afi and also move a check added by that commit
to the proper position.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 53b70287 04-Feb-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name()

The map that is used to allocate anonymous sets is indeed
BITS_PER_BYTE * PAGE_SIZE long.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3dd7279f 25-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix oops when deleting a chain with references

The following commands trigger an oops:

# nft -i
nft> add table filter
nft> add chain filter input { type filter hook input priority 0; }
nft> add chain filter test
nft> add rule filter input jump test
nft> delete chain filter test

We need to check the chain use counter before allowing destruction since
we might have references from sets or jump rules.
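
The check described above can be sketched as follows. This is a userspace illustration with hypothetical names, assuming the use counter is bumped for every jump rule or set element that points at the chain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain with a reference counter. */
struct chain_ref_sketch {
	unsigned int use;	/* references from jump rules or sets */
};

/* Called when a rule such as "jump test" is added. */
void chain_ref_get(struct chain_ref_sketch *chain)
{
	assert(chain != NULL);
	chain->use++;
}

/* Called when the referencing rule or set element goes away. */
void chain_ref_put(struct chain_ref_sketch *chain)
{
	assert(chain != NULL && chain->use > 0);
	chain->use--;
}

/* Refuse destruction while anything still points at the chain. */
int chain_delete_check(const struct chain_ref_sketch *chain)
{
	assert(chain != NULL);
	if (chain->use > 0)
		return -EBUSY;
	return 0;
}
```

In the reproducer above, "add rule filter input jump test" would take one reference, so the following "delete chain filter test" is rejected with -EBUSY instead of freeing a chain that is still reachable.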

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341
Reported-by: Matthew Ife <deleriux1@gmail.com>
Tested-by: Matthew Ife <deleriux1@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8f46df18 10-Jan-2014 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix missing byteorder conversion in policy

When fetching the policy attribute, the byteorder conversion was
missing, breaking the chain policy setting.
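
To illustrate why the conversion matters, here is a userspace sketch, assuming the policy attribute carries a 32-bit value in network byte order. The helper name is hypothetical and ntohl() stands in for the kernel's own conversion.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian attribute payload and return it in host
 * byte order. Skipping ntohl() here would turn the value 1 into
 * 0x01000000 on a little-endian machine. */
uint32_t attr_get_be32_host(const unsigned char *payload)
{
	uint32_t be;

	assert(payload != NULL);
	memcpy(&be, payload, sizeof(be));	/* avoid unaligned access */
	return ntohl(be);
}
```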

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 44a6f0df 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: prohibit deletion of a table with existing sets

We currently leak the set memory when deleting a table that still has
sets in it. Return EBUSY when attempting to delete a table with sets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7047f9d0 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: take AF module reference when creating a table

The table refers to data of the AF module, so we need to make sure the
module isn't unloaded while the table exists.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c5c1f975 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: perform flags validation before table allocation

Simplifies error handling. Additionally use the correct type u32 for the
host byte order flags value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fa2c1de0 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: minor nf_chain_type cleanups

Minor nf_chain_type cleanups:

- reorder struct to plug a hole
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2a37d755 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: constify chain type definitions and pointers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 93b0806f 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: replay request after dropping locks to load chain type

To avoid races, we need to replay the request after dropping the nfnl_mutex
to auto-load the chain type module.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# baae3e62 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix chain type module reference handling

The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.

Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 75820676 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: fix check for table overflow

The table use counter is only increased for new chains, so move the check
to the correct position.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4401a862 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: restore chain change atomicity

Chain counter validation is performed after the chain policy has
potentially been changed. Move counter validation/setting before
changing of the chain policy to fix this.

Additionally fix a memory leak if chain counter allocation fails
for new chains, remove an unnecessary free_percpu() and move
counter allocation for new chains

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 57de2a0c 09-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: split chain policy validation from actually setting it

Currently nf_tables_newchain() atomicity is broken because of having
validation of some netlink attributes performed after changing attributes
of the chain. The chain policy is (currently) fine, but split it up as
preparation for the following fixes and to avoid future mistakes.
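
The validate-first, apply-second pattern can be sketched like this. All types and names are hypothetical; a NULL counter pointer models a malformed or missing nested attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_POLICY_DROP = 0, SK_POLICY_ACCEPT = 1 };

struct chain_state_sketch {
	uint8_t policy;
	uint64_t pkts, bytes;
};

struct chain_update_sketch {
	bool has_policy;
	uint32_t policy;
	bool has_counters;
	const uint64_t *pkts;	/* NULL models a missing attribute */
	const uint64_t *bytes;
};

/* Phase 1: check every attribute without touching the chain. */
static int chain_update_validate(const struct chain_update_sketch *u)
{
	if (u->has_policy &&
	    u->policy != SK_POLICY_DROP && u->policy != SK_POLICY_ACCEPT)
		return -EINVAL;
	if (u->has_counters && (u->pkts == NULL || u->bytes == NULL))
		return -EINVAL;
	return 0;
}

/* Phase 2: apply only after everything has been validated, so a
 * failed request leaves the chain exactly as it was. */
int chain_update(struct chain_state_sketch *c,
		 const struct chain_update_sketch *u)
{
	int err;

	assert(c != NULL && u != NULL);
	err = chain_update_validate(u);
	if (err < 0)
		return err;

	if (u->has_policy)
		c->policy = (uint8_t)u->policy;
	if (u->has_counters) {
		c->pkts = *u->pkts;
		c->bytes = *u->bytes;
	}
	return 0;
}
```

A request with a valid policy but broken counters fails as a whole; the policy is not changed as a side effect.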

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 115a60b1 02-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add support for multi family tables

Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>


# 3b088c4b 02-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: make chain types override the default AF functions

Currently the AF-specific hook functions override the chain-type specific
hook functions. That doesn't make too much sense since the chain types
are a special case of the AF-specific hooks.

Make the AF-specific hook functions the default and make the optional
chain type hooks override them.

As a side effect, the necessary code restructuring reduces the code size,
f.i. in case of nf_tables_ipv4.o:

nf_tables_ipv4_init_net | -24
nft_do_chain_ipv4 | -113
2 functions changed, 137 bytes removed, diff: -137

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c9c8e485 26-Dec-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: dump sets in all existing families

This patch allows you to dump all sets available in all of
the registered families. This allows you to use NFPROTO_UNSPEC
to dump all existing sets, similarly to other existing table,
chain and rule operations.

This patch is based on original patch from Arturo Borrero
González.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 720e0dfa 31-Dec-2013 Michal Nazarewicz <mina86@mina86.com>

netfilter: nf_tables: remove unused variable in nf_tables_dump_set()

The nfmsg variable is not used (except in sizeof operator which does
not care about its value) between the first and second time it is
assigned the value. Furthermore, nlmsg_data has no side effects, so
the assignment can be safely removed.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 14662917 30-Dec-2013 Daniel Borkmann <daniel@iogearbox.net>

netfilter: nf_tables: fix type in parsing in nf_tables_set_alloc_name()

In nf_tables_set_alloc_name(), we are trying to find a new, unused
name for our new set and iterate through the list of present sets.
As far as I can see, we're using format string %d to parse already
present names in order to mark their presence in a bitmap, so that
we can later on find the first 0 in that map to assign the new set
name to. We should rather use a temporary variable of type int to
store the result of sscanf() to, and for making sanity checks on.
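
A userspace sketch of the allocation scheme described above: parse each existing name into a temporary int, sanity-check it against the bitmap size, mark it, then take the first clear bit. MAP_BYTES is a small stand-in for PAGE_SIZE and the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MAP_BYTES 64				/* stand-in for PAGE_SIZE */
#define MAP_BITS  (CHAR_BIT * MAP_BYTES)	/* BITS_PER_BYTE * PAGE_SIZE */

/* Return the lowest index not yet used by a name matching 'fmt'
 * (which must contain exactly one %d), or -1 if the map is full. */
int alloc_name_index(const char *fmt, const char *const names[], size_t n)
{
	unsigned char inuse[MAP_BYTES];
	size_t i;
	int tmp, bit;

	assert(fmt != NULL);
	memset(inuse, 0, sizeof(inuse));

	for (i = 0; i < n; i++) {
		/* Parse into a plain int first ... */
		if (sscanf(names[i], fmt, &tmp) != 1)
			continue;
		/* ... and only then check it against the map bounds. */
		if (tmp < 0 || tmp >= MAP_BITS)
			continue;
		inuse[tmp / CHAR_BIT] |= (unsigned char)(1u << (tmp % CHAR_BIT));
	}

	for (bit = 0; bit < MAP_BITS; bit++)
		if (!(inuse[bit / CHAR_BIT] & (1u << (bit % CHAR_BIT))))
			return bit;
	return -1;
}
```

Negative or oversized numbers in existing names are simply ignored, so they can neither index outside the bitmap nor block allocation.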

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2ee0d3c8 27-Dec-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()

This patch fixes dictionary mappings, eg.

add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 }

The kernel was returning -EINVAL in nft_validate_data_load() since
the type of the set element data that is passed was the real userspace
datatype instead of NFT_DATA_VALUE.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d2012975 27-Dec-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix oops when updating table with user chains

This patch fixes a crash while trying to deactivate a table that
contains user chains. You can reproduce it via:

% nft add table table1
% nft add chain table1 chain1
% nft-table-upd ip table1 dormant

[ 253.021026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 253.021114] IP: [<ffffffff8134cebd>] nf_register_hook+0x35/0x6f
[ 253.021167] PGD 30fa5067 PUD 30fa2067 PMD 0
[ 253.021208] Oops: 0000 [#1] SMP
[...]
[ 253.023305] Call Trace:
[ 253.023331] [<ffffffffa0885020>] nf_tables_newtable+0x11c/0x258 [nf_tables]
[ 253.023385] [<ffffffffa0878592>] nfnetlink_rcv_msg+0x1f4/0x226 [nfnetlink]
[ 253.023438] [<ffffffffa0878418>] ? nfnetlink_rcv_msg+0x7a/0x226 [nfnetlink]
[ 253.023491] [<ffffffffa087839e>] ? nfnetlink_bind+0x45/0x45 [nfnetlink]
[ 253.023542] [<ffffffff8134b47e>] netlink_rcv_skb+0x3c/0x88
[ 253.023586] [<ffffffffa0878973>] nfnetlink_rcv+0x3af/0x3e4 [nfnetlink]
[ 253.023638] [<ffffffff813fb0d4>] ? _raw_read_unlock+0x22/0x34
[ 253.023683] [<ffffffff8134af17>] netlink_unicast+0xe2/0x161
[ 253.023727] [<ffffffff8134b29a>] netlink_sendmsg+0x304/0x332
[ 253.023773] [<ffffffff8130d250>] __sock_sendmsg_nosec+0x25/0x27
[ 253.023820] [<ffffffff8130fb93>] sock_sendmsg+0x5a/0x7b
[ 253.023861] [<ffffffff8130d5d5>] ? copy_from_user+0x2a/0x2c
[ 253.023905] [<ffffffff8131066f>] ? move_addr_to_kernel+0x35/0x60
[ 253.023952] [<ffffffff813107b3>] SYSC_sendto+0x119/0x15c
[ 253.023995] [<ffffffff81401107>] ? sysret_check+0x1b/0x56
[ 253.024039] [<ffffffff8108dc30>] ? trace_hardirqs_on_caller+0x140/0x1db
[ 253.024090] [<ffffffff8120164e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 253.024141] [<ffffffff81310caf>] SyS_sendto+0x9/0xb
[ 253.026219] [<ffffffff814010e2>] system_call_fastpath+0x16/0x1b

Reported-by: Alex Wei <alex.kern.mentor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e38195bf 24-Dec-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix dumping with large number of sets

If no table name is specified, the dumping of the existing sets
may be incomplete with a sufficiently large number of sets and
tables. This patch fixes missing reset of the cursors after
finding the location of the last object that has been included
in the previous multi-part message.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d8bcc768 12-Dec-2013 Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

netfilter: nf_tables: Expose the table usage counter via netlink

Userspace can therefore know whether a table is in use or not, and
by how many chains. Suggested by Pablo Neira Ayuso.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cf9dc09d 24-Nov-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix missing rules flushing per table

This patch allows you to atomically remove all rules stored in
a table via the NFT_MSG_DELRULE command. You only need to indicate
the specific table and no chain to flush all rules stored in that
table.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b5bc89bf 10-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add trace support

This patch adds support for tracing the packet travel through
the ruleset, in a similar fashion to x_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0628b123 14-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: add batch support and use it from nf_tables

This patch adds a batch support to nfnetlink. Basically, it adds
two new control messages:

* NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
the nfgenmsg->res_id indicates the nfnetlink subsystem ID.

* NFNL_MSG_BATCH_END, that results in the invocation of the
ss->commit callback function. If not specified or an error
occurred in the batch, the ss->abort function is invoked
instead.

The end message represents the commit operation in nftables, the
lack of end message results in an abort. This patch also adds the
.call_batch function that is only called from the batch receive
path.

This patch adds atomic rule updates and dumps based on
bitmask generations. This allows us to atomically commit a set of
rule-set updates incrementally without altering the internal
state of existing nf_tables expressions/matches/targets.

The idea consists of using a generation cursor of 1 bit and
a bitmask of 2 bits per rule. Assuming the gencursor is 0,
then the genmask (expressed as a bitmask) can be interpreted
as:

00 active in the present, will be active in the next generation.
01 inactive in the present, will be active in the next generation.
10 active in the present, will be deleted in the next generation.
 ^
 gencursor

Once you invoke the transition to the next generation, the global
gencursor is updated:

00 active in the present, will be active in the next generation.
01 active in the present, needs to zero its future, it becomes 00.
10 inactive in the present, delete now.
^
gencursor
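
The two tables above can be replayed with a few lines of C. This is a sketch with hypothetical names, assuming that a set bit means "inactive in that generation" and that the cursor simply toggles between 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rule_gen_sketch {
	uint8_t genmask;	/* 2 bits used; bit set = inactive */
};

/* The generation that follows 'gencursor'. */
unsigned int gen_next(unsigned int gencursor)
{
	assert(gencursor < 2);
	return gencursor ^ 1u;
}

bool rule_is_active(const struct rule_gen_sketch *r, unsigned int gen)
{
	assert(r != NULL && gen < 2);
	return (r->genmask & (1u << gen)) == 0;
}

/* New rule: inactive in the present, active in the next generation. */
void rule_add(struct rule_gen_sketch *r, unsigned int gencursor)
{
	r->genmask = (uint8_t)(1u << gencursor);
}

/* Deleted rule: active in the present, gone in the next generation. */
void rule_delete(struct rule_gen_sketch *r, unsigned int gencursor)
{
	r->genmask = (uint8_t)(1u << gen_next(gencursor));
}

/* Run after the cursor has moved. Returns true if the rule has to be
 * released now; otherwise clears its future bit so it becomes 00. */
bool rule_commit(struct rule_gen_sketch *r, unsigned int gencursor)
{
	if (!rule_is_active(r, gencursor))
		return true;
	r->genmask = 0;
	return false;
}
```

Packets only ever test the bit selected by the current cursor, which is why flipping the cursor switches the whole rule-set at once.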

If a dump is in progress and nf_tables enters a new generation,
the dump will stop and return -EBUSY to let userspace know that
it has to retry again. In order to invalidate dumps, a global
genctr counter is increased every time nf_tables enters a new
generation.
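
A sketch of the dump invalidation, with hypothetical names: the dump remembers the generation counter it started under and gives up with -EBUSY as soon as the counter has moved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dump_sketch {
	unsigned int genctr;	/* generation counter seen at start */
	unsigned int next_idx;	/* next object to emit */
};

/* Start (or restart) a dump under the current generation counter. */
void dump_begin(struct dump_sketch *d, unsigned int genctr_now)
{
	assert(d != NULL);
	d->genctr = genctr_now;
	d->next_idx = 0;
}

/* Emit the next chunk, or tell the caller to retry from scratch if a
 * new generation was entered in the meantime. */
int dump_next(struct dump_sketch *d, unsigned int genctr_now)
{
	assert(d != NULL);
	if (d->genctr != genctr_now)
		return -EBUSY;
	d->next_idx++;
	return 0;
}
```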

This new operation can be used from the user-space utility
that controls the firewall, eg.

nft -f restore

The rule updates contained in `file' will be applied atomically.

cat file
-----
add filter INPUT ip saddr 1.1.1.1 counter accept #1
del filter INPUT ip daddr 2.2.2.2 counter drop #2
-EOF-

Note that the rule 1 will be inactive until the transition to the
next generation, the rule 2 will be evicted in the next generation.

There is a penalty during the rule update due to the branch
misprediction in the packet matching framework. But that should be
quickly resolved once the iteration over the commit list that
contains rules that require updates is finished.

Event notification happens once the rule-set update has been
committed. So we skip notifications in case the rule-set update
is aborted, which can happen in case that the rule-set is tested
to apply correctly.

This patch squashed the following patches from Pablo:

* nf_tables: atomic rule updates and dumps
* nf_tables: get rid of per rule list_head for commits
* nf_tables: use per netns commit list
* nfnetlink: add batch support and use it from nf_tables
* nf_tables: all rule updates are transactional
* nf_tables: attach replacement rule after stale one
* nf_tables: do not allow deletion/replacement of stale rules
* nf_tables: remove unused NFTA_RULE_FLAGS

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5e948466 10-Oct-2013 Eric Leblond <eric@regit.org>

netfilter: nf_tables: add insert operation

This patch adds a new rule attribute NFTA_RULE_POSITION which is
used to store the position of a rule relative to the others.
By providing the create command and specifying the position, the
rule is inserted after the rule with the handle equal to the
provided position.
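
The insertion rule can be sketched with a plain singly linked list. The node type and function are hypothetical, and returning -ENOENT for an unknown position is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct rule_node_sketch {
	uint64_t handle;
	struct rule_node_sketch *next;
};

/* Link 'new_rule' right after the rule whose handle equals
 * 'position'. The list is left untouched if no such rule exists. */
int rule_insert_after(struct rule_node_sketch *head, uint64_t position,
		      struct rule_node_sketch *new_rule)
{
	struct rule_node_sketch *r;

	assert(new_rule != NULL);
	for (r = head; r != NULL; r = r->next) {
		if (r->handle == position) {
			new_rule->next = r->next;
			r->next = new_rule;
			return 0;
		}
	}
	return -ENOENT;
}
```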

Regarding notification, the position attribute specifies the
handle of the previous rule to make sure we don't point to any
stale rule in notifications coming from the commit path.

This patch includes the following fix from Pablo:

* nf_tables: fix rule deletion event reporting

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 99633ab2 10-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: complete net namespace support

Register family per netnamespace to ensure that sets are
only visible in their appropriate namespace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9ddf6323 10-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add support for dormant tables

This patch allows you to temporarily disable an entire table.
You can change the state of a dormant table via NFT_MSG_NEWTABLE
messages. Using this operation you can wake up a table, so its
chains are registered.

This provides atomicity at chain level. Thus, the rule-set of one
chain is applied at once, avoiding any possible intermediate state
in every chain. Still, the chains that belong to a table are
registered consecutively. This also allows you to have inactive
tables in the kernel.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0ca743a5 13-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add compatibility layer for x_tables

This patch adds the x_tables compatibility layer. This allows you
to use existing x_tables matches and targets from nf_tables.

This compatibility layer allows us to use existing matches/targets
for features that are still missing in nf_tables. We can progressively
replace them with native nf_tables extensions. It also provides the
userspace compatibility software that allows you to express the
rule-set using the iptables syntax but using the nf_tables kernel
components.

In order to get this compatibility layer working, I've done the
following things:

* add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
to query the x_tables match/target revision, so we don't need to
use the native x_table getsockopt interface.

* emulate xt structures: this required extending the struct nft_pktinfo
to include the fragment offset, which is already obtained from
ip[6]_tables and that is used by some matches/targets.

* add support for default policy to base chains, required to emulate
x_tables.

* add NFTA_CHAIN_USE attribute to obtain the number of references to
chains, required by x_tables emulation.

* add chain packet/byte counters using per-cpu.

* support 32-64 bits compat.

For historical reasons, this patch includes the following patches
that were posted in the netfilter-devel mailing list.

From Pablo Neira Ayuso:
* nf_tables: add default policy to base chains
* netfilter: nf_tables: add NFTA_CHAIN_USE attribute
* nf_tables: nft_compat: private data of target and matches in contiguous area
* nf_tables: validate hooks for compat match/target
* nf_tables: nft_compat: release cached matches/targets
* nf_tables: x_tables support as a compile time option
* nf_tables: fix alias for xtables over nftables module
* nf_tables: add packet and byte counters per chain
* nf_tables: fix per-chain counter stats if no counters are passed
* nf_tables: don't bump chain stats
* nf_tables: add protocol and flags for xtables over nf_tables
* nf_tables: add ip[6]t_entry emulation
* nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
* nf_tables: support 32bits-64bits x_tables compat
* nf_tables: fix compilation if CONFIG_COMPAT is disabled

From Patrick McHardy:
* nf_tables: move policy to struct nft_base_chain
* nf_tables: send notifications for base chain policy changes

From Alexander Primak:
* nf_tables: remove the duplicate NF_INET_LOCAL_OUT

From Nicolas Dichtel:
* nf_tables: fix compilation when nf-netlink is a module

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9370761c 10-Oct-2013 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: convert built-in tables/chains to chain types

This patch converts built-in tables/chains to chain types that
allow you to deploy customized table and chain configurations from
userspace.

After this patch, you have to specify the chain type when
creating a new chain:

add chain ip filter output { type filter hook input priority 0; }
                             ^^^^ ------

The existing chain types after this patch are: filter, route and
nat. Note that tables are just containers of chains with no specific
semantics, which is a significant change with regards to iptables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ef1f7df9 10-Oct-2013 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: expression ops overloading

Split the expression ops into two parts and support overloading of
the runtime expression ops based on the requested function through
a ->select_ops() callback.

This can be used to provide optimized implementations, for instance
for loading small aligned amounts of data from the packet or inlining
frequently used operations into the main evaluation loop.
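
A sketch of the overloading idea, using payload loads as the example. The structures and the "small aligned load" criterion (power-of-two length of at most 4 bytes at an aligned offset) are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct expr_ops_sketch {
	const char *name;
};

/* An expression type either has fixed ops or picks them per request. */
struct expr_type_sketch {
	const struct expr_ops_sketch *(*select_ops)(unsigned int offset,
						    unsigned int len);
	const struct expr_ops_sketch *ops;
};

const struct expr_ops_sketch payload_ops = { "payload" };
const struct expr_ops_sketch payload_fast_ops = { "payload_fast" };

/* Pick the optimized ops for small, naturally aligned loads. */
static const struct expr_ops_sketch *
payload_select_ops(unsigned int offset, unsigned int len)
{
	int pow2 = len != 0 && (len & (len - 1)) == 0;

	if (len <= 4 && pow2 && offset % len == 0)
		return &payload_fast_ops;
	return &payload_ops;
}

const struct expr_type_sketch payload_type = {
	.select_ops = payload_select_ops,
};

/* What rule setup would do: ask the type, fall back to fixed ops. */
const struct expr_ops_sketch *
expr_resolve_ops(const struct expr_type_sketch *type,
		 unsigned int offset, unsigned int len)
{
	assert(type != NULL);
	if (type->select_ops != NULL)
		return type->select_ops(offset, len);
	return type->ops;
}
```

The choice is made once, when the expression is instantiated, so the evaluation loop itself does not pay for the decision.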

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 20a69341 10-Oct-2013 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add netlink set API

This patch adds the new netlink API for maintaining nf_tables sets
independently of the ruleset. The API supports the following operations:

- creation of sets
- deletion of sets
- querying of specific sets
- dumping of all sets

- addition of set elements
- removal of set elements
- dumping of all set elements

Sets are identified by name, each table defines an individual namespace.
The name of a set may be allocated automatically, this is mostly useful
in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
automatically once the last reference has been released.

Sets can be marked constant, meaning they're not allowed to change while
linked to a rule. This allows lockless operation for set
types that would otherwise require locking.

Additionally, if the implementation supports it, sets can (as before) be
used as maps, associating a data value with each key (or range), by
specifying the NFT_SET_MAP flag and can be used for interval queries by
specifying the NFT_SET_INTERVAL flag.

Set elements are added and removed incrementally. All element operations
support batching, reducing netlink message and set lookup overhead.

The old "set" and "hash" expressions are replaced by a generic "lookup"
expression, which binds to the specified set. Userspace is not aware
of the actual set implementation used by the kernel anymore, all
configuration options are generic.

Currently the implementation selection logic is largely missing and the
kernel will simply use the first registered implementation supporting the
requested operation. Eventually, the plan is to have userspace supply a
description of the data characteristics and select the implementation
based on expected performance and memory use.
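
The current "first match wins" selection can be sketched as below. The feature bits and names are hypothetical stand-ins for the interval and map capabilities mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits. */
enum {
	SK_SET_INTERVAL	= 0x4,
	SK_SET_MAP	= 0x8,
};

struct set_ops_sketch {
	const char *name;
	uint32_t features;	/* what this implementation supports */
};

/* Walk the implementations in registration order and return the
 * first one that supports every requested feature. */
const struct set_ops_sketch *
select_set_ops(const struct set_ops_sketch *ops, size_t n, uint32_t requested)
{
	size_t i;

	assert(ops != NULL || n == 0);
	for (i = 0; i < n; i++)
		if ((ops[i].features & requested) == requested)
			return &ops[i];
	return NULL;
}
```

Note that nothing here weighs performance or memory use; registration order alone decides between two implementations that both qualify.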

This patch includes the new 'lookup' expression to look up matching
elements in the set.

This patch includes kernel-doc descriptions for this set API and it
also includes the following fixes.

From Patrick McHardy:
* netfilter: nf_tables: fix set element data type in dumps
* netfilter: nf_tables: fix indentation of struct nft_set_elem comments
* netfilter: nf_tables: fix oops in nft_validate_data_load()
* netfilter: nf_tables: fix oops while listing sets of built-in tables
* netfilter: nf_tables: destroy anonymous sets immediately if binding fails
* netfilter: nf_tables: propagate context to set iter callback
* netfilter: nf_tables: add loop detection

From Pablo Neira Ayuso:
* netfilter: nf_tables: allow to dump all existing sets
* netfilter: nf_tables: fix wrong type for flags variable in newelem

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 96518518 14-Oct-2013 Patrick McHardy <kaber@trash.net>

netfilter: add nftables

This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianness.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.
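
To make the pseudo-state machine concrete, here is a toy evaluator with four 128-bit data registers, one verdict register and three of the expressions listed above. It is a sketch under simplifying assumptions; every name, structure and verdict value in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sk_verdict { SK_BREAK = -2, SK_CONTINUE = -1, SK_DROP = 0, SK_ACCEPT = 1 };
enum sk_op { SK_OP_PAYLOAD, SK_OP_CMP, SK_OP_IMMEDIATE };

struct sk_regs {
	int verdict;		/* the special-purpose verdict register */
	uint8_t data[4][16];	/* 4 general purpose registers, 128 bits each */
};

struct sk_expr {
	enum sk_op op;
	unsigned int reg;	/* data register index, 0..3 */
	unsigned int offset;	/* payload: byte offset into the packet */
	unsigned int len;	/* payload/cmp: number of bytes, at most 16 */
	uint8_t value[16];	/* cmp: bytes to compare against */
	int verdict;		/* immediate: verdict to load */
};

/* Run the expressions of one rule in order until one of them sets a
 * verdict. SK_BREAK means the rule did not match. */
int sk_eval_rule(const struct sk_expr *exprs, size_t n,
		 const uint8_t *pkt, size_t pkt_len)
{
	struct sk_regs regs;
	size_t i;

	assert(exprs != NULL || n == 0);
	memset(&regs, 0, sizeof(regs));
	regs.verdict = SK_CONTINUE;

	for (i = 0; i < n && regs.verdict == SK_CONTINUE; i++) {
		const struct sk_expr *e = &exprs[i];

		if (e->reg >= 4 || e->len > 16) {
			regs.verdict = SK_BREAK;
			break;
		}
		switch (e->op) {
		case SK_OP_PAYLOAD:	/* packet bytes -> register */
			if ((size_t)e->offset + e->len > pkt_len)
				regs.verdict = SK_BREAK;
			else
				memcpy(regs.data[e->reg], pkt + e->offset, e->len);
			break;
		case SK_OP_CMP:		/* register == constant? */
			if (memcmp(regs.data[e->reg], e->value, e->len) != 0)
				regs.verdict = SK_BREAK;
			break;
		case SK_OP_IMMEDIATE:	/* load a verdict */
			regs.verdict = e->verdict;
			break;
		}
	}
	return regs.verdict;
}
```

A rule such as "drop if the byte at offset 9 equals 6" then becomes three expressions: a payload load into a register, a cmp against 6, and an immediate that loads the drop verdict.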

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
net/ipv4/netfilter/nf_tables_ipv4.c
net/ipv6/netfilter/nf_tables_ipv6.c
net/ipv4/netfilter/nf_tables_arp.c
net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
net/ipv4/netfilter/nf_table_route_ipv4.c
net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
include/net/netfilter/nf_tables.h
include/net/netfilter/nf_tables_core.h
* It also includes a skeleton expression:
net/netfilter/nft_expr_template.c
and the preliminary implementation of the meta target
net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that have been made since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>