History log of /freebsd-current/sys/netpfil/pf/pf.c
Revision Date Author Comments
# 301ec2ce 06-May-2024 Kristof Provost <kp@FreeBSD.org>

pf: always mark states as unlinked before detaching them

Users have reported crashes in pf_test_state_udp() where at least one state key
is NULL.

That suggests that pf_detach_state() ran concurrently with pf_test_state_udp().
pf_test_state_udp() holds the state lock (aka the id lock), but
pf_detach_state() does not.
The intent is that detached states are not returned by STATE_LOOKUP/
pf_find_state(), as the state's timeout is set to PFTM_UNLINKED and thus
pf_find_state() does not find the state.

There are other paths to pf_detach_state() (outside of pf_unlink_state())
though, where we did not set the timeout to PFTM_UNLINKED. Fix those, and assert
that the timeout is set correctly when we enter pf_detach_state().

MFC after: 1 week
See also: https://redmine.pfsense.org/issues/15413
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D45101
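
The invariant described above can be illustrated with a small userspace sketch. This is not the pf code: struct toy_state, toy_find_state(), toy_unlink_state() and toy_detach_state() are hypothetical stand-ins, and only the PFTM_UNLINKED name is taken from the commit message. It assumes a lookup that skips states marked as unlinked and a detach step that asserts the mark is already in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the PFTM_UNLINKED name comes from the commit. */
enum toy_timeout { TOY_PFTM_ACTIVE, TOY_PFTM_UNLINKED };

struct toy_state {
	int			 id;
	enum toy_timeout	 timeout;
	void			*key;	/* cleared when the state is detached */
};

/* Lookup never returns a state that is already marked as unlinked. */
static struct toy_state *
toy_find_state(struct toy_state *tbl, size_t n, int id)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id == id && tbl[i].timeout != TOY_PFTM_UNLINKED)
			return (&tbl[i]);
	}
	return (NULL);
}

/* Detach insists that the caller marked the state first. */
static void
toy_detach_state(struct toy_state *s)
{
	assert(s->timeout == TOY_PFTM_UNLINKED);
	s->key = NULL;
}

/* Every path that detaches a state marks it as unlinked beforehand. */
static void
toy_unlink_state(struct toy_state *s)
{
	s->timeout = TOY_PFTM_UNLINKED;
	toy_detach_state(s);
}
```

In this sketch a caller that forgets the marking step trips the assertion in toy_detach_state() instead of leaving a half-detached state visible to lookups.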


# 93c5ba5a 22-Apr-2024 Lexi Winter <lexi@le-Fay.ORG>

sys/netpfil/pf: fix non-INET module build

pf.ko, when built as a module without 'options INET' but with 'options
VIMAGE', won't load:

link_elf_obj: symbol vnet_entry_in_loopback_mask undefined

This is because it uses IN_LOOPBACK(), which in the VIMAGE case uses
INET-specific symbols.

Fix by making this check conditional on #ifdef INET.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1157


# 8ce3ef5f 18-Apr-2024 Gordon Bergling <gbe@FreeBSD.org>

netpfil: Fix typos in source code comments

- s/addres/address/

MFC after: 3 days


# a983cea4 27-Mar-2024 Kristof Provost <kp@FreeBSD.org>

pf: fix reply-to after rdr and dummynet

If we redirect a packet to localhost and it gets dummynet'd it may be
re-injected later (e.g. when delayed) which means it will be passed
through ip_input() again. ip_input() will then reject the packet because
it's directed to the loopback address, but did not arrive on a loopback
interface.

Fix this by having pf set the rcvif to V_iflo if we redirect to
loopback.

See also: https://redmine.pfsense.org/issues/15363
Sponsored by: Rubicon Communications, LLC ("Netgate")


# a1ecbc57 23-Mar-2024 Kristof Provost <kp@FreeBSD.org>

pf: fix use-after-free

If we fragment the packet in pf_route() the first transmitted packet
will free the pf_mtag we have stored in pf_pdesc (pd). Ensure we
update that pointer for every packet to avoid using a freed pointer in
pf_dummynet_route().

Reported by: CI KASAN, markj
MFC after: 1 week


# c6f11163 12-Mar-2024 Kristof Provost <kp@FreeBSD.org>

pf: fix dummynet + route-to

Ensure that we pick the correct dummynet pipe (i.e. forward vs. reverse
direction) when applying route-to.

We mark the processing as outbound so that dummynet will re-inject in
the correct phase of processing after it's done with the packet, but
that will cause us to pick the wrong pipe number. Reverse them so that
the incorrect decision ends up picking the correct pipe.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D44366


# 0ea0c026 11-Mar-2024 Kristof Provost <kp@FreeBSD.org>

pf: avoid passing through dummynet multiple times

In some setups we end up with multiple states created for a single
packet, which in turn can mean we run the packet through dummynet
multiple times. That's not expected or intended. Mark each packet when
it goes through dummynet, and do not pass packets through dummynet if
they're marked as having already passed through.

See also: https://redmine.pfsense.org/issues/14854
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D44365
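
The marking scheme can be sketched as follows. The flag name, struct toy_pkt and toy_maybe_dummynet() are hypothetical; the sketch only assumes a per-packet flag that is set on the first pass and checked on later ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_DUMMYNETED	0x01	/* hypothetical per-packet flag bit */

struct toy_pkt {
	uint8_t	flags;
	int	dn_passes;	/* counts simulated dummynet passes */
};

/*
 * Returns true if the packet was handed to the (simulated) dummynet,
 * false if it was skipped because it had already been through once.
 */
static bool
toy_maybe_dummynet(struct toy_pkt *p)
{
	if (p->flags & TOY_FLAG_DUMMYNETED)
		return (false);
	p->flags |= TOY_FLAG_DUMMYNETED;
	p->dn_passes++;
	return (true);
}
```

Even if several states match the same packet, only the first call shapes it.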


# 6460322a 02-Feb-2024 Kristof Provost <kp@FreeBSD.org>

pf: support if-bound with reply-to

On reply-to we don't know what interface to bind to when we create
the state. Create any reply-to state as floating, but bind to the
appropriate interface once we're handling the reply.

See also: https://redmine.pfsense.org/issues/15220
Sponsored by: Rubicon Communications, LLC ("Netgate")
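
A rough sketch of the "floating first, bind later" idea follows. All names are hypothetical, and which interface counts as "appropriate" is decided by pf; here it is simply passed in by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_state {
	const char	*kif;		/* NULL means floating (any interface) */
	bool		 reply_to;	/* state was created by a reply-to rule */
};

/*
 * Match a packet against the state. A floating reply-to state is bound
 * to the given interface the first time a reply is handled.
 */
static bool
toy_match_and_bind(struct toy_state *s, const char *ifname, bool is_reply)
{
	if (s->kif == NULL) {
		if (s->reply_to && is_reply)
			s->kif = ifname;	/* late binding */
		return (true);
	}
	return (strcmp(s->kif, ifname) == 0);
}
```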


# 9566d927 27-Feb-2024 Kristof Provost <kp@FreeBSD.org>

pf: fix packet-too-big for route-to as well

When we handle a packet via route-to (i.e. pf_route6()) we still need to
verify the MTU. However, we only run that check in the forwarding case.

Set the PFIL_FWD tag when running the pf_test6(PF_OUT) check from
pf_route6(). We are in fact forwarding, so should call the test function
as such. This will cause us to run the MTU check, and generate an ICMP6
packet-too-big error when required.

See also: 54c62e3e5d8cd90c5571a1d4c8c5f062d580480e
See also: f1c0030bb05cfa01bdd500e50befbb425fecc4c4
See also: https://redmine.pfsense.org/issues/14290
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 04c68025 02-Feb-2024 Kristof Provost <kp@FreeBSD.org>

pf: add a probe point to BOUND_IFACE

It's been useful at least once, so we may as well keep it.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 58a26743 05-Feb-2024 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Ensure that st->kif is obtained in a way which respects the r->rpool->mtx mutex

The redirection pool stored in r->rpool.cur is used for loadbalancing
and cur can change whenever loadbalancing happens, which is for every
new connection. Therefore it can't be trusted outside of pf_map_addr()
and the r->rpool->mtx mutex. After evaluating the ruleset, loadbalancing
decision is made in pf_map_addr() called from within pf_create_state()
and stored in the state itself.

This patch modifies BOUND_IFACE() so that it only uses the information
already stored in the state which has been obtained in a way which
respects the r->rpool->mtx mutex.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D43741


# 8a16fd43 06-Feb-2024 Kristof Provost <kp@FreeBSD.org>

Revert "pf: Ensure that st->kif is obtained in a way which respects the r->rpool->mtx mutex"

This commit is correct, but was misattributed. Revert so we can re-apply
with the correct author set.

This reverts commit 6d4a140acfdf637bb559d371c583e4db478e1549.


# 6d4a140a 05-Feb-2024 Igor Ostapenko <pm@igoro.pro>

pf: Ensure that st->kif is obtained in a way which respects the r->rpool->mtx mutex

The redirection pool stored in r->rpool.cur is used for loadbalancing
and cur can change whenever loadbalancing happens, which is for every
new connection. Therefore it can't be trusted outside of pf_map_addr()
and the r->rpool->mtx mutex. After evaluating the ruleset, loadbalancing
decision is made in pf_map_addr() called from within pf_create_state()
and stored in the state itself.

This patch modifies BOUND_IFACE() so that it only uses the information
already stored in the state which has been obtained in a way which
respects the r->rpool->mtx mutex.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D43741


# 11ff3552 02-Feb-2024 rilysh <nightquick@proton.me>

sys/netpfil/pf/pf.c: remove an extra semicolon

Signed-off-by: rilysh <nightquick@proton.me>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/959


# c0708798 02-Feb-2024 rilysh <nightquick@proton.me>

sys/netpfil/pf/pf.c: remove an extra semicolon

Signed-off-by: rilysh <nightquick@proton.me>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/959


# b8ef285f 01-Feb-2024 Kristof Provost <kp@FreeBSD.org>

pf: ensure dummynet gets the correct direction after route-to

If we apply a route-to to an inbound packet pf_route() may hand that
packet over to dummynet. Dummynet may then delay the packet, and later
re-inject it. This re-injection (in dummynet_send()) needs to know
if the packet was inbound or outbound, to call the correct path for
continued processing.

That's done based on the pf_pdesc we pass along (through
pf_dummynet_route() and pf_pdesc_to_dnflow()). In the case of pf_route()
on inbound packets that may be wrong, because we're called in the input
path, and didn't update pf_pdesc->dir.

This can manifest in issues with fragmented packets. For example, a
fragmented packet will be re-fragmented in pf_route(), and if dummynet
makes different decisions for some of the fragments (that is, it delays
some and allows others to pass through directly) this will break.

The packets that pass through dummynet without delay will be transmitted
correctly (through the ifp->if_output() call in pf_route()), but
the delayed packets will be re-injected in the input path (and not
the output path, as they should be). These packets will pass through
pf_test(PF_IN) as they're tagged PF_MTAG_FLAG_DUMMYNET. However,
this tag is then removed and the packet will be routed and enter
pf_test(PF_OUT) where pf_reassemble() will hold them indefinitely
(as some fragments have been transmitted directly, and will never hit
pf_test(PF_OUT)).

The fix is simple: we must update pf_pdesc->dir to PF_OUT before we
pass the packet to dummynet.

See also: https://redmine.pfsense.org/issues/15156
Reviewed by: rcm
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 31828075 25-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: bind route-to states to their route-to interface

When we route-to the state should be bound to the route-to interface,
not the default route interface. However, we should only do so for
outbound traffic, because inbound traffic should bind on the arriving
interface, not the one we eventually transmit on.

Explicitly check for this in BOUND_IFACE().

We must also extend pf_find_state(), because subsequent packets within
the established state will attempt to match the original interface, not
the route-to interface.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43589


# f1c0030b 24-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: only check MTU for IPv6 packets when forwarding

When the packets are generated locally (i.e. PFIL_FWD is not set) we
might generate overly large packets and rely on the NIC to fragment it
for us. In that case we'd reject a valid packet.

Reported by: Herbert J. Skuhra <herbert@gojira.at>
Tested by: Herbert J. Skuhra <herbert@gojira.at>
Fixes: 54c62e3e5d8cd90c5571a1d4c8c5f062d580480e
Sponsored by: Rubicon Communications, LLC ("Netgate")
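
As a sketch of the resulting decision, assuming a hypothetical helper toy_needs_pkt_too_big() and a boolean standing in for the PFIL_FWD flag:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: should the packet be rejected with an ICMPv6
 * packet-too-big error? Only forwarded packets are checked against the
 * MTU; locally generated ones are left alone.
 */
static bool
toy_needs_pkt_too_big(size_t pkt_len, size_t mtu, bool forwarding)
{
	if (!forwarding)
		return (false);
	return (pkt_len > mtu);
}
```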


# 54c62e3e 17-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: work around icmp6 packet-too-big not being sent when binat-ing

If we're applying NPTv6 we pass a packet with a modified source and/or
destination address to the network stack.

If that packet then turns out to be larger than the MTU of the sending
interface the stack will attempt to generate an icmp6 packet-too-big
error, but may fail to look up the appropriate source address for that
error message. Even if it does, pf would still have to undo the binat
operation inside the icmp6 packet so the sending host can make sense of
the error.

We can avoid both problems entirely by having pf also perform the MTU
check (taking the potential refragmentation into account), and
generating the icmp6 error directly in pf.

See also: https://redmine.pfsense.org/issues/14290
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43499


# fc6e5069 13-Dec-2023 Kristof Provost <kp@FreeBSD.org>

pflow: add RFC8158 NAT support

Extend pflow(4) to send NAT44 Session Create and Delete events.
This applies only to IPFIX (i.e. proto version 10), and requires no
user configuration.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43114


# 04932601 07-Dec-2023 Kristof Provost <kp@FreeBSD.org>

pf: store state creation/expiration timestamps with millisecond precision

The primary beneficiary is pflow(4), which expects millisecond precision
in timestamps.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43112


# baf9b6d0 01-Dec-2023 Kristof Provost <kp@FreeBSD.org>

pf: allow pflow to be activated per rule

Only generate ipfix/netflow reports (through pflow) for the rules where
this is enabled. Reports can also be enabled globally through 'set
state-default pflow'.

Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43108


# f92d9b1a 28-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pflow: import from OpenBSD

pflow is a pseudo device to export flow accounting data over UDP.
It's compatible with netflow version 5 and IPFIX (10).

The data is extracted from the pf state table. States are exported once
they are removed.

Reviewed by: melifaro
Obtained from: OpenBSD
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43106


# 948e8413 02-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pflog: pass the action to pflog directly

If a packet is malformed, it is dropped by pf(4). The rule referenced
in pflog(4) is the default rule. As the default rule is a pass
rule, tcpdump printed "pass" although the packet was actually
dropped. Use the actual action, rather than the rule's action, or an
attempt at guessing the correct action.

Inspired by OpenBSD's 'pflog(4) logs packet dropped by default rule with block.' commit.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 5f840a17 02-Jan-2024 Kristof Provost <kp@FreeBSD.org>

pf: don't clobber log flag

If we decide to discard a packet due to unexpected IP options or
unsupported headers we set pd.act.log. However, this can later get
overwritten when we copy the state's saved actions over.

Merge the two log fields to ensure we log as expected.

Sponsored by: Rubicon Communications, LLC ("Netgate")
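
Merging rather than overwriting can be sketched like this. struct toy_actions and toy_apply_state_actions() are hypothetical; the sketch assumes the log field is a bitmask, so a bitwise OR keeps bits set earlier in packet processing.

```c
#include <stdint.h>

struct toy_actions {
	uint8_t		log;	/* bitmask of logging flags */
	uint16_t	qid;	/* some other saved action parameter */
};

/*
 * Copy the state's saved actions onto the packet's actions, but keep any
 * log bits that were already set for this packet.
 */
static void
toy_apply_state_actions(struct toy_actions *pkt, const struct toy_actions *st)
{
	uint8_t log = pkt->log;

	*pkt = *st;
	pkt->log |= log;
}
```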


# 6284d5f7 29-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: remove incorrect fragmentation check

We do not need to check PFDESC_IP_REAS while tracking TCP state.
Moreover, this check incorrectly considers no-data packets (e.g. RST) to
be in-window when this flag is not set.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Approved by: so
Security: FreeBSD-SA-23:17.pf


# 7093414c 17-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: sctp heartbeats confirm a connection

When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) the new connection will
never see a COOKIE/COOKIE_ACK exchange. We should consider HEARTBEAT_ACK
to be a confirmation that the connection is established.

This ensures that such connections do not time out earlier than
expected.

MFC after: 1 week
Sponsored by: Orange Business Services


# a8dbbeb1 16-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: skip urpf check for sctp multihomed states

When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) we cannot know what
interfaces we'll be seeing that traffic on. These states are floating
states, i.e. on "all" interfaces. We cannot do reverse path filtering
for these states, so do not do so.

MFC after: 1 week
Sponsored by: Orange Business Services


# 0fe663b2 16-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: always create multihomed states as floating

When we create a new state for multihomed sctp connections (i.e.
based on INIT/INIT_ACK or ASCONF parameters) we cannot know what
interfaces we'll be seeing that traffic on. Make those states floating,
irrespective of state policy.

MFC after: 1 week
Sponsored by: Orange Business Services


# fe3bb40b 17-Nov-2023 Igor Ostapenko <pm@igoro.pro>

pf: fix dummynet + ipdivert use case

Dummynet re-injects an mbuf with MTAG_IPFW_RULE added, and the same mtag
is used by divert(4) as parameters for packet diversion.

If, according to the pf rule set, a packet should go through dummynet first
and through ipdivert afterwards, the mtag must be removed after dummynet so
that ipdivert does not mistake it for its own input parameters.

ipfw consumes this mtag at the very beginning of its processing, which has
the same effect as clearing the tag after dummynet.

Since fabf705f4b5a, pf passes parameters to ipdivert using its own
MTAG_PF_DIVERT mtag.

PR: 274850
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D42609


# af21145f 09-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

pf_purge_expired_states(): fix build without SDT probes

Sponsored by: The FreeBSD Foundation


# 0d2ab4a4 09-Nov-2023 Kristof Provost <kp@FreeBSD.org>

pf: add hashtable row count SDT

This allows us to figure out how many states each hashrow contains. That
can be important to know when debugging performance issues.

A simple probe could be:

    dtrace -n 'pf:purge:state:rowcount { @counts["states per row"] = quantize(arg1); }'
    dtrace: description 'pf:purge:state:rowcount ' matched 1 probe
    ^C

      states per row
             value  ------------- Distribution ------------- count
                -1 |                                         0
                 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8257624
                 1 |                                         14321
                 2 |                                         0

MFC after: 1 week
Sponsored by: Modirum MDPay


# ca9dbde8 27-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: support SCTP-specific timeouts

Allow SCTP state timeouts to be configured independently from TCP state
timeouts.

Reviewed by: tuexen
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42393


# d6d38b02 17-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix missing SCTP multihomed states

The existing code to create extra states when SCTP endpoints supplied
extra addresses missed a case. As a result we failed to generate all of
the required states.

Briefly, if host A has address 1 and 2 and host B has address 3 and 4 we
generated 1 - 3 and 2 - 3, as well as 1 - 4, but not 2 - 4.

Store the list of endpoints supplied by each host and use those to
generate all of the connection permutations.

MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D42361
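
Generating every combination from the two stored endpoint lists amounts to a cross product. A minimal sketch, with addresses reduced to integers and hypothetical names (struct toy_pair, toy_all_pairs()):

```c
#include <stddef.h>

struct toy_pair {
	int	src;
	int	dst;
};

/*
 * Write every (a[i], b[j]) combination to out. Returns the number of
 * pairs written, or 0 if out cannot hold all of them.
 */
static size_t
toy_all_pairs(const int *a, size_t na, const int *b, size_t nb,
    struct toy_pair *out, size_t cap)
{
	size_t n = 0;

	if (na * nb > cap)
		return (0);
	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			out[n].src = a[i];
			out[n].dst = b[j];
			n++;
		}
	}
	return (n);
}
```

For the example in the message, addresses {1, 2} and {3, 4} yield the four pairs 1 - 3, 1 - 4, 2 - 3 and 2 - 4, including the previously missing 2 - 4.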


# c1146e6a 20-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: use an enum for packet direction in divert tag

The benefit is that in the debugger you will see PF_DIVERT_MTAG_DIR_IN
instead of 1 when looking at a structure. And compilation time failure
if anybody sets it to a wrong value. Using "port" instead of "ndir" when
assigning a port improves readability of code.

Suggested by: glebius
MFC after: 3 weeks
X-MFC-With: fabf705f4b


# fabf705f 18-Oct-2023 Igor Ostapenko <pm@igoro.pro>

pf: fix pf divert-to loop

Resolved conflict between ipfw and pf if both are used and pf wants to
do divert(4) by having separate mtags for pf and ipfw.

Also fix the incorrect 'rulenum' check, which caused the reported loop.

While here add a few test cases to ensure that divert-to works as
expected, even if ipfw is loaded.

divert(4)
PR: 272770
MFC after: 3 weeks
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D42142


# 4d19ecea 12-Oct-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Free pf_rule_items when state is not created

This addresses the issues of pf_rule_items leaking in case of stateless
rules and in case of state creation failures, like hitting the state
limit.

Reviewed by: kp
MFC after: 1 week
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D42169


# b00dbe85 05-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix SCTP SDT probe

We want the return value of pf_test_rule(), i.e. the result of the
evaluation of the new state, not the result of the evaluation of the
original packet/state.

MFC after: 1 week
Sponsored by: Orange Business Services


# 74c24613 03-Oct-2023 Kristof Provost <kp@FreeBSD.org>

pf: cope with missing rpool.cur

If we're evaluating a pfsync'd state (and have different rules on both
ends) our state may point to the default rule, which does not have
rpool.cur set. As a result we can end up dereferencing a NULL pointer.

Explicitly check for this when we try to re-construct the route-to interface.

Also add a test case which can trigger this issue.

MFC after: 3 days
See also: https://redmine.pfsense.org/issues/14804
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 480f62cc 29-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: only create sctp multihome states if we pass the packet

If we've decided to drop the packet we shouldn't create additional
states based off it.

MFC after: 3 days
Sponsored by: Orange Business Services


# aefda9c9 28-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: ensure 'off' is always set before use

If we bail out early from pf_test(6)() we still need to clean up/finish
SCTP multihome work, which requires the 'off' value to be set. Set it
early enough.

MFC after: 3 days
Sponsored by: Orange Business Services


# b6ce4111 06-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix state leak

If we hit the csfailed case in pf_create_state() we may have allocated
a state, so we must also free it. While here reduce the amount of
duplicated cleanup code.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41772


# 3482f57f 09-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

netpfil/pf/pf.c: fix build without dtrace

Sponsored by: The FreeBSD Foundation


# 4d3af82f 05-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: mark removed connections within a multihome association as shutting down

Parse IP removal in ASCONF chunks, find the affected state(s) and mark
them as shutting down. This will cause them to time out according to
PFTM_TCP_CLOSING timeouts, rather than waiting for the established
session timeout.

MFC after: 3 weeks
Sponsored by: Orange Business Services


# f1cc29af 04-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: inherit v_tag values to multihomed connections

When we create a new state for an existing SCTP association inherit the
v_tag values from the original connection.

MFC after: 3 weeks
Sponsored by: Orange Business Services


# 51a78dd2 01-Sep-2023 Kristof Provost <kp@FreeBSD.org>

pf: improve SCTP state validation

Only create new states for INIT chunks, or when we're creating a
secondary state for a multihomed association.

Store and verify verification tag.

MFC after: 3 weeks
Sponsored by: Orange Business Services


# 10aa9ddb 02-Aug-2023 Kristof Provost <kp@FreeBSD.org>

pf: support SCTP multihoming

SCTP may announce additional IP addresses it'll use in the INIT/INIT_ACK
chunks, or in ASCONF chunks at any time during the connection. Parse these
parameters, evaluate the ruleset for the new connection and if allowed
create the corresponding states.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D41637


# d10de21f 24-Aug-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Access r->rpool.cur->kif under mutex protection

pf_route() sends traffic to a specified next hop over a specific
interface. The next hop is obtained in pf_map_addr() but the interface
is obtained directly via r->rpool.cur->kif outside of the lock held in
pf_map_addr() in multiple places around pf. The chosen interface is not
stored in source node.

Move the interface selection into pf_map_addr(), have the function
return it together with the chosen IP address and ensure it's stored
in struct pf_ksrc_node, store it in the source node and use the stored
value when needed.

Sponsored by: InnoGames GmbH
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41570


# 92d41522 21-Aug-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: enable the syncookie feature for IPv6

When syncookie support was added to pf the relevant work was only done
in pf_test(), not pf_test6(). Do this now.

MFC after: 1 week
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41502


# 9642d948 20-Aug-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: reduce indentation

Early-return to reduce syncookie-related indentation.

No functional change.

MFC after: 1 week
Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41502


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 6053adaf 01-Jun-2023 Kristof Provost <kp@FreeBSD.org>

pf: add SCTP NAT support

Support NAT-ing SCTP connections.

This is mostly similar to UDP and TCP, but we refuse to change ports for
SCTP, to avoid interfering with multihomed connections.

As a result we also never copy the SCTP header back or recalculate
checksums as we'd do for TCP or UDP (because we don't modify the header
for SCTP).

We do use the existing pf_change_ap() function to modify the packet,
because we may still need to update the IPv4 header checksum.

Reviewed by: tuexen
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40866


# d1bc1e9e 31-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: support 'return' for SCTP

Send an SCTP Abort message if we're refusing a connection, just like we
send a RST for TCP.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40864


# 010ee43f 27-Apr-2023 Kristof Provost <kp@FreeBSD.org>

pf: initial SCTP support

Basic state tracking for SCTP. This means we scan through the packet to
identify the different chunks (so we can identify state changes).

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40862


# 61e22e9b 07-Jul-2023 Kristof Provost <kp@FreeBSD.org>

pf: use sctp_calculate_cksum()

This function is always available, even if the SCTP or SCTP_SUPPORT options
are not set.
That lets us remove an ifdef, and also means we improve pf's SCTP handling
when the options are not set.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40911


# 6b4ed16d 12-Jul-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Simplify rule actions logic

Actions applied to a processed packet come in case of stateless
firewalling from a rule or in case of stateful firewalling from a
state. The state obtains the actions from a rule when it is created by a
rule or by pfsync. The logic for deciding if actions come from a rule or
a state is spread across many places in pf.

There already is struct pf_rule_actions in struct pf_pdesc and thus it
can be used as a central place for storing actions and their parameters.
OpenBSD does something similar: they also store the actions in struct
pf_pdesc and have no variables in pf_test() but they use separate
variables instead of a structure. By using struct pf_rule_actions we can
simplify the code even further. Actions are applied *only* in
pf_rule_to_actions(), whether for the legacy scrub rules or for the
normal match / pass rules. The logic of choosing if rule or state
actions are used is applied only once in pf_test() by copying the whole
struct.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41009


# f2064dd1 12-Jul-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Fix duplicate storage of direction

The variable storing the direction of a processed packet is passed
around to many functions. Most of those functions already have a pointer
to struct pf_pdesc which also contains the direction. By using the one
in struct pf_pdesc we can reduce the amount of arguments passed around.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D41008


# 7dc3be36 19-Jun-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Fix usage of pf tags with syncookies

The value stored in pf_mtag->tag comes from "tag" and "match tag"
keywords in pf.conf and must not be abused for storing other
information. A ruleset with enough tags could set or remove the bits
responsible for PF_TAG_SYNCOOKIE_RECREATED.

Move this syncookie status to pf_mtag->flags. Rename this and other
related constants in a way that will prevent such mistakes in the
future. Move PF_REASSEMBLED constant to mbuf.h and rename accordingly
because it's not a flag stored in pf_mtag, but an identifier of a
different m_tag. Change the value of the constant to avoid conflicts
with other m_tags using MTAG_ABI_COMPAT.

Rename the variables in pf_build_tcp() and pf_send_tcp() to reduce
confusion.

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D40587
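
The collision can be reproduced in miniature. The bit values and structure layouts below are assumptions chosen for illustration, not the real definitions; the point is only that a status bit stored inside the tag field can be set by an ordinary tag number, while a separate flags field cannot.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TAG_SYNCOOKIE_BIT		0x8000	/* assumed bit inside the tag */
#define TOY_FLAG_SYNCOOKIE_RECREATED	0x01	/* assumed bit in a flags field */

struct toy_mtag_old {
	uint16_t	tag;	/* user tag and status bit share this field */
};

struct toy_mtag_new {
	uint16_t	tag;	/* user tag only */
	uint8_t		flags;	/* status bits live here */
};

static bool
toy_old_is_recreated(const struct toy_mtag_old *m)
{
	return ((m->tag & TOY_TAG_SYNCOOKIE_BIT) != 0);
}

static bool
toy_new_is_recreated(const struct toy_mtag_new *m)
{
	return ((m->flags & TOY_FLAG_SYNCOOKIE_RECREATED) != 0);
}
```

With the old layout a sufficiently large tag number is indistinguishable from the syncookie status; with the new layout the tag value has no influence on it.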


# ba94bf28 15-Jun-2023 Kristof Provost <kp@FreeBSD.org>

pf: extend use of skip steps for Ethernet rules

Use the already populated PFE_SKIP_DST_ADDR and extend the skip
infrastructure to also skip on IP source/destination addresses.

This should make evaluating the rules slightly faster.

Reported by: R. Christian McDonald <rcm@rcm.sh>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40567


# 460f0aaf 30-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: fix log message

Use __func__ so we log the correct function name.

Sponsored by: Rubicon Communications, LLC ("Netgate")
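
For reference, __func__ is the standard C99 predefined identifier that expands to the name of the enclosing function. A small sketch with a hypothetical function name and message text:

```c
#include <stdio.h>

/*
 * Format a log line prefixed with the name of this function. Because
 * __func__ is used, the prefix stays correct if the function is renamed
 * or the line is copied into another function.
 */
static int
toy_route6_log(char *buf, size_t len)
{
	return (snprintf(buf, len, "%s: bad address family", __func__));
}
```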


# 9925aee0 30-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: carry over rule actions from route-to rules

If we route-to (or dup-to/reply-to) we re-run pf_test(), which will also
create states for the connection.
This means that we may end up matching a different (i.e. not the state
that was created by the route-to rule) state, without the attributes
(such as dummynet pipes/queues) set by the route-to rule.

Address this by inheriting the pf_rule_actions from the route-to rule
while evaluating the connection again in pf_test(). That is, we set
default pf_rule_actions based on the route-to rule for the new
evaluation. The new rule may still overrule these, but if it does not
have such actions the route-to actions are applied.

Do the same for IPv6 rules in pf_test6()/pf_route6().

See also: https://redmine.pfsense.org/issues/14039
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40340
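
The inheritance rule can be sketched as a merge of two action sets. struct toy_actions and toy_effective_actions() are hypothetical, and the sketch assumes 0 means "not set" for each field.

```c
#include <stdint.h>

struct toy_actions {
	uint16_t	dnpipe;	/* 0 = not set */
	uint16_t	qid;	/* 0 = not set */
};

/*
 * Start from the actions of the route-to rule; the rule matched during
 * the second evaluation overrides only the fields it actually sets.
 */
static struct toy_actions
toy_effective_actions(struct toy_actions inherited, struct toy_actions matched)
{
	struct toy_actions r = inherited;

	if (matched.dnpipe != 0)
		r.dnpipe = matched.dnpipe;
	if (matched.qid != 0)
		r.qid = matched.qid;
	return (r);
}
```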


# 4bf98559 29-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: make contents of struct pfsync_state configurable

Make struct pfsync_state contents configurable by sending out new
versions of the structure in separate subheader actions. Both old and
new version of struct pfsync_state can be understood, so replication of
states from a system running an older kernel is possible. The version
being sent out is configured using ifconfig pfsync0 … version XXXX. The
version is a user-friendly string - 1301 stands for FreeBSD 13.1 (I
have checked synchronization against a host running 13.1), 1400 stands
for 14.0.

A host running an older kernel will just ignore the messages and count
them as "packets discarded for bad action".

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39392


# bdd47177 11-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: release rules lock before passing the packet to dummynet

In the Ethernet rules we held the PF_RULES lock while we called
ip_dn_io_ptr() (i.e. dummynet). That meant that we could end up back in
pf while still holding the PF_RULES lock.
That's not immediately fatal, because that lock is recursive, but still
not ideal.

There also appear to be scenarios where this can actually trigger
deadlocks.

We don't need to hold the PF_RULES lock, as long as we make a local copy
of the data we need from the rule (in this case, the action and
bridge_to target). It's safe to keep the struct ifnet pointer around,
because we remain in NET_EPOCH.

See also: https://redmine.pfsense.org/issues/14373
MFC after: 1 week
Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40067


# bf206a1d 04-May-2023 Kristof Provost <kp@FreeBSD.org>

pf: remove NULL check before uma_zfree()

uma_zfree() can be called on a NULL pointer. Simplify the pf code a
little by removing the redundant checks.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 16303d2b 03-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: improve source node error handling

Functions manipulating source nodes can fail due to various reasons like
memory allocation errors, hitting configured limits or lack of
redirection targets. Ensure those errors are properly caught and
propagated in the code. Increase the error counters not only when
parsing the main ruleset but the NAT ruleset too.

Cherry-picked from development of D39880

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39940


# db0a2bfd 01-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: reduce number of hashing operations when handling source nodes

Reduce number of hashing operations when handling source nodes by always
having a pointer to the hash row mutex in the source node. Provide
macros for handling and asserting the mutex. Calculate the hash only
once in pf_find_src_node() and then use this hash in subsequent
operations.

Cherry-picked from development of D39880

Reviewed by: kp, mjg
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39888


# a81f5112 26-Apr-2023 Kristof Provost <kp@FreeBSD.org>

pf: clear PF_TAG_ROUTE_TO for dummynet fast path

Similar to the PF_TAG_DUMMYNET we must also clear the route tag if
dummynet didn't keep the packet. In that case we'd continue immediately
and there'd be no need for the route tag. Keeping it could lead to
unexpected routing of traffic.

See also: 27407a6adc793bdfaef8a86ece32fb1b461429f0
See also: https://redmine.pfsense.org/issues/14055
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 39282ef3 13-Apr-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: backport OpenBSD syntax of "scrub" option for "match" and "pass" rules

Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.

Obtained from: OpenBSD
MFC after: never
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38025


# b52b61c0 12-Mar-2023 Kristof Provost <kp@FreeBSD.org>

pf: distinguish forwarding and output cases for pf_refragment6()

Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.

ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39061


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# 9c041b45 31-Dec-2022 Kristof Provost <kp@FreeBSD.org>

pf: fix syncookies in conjunction with tcp fast port reuse

Basic scenario: we have a closed connection (in TCPS_FIN_WAIT_2), and
get a new connection (i.e. SYN) re-using the tuple.

Without syncookies we look at the SYN, and completely unlink the old,
closed state on the SYN.
With syncookies we send a generated SYN|ACK back, and drop the SYN,
never looking at the state table.

So when the ACK (i.e. the third step in the three way handshake for
connection setup) turns up, we’ve not actually removed the old state, so
we find it, and don’t do the syncookie dance, or allow the new
connection to get set up.

Explicitly check for this in pf_test_state_tcp(). If we find a state in
TCPS_FIN_WAIT_2 and the syncookie is valid we delete the existing state
so we can set up the new state.
Note that when we verify the syncookie in pf_test_state_tcp() we don't
decrement the number of half-open connections to avoid an incorrect
double decrement.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37919


# 8a8af942 22-Sep-2022 Kristof Provost <kp@FreeBSD.org>

pf: bridge-to

Allow pf (l2) to be used to redirect ethernet packets to a different
interface.

The intended use case is to send 802.1x challenges out to a side
interface, to enable AT&T links to function with pfSense as a gateway,
rather than the AT&T provided hardware.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37193


# b37707bb 15-Oct-2022 Kristof Provost <kp@FreeBSD.org>

pf: fix LINT-NOINET6 build


# a974702e 07-Oct-2022 Kristof Provost <kp@FreeBSD.org>

pf: apply the network stack's ICMP rate limiting to ICMP errors sent by pf

PR: 266477
Event: Aberdeen Hackathon 2022
Differential Revision: https://reviews.freebsd.org/D36903


# 133935d2 07-Oct-2022 Kristof Provost <kp@FreeBSD.org>

pf: atomically increment state ids

Rather than using a per-cpu state counter, and adding in the CPU id we
can atomically increment the number.
This has the advantage of removing the assumption that the CPU ID fits
in 8 bits.
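
A minimal sketch of the idea, using C11 atomics in userspace. The names
next_state_id, alloc_state_id() and old_style_id(), and the bit layout of the
old scheme, are assumptions made for illustration and are not taken from the
pf source.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One shared counter, incremented atomically by every CPU. */
static _Atomic uint64_t next_state_id = 1;

static uint64_t
alloc_state_id(void)
{
	/* atomic_fetch_add() returns the value held before the increment. */
	return (atomic_fetch_add(&next_state_id, 1));
}

/*
 * Hypothetical per-CPU scheme for contrast: the CPU id is folded into the
 * low 8 bits, so ids are only unique while the CPU id fits in 8 bits.
 */
static uint64_t
old_style_id(uint64_t percpu_counter, unsigned int cpuid)
{
	return ((percpu_counter << 8) | (cpuid & 0xff));
}
```

In this sketch CPU 0 and CPU 256 would produce the same id for the same
per-CPU counter value, which is the kind of assumption the atomic counter
avoids.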

Event: Aberdeen Hackathon 2022
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36915


# e5d08f47 09-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: remove pf_bcmp_state_key

Clang 14 performs the optimisation on its own, thus the custom code is
no longer needed.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 9503043f 02-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: stop using PFIL_FWD

It is only there to check if the packet was reassembled,
relevant if we are forwarding. But if the packet originated
locally it could not have been reassembled, thus the flag is
redundant.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 503b5870 25-Jul-2022 Dimitry Andric <dim@FreeBSD.org>

Adjust function definitions in pf.c to avoid clang 15 warnings

With clang 15, the following -Werror warnings are produced:

sys/netpfil/pf/pf.c:985:19: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_mtag_initialize()
^
void
sys/netpfil/pf/pf.c:995:14: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_initialize()
^
void
sys/netpfil/pf/pf.c:1089:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_mtag_cleanup()
^
void
sys/netpfil/pf/pf.c:1096:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_cleanup()
^
void
sys/netpfil/pf/pf.c:1989:27: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_purge_expired_src_nodes()
^
void
sys/netpfil/pf/pf.c:2174:24: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pf_purge_unlinked_rules()
^
void

This is because pf_mtag_initialize(), pf_initialize(),
pf_mtag_cleanup(), pf_cleanup(), pf_purge_expired_src_nodes(), and
pf_purge_unlinked_rules() are declared with (void) argument lists, but
defined with empty argument lists. Make the definitions match the
declarations.
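
As a small illustration of the fix, with a hypothetical function name: before
C23, a definition written with empty parentheses does not provide a prototype,
whereas (void) does, so the definition below matches its declaration.

```c
static int purge_calls;

/* Declaration with an explicit (void) parameter list: a real prototype. */
void purge_example(void);

/*
 * Definition with the same (void) list.  Writing "purge_example()" here
 * instead is what -Wstrict-prototypes complains about.
 */
void
purge_example(void)
{
	purge_calls++;
}
```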

MFC after: 3 days


# ba3b6b93 01-Jul-2022 Kristof Provost <kp@FreeBSD.org>

pf: handle dummynet for non-IP packets

Do not panic if we try to dummynet an Ethernet packet that's not IPv4 or
IPv6. Simply give it to dummynet.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 8e1c2334 23-Jun-2022 Kristof Provost <kp@FreeBSD.org>

pf: reduce the risk of src/dst mis-use

NULL out src/dst and check them rather than relying on 'af' to indicate
these variables are valid.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35573


# fd72bfa6 24-Jun-2022 Kristof Provost <kp@FreeBSD.org>

pf: ensure mbufs are long enough before we copy out IP(v6) headers

This isn't likely to be an issue on real hardware (as Ethernet has a
minimal packet length of 64 bytes), but can cause panics with short
packets on if_epair.
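
The check amounts to verifying the available length before copying a
fixed-size header out of the packet. A userspace sketch follows; struct hdr20
and copy_header() are hypothetical, and a flat buffer stands in for the mbuf
chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size header, the size of a minimal IPv4 header. */
struct hdr20 {
	uint8_t bytes[20];
};

/*
 * Copy a header found at offset 'off' out of a packet of 'pktlen' bytes.
 * Returns false, without touching 'out', if the packet is too short.
 */
static bool
copy_header(const uint8_t *pkt, size_t pktlen, size_t off, struct hdr20 *out)
{
	if (off > pktlen || pktlen - off < sizeof(*out))
		return (false);
	memcpy(out, pkt + off, sizeof(*out));
	return (true);
}
```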

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 488626e5 22-Jun-2022 Kristof Provost <kp@FreeBSD.org>

pf: copy out rather than m_pullup() in pf_test_eth_rule()

Don't change the mbuf chain layout. We've encountered alignment issues
in the tcp syncookie code on armv7, which are triggered by the
m_pullup() here.

Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35551


# 1f61367f 31-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: support matching on tags for Ethernet rules

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35362


# 81ef217a 03-Jun-2022 Kristof Provost <kp@FreeBSD.org>

pf: Improve route-to handling of pfsync'd states

When a state is pfsync’d to a different host it doesn’t get all of the
expected pointers, including the pointer to the struct pfi_kif / struct
ifnet rt_kif pointer. (I.e. the interface to route out on).

That in turn means that pf_route() ends up dropping the packet.

Use the rule's struct pfi_kif pointer so we can still route out of the
expected interface.

MFC after: 2 weeks
Sponsored by: Orange Business Services


# 6c92016a 31-May-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: fix a race against kif destruction in pf_test{,6}

ifp kif was dereferenced prior to taking the lock and
could have been nullified later.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision:


# bbec8e69 18-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: call dummynet directly from the ethernet code

Until recently dummynet in ethernet rules did not send packets directly
to dummynet but instead marked them and left the interactions with
dummynet to the layer 3 pf code.
This worked fine for incoming packets (where we process ethernet rules
before layer 3 rules), but not for outbound packets (where the order of
operations is the reverse).

Dummynet does support handling layer 2 traffic, so send the packets
directly to dummynet.

The main limitation now is that pf does not inspect layer 4 (i.e.
TCP/UDP) so we don't have protocol information or port numbers. Dummynet
potentially uses this to separate traffic flows, which will not work for
ethernet dummynet rules. However, pipes (i.e. adding latency or
restricting bandwidth) will work exactly as expected.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35257


# 009e8f0a 11-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: fix pf_rule_to_actions()

If we already had a pipe set in the actions struct we need to take care
to clear the flag if we're overwriting it with a queue.

This can happen if we've got Ethernet rules setting a dummynet pipe.
It does this indirectly, by adding the dummynet information to a pf_mtag
associated with the mbuf.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# a908f8f0 10-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: tag dummynet'd route-to packets with their real destination

If we delay route-to/dup-to/reply-to through dummynet we are eventually
returned to pf_test(). At that point we no longer have the context for
the route-to destination. We'd just skip the pf_test() and continue
processing. This means that route-to did not work as expected.

Extend pf_mtag to carry the route-to destination so we can apply it when
we re-enter pf_test().

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35159


# 37c45229 09-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: also apply dummynet to route-to/dup-to packets

If packets are processed by a route-to/dup-to/reply-to rule (i.e. they
pass through pf_route(6)) dummynet was not applied to them.
This is because pf_route(6) passes packets directly to ifp->if_output(),
so the dummynet functions were never called.

Factor out the dummynet code and call dummynet prior to
ifp->if_output(). This has a secondary benefit of reducing some code
duplication between the IPv4 and IPv6 paths.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35158


# 4d48dd68 06-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: don't reject dummynet-ed packets

If we pass a packet to dummynet we should indicate we've passed it (but
keep m0 == NULL). Otherwise we'll indicate to the calling layers that
the packet has been rejected.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 9501fc93 06-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: dummynet fix

If we don't have a pipe set we shouldn't feed packets into dummynet.
This could occur if we have a 'dnpipe (0, 100)' configuration, for
example. We do want to feed the packet to dummynet in the return
direction, but not in the forward direction. In that case
pf_pdesc_to_dnflow() should return false, rather than pass a pipe number
of 0 to dummynet.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# c530c80e 06-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: fix reverse direction dummynet

Due to a typo dnrpipe (i.e. the pipe for reverse direction traffic) was
never assigned, preventing it from working correctly.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 27407a6a 05-May-2022 Kristof Provost <kp@FreeBSD.org>

pf: clear PF_TAG_DUMMYNET for dummynet fast path

ip_dn_io_ptr() (i.e. dummynet_io()) can return the mbuf immediately (as
opposed to owning it and later passing it through dummynet_send(), which
returns it to pf_test()). In that case we must clear the PF_TAG_DUMMYNET
flag to ensure we don't skip any subsequent firewall passes.

This can happen if we process a packet in PFIL_IN, set PF_TAG_DUMMYNET
on it, pass it to ip_dn_io_ptr() but have it returned immediately. The
packet continues its normal path, eventually hitting
pf_test(dir=PFIL_OUT), where we'd skip when we're not supposed to.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 0abcc1d2 22-Apr-2022 Reid Linnemann <rlinnemann@netgate.com>

pf: Add per-rule timestamps for rule and eth_rule

Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule, or a state associated with a rule, is
matched.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34970


# 812839e5 12-Apr-2022 Kristof Provost <kp@FreeBSD.org>

pf: allow the use of tables in ethernet rules

Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34917


# d557e89a 08-Apr-2022 John Baldwin <jhb@FreeBSD.org>

pf: Workaround set but unused warning.

The RB_NEXT macro does not use its middle argument since commit
5fce408cc44c737267aaaf0dcecd3454ba9089cd in 2004 (which ironically
fixed an "unused parameter" warning by introducing this warning in all
consumers). RB_PREV has also copied this unfortunate behavior of an
unused argument.

This results in 'parent' not being used. To work around this, inline the
value of 'parent' as the second argument to RB_NEXT.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D34833


# 93f8c38c 25-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_config_lock

For now only protects rule creation/destruction, but will allow
gradually reducing the scope of rules lock when changing the
rules.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# a693d17b 16-Mar-2022 Gleb Smirnoff <glebius@FreeBSD.org>

pf: fix !INET or !INET6 builds

Fixes: pfr_match_addr8a42005d1e4


# 8a42005d 08-Mar-2022 Kristof Provost <kp@FreeBSD.org>

pf: support basic L3 filtering in the Ethernet rules

Allow filtering based on the source or destination IP/IPv6 address in
the Ethernet layer rules.

Reviewed by: pauamma_gundo.com (man), debdrup (man)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34482


# b590f17a 20-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: support masking mac addresses

When filtering Ethernet packets allow rules to specify a mac address
with a mask. This indicates which bits of the specified address are
significant. This allows users to do things like filter based on device
manufacturer.
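
A masked comparison of this kind can be sketched as follows. mac_match() and
MAC_LEN are hypothetical names and this is not the pf implementation; only the
bits set in the mask take part in the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN 6

/* Compare 'addr' with 'want', looking only at bits that are set in 'mask'. */
static bool
mac_match(const uint8_t addr[MAC_LEN], const uint8_t want[MAC_LEN],
    const uint8_t mask[MAC_LEN])
{
	for (int i = 0; i < MAC_LEN; i++) {
		if ((addr[i] & mask[i]) != (want[i] & mask[i]))
			return (false);
	}
	return (true);
}
```

With a mask of ff:ff:ff:00:00:00 only the first three octets (the
manufacturer prefix) are significant.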

Sponsored by: Rubicon Communications, LLC ("Netgate")


# c5131afe 01-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: add anchor support for ether rules

Support anchors in ether rules.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32482


# fb330f39 27-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: support dummynet on L2 rules

Allow packets to be tagged with dummynet information. Note that we do
not apply dummynet shaping on the L2 traffic, but instead mark it for
dummynet processing in the L3 code. This is the same approach as we take
for ALTQ.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32222


# 5c75dfdf 16-Feb-2021 Kristof Provost <kp@FreeBSD.org>

pf: SDTs for ether rule matching

Add static DTrace probe points to allow debugging of ether rule
matching.

Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31741


# 20c4899a 10-Feb-2021 Kristof Provost <kp@FreeBSD.org>

pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules

Avoid the overhead of acquiring a (read) RULES lock when processing the
Ethernet rules.
We can get away with that because when rules are modified they're staged
in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is
atomic, so that pf_test_eth_rule() always sees either the old rules, or
the new ruleset.

We need to take care not to delete the old ruleset until we're sure no
pf_test_eth_rule() is still running with those. We accomplish that by
using NET_EPOCH_CALL() to actually free the old rules.
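
The staging-and-swap idea can be sketched with a C11 atomic pointer. This is a
userspace illustration under stated assumptions: struct ruleset,
commit_rules() and reader_generation() are hypothetical, and the deferred free
that the kernel performs via NET_EPOCH_CALL() is left to the caller here.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ruleset; the generation number only identifies it. */
struct ruleset {
	int generation;
};

static struct ruleset initial_rules = { .generation = 1 };
static _Atomic(struct ruleset *) active_rules = &initial_rules;

/*
 * Publish a fully built (staged) ruleset in one atomic step.  The old
 * ruleset is returned; it must not be freed until no reader can still be
 * using it.
 */
static struct ruleset *
commit_rules(struct ruleset *staged)
{
	return (atomic_exchange(&active_rules, staged));
}

/* A reader loads the pointer once and sees one complete ruleset. */
static int
reader_generation(void)
{
	struct ruleset *rs = atomic_load(&active_rules);

	return (rs->generation);
}
```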

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31739


# e732e742 03-Feb-2021 Kristof Provost <kp@FreeBSD.org>

pf: Initial Ethernet level filtering code

This is the kernel side of stateless Ethernet-level filtering for pf.

The primary use case for this is to enable captive portal functionality
to allow/deny access by MAC address, rather than per IP address.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31737


# 751d4c7b 10-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: postpone clearing of struct pf_pdesc

Postpone zeroing out pd until after the PFI_IFLAG_SKIP/M_SKIP_FIREWALL
checks. We don't need it until then, and it saves us a few CPU cycles in
some cases.
This isn't expected to make a measurable performance change though.

Reviewed by: mjg, glebius
Pointed out by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33815


# ecc39359 10-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: remove PF_TAG_GENERATED

It's never set, so we can remove both the check for it and the
definition.

Reviewed by: mjg, glebius
Pointed out by: markj
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33814


# b6c8c7b9 24-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_bcmp_state_key

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33131


# 44775b16 24-Nov-2021 Mark Johnston <markj@FreeBSD.org>

netinet: Remove unneeded mb_unmapped_to_ext() calls

in_cksum_skip() now handles unmapped mbufs on platforms where they're
permitted.

Reviewed by: glebius, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33097


# 90c55481 22-Nov-2021 Kristof Provost <kp@FreeBSD.org>

pf: fix netpfil.common.dummynet:pf_nat test

This test failed if ipfw was loaded (as well as pf). pf used the same
tag as dummynet to indicate a packet had already gone through dummynet.
However, ipfw removes this tag, so pf didn't realise the packet had
already gone through dummynet.

Introduce a separate flag, in the existing pf mtag rather than re-using
the ipfw tag. There were no free flag bits, but PF_TAG_FRAGCACHE is no
longer used so its bit can be re-purposed.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33087


# 18d04cd2 22-Nov-2021 Kristof Provost <kp@FreeBSD.org>

pf: align IPv6 dummynet handling with IPv4

In e5c4987e3f we fixed issues with nat and dummynet, but only changed
the IPv4 code. Make the same change for IPv6 as well.

Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33086


# c36f9041 22-Nov-2021 Kristof Provost <kp@FreeBSD.org>

pf: remove unused variables

No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33085


# 756bb50b 16-Nov-2021 Mark Johnston <markj@FreeBSD.org>

sctp: Remove now-unneeded mb_unmapped_to_ext() calls

sctp_delayed_checksum() now handles unmapped mbufs, thanks to m_apply().

No functional change intended.

Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32942


# 8f3d786c 01-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: remove the flags argument from pf_unlink_state

All consumers call it with PF_ENTER_LOCKED.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# edf6dd82 01-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: fix use-after-free from pf_find_state_all

The state was returned without any locks or references held.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# e5c4987e 26-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: fix dummynet + NAT

Dummynet differs from ALTQ in that ALTQ schedules packets after they
leave pf. Dummynet schedules them after they leave pf, but then
re-injects them.
We currently deal with this by ensuring we don't re-schedule a packet we
get from dummynet, but this produces unexpected results when combined
with NAT, as dummynet processing is done after the NAT transformation.
In other words, the second time the packet is handed to pf it may have a
different source and destination address.

Simplify this by moving dummynet processing to after all other pf
processing, and not re-processing (but always passing) packets from
dummynet.

This fixes NAT of dummynet delayed packets, and also reduces processing
overhead (because we only do state/rule lookup for each dummynet packet
once, rather than twice).

MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32665


# c8ee75f2 10-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Use network epoch to protect local IPv4 addresses hash.

Modifications to the hash are already naturally locked by
in_control_sx. Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.

In most cases where the hash lookup is done the epoch is already entered.
Cover the few cases that still need to enter the epoch, which are mostly
initial configuration of tunnel interfaces and multicast addresses.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32584


# ab238f14 19-Oct-2021 Luiz Otavio O Souza <loos@FreeBSD.org>

pf: ensure we have the correct source/destination IP address in ICMP errors

When we route-to a packet that later turns out to not fit in the
outbound interface MTU we generate an ICMP error.
However, if we've already changed those (i.e. we've passed through a NAT
rule) we have to undo the transformation first.

Obtained from: pfSense
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32571


# 076b3a50 16-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: don't drop packets when redirection information comes from a state

For some traffic there might be no matching rule in the current ruleset,
for example when a state was imported via pfsync from a system with a
different ruleset checksum. In this case pf_route uses s->rt_addr for
routing target instead of r->rpool.cur but r->rpool is checked anyway,
resulting in dropped packets.

PR: 259183
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# bf863718 24-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: implement adaptive mode

Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D32134


# 63b3c1c7 15-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: support dummynet

Allow pf to use dummynet pipes and queues.

We re-use the currently unused IPFW_IS_DUMMYNET flag to allow dummynet
to tell us that a packet is being re-injected after being delayed. This
is needed to avoid endlessly looping the packet between pf and dummynet.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31904


# 8e496ea1 18-Sep-2021 Franco Fichtner <franco@opnsense.org>

pf: always log nat rule and do it pre-rewrite

See also https://github.com/opnsense/core/issues/5005

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D31504


# 9bdff593 10-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: fix NOINET6 builds

MFC after: 1 week
Sponsored by: Modirum MDPay


# 0a51d74c 01-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: fix synproxy to local

When we're synproxy-ing a connection that's going to us (as opposed to a
forwarded one) we wound up trying to send out the pf-generated tcp
packets through pf_intr(), which called ip(6)_output(). That doesn't
work all that well for packets that are destined for us, so in that case
we must call ip(6)_input() instead.

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31853


# a0c64a44 03-Sep-2021 Kristof Provost <kp@FreeBSD.org>

pf: ensure states passed to pf_free_state() are always unlinked

In pf_create_state() we can end up deleting the state immediately. This
can happen if we fail to map the relevant addresses or fail
normalization or fail to insert it into the state table.
If that happens we delete the state again with pf_free_state(). However,
this asserts that the state must be unlinked.

It's correct to simply set the state to PFTM_UNLINKED because we've not
yet linked it.

Submitted by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed by: scottl
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31752


# ce3ea450 20-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: import pf_set_protostate() from OpenBSD

To change a state's state (that term is overloaded in pf; here it means the
protocol state, like ESTABLISHED for tcp), don't do it directly, but go
through the newly introduced pf_set_protostate().

Reviewed by: kbowling
Obtained from: OpenBSD
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31729


# 3e875f95 17-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: assert dir state on pf_test{,6}

The intent is to line up various enums so that branching in the lines of:

idx = (dir == PF_IN ? PF_SK_WIRE : PF_SK_STACK);

is avoided.
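
A sketch of what lining up the enums buys, using hypothetical names and values
(the real pf constants are not reproduced here): once compile-time assertions
guarantee that the values agree, the direction can be used as the index
directly.

```c
/* Hypothetical values, chosen so that the two enums line up. */
enum ex_dir { EX_DIR_IN = 0, EX_DIR_OUT = 1 };
enum ex_key { EX_KEY_WIRE = 0, EX_KEY_STACK = 1 };

/* The build fails if somebody changes one enum without the other. */
_Static_assert((int)EX_DIR_IN == (int)EX_KEY_WIRE, "in must map to wire");
_Static_assert((int)EX_DIR_OUT == (int)EX_KEY_STACK, "out must map to stack");

/* Branching form, as quoted in the commit message. */
static int
key_index_branchy(enum ex_dir dir)
{
	return (dir == EX_DIR_IN ? EX_KEY_WIRE : EX_KEY_STACK);
}

/* Branch-free form, valid only because of the assertions above. */
static int
key_index(enum ex_dir dir)
{
	return ((int)dir);
}
```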

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 5091ca26 17-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: save on branching in the common case in pf_test

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 9ef8cd0b 22-Jul-2021 Kristof Provost <kp@FreeBSD.org>

vlan: deduplicate bpf_setpcp() and pf_ieee8021q_setpcp()

These two functions were identical, so move them into the common
vlan_set_pcp() function, exposed in the if_vlan_var.h header.

Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31275


# 87c010e6 24-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: batch critical section for several counters

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 02cf67cc 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch rule counters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d40d4b3e 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch kif counters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# fc4c42ce 23-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch pf_status.fcounters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# defdcdd5 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add hybrid 32- and 64-bit counters

Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significant
performance hit on some platforms and cannot be easily fixed.

Work around the problem by implementing a pf-specific variant.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 907257d6 19-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: embed a pointer to the lock in struct pf_kstate

This shaves calculation which in particular helps on arm.

Note using the & hack instead would still be more work.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 8e1864ed 20-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: syncookie support

Import OpenBSD's syncookie support for pf. This feature helps pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.
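
The mechanism can be illustrated with a toy sketch. Everything here is an
assumption made for illustration: struct tuple, toy_cookie() and
cookie_valid() are hypothetical, and the mixing function is not
cryptographically secure and is not the algorithm used by pf or OpenBSD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connection tuple. */
struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy mixing step; unsigned arithmetic wraps, which is intended here. */
static uint32_t
mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return (h ^ (h >> 15));
}

/* Derive the server's initial sequence number from the SYN and a secret. */
static uint32_t
toy_cookie(const struct tuple *t, uint32_t client_isn, uint32_t secret)
{
	uint32_t h = secret;

	h = mix(h, t->saddr);
	h = mix(h, t->daddr);
	h = mix(h, ((uint32_t)t->sport << 16) | t->dport);
	return (mix(h, client_isn));
}

/*
 * Validate the client's ACK without any stored state: its sequence number
 * is client_isn + 1 and it acknowledges cookie + 1.
 */
static bool
cookie_valid(const struct tuple *t, uint32_t seq, uint32_t ack,
    uint32_t secret)
{
	return (ack - 1 == toy_cookie(t, seq - 1, secret));
}
```

Because the cookie can be recomputed from the ACK alone, no state has to be
created when the SYN arrives.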

Reviewed by: kbowling
Obtained from: OpenBSD
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31138


# ee9c3d38 10-Jun-2021 Kristof Provost <kp@FreeBSD.org>

pf: factor out pf_synproxy()

MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31137


# 144ec071 19-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add a branch prediction to expire state check in pf_find_state

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 2c0d115b 13-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: locally originating connections with 'route-to' fail

Similar to the REPLY_TO shortcut (6d786845cf) we also can't shortcut
ROUTE_TO. If we do we will fail to apply transformations or update the
state, which can lead to premature termination of the connections.

PR: 257106
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31177


# 295f2d93 19-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Remove unused arguments from pf_send_tcp()

struct mbuf *replyto is not actually used (and only rarely provided).
The same applies to struct ifnet *ifp.

No functional change.

Reviewed by: mjg
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31136


# ef950daa 02-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: match keyword support

Support the 'match' keyword.
Note that support is limited to adding queuing information, so without
ALTQ support in the kernel setting match rules is pointless.

For the avoidance of doubt: this is NOT full support for the match
keyword as found in OpenBSD's pf. That could potentially be built on top
of this, but this commit is NOT that.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31115


# 19d6e29b 08-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_find_state_all_exists

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 211cddf9 06-Jul-2021 Kristof Provost <kp@FreeBSD.org>

pf: rename pf_state to pf_kstate

Indicate that this is a kernel-only structure, and make it easier to
distinguish from others used to communicate with userspace.

Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31096


# f649cff5 05-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: padalign global locks found in pf.c

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# dc1ab04e 02-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: allow table stats clearing and reading with ruleset rlock

Instead serialize against these operations with a dedicated lock.

Prior to the change, when pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln. With
the change there is no slowdown.

Reviewed by: kp (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d26ef5c7 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: make sure the dtrace probe has safe access to state

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 55cc305d 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: revert: Use counter(9) for pf_state byte/packet tracking

stats are not shared and consequently per-CPU counters only waste
memory.

No slowdown was measured when passing over 20M pps.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 803dfe3d 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: deduplicate V_pf_state_z handling with pfsync

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7f025db5 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: fix error-case leaks in pf_create_state

The hand-rolled clean up failed to free counters.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# ccb17a21 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: factor out state allocation into pf_alloc_state

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d09388d0 28-Jun-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_release_staten and use it in pf_unlink_state

Saves one atomic op.

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d38630f6 04-Jun-2021 Kristof Provost <kp@FreeBSD.org>

pf: store L4 headers in pf_pdesc

Rather than pointers to the headers store full copies. This brings us
slightly closer to what OpenBSD does, and also makes more sense than
storing pointers to stack variable copies of the headers.

Reviewed by: donner, scottl
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30719


# d0fdf2b2 12-May-2021 Kristof Provost <kp@FreeBSD.org>

pf: Track the original kif for floating states

Track (and display) the interface that created a state, even if it's a
floating state (and thus uses virtual interface 'all').

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30245


# 4f1f67e8 15-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: PFRULE_REFS should not be user-visible

Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.

MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29778


# 6d786845 07-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Do not short-circuit processing for REPLY_TO

When we find a state for packets that was created by a reply-to rule we
still need to process the packet. The state may require us to modify the
packet (e.g. in rdr or nat cases), which we won't do with the shortcut.

MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")


# f4c02909 02-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: Add static DTrace probe points

These two have proven to be useful during debugging. We may as well keep
them permanently.
Others will be added as their utility becomes clear.

Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29555


# 829a69db 01-Apr-2021 Kristof Provost <kp@FreeBSD.org>

pf: change pf_route so pf only runs when packets enter and leave the stack.

before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsible for short circuiting past the network stack.

just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.

the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.

another reason for this commit is to make handling of the changes
that route-to makes consistent with other changes that are made to
packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.

the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.

discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@

Obtained from: OpenBSD
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29554


# 92d1463e 25-Mar-2021 Ed Maste <emaste@FreeBSD.org>

pf: remove obsolete reference to ndis(4) in a comment


# b93a796b 23-Mar-2021 Mark Johnston <markj@FreeBSD.org>

pf: Handle unmapped mbufs when computing checksums

PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378


# cecfaf9b 10-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Fully remove interrupt events on vnet cleanup

swi_remove() removes the software interrupt handler but does not remove
the associated interrupt event.
This is visible when creating and removing a vnet jail in `procstat -t
12`.

We can remove it manually with intr_event_destroy().

PR: 254171
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29211


# 28dc2c95 10-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Simplify cleanup

We can now counter_u64_free(NULL), so remove the checks.

MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29190


# bb4a7d94 04-Mar-2021 Kristof Provost <kp@FreeBSD.org>

net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros

Introduce convenience macros to retrieve the DSCP, ECN or traffic class
bits from an IPv6 header.
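
To show which bits are involved, here is a small C sketch that extracts the same fields from the first 32-bit word of an IPv6 header, given in host byte order. The helper names are hypothetical and are not the macros this commit adds; in particular, the real macros may represent the DSCP value differently (for example, without shifting it down).

```c
#include <assert.h>
#include <stdint.h>

/*
 * First header word layout: version (4 bits), traffic class (8 bits),
 * flow label (20 bits). The argument must already be in host byte order.
 */
static uint8_t
sk_ipv6_traffic_class(uint32_t first_word)
{
    return ((uint8_t)((first_word >> 20) & 0xff));
}

/* DSCP is the upper six bits of the traffic class. */
static uint8_t
sk_ipv6_dscp(uint32_t first_word)
{
    return ((uint8_t)(sk_ipv6_traffic_class(first_word) >> 2));
}

/* ECN is the lower two bits of the traffic class. */
static uint8_t
sk_ipv6_ecn(uint32_t first_word)
{
    return ((uint8_t)(sk_ipv6_traffic_class(first_word) & 0x03));
}
```

With a header taken from the wire, the word would first be converted with `ntohl()` before calling these helpers.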

Use them where appropriate.

Reviewed by: ae (previous version), rscheff, tuexen, rgrimes
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29056


# f1932384 03-Mar-2021 Kristof Provost <kp@FreeBSD.org>

pf: Retrieve DSCP value from the IPv6 header

Teach pf to read the DSCP value from the IPv6 header so that we can
match on them.

Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29048


# 0c458752 28-Jan-2021 Yannis Planus <yannis.planus@alstomgroup.com>

pf: duplicate frames only once when using dup-to pf rule

When using DUP-TO rule, frames are duplicated 3 times on both output
interfaces and duplication interface. Add a flag to not duplicate a
duplicated frame.

Inspired by a patch from Miłosz Kaniewski milosz.kaniewski at gmail.com
https://lists.freebsd.org/pipermail/freebsd-pf/2015-November/007886.html

Reviewed by: kp@
Differential Revision: https://reviews.freebsd.org/D27018


# 5a3b9507 13-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Convert pfi_kkif to use counter_u64

Improve caching behaviour by using counter_u64 rather than variables
shared between cores.

The result of converting all counters to counter(9) (i.e. this full
patch series) is a significant improvement in throughput. As tested by
olivier@, on Intel Xeon E5-2697Av4 (16Cores, 32 threads) hardware with
Mellanox ConnectX-4 MCX416A-CCAT (100GBase-SR4) nics we see:

x FreeBSD 20201223: inet packets-per-second
+ FreeBSD 20201223 with pf patches: inet packets-per-second
[ministat distribution plot omitted]
    N           Min           Max        Median           Avg        Stddev
x   5       9216962       9526356       9343902     9371057.6     116720.36
+   5      19427190      19698400      19502922      19546509     109084.92
Difference at 95.0% confidence
        1.01755e+07 +/- 164756
        108.584% +/- 2.9359%
        (Student's t, pooled s = 112967)

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27763


# 320c1116 12-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pfi_kif into a user and kernel space structure

No functional change.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27761


# c3adacda 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Change pf_krule counters to use counter_u64

This improves the cache behaviour of pf and results in improved
throughput.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27760


# e86bddea 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_rule into kernel and user space versions

No functional change intended.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27758


# fbbf270e 13-Nov-2020 Kristof Provost <kp@FreeBSD.org>

pf: Use counter_u64 in pf_src_node

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27756


# 17ad7334 23-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_src_node into a kernel and userspace struct

Introduce a kernel version of struct pf_src_node (pf_ksrc_node).

This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27707


# 1c00efe9 23-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Use counter(9) for pf_state byte/packet tracking

This improves cache behaviour by not writing to the same variable from
multiple cores simultaneously.

pf_state is only used in the kernel, so can be safely modified.

Reviewed by: Lutz Donnerhacke, philip
MFC after: 1 week
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27661


# c3f69af0 20-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Fix unaligned checksum updates

The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).
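
The alignment issue can be illustrated with a standalone C sketch of the RFC 1624 incremental update. This is not the imported OpenBSD code; the function names are made up for the example. The point it shows is that a single rewritten byte contributes to the high or the low half of a 16-bit checksum word depending on whether its offset is even or odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over big-endian 16-bit words. */
static uint16_t
cksum_full(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_fixup16(uint16_t cksum, uint16_t oldv, uint16_t newv)
{
    uint32_t sum = (uint16_t)~cksum;

    sum += (uint16_t)~oldv;
    sum += newv;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/*
 * Replace one byte at offset off (relative to the start of the summed
 * data) and patch the checksum. An even offset is the high byte of its
 * 16-bit word, an odd offset the low byte.
 */
static uint16_t
cksum_fixup8(uint16_t cksum, uint8_t *buf, size_t off, uint8_t newval)
{
    uint16_t oldv = buf[off], newv = newval;

    if ((off & 1) == 0) {
        oldv = (uint16_t)(oldv << 8);
        newv = (uint16_t)(newv << 8);
    }
    buf[off] = newval;
    return (cksum_fixup16(cksum, oldv, newv));
}
```

Treating every byte as if it sat at an even offset would patch the wrong half of the word and leave the packet with an invalid checksum.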

Import the OpenBSD fix for this issue.

PR: 240416
Obtained from: OpenBSD
MFC after: 1 week
Reviewed by: tuexen (previous version)
Differential Revision: https://reviews.freebsd.org/D27696


# 3420068a 12-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Allow net.pf.request_maxcount to be set from loader.conf

Mark request_maxcount as RWTUN so we can set it both at runtime and from
loader.conf. This avoids users getting caught out by the change from tunable
to run time configuration.

Suggested by: Franco Fichtner
MFC after: 3 days


# 9ee99cec 11-Dec-2020 Brooks Davis <brooks@FreeBSD.org>

hme(4): Remove as previously announced

The hme (Happy Meal Ethernet) driver was the onboard NIC in most
supported sparc64 platforms. A few PCI NICs do exist, but we have seen
no evidence of use on non-sparc systems.

Reviewed by: imp, emaste, bcr
Sponsored by: DARPA


# 71c9acef 20-Nov-2020 Kristof Provost <kp@FreeBSD.org>

pf: Fix incorrect assertion

We never set PFRULE_RULESRCTRACK when calling pf_insert_src_node(). We do set
PFRULE_SRCTRACK, so update the assertion to match.

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27254


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 95033af9 18-Jun-2020 Mark Johnston <markj@FreeBSD.org>

Add the SCTP_SUPPORT kernel option.

This is in preparation for enabling a loadable SCTP stack. Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 643ce948 15-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert pf rtable checks to the new routing KPI.

Switch uRPF to use specific fib(9)-provided uRPF.
Switch MSS calculation to the latest fib(9) kpi.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D24386


# 10b49b23 21-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

Mark all nodes in pf, pfsync and carp as MPSAFE.

Reviewed by: kp
Approved by: kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23634


# cca2ea64 13-Dec-2019 Kristof Provost <kp@FreeBSD.org>

pf: Make request_maxcount runtime adjustable

There's no reason for this to be a tunable. It's perfectly safe to
change this at runtime.

Reviewed by: Lutz Donnerhacke
Differential Revision: https://reviews.freebsd.org/D22737


# 492f3a31 24-Nov-2019 Kristof Provost <kp@FreeBSD.org>

pf: Add endline to all DPFPRINTF()

DPFPRINTF() doesn't automatically add an endline, so be consistent and
always add it.


# a0d571cb 17-Oct-2019 Kristof Provost <kp@FreeBSD.org>

pf: Must be in NET_EPOCH to call icmp_error

icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH.
Failure to hold it leads to the following panic (with INVARIANTS):

panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742
cpuid = 2
time = 1571233273
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920
vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980
panic() at panic+0x43/frame 0xfffffe00e09779e0
icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0
icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10
pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50
ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0

Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(),
but ip_output() will soon be converted to a function that requires epoch, so
entering NET_EPOCH directly from pf_intr() makes more sense.

Discussed with: glebius@


# bff630d1 12-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Fix the build after r353458.

MFC with: r353458
Sponsored by: The FreeBSD Foundation


# 6cc9ab86 12-Oct-2019 Mark Johnston <markj@FreeBSD.org>

Add a missing include of opt_sctp.h.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# f287767d 29-Jul-2019 Kristof Provost <kp@FreeBSD.org>

pf: Remove partial RFC2675 support

Remove our (very partial) support for RFC2675 Jumbograms. They're not
used, not actually supported and not a good idea.

Reviewed by: thj@
Differential Revision: https://reviews.freebsd.org/D21086


# f89d2072 17-Jun-2019 Xin LI <delphij@FreeBSD.org>

Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193


# d086d413 25-May-2019 Li-Wen Hsu <lwhsu@FreeBSD.org>

Remove an unneeded indentation introduced in r223637 to silence gcc warning

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 6c1c6ae5 04-Apr-2019 Rodney W. Grimes <rgrimes@FreeBSD.org>

Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code

There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros. Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by: karels, kristof
Approved by: bde (mentor)
MFC after: 3 weeks
Sponsored by: John Gilmore
Differential Revision: https://reviews.freebsd.org/D19317


# a8a16c71 03-Apr-2019 Conrad Meyer <cem@FreeBSD.org>

Replace read_random(9) with more appropriate arc4rand(9) KPIs

Reviewed by: ae, delphij
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19760


# 64af73aa 21-Mar-2019 Kristof Provost <kp@FreeBSD.org>

pf: Ensure that IP addresses match in ICMP error packets

States in pf(4) let ICMP and ICMP6 packets pass if they have a
packet in their payload that matches an existing connection. It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet. Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.

Reported by: Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv
Obtained from: OpenBSD
Security: CVE-2019-5598


# 1830dae3 14-Mar-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Make second argument of ip_divert(), that specifies packet direction a bool.
This allows pf(4) to avoid including ipfw(4) private files.


# 22c58991 24-Feb-2019 Kristof Provost <kp@FreeBSD.org>

pf: Small performance tweak

Because fetching a counter is a rather expensive function we should use
counter_u64_fetch() in pf_state_expires() only when necessary. A "rdr
pass" rule should not cause more effort than separate "rdr" and "pass"
rules. For rules with adaptive timeout values the call of
counter_u64_fetch() should be accepted, but otherwise not.

From the man page:
The adaptive timeout values can be defined both globally and for
each rule. When used on a per-rule basis, the values relate to the
number of states created by the rule, otherwise to the total number
of states.

This handling of adaptive timeouts is done in pf_state_expires(). The
calculation needs three values: start, end and states.

1. Normal rules "pass .." without adaptive setting meaning "start = 0"
runs in the else-section and therefore takes "start" and "end" from
the global default settings and sets "states" to pf_status.states
(= total number of states).

2. Special rules like
"pass .. keep state (adaptive.start 500 adaptive.end 1000)"
have start != 0, run in the if-section and take "start" and "end"
from the rule and set "states" to the number of states created by
their rule using counter_u64_fetch().

That's all ok, but there is a third case without special handling in the
above code snippet:

3. All "rdr/nat pass .." statements use together the pf_default_rule.
Therefore we have "start != 0" in this case and we run the
if-section but we better should run the else-section in this case and
do not fetch the counter of the pf_default_rule but take the total
number of states.

Submitted by: Andreas Longwitz <longwitz@incore.de>
MFC after: 2 weeks


# 8f2ac656 10-Feb-2019 Patrick Kelsey <pkelsey@FreeBSD.org>

Reduce the time it takes the kernel to install a new PF config containing a large number of queues

In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-based.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by: kp
MFC after: 1 week
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D19131


# 336683f2 12-Dec-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix endless loop on NAT exhaustion with sticky-address

When we try to find a source port in pf_get_sport() it's possible that
all available source ports will be in use. In that case we call
pf_map_addr() to try to find a new source IP to try from. If there are
no more available source IPs pf_map_addr() will return 1 and we stop
trying.

However, if sticky-address is set we'll always return the same IP
address, even if we've already tried that one.
We need to check the supplied address, because if that's the one we'd
set it means pf_get_sport() has already tried it, and we should error
out rather than keep trying.

PR: 233867
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18483


# 5b551954 11-Dec-2018 Kristof Provost <kp@FreeBSD.org>

pf: Prevent integer overflow in PF when calculating the adaptive timeout.

Mainly states of established TCP connections would be affected resulting
in immediate state removal once the number of states is bigger than
adaptive.start. Disabling adaptive timeouts is a workaround to avoid this bug.
Issue found and initial diff by Mathieu Blanc (mathieu.blanc at cea dot fr)
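
A standalone C sketch of the kind of overflow described, assuming the scaling has the form timeout * (end - states) / (end - start). The function names and the numbers used in the usage note are illustrative and not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: the product can wrap before the division. */
static uint32_t
scale32(uint32_t timeout, uint32_t start, uint32_t end, uint32_t states)
{
    assert(start <= states && states < end);
    return (timeout * (end - states) / (end - start));
}

/* Widening the product to 64 bits keeps the intermediate value exact. */
static uint32_t
scale64(uint32_t timeout, uint32_t start, uint32_t end, uint32_t states)
{
    assert(start <= states && states < end);
    return ((uint32_t)((uint64_t)timeout * (end - states) / (end - start)));
}
```

For example, with a timeout of 86400 seconds, start 60000, end 120000 and 60001 states, the 32-bit product exceeds 2^32 and wraps, so `scale32()` returns a far shorter timeout than `scale64()` even though the state count has barely passed the start threshold.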

Reported by: Andreas Longwitz <longwitz AT incore.de>
Obtained from: OpenBSD
MFC after: 2 weeks


# 5f6cf24e 02-Nov-2018 Kristof Provost <kp@FreeBSD.org>

pfsync: Make pfsync callbacks per-vnet

The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D17499


# 13d640d3 23-Oct-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix copy/paste error in IPv6 address rewriting

We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.

CID: 1009561
MFC after: 3 weeks


# 1563a27e 20-Oct-2018 Kristof Provost <kp@FreeBSD.org>

pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.
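
A minimal C sketch of such a flag test follows. The flag constants mirror the standard TCP header bit values but are given local names, and `handshake_ack_ok` is a hypothetical helper; the real check also validates sequence and acknowledgement numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the TCP header flag bits used by this sketch. */
#define SK_TH_FIN 0x01
#define SK_TH_RST 0x04
#define SK_TH_ACK 0x10

/* Accept only an ACK that does not also carry RST or FIN. */
static bool
handshake_ack_ok(uint8_t flags)
{
    return ((flags & (SK_TH_ACK | SK_TH_RST | SK_TH_FIN)) == SK_TH_ACK);
}
```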

PR: 197484
Obtained from: OpenBSD
MFC after: 2 weeks


# 032d3aaa 15-Sep-2018 John-Mark Gurney <jmg@FreeBSD.org>

Significantly improve pf purge cpu usage by only taking locks
when there is work to do. This reduces CPU consumption to one
third on systems. This will help keep the thread CPU usage under
control now that the default hash size has increased.

Reviewed by: kp
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17097


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 32ece669 14-Jul-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix synproxy

Synproxy was accidentally broken by r335569. The 'return (action)' must be
executed for every non-PF_PASS result, but the error packet (TCP RST or ICMP
error) should only be sent if the packet was dropped (i.e. PF_DROP) and the
return flag is set.
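
The two conditions can be separated as in this C sketch. The enum values and the `sk_verdict` structure are hypothetical and only model the decision, not the surrounding pf code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action codes standing in for PF_PASS, PF_DROP and others. */
enum sk_action { SK_PASS, SK_DROP, SK_SYNPROXY_DROP };

struct sk_verdict {
    bool stop;          /* return the action to the caller now */
    bool send_error;    /* emit a TCP RST or ICMP error first */
};

static struct sk_verdict
sk_decide(enum sk_action action, bool return_flag)
{
    struct sk_verdict v;

    v.stop = (action != SK_PASS);
    v.send_error = (action == SK_DROP && return_flag);
    return (v);
}
```

A synproxy result therefore still stops processing, but never triggers an error packet, even when the rule has the return flag set.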

PR: 229477
Submitted by: Andre Albsmeier <mail AT fbsd.e4m.org>
MFC after: 1 week


# 150182e3 22-Jun-2018 Kristof Provost <kp@FreeBSD.org>

pf: Support "return" statements in passing rules when they fail.

Normally pf rules are expected to do one of two things: pass the traffic or
block it. Blocking can be silent - "drop", or loud - "return", "return-rst",
"return-icmp". Yet there is a 3rd category of traffic passing through pf:
Packets matching a "pass" rule but when applying the rule fails. This happens
when redirection table is empty or when src node or state creation fails. Such
rules always fail silently without notifying the sender.

Allow users to configure this behaviour too, so that pf returns an error packet
in these cases.

PR: 226850
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
MFC after: 1 week
Sponsored by: InnoGames GmbH


# 0b799353 09-Jun-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix deadlock with route-to

If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR: 228782
MFC after: 1 week


# 455969d3 30-May-2018 Kristof Provost <kp@FreeBSD.org>

pf: Replace rwlock on PF_RULES_LOCK with rmlock

Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by: farrokhi@
Differential Revision: https://reviews.freebsd.org/D15502


# 2695c9c1 02-May-2018 Sean Bruno <sbruno@FreeBSD.org>

Retire ixgb(4)

This driver was for an early and uncommon legacy PCI 10GbE for a single
ASIC, Intel 82597EX. Intel quickly shifted to the long lived ixgbe family.

Submitted by: kbowling
Reviewed by: brooks imp jeffrey.e.pieper@intel.com
Relnotes: yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15234


# c41420d5 11-Apr-2018 Kristof Provost <kp@FreeBSD.org>

pf: limit ioctl to a reasonable and tuneable number of elements

pf ioctls frequently take a variable number of elements as argument. This can
potentially allow users to request very large allocations. These will fail,
but even a failing M_NOWAIT might tie up resources and result in concurrent
M_WAITOK allocations entering vm_wait and inducing reclamation of caches.

Limit these ioctls to what should be a reasonable value, but allow users to
tune it should they need to.
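
In userland C terms the guard looks roughly like the sketch below. The function name, the error codes, and the rejection of a zero count are assumptions for the example; the kernel code uses its own allocator and limit variable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Validate a caller-supplied element count against a limit before
 * attempting any allocation. Returns 0 on success or an errno value.
 */
static int
alloc_elements(size_t count, size_t elem_size, size_t maxcount, void **out)
{
    assert(out != NULL);
    *out = NULL;
    if (count == 0 || count > maxcount)
        return (EINVAL);    /* reject before touching the allocator */
    *out = calloc(count, elem_size);
    return (*out != NULL ? 0 : ENOMEM);
}
```

The important property is that an oversized request is refused without the allocator ever being asked for the memory.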

Differential Revision: https://reviews.freebsd.org/D15018


# effaab88 23-Mar-2018 Kristof Provost <kp@FreeBSD.org>

netpfil: Introduce PFIL_FWD flag

Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.

Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.

Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715


# bf56a3fe 25-Feb-2018 Kristof Provost <kp@FreeBSD.org>

pf: Cope with overly large net.pf.states_hashsize

If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.
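
The fallback pattern, sketched in userland C with calloc() standing in for the kernel allocator. The function name and parameters are hypothetical, and unlike an M_WAITOK allocation the second attempt here can still fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Try the user-configured table size first; if that cannot be
 * satisfied, fall back to a known-good default size.
 * *actual receives the number of entries really allocated.
 */
static void *
alloc_hash_table(size_t requested, size_t fallback, size_t entry_size,
    size_t *actual)
{
    void *tbl;

    assert(actual != NULL);
    tbl = calloc(requested, entry_size);    /* non-blocking attempt */
    if (tbl != NULL) {
        *actual = requested;
        return (tbl);
    }
    tbl = calloc(fallback, entry_size);     /* default size */
    *actual = (tbl != NULL) ? fallback : 0;
    return (tbl);
}
```

Callers must use the returned size rather than the configured one when computing hash masks.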

PR: 209475
Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D14367


# c201b564 01-Feb-2018 Kristof Provost <kp@FreeBSD.org>

pf: Avoid warning without INVARIANTS

When INVARIANTS is not set the 'last' variable is not used, which can generate
compiler warnings.
If this invariant is ever violated it'd result in a KASSERT failure in
refcount_release(), so this one is not strictly required.


# 6701c432 23-Jan-2018 Kristof Provost <kp@FreeBSD.org>

pf: States have at least two references

pf_unlink_state() releases a reference to the state without checking if
this is the last reference. It can't be, because pf_state_insert()
initialises it to two. KASSERT() that this is always the case.

CID: 1347140


# 5d0020d6 31-Dec-2017 Kristof Provost <kp@FreeBSD.org>

pf: Clean all fragments on shutdown

When pf is unloaded, or a vnet jail using pf is stopped we need to
ensure we clean up all fragments, not just the expired ones.


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.


# b7ae4355 09-Jul-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix vnet purging

pf_purge_thread() breaks up the work of iterating all states (in
pf_purge_expired_states()) and tracks progress in the idx variable.

If multiple vnets exist this results in pf_purge_thread() only calling
pf_purge_expired_states() for part of the states (the first part of the
first vnet, second part of the second vnet and so on).
Combined with the mark-and-sweep approach to cleaning up old rules (in
V_pf_unlinked_rules) that resulted in pf freeing rules that were still
referenced by states. This in turn caused panics when pf_state_expires()
encounters that state and attempts to access the rule.

We need to track the progress per vnet, not globally, so idx is moved
into a per-vnet V_pf_purge_idx.
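
A toy C model of why the index has to live with each instance. `purge_ctx`, `purge_slice` and the fixed bucket count are invented for the illustration; the real code walks the state hash of each vnet.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8  /* example hash size */

/* Hypothetical per-instance purge state (one per vnet in the real code). */
struct purge_ctx {
    size_t idx;                         /* next bucket to scan */
    unsigned char scanned[NBUCKETS];    /* buckets visited so far */
};

/* Scan up to maxcheck buckets, resuming where this instance left off. */
static void
purge_slice(struct purge_ctx *ctx, size_t maxcheck)
{
    assert(ctx != NULL);
    while (maxcheck-- > 0) {
        ctx->scanned[ctx->idx] = 1;
        ctx->idx = (ctx->idx + 1) % NBUCKETS;
    }
}

/* Return 1 when every bucket of this instance has been scanned. */
static int
fully_scanned(const struct purge_ctx *ctx)
{
    for (size_t i = 0; i < NBUCKETS; i++)
        if (!ctx->scanned[i])
            return (0);
    return (1);
}
```

With one shared index, alternating between two instances would give each of them only every other slice; with an index per instance, each one completes a full pass after the same number of its own calls.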

PR: 219251
Sponsored by: Hackathon Essen 2017


# 3601d251 31-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix leak of pf_state_keys

If we hit the state limit we returned from pf_create_state() without cleaning
up.

PR: 217997
Submitted by: Max <maximos@als.nnov.ru>
MFC after: 1 week


# 2f8fb3a8 22-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix possible shutdown race

Prevent possible races in the pf_unload() / pf_purge_thread() shutdown
code. Lock the pf_purge_thread() with the new pf_end_lock to prevent
these races.

Use a shared/exclusive lock, as we need to also acquire another sx lock
(VNET_LIST_RLOCK). It's fine for both pf_purge_thread() and pf_unload()
to sleep.

Pointed out by: eri, glebius, jhb
Differential Revision: https://reviews.freebsd.org/D10026


# 08ef4ddb 18-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix rule evaluation after inet6 route-to

In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).

This led pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.

PR: 217883
Submitted by: Kajetan Staszkiewicz
MFC after: 1 week
Sponsored by: InnoGames GmbH


# 2a57d24b 11-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix incorrect rw_sleep() in pf_unload()

When we unload we don't hold the pf_rules_lock, so we cannot call rw_sleep()
with it, because it would release a lock we do not hold. There's no need for the
lock either, so we can just tsleep().

While here also make the same change in pf_purge_thread(), because it explicitly
takes the lock before rw_sleep() and then immediately releases it afterwards.


# f6182013 11-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Do not lose the VNET lock when ending the purge thread

When the pf_purge_thread() exits it must make sure to release the
VNET_LIST_RLOCK it still holds.
kproc_exit() does not return.


# 164aa3ce 30-Jan-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Fix indentation in pf_purge_thread(). No functional change.


# a5c1a50a 28-Jan-2017 Luiz Otavio O Souza <loos@FreeBSD.org>

Do not run the pf purge thread while the VNET variables are not
initialized, this can cause a divide by zero (if the VNET initialization
takes too long to complete).

Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)


# 1f495578 13-Oct-2016 Kristof Provost <kp@FreeBSD.org>

pf: port extended DSCP support from OpenBSD

Ignore the ECN bits on 'tos' and 'set-tos' and allow the use of
DSCP names instead of having to embed their TOS equivalents
as plain numbers.

Obtained from: OpenBSD
Sponsored by: OPNsense
Differential Revision: https://reviews.freebsd.org/D8165


# 813196a1 04-Oct-2016 Kristof Provost <kp@FreeBSD.org>

pf: remove fastroute tag

The tag fastroute came from ipf and was removed in OpenBSD in 2011. The code
allows skipping the in pfil hooks and completely removes the out pfil invocation,
albeit looking up a route that the IP stack will likely find on its own.
The code between IPv4 and IPv6 is also inconsistent and marked as "XXX"
for years.

Submitted by: Franco Fichtner <franco@opnsense.org>
Differential Revision: https://reviews.freebsd.org/D8058


# 0df377cb 14-Aug-2016 Kristof Provost <kp@FreeBSD.org>

pf: Add missing byte-order swap to pf_match_addr_range

Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.

PR: 211796
Obtained from: OpenBSD (sthen)
MFC after: 3 days
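
A minimal sketch of the kind of comparison involved, assuming IPv4
addresses held in network byte order. toy_in_range() is a hypothetical
helper, not the pf function itself. A numeric range test is only
meaningful on host-order values, so each operand goes through ntohl()
first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/*
 * Hypothetical helper: return true if addr lies in [lo, hi].
 * All three arguments are in network byte order, as they would be
 * when taken straight from a packet header.
 */
static bool
toy_in_range(uint32_t lo, uint32_t hi, uint32_t addr)
{
	/* Swap to host order before comparing numerically. */
	uint32_t l = ntohl(lo), h = ntohl(hi), a = ntohl(addr);

	assert(l <= h);		/* caller must pass an ordered range */
	return (a >= l && a <= h);
}
```

On a big-endian host ntohl() is a no-op, which is why a missing swap of
this kind only shows up on little-endian systems.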


# a0429b54 23-Jun-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Update pf(4) and pflog(4) to survive basic VNET testing, which includes
proper virtualisation, teardown, avoiding use-after-free, race conditions,
no longer creating a thread per VNET (which could easily be a couple of
thousand threads), gracefully ignoring global events (e.g., eventhandlers)
on teardown, clearing various globally cached pointers and checking
them before use.

Reviewed by: kp
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6924


# 3e248e0f 17-Jun-2016 Kristof Provost <kp@FreeBSD.org>

pf: Filter on and set vlan PCP values

Adopt the OpenBSD syntax for setting and filtering on VLAN PCP values. This
introduces two new keywords: 'set prio' to set the PCP value, and 'prio' to
filter on it.

Reviewed by: allanjude, araujo
Approved by: re (gjb)
Obtained from: OpenBSD (mostly)
Differential Revision: https://reviews.freebsd.org/D6786


# b599e8dc 23-May-2016 Kristof Provost <kp@FreeBSD.org>

pf: Fix more ICMP mistranslation

In the default case fix the substitution of the destination address.

PR: 201519
Submitted by: Max <maximos@als.nnov.ru>
MFC after: 1 week


# c0c82715 22-May-2016 Kristof Provost <kp@FreeBSD.org>

pf: Fix ICMP translation

Fix ICMP source address rewriting in rdr scenarios.

PR: 201519
Submitted by: Max <maximos@als.nnov.ru>
MFC after: 1 week


# a4641f4e 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net*: minor spelling fixes.

No functional change.


# 0d8c9331 16-Mar-2016 Kristof Provost <kp@FreeBSD.org>

pf: Improve forwarding detection

When we guess the nature of the outbound packet (output vs. forwarding) we need
to take bridges into account. When bridging the input interface does not match
the output interface, but we're not forwarding. Similarly, it's possible for the
interface to actually be the bridge interface itself (and not a member interface).

PR: 202351
MFC after: 2 weeks


# c90369f8 19-Feb-2016 Kristof Provost <kp@FreeBSD.org>

in pf_print_state_parts, do not use skw->proto to print the protocol but our
local copy proto that we very carefully set beforehand. skw being NULL is
perfectly valid there.

Obtained from: OpenBSD (henning)


# 460a5b50 07-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert pf(4) to the new routing API.

Differential Revision: https://reviews.freebsd.org/D4763


# 637670e7 15-Nov-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Bring back the ability of passing cached route via nd6_output_ifp().


# 5a505b31 07-Nov-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix broken rule skip calculation

r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.


# 679e3c77 29-Oct-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix IPv6 checksums with route-to.

When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.

Submitted by: Luoqi Chen
MFC after: 1 week


# 78546dad 27-Oct-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Eliminate last rtalloc_ign() caller.

Differential Revision: https://reviews.freebsd.org/D3927


# c110fc49 14-Oct-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix TSO issues

In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR: 154428, 193579, 198868
Reviewed by: sbruno
MFC after: 1 week
Relnotes: yes
Sponsored by: RootBSD
Differential Revision: https://reviews.freebsd.org/D3779
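
For reference, a pseudo-header checksum covers only the addresses,
protocol and length. The sketch below computes that ones' complement
sum for IPv4 in user space; toy_fold() and toy_pseudo_sum() are
hypothetical helpers taking host-order values, not functions from pf or
the IP stack.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t
toy_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	assert(sum <= 0xffff);
	return ((uint16_t)sum);
}

/*
 * Pseudo-header sum over source, destination, protocol and length.
 * Arguments are in host byte order; no payload bytes are touched.
 */
static uint16_t
toy_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += len;
	return (toy_fold(sum));
}
```

Because payload bytes do not enter this sum, a change that leaves the
addresses, protocol and length alone leaves the pseudo-header sum
unchanged.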


# 1fe201c3 16-Sep-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify the way of attaching IPv6 link-layer header.

Problem description:
How do we currently perform layer 2 resolution and header imposition:

For IPv4 we have the following chain:
ip_output() -> (ether|atm|whatever)_output() -> arpresolve()

Lookup is done in proper place (link-layer output routine) and it is possible
to provide cached lle data.

For IPv6 situation is more complex:
ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() ->
nd6_storelladdr()

We have ip6_output() which calls nd6_output() instead of link output routine.
nd6_output() does the following:
* checks if lle exists, creates it if needed (similar to arpresolve())
* performs lle state transitions (similar to arpresolve())
* calls nd6_output_ifp() which pushes packets to link output routine along
with running SeND/MAC hooks regardless of lle state
(e.g. works as run-hooks placeholder).

After that, iface output routine like ether_output() calls nd6_storelladdr()
which performs lle lookup once again.

As a result, we perform lookup twice for each outgoing packet for most types
of interfaces. We also need to maintain runtime-checked table of 'nd6-free'
interfaces (see nd6_need_cache()).

Fix this behavior by eliminating first ND lookup. To be more specific:
* make all nd6_output() consumers use nd6_output_ifp() instead
* rename nd6_output[_slow]() to nd6_resolve_[slow]()
* convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics,
e.g. copy L2 address to buffer instead of pushing packet towards lower
layers
* Make all nd6_storelladdr() users use nd6_resolve()
* eliminate nd6_storelladdr()

The resulting callchain is the following:
ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()

Error handling:
Currently sending packet to non-existing la results in ip6_<output|forward>
-> nd6_output() -> nd6_output_lle() which returns 0.
In new scenario packet is propagated to <ether|whatever>_output() ->
nd6_resolve() which will return EWOULDBLOCK, and that result
will be converted to 0.

(And EWOULDBLOCK is actually used by IB/TOE code).

Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D1469


# 2f6c345a 01-Sep-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set

If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif is the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR: 202351
Reviewed by: eri
Differential Revision: https://reviews.freebsd.org/D3534
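
The decision described above can be sketched as a small predicate. This
is a toy model under stated assumptions: struct toy_ifnet, its bridge
pointer and toy_is_forwarding() are hypothetical, and the checks are a
simplification of what the commit messages describe, not the code in
pf_test6().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ifnet {
	/* Bridge this interface is a member of, or NULL if none. */
	const struct toy_ifnet *bridge;
};

enum toy_dir { TOY_IN, TOY_OUT };

/* Guess whether an outbound packet is being forwarded. */
static bool
toy_is_forwarding(enum toy_dir dir, const struct toy_ifnet *rcvif,
    const struct toy_ifnet *ifp)
{
	assert(ifp != NULL);
	if (dir != TOY_OUT)
		return (false);	/* inbound packets are never forwarded */
	if (rcvif == NULL)
		return (false);	/* locally generated, never received */
	if (rcvif == ifp)
		return (false);	/* same interface: not forwarding */
	if (rcvif->bridge == ifp)
		return (false);	/* rcvif is a member of ifp: bridging */
	return (true);
}
```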


# 299c819a 28-Jul-2015 Renato Botelho <garga@FreeBSD.org>

Simplify logic added in r285945 as suggested by glebius

Approved by: glebius
MFC after: 3 days
Sponsored by: Netgate


# b1b98a2d 28-Jul-2015 Renato Botelho <garga@FreeBSD.org>

Respect pf rule log option before logging dropped packets with IP options or
dangerous v6 headers

Reviewed by: gnn, eri
Approved by: gnn
Obtained from: pfSense
MFC after: 3 days
Sponsored by: Netgate
Differential Revision: https://reviews.freebsd.org/D3222


# 3e437fd2 28-Jul-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by: dhartmei


# a5b789f6 24-Jun-2015 Ermal Luçi <eri@FreeBSD.org>

ALTQ FAIRQ discipline import from DragonFly

Differential Revision: https://reviews.freebsd.org/D2847
Reviewed by: glebius, wblock(manpage)
Approved by: gnn(mentor)
Obtained from: pfSense
Sponsored by: Netgate


# 3dd01a88 19-May-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Use MTX_SYSINIT() instead of mtx_init() to separate mutex initialization
from associated structures initialization. The mutexes are global, while
the structures are per-vnet.

Submitted by: Nikos Vassiliadis <nvass gmx.com>


# 78680d05 18-May-2015 Gleb Smirnoff <glebius@FreeBSD.org>

A miss from r283061: don't dereference NULL if pf_get_mtag() fails.

PR: 200222
Submitted by: Franco Fichtner <franco opnsense.org>


# b7f69c50 18-May-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Don't dereference NULL if pf_get_mtag() fails.

PR: 200222
Submitted by: Franco Fichtner <franco opnsense.org>


# 3d1bbe5f 14-Apr-2015 Kristof Provost <kp@FreeBSD.org>

pf: Fix forwarding detection

If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which led us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision: https://reviews.freebsd.org/D2286
Approved by: gnn(mentor)


# 3e8c6d74 16-Mar-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Always lock the hash row of a source node when updating its 'states' counter.

PR: 182401
Sponsored by: Nginx, Inc.


# 998fbd14 12-Mar-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

Submitted by: Kristof Provost
MFC after: 1 week


# 39a58828 16-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Submitted by: Kristof Provost
Differential Revision: D1767


# f5ceb22b 15-Feb-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling. Thanks to Kristof for
working on that.

Submitted by: Kristof Provost
Tested by: peter
Differential Revision: D1765


# efc6c51f 21-Jan-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Back out r276841, r276756, r276747, r276746. The change in r276747 is very
very questionable, since it makes vimages more dependent on each other. But
the reason for the backout is that it screwed up shutting down the pf purge
threads, and now kernel immediately panics on pf module unload. Although module
unloading isn't an advertised feature of pf, it is very important for
development process.

I'd like to not backout r276746, since in general it is good. But since it
has introduced numerous build breakages, that later were addressed in
r276841, r276756, r276747, I need to back it out as well. Better replay it
in clean fashion from scratch.


# 8d665c6b 06-Jan-2015 Craig Rodrigues <rodrigc@FreeBSD.org>

Reapply previous patch to fix build.

PR: 194515


# 4de985af 06-Jan-2015 Craig Rodrigues <rodrigc@FreeBSD.org>

Instead of creating a purge thread for every vnet, create
a single purge thread and clean up all vnets from this thread.

PR: 194515
Differential Revision: D1315
Submitted by: Nikos Vassiliadis <nvass@gmx.com>


# c75820c7 06-Jan-2015 Craig Rodrigues <rodrigc@FreeBSD.org>

Merge: r258322 from projects/pf branch

Split functions that initialize various pf parts into their
vimage parts and global parts.
Since global parts appeared to be only mutex initializations, just
abandon them and use MTX_SYSINIT() instead.
Kill my incorrect VNET_FOREACH() iterator and instead use correct
approach with VNET_SYSINIT().

PR: 194515
Differential Revision: D1309
Submitted by: glebius, Nikos Vassiliadis <nvass@gmx.com>
Reviewed by: trociny, zec, gnn


# f9723c77 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp
we need to return.
Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags
fields for all relevant functions.


# 5b07fc31 09-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Finish r274315: remove union 'u' from struct pf_send_entry.

Suggested by: kib


# a458ad86 09-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove unused 'struct route' fields.


# 257480b8 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert netinet6/ to use new routing API.

* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
Currently it is used by 2 different types of customers:
- socket-based one, which all are unsure about provided
address scope and
- in-kernel ones (ND code mostly), which don't have
any sockets, options, credentials, etc.
So, we provide two different wrappers to in6_selectsrc()
returning select source.
* Make different versions of selectroute():
Currently selectroute() is used in two scenarios:
- SAS, via in6_selectsrc() -> in6_selectif() -> selectroute()
- output, via in6_output -> wrapper -> selectroute()
Provide different versions for each customer:
- fib6_lookup_nh_basic()-based in6_selectif() which is
capable of returning interface only, without MTU/NHOP/L2
calculations
- full-blown fib6_selectroute() with cached route/multipath/
MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes


# 1b0f129f 25-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert last piece of pf to use fib4_lookup_nh_ext().


# 7b42f6fa 25-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

* Convert TOE framework to use new routing api.
* Add fib6_lookup_nh_ext().
* Rename union structures:
nhop64_basic -> nhopu_basic,
nhop64_extended -> nhopu_extended


# f0188618 21-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 99e9de87 18-Oct-2014 Dag-Erling Smørgrav <des@FreeBSD.org>

Add a complete implementation of MurmurHash3. Tweak both implementations
so they match the established idiom. Document them in hash(9).

MFC after: 1 month
MFC with: r272906


# 9ae91cc4 12-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Implement fib*_lookup_nh_basic to provide fast non-refcounted
way to determine egress ifp / mtu.


# 1d2baefc 10-Oct-2014 George V. Neville-Neil <gnn@FreeBSD.org>

Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% increase in PPS.

Differential Revision: https://reviews.freebsd.org/D461
Submitted by: des
Reviewed by: emaste
MFC after: 1 month


# bf7dcda3 03-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Clean up unused CSUM_FRAGMENT.

Sponsored by: Nginx, Inc.


# b616ae25 01-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Explicitly free packet on PF_DROP, otherwise a "quick" rule with
"route-to" may still forward it.

PR: 177808
Submitted by: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by: InnoGames GmbH


# e85343b1 15-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Do not lookup source node twice when pf_map_addr() is used.

PR: 184003
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# afab0f7e 15-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR: 183997
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# 11341cf9 14-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.

PR: 127920
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# a9572d8f 14-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# b437b06c 29-May-2014 John Baldwin <jhb@FreeBSD.org>

Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).


# 79bde95f 16-Apr-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR: 188253


# 0a7c583a 06-Apr-2014 Martin Matuska <mm@FreeBSD.org>

Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by: trociny


# 7e92ce73 29-Mar-2014 Martin Matuska <mm@FreeBSD.org>

De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

Reviewed by: Nikos Vassiliadis, trociny@


# e3a7aa6f 04-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Initialize the
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# dc64d6b7 19-Feb-2014 Martin Matuska <mm@FreeBSD.org>

Revert r262196

I am going to split this into two individual patches and test it with
the projects/pf branch that may get merged later.


# a93b9a64 18-Feb-2014 Martin Matuska <mm@FreeBSD.org>

De-virtualize pf_mtag_z [1]
Process V_pf_overloadqueue in vnet context [2]

This fixes two VIMAGE kernel panics and allows host-pf and vnet jails to run
simultaneously. pf inside jails remains broken.

PR: kern/182964
Submitted by: glebius@FreeBSD.org [2], myself [1]
Tested by: rodrigc@FreeBSD.org, myself
MFC after: 2 weeks


# 48278b88 14-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.

Thanks to: Dennis Yusupoff <dyr smartspb.net>
Also reported by: dumbbell, pgj, Rambler
Sponsored by: Nginx, Inc.
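
The wrap-around itself is easy to reproduce in user space. In this
sketch toy_dec() models one unsynchronised decrement too many, and
toy_over_threshold() is a hypothetical stand-in for any calculation
that compares the counter with a limit; it is not the logic of
pf_state_expires().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain unsigned decrement, as a racy "counter--" would perform. */
static uint32_t
toy_dec(uint32_t cur)
{
	return (cur - 1);	/* 0 - 1 wraps to UINT32_MAX */
}

/* Hypothetical check: has the counter reached a configured limit? */
static bool
toy_over_threshold(uint32_t cur, uint32_t limit)
{
	assert(limit > 0);
	return (cur >= limit);
}
```

Once the counter has wrapped, every such comparison sees a huge value,
which matches the symptom described above of new states being treated
as if the rule were far over its limit.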


# be3d21a2 22-Jan-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove NULL pointer dereference.

CID: 1009118


# 0b5d46ce 21-Dec-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix fallout from r258479: in pf_free_src_node() the node must already
be unlinked.

Reported by: Konstantin Kukushkin <dark rambler-co.ru>
Sponsored by: Nginx, Inc.


# d77c1b32 22-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

To support upcoming changes change internal API for source node handling:
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
These functions do not proceed with freeing of a node, just disconnect
it from storage.
- New function pf_free_src_nodes() works on a list of previously
disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().

In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>

Sponsored by: InnoGames GmbH
Sponsored by: Nginx, Inc.


# 4280d14d 22-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Style: don't compare unsigned <= 0.

Sponsored by: Nginx, Inc.


# f053058c 18-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

- Split functions that initialize various pf parts into their vimage
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
approach with VNET_SYSINIT().

Submitted by: Nikos Vassiliadis <nvass gmx.com>
Reviewed by: trociny


# 6c71335c 05-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Fix fallout from r257223. Since pf_test_state_icmp() can call
pf_icmp_state_lookup() twice, we need to unlock previously found state.

Reported & tested by: gavin


# e1b58d2c 04-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Code logic of handling PFTM_PURGE into pf_find_state().


# 7710f9f1 04-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Remove unused PFTM_UNTIL_PACKET const.


# 1ce5620d 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

- Fix VIMAGE build.
- Fix build with gcc.


# 0664b03c 27-Oct-2013 Baptiste Daroussin <bapt@FreeBSD.org>

Import pf.c 1.638 from OpenBSD

Original log:
Some ICMP types that also have icmp_id, pointed out by markus@

Obtained from: OpenBSD


# 5fff3f10 27-Oct-2013 Baptiste Daroussin <bapt@FreeBSD.org>

Import pf.c 1.636 from OpenBSD

Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some icmp types in icmp error message payloads.

Obtained from: OpenBSD


# 44df0d93 27-Oct-2013 Baptiste Daroussin <bapt@FreeBSD.org>

Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD

Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type
in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
addresses but receive unicast replies, by doing fallthrough lookups
against the correct multicast address.
- Clear up some mistaken assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
one based on other data if we can, otherwise set it to 0.
- Don't modify the icmp id field in NAT unless it's echo
- Use the full range of possible id's when NATing icmp6 echo

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by: glebius
Obtained from: OpenBSD


# 75bf2db3 27-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with: bz


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 86bd0491 19-Aug-2013 Andre Oppermann <andre@FreeBSD.org>

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


# 6828cc99 19-Jun-2013 Gleb Smirnoff <glebius@FreeBSD.org>

De-vnet hash sizes and hash masks.

Submitted by: Nikos Vassiliadis <nvass gmx.com>
Reviewed by: trociny


# 93ecffe5 13-Jun-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Improve locking strategy between keys hash and ID hash.

Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
by their address value.
- pf_state_key_attach() had special handling to reuse a state past FIN_WAIT_2.
It unlinked the conflicting state synchronously. In theory this could require
locking a third key hash, which we can't do now.
Now we do not remove the state immediately; instead we leave this task to
the purge thread. To avoid conflicts in the short period before the state is
purged, we push it to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
ID hash and returns.
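
The address-ordered locking described above can be sketched in userland C
as follows. This is only an illustration using POSIX mutexes; struct hash_row,
rows_lock() and rows_unlock() are hypothetical names, not the pf code.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a key hash row; not the pf structure. */
struct hash_row {
	pthread_mutex_t lock;
};

/*
 * Lock two rows in ascending address order, so that two threads taking
 * the same pair can never end up holding one lock each. If both
 * pointers name the same row, lock it only once.
 */
static void
rows_lock(struct hash_row *a, struct hash_row *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->lock);
	} else if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
}

/* Release both rows; unlock order does not matter for deadlock safety. */
static void
rows_unlock(struct hash_row *a, struct hash_row *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}
```

Because every caller acquires the pair in the same global order, the order
in which the two rows are passed in does not matter.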

Tested by: Ian FREISLICH <ianf clue.co.za>


# 5af77b3e 11-May-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Return meaningful error code from pf_state_key_attach() and
pf_state_insert().


# 03911dec 11-May-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Better debug message.


# 7a954bbb 06-May-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Simplify printf().


# dc4ad05e 14-Mar-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Use m_get/m_gethdr instead of compat macros.

Sponsored by: Nginx, Inc.


# d8aa10cc 28-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.


# f5002be6 17-Dec-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Warn about reaching various PF limits.

Reviewed by: glebius
Obtained from: WHEEL Systems


# feaa4dd2 12-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Initialize state id prior to attaching state to key hash. Otherwise a
race can happen, when pf_find_state() finds state via key hash, and locks
id hash slot 0 instead of the slot appropriate to the state id.


# 59cc9fde 06-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Rule memory garbage collecting in new pf scans only states that are on
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.

Reported by: Ian FREISLICH <ianf cloudseed.co.za>


# 38cc0bfa 06-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Close possible races between state deletion and state being sent out
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
to the PFSYNC_S_DEL queue in pfsync.

Reported by: Ian FREISLICH <ianf cloudseed.co.za>


# 078468ed 26-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some drivers may not announce CSUM_IP in their if_hwassist,
yet still try to do checksums if CSUM_IP is set on the mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>


# 8f134647 22-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.
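
A minimal sketch of this convention, assuming a trimmed-down header
(struct mini_ip and payload_len() are hypothetical, not kernel names):
the header field stays in network byte order, and the host-order value
exists only in a local variable.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical, trimmed header; only the field used in this sketch. */
struct mini_ip {
	uint16_t len;	/* total length, always in network byte order */
};

/* Return the payload length in host order without touching the header. */
static uint16_t
payload_len(const struct mini_ip *ip, uint16_t hdr_len)
{
	uint16_t total = ntohs(ip->len);	/* host order lives in a local */

	return (total - hdr_len);
}
```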

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. Raw sockets convert to host
byte order before passing a packet to an application. This will
probably remain for ages for compatibility.

[2] ip_input() still subtracts the header length from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


# 23e9c6dc 08-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
- ip_output(), ipfilter(4) not changed, since already call
in_delayed_cksum() with header in net byte order.
- divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
there and back.
- mrouting code and IPv6 ipsec now need to switch byte order there and
back, but I hope this is a temporary solution.
- In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().


# ea2951be 06-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, the goto there led to unlocking an rwlock that was not locked.


# e2cfe424 28-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now the pf purging daemon stores the current hash table index on the stack
in pf_purge_thread(), and supplies it to the next iteration of
pf_purge_expired_states(). The latter returns the new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
the status of a states expiry run more consistent. Prior to this change it
could happen that the n-th run stopped on the i-th entry and returned (1) as
full run complete, then the next (n+1) full run stopped on the j-th entry,
where j < i, and that broke the mark-and-sweep algorithm that saves
referenced rules. A referenced rule was freed, and this later led to a crash.
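
The index hand-off can be sketched as below. This is an illustration under
assumptions, not the pf code: NROWS, scan_rows() and the visits[] counter are
hypothetical, and the per-row work is reduced to a placeholder.

```c
#include <stddef.h>

#define NROWS	8	/* hypothetical hash table size */

/*
 * Scan up to 'maxcheck' rows starting at 'start' (which must be less
 * than NROWS) and return the index where the next call should resume.
 * Returns 0 as soon as the scan wraps, so a return value of 0 always
 * means that exactly one full pass over the table has finished.
 */
static size_t
scan_rows(size_t start, size_t maxcheck, unsigned visits[NROWS])
{
	size_t i = start;

	while (maxcheck > 0) {
		visits[i]++;	/* placeholder for expiring states in row i */
		maxcheck--;
		if (++i == NROWS)
			return (0);	/* wrapped: stop here, pass complete */
	}
	return (i);
}
```

The caller keeps the returned index in a local variable and passes it to the
next call; seeing 0 tells it that a full pass is done and the sweep phase
may run.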


# 29bdd62c 22-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

When the connection rate limit is hit and we overload a source into a table,
we are actually editing the table, which means editing rules,
so we need writer access to them.

Fix this by offloading the update of the table to the same taskqueue
we already use for flushing. Since the taskqueue's major task is now
overloading, and flushing is optional, do a mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().


# b7340ded 20-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Reduce copy/paste when freeing a source node.


# 22c91478 20-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Utilize Jenkins hash with random seed for source nodes storage.
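
As an illustration of seeded hashing, the sketch below uses Bob Jenkins'
one-at-a-time hash with the seed as the initial value. This is an assumption
made for the example: it is not the kernel's hash function, and oaat_hash()
is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Jenkins one-at-a-time hash over 'len' bytes, started from 'seed'.
 * With a seed chosen randomly at initialization, an outside party
 * cannot easily predict which bucket a given address will land in.
 */
static uint32_t
oaat_hash(const void *buf, size_t len, uint32_t seed)
{
	const uint8_t *p = buf;
	uint32_t h = seed;

	for (size_t i = 0; i < len; i++) {
		h += p[i];
		h += h << 10;
		h ^= h >> 6;
	}
	/* Final avalanche steps. */
	h += h << 3;
	h ^= h >> 11;
	h += h << 15;
	return (h);
}
```

A bucket index would then typically be taken as the hash masked with
(table size - 1), for a power-of-two table size.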


# 1d6139c0 18-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Make ruleset anchors in pf(4) reentrant. We've got two problems here:

1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate the anchor processing stack on the stack. To make this allocation
as small as possible, the following measures were taken:
- Maximum stack size reduced from 64 to 32.
- The struct pf_anchor_stackframe trimmed by one pointer - parent.
We can always obtain the parent via the rule pointer.
- When pf_test_rule() calls pf_get_translation(), the former lends
its stack to the latter, to avoid a recursive allocation of 32 entries.

The second one proved trickier. The code that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to the commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of 1.516 was that the code
examines the "match" mark on a just processed child, which couldn't have
been put there by the current frame. Since this wasn't documented even in
the commit message and the functionality is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread-safe manner - the mark isn't put on the anchor itself, but on the
current stack frame. To avoid growing the stack frame, we utilize the LSB
of the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.
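
The LSB trick can be sketched as follows. The names (struct frame,
FRAME_MATCH, frame_*()) are hypothetical and only illustrate the technique;
it is valid only when the stored pointers are at least 2-byte aligned.

```c
#include <stdint.h>

/* Hypothetical stand-in for a rule; any pointer-aligned object works. */
struct rule {
	int dummy;
};

/* Hypothetical stack frame: the low bit of 'tagged' carries the mark. */
struct frame {
	uintptr_t tagged;
};

#define FRAME_MATCH	((uintptr_t)1)

/* Store the rule pointer; this also clears any previous mark. */
static void
frame_set_rule(struct frame *f, struct rule *r)
{
	f->tagged = (uintptr_t)r;
}

/* Recover the rule pointer by masking off the mark bit. */
static struct rule *
frame_rule(const struct frame *f)
{
	return ((struct rule *)(f->tagged & ~FRAME_MATCH));
}

/* Set the mark without disturbing the pointer bits. */
static void
frame_mark(struct frame *f)
{
	f->tagged |= FRAME_MATCH;
}

/* Report whether the mark is set. */
static int
frame_marked(const struct frame *f)
{
	return ((f->tagged & FRAME_MATCH) != 0);
}
```

The frame stays one pointer wide, at the cost of a mask on every
dereference.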

Discussed with: dhartmei


# 3b3a8eb9 14-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The debatable move is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Unmodified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi