History log of /freebsd-current/sys/netpfil/pf/pf_lb.c
Revision Date Author Comments
# d10de21f 24-Aug-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Access r->rpool.cur->kif under mutex protection

pf_route() sends traffic to a specified next hop over a specific
interface. The next hop is obtained in pf_map_addr() but the interface
is obtained directly via r->rpool.cur->kif` outside of the lock held in
pf_map_addr() in multiple places around pf. The chosen interface is not
stored in source node.

Move the interface selection into pf_map_addr(), have the function
return it together with the chosen IP address and ensure its stored
in struct pf_ksrc_node, store it in the source node and use the stored
value when needed.

Sponsored by: InnoGames GmbH
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41570


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 6053adaf 01-Jun-2023 Kristof Provost <kp@FreeBSD.org>

pf: add SCTP NAT support

Support NAT-ing SCTP connections.

This is mostly similar to UDP and TCP, but we refuse to change ports for
SCTP, to avoid interfering with multihomed connections.

As a result we also never copy the SCTP header back or recalculate
checksums as we'd do for TCP or UDP (because we don't modify the header
for SCTP).

We do use the existing pf_change_ap() function to modify the packet,
because we may still need to update the IPv4 header checksum.

Reviewed by: tuexen
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D40866


# f2064dd1 12-Jul-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: Fix duplicate storage of direction

The variable storing the direction of a processed packet is passed
around to many functions. Most of those functions already have a pointer
to struct pf_pdesc which also contains the direction. By using the one
in struct pf_pdesc we can reduce the amount of arguments passed around.

Reviewed by: kp
Sponsored by: InnGames GmbH
Differential Revision: https://reviews.freebsd.org/D41008


# 16303d2b 03-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: improve source node error handling

Functions manipulating source nodes can fail due to various reasons like
memory allocation errors, hitting configured limits or lack of
redirection targets. Ensure those errors are properly caught and
propagated in the code. Increase the error counters not only when
parsing the main ruleset but the NAT ruleset too.

Cherry-picked from development of D39880

Reviewed by: kp
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39940


# db0a2bfd 01-May-2023 Kajetan Staszkiewicz <vegeta@tuxpowered.net>

pf: reduce number of hashing operations when handling source nodes

Reduce number of hashing operations when handling source nodes by always
having a pointer to the hash row mutex in the source node. Provide
macros for handling and asserting the mutex. Calculate the hash only
once in pf_find_src_node() and then use this hash in subsequent
operations.

Cherry-picked from development of D39880

Reviewed by: kp, mjg
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D39888


# 6b94546a 09-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

pf: partially depessimize pf_match_translation

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 5f5e32f1 10-Jan-2022 Kristof Provost <kp@FreeBSD.org>

pf: protect the rpool from races

The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.

For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).

Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).

See also: https://redmine.pfsense.org/issues/12660
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33874


# 498cca14 16-Oct-2021 Kristof Provost <kp@FreeBSD.org>

pf: selecting pf_map_addr is not an error

When a redirection/nat IP address is selected by pf_map_addr it is
logged with PF_DEBUG_MISC level. This one according to the manual means
"Generate debug messages for various errors". Selecting an IP address is
not an error, it's a normal function of pf for route-to, nat and some
other operations. Therefore PF_DEBUG_NOISY level should be choosen which
is means "Generate debug messages for common conditions".

PR: 259184
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# 02cf67cc 22-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: switch rule counters to pf_counter_u64

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 19d6e29b 08-Jul-2021 Mateusz Guzik <mjg@FreeBSD.org>

pf: add pf_find_state_all_exists

Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")


# d38630f6 04-Jun-2021 Kristof Provost <kp@FreeBSD.org>

pf: store L4 headers in pf_pdesc

Rather than pointers to the headers store full copies. This brings us
slightly closer to what OpenBSD does, and also makes more sense than
storing pointers to stack variable copies of the headers.

Reviewed by: donner, scottl
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30719


# 2aa21096 13-Apr-2021 Kurosawa Takahiro <takahiro.kurosawa@gmail.com>

pf: Implement the NAT source port selection of MAP-E Customer Edge

MAP-E (RFC 7597) requires special care for selecting source ports
in NAT operation on the Customer Edge because a part of bits of the port
numbers are used by the Border Relay to distinguish another side of the
IPv4-over-IPv6 tunnel.

PR: 254577
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D29468


# 320c1116 12-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pfi_kif into a user and kernel space structure

No functional change.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27761


# c3adacda 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Change pf_krule counters to use counter_u64

This improves the cache behaviour of pf and results in improved
throughput.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27760


# e86bddea 05-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_rule into kernel and user space versions

No functional change intended.

MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27758


# 17ad7334 23-Dec-2020 Kristof Provost <kp@FreeBSD.org>

pf: Split pf_src_node into a kernel and userspace struct

Introduce a kernel version of struct pf_src_node (pf_ksrc_node).

This will allow us to improve the in-kernel data structure without
breaking userspace compatibility.

Reviewed by: philip
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D27707


# 336683f2 12-Dec-2018 Kristof Provost <kp@FreeBSD.org>

pf: Fix endless loop on NAT exhaustion with sticky-address

When we try to find a source port in pf_get_sport() it's possible that
all available source ports will be in use. In that case we call
pf_map_addr() to try to find a new source IP to try from. If there are
no more available source IPs pf_map_addr() will return 1 and we stop
trying.

However, if sticky-address is set we'll always return the same IP
address, even if we've already tried that one.
We need to check the supplied address, because if that's the one we'd
set it means pf_get_sport() has already tried it, and we should error
out rather than keep trying.

PR: 233867
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18483


# 2b0a4ffa 06-Dec-2018 Kristof Provost <kp@FreeBSD.org>

pf: add a comment describing why do we call pf_map_addr again if port
selection process fails

Obtained from: OpenBSD


# 455969d3 30-May-2018 Kristof Provost <kp@FreeBSD.org>

pf: Replace rwlock on PF_RULES_LOCK with rmlock

Given that PF_RULES_LOCK is a mostly read lock, replace the rwlock with rmlock.
This change improves packet processing rate in high pps environments.
Benchmarking by olivier@ shows a 65% improvement in pps.

While here, also eliminate all appearances of "sys/rwlock.h" includes since it
is not used anymore.

Submitted by: farrokhi@
Differential Revision: https://reviews.freebsd.org/D15502


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 7f3ad018 08-Aug-2017 Kristof Provost <kp@FreeBSD.org>

pf_get_sport(): Prevent possible endless loop when searching for an unused nat port

This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.

Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2

OpenBSD commit message:
Use a 32 bit variable to detect integer overflow when searching for
an unused nat port. Prevents a possible endless loop if high port
is 65535 or low port is 0.
report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c

PR: 221201
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: OpenBSD via ElectroBSD
MFC after: 1 week


# 98a9874f 06-Mar-2017 Kristof Provost <kp@FreeBSD.org>

pf: Fix a crash in low-memory situations

If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's
no more memory for it) it frees skp. This is wrong, because skp is a
pf_state_key **, so we need to free *skp, as is done later in the function.
Getting it wrong means we try to free a stack variable of the calling
pf_test_rule() function, and we panic.


# e85343b1 15-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Do not lookup source node twice when pf_map_addr() is used.

PR: 184003
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH


# 79bde95f 16-Apr-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Backout r257223,r257224,r257225,r257246,r257710. The changes caused
some regressions in ICMP handling, and right now me and Baptiste
are out of time on analyzing them.

PR: 188253


# a830c452 06-Jan-2014 Gleb Smirnoff <glebius@FreeBSD.org>

When pf_get_translation() fails, it should leave *sn pointer pristine,
otherwise we will panic in pf_test_rule().

PR: 182557


# e4e01d9c 14-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Some fixups to pf_get_sport after r257223:

- Do not return blindly if proto isn't ICMP.
- The dport is in network order, so fix comparisons.
- Remove ridiculous htonl(arc4random()).
- Push local variable to a narrower block.


# 44df0d93 27-Oct-2013 Baptiste Daroussin <bapt@FreeBSD.org>

Import pf.c 1.635 and pf_lb.c 1.4 from OpenBSD

Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type

in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
addresses but recieve unicast replies, by doing fallthrough lookups
against the correct multicast address. - Clear up some mistaken
assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
one based on other data if we can, otherwise set it to 0.
- Don't modify the icmp id field in NAT unless it's echo
- Use the full range of possible id's when NATing icmp6 echoy

Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe

Reviewed by: glebius
Obtained from: OpenBSD


# 75bf2db3 27-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with: bz


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 8fc6e19c 02-Sep-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit:

date: 2010/02/04 14:10:12; author: sthen; state: Exp; lines: +24 -19;
pf_get_sport() picks a random port from the port range specified in a
nat rule. It should check to see if it's in-use (i.e. matches an existing
PF state), if it is, it cycles sequentially through other ports until
it finds a free one. However the check was being done with the state
keys the wrong way round so it was never actually finding the state
to be in-use.

- switch the keys to correct this, avoiding random state collisions
with nat. Fixes PR 6300 and problems reported by robert@ and viq.

- check pf_get_sport() return code in pf_test(); if port allocation
fails the packet should be dropped rather than sent out untranslated.

Help/ok claudio@.

Some additional changes to 1.12:

- We also need to bzero() the key to zero padding, otherwise key
won't match.
- Collapse two if blocks into one with ||, since both conditions
lead to the same processing.
- Only naddr changes in the cycle, so move initialization of other
fields above the cycle.
- s/u_intXX_t/uintXX_t/g

PR: kern/181690
Submitted by: Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by: Nginx, Inc.


# d8aa10cc 28-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.


# 1d6139c0 18-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Make ruleset anchors in pf(4) reentrant. We've got two problems here:

1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
as small as possible, following measures taken:
- Maximum stack size reduced from 64 to 32.
- The struct pf_anchor_stackframe trimmed by one pointer - parent.
We can always obtain the parent via the rule pointer.
- When pf_test_rule() calls pf_get_translation(), the former lends
its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with: dhartmei


# 3b3a8eb9 14-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi