History log of /freebsd-11-stable/sys/netpfil/ipfw/ip_fw2.c
Revision Date Author Comments
# 361832 05-Jun-2020 ae

MFC r361624:
Fix O_IP_FLOW_LOOKUP opcode handling.

Do not check table value matching when table lookup has failed.


# 356036 23-Dec-2019 ae

MFC r355712:
Make TCP options parsing stricter.

Rework tcpopts_parse() to be more strict. Use const pointer. Add length
checks for specific TCP options. The main purpose of the change is
avoiding of possible out of mbuf's data access.

Reported by: Maxime Villard


# 355851 17-Dec-2019 ae

MFC r355581:
Avoid access to stale ip pointer and call UPDATE_POINTERS() after
PULLUP_LEN_LOCKED().

PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related
pointers that can be used in next opcodes.

Reported by: Maxime Villard <max at m00nbsd net>

NOTE: this commit also adds UPDATE_POINTERS() stub macro, that originally
is part of r345166 commit that was not merged.


# 355850 17-Dec-2019 ae

MFC r350413:
Avoid possible lock leaking.

After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro
is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just
returns via `goto pullup_failed`. There are two places where PULLUP_LEN()
is called with IPFW_PF_RLOCK() held.

Add PULLUP_LEN_LOCKED() macro to use in these places to be able release
the lock, when m_pullup() fails.

Sponsored by: Yandex LLC

NOTE: since r343619 was not merged, this commit is mostly NOP, but
it is needed to reduce code difference between stable and head/.


# 349648 03-Jul-2019 ae

MFC r349366:
Follow the RFC 3128 and drop short TCP fragments with offset = 1.


# 349647 03-Jul-2019 ae

MFC r349365:
Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in
compact form.


# 349573 01-Jul-2019 ae

MFC r349267:
Add "tcpmss" opcode to match the TCP MSS value.

With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.

# ipfw add deny log tcp from any to any tcpmss 0-500


# 347333 08-May-2019 ae

MFC r346884:
Add IPv6 support for O_IPLEN opcode.

Obtained from: Yandex LLC


# 346201 14-Apr-2019 ae

MFC r342908:
Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64.
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.

ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64. Each time ipfw(4) does bzero() on it, and then it initializes
some fields.

glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).

Introduced new field flags, it is used to keep track of what fields was
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed. The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.

ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
rely on checking for NULL pointers.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18690

MFC r342909:
Fix the build with INVARIANTS.

MFC r343551:
Fix the bug introduced in r342908, that causes problems with dynamic
handling for protocols without ports numbers.

Since port numbers were uninitialized for protocols like ICMP/ICMPv6,
ipfw_chk() used some non-zero values to create dynamic states, and due
this it failed to match replies with created states.

Reported by: Oliver Hartmann, Boris Lytochkin
Obtained from: Yandex LLC


# 343142 18-Jan-2019 ae

MFC 342925:
Relax requirement to packet size of CARP protocol and remove version check.

CARP shares protocol number 112 with VRRP (RFC 5798). And the size of
VRRP packet may be smaller than CARP. ipfw_chk() does m_pullup() to at
least sizeof(struct carp_header) and can fail when packet is VRRP. This
leads to packet drop and message about failed pullup attempt.
Also, RFC 5798 defines version 3 of VRRP protocol, this version number
also unsupported by CARP and such check leads to packet drop.

carp_input() does its own checks for protocol version and packet size,
so we can remove these checks to be able pass VRRP packets.

PR: 234207


# 339580 22-Oct-2018 ae

MFC r339357:
Add extra parentheses to fix "versrcreach" opcode, (oif != NULL) should
not be used as condition for ternary operator.

Submitted by: Tatsuki Makino <tatsuki_makino at hotmail dot com>


# 337461 08-Aug-2018 ae

MFC r336132:
Add "record-state", "set-limit" and "defer-action" rule options to ipfw.

"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.

Submitted by: lev


# 332401 11-Apr-2018 ae

MFC r328988,r328989:
Rework ipfw dynamic states implementation to be lockless on fast path.

o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685


# 332229 07-Apr-2018 tuexen

MFC r326233:

Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.

There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.

Thanks to Timo Voelker for helping me to test this patch.

MFC r327200:

When adding support for sending SCTP packets containing an ABORT chunk
to ipfw in https://svnweb.freebsd.org/changeset/base/326233,
a dependency on the SCTP stack was added to ipfw by accident.

This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594
where also a solution was suggested. This patch is based on Kevin's
suggestion, but implements the required SCTP checksum computation
without any dependency on other SCTP sources.

While there, do some cleanups and improve comments.

Thanks to Kevin Kevin Bowling for reporting the issue and suggesting
a fix.


# 332209 07-Apr-2018 tuexen

MFC r324216:

Fix a bug which avoided that rules for matching port numbers for SCTP
packets where actually matched.
While there, make clean in the man-page that SCTP port numbers are
supported in rules.


# 331201 19-Mar-2018 ae

MFC r330792:
Do not try to reassemble IPv6 fragments in "reass" rule.

ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.

PR: 170604


# 328968 07-Feb-2018 ae

MFC r328326:
When IPv6 packet is handled by O_REJECT opcode, convert ICMP code
specified in the arg1 into ICMPv6 destination unreachable code according
to RFC7915.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 328772 02-Feb-2018 ae

MFC r328161:
Add UDPLite support to ipfw(4).

Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 326388 30-Nov-2017 ae

MFC r326086:
Add ipfw_add_protected_rule() function that creates rule with 65535
number in the reserved set 31. Use this function to create default rule.

MFC r326115:
Rework rule ranges matching. Use comparison rule id with UINT32_MAX to
match all rules with the same rule number.

MFC r326116:
Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c.
It is not specific for dynamic states function and called also from
generic code.

MFC r326117:
Check that address family of state matches address family of packet.
If it is not matched avoid comparing other state fields.

MFC r326118:
Modify ipfw's dynamic states KPI.

Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function return NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.

Differential Revision: https://reviews.freebsd.org/D11657

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 326142 24-Nov-2017 ae

MFC r325960:
Unconditionally enable support for O_IPSEC opcode.

IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.

MFC r325962:
Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects
a packet is IPv4. And in case when it is IPv6, it just translates them
as IPv4. This leads to corruption and in some cases to panics.
In particular a panic can happen when value of ip6_plen modified to
something that leads to IP fragmentation, but actual packet length does
not match the IP length.

Packets that are not IPv4 will be dropped by NAT rule.


# 325229 31-Oct-2017 ae

MFC r324947:
Add IPv6 support for O_TCPDATALEN opcode.

PR: 222746


# 324790 20-Oct-2017 ae

MFC r324593:
Fix regression in handling O_FORWARD_IP opcode after r279948.

To properly handle 'fwd tablearg,port' opcode, copy sin_port value from
sockaddr_in structure stored in the opcode into corresponding hopstore
field.

PR: 222953


# 324047 27-Sep-2017 ae

MFC r323839:
Use in_localip() function instead of unlocked access to addresses hash
to determine that an address is our local.

PR: 220078


# 321811 31-Jul-2017 philip

MFC r320941: Fix GRE over IPv6 tunnels with IPFW

Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).

PR: 220640
Submitted by: Kun Xie <kxie@xiplink.com>


# 317042 17-Apr-2017 ae

MFC r316461:
Remove "IPFW static rules" rmlock.

Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D10154


# 316605 07-Apr-2017 ae

MFC r316329:
Reset the cached state of last lookup in the dynamic states when an
external action is completed, but the rule search is continued.

External action handler can change the content of @args argument,
that is used for dynamic state lookup. Enforce the new lookup to be able
install new state, when the search is continued.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 316446 03-Apr-2017 ae

MFC r304041:
Move logging via BPF support into separate file.

* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304043:
Add three helper function to manage tables from external modules.

ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304046, 304108:
Add ipfw_nat64 module that implements stateless and stateful NAT64.

The module works together with ipfw(4) and implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

A configuration of instance should looks like this:
1. Create lookup tables:
# ipfw table T46 create type addr valtype ipv6
# ipfw table T64 create type addr valtype ipv4
2. Fill T46 and T64 tables.
3. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
4. Create NAT64 instance:
# ipfw nat64stl NAT create table4 T46 table6 T64
5. Add rules that matches the traffic:
# ipfw add nat64stl NAT ip from any to table(T46)
# ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.

A configuration of instance should looks like this:
1. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
2. Create NAT64 instance:
# ipfw nat64lsn NAT create prefix4 A.B.C.D/28
3. Add rules that matches the traffic:
# ipfw add nat64lsn NAT ip from any to A.B.C.D/28
# ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6434

MFC r304048:
Replace __noinline with special debug macro NAT64NOINLINE.

MFC r304061:
Use %ju to print unsigned 64-bit value.

MFC r304076:
Make statistics nat64lsn, nat64stl an nptv6 output netstat-like:
"@value @description" and fix build due to -Wformat errors.

MFC r304378 (by bz):
Try to fix gcc compilation errors (which are right).
nat64_getlasthdr() returns an int, which can be -1 in case of error,
storing the result in an uint8_t and then comparing to < 0 is not
helpful. Do what is done in the rest of the code and make proto an
int here as well.

MFC r309187:
Fix ICMPv6 Time Exceeded error message translation.

MFC r314718:
Use new ipfw_lookup_table() in the nat64 too.

MFC r315204,315233:
Use memset with structure size.


# 316444 03-Apr-2017 ae

MFC r303012:
Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All matched by rule traffic goes to the NPT module.

Reviewed by: hrs
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6420

MFC r304049:
Add `stats reset` command implementation to NPTv6 module
to be able reset statistics counters.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304076:
Make statistics nat64lsn, nat64stl an nptv6 output netstat-like:
"@value @description" and fix build due to -Wformat errors.

MFC r314507:
Fix NPTv6 rule counters when one_pass is not enabled.

Consider the rule matching when both @done and @retval values
returned from ipfw_run_eaction() are zero. And modify ipfw_nptv6()
to return IP_FW_DENY and @done=0 when addresses do not match.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 316274 30-Mar-2017 ae

MFC r303018:
Add named dynamic states support to ipfw(4).

The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced to default.

Reviewed by: julian
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6674

MFC r304087:
Do not warn about ambiguous state name when we inspect a comment token.

MFC r304089:
Add an ability to attach comment to check-state rules.

MFC r310727 (by marius):
Fix a bug in r272840; given that the optlen parameter of setsockopt(2)
is a 32-bit socklen_t, do_get3() passes the kernel to access the wrong
32-bit half on big-endian LP64 machines when simply casting the 64-bit
size_t optlen to a socklen_t pointer.
While at it and given that the intention of do_get3() apparently is to
hide/wrap the fact that socket options are used for communication with
ipfw(4), change the optlen parameter of do_set3() to be of type size_t
and as such more appropriate than uintptr_t, too.

MFC r315305:
Change the syntax of ipfw's named states.

Since the state name is an optional argument, it often can conflict
with other options. To avoid ambiguity now the state name must be
prefixed with a colon.

Sponsored by: Yandex LLC


# 315532 19-Mar-2017 ae

MFC r314716:
Add IPv6 support to O_IP_DST_LOOKUP opcode.

o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of
ipfw_insn_u32;
o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and
remove old ipfw_lookup_table();
o use args->f_id.flow_id6 that is in host byte order to get DSCP value;
o add SCTP ports support to 'lookup src/dst-port' opcode;
o add IPv6 support to 'lookup src/dst-ip' opcode.

PR: 217292
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9873


# 314990 10-Mar-2017 ae

MFC r314614:
Fix matching table entry value. Use real table value instead of its index
in valuestate array.

When opcode has size equal to ipfw_insn_u32, this means that it should
additionally match value specified in d[0] with table entry value.
ipfw_table_lookup() returns table value index, use TARG_VAL() macro to
convert it to its value. The actual 32-bit value stored in the tag field
of table_value structure, where all unspecified u32 values are kept.

PR: 217262


# 306475 29-Sep-2016 ae

MFC r305940:
Move opcode rewriter init and destroy handlers into non-VNET code.

PR: 212576,212649,212077
Submitted by: John Zielinski


# 304079 14-Aug-2016 ae

MFC r303955:
Restore "nat global" support.

Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.

PR: 211256