History log of /freebsd-11-stable/sys/netpfil/ipfw/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
364162 12-Aug-2020 ae

MFC r363888:
Handle delayed checksums if needed in NAT64.

Upper level protocols defer checksums calculation in hope we have
checksums offloading in a network card. CSUM_DELAY_DATA flag is used
to determine that checksum calculation was deferred. And IP output
routine checks for this flag before pass mbuf to lower layer. Forwarded
packets have not this flag.

NAT64 uses checksums adjustment when it translates IP headers.
In most cases NAT64 is used for forwarded packets, but in case when it
handles locally originated packets we need to finish checksum calculation
that was deferred to correctly adjust it.

Add check for presence of CSUM_DELAY_DATA flag and finish checksum
calculation before adjustment.

362303 18-Jun-2020 eugen

MFC r361789: ipfw: unbreak matching with big table type flow.

361832 05-Jun-2020 ae

MFC r361624:
Fix O_IP_FLOW_LOOKUP opcode handling.

Do not check table value matching when table lookup has failed.

356036 23-Dec-2019 ae

MFC r355712:
Make TCP options parsing stricter.

Rework tcpopts_parse() to be more strict. Use const pointer. Add length
checks for specific TCP options. The main purpose of the change is
avoiding of possible out of mbuf's data access.

Reported by: Maxime Villard

355851 17-Dec-2019 ae

MFC r355581:
Avoid access to stale ip pointer and call UPDATE_POINTERS() after
PULLUP_LEN_LOCKED().

PULLUP_LEN_LOCKED() could update mbuf and thus we need to update related
pointers that can be used in next opcodes.

Reported by: Maxime Villard <max at m00nbsd net>

NOTE: this commit also adds UPDATE_POINTERS() stub macro, that originally
is part of r345166 commit that was not merged.

355850 17-Dec-2019 ae

MFC r350413:
Avoid possible lock leaking.

After r343619 ipfw uses own locking for packets flow. PULLUP_LEN() macro
is used in ipfw_chk() to make m_pullup(). When m_pullup() fails, it just
returns via `goto pullup_failed`. There are two places where PULLUP_LEN()
is called with IPFW_PF_RLOCK() held.

Add PULLUP_LEN_LOCKED() macro to use in these places to be able release
the lock, when m_pullup() fails.

Sponsored by: Yandex LLC

NOTE: since r343619 was not merged, this commit is mostly NOP, but
it is needed to reduce code difference between stable and head/.

351387 22-Aug-2019 ae

MFC r351071:
Fix rule truncation on external action module unloading.

350583 05-Aug-2019 ae

MFC r350417:
Add ipfw_get_action() function to get the pointer to action opcode.

ACTION_PTR() returns pointer to the start of rule action section,
but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ,
and only then real action opcode is stored.

ipfw_get_action() function inspects the rule action section, skips
all modifiers and returns action opcode.

Use this function in ipfw_reset_eaction() and flush_nat_ptrs().

350138 19-Jul-2019 ae

MFC r349940:
Correctly truncate the rule in case when it has several action opcodes.

It is possible, that opcode at the ACTION_PTR() location is not real
action, but action modificator like "log", "tag" etc. In this case we
need to check for each opcode in the loop to find O_EXTERNAL_ACTION.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r349941:
Do not modify cmd pointer if it is already last opcode in the rule.

349648 03-Jul-2019 ae

MFC r349366:
Follow the RFC 3128 and drop short TCP fragments with offset = 1.

349647 03-Jul-2019 ae

MFC r349365:
Mark default rule with IPFW_RULE_NOOPT flag, so it can be showed in
compact form.

349573 01-Jul-2019 ae

MFC r349267:
Add "tcpmss" opcode to match the TCP MSS value.

With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.

# ipfw add deny log tcp from any to any tcpmss 0-500

349411 26-Jun-2019 ae

Fix the uninitialized use of source IPv6 address in NAT64LSN.

This code is already refactored in head/, but due to the missing
epoch(9) support it is impossible to merge. So, it is direct commit to
stable/11.

Reported by: Patrick M. Hausen <hausen punkt de>
Tested by: Patrick M. Hausen <hausen punkt de>
MFC after: 3 days

348997 12-Jun-2019 ae

MFC r348682:
Initialize V_nat64out methods explicitly.

It looks like initialization of static variable doesn't work for
VIMAGE and this leads to panic.

Approved by: re (gjb)

347333 08-May-2019 ae

MFC r346884:
Add IPv6 support for O_IPLEN opcode.

Obtained from: Yandex LLC

346212 14-Apr-2019 ae

MFC r345264:
Add NAT64 CLAT implementation as defined in RFC6877.

CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.

# ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
# ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
# ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in

Obtained from: Yandex LLC
Submitted by: Boris N. Lytochkin
Relnotes: yes
Sponsored by: Yandex LLC

346211 14-Apr-2019 ae

MFC r345263:
Add SPDX-License-Identifier and update year in copyright.

346210 14-Apr-2019 ae

MFC r345262:
Modify struct nat64_config.

Add second IPv6 prefix to generic config structure and rename another
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.

Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.

Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

346209 14-Apr-2019 ae

MFC r339542:
Retire IPFIREWALL_NAT64_DIRECT_OUTPUT kernel option. And add ability
to switch the output method in run-time. Also document some sysctl
variables that can by changed for NAT64 module.

NAT64 had compile time option IPFIREWALL_NAT64_DIRECT_OUTPUT to use
if_output directly from nat64 module. By default is used netisr based
output method. Now both methods can be used, but they require different
handling by rules.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D16647

346205 14-Apr-2019 ae

MFC r341471:
Reimplement how net.inet.ip.fw.dyn_keep_states works.

Turning on of this feature allows to keep dynamic states when parent
rule is deleted. But it works only when the default rule is
"allow from any to any".

Now when rule with dynamic opcode is going to be deleted, and
net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference
named objects corresponding to this rule, and also reference the rule.
And when ipfw_dyn_lookup_state() will find state for deleted parent rule,
it will return the pointer to the deleted rule, that is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.

The refcnt field was added to struct ip_fw to keep reference, also
next pointer added to be able iterate rules and not damage the content
when deleted rules are chained.

Named objects are referenced only when states are going to be deleted to
be able reuse kidx of named objects when new parent rules will be
installed.

ipfw_dyn_get_count() function was modified and now it also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.

ipfw_free_rule() was changed to be global, since now dynamic state can
free rule, when it is expired and references counters becomes 1.

External actions subsystem also modified, since external actions can be
deregisterd and instances can be destroyed. In these cases deleted rules,
that are referenced by orphaned states, must be modified to prevent access
to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance()
functions added for these purposes.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17532

MFC r341472:
Add ability to request listing and deleting only for dynamic states.

This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but
after rules reloading some state must be deleted. Added new flag '-D'
for such purpose.

Retire '-e' flag, since there can not be expired states in the meaning
that this flag historically had.

Also add "verbose" mode for listing of dynamic states, it can be enabled
with '-v' flag and adds additional information to states list. This can
be useful for debugging.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r344018:
Remove `set' field from state structure and use set from parent rule.

Initially it was introduced because parent rule pointer could be freed,
and rule's information could become inaccessible. In r341471 this was
changed. And now we don't need this information, and also it can become
stale. E.g. rule can be moved from one set to another. This can lead
to parent's set and state's set will not match. In this case it is
possible that static rule will be freed, but dynamic state will not.
This can happen when `ipfw delete set N` command is used to delete
rules, that were moved to another set.
To fix the problem we will use the set number from parent rule.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r344870:
Fix the problem with O_LIMIT states introduced in r344018.

dyn_install_state() uses `rule` pointer when it creates state.
For O_LIMIT states this pointer actually is not struct ip_fw,
it is pointer to O_LIMIT_PARENT state, that keeps actual pointer
to ip_fw parent rule. Thus we need to cache rule id and number
before calling dyn_get_parent_state(), so we can use them later
when the `rule` pointer is overrided.

PR: 236292

346201 14-Apr-2019 ae

MFC r342908:
Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64.
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.

ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64. Each time ipfw(4) does bzero() on it, and then it initializes
some fields.

glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).

Introduced new field flags, it is used to keep track of what fields was
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed. The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.

ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
rely on checking for NULL pointers.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18690

MFC r342909:
Fix the build with INVARIANTS.

MFC r343551:
Fix the bug introduced in r342908, that causes problems with dynamic
handling for protocols without ports numbers.

Since port numbers were uninitialized for protocols like ICMP/ICMPv6,
ipfw_chk() used some non-zero values to create dynamic states, and due
this it failed to match replies with created states.

Reported by: Oliver Hartmann, Boris Lytochkin
Obtained from: Yandex LLC

345259 18-Mar-2019 ae

MFC r345004 (with modification):
Add IP_FW_NAT64 to codes that ipfw_chk() can return.

It will be used by upcoming NAT64 changes. We use separate code
to avoid propogating EACCES error code to user level applications
when NAT64 consumes a packet.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

345257 18-Mar-2019 ae

MFC r345003:
Add NULL pointer check to nat64_output().

It is possible that a processed packet was originated by local host,
in this case m->m_pkthdr.rcvif is NULL. Check and set it to V_loif to
avoid NULL pointer dereference in IP input code, since it is expected
that packet has valid receiving interface when netisr processes it.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

343142 18-Jan-2019 ae

MFC 342925:
Relax requirement to packet size of CARP protocol and remove version check.

CARP shares protocol number 112 with VRRP (RFC 5798). And the size of
VRRP packet may be smaller than CARP. ipfw_chk() does m_pullup() to at
least sizeof(struct carp_header) and can fail when packet is VRRP. This
leads to packet drop and message about failed pullup attempt.
Also, RFC 5798 defines version 3 of VRRP protocol, this version number
also unsupported by CARP and such check leads to packet drop.

carp_input() does its own checks for protocol version and packet size,
so we can remove these checks to be able pass VRRP packets.

PR: 234207

341842 12-Dec-2018 ae

MFC r341469:
Add assertion to check that named object has correct type.

341462 04-Dec-2018 ae

MFC r341073:
Do not limit the mbuf queue length for keepalive packets.

It was unlimited before overhaul, and one user reported that this limit
can be reached easily.

PR: 233562

340956 26-Nov-2018 eugen

MFC r339810: ipfw: implement ngtee/netgraph actions for layer-2 frames.

Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".

Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:

ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0

PR: 213452
Tested-by: Fyodor Ustinov <ufm@ufm.su>

340544 18-Nov-2018 ae

Revert r340541. It requires VNET_DEFINE_STATIC() macro that is not yet
merged into stable/11.

340542 18-Nov-2018 ae

MFC r339544:
Call inet_ntop() only when its result is needed.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

340541 18-Nov-2018 ae

MFC r339542:
Retire IPFIREWALL_NAT64_DIRECT_OUTPUT kernel option. And add ability
to switch the output method in run-time. Also document some sysctl
variables that can by changed for NAT64 module.

NAT64 had compile time option IPFIREWALL_NAT64_DIRECT_OUTPUT to use
if_output directly from nat64 module. By default is used netisr based
output method. Now both methods can be used, but they require different
handling by rules.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D16647

340538 18-Nov-2018 ae

MFC r339545:
Do not decrement RST life time if keep_alive is not turned on.

This allows use differen values configured by user for sysctl variable
net.inet.ip.fw.dyn_rst_lifetime.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

339580 22-Oct-2018 ae

MFC r339357:
Add extra parentheses to fix "versrcreach" opcode, (oif != NULL) should
not be used as condition for ternary operator.

Submitted by: Tatsuki Makino <tatsuki_makino at hotmail dot com>

338082 20-Aug-2018 loos

MFC r321316, r337860:

Fix a few typos in comments.

337902 16-Aug-2018 ae

MFC r337469:
Use host byte order when comparing mss values.

This fixes tcp-setmss action on little endian machines.

PR: 225536
Submitted by: John Zielinski

337461 08-Aug-2018 ae

MFC r336132:
Add "record-state", "set-limit" and "defer-action" rule options to ipfw.

"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.

Submitted by: lev

336468 19-Jul-2018 ae

MFC r336219:
Use correct size when we are allocating array for skipto index.

Also, there is no need to use M_ZERO for idxmap_back. It will be
re-filled just after allocation in update_skipto_cache().

PR: 229665

334836 08-Jun-2018 ae

MFC r333403:
Bring in some last changes in NAT64 implementation:

o Modify ipfw(8) to be able set any prefix6 not just Well-Known,
and also show configured prefix6;
o relocate some definitions and macros into proper place;
o convert nat64_debug and nat64_allow_private variables to be
VNET-compatible;
o add struct nat64_config that keeps generic configuration needed
to NAT64 code;
o add nat64_check_prefix6() function to check validness of specified
by user IPv6 prefix according to RFC6052;
o use nat64_check_private_ip4() and nat64_embed_ip4() functions
instead of nat64_get_ip4() and nat64_set_ip4() macros. This allows
to use any configured IPv6 prefixes that are allowed by RFC6052;
o introduce NAT64_WKPFX flag, that is set when IPv6 prefix is
Well-Known IPv6 prefix. It is used to reduce overhead to check this;
o modify nat64lsn_cfg and nat64stl_cfg structures to use nat64_config
structure. And respectivelly modify the rest of code;
o remove now unused ro argument from nat64_output() function;
o remove __FreeBSD_version ifdef, NAT64 was not merged to older versions;
o add commented -DIPFIREWALL_NAT64_DIRECT_OUTPUT flag to module's Makefile
as example.

MFC r333406:
Update NAT64 documentation, now we support any IPv6 prefixes.

334149 24-May-2018 ae

MFC r333986:
Remove check for matching the rulenum, ruleid and rule pointer from
dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not
ready to be committed code, and they are there by accident.
Due to the race these checks can lead to creating of duplicate states
when concurrent threads in the same time will try to add state for two
packets of the same flow, but in reverse directions and matched by
different parent rules.

Reported by: lev

MFC r334039:
Restore the ability to keep states after parent rule deletion.

This feature is disabled by default and was removed when dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences - when dyn_keep_states sysctl variable is enabled,
dyn_match_ipv[46]_state() function doesn't match child states of deleted
rule. And thus they are keept alive until expired. ipfw_dyn_lookup_state()
function does check that state was not orphaned, and if so, it returns
pointer to default_rule and its position in the rules map. The main visible
difference is that orphaned states still have the same rule number that
they have before parent rule deleted, because now a state has many fields
related to rule and changing them all atomically to point to default_rule
seems hard enough.

Reported by: <lantw44 at gmail.com>
Approved by: re (kib)

332811 20-Apr-2018 ae

MFC r332467:
To avoid possible deadlock do not acquire JQUEUE_LOCK before callout_drain.

332772 19-Apr-2018 oleg

Fix ipfw table creation when net.inet.ip.fw.tables_sets = 0 and non zero set
specified on table creation. This fixes following:

# sysctl net.inet.ip.fw.tables_sets
net.inet.ip.fw.tables_sets: 0
# ipfw table all info
# ipfw set 1 table 1 create type addr
# ipfw set 1 table 1 create type addr
# ipfw add 10 set 1 count ip from table\(1\) to any
00010 count ip from table(1) to any
# ipfw add 10 set 1 count ip from table\(1\) to any
00010 count ip from table(1) to any
# ipfw table all info
--- table(1), set(1) ---
kindex: 4, type: addr
references: 1, valtype: legacy
algorithm: addr:radix
items: 0, size: 296
--- table(1), set(1) ---
kindex: 3, type: addr
references: 1, valtype: legacy
algorithm: addr:radix
items: 0, size: 296
--- table(1), set(1) ---
kindex: 2, type: addr
references: 0, valtype: legacy
algorithm: addr:radix
items: 0, size: 296
--- table(1), set(1) ---
kindex: 1, type: addr
references: 0, valtype: legacy
algorithm: addr:radix
items: 0, size: 296
#

332768 19-Apr-2018 ae

MFC r332459:
Fix integer types mismatch for flags field in nat64stl_cfg structure.

Also preserve internal flags on NAT64STL reconfiguration.

332767 19-Apr-2018 ae

MFC r332457:
Use cfg->nomatch_verdict as return value from NAT64LSN handler when
given mbuf is considered as not matched.

If mbuf was consumed or freed during handling, we must return
IP_FW_DENY, since ipfw's pfil handler ipfw_check_packet() expects
IP_FW_DENY when mbuf pointer is NULL. This fixes KASSERT panics
when NAT64 is used with INVARIANTS. Also remove unused nomatch_final
field from struct nat64lsn_cfg.

Reported by: Justin Holcomb <justin at justinholcomb dot me>

332766 19-Apr-2018 ae

MFC r332456:
Migrate NAT64 to FIB KPI.

332765 19-Apr-2018 ae

MFC r316825:
Use address of specific union member instead of whole union address to
fix PVS-Studio warnings.

MFC r316826:
Avoid undefined behavior.

The 'pktid' variable is modified while being used twice between
sequence points, probably due to htonl() is macro.

332401 11-Apr-2018 ae

MFC r328988,r328989:
Rework ipfw dynamic states implementation to be lockless on fast path.

o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685

332229 07-Apr-2018 tuexen

MFC r326233:

Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.

There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.

Thanks to Timo Voelker for helping me to test this patch.

MFC r327200:

When adding support for sending SCTP packets containing an ABORT chunk
to ipfw in https://svnweb.freebsd.org/changeset/base/326233,
a dependency on the SCTP stack was added to ipfw by accident.

This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594
where also a solution was suggested. This patch is based on Kevin's
suggestion, but implements the required SCTP checksum computation
without any dependency on other SCTP sources.

While there, do some cleanups and improve comments.

Thanks to Kevin Kevin Bowling for reporting the issue and suggesting
a fix.

332209 07-Apr-2018 tuexen

MFC r324216:

Fix a bug which avoided that rules for matching port numbers for SCTP
packets where actually matched.
While there, make clean in the man-page that SCTP port numbers are
supported in rules.

331201 19-Mar-2018 ae

MFC r330792:
Do not try to reassemble IPv6 fragments in "reass" rule.

ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.

PR: 170604

331151 18-Mar-2018 eadler

MFC r314955:

o Typo in the comment fixed.

328968 07-Feb-2018 ae

MFC r328326:
When IPv6 packet is handled by O_REJECT opcode, convert ICMP code
specified in the arg1 into ICMPv6 destination unreachable code according
to RFC7915.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

328772 02-Feb-2018 ae

MFC r328161:
Add UDPLite support to ipfw(4).

Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

326388 30-Nov-2017 ae

MFC r326086:
Add ipfw_add_protected_rule() function that creates rule with 65535
number in the reserved set 31. Use this function to create default rule.

MFC r326115:
Rework rule ranges matching. Use comparison rule id with UINT32_MAX to
match all rules with the same rule number.

MFC r326116:
Move ipfw_send_pkt() from ip_fw_dynamic.c into ip_fw2.c.
It is not specific for dynamic states function and called also from
generic code.

MFC r326117:
Check that address family of state matches address family of packet.
If it is not matched avoid comparing other state fields.

MFC r326118:
Modify ipfw's dynamic states KPI.

Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function return NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.

Differential Revision: https://reviews.freebsd.org/D11657

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

326142 24-Nov-2017 ae

MFC r325960:
Unconditionally enable support for O_IPSEC opcode.

IPsec support can be loaded as kernel module, thus do not depend from
kernel option IPSEC and always build O_IPSEC opcode implementation as
enabled.

MFC r325962:
Do not invoke IPv4 NAT handler for non IPv4 packets. Libalias expects
a packet is IPv4. And in case when it is IPv6, it just translates them
as IPv4. This leads to corruption and in some cases to panics.
In particular a panic can happen when value of ip6_plen modified to
something that leads to IP fragmentation, but actual packet length does
not match the IP length.

Packets that are not IPv4 will be dropped by NAT rule.

325730 12-Nov-2017 truckman

MFC r325008

Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).

Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.

Submitted by: lstewart
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12506

325229 31-Oct-2017 ae

MFC r324947:
Add IPv6 support for O_TCPDATALEN opcode.

PR: 222746

324790 20-Oct-2017 ae

MFC r324593:
Fix regression in handling O_FORWARD_IP opcode after r279948.

To properly handle 'fwd tablearg,port' opcode, copy sin_port value from
sockaddr_in structure stored in the opcode into corresponding hopstore
field.

PR: 222953

324047 27-Sep-2017 ae

MFC r323839:
Use in_localip() function instead of unlocked access to addresses hash
to determine that an address is our local.

PR: 220078

324046 27-Sep-2017 ae

MFC r323836:
Do not acquire IPFW_WLOCK when a named object is created and destroyed.

Acquiring of IPFW_WLOCK is requried for cases when we are going to
change some data that can be accessed during processing of packets flow.
When we create new named object, there are not yet any rules, that
references it, thus holding IPFW_UH_WLOCK is enough to safely update
needed structures. When we destroy an object, we do this only when its
reference counter becomes zero. And it is safe to not acquire IPFW_WLOCK,
because noone references it. The another case is when we failed to finish
some action and thus we are doing rollback and destroying an object, in
this case it is still not referenced by rules and no need to acquire
IPFW_WLOCK.

This also fixes panic with INVARIANTS due to recursive IPFW_WLOCK acquiring.

Sponsored by: Yandex LLC

321811 31-Jul-2017 philip

MFC r320941: Fix GRE over IPv6 tunnels with IPFW

Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).

PR: 220640
Submitted by: Kun Xie <kxie@xiplink.com>

320593 03-Jul-2017 ae

MFC r320479:
Fix IPv6 extension header parsing. The length field doesn't include the
first 8 octets.

Obtained from: Yandex LLC

318904 25-May-2017 truckman

MFC r318527

Fix the queue delay estimation in PIE/FQ-PIE when the timestamp
(TS) method is used. When packet timestamp is used, the "current_qdelay"
keeps storing the last queue delay value calculated in the dequeue
function. Therefore, when a burst of packets arrives followed by
a pause, the "current_qdelay" will store a high value caused by the
burst and stick to that value during the pause because the queue
delay measurement is done inside the dequeue function. This causes
the drop probability calculation function to calculate high drop
probability value instead of zero and prevents the burst allowance
mechanism from working properly. Fix this problem by resetting
"current_qdelay" inside the drop probability calculation function
when the queue length is zero and TS option is used.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

318885 25-May-2017 truckman

MFC r318511

The result of right shifting a negative signed value is implementation
defined. On machines without arithmetic shift instructions, zero bits
may be shifted in from the left, giving a large positive result instead
of the desired divide-by power-of-2. Fix this by operating on the
absolute value and compensating for the possible negation later.

Reverse the order of the underflow/overflow tests and the exponential
decay calculation to avoid the possibility of an erroneous overflow
detection if p is a sufficiently small non-negative value. Also
check for negative values of prob before doing the exponential decay
to avoid another instance of of right shifting a negative value.

Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

318154 10-May-2017 marius

MFC: r311817

In dummynet(4), random chunks of memory are casted to struct dn_*,
potentially leading to fatal unaligned accesses on architectures with
strict alignment requirements. This change fixes dummynet(4) as far
as accesses to 64-bit members of struct dn_* are concerned, tripping
up on sparc64 with accesses to 32-bit members happening to be correctly
aligned there. In other words, this only fixes the tip of the iceberg;
larger parts of dummynet(4) still need to be rewritten in order to
properly work on all of !x86.
In principle, considering the amount of code in dummynet(4) that needs
this erroneous pattern corrected, an acceptable workaround would be to
declare all struct dn_* packed, forcing compilers to do byte-accesses
as a side-effect. However, given that the structs in question aren't
laid out well either, this would break ABI/KBI.
While at it, replace all existing bcopy(9) calls with memcpy(9) for
performance reasons, as there is no need to check for overlap in these
cases.

PR: 189219

317488 27-Apr-2017 truckman

MFC r316777 (by cem)

dummynet: Use strlcpy to appease static checkers

Some dummynet modules used strcpy() to copy from a larger buffer
(dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that
the lengths of the strings in the dn_aqm buffers were always hardcoded to be
smaller than the dn_extra_parms buffer ("CODEL", "PIE").

Use strlcpy() instead, to appease static checkers. No functional change.

Reported by: Coverity
CIDs: 1356163, 1356165
Sponsored by: Dell EMC Isilon

317262 21-Apr-2017 ae

MFC r316824:
The rule field in the ipfw_dyn_rule structure is used as storage
to pass rule number and rule set to userland. In r272840 the kernel
internal rule representation was changed and the rulenum field of
struct ip_fw_rule got the type uint32_t, but userlevel representation
still have the type uint16_t. To not overflow the size of pointer
on the systems with 32-bit pointer size use separate variable to
copy rulenum and set.

Reported by: PVS-Studio

317045 17-Apr-2017 ae

MFC r316435:
Add ipfw_pmod kernel module.

The module is designed for modification of a packets of any protocols.
For now it implements only TCP MSS modification. It adds the external
action handler for "tcp-setmss" action.

A rule with tcp-setmss action does additional check for protocol and
TCP flags. If SYN flag is present, it parses TCP options and modifies
MSS option if its value is greater than configured value in the rule.
Then it adjustes TCP checksum if needed. After handling the search
continues with the next rule.

Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D10150

317044 17-Apr-2017 ae

MFC r316433:
Add the log formatting for an external action opcode.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

317043 17-Apr-2017 ae

MFC r316434:
Add O_EXTERNAL_DATA opcode support.

This opcode can be used to attach some data to external action opcode.
And unlike to O_EXTERNAL_INSTANCE opcode, this opcode does not require
creating of named instance to pass configuration arguments to external
action handler. The data is coming just next to O_EXTERNAL_ACTION opcode.

The userlevel part currenly supports formatting for opcode with ipfw_insn
size, by default it expects u16 numeric value in the arg1.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

317042 17-Apr-2017 ae

MFC r316461:
Remove "IPFW static rules" rmlock.

Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D10154

316605 07-Apr-2017 ae

MFC r316329:
Reset the cached state of last lookup in the dynamic states when an
external action is completed, but the rule search is continued.

External action handler can change the content of @args argument,
that is used for dynamic state lookup. Enforce the new lookup to be able
install new state, when the search is continued.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

316446 03-Apr-2017 ae

MFC r304041:
Move logging via BPF support into separate file.

* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304043:
Add three helper function to manage tables from external modules.

ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304046, 304108:
Add ipfw_nat64 module that implements stateless and stateful NAT64.

The module works together with ipfw(4) and implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

A configuration of instance should looks like this:
1. Create lookup tables:
# ipfw table T46 create type addr valtype ipv6
# ipfw table T64 create type addr valtype ipv4
2. Fill T46 and T64 tables.
3. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
4. Create NAT64 instance:
# ipfw nat64stl NAT create table4 T46 table6 T64
5. Add rules that matches the traffic:
# ipfw add nat64stl NAT ip from any to table(T46)
# ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.

A configuration of instance should looks like this:
1. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
2. Create NAT64 instance:
# ipfw nat64lsn NAT create prefix4 A.B.C.D/28
3. Add rules that matches the traffic:
# ipfw add nat64lsn NAT ip from any to A.B.C.D/28
# ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6434

MFC r304048:
Replace __noinline with special debug macro NAT64NOINLINE.

MFC r304061:
Use %ju to print unsigned 64-bit value.

MFC r304076:
Make statistics nat64lsn, nat64stl an nptv6 output netstat-like:
"@value @description" and fix build due to -Wformat errors.

MFC r304378 (by bz):
Try to fix gcc compilation errors (which are right).
nat64_getlasthdr() returns an int, which can be -1 in case of error,
storing the result in an uint8_t and then comparing to < 0 is not
helpful. Do what is done in the rest of the code and make proto an
int here as well.

MFC r309187:
Fix ICMPv6 Time Exceeded error message translation.

MFC r314718:
Use new ipfw_lookup_table() in the nat64 too.

MFC r315204,315233:
Use memset with structure size.

316444 03-Apr-2017 ae

MFC r303012:
Add ipfw_nptv6 module that implements Network Prefix Translation for IPv6
as defined in RFC 6296. The module works together with ipfw(4) and
implemented as its external action module. When it is loaded, it registers
as eaction and can be used in rules. The usage pattern is similar to
ipfw_nat(4). All matched by rule traffic goes to the NPT module.

Reviewed by: hrs
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6420

MFC r304049:
Add `stats reset` command implementation to NPTv6 module
to be able reset statistics counters.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304076:
Make statistics nat64lsn, nat64stl an nptv6 output netstat-like:
"@value @description" and fix build due to -Wformat errors.

MFC r314507:
Fix NPTv6 rule counters when one_pass is not enabled.

Consider the rule matching when both @done and @retval values
returned from ipfw_run_eaction() are zero. And modify ipfw_nptv6()
to return IP_FW_DENY and @done=0 when addresses do not match.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

316324 31-Mar-2017 truckman

MFC r315516

Change several constants used by the PIE algorithm from unsigned to signed.

- PIE_MAX_PROB is compared to variable of int64_t and the type promotion
rules can cause the value of that variable to be treated as unsigned.
If the value is actually negative, then the result of the comparsion
is incorrect, causing the algorithm to perform poorly in some
situations. Changing the constant to be signed cause the comparision
to work correctly.

- PIE_SCALE is also compared to signed values. Fortunately they are
also compared to zero and negative values are discarded so this is
more of a cosmetic fix.

- PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small
enough that the automatic promotion to unsigned is harmless.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

316274 30-Mar-2017 ae

MFC r303018:
Add named dynamic states support to ipfw(4).

The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced to default.

Reviewed by: julian
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6674

MFC r304087:
Do not warn about ambiguous state name when we inspect a comment token.

MFC r304089:
Add an ability to attach comment to check-state rules.

MFC r310727 (by marius):
Fix a bug in r272840; given that the optlen parameter of setsockopt(2)
is a 32-bit socklen_t, do_get3() passes the kernel to access the wrong
32-bit half on big-endian LP64 machines when simply casting the 64-bit
size_t optlen to a socklen_t pointer.
While at it and given that the intention of do_get3() apparently is to
hide/wrap the fact that socket options are used for communication with
ipfw(4), change the optlen parameter of do_set3() to be of type size_t
and as such more appropriate than uintptr_t, too.

MFC r315305:
Change the syntax of ipfw's named states.

Since the state name is an optional argument, it often can conflict
with other options. To avoid ambiguity now the state name must be
prefixed with a colon.

Sponsored by: Yandex LLC

315532 19-Mar-2017 ae

MFC r314716:
Add IPv6 support to O_IP_DST_LOOKUP opcode.

o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of
ipfw_insn_u32;
o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and
remove old ipfw_lookup_table();
o use args->f_id.flow_id6 that is in host byte order to get DSCP value;
o add SCTP ports support to 'lookup src/dst-port' opcode;
o add IPv6 support to 'lookup src/dst-ip' opcode.

PR: 217292
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9873

315456 17-Mar-2017 vangyzen

MFC r313821 r315277 r315286

Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel.

inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack, except for KTR messages.
KTR can correctly log the immediate integral values passed to it,
as well as constant strings, but not non-constant strings,
since they might change by the time ktrdump retrieves them.
Therefore, use hex notation in KTR messages.

Sponsored by: Dell EMC

315221 14-Mar-2017 pfg

MFC r313982, r314068:
sys: Replace zero with NULL for pointers.

Found with: devel/coccinelle


/freebsd-11-stable/sys/amd64/amd64/db_disasm.c
/freebsd-11-stable/sys/amd64/amd64/pmap.c
/freebsd-11-stable/sys/boot/common/md.c
/freebsd-11-stable/sys/boot/efi/libefi/efinet.c
/freebsd-11-stable/sys/boot/fdt/fdt_overlay.c
/freebsd-11-stable/sys/boot/ficl/ficl.c
/freebsd-11-stable/sys/boot/kshim/bsd_kernel.c
/freebsd-11-stable/sys/boot/ofw/libofw/ofw_memory.c
/freebsd-11-stable/sys/boot/sparc64/loader/main.c
/freebsd-11-stable/sys/boot/userboot/userboot/userboot_disk.c
/freebsd-11-stable/sys/boot/zfs/zfs.c
/freebsd-11-stable/sys/boot/zfs/zfsimpl.c
/freebsd-11-stable/sys/dev/agp/agp.c
/freebsd-11-stable/sys/dev/an/if_an.c
/freebsd-11-stable/sys/dev/arcmsr/arcmsr.c
/freebsd-11-stable/sys/dev/bce/if_bce.c
/freebsd-11-stable/sys/dev/beri/virtio/virtio_block.c
/freebsd-11-stable/sys/dev/buslogic/bt_pci.c
/freebsd-11-stable/sys/dev/ce/if_ce.c
/freebsd-11-stable/sys/dev/cm/smc90cx6.c
/freebsd-11-stable/sys/dev/cp/if_cp.c
/freebsd-11-stable/sys/dev/ctau/ctddk.c
/freebsd-11-stable/sys/dev/ctau/if_ct.c
/freebsd-11-stable/sys/dev/cx/cxddk.c
/freebsd-11-stable/sys/dev/cx/if_cx.c
/freebsd-11-stable/sys/dev/de/if_de.c
/freebsd-11-stable/sys/dev/ed/if_ed.c
/freebsd-11-stable/sys/dev/fatm/if_fatm.c
/freebsd-11-stable/sys/dev/fe/if_fe.c
/freebsd-11-stable/sys/dev/firewire/if_fwip.c
/freebsd-11-stable/sys/dev/hptiop/hptiop.c
/freebsd-11-stable/sys/dev/hptmv/entry.c
/freebsd-11-stable/sys/dev/hptmv/gui_lib.c
/freebsd-11-stable/sys/dev/hptmv/hptproc.c
/freebsd-11-stable/sys/dev/hptmv/ioctl.c
/freebsd-11-stable/sys/dev/iicbus/if_ic.c
/freebsd-11-stable/sys/dev/isp/isp_pci.c
/freebsd-11-stable/sys/dev/le/am7990.c
/freebsd-11-stable/sys/dev/le/am79900.c
/freebsd-11-stable/sys/dev/le/lance.c
/freebsd-11-stable/sys/dev/md/md.c
/freebsd-11-stable/sys/dev/ncr/ncr.c
/freebsd-11-stable/sys/dev/ofw/ofw_bus_subr.c
/freebsd-11-stable/sys/dev/patm/if_patm_tx.c
/freebsd-11-stable/sys/dev/pccard/pccard.c
/freebsd-11-stable/sys/dev/pms/RefTisa/sallsdk/spc/sainit.c
/freebsd-11-stable/sys/dev/pms/RefTisa/tisa/sassata/common/tdioctl.c
/freebsd-11-stable/sys/dev/pms/freebsd/driver/ini/src/agtiapi.c
/freebsd-11-stable/sys/dev/ppbus/if_plip.c
/freebsd-11-stable/sys/dev/ppbus/ppbconf.c
/freebsd-11-stable/sys/dev/ppc/ppc.c
/freebsd-11-stable/sys/dev/sbni/if_sbni_isa.c
/freebsd-11-stable/sys/dev/sn/if_sn.c
/freebsd-11-stable/sys/dev/sym/sym_hipd.c
/freebsd-11-stable/sys/dev/vx/if_vx.c
/freebsd-11-stable/sys/libkern/iconv_xlat16.c
/freebsd-11-stable/sys/net/if_fddisubr.c
/freebsd-11-stable/sys/net/if_iso88025subr.c
/freebsd-11-stable/sys/net/iflib.c
ip_fw_sockopt.c
315191 13-Mar-2017 ae

MFC r314715:
Reject invalid object types that can not be used with specific opcodes.

When we doing reference counting of named objects in the new rule,
for existing objects check that opcode references to correct object,
otherwise return EINVAL.

PR: 217391

314990 10-Mar-2017 ae

MFC r314614:
Fix matching table entry value. Use real table value instead of its index
in valuestate array.

When opcode has size equal to ipfw_insn_u32, this means that it should
additionally match value specified in d[0] with table entry value.
ipfw_table_lookup() returns table value index, use TARG_VAL() macro to
convert it to its value. The actual 32-bit value stored in the tag field
of table_value structure, where all unspecified u32 values are kept.

PR: 217262

313725 14-Feb-2017 ngie

MFC r313356:

Fix typos in comments (returing -> returning)

312677 24-Jan-2017 ae

MFC r312341:
Initialize IPFW static rules rmlock with RM_RECURSE flag.

This lock was replaced from rwlock in r272840. But unlike rwlock, rmlock
doesn't allow recursion on rm_rlock(), so at this time fix this with
RM_RECURSE flag. Later we need to change ipfw to avoid such recursions.

PR: 216171

310015 13-Dec-2016 ae

MFC r309660:
Convert result of hash_packet6() into host byte order.

For IPv4 similar function uses addresses and ports in host byte order,
but for IPv6 it used network byte order. This led to very bad hash
distribution for IPv6 flows. Now the result looks similar to IPv4.

308749 17-Nov-2016 loos

MFC r308237:

Remove the mbuf tag after use (for reinjected packets).

Fixes the packet processing in dummynet l2 rules.

Obtained from: pfSense
Sponsored by: Rubicon Communications, LLC (Netgate)

308660 15-Nov-2016 loos

Stop abusing from struct ifnet presence to determine the packet direction
for dummynet, use the correct argument for that, remove the false coment
about the presence of struct ifnet.

Fixes the input match of dummynet l2 rules.

Obtained from: pfSense
Sponsored by: Rubicon Communications, LLC (Netgate)

307970 26-Oct-2016 ae

MFC r307628:
Fix `ipfw table lookup` handler to return entry value, but not its index.

306475 30-Sep-2016 ae

MFC r305940:
Move opcode rewriter init and destroy handlers into non-VNET code.

PR: 212576,212649,212077
Submitted by: John Zielinski

306025 20-Sep-2016 ae

MFC r305778:
Fix swap tables between sets when this functional is enabled.

We have 6 opcode rewriters for table opcodes. When `set swap' command
invoked, it is called for each rewriter, so at the end we get the same
result, because opcode rewriter uses ETLV type to match opcode. And all
tables opcodes have the same ETLV type. To solve this problem, use
separate sets handler for one opcode rewriter. Use it to handle TEST_ALL,
SWAP_ALL and MOVE_ALL commands.

PR: 212630

304843 26-Aug-2016 kib

MFC r303382:
Provide the getboottime(9) and getboottimebin(9) KPI.

MFC r303387:
Prevent parallel tc_windup() calls. Keep boottime in timehands,
and adjust it from tc_windup().

MFC notes:

The boottime and boottimebin globals are still exported from
the kernel dyn symbol table in stable/11, but their declarations are
removed from sys/time.h. This preserves KBI but not KPI, while all
in-tree consumers are converted to getboottime().

The variables are updated after tc_setclock_mtx is dropped, which gives
approximately same unlocked bugs as before.

The boottime and boottimebin locals in several sys/kern_tc.c functions
were renamed by adding the '_x' suffix to avoid name conficts.

304415 18-Aug-2016 oleg

MFC r304154

Fix command: ipfw set (enable|disable) N (where N > 4).

304079 14-Aug-2016 ae

MFC r303955:
Restore "nat global" support.

Now zero value of arg1 used to specify "tablearg", use the old "tablearg"
value for "nat global". Introduce new macro IP_FW_NAT44_GLOBAL to replace
hardcoded magic number to specify "nat global". Also replace 65535 magic
number with corresponding macro. Fix typo in comments.

PR: 211256

302927 16-Jul-2016 truckman

MFC r302667

Fix problems in the FQ-PIE AQM cleanup code that could leak memory or
cause a crash.

Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup()
is not able to use callout_drain() to make sure that all callouts are
finished before it returns, and callout_stop() is not sufficient to make
that guarantee. After pie_cleanup() returns, dummynet will free a
structure that any remaining callouts will want to access.

Fix these problems by allocating a separate structure to contain the
data used by the callouts. In pie_cleanup(), call callout_reset_sbt()
to replace the normal callout with a cleanup callout that does the cleanup
work for each sub-queue. The instance of the cleanup callout that
destroys the last flow will also free the extra allocated block of memory.
Protect the reference count manipulation in the cleanup callout with
DN_BH_WLOCK() to be consistent with all of the other usage of the reference
count where this lock is held by the dummynet code.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7174

302408 08-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
302338 05-Jul-2016 truckman

Fix a race condition between the main thread in aqm_pie_cleanup() and the
callout thread that can cause a kernel panic. Always do the final cleanup
in the callout thread by passing a separate callout function for that task
to callout_reset_sbt().

Protect the ref_count decrement in the callout with DN_BH_WLOCK(). All
other ref_count manipulation is protected with this lock.

There is still a tiny window between ref_count reaching zero and the end
of the callout function where it is unsafe to unload the module. Fixing
this would require the use of callout_drain(), but this can't be done
because dummynet holds a mutex and callout_drain() might sleep.

Remove the callout_pending(), callout_active(), and callout_deactivate()
calls from calculate_drop_prob(). They are not needed because this callout
uses callout_init_mtx().

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
Approved by: re (gjb)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D6928


302302 30-Jun-2016 bz

In case of the global eventhandler make sure the current VNET
is still operational before doing any work; otherwise we might
run into, e.g., destroyed locks.

PR: 210724
Reported by: olevole olevole.ru
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Obtained from: projects/vnet
Approved by: re (gjb)


302290 30-Jun-2016 bz

Move the ipfw_log_bpf() calls from global module initialisation to
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.

Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


302054 21-Jun-2016 bz

Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747


301440 05-Jun-2016 melifaro

Fix 4-byte overflow in ipv6_writemask.

This bug could cause some IPv6 table prefix delete requests to fail.

Obtained from: Yandex LLC


301162 01-Jun-2016 truckman

Replace constant expressions that contain multiplications by
fractional floating point values with integer divides. This will
eliminate any chance that the compiler will generate code to evaluate
the expression using floating point at runtime.

Suggested by: bde
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 8 days (with r300779 and r300949)


300949 29-May-2016 truckman

Cast some expressions that multiply a long long constant by a
floating point constant to int64_t. This avoids the runtime
conversion of the the other operand in a set of comparisons from
int64_t to floating point and doing the comparisions in floating
point.

Suggested by: lidl
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 2 weeks (with r300779)


300783 26-May-2016 truckman

Correct a typo in a comment.

MFC after: 2 weeks (with r300779)


300781 26-May-2016 truckman

Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak
its expression to work on powerpc and sparc64 (gcc compatibility).

Correct a typo in a nearby comment.

MFC after: 2 weeks (with r300779)


300779 26-May-2016 truckman

Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE).

Centre for Advanced Internet Architectures

Implementing AQM in FreeBSD

* Overview <http://caia.swin.edu.au/freebsd/aqm/index.html>

* Articles, Papers and Presentations
<http://caia.swin.edu.au/freebsd/aqm/papers.html>

* Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html>

Overview

Recent years have seen a resurgence of interest in better managing
the depth of bottleneck queues in routers, switches and other places
that get congested. Solutions include transport protocol enhancements
at the end-hosts (such as delay-based or hybrid congestion control
schemes) and active queue management (AQM) schemes applied within
bottleneck queues.

The notion of AQM has been around since at least the late 1990s
(e.g. RFC 2309). In recent years the proliferation of oversized
buffers in all sorts of network devices (aka bufferbloat) has
stimulated keen community interest in four new AQM schemes -- CoDel,
FQ-CoDel, PIE and FQ-PIE.

The IETF AQM working group is looking to document these schemes,
and independent implementations are a corner-stone of the IETF's
process for confirming the clarity of publicly available protocol
descriptions. While significant development work on all three schemes
has occured in the Linux kernel, there is very little in FreeBSD.

Project Goals

This project began in late 2015, and aims to design and implement
functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ_PIE
in FreeBSD (with code BSD-licensed as much as practical). We have
chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall
and traffic shaper. Implementation of these AQM schemes in FreeBSD
will:
* Demonstrate whether the publicly available documentation is
sufficient to enable independent, functionally equivalent implementations

* Provide a broader suite of AQM options for sections the networking
community that rely on FreeBSD platforms

Program Members:

* Rasool Al Saadi (developer)

* Grenville Armitage (project lead)

Acknowledgements:

This project has been made possible in part by a gift from the
Comcast Innovation Fund.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
X-No objection: core
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6388


300302 20-May-2016 ae

Fix the regression introduced in r300143.
When we are creating new dynamic state use MATCH_FORWARD direction to
correctly initialize protocol's state.


300143 18-May-2016 ae

Move protocol state handling code from lookup_dyn_rule_locked() function
into dyn_update_proto_state(). This allows eliminate the second state
lookup in the ipfw_install_state().
Also remove MATCH_* macros, they are defined in ip_fw_private.h as enum.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


300021 17-May-2016 ae

Make named objects set-aware. Now it is possible to create named
objects with the same name in different sets.

Add optional manage_sets() callback to objects rewriting framework.
It is intended to implement handler for moving and swapping named
object's sets. Add ipfw_obj_manage_sets() function that implements
generic sets handler. Use new callback to implement sets support for
lookup tables.
External actions objects are global and they don't support sets.
Modify eaction_findbyname() to reflect this.
ipfw(8) now may fail to move rules or sets, because some named objects
in target set may have conflicting names.
Note that ipfw_obj_ntlv type was changed, but since lookup tables
actually didn't support sets, this change is harmless.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


299420 11-May-2016 ae

Fix memory leak possible in error case.
Use free_rule() instead of free(), it will also release memory allocated
for rule counters.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


299152 06-May-2016 ae

Change the type of objhash_cb_t callback function to be able return an
error code. Use it to interrupt the loop in ipfw_objhash_foreach().

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


299136 05-May-2016 ae

Rename find_name_tlv_type() to ipfw_find_name_tlv_type() and make it
global. Use it in ip_fw_table.c instead of find_name_tlv() to reduce
duplicated code.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


298995 03-May-2016 pfg

sys/net*: minor spelling fixes.

No functional change.


298702 27-Apr-2016 ae

Make create_object callback optional and return EOPNOTSUPP when it isn't
defined. Remove eaction_create_compat() and use designated initializers to
initialize eaction_opcodes structure.

Obtained from: Yandex LLC


298048 15-Apr-2016 pfg

netpfil: for pointers replace 0 with NULL.

These are mostly cosmetical, no functional change.

Found with devel/coccinelle.

Reviewed by: ae


298016 14-Apr-2016 ae

Add External Actions KPI to ipfw(9).

It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


298003 14-Apr-2016 ae

Change the type of 'etlv' field in struct named_object to uint16_t.
It should match with the type field in struct ipfw_obj_tlv.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


298001 14-Apr-2016 ae

Adjust some comments and make ref_opcode_object() static.


298000 14-Apr-2016 ae

o Teach opcode rewriting framework handle several rewriters for
the same opcode.

o Reduce number of times classifier callback is called. It is
redundant to call it just after find_op_rw(), since the last
does call it already and can have all results.

o Do immediately opcode rewrite in the ref_opcode_object().
This eliminates additional classifier lookup later on bulk update.
For unresolved opcodes the behavior still the same, we save information
from classifier callback in the obj_idx array, then perform automatic
objects creation, then perform rewriting for opcodes using indeces
from created objects.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


297992 14-Apr-2016 ae

Move several functions related to opcode rewriting framework from
ip_fw_table.c into ip_fw_sockopt.c and make them static.

Obtained from: Yandex LLC


297793 10-Apr-2016 pfg

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


296348 03-Mar-2016 ae

Use correct size for malloc.

Obtained from: Yandex LLC
MFC after: 1 week


296272 01-Mar-2016 jhb

Remove taskqueue_enqueue_fast().

taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.

Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131


295969 24-Feb-2016 ae

Fix bug in filling and handling ipfw's O_DSCP opcode.
Due to integer overflow CS4 token was handled as BE.

PR: 207459
MFC after: 1 week


295766 18-Feb-2016 glebius

Fix obvious typo, that lead to incorrect sorting.

Found by: PVS-Studio


295126 01-Feb-2016 glebius

These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h


294882 27-Jan-2016 luigi

cleanup and document in some detail the internals of the testing code
for dummynet schedulers


294881 27-Jan-2016 luigi

the _Static_assert was not supposed to be in the commit.


294879 27-Jan-2016 luigi

bugfix: the scheduler template (dn_schk) for the round robin scheduler
is followed by another structure (rr_schk) whose size must be set
in the schk_datalen field of the descriptor.
Not allocating the memory may cause other memory to be overwritten
(though dn_schk is 192 bytes and rr_schk only 12 so we may be lucky
and end up in the padding after the dn_schk).

This is a merge candidate for stable and 10.3

MFC after: 3 days


294859 26-Jan-2016 luigi

fix various warnings to compile the test code with -Wextra


294858 26-Jan-2016 luigi

fix various warnings (signed/unsigned, printf types, unused arguments)


294857 26-Jan-2016 luigi

prevent warnings for signed/unsigned comparisons and unused arguments.
Add checks for parameters overflowing 32 bit.


294856 26-Jan-2016 luigi

prevent warning for unused argument


294855 26-Jan-2016 luigi

avoid warnings for signed/unsigned comparison and unused arguments


294761 26-Jan-2016 luigi

Revert one chunk from commit 285362, which introduced an off-by-one error
in computing a shift index. The error was due to the use of mixed
fls() / __fls() functions in another implementation of qfq.
To avoid that the problem occurs again, properly document which
incarnation of the function we need.
Note that the bug only affects QFQ in FreeBSD head from last july, as
the patch was not merged to other versions.


294706 25-Jan-2016 melifaro

MFP r287070,r287073: split radix implementation and route table structure.

There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).

New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.


294525 21-Jan-2016 melifaro

Fix panic on table/table entry delete. The panic could have happened
if more than 64 distinct values had been used.

Table value code uses internal objhash API which requires unique key
for each object. For value code, pointer to the actual value data
is used. The actual problem arises from the fact that 'actual' e.g.
runtime data is stored in array and that array is auto-growing. There is
special hook (update_tvalue() function) which is used to update the pointers
after the change. For some reason, object 'key' was not updated.
Fix this by adding update code to the update_tvalue().

Sponsored by: Yandex LLC


293630 10-Jan-2016 melifaro

Initialize error value ta_lookup_kfib() by default to please compiler.


293629 10-Jan-2016 bz

Initialize error after r293626 in case neither INET nor INET6 is
compiled into the kernel. Ideally lots more code would just not
be called (or compiled in) in that case but that requires a lot
more surgery. For now try to make IP-less kernels compile again.


293626 10-Jan-2016 melifaro

Make ipfw addr:kfib lookup algo use new routing KPI.


293625 10-Jan-2016 melifaro

Use already pre-calculated number of entries instead of tc->count.


292254 15-Dec-2015 hselasky

Properly drain callouts in the IPFW subsystem to avoid use after free
panics when unloading the dummynet and IPFW modules:

- The callout drain function can sleep and should not be called having
a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)".

- Add a new "dn_gone" variable to prevent asynchronous restart of
dummynet callouts when unloading the dummynet kernel module.

- Call "dn_reschedule()" locked so that "dn_gone" can be set and
checked atomically with regard to starting a new callout.

Reviewed by: hiren
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3855


291993 08-Dec-2015 melifaro

Merge helper fib* functions used for basic lookups.

Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
I have?". "what is the MTU?", "Give me the IPv4 source address to use",
etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
dealing with IPv6 mtu, finding "address" ifp (almost never done right),
provide easy-to-use API hiding all the complexity and returning the
needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
and rtentry WLOCK).

Sponsored by: Yandex LLC


291222 23-Nov-2015 ae

Add destroy_object callback to object rewriting framework.
It is called when last reference to named object is going to be released
and allows to do additional cleanup for implementation of named objects.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


291001 17-Nov-2015 bdrewery

Fix dynamic IPv6 rules showing junk for non-specified address masks.

For example:
00002 0 0 (19s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
00002 4 412 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
00002 10 777 (1s) LIMIT tcp 2001:894:5a24:653::503:1 52023 <-> 2001:894:5a24:653:ca0a:a9ff:fe04:3978 22
00002 0 0 (17s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> 80f3:70d:23fe:ffff:1005:: 0

Fix this by zeroing the unused address, as is done for IPv4:
00002 0 0 (18s) PARENT 1 tcp 10.10.0.5 0 <-> 0.0.0.0 0
00002 36 14952 (1s) LIMIT tcp 10.10.0.5 25848 <-> 10.10.0.7 22
00002 0 0 (0s) PARENT 1 tcp 2001:894:5a24:653::503:1 0 <-> :: 0
00002 4 345 (274s) LIMIT tcp 2001:894:5a24:653::503:1 34131 <-> 2001:470:1f11:262:ca0a:a9ff:fe04:3978 22

MFC after: 2 weeks


290545 08-Nov-2015 melifaro

Print proper setfib values in ipfw log.

Submitted by: Denis Schneider <v1ne2go at gmail>


290543 08-Nov-2015 melifaro

Fix setfib target.
Problem was introduced in r272840 when converting tablearg value to 0.

Submitted by: Denis Schneider <v1ne2go at gmail>


290345 03-Nov-2015 ae

Remove now obsolete KASSERT.
Actually, object classify callbacks can skip some opcodes, that could
be rewritten. We will deteremine real numbed of rewritten opcodes a bit
later in this function.

Reported by: David H. Wolfskill <david at catwhisker dot org>


290334 03-Nov-2015 ae

Eliminate any conditional increments of object_opcodes in the
check_ipfw_rule_body() function. This function is intended to just
determine that rule has some opcodes that can be rewrited. Then the
ref_rule_objects() function will determine real number of rewritten
opcodes using classify callback.

Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC


290332 03-Nov-2015 ae

Add ipfw_check_object_name_generic() function to do basic checks for an
object name correctness. Each type of object can do more strict checking
in own implementation. Do such checks for tables in check_table_name().

Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC


290330 03-Nov-2015 ae

Implement `ipfw internal olist` command to list named objects.

Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC


288530 03-Oct-2015 melifaro

Bump number of prefixes in O_IP_<SRC|DST> from 15 to 31 (max possible).

PR: 203459
Submitted by: groos at xiplink.com
MFC after: 2 weeks


287195 27-Aug-2015 melifaro

Fix packets/bytes accounting on i386.

Spotted by: julian


286003 29-Jul-2015 ae

Reduce overhead of ipfw's me6 opcode.

Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


285712 20-Jul-2015 ae

Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC


285362 10-Jul-2015 luigi

assorted algorithmic fixes from Paolo Valente (one of my qfq coauthors):
- use 1ULL to avoid shift truncations
- recompute the sum of weight dynamically to provide better fairness
- fix an erroneous constant in the computation of the slot
- preserve timestamp correctness when the old timestamp is stale.


285361 10-Jul-2015 luigi

one more warning suppression when compiling the test code in userspace.


285360 10-Jul-2015 luigi

add code to compute fairness indexes;
cleanups to remove compile warnings.


283291 22-May-2015 jkim

CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks


283116 19-May-2015 luigi

use proper types to represent function pointers


283113 19-May-2015 luigi

remove a redundant ; at the end of a function

MFC after: 1 week


283111 19-May-2015 luigi

remove an extra ; after MODULE_DEPEND
(would otherwise generate a warning with more verbose compiler flags)

MFC after: 1 week


282856 13-May-2015 luigi

bugfix (only affecting the "lookup" option in the userspace version of ipfw):

the conditional block should not include the 'else' otherwise
the code does a 'break;' without completing the check


282825 12-May-2015 melifaro

Remove ptei->value check from ipfw_link_table_values():
even if there was non-zero number of restarts, we would unref/clear
all value references and start ipfw_link_table_values() once again
with (mostly) cleared "tei" buffer.
Additionally, ptei->ptv stores only to-be-added values, not existing ones.
This is a forgotten piece of previous value refconting implementation,
and now it is simply incorrect.


282521 06-May-2015 melifaro

Fix panic when prepare_batch_buffer() returns error.


282286 30-Apr-2015 melifaro

Fix KASSERT introduced in r282155.

Found by: dhw


282155 28-Apr-2015 melifaro

Fix panic introduced by r282070.
Arm friendly KASSERT() to ease debug of similar crashes.

Submitted by: Olivier Cochard-Labbé


282082 27-Apr-2015 melifaro

Fix 'may be used uninitialized' warning not caught by clang.


282081 27-Apr-2015 melifaro

Use free_nat_instance() for nat instance deletion.

Sponsored by: Yandex LLC


282070 27-Apr-2015 melifaro

Make rule table kernel-index rewriting support any kind of objects.

Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
do a) for them.

Kernel does almost the same when exporting rules to userland:
prepares array of used tables in all rules in range, and
prepends it before the actual ruleset retaining actual in-kernel
indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
between userland string identifier and in-kernel index.
(See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
shared index and names instead of numbers (in next commit).

Sponsored by: Yandex LLC


282051 27-Apr-2015 glebius

Fix memory leak.

PR: 199670
Reviewed by: ae


280910 31-Mar-2015 ae

The offset variable has been cleared all bits except IP6F_OFF_MASK.
Use ip6f_mf variable instead of checking its bits.


279948 13-Mar-2015 ae

Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value
to obtain IPv4 next hop address in tablearg case.

Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.

Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.

Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.

Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.

Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.

Differential Revision: https://reviews.freebsd.org/D2015
Obtained from: Yandex LLC
Sponsored by: Yandex LLC


278264 05-Feb-2015 melifaro

Fix IP_FW_NAT44_LIST_NAT size calculation.

Found by: lev
Sponsored by: Yandex LLC


278259 05-Feb-2015 melifaro

* Make sure table algorithm destroy hook is always called without locks
* Explicitly lock freeing interface references in ta_destroy_ifidx
* Change ipfw_iface_unref() to require UH lock
* Add forgotten ipfw_iface_unref() to destroy_ifidx_locked()

PR: kern/197276
Submitted by: lev
Sponsored by: Yandex LLC


277240 16-Jan-2015 melifaro

Use ipfw runtime lock only when real modification is required.


274315 09-Nov-2014 melifaro

Remove unused 'struct route' fields.


274225 07-Nov-2014 glebius

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


274087 04-Nov-2014 melifaro

Remove unused variable.

Found by: Coverity
CID: 1245739


273588 24-Oct-2014 melifaro

Bump default dynamic limit to 16k entries.
Print better log message when limit is hit.

PR: 193300
Submitted by: me at nileshgr.com


273483 22-Oct-2014 melifaro

Rename log2 to tal_log2.

Submitted by: luigi


273453 22-Oct-2014 luigi

remove/fix old code for building ipfw and dummynet in userspace


273327 20-Oct-2014 melifaro

Use copyout() directly instead of updating various fields
before/after each sooptcopyout() call.

Found by: luigi
Sponsored by: Yandex LLC


273274 19-Oct-2014 melifaro

Perform more checks on the number of tables supplied by user.


273260 18-Oct-2014 melifaro

Use IPFW_RULE_CNTR_SIZE macro instead of non-relevant ip_fw_cntr structure.

Found by: luigi


273035 13-Oct-2014 melifaro

Fix matching default rule on clear/show commands.

Found by: Oleg Ginzburg


272940 11-Oct-2014 melifaro

Fix KASSERT typo.


272912 10-Oct-2014 melifaro

Remove redundant if_notifier declaration.


272900 10-Oct-2014 melifaro

Fix KASSERT argument type.


272899 10-Oct-2014 melifaro

Fix NOINET6 build for ipfw.


272898 10-Oct-2014 melifaro

Partially fix build on !amd64

Pointed by: bz


272840 09-Oct-2014 melifaro

Merge projects/ipfw to HEAD.

Main user-visible changes are related to tables:

* Tables are now identified by names, not numbers.
There can be up to 65k tables with up to 63-byte long names.
* Tables are now set-aware (default off), so you can switch/move
them atomically with rules.
* More functionality is supported (swap, lock, limits, user-level lookup,
batched add/del) by generic table code.
* New table types are added (flow) so you can match multiple packet fields at once.
* Ability to add different type of lookup algorithms for particular
table type has been added.
* New table algorithms are added (cidr:hash, iface:array, number:array and
flow:hash) to make certain types of lookup more effective.
* Table value are now capable of holding multiple data fields for
different tablearg users

Performance changes:
* Main ipfw lock was converted to rmlock
* Rule counters were separated from rule itself and made per-cpu.
* Radix table entries fits into 128 bytes
* struct ip_fw is now more compact so more rules will fit into 64 bytes
* interface tables uses array of existing ifindexes for faster match

ABI changes:
All functionality supported by old ipfw(8) remains functional.
Old & new binaries can work together with the following restrictions:
* Tables named other than ^\d+$ are shown as table(65535) in
ruleset in old binaries

Internal changes:.
Changing table ids to numbers resulted in format modification for
most sockopt codes. Old sopt format was compact, but very hard to
extend (no versioning, inability to add more opcodes), so
* All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes.
* The remaining opcodes were also converted to be able to eliminate
all older opcodes at once
* All IP_FW3 handlers uses special API instead of calling sooptcopy*
directly to ease adding another communication methods
* struct ip_fw is now different for kernel and userland
* tablearg value has been changed to 0 to ease future extensions
* table "values" are now indexes in special value array which
holds extended data for given index
* Batched add/delete has been added to tables code
* Most changes has been done to permit batched rule addition.
* interface tracking API has been added (started on demand)
to permit effective interface tables operations
* O(1) skipto cache, currently turned off by default at
compile-time (eats 512K).

* Several steps has been made towards making libipfw:
* most of new functions were separated into "parse/prepare/show
and actuall-do-stuff" pieces (already merged).
* there are separate functions for parsing text string into "struct ip_fw"
and printing "struct ip_fw" to supplied buffer (already merged).
* Probably some more less significant/forgotten features

MFC after: 1 month
Sponsored by: Yandex LLC


272614 06-Oct-2014 melifaro

Improve r272609 (O_TCPOPTS).

MFC after: 3 dayes


272609 06-Oct-2014 melifaro

Fix O_TCPOPTS processing.

Obtained from: luigi


272518 04-Oct-2014 melifaro

Bump max rule size to 512 opcodes.


272089 25-Sep-2014 sbruno

Fix NULL pointer deref in ipfw when using dummynet at layer 2.
Drop packet if pkg->ifp is NULL, which is the case here.

ref. https://github.com/HardenedBSD/hardenedBSD
commit 4eef3881c64f6e3aa38eebbeaf27a947a5d47dd7

PR 193861 -- DUMMYNET LAYER2: kernel panic

in this case a kernel panic occurs. Hence, when we do not get an interface,
we just drop the packet in question.

PR: 193681
Submitted by: David Carlier <david.carlier@hardenedbsd.org>
Obtained from: Hardened BSD
MFC after: 2 weeks
Relnotes: yes


270425 23-Aug-2014 melifaro

Whitespace/style changes merged from projects/ipfw.


267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


266955 01-Jun-2014 hiren

DNOLD_IS_ECN introduced by r266941 is not required.
DNOLD_* flags are for compat with old binaries.

Suggested by: luigi


266941 01-Jun-2014 hiren

ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by: Midori Kato (aoimidori27@gmail.com)
Worked with: Lars Eggert (lars@netapp.com)
Reviewed by: luigi, hiren


266399 18-May-2014 ae

Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR: 189655
MFC after: 1 week
Sponsored by: Yandex LLC


266310 17-May-2014 melifaro

Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).

Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR: bin/189471
Submitted by: Dennis Yusupoff <dyr@smartspb.net>
MFC after: 2 weeks


264963 26-Apr-2014 trociny

Define startup order the same way as it is in dummynet.


264540 16-Apr-2014 ae

Set oif only for outgoing packets.

PR: 188543
MFC after: 1 week
Sponsored by: Yandex LLC


264421 13-Apr-2014 brueffer

Free resources and error cases; re-indent a curly brace while here.

CID: 1199366
Found with: Coverity Prevent(tm)
MFC after: 1 week


263497 21-Mar-2014 glebius

Fix breakage in ipfw+VIMAGE after r261590.

PR: kern/187665
Sponsored by: Nginx, Inc.


261943 15-Feb-2014 gnn

Summary: Two quick edits to the implementation notes as they're no
longer stored in netinet but in netpfil.


261915 15-Feb-2014 dim

Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.

MFC after: 3 days


261117 24-Jan-2014 melifaro

Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some usused fields

Sponsored by: Yandex LLC


260551 11-Jan-2014 melifaro

Revert r260548. We really should not use IPFW_WLOCK() here
but this requires some more playing with IPFW_UH_WLOCK(). Leave till later.


260548 11-Jan-2014 melifaro

We don't need chain write lock since we're not modifying its contents.
LibAliasSetAddress() uses its own mutex to serialize changes.

While here, convert ifp->if_xname access to if_name() function.

MFC after: 2 weeks
Sponsored by: Yandex LLC


260247 03-Jan-2014 melifaro

Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empy network mask.

MFC after: 2 weeks


259568 18-Dec-2013 melifaro

Add net.inet.ip.fw.dyn_keep_states sysctl which
re-links dynamic states to default rule instead of
flushing on rule deletion.
This can be useful while performing ruleset reload
(think about `atomic` reload via changing sets).
Currently it is turned off by default.

MFC after: 2 weeks
Sponsored by: Yandex LLC


258711 28-Nov-2013 melifaro

Simplify O_NAT opcode handling.

MFC after: 2 weeks
Sponsored by: Yandex LLC


258708 28-Nov-2013 melifaro

Check ipfw table numbers in both user and kernel space before rule addition.

Found by: Saychik Pavel <umka@localka.net>
MFC after: 2 weeks
Sponsored by: Yandex LLC


258588 25-Nov-2013 rodrigc

In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);".
This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit().

Therefore, vnet_ipfw_nat_uninit() *must* be called before vnet_ipfw_uninit(),
but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions.
In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c,
IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN
and
IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER

Consequently, if VIMAGE is enabled, and jails are created and destroyed,
the system sometimes crashes, because we are trying to use a deleted lock.

To reproduce the problem:
(1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
INVARIANTS.
(2) Run this command in a loop:
jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

(see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL,
so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().


258467 22-Nov-2013 luigi

add a counter on the struct mq (a queue of mbufs),
and add a block for userspace compiling.


258466 22-Nov-2013 luigi

disable some ipfw match options when compiling in userspace


258465 22-Nov-2013 luigi

make this code compile in userspace on OSX


258464 22-Nov-2013 luigi

more support for userspace compiling of this code:
emulate the uma_zone for dynamic rules.


258463 22-Nov-2013 luigi

make ipfw_check_packet() and ipfw_check_frame() public,
so they can be used in the userspace version of ipfw/dummynet
(normally using netmap for the I/O path).

This is the first of a few commits to ease compiling the
ipfw kernel code in userspace.


257689 05-Nov-2013 glebius

Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.


257241 28-Oct-2013 glebius

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


257215 27-Oct-2013 glebius

Move new pf includes to the pf directory. The pfvar.h remain
in net, to avoid compatibility breakage for no sake.

The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.

Discussed with: bz


257179 26-Oct-2013 glebius

Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


257176 26-Oct-2013 glebius

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


255928 28-Sep-2013 philip

Use the correct EtherType for logging IPv6 packets.

Reviewed by: melifaro
Approved by: re (kib, glebius)
MFC after: 3 days


254781 24-Aug-2013 mav

Make dummynet use new direct callout(9) execution mechanism. Since the only
thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't
need extra switch to the clock SWI context.

On idle system this change in half reduces number of active CPU cycles and
wakes up only one CPU from sleep instead of two.

I was going to make this change much earlier as part of calloutng project,
but waited for better solution with skipping idle ticks to be implemented.
Unfortunately with 10.0 release coming it is better get at least this.


254776 24-Aug-2013 trociny

Make ipfw nat init/unint work correctly for VIMAGE:

* Do per vnet instance cleanup (previously it was only for vnet0 on
module unload, and led to libalias leaks and possible panics due to
stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
vnet0 lock (which does not prevent pointers access from another
vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
all vnets.

* It is supposed that ifaddr_change event handler is called in the
interface vnet context, so add an assertion.

Reviewed by: zec
MFC after: 2 weeks


250246 04-May-2013 melifaro

Use unified method for accessing / updating cached rule pointers.

MFC after: 2 weeks


250131 01-May-2013 eadler

Correct a few sizeof()s

Submitted by: swildner@DragonFlyBSD.org
Reviewed by: alfred


250039 29-Apr-2013 glebius

Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by: pho


249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


248971 01-Apr-2013 melifaro

Fix ipfw rule validation partially broken by r248552.

Pointed by: avg
MFC with: r248552


248697 25-Mar-2013 ae

When we are removing a specific set, call ipfw_expire_dyn_rules only once.

Obtained from: Yandex LLC
MFC after: 1 week


248552 20-Mar-2013 melifaro

Add ipfw support for setting/matching DiffServ codepoints (DSCP).

Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR: kern/102471, kern/121122
MFC after: 2 weeks


248491 19-Mar-2013 ae

Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.


247626 02-Mar-2013 melifaro

Fix callout expiring dynamic rules.

PR: kern/175530
Submitted by: Vladimir Spiridenkov <vs@gtn.ru>
MFC after: 2 weeks


244634 23-Dec-2012 melifaro

Add parentheses to IP_FW_ARG_TABLEARG() definition.

Suggested by: glebius
MFC with: r244633


244633 23-Dec-2012 melifaro

Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by: Vitaliy Tokarenko <rphone@ukr.net>
MFC after: 2 weeks


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243711 30-Nov-2012 melifaro

Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after: 3 weeks


243707 30-Nov-2012 melifaro

Make ipfw dynamic states operations SMP-ready.

* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with: ipfw
MFC after: 3 weeks
Sponsored by: Yandex LLC


242834 09-Nov-2012 melifaro

Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Reviewed by: ae(basically)
MFC after: 2 weeks


242632 05-Nov-2012 melifaro

Add assertion to enforce 'nat global' locking requierements changed by r241908.

Suggested by: adrian, glebius
MFC after: 3 days


242631 05-Nov-2012 melifaro

Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

MFC after: 1 week


242463 02-Nov-2012 ae

Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by: andre


242079 25-Oct-2012 ae

Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks


241913 22-Oct-2012 glebius

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


241908 22-Oct-2012 melifaro

Remove unnecessary chain read lock in ipfw nat 'global' code.
Document case when ipfw chain lock must be held while calling ipfw_nat().

MFC after: 2 weeks


241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241369 09-Oct-2012 kevlo

Fix typo: s/unknow/unknown


241359 08-Oct-2012 glebius

Catch up with r241245 and do not return packet back in host byte order.


241344 08-Oct-2012 glebius

After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
- ip_output(), ipfilter(4) not changed, since already call
in_delayed_cksum() with header in net byte order.
- divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
there and back.
- mrouting code and IPv6 ipsec now need to switch byte order there and
back, but I hope, this is temporary solution.
- In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().


241245 06-Oct-2012 glebius

A step in resolving mess with byte ordering for AF_INET. After this change:

- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.

Reviewed by: ray, luigi, eri, melifaro
Tested by: glebius (LE), ray (BE)


240494 14-Sep-2012 glebius

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi


240233 08-Sep-2012 glebius

Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by: Florian Smeets <flo freebsd.org>
Tested by: Chekaluk Vitaly <artemrts ukr.net>
Tested by: Ben Wilber <ben desync.com>
Tested by: Ian FREISLICH <ianf cloudseed.co.za>


240099 04-Sep-2012 melifaro

Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks


239997 01-Sep-2012 eadler

Mark the ipfw interface type as not being ether. This fixes an issue
where uuidgen tried to obtain a ipfw device's mac address which was
always zero.

PR: 170460
Submitted by: wxs
Reviewed by: bdrewery
Reviewed by: delphij
Approved by: cperciva
MFC after: 1 week


239124 07-Aug-2012 luigi

s/lenght/length/ in comments


239093 06-Aug-2012 luigi

move functions outside the SYSBEGIN/SYSEND block

(SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to
emulate sysctl on platforms that do not have them, and they work
by creating an array which contains all the sysctl-ed symbols.)


239092 06-Aug-2012 luigi

use FREE_PKT instead of m_freem to free an mbuf.
The former is the standard form used in ipfw/dummynet, so that
it is easier to remap it to different memory managers depending
on the platform.


238988 02-Aug-2012 luigi

replace __unused with a portable construct;
fix a couple of signed/unsigned warnings.


238978 01-Aug-2012 luigi

replace inet_ntoa_r with the more standard inet_ntop().
As discussed on -current, inet_ntoa_r() is non standard,
has different arguments in userspace and kernel, and
almost unused (no clients in userspace, only
net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c
in the kernel)


238977 01-Aug-2012 luigi

add a cast to avoid a signed/unsigned warning (to be removed
when we will have TUNABLE_UINT constructors)


238277 09-Jul-2012 hrs

Make ipfw0 logging pseudo-interface clonable. It can be created automatically
by $firewall_logif rc.conf(5) variable at boot time or manually by ifconfig(8)
after a boot.

Discussed on: freebsd-ipfw@


238265 08-Jul-2012 melifaro

Finally fix lookup (account remaining '\0') and deletion
(provide valid key length for radix lookup).

Submitted by: Ihor Kaharlichenko<madkinder at gmail.com> (prev version)
Approved by: kib(mentor)
MFC after: 3 days

Sponsored by: Shtorm ISP


238063 03-Jul-2012 issyl0

- Make ipfw's sched rules case insensitive, for user-friendliness.
- Add a note to the ipfw(8) man page about the rules no longer being
case sensitive.
- Fix some typos in the man page.

PR: docs/164772
Reviewed by: bz
Approved by: gabor (doc mentor, src committer)
MFC after: 2 weeks


237479 23-Jun-2012 melifaro

Fix interface matching by ipfw table

Submitted by: Ihor Kaharlichenko <madkinder@gmail.com>
Tested by: Ihor Kaharlichenko <madkinder@gmail.com>
Approved by: kib(mentor)
MFC after: 3 days


236819 09-Jun-2012 melifaro

Validate IPv4 network mask being passed to ipfw kernel interface.
Incorrect mask can possibly be one of the reasons for kern/127209 existance.

Approved by: kib(mentor)
MFC after: 3 days


234946 03-May-2012 melifaro

Revert r234834 per luigi@ request.

Cleaner solution (e.g. adding another header) should be done here.

Original log:
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by: luigi
Approved by: kib(mentor)


234834 30-Apr-2012 melifaro

Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by: ae(mentor)
MFC after: 2 weeks


233745 31-Mar-2012 glebius

Don't check malloc(M_WAITOK) results.


233478 25-Mar-2012 melifaro

- Permit number of ipfw tables to be changed in runtime.

net.inet.ip.fw.tables_max is now read-write.

- Bump IPFW_TABLES_MAX to 65535
Default number of tables is still 128

- Remove IPFW_TABLES_MAX from ipfw(8) code.

Sponsored by Yandex LLC

Approved by: kib(mentor)

MFC after: 2 weeks


232868 12-Mar-2012 melifaro

Fix VNET build broken by r232865.
Temporary remove the ability to assign different number of tables per VNET instance.


232865 12-Mar-2012 melifaro

- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Make number of tables to be loader tunable in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds

No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.

IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;

New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST

ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add host' now assumes 'host' to be interface name instead of
hostname.

New tunable:
net.inet.ip.fw.tables_max controls number of table supported by ipfw in given
VNET instance. 128 is still the default value.

New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out

This is a bit hackish, special interface name '\1' is used to signal interface
table number is passed in p.glob field.

Sponsored by Yandex LLC

Reviewed by: ae
Approved by: ae (mentor)

MFC after: 4 weeks


232273 28-Feb-2012 oleg

- Refresh dynamic tcp rule only if both sides answered keepalive packets.
- Remove some useless assignments.

MFC after: 1 month


232272 28-Feb-2012 oleg

lookup_dyn_rule_locked(): style(9) cleanup

MFC after: 1 month


231991 22-Feb-2012 ae

Don't use `m' after m_megapullup.

PR: kern/165373
MFC after: 3 days


231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


231076 06-Feb-2012 glebius

Make the 'tcpwin' option of ipfw(8) accept ranges and lists.

Submitted by: sem


230614 27-Jan-2012 luigi

a variable was erroneously declared as 32 bit instead of 64.

MFC after: 3 days


230452 22-Jan-2012 bz

Make #error messages string-literals and remove punctuation.

Reported by: bde (for ip_divert)
Reviewed by: bde
MFC after: 3 days


227458 11-Nov-2011 eadler

- add a missing "be" and "in"
- fix other errors introduced when committing r226436
- add 'function' to a sentence where it makes sense

Submitted by: delphij
Submitted by: dougb
Submitted by: jhb
Approved by: dougb
Approved by: jhb


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227085 04-Nov-2011 bz

Always use the opt_*.h options for ipfw.ko, not just when
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.

This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.

While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.

Reported by: pluknet
Reviewed by: plunket, jhb
MFC After: 3 days


226436 16-Oct-2011 eadler

- change "is is" to "is" or "it is"
- change "the the" to "the"

Approved by: lstewart
Approved by: sahil (mentor)
MFC after: 3 days


225793 27-Sep-2011 bz

Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to
build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the
assumption that the private L2 hook (which hopefully eventually will be a
pfil hook as well) can still be useful.

Allow building the module without inet as well.

Glanced at by: jhb
MFC after: 3 days


225518 12-Sep-2011 jhb

Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_*
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.

Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week


225044 20-Aug-2011 bz

Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)


225036 20-Aug-2011 bz

Hide IPv6 next header parsing warnings under the verbose sysctl
so people can possibly disable it when their consoles are flooded,
or enabled it for debugging.

MFC after: 2 weeks
Approved by: re (kib)


225034 20-Aug-2011 bz

After r225032 fix logging in a similar way masking the the IPv6
more fragments flag off so that offset == 0 checks work properly.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
X-MFC with: r225032
Approved by: re (kib)


225033 20-Aug-2011 bz

If we detect an IPv6 fragment header and it is not the first fragment,
then terminate the loop as we will not find any further headers and
for short fragments this could otherwise lead to a pullup error
discarding the fragment.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


225032 20-Aug-2011 bz

ipfw internally checks for offset == 0 to determine whether the
packet is a/the first fragment or not. For IPv6 we have added the
"more fragments" flag as well to be able to determine on whether
there will be more as we do not have the fragment header avaialble
for logging, while for IPv4 this information can be derived directly
from the IPv4 header. This allowed fragmented packets to bypass
normal rules as proper masking was not done when checking offset.
Split variables to not need masking for IPv6 to avoid further errors.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


225030 20-Aug-2011 bz

While not explicitly allowed by RFC 2460, in case there is no
translation technology involved (and that section is suggested to
be removed by Errata 2843), single packet fragments do not harm.

There is another errata under discussion to clarify and allow this.
Meanwhile add a sysctl to allow disabling this behaviour again.
We will treat single packet fragment (a fragment header added
when not needed) as if there was no fragment header.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version)
Tested by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


223666 29-Jun-2011 ae

Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.

The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.

Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@


223637 28-Jun-2011 bz

Update packet filter (pf) code to OpenBSD 4.5.

You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by: mlaier
Submitted by: eri


223593 27-Jun-2011 glebius

Add possibility to pass IPv6 packets to a divert(4) socket.

Submitted by: sem


223358 21-Jun-2011 ae

Do not use SET_HOST_IPLEN() macro for IPv6 packets.

PR: kern/157239
MFC after: 2 weeks


223080 14-Jun-2011 ae

Implement "global" mode for ipfw nat. It is similar to natd(8)
"globalport" option for multiple NAT instances.

If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, packet aliased
according to that entry, otherwise packet is passed unchanged.

User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.

PR: kern/157867
Submitted by: Alexander V. Chernikov (previous version)
Tested by: Eugene Grosbein


223073 14-Jun-2011 ae

Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure
to the check_uidgid() function, since it contains all needed arguments
and also pointer to mbuf and now it is possible use in_pcblookup_mbuf()
function.

Since i can not test it for the non-FreeBSD case, i keep this ifdef
unchanged.

Tested by: Alexander V. Chernikov
MFC after: 3 weeks


222806 07-Jun-2011 ae

Make a behaviour of the libalias based in-kernel NAT a bit closer to
how natd(8) does work. natd(8) drops packets only when libalias returns
PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
always did drop packets that were not aliased, even if they should
not be aliased and just are going through.

PR: kern/122109, kern/129093, kern/157379
Submitted by: Alexander V. Chernikov (previous version)
MFC after: 1 month


222748 06-Jun-2011 rwatson

Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup. pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups. During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock. By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details). This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems". However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies. Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect. In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222742 06-Jun-2011 ae

Do not return EINVAL when user does `ipfw set N flush` on an empty set.

MFC after: 2 weeks


222582 01-Jun-2011 ae

O_FORWARD_IP is only action which depends from the result of lookup of
dynamic rules. We are doing forwarding in the following cases:
o For the simple ipfw fwd rule, e.g.

fwd 10.0.0.1 ip from any to any out xmit em0
fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1

o For the dynamic fwd rule, e.g.

fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state

When this rule triggers it creates a dynamic rule, but this
dynamic rule should forward packets only in forward direction.

o And the last case that does not work before - simple fwd rule which
triggers when some dynamic rule is already executed.

PR: kern/147720, kern/150798
MFC after: 1 month


222560 01-Jun-2011 ae

Hide some debug messages under debug macro.

MFC after: 1 week


222559 01-Jun-2011 ae

Hide useless warning under debug macro.

PR: kern/69963
MFC after: 1 week


222488 30-May-2011 rwatson

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222474 30-May-2011 ae

Wrap long line.

MFC after: 2 weeks


222473 30-May-2011 ae

Add tablearg support for ipfw setfib.

PR: kern/156410
MFC after: 2 weeks


221521 06-May-2011 ae

Convert delay parameter back to ms when reporting to user.

PR: 156838
MFC after: 1 week


220914 21-Apr-2011 glebius

Use size_t for sopt_valsize.

Submitted by: Brandon Gooch <jamesbrandongooch gmail.com>


220878 20-Apr-2011 bz

MFp4 CH=191466:

Move fw_one_pass to where it belongs: it is a property of ipfw,
not of ip_input.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 3 days


220837 19-Apr-2011 glebius

- Rewrite functions that copyin/out NAT configuration, so that they
calculate required memory size dynamically.
- Fix races on chain re-lock.
- Introduce new field to ip_fw_chain - generation count. Now utilized
only in the NAT configuration, but can be utilized wider in ipfw.
- Get rid of NAT_BUF_LEN in ip_fw.h

PR: kern/143653


220832 19-Apr-2011 ae

Add sysctl handlers for net.inet.ip.dummynet.hash_size, .pipe_byte_limit
and .pipe_slot_limit oids to prevent to set incorrect values.

MFC after: 2 weeks


220831 19-Apr-2011 ae

ipdn_bound_var() functions is designed to bound a variable between
specified minimum and maximum. In case when specified default value
is out of bounds it does not work as expected and does not limit
variable. Check that default value is in range and limit it if needed.
Also bump max_hash_size value to 65536 to correspond with manual page.

PR: kern/152887
MFC after: 2 weeks


220812 19-Apr-2011 ae

Use M_WAITOK instead M_WAIT for malloc. Remove unneded checks.

MFC after: 1 week


220800 18-Apr-2011 glebius

LibAliasInit() should allocate memory with M_WAITOK flag. Modify it
and its callers.


220796 18-Apr-2011 glebius

Pullup up to TCP header length before matching against 'tcpopts'.

PR: kern/156180
Reviewed by: luigi


220568 12-Apr-2011 ae

Restore previous behaviour - always match rule when we doing tagging,
even when tag is already exists.

Reported by: Vadim Goncharov
MFC after: 1 week


220211 31-Mar-2011 ae

Fill up src_port and dst_port variables for SCTP over IPv4.

PR: kern/153415
MFC after: 1 week


220204 31-Mar-2011 ae

Fix malloc types.

MFC after: 1 week


220203 31-Mar-2011 ae

Fix a memory leak. Memory that is allocated for schedulers hash table
was not freed.

PR: kern/156083
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218741 16-Feb-2011 pluknet

Bump dummynet module version to meet dummynet schedulers' requirements,
and thus unbreak loading dummynet.ko via /boot/loader.conf.

Reported by: rihad <rihad att mail.ru> on freebsd-net
Approved by: kib (mentor)


218360 05-Feb-2011 luigi

correct the 'output_time' of packets generated by dummynet.
In the dec.2009 rewrite I introduced a bug, using for the
computation the arrival time instead of the time the packet
has exited from the queue.
The bandwidth computation was still correct because it is
computed elsewhere, but traffic was sent out in bursts.

The bug is also present in RELENG_8 after dec.2009

Thanks to Daikichi Osuga for investingating, finding and fixing the
bug with detailed graphs of the behaviour before and after the fix.

Submitted by: Daikichi Osuga
MFC after: 2 weeks


217361 13-Jan-2011 jhb

Use a blocking malloc() to initialize the dummynet taskq.

Reviewed by: luigi


217322 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


217110 07-Jan-2011 jhb

Use a regular taskqueue for dummynet rather than a "fast" taskqueue.

Reviewed by: luigi


215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


215179 12-Nov-2010 luigi

The first customer of the SO_USER_COOKIE option:
the "sockarg" ipfw option matches packets associated to
a local socket and with a non-zero so_user_cookie value.
The value is made available as tablearg, so it can be used
as a skipto target or pipe number in ipfw/dummynet rules.

Code by Paul Joe, manpage by me.

Submitted by: Paul Joe
MFC after: 1 week


213329 01-Oct-2010 luigi

put back the assigment to sched_time. It was correct, and
it was necessary.

Submitted by: Riccardo Panicucci


213279 29-Sep-2010 luigi

remove an unnecessary (and wrong) assignment.
It was meant to reset idle_time (and it was not needed),
but i even used the wrong field.

Obtained from: Oleg
MFC after: 3 days


213267 29-Sep-2010 luigi

whitespace changes in preparation for future commits


213265 29-Sep-2010 luigi

fix handling of initial credit for an idle pipe.
This fixes the bug where setting bw > 1 MTU/tick resulted in
infinite bandwidth if io_fast=1

PR: 147245 148429
Obtained from: Riccardo Panicucci
MFC after: 3 days


213254 28-Sep-2010 luigi

fix breakage in in-kernel NAT: the code did not honor
net.inet.ip.fw.one_pass and always moved to the next rule
in case of a successful nat.

This should fix several related PR (waiting for feedback
before closing them)

PR: 145167 149572 150141
MFC after: 3 days


213253 28-Sep-2010 luigi

Whitespace changes to reduce diffs wrt the most recent ipfw/dummynet code:
+ remove an unused macro,
+ adjust the constants in an enum
+ small whitespace changes

MFC after: 3 days


212256 06-Sep-2010 glebius

in_delayed_cksum() requires host byte order.

Reported by: Alexander Levin <amindomao googlemail.com>
MFC after: 1 week


211992 30-Aug-2010 maxim

o Some programs could send broadcast/multicast traffic to ipfw
pseudo-interface. This leads to a panic due to uninitialized
if_broadcastaddr address. Initialize it and implement ip_output()
method to prevent mbuf leak later.

ipfw pseudo-interface should never send anything therefore call
panic(9) in if_start() method.

PR: kern/149807
Submitted by: Dmitrij Tejblum
MFC after: 2 weeks


210537 27-Jul-2010 glebius

Fix operation of "netgraph" action in conjunction with the
net.inet.ip.fw.one_pass sysctl.

The "ngtee" action is still broken.

PR: kern/148885
Submitted by: Nickolay Dudorov <nnd mail.nsk.ru>


210123 15-Jul-2010 luigi

remove some conditional #ifdefs (no-op on FreeBSD);
run the timer routine on cpu 0.


210120 15-Jul-2010 luigi

whitespace fixes


210119 15-Jul-2010 luigi

fix a comment and final empty line


209845 09-Jul-2010 glebius

Improve last commit: use bpf_mtap2() to avoiding stack usage.

Prodded by: julian


209797 08-Jul-2010 glebius

Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with
a stub m_hdr instead of a full mbuf.

PR: kern/148050


209589 29-Jun-2010 glebius

After processing the O_SKIPTO opcode our cmd points to the next rule, and
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.

To fix this, exit the inner loop with the continue operator forcibly and
explicitly.

PR: kern/147798


206845 19-Apr-2010 luigi

whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)


206461 10-Apr-2010 bz

Try to help with a virtualized dummynet after r206428.

This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.


206428 09-Apr-2010 luigi

This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and i still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables), however i need
first to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.


206425 09-Apr-2010 luigi

no need to pass an argument to dn_compat_calc_size()

MFC after: 3 days


206339 07-Apr-2010 luigi

Hopefully fix the recent breakage in rule deletion.
A few more tests and this will also go into -stable where
the problem is more critical.


205955 31-Mar-2010 luigi

fix bug in previous commit related to rule deletion
(stable/8 just fixed moments ago)


205831 29-Mar-2010 luigi

remove a leftover debugging message


205830 29-Mar-2010 luigi

Fix handling of set manipulations.
This patch has two fixes for potential kernel panics (one wrong
index, one access to the wrong lock) and two fixes to wrong logic
in a conditional. The potential panics are also on stable/8,
so I am going to MFC the fix quickly.


205602 24-Mar-2010 luigi

Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when i did the mtag cleanup three months ago.

PR: 145004


205417 21-Mar-2010 luigi

Add a priority-based packet scheduler.

Sponsored by: The ONELAB2 Project
Submitted by: Riccardo Panicucci


205415 21-Mar-2010 luigi

no need for ipfw_flush_tables(), we just need ipfw_destroy_tables()


205414 21-Mar-2010 luigi

revise documentation


205178 15-Mar-2010 luigi

small fixes to estimate the buffer size when requesting all pipes/flows.


205173 15-Mar-2010 luigi

+ implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
enabled with the command

queue X config mask queue ...

and makes it possible to support priority-based schedulers, where
packets should be grouped according to the priority and not some
fields in the 5-tuple.
This is implemented as follows:
- redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
without changing the size or shape of the structure, so there are
no ABI changes. On passing, also document how other fields are
used, and remove some useless assignments in ip_fw2.c

- implement small changes in the userland code to set/read the field;

- revise the functions in ip_dummynet.c to manipulate masks so they
also handle the additional field;

There are no ABI changes in this commit.


205050 11-Mar-2010 luigi

implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transfered.


204954 10-Mar-2010 luigi

fix handling of commands issued by RELENG_7 version of /sbin/ipfw,

Submitted by: Riccardo Panicucci


204866 08-Mar-2010 luigi

cosmetic changes and C++ compatibility


204865 08-Mar-2010 luigi

don't use C++ keywords as variable names


204862 08-Mar-2010 luigi

do not report an error unnecessarily


204837 07-Mar-2010 bz

Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by: ISPsystem
Reviewed by: julian
MFC after: 5 days


204763 05-Mar-2010 luigi

plug a memory leak on pipe's reconfiguration


204754 05-Mar-2010 luigi

fix a memory leak when deleting RED queues


204736 04-Mar-2010 luigi

portability fixes


204735 04-Mar-2010 luigi

don't use keywords as variable names.


204714 04-Mar-2010 luigi

use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by: Francesco Magno


204713 04-Mar-2010 luigi

improve compatibility with RELENG_7.2


204591 02-Mar-2010 luigi

Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.


204003 17-Feb-2010 luigi

remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone


202459 17-Jan-2010 ume

Change 'me' to match any IPv6 address configured on an interface in
the system as well as any IPv4 address.

Reviewed by: David Horn <dhorn2000__at__gmail.com>, luigi, qingli
MFC after: 2 weeks


201745 07-Jan-2010 luigi

we don't use dummynet_drain!


201740 07-Jan-2010 luigi

check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets which i broke when i
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO


201735 07-Jan-2010 luigi

Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so i don't feel like
doing a gratuitous renaming, at least for the time being.


201732 07-Jan-2010 luigi

some header shuffling to help decoupling ip_divert from ipfw


201722 07-Jan-2010 luigi

put ip_len in correct order for ip_output().
This prevents a panic when ipfw generates packets on its own
(such as reject or keepalives for dynamic rules).

Reported by: Chagin Dmitry


201568 05-Jan-2010 luigi

this file does not require ip_dummynet.h


201527 04-Jan-2010 luigi

Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).


201150 29-Dec-2009 luigi

we really need htonl() here, see the comment a few lines above in the code.


201124 28-Dec-2009 luigi

bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c


201122 28-Dec-2009 luigi

bring in several cleanups tested in ipfw3-head branch, namely:

r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
and netgraph;

r201049
- merge some common code to attach/detach hooks into
a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after: 1 week


201121 28-Dec-2009 luigi

readability fixes -- add braces on large blocks, remove unnecessary
initializations


201120 28-Dec-2009 luigi

explain details of operation of table lookups, and improve portability


201046 27-Dec-2009 luigi

diverted packet must re-enter _after_ the matching rule,
or we create loops.
The divert cookie (that can be set from userland too)
contains the matching rule nr, so we must start from nr+1.

Reported by: Joe Marcus Clarke


200951 24-Dec-2009 luigi

fix poor indentation resulting from a merge


200909 23-Dec-2009 luigi

mostly style changes, such as removal of trailing whitespace,
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.

As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.


200897 23-Dec-2009 luigi

fix build with the new fast lookup structure.
Also remove some unnecessary headers


200896 23-Dec-2009 luigi

fix build on 64-bit architectures.
Also fix the indentation on a few lines.


200855 22-Dec-2009 luigi

merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.

2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).

3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).

4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after: 1 month


200838 22-Dec-2009 luigi

some mostly cosmetic changes in preparation for upcoming work:

+ in many places, replace &V_layer3_chain with a local
variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures


200673 18-Dec-2009 ru

Added proper attribution.

Requested by: luigi


200654 17-Dec-2009 luigi

Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.

ipfw add 5000 count log ....

and run
tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.


200634 17-Dec-2009 luigi

simplify and document lookup_next_rule()


200629 17-Dec-2009 luigi

simplify the code that finds the next rule after reinjections

MFC after: 1 week


200610 16-Dec-2009 luigi

remove a duplicate sysctl entry


200603 16-Dec-2009 luigi

bring back a couple of #include that are supplied by nesting,
and explain why they are used.


200601 16-Dec-2009 luigi

Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
(the same should be done for variables);
- try to pack variable declaration in an uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after: 1 month


200598 16-Dec-2009 imp

Quick fix to make this compile:
Remove redundant extern declearations.
If the maintainer has a better fix, then feel free to back this out.


200590 15-Dec-2009 luigi

more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).

Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.

MFC after: 1 month


200580 15-Dec-2009 luigi

Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
now references the two new source files

netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules

netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after: 1 month


200567 15-Dec-2009 luigi

implement a new match option,

lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N

which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.

Work supported by the Onelab project.

MFC after: 1 week


200361 10-Dec-2009 luigi

use div64 when converting back the burst value for userland


200360 10-Dec-2009 luigi

when draining a flowset free the entire chain, not just one packet.


200358 10-Dec-2009 luigi

centralize the code to free a packet (or a chain) while in dummynet.
Remove an old macro and its stale comment.


200170 05-Dec-2009 oleg

Fix burst processing for WF2Q pipes - do not increase available burst size
unless pipe is idle. This should fix follwing issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.

MFC after: 1 week


200118 05-Dec-2009 luigi

adjust comment in previous commit after Julian's explanation


200116 05-Dec-2009 luigi

remove a dead block of code, document how the ipfw clients are
hooked and the difference in handling the 'enable' variable
for layer2 and layer3. The latter needs fixing once i figure out
how it worked pre-vnet.

MFC after: 7 days


200113 05-Dec-2009 luigi

fix build with VNET enabled

Reported by: David Wolfskill


200102 04-Dec-2009 ume

Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard
coded number.

Spotted by: bz


200059 03-Dec-2009 luigi

preparation work to replace the monster switch in ipfw_chk() with
table of functions.

This commit (which is heavily based on work done by Marta Carbone
in this year's GSOC project), removes the goto's and explicit
return from the inner switch(), so we will have a easier time when
putting the blocks into individual functions.

MFC after: 3 weeks


200055 03-Dec-2009 ume

Teach an IPv6 to the debug prints.


200040 02-Dec-2009 luigi

- initialize src_ip in the main loop to prevent a compiler warning
(gcc 4.x under linux, not sure how real is the complaint).
- rename a macro argument to prevent name clashes.
- add the macro name on a couple of #endif
- add a blank line for readability.

MFC after: 3 days


200029 02-Dec-2009 luigi

small changes for portability and diff reduction wrt/ FreeBSD 7.
No functional differences.

- use the div64() macro to wrap 64 bit divisions
(which almost always are 64 / 32 bits) so they are easier
to handle with compilers or OS that do not have native
support for 64bit divisions;

- use a local variable for p_numbytes even if not strictly
necessary on HEAD, as it reduces diffs with FreeBSD7

- in dummynet_send() check that a tag is present before
dereferencing the pointer.

- add a couple of blank lines for readability near the end of a function

MFC after: 3 days


200027 02-Dec-2009 ume

Teach an IPv6 to send_pkt() and ipfw_tick().
It fixes the issue which keep-alive doesn't work for an IPv6.

PR: kern/117234
Submitted by: mlaier, Joost Bekkers <joost__at__jodocus.org>
MFC after: 1 month


199073 09-Nov-2009 oleg

style(9): add missing parentheses


198845 03-Nov-2009 oleg

Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC after: 3 days


197952 11-Oct-2009 julian

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


196453 23-Aug-2009 julian

Fix another typo right next to the previous one, that amazingly, I did
not see before.

MFC after: 1 week


196451 23-Aug-2009 julian

Fix typo in comment that has been bugging me for days.

MFC after: 1 week


196423 21-Aug-2009 julian

Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.

This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0

Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
MFC after: Immediatly


196322 17-Aug-2009 jhb

Purge mergeinfo in sys/ that is either empty or a subset of the parent
mergeinfo on sys/ itself.

Approved by: re (mergeinfo blanket)


196201 14-Aug-2009 julian

Fix ipfw crash on uid or gid check.
Receiving any ip packet for which there is no existing socket will
crash if ipfw has a uid or gid test rule, as the uid/gid
of the non existent owner of said non existent socket is tested.
Brooks introduced this error as part of his >16 gids patch.
It appears to be a cut-n-paste error from similar code a few lines
before. The old code used the 'pcb' variable here, but in the
new code that switched the 'inp' variable, which is often NULL
and what is tested in the code further up. The rest of the multi-gid
patch for ipfw seems solid (and cleaner than previous code).

Reviewed by: brooks
Approved by: re (rwatson)


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195923 28-Jul-2009 julian

Startup the vnet part of initialization a bit after the global part.
Fixes crash on boot if ipfw compiled in.

Submitted by: tegge@
Reviewed by: tegge@
Approved by: re (kib)


195862 25-Jul-2009 julian

Catch ipfw up to the rest of the vimage code.
It got left behind when it moved to its new location.

Approved by: re (kensmith)


195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195023 26-Jun-2009 rwatson

Update various IPFW-related modules to use if_addr_rlock()/
if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK().

MFC after: 6 weeks


194930 24-Jun-2009 oleg

- fix dummynet 'fast' mode for WF2Q case.
- fix printing of pipe profile data.
- introduce new pipe parameter: 'burst' - how much data can be sent through
pipe bypassing bandwidth limit.


194498 19-Jun-2009 brooks

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


194245 15-Jun-2009 oleg

Since dn_pipe.numbytes is int64_t now - remove unnecessary overflow detection
code in ready_event_wfq().


193896 10-Jun-2009 luigi

in ip_dn_ctl(), do not allocate a large structure on the stack,
and use malloc() instead if/when it is necessary.

The problem is less relevant in previous versions because
the variable involved (tmp_pipe) is much smaller there.
Still worth fixing though.

Submitted by: Marta Carbone (GSOC)
MFC after: 3 days


193894 10-Jun-2009 luigi

small simplifications to the code in charge of reaping deleted rules:
- clear the head pointer immediately before using it, so there is
no chance of mistakes;
- call reap_rules() unconditionally. The function can handle a NULL
argument just fine, and the cost of the extra call is hardly
significant given that we do it rarely and outside the lock.

MFC after: 3 days


193859 09-Jun-2009 oleg

Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after: 1 month
Tested by: Mikolaj Golub


193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


193532 05-Jun-2009 luigi

move kernel ipfw-related sources to a separate directory,
adjust conf/files and modules' Makefiles accordingly.

No code or ABI changes so this and most of previous related
changes can be easily MFC'ed

MFC after: 5 days