History log of /freebsd-11-stable/sys/netpfil/ipfw/ip_fw_log.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 346201 14-Apr-2019 ae

MFC r342908:
Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64.
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.

ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64. Each time ipfw(4) does bzero() on it, and then it initializes
some fields.

glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).

Introduced new field flags, it is used to keep track of what fields was
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed. The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.

ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
rely on checking for NULL pointers.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18690

MFC r342909:
Fix the build with INVARIANTS.

MFC r343551:
Fix the bug introduced in r342908, that causes problems with dynamic
handling for protocols without ports numbers.

Since port numbers were uninitialized for protocols like ICMP/ICMPv6,
ipfw_chk() used some non-zero values to create dynamic states, and due
this it failed to match replies with created states.

Reported by: Oliver Hartmann, Boris Lytochkin
Obtained from: Yandex LLC


# 332229 07-Apr-2018 tuexen

MFC r326233:

Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.

There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.

Thanks to Timo Voelker for helping me to test this patch.

MFC r327200:

When adding support for sending SCTP packets containing an ABORT chunk
to ipfw in https://svnweb.freebsd.org/changeset/base/326233,
a dependency on the SCTP stack was added to ipfw by accident.

This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594
where also a solution was suggested. This patch is based on Kevin's
suggestion, but implements the required SCTP checksum computation
without any dependency on other SCTP sources.

While there, do some cleanups and improve comments.

Thanks to Kevin Kevin Bowling for reporting the issue and suggesting
a fix.


# 328772 02-Feb-2018 ae

MFC r328161:
Add UDPLite support to ipfw(4).

Now it is possible to use UDPLite's port numbers in rules,
create dynamic states for UDPLite packets and see "UDPLite" for matched
packets in log.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 317044 17-Apr-2017 ae

MFC r316433:
Add the log formatting for an external action opcode.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 316446 03-Apr-2017 ae

MFC r304041:
Move logging via BPF support into separate file.

* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304043:
Add three helper function to manage tables from external modules.

ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC

MFC r304046, 304108:
Add ipfw_nat64 module that implements stateless and stateful NAT64.

The module works together with ipfw(4) and implemented as its external
action module.

Stateless NAT64 registers external action with name nat64stl. This
keyword should be used to create NAT64 instance and to address this
instance in rules. Stateless NAT64 uses two lookup tables with mapped
IPv4->IPv6 and IPv6->IPv4 addresses to perform translation.

A configuration of instance should looks like this:
1. Create lookup tables:
# ipfw table T46 create type addr valtype ipv6
# ipfw table T64 create type addr valtype ipv4
2. Fill T46 and T64 tables.
3. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
4. Create NAT64 instance:
# ipfw nat64stl NAT create table4 T46 table6 T64
5. Add rules that matches the traffic:
# ipfw add nat64stl NAT ip from any to table(T46)
# ipfw add nat64stl NAT ip from table(T64) to 64:ff9b::/96
6. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Stateful NAT64 registers external action with name nat64lsn. The only
one option required to create nat64lsn instance - prefix4. It defines
the pool of IPv4 addresses used for translation.

A configuration of instance should looks like this:
1. Add rule to allow neighbor solicitation and advertisement:
# ipfw add allow icmp6 from any to any icmp6types 135,136
2. Create NAT64 instance:
# ipfw nat64lsn NAT create prefix4 A.B.C.D/28
3. Add rules that matches the traffic:
# ipfw add nat64lsn NAT ip from any to A.B.C.D/28
# ipfw add nat64lsn NAT ip6 from any to 64:ff9b::/96
4. Configure DNS64 for IPv6 clients and add route to 64:ff9b::/96
via NAT64 host.

Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6434

MFC r304048:
Replace __noinline with special debug macro NAT64NOINLINE.

MFC r304061:
Use %ju to print unsigned 64-bit value.

MFC r304076:
Make statistics nat64lsn, nat64stl an nptv6 output netstat-like:
"@value @description" and fix build due to -Wformat errors.

MFC r304378 (by bz):
Try to fix gcc compilation errors (which are right).
nat64_getlasthdr() returns an int, which can be -1 in case of error,
storing the result in an uint8_t and then comparing to < 0 is not
helpful. Do what is done in the rest of the code and make proto an
int here as well.

MFC r309187:
Fix ICMPv6 Time Exceeded error message translation.

MFC r314718:
Use new ipfw_lookup_table() in the nat64 too.

MFC r315204,315233:
Use memset with structure size.


# 315456 17-Mar-2017 vangyzen

MFC r313821 r315277 r315286

Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel.

inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack, except for KTR messages.
KTR can correctly log the immediate integral values passed to it,
as well as constant strings, but not non-constant strings,
since they might change by the time ktrdump retrieves them.
Therefore, use hex notation in KTR messages.

Sponsored by: Dell EMC


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302290 29-Jun-2016 bz

Move the ipfw_log_bpf() calls from global module initialisation to
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.

Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 295126 01-Feb-2016 glebius

These files were getting sys/malloc.h and vm/uma.h with header pollution
via sys/mbuf.h


# 290545 08-Nov-2015 melifaro

Print proper setfib values in ipfw log.

Submitted by: Denis Schneider <v1ne2go at gmail>


# 280910 31-Mar-2015 ae

The offset variable has been cleared all bits except IP6F_OFF_MASK.
Use ip6f_mf variable instead of checking its bits.


# 272840 09-Oct-2014 melifaro

Merge projects/ipfw to HEAD.

Main user-visible changes are related to tables:

* Tables are now identified by names, not numbers.
There can be up to 65k tables with up to 63-byte long names.
* Tables are now set-aware (default off), so you can switch/move
them atomically with rules.
* More functionality is supported (swap, lock, limits, user-level lookup,
batched add/del) by generic table code.
* New table types are added (flow) so you can match multiple packet fields at once.
* Ability to add different type of lookup algorithms for particular
table type has been added.
* New table algorithms are added (cidr:hash, iface:array, number:array and
flow:hash) to make certain types of lookup more effective.
* Table value are now capable of holding multiple data fields for
different tablearg users

Performance changes:
* Main ipfw lock was converted to rmlock
* Rule counters were separated from rule itself and made per-cpu.
* Radix table entries fits into 128 bytes
* struct ip_fw is now more compact so more rules will fit into 64 bytes
* interface tables uses array of existing ifindexes for faster match

ABI changes:
All functionality supported by old ipfw(8) remains functional.
Old & new binaries can work together with the following restrictions:
* Tables named other than ^\d+$ are shown as table(65535) in
ruleset in old binaries

Internal changes:.
Changing table ids to numbers resulted in format modification for
most sockopt codes. Old sopt format was compact, but very hard to
extend (no versioning, inability to add more opcodes), so
* All relevant opcodes were converted to TLV-based versioned IP_FW3-based codes.
* The remaining opcodes were also converted to be able to eliminate
all older opcodes at once
* All IP_FW3 handlers uses special API instead of calling sooptcopy*
directly to ease adding another communication methods
* struct ip_fw is now different for kernel and userland
* tablearg value has been changed to 0 to ease future extensions
* table "values" are now indexes in special value array which
holds extended data for given index
* Batched add/delete has been added to tables code
* Most changes has been done to permit batched rule addition.
* interface tracking API has been added (started on demand)
to permit effective interface tables operations
* O(1) skipto cache, currently turned off by default at
compile-time (eats 512K).

* Several steps has been made towards making libipfw:
* most of new functions were separated into "parse/prepare/show
and actuall-do-stuff" pieces (already merged).
* there are separate functions for parsing text string into "struct ip_fw"
and printing "struct ip_fw" to supplied buffer (already merged).
* Probably some more less significant/forgotten features

MFC after: 1 month
Sponsored by: Yandex LLC


# 258465 22-Nov-2013 luigi

make this code compile in userspace on OSX


# 257176 26-Oct-2013 glebius

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 255928 28-Sep-2013 philip

Use the correct EtherType for logging IPv6 packets.

Reviewed by: melifaro
Approved by: re (kib, glebius)
MFC after: 3 days


# 249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


# 248552 20-Mar-2013 melifaro

Add ipfw support for setting/matching DiffServ codepoints (DSCP).

Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR: kern/102471, kern/121122
MFC after: 2 weeks


# 244633 23-Dec-2012 melifaro

Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by: Vitaliy Tokarenko <rphone@ukr.net>
MFC after: 2 weeks


# 241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


# 240494 14-Sep-2012 glebius

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi


# 239997 01-Sep-2012 eadler

Mark the ipfw interface type as not being ether. This fixes an issue
where uuidgen tried to obtain a ipfw device's mac address which was
always zero.

PR: 170460
Submitted by: wxs
Reviewed by: bdrewery
Reviewed by: delphij
Approved by: cperciva
MFC after: 1 week


# 239092 06-Aug-2012 luigi

use FREE_PKT instead of m_freem to free an mbuf.
The former is the standard form used in ipfw/dummynet, so that
it is easier to remap it to different memory managers depending
on the platform.


# 238978 01-Aug-2012 luigi

replace inet_ntoa_r with the more standard inet_ntop().
As discussed on -current, inet_ntoa_r() is non standard,
has different arguments in userspace and kernel, and
almost unused (no clients in userspace, only
net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c
in the kernel)


# 238277 09-Jul-2012 hrs

Make ipfw0 logging pseudo-interface clonable. It can be created automatically
by $firewall_logif rc.conf(5) variable at boot time or manually by ifconfig(8)
after a boot.

Discussed on: freebsd-ipfw@


# 227085 04-Nov-2011 bz

Always use the opt_*.h options for ipfw.ko, not just when
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.

This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.

While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.

Reported by: pluknet
Reviewed by: plunket, jhb
MFC After: 3 days


# 225518 12-Sep-2011 jhb

Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_*
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.

Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week


# 225044 20-Aug-2011 bz

Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)


# 225034 20-Aug-2011 bz

After r225032 fix logging in a similar way masking the the IPv6
more fragments flag off so that offset == 0 checks work properly.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
X-MFC with: r225032
Approved by: re (kib)


# 223666 29-Jun-2011 ae

Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.

The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.

Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@


# 211992 30-Aug-2010 maxim

o Some programs could send broadcast/multicast traffic to ipfw
pseudo-interface. This leads to a panic due to uninitialized
if_broadcastaddr address. Initialize it and implement ip_output()
method to prevent mbuf leak later.

ipfw pseudo-interface should never send anything therefore call
panic(9) in if_start() method.

PR: kern/149807
Submitted by: Dmitrij Tejblum
MFC after: 2 weeks


# 209845 09-Jul-2010 glebius

Improve last commit: use bpf_mtap2() to avoiding stack usage.

Prodded by: julian


# 209797 08-Jul-2010 glebius

Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with
a stub m_hdr instead of a full mbuf.

PR: kern/148050


# 205173 15-Mar-2010 luigi

+ implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
enabled with the command

queue X config mask queue ...

and makes it possible to support priority-based schedulers, where
packets should be grouped according to the priority and not some
fields in the 5-tuple.
This is implemented as follows:
- redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
without changing the size or shape of the structure, so there are
no ABI changes. On passing, also document how other fields are
used, and remove some useless assignments in ip_fw2.c

- implement small changes in the userland code to set/read the field;

- revise the functions in ip_dummynet.c to manipulate masks so they
also handle the additional field;

There are no ABI changes in this commit.


# 204591 02-Mar-2010 luigi

Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.


# 201732 07-Jan-2010 luigi

some header shuffling to help decoupling ip_divert from ipfw


# 201527 04-Jan-2010 luigi

Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).


# 201122 28-Dec-2009 luigi

bring in several cleanups tested in ipfw3-head branch, namely:

r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
and netgraph;

r201049
- merge some common code to attach/detach hooks into
a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after: 1 week


# 200855 22-Dec-2009 luigi

merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.

2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).

3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).

4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after: 1 month


# 200654 17-Dec-2009 luigi

Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.

ipfw add 5000 count log ....

and run
tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.


# 200601 16-Dec-2009 luigi

Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
(the same should be done for variables);
- try to pack variable declaration in an uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after: 1 month


# 200580 15-Dec-2009 luigi

Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
now references the two new source files

netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules

netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after: 1 month