History log of /freebsd-9.3-release/sys/netpfil/ipfw/
Revision Date Author Comments
267654 20-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


266680 26-May-2014 ae

MFC r266399:
Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR: 189655
Approved by: re (marius)


265699 08-May-2014 melifaro

Merge r258708, r258711, r260247, r261117.

r258708:
Check ipfw table numbers in both user and kernel space before rule addition.
Found by: Saychik Pavel <umka@localka.net>

r258711:
Simplify O_NAT opcode handling.

r260247:
Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empy network mask.

r261117:
Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some usused fields


264805 23-Apr-2014 brueffer

MFC: r264421

Free resources in error cases; re-indent a curly brace while here.

CID: 1199366
Found with: Coverity Prevent(tm)


264247 08-Apr-2014 dteske

MFC a modified version of r258588 (modification suggested by Mikolaj
Golub; tested stable both original and modified versions) that fixes
kernel panic when starting/stopping VIMAGE jails.

Discussed with: rodrigc, Mikolaj Golub


262210 19-Feb-2014 dim

MFC r261915:

Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.


256217 09-Oct-2013 mav

MFC r250131 (by eadler):
Correct a few sizeof()s


255987 02-Oct-2013 philip

MFC r255928:
Use the correct EtherType for logging IPv6 packets.


255395 08-Sep-2013 trociny

MFC r254776:

Make ipfw nat init/unint work correctly for VIMAGE:

* Do per vnet instance cleanup (previously it was only for vnet0 on
module unload, and led to libalias leaks and possible panics due to
stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
vnet0 lock (which does not prevent pointers access from another
vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
all vnets.

* It is supposed that ifaddr_change event handler is called in the
interface vnet context, so add an assertion.

Reviewed by: zec


252159 24-Jun-2013 glebius

Merge r250039:
Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by: pho


250762 18-May-2013 melifaro

MFC r248552, r248971

Add ipfw support for setting/matching DiffServ codepoints (DSCP).

Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR: kern/102471, kern/121122

Fix ipfw rule validation partially broken by r248552.


250761 18-May-2013 melifaro

MFC r243711.

Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.


250760 18-May-2013 melifaro

Merge r244633, r250246.

Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Use unified method for accessing / updating cached rule pointers.


249002 02-Apr-2013 ae

MFC r248697:
When we are removing a specific set, call ipfw_expire_dyn_rules only once.


248497 19-Mar-2013 melifaro

MFC r247626.

Fix callout expiring dynamic rules.

PR: kern/175530
Submitted by: Vladimir Spiridenkov <vs@gtn.ru>


248085 09-Mar-2013 marius

MFC: r227309 (partial)

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


244571 21-Dec-2012 melifaro

Merge r238978(approved by luigi), r242631, r242834, r243707

replace inet_ntoa_r with the more standard inet_ntop().
As discussed on -current, inet_ntoa_r() is non standard, has different arguments
in userspace and kernel, and almost unused (no clients in userspace, only
net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c
in the kernel)

Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Make ipfw dynamic states operations SMP-ready.

* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.


244569 21-Dec-2012 melifaro

Merge r241908, r242632

Remove unnecessary chain read lock in ipfw nat 'global' code.
Document case when ipfw chain lock must be held while calling ipfw_nat().


243586 27-Nov-2012 ae

MFC r242079:
Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

MFC r242082:
Note the removal of the IPFIREWALL_FORWARD kernel option.

MFC r242463:
Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.


243401 22-Nov-2012 glebius

Partially merge r240494, which moved netinet/ipfw to netpfil/ipfw,
to make it easier to merge ipfw commits back to stable/9.


238683 22-Jul-2012 issyl0

MFC r238063:
- Make ipfw's sched rules case insensitive, for user-friendliness.
- Add a note to the ipfw(8) man page about the rules no longer being
case sensitive.
- Fix some typos in the man page.

PR: docs/164772
Reviewed by: bz
Approved by: gavin
Approved by: re (kib)


238557 17-Jul-2012 melifaro

MFC r237479, r238265

Finally fix lookup (account remaining '\0') and deletion
(provide valid key length for radix lookup).

Submitted by: Ihor Kaharlichenko<madkinder at gmail.com> (prev version)
Approved by: re(hrs), kib(mentor)
Sponsored by: Shtorm ISP


237307 20-Jun-2012 melifaro

MFC r236819

Validate IPv4 network mask being passed to ipfw kernel interface.
Incorrect mask can possibly be one of the reasons for kern/127209 existance.

Approved by: ae(mentor)


236692 06-Jun-2012 oleg

MFC: r232272, r232273

- lookup_dyn_rule_locked(): style(9) cleanup
- Refresh dynamic tcp rule only if both sides answered keepalive packets.
- Remove some useless assignments.


234597 23-Apr-2012 melifaro

MFC r232865, r232868, r233478

- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Make number of tables to be changed in runtime in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds

No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.

IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;

New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST

ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add host' now assumes 'host' to be interface name instead of
hostname.

New tunable:
net.inet.ip.fw.tables_max controls number of table supported by ipfw in given
VNET instance. 128 is still the default value.

Sysctl change:
net.inet.ip.fw.tables_max is now read-write.

New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out

This is a bit hackish, special interface name '\1' is used to signal interface
table number is passed in p.glob field.

Sponsored by Yandex LLC

Approved by: kib(mentor)


234278 14-Apr-2012 glebius

Merge 231076,231078:
Make the 'tcpwin' option of ipfw(8) accept ranges and lists.

Submitted by: sem


232292 29-Feb-2012 bz

MFC r231852,232127:

Merge multi-FIB IPv6 support.

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.


232171 26-Feb-2012 ae

MFC r231991:
Don't use `m' after m_megapullup.

PR: kern/165373


229461 04-Jan-2012 eadler

MFC r227458, r226436:

- change "is is" to "is" or "it is"
- change "the the" to "the"
- other typo fixes

Approved by: lstewart


227232 06-Nov-2011 bz

MFC r227085:

Always use the opt_*.h options for ipfw.ko, not just when
compiled into the kernel.
Do not try to build the module in case of no INET support but
keep #error calls for now in case we would compile it into the
kernel.

This should fix an issue where the module would fail to enable
IPv6 support from the rc framework, but also other INET and INET6
parts being silently compiled out without giving a warning in the
module case.

While here garbage collect unneeded opt_*.h includes.
opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET
entry in options for conditional inclusion in kernel so keep the
file with the same name.

Reported by: pluknet
Reviewed by: plunket, jhb

Approved by: re (kib)


225965 04-Oct-2011 bz

MFC r225793:

Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to
build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the
assumption that the private L2 hook (which hopefully eventually will be a
pfil hook as well) can still be useful.

Allow building the module without inet as well.

Approved by: re (kib)


225736 23-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


225518 12-Sep-2011 jhb

Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_*
options defined in the kernel config. This more closely matches the
behavior of other modules which inherit configuration settings from the
kernel configuration during a kernel + modules build.

Reviewed by: luigi
Approved by: re (kib)
MFC after: 1 week


225044 20-Aug-2011 bz

Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)


225036 20-Aug-2011 bz

Hide IPv6 next header parsing warnings under the verbose sysctl
so people can possibly disable it when their consoles are flooded,
or enabled it for debugging.

MFC after: 2 weeks
Approved by: re (kib)


225034 20-Aug-2011 bz

After r225032 fix logging in a similar way masking the the IPv6
more fragments flag off so that offset == 0 checks work properly.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
X-MFC with: r225032
Approved by: re (kib)


225033 20-Aug-2011 bz

If we detect an IPv6 fragment header and it is not the first fragment,
then terminate the loop as we will not find any further headers and
for short fragments this could otherwise lead to a pullup error
discarding the fragment.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


225032 20-Aug-2011 bz

ipfw internally checks for offset == 0 to determine whether the
packet is a/the first fragment or not. For IPv6 we have added the
"more fragments" flag as well to be able to determine on whether
there will be more as we do not have the fragment header avaialble
for logging, while for IPv4 this information can be derived directly
from the IPv4 header. This allowed fragmented packets to bypass
normal rules as proper masking was not done when checking offset.
Split variables to not need masking for IPv6 to avoid further errors.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


225030 20-Aug-2011 bz

While not explicitly allowed by RFC 2460, in case there is no
translation technology involved (and that section is suggested to
be removed by Errata 2843), single packet fragments do not harm.

There is another errata under discussion to clarify and allow this.
Meanwhile add a sysctl to allow disabling this behaviour again.
We will treat single packet fragment (a fragment header added
when not needed) as if there was no fragment header.

PR: kern/145733
Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version)
Tested by: Matthew Luckie (mjl luckie.org.nz)
MFC after: 2 weeks
Approved by: re (kib)


223666 29-Jun-2011 ae

Add new rule actions "call" and "return" to ipfw. They make
possible to organize subroutines with rules.

The "call" action saves the current rule number in the internal
stack and rules processing continues from the first rule with
specified number (similar to skipto action). If later a rule with
"return" action is encountered, the processing returns to the first
rule with number of "call" rule saved in the stack plus one or higher.

Submitted by: Vadim Goncharov
Discussed by: ipfw@, luigi@


223637 28-Jun-2011 bz

Update packet filter (pf) code to OpenBSD 4.5.

You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by: mlaier
Submitted by: eri


223593 27-Jun-2011 glebius

Add possibility to pass IPv6 packets to a divert(4) socket.

Submitted by: sem


223358 21-Jun-2011 ae

Do not use SET_HOST_IPLEN() macro for IPv6 packets.

PR: kern/157239
MFC after: 2 weeks


223080 14-Jun-2011 ae

Implement "global" mode for ipfw nat. It is similar to natd(8)
"globalport" option for multiple NAT instances.

If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, packet aliased
according to that entry, otherwise packet is passed unchanged.

User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.

PR: kern/157867
Submitted by: Alexander V. Chernikov (previous version)
Tested by: Eugene Grosbein


223073 14-Jun-2011 ae

Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure
to the check_uidgid() function, since it contains all needed arguments
and also pointer to mbuf and now it is possible use in_pcblookup_mbuf()
function.

Since i can not test it for the non-FreeBSD case, i keep this ifdef
unchanged.

Tested by: Alexander V. Chernikov
MFC after: 3 weeks


222806 07-Jun-2011 ae

Make a behaviour of the libalias based in-kernel NAT a bit closer to
how natd(8) does work. natd(8) drops packets only when libalias returns
PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
always did drop packets that were not aliased, even if they should
not be aliased and just are going through.

PR: kern/122109, kern/129093, kern/157379
Submitted by: Alexander V. Chernikov (previous version)
MFC after: 1 month


222748 06-Jun-2011 rwatson

Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup. pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups. During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock. By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details). This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems". However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies. Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect. In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222742 06-Jun-2011 ae

Do not return EINVAL when user does `ipfw set N flush` on an empty set.

MFC after: 2 weeks


222582 01-Jun-2011 ae

O_FORWARD_IP is only action which depends from the result of lookup of
dynamic rules. We are doing forwarding in the following cases:
o For the simple ipfw fwd rule, e.g.

fwd 10.0.0.1 ip from any to any out xmit em0
fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1

o For the dynamic fwd rule, e.g.

fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state

When this rule triggers it creates a dynamic rule, but this
dynamic rule should forward packets only in forward direction.

o And the last case that does not work before - simple fwd rule which
triggers when some dynamic rule is already executed.

PR: kern/147720, kern/150798
MFC after: 1 month


222560 01-Jun-2011 ae

Hide some debug messages under debug macro.

MFC after: 1 week


222559 01-Jun-2011 ae

Hide useless warning under debug macro.

PR: kern/69963
MFC after: 1 week


222488 30-May-2011 rwatson

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222474 30-May-2011 ae

Wrap long line.

MFC after: 2 weeks


222473 30-May-2011 ae

Add tablearg support for ipfw setfib.

PR: kern/156410
MFC after: 2 weeks


221521 06-May-2011 ae

Convert delay parameter back to ms when reporting to user.

PR: 156838
MFC after: 1 week


220914 21-Apr-2011 glebius

Use size_t for sopt_valsize.

Submitted by: Brandon Gooch <jamesbrandongooch gmail.com>


220878 20-Apr-2011 bz

MFp4 CH=191466:

Move fw_one_pass to where it belongs: it is a property of ipfw,
not of ip_input.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 3 days


220837 19-Apr-2011 glebius

- Rewrite functions that copyin/out NAT configuration, so that they
calculate required memory size dynamically.
- Fix races on chain re-lock.
- Introduce new field to ip_fw_chain - generation count. Now utilized
only in the NAT configuration, but can be utilized wider in ipfw.
- Get rid of NAT_BUF_LEN in ip_fw.h

PR: kern/143653


220832 19-Apr-2011 ae

Add sysctl handlers for net.inet.ip.dummynet.hash_size, .pipe_byte_limit
and .pipe_slot_limit oids to prevent to set incorrect values.

MFC after: 2 weeks


220831 19-Apr-2011 ae

ipdn_bound_var() functions is designed to bound a variable between
specified minimum and maximum. In case when specified default value
is out of bounds it does not work as expected and does not limit
variable. Check that default value is in range and limit it if needed.
Also bump max_hash_size value to 65536 to correspond with manual page.

PR: kern/152887
MFC after: 2 weeks


220812 19-Apr-2011 ae

Use M_WAITOK instead M_WAIT for malloc. Remove unneded checks.

MFC after: 1 week


220800 18-Apr-2011 glebius

LibAliasInit() should allocate memory with M_WAITOK flag. Modify it
and its callers.


220796 18-Apr-2011 glebius

Pullup up to TCP header length before matching against 'tcpopts'.

PR: kern/156180
Reviewed by: luigi


220568 12-Apr-2011 ae

Restore previous behaviour - always match rule when we doing tagging,
even when tag is already exists.

Reported by: Vadim Goncharov
MFC after: 1 week


220211 31-Mar-2011 ae

Fill up src_port and dst_port variables for SCTP over IPv4.

PR: kern/153415
MFC after: 1 week


220204 31-Mar-2011 ae

Fix malloc types.

MFC after: 1 week


220203 31-Mar-2011 ae

Fix a memory leak. Memory that is allocated for schedulers hash table
was not freed.

PR: kern/156083
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218741 16-Feb-2011 pluknet

Bump dummynet module version to meet dummynet schedulers' requirements,
and thus unbreak loading dummynet.ko via /boot/loader.conf.

Reported by: rihad <rihad att mail.ru> on freebsd-net
Approved by: kib (mentor)


218360 05-Feb-2011 luigi

correct the 'output_time' of packets generated by dummynet.
In the dec.2009 rewrite I introduced a bug, using for the
computation the arrival time instead of the time the packet
has exited from the queue.
The bandwidth computation was still correct because it is
computed elsewhere, but traffic was sent out in bursts.

The bug is also present in RELENG_8 after dec.2009

Thanks to Daikichi Osuga for investingating, finding and fixing the
bug with detailed graphs of the behaviour before and after the fix.

Submitted by: Daikichi Osuga
MFC after: 2 weeks


217361 13-Jan-2011 jhb

Use a blocking malloc() to initialize the dummynet taskq.

Reviewed by: luigi


217322 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


217110 07-Jan-2011 jhb

Use a regular taskqueue for dummynet rather than a "fast" taskqueue.

Reviewed by: luigi


215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


215179 12-Nov-2010 luigi

The first customer of the SO_USER_COOKIE option:
the "sockarg" ipfw option matches packets associated to
a local socket and with a non-zero so_user_cookie value.
The value is made available as tablearg, so it can be used
as a skipto target or pipe number in ipfw/dummynet rules.

Code by Paul Joe, manpage by me.

Submitted by: Paul Joe
MFC after: 1 week


213329 01-Oct-2010 luigi

put back the assigment to sched_time. It was correct, and
it was necessary.

Submitted by: Riccardo Panicucci


213279 29-Sep-2010 luigi

remove an unnecessary (and wrong) assignment.
It was meant to reset idle_time (and it was not needed),
but i even used the wrong field.

Obtained from: Oleg
MFC after: 3 days


213267 29-Sep-2010 luigi

whitespace changes in preparation for future commits


213265 29-Sep-2010 luigi

fix handling of initial credit for an idle pipe.
This fixes the bug where setting bw > 1 MTU/tick resulted in
infinite bandwidth if io_fast=1

PR: 147245 148429
Obtained from: Riccardo Panicucci
MFC after: 3 days


213254 28-Sep-2010 luigi

fix breakage in in-kernel NAT: the code did not honor
net.inet.ip.fw.one_pass and always moved to the next rule
in case of a successful nat.

This should fix several related PR (waiting for feedback
before closing them)

PR: 145167 149572 150141
MFC after: 3 days


213253 28-Sep-2010 luigi

Whitespace changes to reduce diffs wrt the most recent ipfw/dummynet code:
+ remove an unused macro,
+ adjust the constants in an enum
+ small whitespace changes

MFC after: 3 days


212256 06-Sep-2010 glebius

in_delayed_cksum() requires host byte order.

Reported by: Alexander Levin <amindomao googlemail.com>
MFC after: 1 week


211992 30-Aug-2010 maxim

o Some programs could send broadcast/multicast traffic to ipfw
pseudo-interface. This leads to a panic due to uninitialized
if_broadcastaddr address. Initialize it and implement ip_output()
method to prevent mbuf leak later.

ipfw pseudo-interface should never send anything therefore call
panic(9) in if_start() method.

PR: kern/149807
Submitted by: Dmitrij Tejblum
MFC after: 2 weeks


210537 27-Jul-2010 glebius

Fix operation of "netgraph" action in conjunction with the
net.inet.ip.fw.one_pass sysctl.

The "ngtee" action is still broken.

PR: kern/148885
Submitted by: Nickolay Dudorov <nnd mail.nsk.ru>


210123 15-Jul-2010 luigi

remove some conditional #ifdefs (no-op on FreeBSD);
run the timer routine on cpu 0.


210120 15-Jul-2010 luigi

whitespace fixes


210119 15-Jul-2010 luigi

fix a comment and final empty line


209845 09-Jul-2010 glebius

Improve last commit: use bpf_mtap2() to avoiding stack usage.

Prodded by: julian


209797 08-Jul-2010 glebius

Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with
a stub m_hdr instead of a full mbuf.

PR: kern/148050


209589 29-Jun-2010 glebius

After processing the O_SKIPTO opcode our cmd points to the next rule, and
"match" processing at the end of inner loop would look ahead into the next
rule, which is incorrect. Particularly, in the case when the next rule
started with F_NOT opcode it was skipped blindly.

To fix this, exit the inner loop with the continue operator forcibly and
explicitly.

PR: kern/147798


206845 19-Apr-2010 luigi

whitespace fixes (trailing whitespace, bad indentation
after a merge, etc.)


206461 10-Apr-2010 bz

Try to help with a virtualized dummynet after r206428.

This adds the explicit include (so far probably included through one of the
few "hidden" includes in other header files) for vnet.h and adds a cast
to unbreak LINT-VIMAGE.


206428 09-Apr-2010 luigi

This commit enables partial operation of dummynet with kernels
compiled with "options VIMAGE".
As it is now, there is still a single instance of the pipes,
and it is only usable from vnet0 (the main instance).
Trying to use a pipe from a different vimage does not crash
the system as it did before, but the traffic coming out from
the pipe goes to the wrong place, and i still need to
figure out where.

Support for per-vimage pipes is almost there (just a matter of
uncommenting the VNET_* definitions for dn_cfg, plus putting into
the structure the remaining static variables), however i need
first to figure out how init/uninit work, and also to understand
where packets are ending up on exit from a pipe.

In summary: vimage support for dummynet is not complete yet,
but we are getting there.


206425 09-Apr-2010 luigi

no need to pass an argument to dn_compat_calc_size()

MFC after: 3 days


206339 07-Apr-2010 luigi

Hopefully fix the recent breakage in rule deletion.
A few more tests and this will also go into -stable where
the problem is more critical.


205955 31-Mar-2010 luigi

fix bug in previous commit related to rule deletion
(stable/8 just fixed moments ago)


205831 29-Mar-2010 luigi

remove a leftover debugging message


205830 29-Mar-2010 luigi

Fix handling of set manipulations.
This patch has two fixes for potential kernel panics (one wrong
index, one access to the wrong lock) and two fixes to wrong logic
in a conditional. The potential panics are also on stable/8,
so I am going to MFC the fix quickly.


205602 24-Mar-2010 luigi

Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when i did the mtag cleanup three months ago.

PR: 145004


205417 21-Mar-2010 luigi

Add a priority-based packet scheduler.

Sponsored by: The ONELAB2 Project
Submitted by: Riccardo Panicucci


205415 21-Mar-2010 luigi

no need for ipfw_flush_tables(), we just need ipfw_destroy_tables()


205414 21-Mar-2010 luigi

revise documentation


205178 15-Mar-2010 luigi

small fixes to estimate the buffer size when requesting all pipes/flows.


205173 15-Mar-2010 luigi

+ implement (two lines) the kernel side of 'lookup dscp N' to use the
dscp as a search key in table lookups;

+ (re)implement a sysctl variable to control the expire frequency of
pipes and queues when they become empty;

+ add 'queue number' as optional part of the flow_id. This can be
enabled with the command

queue X config mask queue ...

and makes it possible to support priority-based schedulers, where
packets should be grouped according to the priority and not some
fields in the 5-tuple.
This is implemented as follows:
- redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but
without changing the size or shape of the structure, so there are
no ABI changes. On passing, also document how other fields are
used, and remove some useless assignments in ip_fw2.c

- implement small changes in the userland code to set/read the field;

- revise the functions in ip_dummynet.c to manipulate masks so they
also handle the additional field;

There are no ABI changes in this commit.


205050 11-Mar-2010 luigi

implement listing of a subset of pipes/queues/schedulers.
The filtering of the output is done in the kernel instead of userland
to reduce the amount of data transfered.


204954 10-Mar-2010 luigi

fix handling of commands issued by RELENG_7 version of /sbin/ipfw,

Submitted by: Riccardo Panicucci


204866 08-Mar-2010 luigi

cosmetic changes and C++ compatibility


204865 08-Mar-2010 luigi

don't use C++ keywords as variable names


204862 08-Mar-2010 luigi

do not report an error unnecessarily


204837 07-Mar-2010 bz

Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by: ISPsystem
Reviewed by: julian
MFC after: 5 days


204763 05-Mar-2010 luigi

plug a memory leak on pipe's reconfiguration


204754 05-Mar-2010 luigi

fix a memory leak when deleting RED queues


204736 04-Mar-2010 luigi

portability fixes


204735 04-Mar-2010 luigi

don't use keywords as variable names.


204714 04-Mar-2010 luigi

use callout_drain() (outside the lock) when unloading the module.
This prevents a potential deadlock.

Submitted by: Francesco Magno


204713 04-Mar-2010 luigi

improve compatibility with RELENG_7.2


204591 02-Mar-2010 luigi

Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.


204003 17-Feb-2010 luigi

remove recursive lock/unlock calls, we do them already before entering
the switch.

Reported by: Marta Carbone


202459 17-Jan-2010 ume

Change 'me' to match any IPv6 address configured on an interface in
the system as well as any IPv4 address.

Reviewed by: David Horn <dhorn2000__at__gmail.com>, luigi, qingli
MFC after: 2 weeks


201745 07-Jan-2010 luigi

we don't use dummynet_drain!


201740 07-Jan-2010 luigi

check that we have an ipv4 packet before swapping ip_len and ip_off.
This should fix the handling of ipv6 packets which i broke when i
made ipfw operate on packets in network format.

Reported by: Hajimu UMEMOTO


201735 07-Jan-2010 luigi

Following up on a request from Ermal Luci to make
ip_divert work as a client of pf(4),
make ip_divert not depend on ipfw.

This is achieved by moving to ip_var.h the struct ipfw_rule_ref
(which is part of the mtag for all reinjected packets) and other
declarations of global variables, and moving to raw_ip.c global
variables for filter and divert hooks.

Note that names and locations could be made more generic
(ipfw_rule_ref is really a generic reference robust to reconfigurations;
the packet filter is not necessarily ipfw; filters and their clients
are not necessarily limited to ipv4), but _right now_ most
of this stuff works on ipfw and ipv4, so i don't feel like
doing a gratuitous renaming, at least for the time being.


201732 07-Jan-2010 luigi

some header shuffling to help decoupling ip_divert from ipfw


201722 07-Jan-2010 luigi

put ip_len in correct order for ip_output().
This prevents a panic when ipfw generates packets on its own
(such as reject or keepalives for dynamic rules).

Reported by: Chagin Dmitry


201568 05-Jan-2010 luigi

this file does not require ip_dummynet.h


201527 04-Jan-2010 luigi

Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).


201150 29-Dec-2009 luigi

we really need htonl() here, see the comment a few lines above in the code.


201124 28-Dec-2009 luigi

bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c


201122 28-Dec-2009 luigi

bring in several cleanups tested in ipfw3-head branch, namely:

r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
and netgraph;

r201049
- merge some common code to attach/detach hooks into
a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after: 1 week


201121 28-Dec-2009 luigi

readability fixes -- add braces on large blocks, remove unnecessary
initializations


201120 28-Dec-2009 luigi

explain details of operation of table lookups, and improve portability


201046 27-Dec-2009 luigi

diverted packet must re-enter _after_ the matching rule,
or we create loops.
The divert cookie (that can be set from userland too)
contains the matching rule nr, so we must start from nr+1.

Reported by: Joe Marcus Clarke


200951 24-Dec-2009 luigi

fix poor indentation resulting from a merge


200909 23-Dec-2009 luigi

mostly style changes, such as removal of trailing whitespace,
reformatting to avoid unnecessary line breaks, small block
restructuring to avoid unnecessary nesting, replace macros
with function calls, etc.

As a side effect of code restructuring, this commit fixes one bug:
previously, if a realloc() failed, memory was leaked. Now, the
realloc is not there anymore, as we first count how much memory
we need and then do a single malloc.


200897 23-Dec-2009 luigi

fix build with the new fast lookup structure.
Also remove some unnecessary headers


200896 23-Dec-2009 luigi

fix build on 64-bit architectures.
Also fix the indentation on a few lines.


200855 22-Dec-2009 luigi

merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.

2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).

3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).

4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after: 1 month


200838 22-Dec-2009 luigi

some mostly cosmetic changes in preparation for upcoming work:

+ in many places, replace &V_layer3_chain with a local
variable chain;
+ bring the counter of rules and static_len within ip_fw_chain
replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures


200673 18-Dec-2009 ru

Added proper attribution.

Requested by: luigi


200654 17-Dec-2009 luigi

Add some experimental code to log traffic with tcpdump,
similar to pflog(4).
To use the feature, just put the 'log' options on rules
you are interested in, e.g.

ipfw add 5000 count log ....

and run
tcpdump -ni ipfw0 ...

net.inet.ip.fw.verbose=0 enables logging to ipfw0,
net.inet.ip.fw.verbose=1 sends logging to syslog as before.

More features can be added, similar to pflog(), to store in
the MAC header metadata such as rule numbers and actions.
Manpage to come once features are settled.


200634 17-Dec-2009 luigi

simplify and document lookup_next_rule()


200629 17-Dec-2009 luigi

simplify the code that finds the next rule after reinjections

MFC after: 1 week


200610 16-Dec-2009 luigi

remove a duplicate sysctl entry


200603 16-Dec-2009 luigi

bring back a couple of #include that are supplied by nesting,
and explain why they are used.


200601 16-Dec-2009 luigi

Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them
static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts
(the same should be done for variables);
- try to pack variable declaration in an uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that
slipped in due to cut&paste
- remove duplicate static variables in different files;

MFC after: 1 month


200598 16-Dec-2009 imp

Quick fix to make this compile:
Remove redundant extern declearations.
If the maintainer has a better fix, then feel free to back this out.


200590 15-Dec-2009 luigi

more splitting of ip_fw2.c, now extract the 'table' routines
and the sockopt routines (the upper half of the kernel).

Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?)
please change the attribution in ip_fw_table.c. I have copied
the copyright line from ip_fw2.c but it carries my name and I have
neither written nor designed the feature so I don't deserve
the credit.

MFC after: 1 month


200580 15-Dec-2009 luigi

Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
now references the two new source files

netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules

netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after: 1 month


200567 15-Dec-2009 luigi

implement a new match option,

lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N

which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.

Work supported by the Onelab project.

MFC after: 1 week


200361 10-Dec-2009 luigi

use div64 when converting back the burst value for userland


200360 10-Dec-2009 luigi

when draining a flowset free the entire chain, not just one packet.


200358 10-Dec-2009 luigi

centralize the code to free a packet (or a chain) while in dummynet.
Remove an old macro and its stale comment.


200170 05-Dec-2009 oleg

Fix burst processing for WF2Q pipes - do not increase available burst size
unless pipe is idle. This should fix follwing issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.

MFC after: 1 week


200118 05-Dec-2009 luigi

adjust comment in previous commit after Julian's explanation


200116 05-Dec-2009 luigi

remove a dead block of code, document how the ipfw clients are
hooked and the difference in handling the 'enable' variable
for layer2 and layer3. The latter needs fixing once i figure out
how it worked pre-vnet.

MFC after: 7 days


200113 05-Dec-2009 luigi

fix build with VNET enabled

Reported by: David Wolfskill


200102 04-Dec-2009 ume

Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard
coded number.

Spotted by: bz


200059 03-Dec-2009 luigi

preparation work to replace the monster switch in ipfw_chk() with
table of functions.

This commit (which is heavily based on work done by Marta Carbone
in this year's GSOC project), removes the goto's and explicit
return from the inner switch(), so we will have a easier time when
putting the blocks into individual functions.

MFC after: 3 weeks


200055 03-Dec-2009 ume

Teach an IPv6 to the debug prints.


200040 02-Dec-2009 luigi

- initialize src_ip in the main loop to prevent a compiler warning
(gcc 4.x under linux, not sure how real is the complaint).
- rename a macro argument to prevent name clashes.
- add the macro name on a couple of #endif
- add a blank line for readability.

MFC after: 3 days


200029 02-Dec-2009 luigi

small changes for portability and diff reduction wrt/ FreeBSD 7.
No functional differences.

- use the div64() macro to wrap 64 bit divisions
(which almost always are 64 / 32 bits) so they are easier
to handle with compilers or OS that do not have native
support for 64bit divisions;

- use a local variable for p_numbytes even if not strictly
necessary on HEAD, as it reduces diffs with FreeBSD7

- in dummynet_send() check that a tag is present before
dereferencing the pointer.

- add a couple of blank lines for readability near the end of a function

MFC after: 3 days


200027 02-Dec-2009 ume

Teach an IPv6 to send_pkt() and ipfw_tick().
It fixes the issue which keep-alive doesn't work for an IPv6.

PR: kern/117234
Submitted by: mlaier, Joost Bekkers <joost__at__jodocus.org>
MFC after: 1 month


199073 09-Nov-2009 oleg

style(9): add missing parentheses


198845 03-Nov-2009 oleg

Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC after: 3 days


197952 11-Oct-2009 julian

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


196453 23-Aug-2009 julian

Fix another typo right next to the previous one, that amazingly, I did
not see before.

MFC after: 1 week


196451 23-Aug-2009 julian

Fix typo in comment that has been bugging me for days.

MFC after: 1 week


196423 21-Aug-2009 julian

Fix ipfw's initialization functions to get the correct order of evaluation
to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
instead of the modevent handler. Correct some spelling errors in comments
in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
ipfw is unloaded.

This patch is a minimal patch for 8.0
I have a much larger patch that actually fixes the underlying problems
that will be applied after 8.0

Reviewed by: zec@, rwatson@, bz@(earlier version)
Approved by: re (rwatson)
MFC after: Immediatly


196322 17-Aug-2009 jhb

Purge mergeinfo in sys/ that is either empty or a subset of the parent
mergeinfo on sys/ itself.

Approved by: re (mergeinfo blanket)


196201 14-Aug-2009 julian

Fix ipfw crash on uid or gid check.
Receiving any ip packet for which there is no existing socket will
crash if ipfw has a uid or gid test rule, as the uid/gid
of the non existent owner of said non existent socket is tested.
Brooks introduced this error as part of his >16 gids patch.
It appears to be a cut-n-paste error from similar code a few lines
before. The old code used the 'pcb' variable here, but in the
new code that switched the 'inp' variable, which is often NULL
and what is tested in the code further up. The rest of the multi-gid
patch for ipfw seems solid (and cleaner than previous code).

Reviewed by: brooks
Approved by: re (rwatson)


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195923 28-Jul-2009 julian

Startup the vnet part of initialization a bit after the global part.
Fixes crash on boot if ipfw compiled in.

Submitted by: tegge@
Reviewed by: tegge@
Approved by: re (kib)


195862 25-Jul-2009 julian

Catch ipfw up to the rest of the vimage code.
It got left behind when it moved to its new location.

Approved by: re (kensmith)


195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195023 26-Jun-2009 rwatson

Update various IPFW-related modules to use if_addr_rlock()/
if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK().

MFC after: 6 weeks


194930 24-Jun-2009 oleg

- fix dummynet 'fast' mode for WF2Q case.
- fix printing of pipe profile data.
- introduce new pipe parameter: 'burst' - how much data can be sent through
pipe bypassing bandwidth limit.


194498 19-Jun-2009 brooks

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


194245 15-Jun-2009 oleg

Since dn_pipe.numbytes is int64_t now - remove unnecessary overflow detection
code in ready_event_wfq().


193896 10-Jun-2009 luigi

in ip_dn_ctl(), do not allocate a large structure on the stack,
and use malloc() instead if/when it is necessary.

The problem is less relevant in previous versions because
the variable involved (tmp_pipe) is much smaller there.
Still worth fixing though.

Submitted by: Marta Carbone (GSOC)
MFC after: 3 days


193894 10-Jun-2009 luigi

small simplifications to the code in charge of reaping deleted rules:
- clear the head pointer immediately before using it, so there is
no chance of mistakes;
- call reap_rules() unconditionally. The function can handle a NULL
argument just fine, and the cost of the extra call is hardly
significant given that we do it rarely and outside the lock.

MFC after: 3 days


193859 09-Jun-2009 oleg

Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after: 1 month
Tested by: Mikolaj Golub


193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


193532 05-Jun-2009 luigi

move kernel ipfw-related sources to a separate directory,
adjust conf/files and modules' Makefiles accordingly.

No code or ABI changes so this and most of previous related
changes can be easily MFC'ed

MFC after: 5 days