History log of /freebsd-10-stable/sys/netpfil/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
338106 20-Aug-2018 kp

MFC r337969:

pf: Limit the maximum number of fragments per packet

Similar to the network stack issue fixed in r337782 pf did not limit the number
of fragments per packet, which could be exploited to generate high CPU loads
with a crafted series of packets.

Limit each packet to no more than 64 fragments. This should be sufficient on
typical networks to allow maximum-sized IP frames.

This addresses the issue for both IPv4 and IPv6.

Security: CVE-2018-5391
Sponsored by: Klara Systems

335252 16-Jun-2018 kp

MFC r334876:

pf: Fix deadlock with route-to

If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
a different interface it's passed through the firewall again. This meant we
lost the inp pointer and if we required the pointer (e.g. for user ID matching)
we'd deadlock trying to acquire an inp lock we've already got.

Pass the inp pointer along with pf_route()/pf_route6().

PR: 228782

332497 14-Apr-2018 kp

MFC r332142:

pf: Improve ioctl validation

Ensure that multiplications for memory allocations cannot overflow, and
that we'll not try to allocate M_WAITOK for potentially overly large
allocations.

332494 13-Apr-2018 kp

MFC r332107:

pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS

These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.

Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.

332492 13-Apr-2018 kp

MFC r332136:

pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT

These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.

There's no obvious limit to the request size for these, so we limit the
requests to something which won't overflow. Change the memory allocation
to M_NOWAIT so excessive requests will fail rather than stall forever.

332487 13-Apr-2018 kp

MFC r332101:

pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES

The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of
tables at a time, and as such try to allocate <number of tables> *
sizeof(struct pfr_table). This multiplication can overflow. Thanks to
mallocarray() this is not exploitable, but an overflow does panic the
system.

Arbitrarily limit this to 65535 tables. pfctl only ever processes one
table at a time, so it presents no issues there.

332330 09-Apr-2018 kp

MFC r331225:

pf: Fix memory leak in DIOCRADDTABLES

If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.

Reported by: Coverity
CID: 1382111

331202 19-Mar-2018 ae

MFC r330792:
Do not try to reassemble IPv6 fragments in "reass" rule.

ip_reass() expects IPv4 packet and will just corrupt any IPv6 packets
that it gets. Until proper IPv6 fragments handling function will be
implemented, pass IPv6 packets to next rule.

PR: 170604

331117 18-Mar-2018 kp

MFC r329950:

pf: Cope with overly large net.pf.states_hashsize

If the user configures a states_hashsize or source_nodes_hashsize value we may
not have enough memory to allocate this. This used to lock up pf, because these
allocations used M_WAITOK.

Cope with this by attempting the allocation with M_NOWAIT and falling back to
the default sizes (with M_WAITOK) if these fail.

PR: 209475
Submitted by: Fehmi Noyan Isi <fnoyanisi AT yahoo.com>

328277 23-Jan-2018 kp

MFC r327675

pf: Avoid integer overflow issues by using mallocarray() iso. malloc()

pfioctl() handles several ioctl that takes variable length input, these
include:
- DIOCRADDTABLES
- DIOCRDELTABLES
- DIOCRGETTABLES
- DIOCRGETTSTATS
- DIOCRCLRTSTATS
- DIOCRSETTFLAGS

All of them take a pfioc_table struct as input from userland. One of
its elements (pfrio_size) is used in a buffer length calculation.
The calculation contains an integer overflow which if triggered can lead
to out of bound reads and writes later on.

Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>

325731 12-Nov-2017 truckman

MFC r325008

Fix Dummynet AQM packet marking function ecn_mark() and fq_codel /
fq_pie schedulers packet classification functions in layer2 (bridge mode).

Dummynet AQM packet marking function ecn_mark() and fq_codel/fq_pie
schedulers packet classification functions (fq_codel_classify_flow()
and fq_pie_classify_flow()) assume mbuf is pointing at L3 (IP)
packet. However, this assumption is incorrect if ipfw/dummynet is
used to manage layer2 traffic (bridge mode) since mbuf will point
at L2 frame. This patch solves this problem by identifying the
source of the frame/packet (L2 or L3) and adding ETHER_HDR_LEN
offset when converting an mbuf pointer to ip pointer if the traffic
is from layer2. More specifically, in dummynet packet tagging
function, tag_mbuf(), iphdr_off is set to ETHER_HDR_LEN if the
traffic is from layer2 and set to zero otherwise. Whenever an access
to IP header is required, mtodo(m, dn_tag_get(m)->iphdr_off) is
used instead of mtod(m, struct ip *) to correctly convert mbuf
pointer to ip pointer in both L2 and L3 traffic.

Submitted by: lstewart
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12506

325230 31-Oct-2017 ae

MFC r324947:
Add IPv6 support for O_TCPDATALEN opcode.

PR: 222746

321873 01-Aug-2017 philip

MFC r320941: Fix GRE over IPv6 tunnels with IPFW

Previously, GRE packets in IPv6 tunnels would be dropped by IPFW (unless
net.inet6.ip6.fw.deny_unknown_exthdrs was unset).

PR: 220640
Submitted by: Kun Xie <kxie@xiplink.com>

318905 25-May-2017 truckman

MFC r318527

Fix the queue delay estimation in PIE/FQ-PIE when the timestamp
(TS) method is used. When packet timestamp is used, the "current_qdelay"
keeps storing the last queue delay value calculated in the dequeue
function. Therefore, when a burst of packets arrives followed by
a pause, the "current_qdelay" will store a high value caused by the
burst and stick to that value during the pause because the queue
delay measurement is done inside the dequeue function. This causes
the drop probability calculation function to calculate high drop
probability value instead of zero and prevents the burst allowance
mechanism from working properly. Fix this problem by resetting
"current_qdelay" inside the drop probability calculation function
when the queue length is zero and TS option is used.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

318886 25-May-2017 truckman

MFC r318511

The result of right shifting a negative signed value is implementation
defined. On machines without arithmetic shift instructions, zero bits
may be shifted in from the left, giving a large positive result instead
of the desired divide-by power-of-2. Fix this by operating on the
absolute value and compensating for the possible negation later.

Reverse the order of the underflow/overflow tests and the exponential
decay calculation to avoid the possibility of an erroneous overflow
detection if p is a sufficiently small non-negative value. Also
check for negative values of prob before doing the exponential decay
to avoid another instance of of right shifting a negative value.

Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

318155 10-May-2017 marius

MFC: r311817

In dummynet(4), random chunks of memory are casted to struct dn_*,
potentially leading to fatal unaligned accesses on architectures with
strict alignment requirements. This change fixes dummynet(4) as far
as accesses to 64-bit members of struct dn_* are concerned, tripping
up on sparc64 with accesses to 32-bit members happening to be correctly
aligned there. In other words, this only fixes the tip of the iceberg;
larger parts of dummynet(4) still need to be rewritten in order to
properly work on all of !x86.
In principle, considering the amount of code in dummynet(4) that needs
this erroneous pattern corrected, an acceptable workaround would be to
declare all struct dn_* packed, forcing compilers to do byte-accesses
as a side-effect. However, given that the structs in question aren't
laid out well either, this would break ABI/KBI.
While at it, replace all existing bcopy(9) calls with memcpy(9) for
performance reasons, as there is no need to check for overlap in these
cases.

PR: 189219

317489 27-Apr-2017 truckman

MFC r316777 (by cem)

dummynet: Use strlcpy to appease static checkers

Some dummynet modules used strcpy() to copy from a larger buffer
(dn_aqm->name) to a smaller buffer (dn_extra_parms->name). It happens that
the lengths of the strings in the dn_aqm buffers were always hardcoded to be
smaller than the dn_extra_parms buffer ("CODEL", "PIE").

Use strlcpy() instead, to appease static checkers. No functional change.

Reported by: Coverity
CIDs: 1356163, 1356165
Sponsored by: Dell EMC Isilon

317335 23-Apr-2017 kp

MFC r317186

pf: Fix possible incorrect IPv6 fragmentation

When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.

For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).

We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.

Reported by: Antonios Atlasis <aatlasis at secfu net>

316641 08-Apr-2017 kp

MFC r316355

pf: Fix leak of pf_state_keys

If we hit the state limit we returned from pf_create_state() without cleaning
up.

PR: 217997
Submitted by: Max <maximos@als.nnov.ru>

316325 31-Mar-2017 truckman

MFC r315516

Change several constants used by the PIE algorithm from unsigned to signed.

- PIE_MAX_PROB is compared to variable of int64_t and the type promotion
rules can cause the value of that variable to be treated as unsigned.
If the value is actually negative, then the result of the comparsion
is incorrect, causing the algorithm to perform poorly in some
situations. Changing the constant to be signed cause the comparision
to work correctly.

- PIE_SCALE is also compared to signed values. Fortunately they are
also compared to zero and negative values are discarded so this is
more of a cosmetic fix.

- PIE_DQ_THRESHOLD is only compared to unsigned values, but it is small
enough that the automatic promotion to unsigned is harmless.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>

316000 26-Mar-2017 kp

MFC 315529

pf: Fix rule evaluation after inet6 route-to

In pf_route6() we re-run the ruleset with PF_FWD if the packet goes out
of a different interface. pf_test6() needs to know that the packet was
forwarded (in case it needs to refragment so it knows whether to call
ip6_output() or ip6_forward()).

This lead pf_test6() to try to evaluate rules against the PF_FWD
direction, which isn't supported, so it needs to treat PF_FWD as PF_OUT.
Once fwdir is set correctly the correct output/forward function will be
called.

PR: 217883
Submitted by: Kajetan Staszkiewicz
Sponsored by: InnoGames GmbH

315028 10-Mar-2017 vangyzen

MFC r313820

pf: use inet_ntoa_r() instead of inet_ntoa(); maybe fix IPv6 OS fingerprinting

inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.

This code had an INET6 conditional before this commit, but opt_inet6.h
was not included, so INET6 was never defined. Apparently, pf's OS
fingerprinting hasn't worked with IPv6 for quite some time.
This commit might fix it, but I didn't test that.

Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6)
Sponsored by: Dell EMC

314940 09-Mar-2017 kp

MFC r314810:

pf: Fix a crash in low-memory situations

If the call to pf_state_key_clone() in pf_get_translation() fails (i.e. there's
no more memory for it) it frees skp. This is wrong, because skp is a
pf_state_key **, so we need to free *skp, as is done later in the function.
Getting it wrong means we try to free a stack variable of the calling
pf_test_rule() function, and we panic.

314667 04-Mar-2017 avg

MFC r283291: don't use CALLOUT_MPSAFE with callout_init()

The main purpose of this MFC is to reduce conflicts for other merges.
Parts of the original change have already "trickled down" via individual MFCs.


/freebsd-10-stable/sys/amd64/amd64/mp_watchdog.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
/freebsd-10-stable/sys/cddl/dev/profile/profile.c
/freebsd-10-stable/sys/compat/ndis/subr_ntoskrnl.c
/freebsd-10-stable/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
/freebsd-10-stable/sys/dev/altera/jtag_uart/altera_jtag_uart_tty.c
/freebsd-10-stable/sys/dev/ath/if_ath.c
/freebsd-10-stable/sys/dev/ce/if_ce.c
/freebsd-10-stable/sys/dev/cp/if_cp.c
/freebsd-10-stable/sys/dev/ctau/if_ct.c
/freebsd-10-stable/sys/dev/cx/if_cx.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_main.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/dcons/dcons_os.c
/freebsd-10-stable/sys/dev/drm2/drm_irq.c
/freebsd-10-stable/sys/dev/drm2/i915/intel_display.c
/freebsd-10-stable/sys/dev/glxsb/glxsb.c
/freebsd-10-stable/sys/dev/gxemul/cons/gxemul_cons.c
/freebsd-10-stable/sys/dev/hifn/hifn7751.c
/freebsd-10-stable/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
/freebsd-10-stable/sys/dev/if_ndis/if_ndis.c
/freebsd-10-stable/sys/dev/isci/isci_io_request.c
/freebsd-10-stable/sys/dev/mfi/mfi.c
/freebsd-10-stable/sys/dev/mwl/if_mwl.c
/freebsd-10-stable/sys/dev/nand/nandsim_chip.c
/freebsd-10-stable/sys/dev/ntb/ntb_hw/ntb_hw.c
/freebsd-10-stable/sys/dev/nxge/if_nxge.c
/freebsd-10-stable/sys/dev/oce/oce_if.c
/freebsd-10-stable/sys/dev/patm/if_patm_attach.c
/freebsd-10-stable/sys/dev/rndtest/rndtest.c
/freebsd-10-stable/sys/dev/safe/safe.c
/freebsd-10-stable/sys/dev/sound/midi/mpu401.c
/freebsd-10-stable/sys/dev/sound/pci/atiixp.c
/freebsd-10-stable/sys/dev/sound/pci/es137x.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdac.c
/freebsd-10-stable/sys/dev/sound/pci/via8233.c
/freebsd-10-stable/sys/dev/twa/tw_osl_freebsd.c
/freebsd-10-stable/sys/dev/tws/tws.c
/freebsd-10-stable/sys/dev/ubsec/ubsec.c
/freebsd-10-stable/sys/dev/virtio/random/virtio_random.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/nfs/nfs_commonport.c
/freebsd-10-stable/sys/gdb/gdb_cons.c
/freebsd-10-stable/sys/geom/gate/g_gate.c
/freebsd-10-stable/sys/geom/journal/g_journal.c
/freebsd-10-stable/sys/geom/mirror/g_mirror.c
/freebsd-10-stable/sys/geom/raid3/g_raid3.c
/freebsd-10-stable/sys/geom/sched/gs_rr.c
/freebsd-10-stable/sys/i386/i386/mp_watchdog.c
/freebsd-10-stable/sys/kern/init_main.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/kern_thread.c
/freebsd-10-stable/sys/kern/subr_vmem.c
/freebsd-10-stable/sys/kern/uipc_domain.c
/freebsd-10-stable/sys/mips/cavium/octe/ethernet.c
/freebsd-10-stable/sys/mips/cavium/octeon_rnd.c
/freebsd-10-stable/sys/mips/nlm/dev/net/xlpge.c
/freebsd-10-stable/sys/mips/rmi/dev/xlr/rge.c
/freebsd-10-stable/sys/net/if_spppsubr.c
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_node.c
/freebsd-10-stable/sys/net80211/ieee80211_proto.c
/freebsd-10-stable/sys/netgraph/netflow/ng_netflow.c
/freebsd-10-stable/sys/netgraph/netgraph.h
/freebsd-10-stable/sys/netinet/in_pcb.c
/freebsd-10-stable/sys/netinet/ip_mroute.c
/freebsd-10-stable/sys/netinet/tcp_hostcache.c
/freebsd-10-stable/sys/netinet/tcp_subr.c
/freebsd-10-stable/sys/netinet6/in6_rmx.c
ipfw/ip_dummynet.c
ipfw/ip_fw_dynamic.c
pf/if_pfsync.c
/freebsd-10-stable/sys/ofed/include/linux/timer.h
/freebsd-10-stable/sys/ofed/include/linux/workqueue.h
/freebsd-10-stable/sys/powerpc/mambo/mambo_console.c
/freebsd-10-stable/sys/powerpc/pseries/phyp_console.c
/freebsd-10-stable/sys/sys/callout.h
/freebsd-10-stable/sys/vm/uma_core.c
/freebsd-10-stable/sys/x86/x86/mca.c
313726 14-Feb-2017 ngie

MFC r313356:

Fix typos in comments (returing -> returning)

310094 14-Dec-2016 kp

MFC r309563: pflog: Correctly initialise subrulenr

subrulenr is considered unset if it's set to -1, not if it's set to 1.
See contrib/tcpdump/print-pflog.c pflog_print() for a user.

This caused incorrect pflog output (tcpdump -n -e -ttt -i pflog0):
rule 0..16777216(match)
instead of the correct output of
rule 0/0(match)

PR: 214832
Submitted by: andywhite@gmail.com

304463 19-Aug-2016 kp

MFC r304152:

pf: Add missing byte-order swap to pf_match_addr_range

Without this, rules using address ranges (e.g. "10.1.1.1 - 10.1.1.5") did not
match addresses correctly on little-endian systems.

PR: 211796
Obtained from: OpenBSD (sthen)

304283 17-Aug-2016 kp

MFC r302497:

pf: Map hook returns onto the correct error values

pf returns PF_PASS, PF_DROP, ... in the netpfil hooks, but the hook callers
expect to get E<foo> error codes.
Map the returns values. A pass is 0 (everything is OK), anything else means
pf ate the packet, so return EACCES, which tells the stack not to emit an ICMP
error message.

PR: 207598

303850 08-Aug-2016 kp

MFC r290521:

pf: Fix broken rule skip calculation

r289932 accidentally broke the rule skip calculation. The address family
argument to PF_ANEQ() is now important, and because it was set to 0 the macro
always evaluated to false.
This resulted in incorrect skip values, which in turn broke the rule
evaluations.

302987 18-Jul-2016 truckman

MFC r302667

Fix problems in the FQ-PIE AQM cleanup code that could leak memory or
cause a crash.

Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup()
is not able to use callout_drain() to make sure that all callouts are
finished before it returns, and callout_stop() is not sufficient to make
that guarantee. After pie_cleanup() returns, dummynet will free a
structure that any remaining callouts will want to access.

Fix these problems by allocating a separate structure to contain the
data used by the callouts. In pie_cleanup(), call callout_reset_sbt()
to replace the normal callout with a cleanup callout that does the cleanup
work for each sub-queue. The instance of the cleanup callout that
destroys the last flow will also free the extra allocated block of memory.
Protect the reference count manipulation in the cleanup callout with
DN_BH_WLOCK() to be consistent with all of the other usage of the reference
count where this lock is held by the dummynet code.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
Differential Revision: https://reviews.freebsd.org/D7174

302422 08-Jul-2016 truckman

MFC r302338

Fix a race condition between the main thread in aqm_pie_cleanup() and the
callout thread that can cause a kernel panic. Always do the final cleanup
in the callout thread by passing a separate callout function for that task
to callout_reset_sbt().

Protect the ref_count decrement in the callout with DN_BH_WLOCK(). All
other ref_count manipulation is protected with this lock.

There is still a tiny window between ref_count reaching zero and the end
of the callout function where it is unsafe to unload the module. Fixing
this would require the use of callout_drain(), but this can't be done
because dummynet holds a mutex and callout_drain() might sleep.

Remove the callout_pending(), callout_active(), and callout_deactivate()
calls from calculate_drop_prob(). They are not needed because this callout
uses callout_init_mtx().

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
Differential Revision: https://reviews.freebsd.org/D6928

301772 10-Jun-2016 truckman

MFC r300779, r300781, r300783, r300784, r300949, r301162, r301180

r300779 | truckman | 2016-05-26 14:40:13 -0700 (Thu, 26 May 2016) | 64 lines

Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE).

Centre for Advanced Internet Architectures

Implementing AQM in FreeBSD

* Overview <http://caia.swin.edu.au/freebsd/aqm/index.html>

* Articles, Papers and Presentations
<http://caia.swin.edu.au/freebsd/aqm/papers.html>

* Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html>

Overview

Recent years have seen a resurgence of interest in better managing
the depth of bottleneck queues in routers, switches and other places
that get congested. Solutions include transport protocol enhancements
at the end-hosts (such as delay-based or hybrid congestion control
schemes) and active queue management (AQM) schemes applied within
bottleneck queues.

The notion of AQM has been around since at least the late 1990s
(e.g. RFC 2309). In recent years the proliferation of oversized
buffers in all sorts of network devices (aka bufferbloat) has
stimulated keen community interest in four new AQM schemes -- CoDel,
FQ-CoDel, PIE and FQ-PIE.

The IETF AQM working group is looking to document these schemes,
and independent implementations are a corner-stone of the IETF's
process for confirming the clarity of publicly available protocol
descriptions. While significant development work on all three schemes
has occured in the Linux kernel, there is very little in FreeBSD.

Project Goals

This project began in late 2015, and aims to design and implement
functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ_PIE
in FreeBSD (with code BSD-licensed as much as practical). We have
chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall
and traffic shaper. Implementation of these AQM schemes in FreeBSD
will:
* Demonstrate whether the publicly available documentation is
sufficient to enable independent, functionally equivalent implementations

* Provide a broader suite of AQM options for sections the networking
community that rely on FreeBSD platforms

Program Members:

* Rasool Al Saadi (developer)

* Grenville Armitage (project lead)

Acknowledgements:

This project has been made possible in part by a gift from the
Comcast Innovation Fund.

Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
X-No objection: core
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6388

[Remove some code that was added to the mq_append() inline function in
HEAD by r258457, which was not merged to stable/10. The AQM patch
moved mq_append() from ip_dn_io.c to the new file ip_dn_private.h, so
we need to remove that copy of the r258457 changes.]
------------------------------------------------------------------------
r300781 | truckman | 2016-05-26 14:44:52 -0700 (Thu, 26 May 2016) | 7 lines

Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak
its expression to work on powerpc and sparc64 (gcc compatibility).

Correct a typo in a nearby comment.

MFC after: 2 weeks (with r300779)

------------------------------------------------------------------------
r300783 | truckman | 2016-05-26 15:03:28 -0700 (Thu, 26 May 2016) | 4 lines

Correct a typo in a comment.

MFC after: 2 weeks (with r300779)

------------------------------------------------------------------------
r300784 | truckman | 2016-05-26 15:07:09 -0700 (Thu, 26 May 2016) | 5 lines

Include the new AQM files when compiling a kernel with options DUMMYNET.

Reported by: Nikolay Denev <nike_d AT cytexbg DOT com>
MFC after: 2 weeks (with r300779)

------------------------------------------------------------------------
r300949 | truckman | 2016-05-29 00:23:56 -0700 (Sun, 29 May 2016) | 10 lines

Cast some expressions that multiply a long long constant by a
floating point constant to int64_t. This avoids the runtime
conversion of the the other operand in a set of comparisons from
int64_t to floating point and doing the comparisions in floating
point.

Suggested by: lidl
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 2 weeks (with r300779)

------------------------------------------------------------------------
r301162 | truckman | 2016-06-01 13:04:24 -0700 (Wed, 01 Jun 2016) | 9 lines

Replace constant expressions that contain multiplications by
fractional floating point values with integer divides. This will
eliminate any chance that the compiler will generate code to evaluate
the expression using floating point at runtime.

Suggested by: bde
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 8 days (with r300779 and r300949)

------------------------------------------------------------------------
r301180 | truckman | 2016-06-01 17:42:15 -0700 (Wed, 01 Jun 2016) | 2 lines

Belatedly bump .Dd date for Dummynet AQM import in r300779.

Relnotes: yes

301231 03-Jun-2016 truckman

MFC r266941, r266955

Needed for anticipated dummynet AQM MFC next week.

r266941 | hiren | 2014-06-01 00:28:24 -0700 (Sun, 01 Jun 2014) | 9 lines

ECN marking implenetation for dummynet.
Changes include both DCTCP and RFC 3168 ECN marking methodology.

DCTCP draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Submitted by: Midori Kato (aoimidori27@gmail.com)
Worked with: Lars Eggert (lars@netapp.com)
Reviewed by: luigi, hiren

r266955 | hiren | 2014-06-01 13:19:17 -0700 (Sun, 01 Jun 2014) | 5 lines

DNOLD_IS_ECN introduced by r266941 is not required.
DNOLD_* flags are for compat with old binaries.

Suggested by: luigi

Discussed with: hiren
Relnotes: yes

300979 30-May-2016 kp

MFC 300501, 300508

pf: Fix ICMP translation

Fix ICMP source address rewriting in rdr scenarios.

pf: Fix more ICMP mistranslation

In the default case fix the substitution of the destination address.

PR: 201519
Submitted by: Max <maximos@als.nnov.ru>

300552 24-May-2016 kp

MFC 300307:

pf: Fix fragment timeout

We were inconsistent about the use of time_second vs. time_uptime.
Always use time_uptime so the value can be meaningfully compared.

Submitted by: "Max" <maximos@als.nnov.ru>

298133 16-Apr-2016 loos

MFC r287009, r287120 and r298131:

Add ALTQ(9) support for the CoDel algorithm.

CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.

It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.

Obtained from: pfSense
Sponsored by: Rubicon Communications (Netgate)

298091 16-Apr-2016 loos

MFC r284777, r284814, r284863 and r298088:

ALTQ FAIRQ discipline import from DragonFLY.

Differential Revision: https://reviews.freebsd.org/D2847
Obtained from: pfSense
Sponsored by: Rubicon Communications (Netgate)

297429 30-Mar-2016 kp

MFC 296932:
pf: Improve forwarding detection

When we guess the nature of the outbound packet (output vs. forwarding) we need
to take bridges into account. When bridging the input interface does not match
the output interface, but we're not forwarding. Similarly, it's possible for the
interface to actually be the bridge interface itself (and not a member interface).

297228 24-Mar-2016 hselasky

MFC r292254:

Properly drain callouts in the IPFW subsystem to avoid use after free
panics when unloading the dummynet and IPFW modules:

- The callout drain function can sleep and should not be called having
a non-sleepable lock locked. Remove locks around "ipfw_dyn_uninit(0)".

- Add a new "dn_gone" variable to prevent asynchronous restart of
dummynet callouts when unloading the dummynet kernel module.

- Call "dn_reschedule()" locked so that "dn_gone" can be set and
checked atomically with regard to starting a new callout.

PR: 208171
Requested by: Franco Fichtner (opnsense.org)
Differential Revision: https://reviews.freebsd.org/D3855

296649 11-Mar-2016 ae

MFC r296348:
Use correct size for malloc.

296340 03-Mar-2016 kp

MFC: r296025:

pf: Fix possible out-of-bounds write

In the DIOCRSETADDRS ioctl() handler we allocate a table for struct pfr_addrs,
which is processed in pfr_set_addrs(). At the users request we also provide
feedback on the deleted addresses, by storing them after the new list
('bcopy(&ad, addr + size + i, sizeof(ad));' in pfr_set_addrs()).

This means we write outside the bounds of the buffer we've just allocated.
We need to look at pfrio_size2 instead (i.e. the size the user reserved for our
feedback). That'd allow a malicious user to specify a smaller pfrio_size2 than
pfrio_size though, in which case we'd still read outside of the allocated
buffer. Instead we allocate the largest of the two values.

Reported By: Paul J Murphy <paul@inetstat.net>
PR: 207463
Approved by: re (marius)

296311 02-Mar-2016 ae

MFC r295969:
Fix bug in filling and handling ipfw's O_DSCP opcode.
Due to integer overflow CS4 token was handled as BE.

PR: 207459
Approved by: re (gjb)

295894 22-Feb-2016 garga

MFC r286641 (from oshogbo):

Use correct src/dst ports when removing states.

Submitted by: Milosz Kaniewski <m.kaniewski@wheelsystems.com>,
UMEZAWA Takeshi <umezawa@iij.ad.jp> (orginal)
Reviewed by: glebius
Approved by: re (marius)
Obtained from: OpenBSD
Sponsored by: Rubicon Communications (Netgate)
Differential revision: https://reviews.freebsd.org/D5392

295402 08-Feb-2016 glebius

Merge r264915: fix NULL pointer derefernce with special sequence of
DIOCADDADDR and DIOCADDRULE.

PR: 206933
Approved by: re (marius)

291772 04-Dec-2015 bdrewery

MFC r291001:

ipfw: Fix dynamic IPv6 rules showing junk for non-specified address masks.

Relnotes: yes

290669 11-Nov-2015 kp

MFC r290161:

pf: Fix IPv6 checksums with route-to.

When using route-to (or reply-to) pf sends the packet directly to the output
interface. If that interface doesn't support checksum offloading the checksum
has to be calculated in software.
That was already done in the IPv4 case, but not for the IPv6 case. As a result
we'd emit packets with pseudo-header checksums (i.e. incorrect checksums).

This issue was exposed by the changes in r289316 when pf stopped performing full
checksum calculations for all packets.

Submitted by: Luoqi Chen

289703 21-Oct-2015 kp

MFC r289316:

pf: Fix TSO issues

In certain configurations (mostly but not exclusively as a VM on Xen) pf
produced packets with an invalid TCP checksum.

The problem was that pf could only handle packets with a full checksum. The
FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
addresses, length and protocol).
Certain network interfaces expect to see the pseudo-header checksum, so they
end up producing packets with invalid checksums.

To fix this stop calculating the full checksum and teach pf to only update TCP
checksums if TSO is disabled or the change affects the pseudo-header checksum.

PR: 154428, 193579, 198868
Relnotes: yes
Sponsored by: RootBSD

287963 18-Sep-2015 melifaro

MFC r266310

Fix wrong formatting of 0.0.0.0/X table records in ipfw(8).

Add `flags` u16 field to the hole in ipfw_table_xentry structure.
Kernel has been guessing address family for supplied record based
on xent length size.
Userland, however, has been getting fixed-size ipfw_table_xentry structures
guessing address family by checking address by IN6_IS_ADDR_V4COMPAT().

Fix this behavior by providing specific IPFW_TCF_INET flag for IPv4 records.

PR: bin/189471,kern/200169

287680 11-Sep-2015 kp

MFC r287376

pf: Fix misdetection of forwarding when net.link.bridge.pfil_bridge is set

If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding
in pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif the the bridge member on
which the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is
incorrect.

Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.

PR: 202351

287207 27-Aug-2015 loos

MFC r287119:

Reapply r196551 which was accidentally reverted by r223637 (update to
OpenBSD pf 4.5).

Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

This fix the failure when you hit the self table size and force it to be
resized.

Sponsored by: Rubicon Communications (Netgate)

286961 20-Aug-2015 loos

MFC r286862:

Fix the copy of addresses passed from userland in table replace command.

The size2 is the maximum userland buffer size (used when the addresses are
copied back to userland).

Obtained from: pfSense
Sponsored by: Rubicon Communications (Netgate)

286125 31-Jul-2015 garga

MFC r285945, r285960:

Respect pf rule log option before log dropped packets with IP options or
dangerous v6 headers

Reviewed by: gnn, eri
Approved by: gnn, glebius
Obtained from: pfSense
Sponsored by: Netgate
Differential Revision: https://reviews.freebsd.org/D3222

286079 30-Jul-2015 gjb

MFC r285999 (kp):
pf: Always initialise pf_fragment.fr_flags

When we allocate the struct pf_fragment in pf_fillup_fragment() we
forgot to initialise the fr_flags field. As a result we sometimes
mistakenly thought the fragment to not be a buffered fragment.
This resulted in panics because we'd end up freeing the pf_fragment
but not removing it from V_pf_fragqueue (believing it to be part of
V_pf_cachequeue). The next time we iterated V_pf_fragqueue we'd use
a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

X-MFS-To: releng/10.2
Sponsored by: The FreeBSD Foundation

286004 29-Jul-2015 glebius

Merge r285944: fix typo: delete nsn if we were the last reference.

285943 28-Jul-2015 glebius

Merge r283106:
During module unload unlock rules before destroying UMA zones, which
may sleep in uma_drain(). It is safe to unlock here, since we are already
dehooked from pfil(9) and all pf threads had quit.

285941 28-Jul-2015 glebius

Merge r283061, r283063: don't dereference NULL is pf_get_mtag() fails.

PR: 200222

285940 28-Jul-2015 glebius

Merge 280169: always lock the hash row of a source node when updating
its 'states' counter.

PR: 182401

285939 28-Jul-2015 glebius

Merge r271458:
- Provide a sleepable lock to protect against ioctl() vs ioctl() races.
- Use the new lock to protect against simultaneous DIOCSTART and/or
DIOCSTOP ioctls.

284581 18-Jun-2015 kp

Merge r284280

pf: Remove frc_direction

We don't use the direction of the fragments for anything. The frc_direction
field is assigned, but never read.
Just remove it.

Differential Revision: https://reviews.freebsd.org/D2825
Reviewed by: gnn

284580 18-Jun-2015 kp

Merge r284222, r284260

pf: address family must be set when creating a pf_fragment

Fix a panic when handling fragmented ip4 packets with 'drop-ovl' set.
In that scenario we take a different branch in pf_normalize_ip(), taking us to
pf_fragcache() (rather than pf_reassemble()). In pf_fragcache() we create a
pf_fragment, but do not set the address family. This leads to a panic when we
try to insert that into pf_frag_tree because pf_addr_cmp(), which is used to
compare the pf_fragments doesn't know what to do if the address family is not
set.

Simply ensure that the address family is set correctly (always AF_INET in this
path).

When we try to look up a pf_fragment with pf_find_fragment() we compare (see
pf_frag_compare()) addresses (and family), but also protocol. We failed to
save the protocol to the pf_fragment in pf_fragcache(), resulting in failing
reassembly.

PR: 200330
Differential Revision: https://reviews.freebsd.org/D2824
Reviewed by: gnn

284579 18-Jun-2015 kp

Merge r278874, r278925, r278868

- Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
- Even more fixes to !INET and !INET6 kernels.
In collaboration with pluknet
- Toss declarations to fix regular build and NO_INET6 build.

Differential Revision: https://reviews.freebsd.org/D2823
Reviewed by: gnn

284577 18-Jun-2015 kp

Merge r281536

pf: Fix forwarding detection

If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.

Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.

Differential Revision: https://reviews.freebsd.org/D2822
Reviewed by: gnn

284574 18-Jun-2015 kp

Merge r281164

pf: Skip firewall for refragmented ip6 packets

In cases where we scrub (fragment reassemble) on both input and output
we risk ending up in infinite loops when forwarding packets.

Fragmented packets come in and get collected until we can defragment. At
that point the defragmented packet is handed back to the ip stack (at
the pfil point in ip6_input(). Normal processing continues.

Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After doing the inspection on the
defragmented packet we see that the packet has been defragmented and
because we're forwarding we have to refragment it.

In pf_refragment6() we split the packet up again and then ip6_forward()
the individual fragments. Those fragments hit the pfil hook on the way
out, so they're collected until we can reconstruct the full packet, at
which point we're right back where we left off and things continue until
we run out of stack.

Break that loop by marking the fragments generated by pf_refragment6()
as M_SKIP_FIREWALL. There's no point in processing those packets in the
firewall anyway. We've already filtered on the full packet.

Differential Revision: https://reviews.freebsd.org/D2819
Reviewed by: gnn

284573 18-Jun-2015 kp

Merge r280956

pf: Deal with runt packets

On Ethernet packets have a minimal length, so very short packets get padding
appended to them. This padding is not stripped off in ip6_input() (due to
support for IPv6 Jumbograms, RFC2675).
That means PF needs to be careful when reassembling fragmented packets to not
include the padding in the reassembled packet.

While here also remove the 'Magic from ip_input.' bits. Splitting up and
re-joining an mbuf chain here doesn't make any sense.

Differential Revision: https://reviews.freebsd.org/D2818
Reviewed by: gnn

284572 18-Jun-2015 kp

Merge r280955

Preserve IPv6 fragment IDs accross reassembly and refragmentation

When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision: https://reviews.freebsd.org/D2817
Reviewed by: gnn

284571 18-Jun-2015 kp

Merge r278843, r278858

In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.

Roughly based on the OpenBSD work by Alexander Bluhm.

Differential Revision: https://reviews.freebsd.org/D2816
Reviewed by: gnn

284569 18-Jun-2015 kp

Merge r278831, r278834

Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling.

Differential Revision: https://reviews.freebsd.org/D2814
Reviewed by: gnn

283303 22-May-2015 jhb

MFC 266852,270223:
- Fix pf(4) to build with MAXCPU set to 256. MAXCPU is actually a count,
not a maximum ID value (so it is a cap on mp_ncpus, not mp_maxid).
- Bump MAXCPU on amd64 from 64 to 256. In practice APIC only permits 255
CPUs (IDs 0 through 254). Getting above that limit requires x2APIC.

282688 09-May-2015 gnn

MFC: 281529

I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.

Differential Revision: https://reviews.freebsd.org/D2266
Submitted by: eri
Sponsored by: Rubicon Communications (Netgate)

280251 19-Mar-2015 ae

MFC r279910:
Reset mbuf pointer to NULL in fastroute case to indicate that mbuf was
consumed by filter. This fixes several panics due to accessing to mbuf
after free.

277581 23-Jan-2015 glebius

Merge r274709 by eri@: deal with IPv6 same way as we IPv4 and calculate
the checksum before entering pf_test6().

PR: 172648, 179392

274486 13-Nov-2014 gnn

MFC: 272906

Change the PF hash from Jenkins to Murmur3. In forwarding tests
this showed a conservative 3% incrase in PPS.

Original Differential Revision: https://reviews.freebsd.org/D461
Submitted by: des
Reviewed by: emaste

273736 27-Oct-2014 hselasky

MFC r263710, r273377, r273378, r273423 and r273455:

- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by: Mellanox Technologies


/freebsd-10-stable/sys/amd64/amd64/fpu.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep-v6.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep.c
/freebsd-10-stable/sys/cam/scsi/scsi_sa.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
/freebsd-10-stable/sys/cddl/dev/dtrace/dtrace_sysctl.c
/freebsd-10-stable/sys/compat/ndis/kern_ndis.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus_wmi.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_hp.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_ibm.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_rapidstart.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_sony.c
/freebsd-10-stable/sys/dev/bxe/bxe.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/cxgbe/t4_main.c
/freebsd-10-stable/sys/dev/e1000/if_em.c
/freebsd-10-stable/sys/dev/e1000/if_igb.c
/freebsd-10-stable/sys/dev/e1000/if_lem.c
/freebsd-10-stable/sys/dev/hatm/if_hatm.c
/freebsd-10-stable/sys/dev/ixgbe/ixgbe.c
/freebsd-10-stable/sys/dev/ixgbe/ixv.c
/freebsd-10-stable/sys/dev/ixl/if_ixl.c
/freebsd-10-stable/sys/dev/mpr/mpr.c
/freebsd-10-stable/sys/dev/mps/mps.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.h
/freebsd-10-stable/sys/dev/mxge/if_mxge.c
/freebsd-10-stable/sys/dev/oce/oce_sysctl.c
/freebsd-10-stable/sys/dev/qlxgb/qla_os.c
/freebsd-10-stable/sys/dev/qlxgbe/ql_os.c
/freebsd-10-stable/sys/dev/rt/if_rt.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/vxge/vxge.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/devfs/devfs_devs.c
/freebsd-10-stable/sys/fs/fuse/fuse_main.c
/freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c
/freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdkrpc.c
/freebsd-10-stable/sys/geom/geom_kern.c
/freebsd-10-stable/sys/kern/kern_cpuset.c
/freebsd-10-stable/sys/kern/kern_descrip.c
/freebsd-10-stable/sys/kern/kern_mib.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/subr_devstat.c
/freebsd-10-stable/sys/kern/subr_kdb.c
/freebsd-10-stable/sys/kern/subr_uio.c
/freebsd-10-stable/sys/kern/vfs_cache.c
/freebsd-10-stable/sys/mips/mips/busdma_machdep.c
/freebsd-10-stable/sys/net/if_lagg.c
/freebsd-10-stable/sys/net/pfvar.h
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_superg.c
/freebsd-10-stable/sys/netgraph/bluetooth/common/ng_bluetooth.c
/freebsd-10-stable/sys/netgraph/ng_base.c
/freebsd-10-stable/sys/netgraph/ng_socket.c
/freebsd-10-stable/sys/netinet/cc/cc_chd.c
/freebsd-10-stable/sys/netinet/tcp_reass.c
/freebsd-10-stable/sys/netipsec/ipsec.h
/freebsd-10-stable/sys/netipx/ipx_proto.c
pf/if_pfsync.c
pf/pf.c
pf/pf_ioctl.c
/freebsd-10-stable/sys/ofed/drivers/net/mlx4/mlx4_en.h
/freebsd-10-stable/sys/powerpc/powermac/fcu.c
/freebsd-10-stable/sys/powerpc/powermac/smu.c
/freebsd-10-stable/sys/powerpc/powerpc/busdma_machdep.c
/freebsd-10-stable/sys/powerpc/powerpc/cpu.c
/freebsd-10-stable/sys/sys/sysctl.h
/freebsd-10-stable/sys/vm/memguard.c
/freebsd-10-stable/sys/vm/vm_kern.c
/freebsd-10-stable/sys/x86/x86/busdma_bounce.c
273184 16-Oct-2014 glebius

Merge r272358 from head:
Use rn_detachhead() instead of direct free(9) for radix tables.

271306 09-Sep-2014 glebius

Merge r270928: explicitly free packet on PF_DROP, otherwise a "quick"
rule with "route-to" may still forward it.

PR: 177808
Approved by: re (gjb)

270925 01-Sep-2014 glebius

Fix ABI broken in r270576. This is direct commit to stable/10.

Reported by: kib

270577 25-Aug-2014 glebius

Merge r270023 from head:
Do not lookup source node twice when pf_map_addr() is used.

PR: 184003
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH

270576 25-Aug-2014 glebius

Merge r270022 from head:
pf_map_addr() can fail and in this case we should drop the packet,
otherwise bad consequences including a routing loop can occur.

Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.

PR: 183997
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH

270575 25-Aug-2014 glebius

Merge 270010 from head:
Fix synproxy with IPv6. pf_test6() was missing a check for M_SKIP_FIREWALL.

PR: 127920
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH

270574 25-Aug-2014 glebius

Merge r269998 from head:
- Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.

Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH

270328 22-Aug-2014 glebius

Merge r268492:
On machines with strict alignment copy pfsync_state_key from packet
on stack to avoid unaligned access.

PR: 187381

266678 26-May-2014 ae

MFC r266399:
Since ipfw nat configures all options in one step, we should set all bits
in the mask when calling LibAliasSetMode() to properly clear unneeded
options.

PR: 189655

265700 08-May-2014 melifaro

Merge r258708, r258711, r260247, r261117.

r258708:
Check ipfw table numbers in both user and kernel space before rule addition.
Found by: Saychik Pavel <umka@localka.net>

r258711:
Simplify O_NAT opcode handling.

r260247:
Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empy network mask.

r261117:
Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some usused fields

265227 02-May-2014 trociny

MFC r264963:

Define startup order the same way as it is in dummynet.

265008 27-Apr-2014 mm

MFC r264689:
De-virtualize UMA zone pf_mtag_z and move to global initialization part.

The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.

PR: kern/182964

264813 23-Apr-2014 ae

MFC r264540:
Set oif only for outgoing packets.

PR: 188543

264804 23-Apr-2014 brueffer

MFC: r264421

Free resources in error cases; re-indent a curly brace while here.

CID: 1199366
Found with: Coverity Prevent(tm)

264454 14-Apr-2014 mm

MFC r264220:
Execute pf_overload_task() in vnet context. Fixes a vnet kernel panic.

Reviewed by: trociny

263680 24-Mar-2014 glebius

Merge r263497: fix ipfw + VIMAGE sysctls.

PR: kern/187665

263478 21-Mar-2014 glebius

Merge r262763, r262767, r262771, r262806 from head:
- Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

263086 12-Mar-2014 glebius

Bulk sync of pf changes from head, in attempt to fixup broken build I
made in r263029.

Merge r257186,257215,257349,259736,261797.

These changesets split pfvar.h into several smaller headers and make
userland utilities to include only some of them.

263029 11-Mar-2014 glebius

Merge r261882, r261898, r261937, r262760, r262799:
Once pf became not covered by a single mutex, many counters in it became
race prone. Some just gather statistics, but some are later used in
different calculations.

A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.

Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.

263027 11-Mar-2014 glebius

Merge r261029: remove NULL pointer dereference.

263026 11-Mar-2014 glebius

Merge r261028: fix resource leak and simplify code for DIOCCHANGEADDR.

262210 19-Feb-2014 dim

MFC r261915:

Under sys/netpfil/ipfw, surround two IPv6-specific static functions with
#ifdef INET6, since they are unused when INET6 is disabled.

261023 22-Jan-2014 glebius

Merge r260377: fix panic on pf_get_translation() failure.

PR: 182557

261019 22-Jan-2014 glebius

Merge r258478, r258479, r258480, r259719: fixes related to mass source
nodes removal.

PR: 176763

261018 22-Jan-2014 glebius

Merge several fixlets from head:

r257619: Remove unused PFTM_UNTIL_PACKET const.
r257620: Code logic of handling PFTM_PURGE into pf_find_state().
r258475: Don't compare unsigned <= 0.
r258477: Fix off by ones when scanning source nodes hash.

258912 04-Dec-2013 rodrigc

MFC r258588

In sys/netpfil/ipfw/ip_fw_nat.c:vnet_ipfw_nat_uninit() we call "IPFW_WLOCK(chain);".
This lock gets deleted in sys/netpfil/ipfw/ip_fw2.c:vnet_ipfw_uninit().

Therefore, vnet_ipfw_nat_uninit() *must* be called before vnet_ipfw_uninit(),
but this doesn't always happen, because the VNET_SYSINIT order is the same for both functions.
In sys/net/netpfil/ipfw/ip_fw2.c and sys/net/netpfil/ipfw/ip_fw_nat.c,
IPFW_SI_SUB_FIREWALL == IPFW_NAT_SI_SUB_FIREWALL == SI_SUB_PROTO_IFATTACHDOMAIN
and
IPFW_MODULE_ORDER == IPFW_NAT_MODULE_ORDER

Consequently, if VIMAGE is enabled, and jails are created and destroyed,
the system sometimes crashes, because we are trying to use a deleted lock.

To reproduce the problem:
(1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
INVARIANTS.
(2) Run this command in a loop:
jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

(see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )

Fix the problem by increasing the value of IPFW_NAT_SI_SUB_FIREWALL,
so that vnet_ipfw_nat_uninit() runs after vnet_ipfw_uninit().

Approved by: re (gjb)

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255928 28-Sep-2013 philip

Use the correct EtherType for logging IPv6 packets.

Reviewed by: melifaro
Approved by: re (kib, glebius)
MFC after: 3 days


255143 02-Sep-2013 glebius

Merge 1.12 of pf_lb.c from OpenBSD, with some changes. Original commit:

date: 2010/02/04 14:10:12; author: sthen; state: Exp; lines: +24 -19;
pf_get_sport() picks a random port from the port range specified in a
nat rule. It should check to see if it's in-use (i.e. matches an existing
PF state), if it is, it cycles sequentially through other ports until
it finds a free one. However the check was being done with the state
keys the wrong way round so it was never actually finding the state
to be in-use.

- switch the keys to correct this, avoiding random state collisions
with nat. Fixes PR 6300 and problems reported by robert@ and viq.

- check pf_get_sport() return code in pf_test(); if port allocation
fails the packet should be dropped rather than sent out untranslated.

Help/ok claudio@.

Some additional changes to 1.12:

- We also need to bzero() the key to zero padding, otherwise key
won't match.
- Collapse two if blocks into one with ||, since both conditions
lead to the same processing.
- Only naddr changes in the cycle, so move initialization of other
fields above the cycle.
- s/u_intXX_t/uintXX_t/g

PR: kern/181690
Submitted by: Olivier Cochard-Labbé <olivier cochard.me>
Sponsored by: Nginx, Inc.


254781 24-Aug-2013 mav

Make dummynet use new direct callout(9) execution mechanism. Since the only
thing done by the dummynet handler is taskqueue_enqueue() call, it doesn't
need extra switch to the clock SWI context.

On idle system this change in half reduces number of active CPU cycles and
wakes up only one CPU from sleep instead of two.

I was going to make this change much earlier as part of calloutng project,
but waited for better solution with skipping idle ticks to be implemented.
Unfortunately with 10.0 release coming it is better get at least this.


254776 24-Aug-2013 trociny

Make ipfw nat init/unint work correctly for VIMAGE:

* Do per vnet instance cleanup (previously it was only for vnet0 on
module unload, and led to libalias leaks and possible panics due to
stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
vnet0 lock (which does not prevent pointers access from another
vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
all vnets.

* It is supposed that ifaddr_change event handler is called in the
interface vnet context, so add an assertion.

Reviewed by: zec
MFC after: 2 weeks


254523 19-Aug-2013 andre

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


253769 29-Jul-2013 ae

Fix a possible NULL-pointer dereference on the pfsync(4) reconfiguration.

Reported by: Eugene M. Zheganin


251681 13-Jun-2013 glebius

Improve locking strategy between keys hash and ID hash.

Before this change state creating sequence was:

1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash

What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.

Now we perform proper interlocking between key hash locks and ID hash
lock:

1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash

To achieve that, the following hacking was performed in pf_state_key_attach():

- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
by their address value.
- pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked
the conflicting state synchronously. In theory this could require locking
a third key hash, which we can't do now.
Now we do not remove the state immediately, instead we leave this task to
the purge thread. To avoid conflicts in a short period before state is
purged, we push to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
ID hash and returns.

Tested by: Ian FREISLICH <ianf clue.co.za>


250522 11-May-2013 glebius

Return meaningful error code from pf_state_key_attach() and
pf_state_insert().


250521 11-May-2013 glebius

Better debug message.


250519 11-May-2013 glebius

Fix DIOCADDSTATE operation.


250518 11-May-2013 glebius

Invalid creatorid is always EINVAL, not only when we are in verbose mode.


250313 06-May-2013 glebius

Improve KASSERT() message.


250312 06-May-2013 glebius

Simplify printf().


250246 04-May-2013 melifaro

Use unified method for accessing / updating cached rule pointers.

MFC after: 2 weeks


250131 01-May-2013 eadler

Correct a few sizeof()s

Submitted by: swildner@DragonFlyBSD.org
Reviewed by: alfred


250039 29-Apr-2013 glebius

Remove useless ifdef KLD_MODULE from dummynet module unload path. This
fixes panic on unload.

Reported by: pho


249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


248971 01-Apr-2013 melifaro

Fix ipfw rule validation partially broken by r248552.

Pointed by: avg
MFC with: r248552


248697 25-Mar-2013 ae

When we are removing a specific set, call ipfw_expire_dyn_rules only once.

Obtained from: Yandex LLC
MFC after: 1 week


248552 20-Mar-2013 melifaro

Add ipfw support for setting/matching DiffServ codepoints (DSCP).

Setting DSCP support is done via O_SETDSCP which works for both
IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4.
Dscp can be specified by name (AFXY, CSX, BE, EF), by value
(0..63) or via tablearg.

Matching DSCP is done via another opcode (O_DSCP) which accepts several
classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).

Many people made their variants of this patch, the ones I'm aware of are
(in alphabetic order):

Dmitrii Tejblum
Marcelo Araujo
Roman Bogorodskiy (novel)
Sergey Matveichuk (sem)
Sergey Ryabin

PR: kern/102471, kern/121122
MFC after: 2 weeks


248491 19-Mar-2013 ae

Separate the locking macros that are used in the packet flow path
from others. This helps easy switch to use pfil(4) lock.


248324 15-Mar-2013 glebius

Use m_get/m_gethdr instead of compat macros.

Sponsored by: Nginx, Inc.


248207 12-Mar-2013 glebius

Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.


247626 02-Mar-2013 melifaro

Fix callout expiring dynamic rules.

PR: kern/175530
Submitted by: Vladimir Spiridenkov <vs@gtn.ru>
MFC after: 2 weeks


246822 15-Feb-2013 glebius

Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc


244769 28-Dec-2012 glebius

In netpfil/pf:
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.


244634 23-Dec-2012 melifaro

Add parentheses to IP_FW_ARG_TABLEARG() definition.

Suggested by: glebius
MFC with: r244633


244633 23-Dec-2012 melifaro

Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by: Vitaliy Tokarenko <rphone@ukr.net>
MFC after: 2 weeks


244347 17-Dec-2012 pjd

Warn about reaching various PF limits.

Reviewed by: glebius
Obtained from: WHEEL Systems


244268 15-Dec-2012 trociny

In pfioctl, if the permission checks failed we returned with vnet context
set.

As the checks don't require vnet context, this is fixed by setting
vnet after the checks.

PR: kern/160541
Submitted by: Nikos Vassiliadis (slightly different approach)


244210 14-Dec-2012 glebius

Fix error in r235991. No-sleep version of IFNET_RLOCK() should
be used here, since we may hold the main pf rulesets rwlock.

Reported by: Fleuriot Damien <ml my.gd>


244202 14-Dec-2012 glebius

Fix VIMAGE build broken in r244185.

Submitted by: Nikolai Lifanov <lifanov mail.lifanov.com>


244185 13-Dec-2012 glebius

Merge rev. 1.119 from OpenBSD:

date: 2009/03/31 01:21:29; author: dlg; state: Exp; lines: +9 -16
...

this also firms up some of the input parsing so it handles short frames a
bit better.

This actually fixes reading beyond mbuf data area in pfsync_input(), that
may happen at certain pfsync datagrams.


244184 13-Dec-2012 glebius

Initialize state id prior to attaching state to key hash. Otherwise a
race can happen, when pf_find_state() finds state via key hash, and locks
id hash slot 0 instead of appropriate to state id slot.


244113 11-Dec-2012 glebius

Merge 1.127 from OpenBSD, that closes a regression from 1.125 (merged
as r242694):
do better detection of when we have a better version of the tcp sequence
windows than our peer.

this resolves the last of the pfsync traffic storm issues ive been able to
produce, and therefore makes it possible to do usable active-active
statuful firewalls with pf.


243944 06-Dec-2012 glebius

Rule memory garbage collecting in new pf scans only states that are on
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.

Reported by: Ian FREISLICH <ianf cloudseed.co.za>


243941 06-Dec-2012 glebius

Close possible races between state deletion and sent being sent out
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
to the PFSYNC_S_DEL queue in pfsync.

Reported by: Ian FREISLICH <ianf cloudseed.co.za>


243940 06-Dec-2012 glebius

Remove extra PFSYNC_LOCK() in pfsync_bulk_update() which lead to lock
recursion.

Reported by: Ian FREISLICH <ianf cloudseed.co.za>


243939 06-Dec-2012 glebius

Revert erroneous r242693. A state may have PFTM_UNLINKED being on the
PFSYNC_S_DEL queue of pfsync.


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243711 30-Nov-2012 melifaro

Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after: 3 weeks


243707 30-Nov-2012 melifaro

Make ipfw dynamic states operations SMP-ready.

* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with: ipfw
MFC after: 3 weeks
Sponsored by: Yandex LLC


242834 09-Nov-2012 melifaro

Simplify sending keepalives.
Prepare ipfw_tick() to be used by other consumers.

Reviewed by: ae(basically)
MFC after: 2 weeks


242694 07-Nov-2012 glebius

Merge rev. 1.125 from OpenBSD:
date: 2009/06/12 02:03:51; author: dlg; state: Exp; lines: +59 -69
rewrite the way states from pfsync are merged into the local state tree
and the conditions on which pfsync will notify its peers on a stale update.

each side (ie, the sending and receiving side) of the state update is
compared separately. any side that is further along than the local state
tree is merged. if any side is further along in the local state table, an
update is sent out telling the peers about it.


242693 07-Nov-2012 glebius

It may happen that pfsync holds the last reference on a state. In this
case keys had already been freed. If encountering such state, then
just release last reference.

Not sure this can happen as a runtime race, but can be reproduced by
the following scenario:

- enable pfsync
- disable pfsync
- wait some time
- enable pfsync


242632 05-Nov-2012 melifaro

Add assertion to enforce 'nat global' locking requierements changed by r241908.

Suggested by: adrian, glebius
MFC after: 3 days


242631 05-Nov-2012 melifaro

Use unified print_dyn_rule_flags() function for debugging messages
instead of hand-made printfs in every place.

MFC after: 1 week


242463 02-Nov-2012 ae

Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by: andre


242161 26-Oct-2012 glebius

o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>


242079 25-Oct-2012 ae

Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks


241913 22-Oct-2012 glebius

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


241908 22-Oct-2012 melifaro

Remove unnecessary chain read lock in ipfw nat 'global' code.
Document case when ipfw chain lock must be held while calling ipfw_nat().

MFC after: 2 weeks


241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241369 09-Oct-2012 kevlo

Fix typo: s/unknow/unknown


241360 08-Oct-2012 glebius

Any pfil(9) hooks should be called with already set VNET context.

Reviewed by: bz


241359 08-Oct-2012 glebius

Catch up with r241245 and do not return packet back in host byte order.


241344 08-Oct-2012 glebius

After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
- ip_output(), ipfilter(4) not changed, since already call
in_delayed_cksum() with header in net byte order.
- divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
there and back.
- mrouting code and IPv6 ipsec now need to switch byte order there and
back, but I hope, this is temporary solution.
- In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().


241245 06-Oct-2012 glebius

A step in resolving mess with byte ordering for AF_INET. After this change:

- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.

Reviewed by: ray, luigi, eri, melifaro
Tested by: glebius (LE), ray (BE)


241244 06-Oct-2012 glebius

The pfil(9) layer guarantees us presence of the protocol header,
so remove extra check, that is always false.

P.S. Also, goto there lead to unlocking a not locked rwlock.


241131 02-Oct-2012 glebius

To reduce volume of pfsync traffic:
- Scan request update queue to prevent doubles.
- Do not push undersized daragram in pfsync_update_request().


241057 29-Sep-2012 glebius

Clear and re-setup all function pointers that glue pf(4) and pfsync(4)
together whenever the pfsync0 is brought down or up respectively.


241056 29-Sep-2012 glebius

Simplify send out queue code:
- Write method of a queue now is void,length of item is taken
as queue property.
- Write methods don't need to know about mbud, supply just buf
to them.
- No need for safe queue iterator in pfsync_sendout().

Obtained from: OpenBSD


241039 28-Sep-2012 glebius

Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().

Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.

The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.


240836 22-Sep-2012 glebius

EBUSY is a better reply for refusing to unload pf(4) or pfsync(4).

Submitted by: pluknet


240811 22-Sep-2012 glebius

When connection rate hits and we overload a source to a table,
we are actually editing table, which means editing rules,
thus we need writer access to 'em.

Fix this by offloading the update of table to the same taskqueue,
we already use for flushing. Since taskqueues major task is now
overloading, and flushing is optional, do mechanical rename
s/flush/overload/ in the code related to the taskqueue.

Since overloading tasks do unsafe referencing of rules, provide
a bandaid in pf_purge_unlinked_rules(). If the latter sees any
queued tasks, then it skips purging for this run.

In table code:
- Assert any lock in pfr_lookup_addr().
- Assert writer lock in pfr_route_kentry().


240810 22-Sep-2012 glebius

In pfr_insert_kentry() return ENOMEM if memory allocation failed.


240809 22-Sep-2012 glebius

Fix fallout from r236397 in pfr_update_stats(), that was missed
later in r237155. We need to zero sockaddr before lookup. While
here, make pfr_update_stats() panic on unknown af.


240737 20-Sep-2012 glebius

Reduce copy/paste when freeing an source node.


240736 20-Sep-2012 glebius

Utilize Jenkins hash with random seed for source nodes storage.


240642 18-Sep-2012 glebius

Provide kernel compile time option to make pf(4) default rule to drop.

This is important to secure a small timeframe at boot time, when
network is already configured, but pf(4) is not yet.

PR: kern/171622
Submitted by: Olivier Cochard-LabbИ <olivier cochard.me>


240641 18-Sep-2012 glebius

Make ruleset anchors in pf(4) reentrant. We've got two problems here:

1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
as small as possible, following measures taken:
- Maximum stack size reduced from 64 to 32.
- The struct pf_anchor_stackframe trimmed by one pointer - parent.
We can always obtain the parent via the rule pointer.
- When pf_test_rule() calls pf_get_translation(), the former lends
its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with: dhartmei


240638 18-Sep-2012 glebius

Fix DIOCNATLOOK: zero key padding before performing lookup.


240494 14-Sep-2012 glebius

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi