#
346210 |
|
14-Apr-2019 |
ae |
MFC r345262: Modify struct nat64_config.
Add second IPv6 prefix to generic config structure and rename another fields to conform to RFC6877. Now it contains two prefixes and length: PLAT is provider-side translator that translates N:1 global IPv6 addresses to global IPv4 addresses. CLAT is customer-side translator (XLAT) that algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses. Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn) translators.
Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept prefix length and use plat_plen to specify prefix length.
Retire net.inet.ip.fw.nat64_allow_private sysctl variable. Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to configure this ability separately for each NAT64 instance.
Obtained from: Yandex LLC Sponsored by: Yandex LLC
|
#
346201 |
|
14-Apr-2019 |
ae |
MFC r342908: Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64. And refactor the code to avoid unneeded initialization to reduce overhead of per-packet processing.
ipfw(4) can be invoked by pfil(9) framework for each packet several times. Each call uses on-stack variable of type struct ip_fw_args to keep the state of ipfw(4) processing. Currently this variable has 240 bytes size on amd64. Each time ipfw(4) does bzero() on it, and then it initializes some fields.
glebius@ has reported that they at Netflix discovered, that initialization of this variable produces significant overhead on packet processing. After patching I managed to increase performance of packet processing on simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to 11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).
Introduced new field flags, it is used to keep track of what fields was initialized. Some fields were moved into the anonymous union, to reduce the size. They all are mutually exclusive. dummypar field was unused, and therefore it is removed. The hopstore6 field type was changed from sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct ip_fw_args is 128 bytes.
ipfw_chk() was modified to properly handle ip_fw_args.flags instead of rely on checking for NULL pointers.
Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D18690
MFC r342909: Fix the build with INVARIANTS.
MFC r343551: Fix the bug introduced in r342908, that causes problems with dynamic handling for protocols without ports numbers.
Since port numbers were uninitialized for protocols like ICMP/ICMPv6, ipfw_chk() used some non-zero values to create dynamic states, and due this it failed to match replies with created states.
Reported by: Oliver Hartmann, Boris Lytochkin Obtained from: Yandex LLC
|
#
345259 |
|
18-Mar-2019 |
ae |
MFC r345004 (with modification): Add IP_FW_NAT64 to codes that ipfw_chk() can return.
It will be used by upcoming NAT64 changes. We use separate code to avoid propogating EACCES error code to user level applications when NAT64 consumes a packet.
Obtained from: Yandex LLC Sponsored by: Yandex LLC
|
#
340956 |
|
26-Nov-2018 |
eugen |
MFC r339810: ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than "pass", "deny" and dummynet-related for layer-2 (ethernet frames). Others are processed as "pass".
Make it support ngtee/netgraph rules just like they are supported for IP packets. For example, this allows us to mirror some frames selectively to another interface for delivery to remote network analyzer over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900" attached to "lower" hook of vlan900's ng_ether(4) node, that would be as simple as:
ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0
PR: 213452 Tested-by: Fyodor Ustinov <ufm@ufm.su>
|
#
308749 |
|
17-Nov-2016 |
loos |
MFC r308237:
Remove the mbuf tag after use (for reinjected packets).
Fixes the packet processing in dummynet l2 rules.
Obtained from: pfSense Sponsored by: Rubicon Communications, LLC (Netgate)
|
#
308660 |
|
14-Nov-2016 |
loos |
Stop abusing from struct ifnet presence to determine the packet direction for dummynet, use the correct argument for that, remove the false coment about the presence of struct ifnet.
Fixes the input match of dummynet l2 rules.
Obtained from: pfSense Sponsored by: Rubicon Communications, LLC (Netgate)
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
283116 |
|
19-May-2015 |
luigi |
use proper types to represent function pointers
|
#
279948 |
|
13-Mar-2015 |
ae |
Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value to obtain IPv4 next hop address in tablearg case.
Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we still use this opcode, but when packet identified as IPv6 packet, we obtain next hop address from dedicated field nh6 in struct table_value.
Replace hopstore field in struct ip_fw_args with anonymous union and add hopstore6 field. Use this field to copy tablearg value for IPv6.
Replace spare1 field in struct table_value with zoneid. Use it to keep scope zone id for link-local IPv6 addresses. Since spare1 was used internally, replace spare0 array with two variables spare0 and spare1.
Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting IPv6 addresses in table_value. Use zoneid field in struct table_value to store sin6_scope_id value.
Since the kernel still uses embedded scope zone id to represent link-local addresses, convert next_hop6 address into this form before return from pfil processing. This also fixes in6_localip() check for link-local addresses.
Differential Revision: https://reviews.freebsd.org/D2015 Obtained from: Yandex LLC Sponsored by: Yandex LLC
|
#
274225 |
|
07-Nov-2014 |
glebius |
Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by: Nginx, Inc.
|
#
264540 |
|
16-Apr-2014 |
ae |
Set oif only for outgoing packets.
PR: 188543 MFC after: 1 week Sponsored by: Yandex LLC
|
#
263497 |
|
21-Mar-2014 |
glebius |
Fix breakage in ipfw+VIMAGE after r261590.
PR: kern/187665 Sponsored by: Nginx, Inc.
|
#
258463 |
|
22-Nov-2013 |
luigi |
make ipfw_check_packet() and ipfw_check_frame() public, so they can be used in the userspace version of ipfw/dummynet (normally using netmap for the I/O path).
This is the first of a few commits to ease compiling the ipfw kernel code in userspace.
|
#
243882 |
|
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
#
242463 |
|
01-Nov-2012 |
ae |
Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set.
Suggested by: andre
|
#
242079 |
|
25-Oct-2012 |
ae |
Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default.
Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks
|
#
241913 |
|
22-Oct-2012 |
glebius |
Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet.
After this change a packet processed by the stack isn't modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack.
[1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility.
[2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon.
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
|
#
241359 |
|
08-Oct-2012 |
glebius |
Catch up with r241245 and do not return packet back in host byte order.
|
#
241245 |
|
06-Oct-2012 |
glebius |
A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order. - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks. - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks. - ip_fragment() accepts and emits packet in net byte order. - ip_forward(), ip_mloopback() use host byte order (untouched actually). - ip_fastforward() no longer modifies packet at all (except ip_ttl). - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4). - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version - __FreeBSD_version bumped. - pfil(9) manual page updated.
Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)
|
#
240494 |
|
14-Sep-2012 |
glebius |
o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4).
o Move most modified parts of pf out of contrib.
Actual movements:
sys/contrib/pf/net/*.c -> sys/netpfil/pf/ sys/contrib/pf/net/*.h -> sys/net/ contrib/pf/pfctl/*.c -> sbin/pfctl contrib/pf/pfctl/*.h -> sbin/pfctl contrib/pf/pfctl/pfctl.8 -> sbin/pfctl contrib/pf/pfctl/*.4 -> share/man/man4 contrib/pf/pfctl/*.5 -> share/man/man5
sys/netinet/ipfw -> sys/netpfil/ipfw
The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice.
Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd.
The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match.
Discussed with: bz, luigi
|
#
240099 |
|
04-Sep-2012 |
melifaro |
Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit.
Remove ipfw headers from ether/bridge code since they are unneeded now.
Note this thange introduce some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet.
MFC after: 3 weeks
|
#
227085 |
|
04-Nov-2011 |
bz |
Always use the opt_*.h options for ipfw.ko, not just when compiled into the kernel. Do not try to build the module in case of no INET support but keep #error calls for now in case we would compile it into the kernel.
This should fix an issue where the module would fail to enable IPv6 support from the rc framework, but also other INET and INET6 parts being silently compiled out without giving a warning in the module case.
While here garbage collect unneeded opt_*.h includes. opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET entry in options for conditional inclusion in kernel so keep the file with the same name.
Reported by: pluknet Reviewed by: plunket, jhb MFC After: 3 days
|
#
225793 |
|
27-Sep-2011 |
bz |
Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the assumption that the private L2 hook (which hopefully eventually will be a pfil hook as well) can still be useful.
Allow building the module without inet as well.
Glanced at by: jhb MFC after: 3 days
|
#
225518 |
|
12-Sep-2011 |
jhb |
Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_* options defined in the kernel config. This more closely matches the behavior of other modules which inherit configuration settings from the kernel configuration during a kernel + modules build.
Reviewed by: luigi Approved by: re (kib) MFC after: 1 week
|
#
225044 |
|
20-Aug-2011 |
bz |
Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regession test uitilizing VIMAGE to check all 20 possible combinations I could think of.
Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)
|
#
223593 |
|
27-Jun-2011 |
glebius |
Add possibility to pass IPv6 packets to a divert(4) socket.
Submitted by: sem
|
#
223358 |
|
21-Jun-2011 |
ae |
Do not use SET_HOST_IPLEN() macro for IPv6 packets.
PR: kern/157239 MFC after: 2 weeks
|
#
215701 |
|
22-Nov-2010 |
dim |
After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.
------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
|
#
215317 |
|
14-Nov-2010 |
dim |
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
|
#
213254 |
|
28-Sep-2010 |
luigi |
fix breakage in in-kernel NAT: the code did not honor net.inet.ip.fw.one_pass and always moved to the next rule in case of a successful nat.
This should fix several related PR (waiting for feedback before closing them)
PR: 145167 149572 150141 MFC after: 3 days
|
#
206845 |
|
19-Apr-2010 |
luigi |
whitespace fixes (trailing whitespace, bad indentation after a merge, etc.)
|
#
204591 |
|
02-Mar-2010 |
luigi |
Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet.
The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions.
In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ.
Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland.
Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time.
Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits.
CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.
|
#
201740 |
|
07-Jan-2010 |
luigi |
check that we have an ipv4 packet before swapping ip_len and ip_off. This should fix the handling of ipv6 packets which i broke when i made ipfw operate on packets in network format.
Reported by: Hajimu UMEMOTO
|
#
201732 |
|
07-Jan-2010 |
luigi |
some header shuffling to help decoupling ip_divert from ipfw
|
#
201568 |
|
05-Jan-2010 |
luigi |
this file does not require ip_dummynet.h
|
#
201527 |
|
04-Jan-2010 |
luigi |
Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler.
- make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place.
- use a macro FREE_PKT to dispose of mbufs. This eases portability.
On passing i also removed a few typos, staticise or localise variables, remove useless declarations and other minor things.
Overall the code shrinks a bit and is hopefully more readable.
I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).
|
#
201124 |
|
28-Dec-2009 |
luigi |
bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects to find it there. Unfortunately this reintroduces the dependency on ip_fw_pfil.c
|
#
201122 |
|
28-Dec-2009 |
luigi |
bring in several cleanups tested in ipfw3-head branch, namely:
r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files.
- move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h
- document the structure of the packet tags used for dummynet and netgraph;
r201049 - merge some common code to attach/detach hooks into a single function.
r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text
r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit
r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need.
r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option)
No ABI changes in this commit.
MFC after: 1 week
|
#
200855 |
|
22-Dec-2009 |
luigi |
merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw.
In detail:
1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK.
2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N).
3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1).
4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id>
All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t
Operation costs now are as follows:
Function Old Now Planned ------------------------------------------------------------------- + skipto X, non cached O(N) O(log N) + skipto X, cached O(1) O(1) XXX dynamic rule lookup O(1) O(log N) O(1) + skipto tablearg O(N) O(1) + reinject, non cached O(N) O(log N) + reinject, cached O(1) O(1) + kernel blocked during setsockopt() O(N) O(1) -------------------------------------------------------------------
The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI
Supported by: Valeria Paoli MFC after: 1 month
|
#
200601 |
|
16-Dec-2009 |
luigi |
Various cosmetic cleanup of the files: - move global variables around to reduce the scope and make them static if possible; - add an ipfw_ prefix to all public functions to prevent conflicts (the same should be done for variables); - try to pack variable declaration in an uniform way across files; - clarify some comments; - remove some misspelling of names (#define V_foo VNET(bar)) that slipped in due to cut&paste - remove duplicate static variables in different files;
MFC after: 1 month
|
#
200580 |
|
15-Dec-2009 |
luigi |
Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h
No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part.
Files touched by this commit:
conf/files now references the two new source files
netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.
netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions
netinet/ipfw/ip_fw_log.c ipfw_log and related functions
netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules
netinet/ipfw/ip_fw2.c removed the pieces that goes in the new files
netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This require a new function pointer.
A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them.
MFC after: 1 month
|
#
197952 |
|
11-Oct-2009 |
julian |
Virtualize the pfil hooks so that different jails may chose different packet filters. ALso allows ipfw to be enabled on on ejail and disabled on another. In 8.0 it's a global setting.
Sitting aroung in tree waiting to commit for: 2 months MFC after: 2 months
|
#
196423 |
|
21-Aug-2009 |
julian |
Fix ipfw's initialization functions to get the correct order of evaluation to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers instead of the modevent handler. Correct some spelling errors in comments in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when ipfw is unloaded.
This patch is a minimal patch for 8.0 I have a much larger patch that actually fixes the underlying problems that will be applied after 8.0
Reviewed by: zec@, rwatson@, bz@(earlier version) Approved by: re (rwatson) MFC after: Immediatly
|
#
196322 |
|
17-Aug-2009 |
jhb |
Purge mergeinfo in sys/ that is either empty or a subset of the parent mergeinfo on sys/ itself.
Approved by: re (mergeinfo blanket)
|
#
196019 |
|
01-Aug-2009 |
rwatson |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
Reviewed by: bz Approved by: re (vimage blanket)
|
#
195699 |
|
14-Jul-2009 |
rwatson |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
#
193859 |
|
09-Jun-2009 |
oleg |
Close long existed race with net.inet.ip.fw.one_pass = 0: If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc) it carries pointer to matching ipfw rule. If this packet then reinjected back to ipfw, ruleset processing starts from that rule. If rule was deleted meanwhile, due to existed race condition panic was possible (as well as other odd effects like parsing rules in 'reap list').
P.S. this commit changes ABI so userland ipfw related binaries should be recompiled.
MFC after: 1 month Tested by: Mikolaj Golub
|
#
193532 |
|
05-Jun-2009 |
luigi |
move kernel ipfw-related sources to a separate directory, adjust conf/files and modules' Makefiles accordingly.
No code or ABI changes so this and most of previous related changes can be easily MFC'ed
MFC after: 5 days
|
#
193502 |
|
05-Jun-2009 |
luigi |
More cleanup in preparation of ipfw relocation (no actual code change):
+ move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c;
+ remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly;
+ remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly;
+ move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it;
To be merged together with rev 193497
MFC after: 5 days
|
#
191688 |
|
30-Apr-2009 |
zec |
Permit buiding kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build:
1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes:
options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet default build: vnet_net_0._ifnet options VIMAGE_GLOBALS: ifnet
2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace:
INIT_VNET_NET(ifp->if_vnet); becomes
struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];
3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures.
4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required.
5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet.
6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing.
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted.
Reviewed by: bz, rwatson Approved by: julian (mentor)
|
#
191570 |
|
27-Apr-2009 |
oleg |
Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was processed by ipfw once - avoid second ipfw_chk() call. This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and ip/tcp/udp header parsing.
MFC after: 2 month
|
#
190633 |
|
01-Apr-2009 |
piso |
Implement an ipfw action to reassemble ip packets: reass.
|
#
188676 |
|
16-Feb-2009 |
luigi |
correct some #include
|
#
186180 |
|
16-Dec-2008 |
rwatson |
IPFW's pfil hook/unhook code ignores the return values of pfil_add_hook() and pfil_remove_hook(), so cast them to (void).
MFC after: pretty soon
|
#
185937 |
|
11-Dec-2008 |
bz |
Put a global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL.
Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themsevles are already. This will help by the time when we are going to remove the globals entirely.
While there garbage collect a few dead externs from ip6_var.h.
Sponsored by: The FreeBSD Foundation
|
#
185895 |
|
10-Dec-2008 |
zec |
Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option.
Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instatiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.
Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively
#ifdef VIMAGE_GLOBALS #define V_rt_tables rt_tables #else #define V_rt_tables vnet_net_0._rt_tables #endif
Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs.
Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c.
Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS.
De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import.
Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions.
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
185348 |
|
26-Nov-2008 |
zec |
Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch.
Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.
De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless.
Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
181803 |
|
17-Aug-2008 |
bz |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course of the next few weeks.
Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
#
176765 |
|
03-Mar-2008 |
piso |
Raise a bit ipfw kld priority.
Discussed on: net-, ipfw-.
|
#
173399 |
|
06-Nov-2007 |
oleg |
1) dummynet_io() declaration has changed. 2) Alter packet flow inside dummynet: allow certain packets to bypass dummynet scheduler. Benefits are:
- lower latency: if packet flow does not exceed pipe bandwidth, packets will not be (up to tick) delayed (due to dummynet's scheduler granularity). - lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip stack later. Such packets can be fastforwarded. - recursion (which can lead to kernel stack exhaution) eliminated. This fix long existed panic, which can be triggered this way: kldload dummynet sysctl net.inet.ip.fw.one_pass=0 ipfw pipe 1 config bw 0 for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done ping -c 1 localhost
3) Three new sysctl nodes are added: net.inet.ip.dummynet.io_pkt - packets passed to dummynet net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow is not changed yet.
MFC after: 3 month
|
#
172467 |
|
07-Oct-2007 |
silby |
Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports.
Approved by: re (kensmith)
|
#
165648 |
|
29-Dec-2006 |
piso |
Summer of Code 2005: improve libalias - part 2 of 2
With the second (and last) part of my previous Summer of Code work, we get:
-ipfw's in kernel nat
-redirect_* and LSNAT support
General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page.
To enable in kernel nat in rc.conf, two options were added:
o firewall_nat_enable: equivalent to natd_enable
o firewall_nat_interface: equivalent to natd_interface
Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased.
NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso).
Approved by: glebius (mentor)
|
#
163548 |
|
20-Oct-2006 |
julian |
revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.
|
#
163545 |
|
20-Oct-2006 |
julian |
Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.
|
#
158470 |
|
12-May-2006 |
mlaier |
Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing seperately. Also use pfil hook/unhook instead of keeping the check functions in pfil just to return there based on the sysctl. While here fix some whitespace on a nearby SYSCTL_ macro.
|
#
152928 |
|
29-Nov-2005 |
ume |
obey opt_inet6.h and opt_ipsec.h in kernel build directory.
Requested by: hrs
|
#
145246 |
|
18-Apr-2005 |
brooks |
Add IPv6 support to IPFW and Dummynet.
Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
|
#
144712 |
|
06-Apr-2005 |
glebius |
When a packet has been reinjected into ipfw(4) after dummynet(4) processing we have a non-NULL args.rule. If the same packet later is subject to "tee" rule, its original is sent again into ipfw_chk() and it reenters at the same rule. This leads to infinite loop and frozen router.
Assign args.rule to NULL, any time we are going to send packet back to ipfw_chk() after a tee rule. This is a temporary workaround, which we will leave for RELENG_5. In HEAD we are going to make divert(4) save next rule the same way as dummynet(4) does.
PR: kern/79546 Submitted by: Oleg Bulyzhin Reviewed by: maxim, andre MFC after: 3 days
|
#
141351 |
|
05-Feb-2005 |
glebius |
Add a ng_ipfw node, implementing a quick and simple interface between ipfw(4) and netgraph(4) facilities.
Reviewed by: andre, brooks, julian
|
#
140345 |
|
16-Jan-2005 |
glebius |
- Reduce number of arguments passed to dummynet_io(), we already have cookie in struct ip_fw_args itself. - Remove redundant &= 0xffff from dummynet_io().
|
#
140224 |
|
14-Jan-2005 |
glebius |
o Clean up interface between ip_fw_chk() and its callers:
- ip_fw_chk() returns action as function return value. Field retval is removed from args structure. Action is not flag any more. It is one of integer constants. - Any action-specific cookies are returned either in new "cookie" field in args structure (dummynet, future netgraph glue), or in mbuf tag attached to packet (divert, tee, some future action).
o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}() to a switch structure, so that the functions are more readable, and a future actions can be added with less modifications.
Approved by: andre MFC after: 2 months
|
#
139823 |
|
06-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
138652 |
|
10-Dec-2004 |
glebius |
Revert last change.
Andre: First lets get major new features into the kernel in a clean and nice way, and then start optimizing. In this case we don't have any obfusication that makes later profiling and/or optimizing difficult in any way.
Requested by: csjp, sam
|
#
138631 |
|
09-Dec-2004 |
glebius |
Check that DUMMYNET_LOADED before seeking dummynet m_tag.
Reviewed by: andre MFC after: 1 week
|
#
136714 |
|
19-Oct-2004 |
andre |
Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.
|
#
135920 |
|
29-Sep-2004 |
mlaier |
Add an additional struct inpcb * argument to pfil(9) in order to enable passing along socket information. This is required to work around a LOR with the socket code which results in an easy reproducible hard lockup with debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a seperate (later) commit.
This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in forseeable future.
Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days
LOR IDs: 14 - 17 (not fixed yet)
|
#
135167 |
|
13-Sep-2004 |
andre |
If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don't prepend an already existing tag again. Instead unlink it and prepend it again to have it as the first tag in the chain.
PR: kern/71380
|
#
135154 |
|
13-Sep-2004 |
andre |
Make 'ipfw tee' behave as inteded and designed. A tee'd packet is copied and sent to the DIVERT socket while the original packet continues with the next rule. Unlike a normally diverted packet no IP reassembly attemts are made on tee'd packets and they are passed upwards totally unmodified.
Note: This will not be MFC'd to 4.x because of major infrastucture changes.
PR: kern/64240 (and many others collapsed into that one)
|
#
134383 |
|
27-Aug-2004 |
andre |
Always compile PFIL_HOOKS into the kernel and remove the associated kernel compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack.
If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active.
Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
|
#
134346 |
|
26-Aug-2004 |
ru |
Revert the last change to sys/modules/ipfw/Makefile and fix a standalone module build in a better way.
Silence from: andre MFC after: 3 days
|
#
134055 |
|
19-Aug-2004 |
andre |
Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning up its remains. Do not terminate 'if' lines with ';'.
Spotted by: claudio@OpenBSD.ORG (sitting 3m from my desk) Pointy hat to: andre
|
#
134026 |
|
19-Aug-2004 |
andre |
Give a useful error message if someone tries to compile IPFIREWALL into the kernel without specifying PFIL_HOOKS as well.
|
#
134023 |
|
19-Aug-2004 |
andre |
Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building the ipfw KLD.
For IPFIREWALL_FORWARD this does not have any side effects. If the module has it but not the kernel it just doesn't do anything.
For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT compiled in too. However this is the least disturbing behaviour. The user can just recompile either module or the kernel to match the other one. The access to the machine is not denied if ipfw refuses to load.
|
#
134022 |
|
19-Aug-2004 |
andre |
Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts and to be able to disable ipfw if it was compiled directly into the kernel.
|
#
133920 |
|
17-Aug-2004 |
andre |
Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different.
However there are many changes how ipfw is and its add-on's are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and the directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files Add netinet/ip_fw_pfil.c.
conf/options Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile Add ip_fw_pfil.c.
net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed.
netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code.
netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c (Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() does not longer require the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set.
netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code.
netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.)
netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags.
netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed.
Approved by: re (scottl)
|