History log of /freebsd-11-stable/sys/netinet/sctp_os_bsd.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 332189 07-Apr-2018 tuexen

MFC r323657:

Remove code not used on any platform currently supported.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 315514 18-Mar-2017 ae

MFC r304572 (by bz):
Remove the kernel optoion for IPSEC_FILTERTUNNEL, which was deprecated
more than 7 years ago in favour of a sysctl in r192648.

MFC r305122:
Remove redundant sanity checks from ipsec[46]_common_input_cb().

This check already has been done in the each protocol callback.

MFC r309144,309174,309201 (by fabient):
IPsec RFC6479 support for replay window sizes up to 2^32 - 32 packets.

Since the previous algorithm, based on bit shifting, does not scale
with large replay windows, the algorithm used here is based on
RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting.
The replay window will be fast to be updated, but will cost as many bits
in RAM as its size.

The previous implementation did not provide a lock on the replay window,
which may lead to replay issues.

Obtained from: emeric.poupon@stormshield.eu
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8468

MFC r309143,309146 (by fabient):
In a dual processor system (2*6 cores) during IPSec throughput tests,
we see a lot of contention on the arc4 lock, used to generate the IV
of the ESP output packets.

The idea of this patch is to split this mutex in order to reduce the
contention on this lock.

Update r309143 to prevent false sharing.

Reviewed by: delphij, markm, ache
Approved by: so
Obtained from: emeric.poupon@stormshield.eu
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8130

MFC r313330:
Merge projects/ipsec into head/.

Small summary
-------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
build as part of ipsec.ko module (or with IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. The only one header file <netipsec/ipsec_support.h>
should be included to declare all the needed things to work
with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
- now all security associations stored in the single SPI namespace,
and all SAs MUST have unique SPI.
- several hash tables added to speed up lookups in SADB.
- SADB now uses rmlock to protect access, and concurrent threads
can do SA lookups in the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
avoid locking protection for ipsecrequest. Now we support
only limited number (4) of bundled SAs, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
check for full history of applied IPsec transforms.
o References counting rules for security policies and security
associations were changed. The proper SA locking added into xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.

Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9352

MFC r313331:
Add removed headers into the ObsoleteFiles.inc.

MFC r313561 (by glebius):
Move tcp_fields_to_net() static inline into tcp_var.h, just below its
friend tcp_fields_to_host(). There is third party code that also uses
this inline.

MFC r313697:
Remove IPsec related PCB code from SCTP.

The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.

SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, becasue we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
becasue SCTP uses own locks to protect inpcb.

To fix these problems remove IPsec related PCB code from SCTP.
This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will be not applicable to SCTP sockets. To be able correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.

Differential Revision: https://reviews.freebsd.org/D9538

MFC r313746:
Add missing check to fix the build with IPSEC_SUPPORT and without MAC.

MFC r313805:
Fix LINT build for powerpc.

Build kernel modules support only when both IPSEC and TCP_SIGNATURE
are not defined.

MFC r313922:
For translated packets do not adjust UDP checksum if it is zero.

In case when decrypted and decapsulated packet is an UDP datagram,
check that its checksum is not zero before doing incremental checksum
adjustment.

MFC r314339:
Document that the size of AH ICV for HMAC-SHA2-NNN should be half of
NNN bits as described in RFC4868.

PR: 215978

MFC r314812:
Introduce the concept of IPsec security policies scope.

Currently are defined three scopes: global, ifnet, and pcb.
Generic security policies that IKE daemon can add via PF_KEY interface
or an administrator creates with setkey(8) utility have GLOBAL scope.
Such policies can be applied by the kernel to outgoing packets and checked
agains inbound packets after IPsec processing.
Security policies created by if_ipsec(4) interfaces have IFNET scope.
Such policies are applied to packets that are passed through if_ipsec(4)
interface.
And security policies created by application using setsockopt()
IP_IPSEC_POLICY option have PCB scope. Such policies are applied to
packets related to specific socket. Currently there is no way to list
PCB policies via setkey(8) utility.

Modify setkey(8) and libipsec(3) to be able distinguish the scope of
security policies in the `setkey -DP` listing. Add two optional flags:
'-t' to list only policies related to virtual *tunneling* interfaces,
i.e. policies with IFNET scope, and '-g' to list only policies with GLOBAL
scope. By default policies from all scopes are listed.

To implement this PF_KEY's sadb_x_policy structure was modified.
sadb_x_policy_reserved field is used to pass the policy scope from the
kernel to userland. SADB_SPDDUMP message extended to support filtering
by scope: sadb_msg_satype field is used to specify bit mask of requested
scopes.

For IFNET policies the sadb_x_policy_priority field of struct sadb_x_policy
is used to pass if_ipsec's interface if_index to the userland. For GLOBAL
policies sadb_x_policy_priority is used only to manage order of security
policies in the SPDB. For IFNET policies it is not used, so it can be used
to keep if_index.

After this change the output of `setkey -DP` now looks like:
# setkey -DPt
0.0.0.0/0[any] 0.0.0.0/0[any] any
in ipsec
esp/tunnel/87.250.242.144-87.250.242.145/unique:145
spid=7 seq=3 pid=58025 scope=ifnet ifname=ipsec0
refcnt=1
# setkey -DPg
::/0 ::/0 icmp6 135,0
out none
spid=5 seq=1 pid=872 scope=global
refcnt=1

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9805

PR: 212018
Relnotes: yes
Sponsored by: Yandex LLC


# 310218 18-Dec-2016 tuexen

MFC r309607:

Whitespace changes.

The tools using to generate the sources has been updated and produces
different whitespaces. Commit this seperately to avoid intermixing
these with real code changes.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 297663 07-Apr-2016 rrs

A couple of minor changes that I missed that Michael had done, most noted
in these is the change to non-strict ordering for incoming data (this will
make pkt-drill test 14 fail but its expected).


# 297662 07-Apr-2016 rrs

This is work done by Michael Tuexen and myself at the IETF. This
adds the new I-Data (Interleaved Data) message. This allows a user
to be able to have complete freedom from Head Of Line blocking that
was previously there due to the in-ability to send multiple large
messages without the TSN's being in sequence. The code as been
tested with Michaels various packet drill scripts as well as
inter-networking between the IETF's location in Argentina and Germany.


# 292782 27-Dec-2015 allanjude

Replace sys/crypto/sha2/sha2.c with lib/libmd/sha512c.c

cperciva's libmd implementation is 5-30% faster

The same was done for SHA256 previously in r263218

cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation

Extend sbin/md5 to create sha384(1)

Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}

Reviewed by: cperciva, des, delphij
Approved by: secteam, bapt (mentor)
MFC after: 2 weeks
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3929


# 285877 25-Jul-2015 tuexen

Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows to build multistack with SCTP support.

MFC after: 1 week


# 284515 17-Jun-2015 tuexen

Add FIB support for SCTP.
This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379

MFC after: 3 days


# 279889 11-Mar-2015 tuexen

Update a comment to get it aligned with the code change.

Reported by: brueffer@


# 279888 11-Mar-2015 tuexen

It seems that sb_acc is a better replacement for sb_cc than sb_ccc. At
least it unbreaks the use of select() for SCTP sockets.

MFC after: 3 days


# 276911 10-Jan-2015 tuexen

Retire SCTP_BUF_EXTEND_SIZE. This patch was suggested by
Robert Watson.


# 276692 05-Jan-2015 rwatson

To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision: https://reviews.freebsd.org/D1436
Reviewed by: glebius, trasz, gnn, bz
Sponsored by: EMC / Isilon Storage Division


# 275427 02-Dec-2014 tuexen

Do the renaming of sb_cc to sb_ccc in a way with less code changes by
using a macro.
This is an alternate approach to
https://svnweb.freebsd.org/changeset/base/275326
which is easier to handle upstream.

Discussed with: rrs, glebius


# 275326 30-Nov-2014 glebius

Merge from projects/sendfile:

o Introduce a notion of "not ready" mbufs in socket buffers. These
mbufs are now being populated by some I/O in background and are
referenced outside. This forces following implications:
- An mbuf which is "not ready" can't be taken out of the buffer.
- An mbuf that is behind a "not ready" in the queue neither.
- If sockbet buffer is flushed, then "not ready" mbufs shouln't be
freed.

o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc.
The sb_ccc stands for ""claimed character count", or "committed
character count". And the sb_acc is "available character count".
Consumers of socket buffer API shouldn't already access them directly,
but use sbused() and sbavail() respectively.
o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones
with M_BLOCKED.
o New field sb_fnrdy points to the first not ready mbuf, to avoid linear
search.
o New function sbready() is provided to activate certain amount of mbufs
in a socket buffer.

A special note on SCTP:
SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet
allow protocol specific sockbufs. Thus, SCTP does some hacks to make
itself compatible with FreeBSD: it manages sockbufs on its own, but keeps
sb_cc updated to inform the stack of amount of data in them. The new
notion of "not ready" data isn't supported by SCTP. Instead, only a
mechanical substitute is done: s/sb_cc/sb_ccc/.
A proper solution would be to take away struct sockbuf from struct
socket and allow protocols to implement their own socket buffers, like
SCTP already does. This was discussed with rrs@.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 271665 16-Sep-2014 tuexen

The MTU is handled as a 32-bit entity within the SCTP stack.
This was reported by Peter Kasting from Google.

MFC after: 3 days


# 271204 06-Sep-2014 tuexen

Fix the handling of sysctl variables when used with VIMAGE.
While there do some cleanup of the code.

MFC after: 1 week


# 268537 11-Jul-2014 tuexen

Whitespace changes.

MFC after: 1 week


# 268530 11-Jul-2014 glebius

Fix style bug: rename the refcount field of m_ext to ext_cnt, to match
other members.

Sponsored by: Nginx, Inc.


# 262763 04-Mar-2014 glebius

- Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 255160 02-Sep-2013 tuexen

All changes affect only SCTP-AUTH:
* Remove non working code related to SHA224.
* Remove support for non-standardised HMAC-IDs using SHA384 and SHA512.
* Prefer SHA256 over SHA1.
* Minor cleanup.

MFC after: 2 weeks


# 254527 19-Aug-2013 andre

Reorder the mbuf defines to make more sense and group related flags
together.

Add M_FLAG_PRINTF for use with printf(9) %b indentifier.

Use the generic mbuf flags print names in the net80211 code and adjust
the protocol specific bits for their new positions.

Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field
they use internally to store some additional information.

Discussed with: trociny, glebius


# 254523 19-Aug-2013 andre

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


# 254521 19-Aug-2013 andre

Move the SCTP specific definition of M_NOTIFICATION onto a protocol
specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h. It is
only relevant within SCTP.

Discussed with: tuexen


# 250962 24-May-2013 tuexen

Withdraw http://svnweb.freebsd.org/changeset/base/250809
since the real fix is in http://svnweb.freebsd.org/changeset/base/250952.


# 250809 19-May-2013 tuexen

Initialize the fibnum for outgoing packets to 0. This avoids
crashing due to the usage of uninitialized fibnum.
This bugs became visiable after
http://svnweb.freebsd.org/changeset/base/250700

MFC after: 2 weeks


# 237715 28-Jun-2012 tuexen

Pass the src and dst address of a received packet explicitly around.

MFC after: 3 days


# 237540 24-Jun-2012 tuexen

Do packet logging in a consistent way.

MFC after: 3 days


# 235828 23-May-2012 tuexen

Use consistent text at the begining of the files.

MFC after: 3 days


# 234995 04-May-2012 tuexen

Use SCTP_PRINTF() instead of printf() in all SCTP sources.

MFC after: 3 days


# 231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


# 221251 30-Apr-2011 tuexen

Remove some leftover debug code.

MFC after: 1 week


# 221249 30-Apr-2011 tuexen

Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
IPv6.
* Checking for additional addresses takes the correct address
into account and also does not do more comparisons than
necessary.

This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.

MFC after: 1 week


# 218319 05-Feb-2011 rrs

1) Typo correction in comments and one spacing change.
2) Mass update to all copyrights.
MFC after: 3 Months


# 218211 03-Feb-2011 rrs

Adds an experimental option to create a pool of
threads. These serve as input threads and are queued
packets based on the V-tag number. This is similar to
what a modern card can do with queue's for TCP... but
alas modern cards know nothing about SCTP.

MFC after: 3 months (maybe)


# 215034 09-Nov-2010 brucec

Fix typos.

PR: bin/148894
Submitted by: olgeni


# 211969 29-Aug-2010 tuexen

Fix the the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.

MFC after: 4 weeks


# 202449 16-Jan-2010 tuexen

Get rid of support of an old version of the SCTP-AUTH draft.
Get rid of unused MD5 code.

MFC after: 1 week


# 199437 17-Nov-2009 tuexen

Use always LIST_EMPTY instead of sometime SCTP_LIST_EMPTY,
which is defined as LIST_EMPTY.

Approved by: rrs (mentor)
MFC after: 1 month


# 199370 17-Nov-2009 tuexen

Allow the UMA to free data. This resolves the UMA related bug reported
by Julian.

Approved by: rrs (mentor)
MFC after: 1 month


# 197929 10-Oct-2009 tuexen

Correct include order as indicated by bz.

Approved by: re (mentor)
MFC after: 3 days


# 197914 09-Oct-2009 tuexen

Do not include vnet.h twice.

Approved by: rrs (mentor)
MFC after: 3 days


# 197326 19-Sep-2009 tuexen

Get SCTP working in combination with VIMAGE.
Contains code from bz.
Approved by: rrs (mentor)
MFC after: 1 month.


# 197288 17-Sep-2009 rrs

Support for VNET in SCTP (hopefully)


# 196229 14-Aug-2009 zec

SCTP is not yet compatible with options VIMAGE kernels although it compiles
with VIMAGE defined, so explicitly disallow building such kernels.

Reviewed by: rrs
Approved by: re (rwatson), julian (mentor)


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# 191845 06-May-2009 zec

Unbreak options VIMAGE kernel builds.

Approved by: julian (mentor)


# 189371 04-Mar-2009 rrs

- PR-SCTP bug, where the CUM-ACK was not being updated
into the advance_peer_ack point so we would incorrectly
send a wrong value in the FWD-TSN
- PR-SCTP bug, where an PR packet is used for a window
probe which could incorrectly get the packet moved
back into the send_queue, which will cause major issues and
should not happen.
- Fix a trace to use the proper macro.


# 189106 27-Feb-2009 bz

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


# 188067 03-Feb-2009 rrs

- Cleanup checksum code.
- Prepare for CRC offloading, add MIB counters (RS/MT).
- Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT).
- Bugfix: Handle close() with SO_LINGER correctly when notifications
are generated during the close() call(MT).
- Bugfix: Generate DRY event when sender is dry during subscription.
Only for 1-to-1 style sockets (RS/MT)
- Bugfix: Put vtags for the correct amount of time into time-wait (MT).
- Bugfix: Clear vtag entries correctly on expiration (MT).
- Bugfix: shutdown() indicates ENOTCONN when called for unconnected
1-to-1 style sockets (MT).
- Bugfix: In sctp Auth code (PL).
- Add support for devices that support SCTP csum offload (igb).
- Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS)
Obtained from: With help from Peter Lei and Michael Tuexen


# 185895 10-Dec-2008 zec

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 185694 06-Dec-2008 rrs

Code from the hack-session known as the IETF (and a
bit of debugging afterwards):
- Fix protection code for notification generation.
- Decouple associd from vtag
- Allow vtags to have less strigent requirements in non-uniqueness.
o don't pre-hash them when you issue one in a cookie.
o Allow duplicates and use addresses and ports to
discriminate amongst the duplicates during lookup.
- Add support for the NAT draft draft-ietf-behave-sctpnat-00, this
is still experimental and needs more extensive testing with the
Jason Butt ipfw changes.
- Support for the SENDER_DRY event to get DTLS in OpenSSL working
with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon).
- Update the support of SCTP-AUTH by Peter Lei.
- Use macros for refcounting.
- Fix MTU for UDP encapsulation.
- Fix reporting back of unsent data.
- Update assoc send counter handling to be consistent with endpoint sent counter.
- Fix a bug in PR-SCTP.
- Fix so we only send another FWD-TSN when a SACK arrives IF and only
if the adv-peer-ack point progressed. However we still make sure
a timer is running if we do have an adv_peer_ack point.
- Fix PR-SCTP bug where chunks were retransmitted if they are sent
unreliable but not abandoned yet.

With the help of: Michael Teuxen and Peter Lei :-)
MFC after: 4 weeks


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 184334 27-Oct-2008 rrs

Get rid of ifdef for vimage on version 8 comparison. Now the
scrubbing program properly takes care of this.


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 180387 09-Jul-2008 rrs

1) Adds the rest of the VIMAGE change macros
2) Adds some __UserSpace__ on some of the common defines that
the user space code needs
3) Fixes a bug when we send up data to a user that failed. We
need to a) trim off the data chunk headers, if present, and
b) make sure the frag bit is communicated properly for the
msgs coming off the stream queues... i.e. we see if some
of the msg has been taken.

Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julain!


# 179786 14-Jun-2008 rrs

- Fixes foobar on my part. Some missing virtualization macros from
specific logging cases.


# 179783 14-Jun-2008 rrs

- Macro-izes the packed declaration in all headers.
- Vimage prep - these are major restructures to move
all global variables to be accessed via a macro or two.
The variables all go into a single structure.
- Asconf address addition tweaks (add_or_del Interfaces)
- Fix rwnd calcualtion to be more conservative.
- Support SACK_IMMEDIATE flag to skip delayed sack
by demand of peer.
- Comment updates in the sack mapping calculations
- Invarients panic added.
- Pre-support for UDP tunneling (we can do this on
MAC but will need added support from UDP to
get a "pipe" of UDP packets in.
- clear trace buffer sysctl added when local tracing on.

Note the majority of this huge patch is all the vimage prep stuff :-)


# 179157 20-May-2008 rrs

- Adds support for the multi-asconf (From Kozuka-san)
- Adds some prepwork (Not all yet) for vimage in particular
support the delete the sctppcbinfo.xx structs. There is
still a leak in here if it were to be called plus we stil
need the regrouping (From Me and Michael Tuexen)
- Adds support for UDP tunneling. For BSD there is no
socket yet setup so its disabled, but major argument
changes are in here to emcompass the passing of the port
number (zero when you don't have a udp tunnel, the default
for BSD). Will add some hooks in UDP here shortly (discussed
with Robert) that will allow easy tunneling. (Mainly from
Peter Lei and Michael Tuexen with some BSD work from me :-D)
- Some ease for windows, evidently leave is reserved by their
compile move label leave: -> out:

MFC after: 1 week


# 178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 172218 18-Sep-2007 rrs

- The address lock is changed to a rwlock. This
also involves macro changes to have a RLOCK and a WLOCK
and placing the correct version within the code.
- The INP-INFO lock is changed to a rwlock.
- When sctp_shutdown() is called on Mac OS X, the socket lock is held.
So call sctp_chunk_output with SCTP_SO_LOCKED and
not SCTP_SO_NOT_LOCKED.
- Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X.
- u_int64_t -> uint64_t
- add missing addr unlock for error return path
Approved by: re@freebsd.org (K Smith)


# 172090 08-Sep-2007 rrs

- Locking compatiability changes. This involves adding
additional flags to many function calls. The flags only
get used in BSD when we compile with lock testing. These
flags allow apple to escape the "giant" lock it holds on
the socket and have more fine-grained locking in the NKE.
It also allows us to test (with witness) the locking used
by apple via a compile switch (manually applied).

Approved by: re@freebsd.org(B Mah)


# 171943 23-Aug-2007 rrs

- Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
So that it does not send the "null" data mbuf out and cause
it to get freed twice.
- Fix so auto-asconf sysctl actually effect the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed) submitted
found and reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked >= not just > to conform to socket api
draft this fixes sctp_indata.c in the two places need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
that is not in the association with a valid assoc_id. This meant
you got or set the stcb level values instead of the destination
you thought you were going to get/set. Now validate if the
stcb is non-null and the net is NULL that the sa_family is
set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
at the exact time it was running. rework the worker thread to
use the increment/decrement to prevent this and no longer use
the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
had a locking problem where locks were not in place where they
should have been.
- Free association calls were not testing the return value in
sctp_inpcb_free() properly... others should be cast void returns
where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
we should not do the actual free.. but instead let the timer
free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
flag is set, we must NOT process the packet but handle it like
ootb. This is because while freeing an assoc we release the
locks to get all the higher order locks so we can purge all
the hash tables. This leaves a hole if a packet comes in
just at that point. Now sctp_common_input_processing() will
call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
it so we don't have a conflict (I think this is a covertity change).
We made this change AFTER some conversation and looking to make sure
that M_PROTO5 does not have a problem between SCTP and the 802.11
stuff (which is the only other place its used).
- Fixed lock order reversal and missing atomic protection around
locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to a ABORT() with bad checksum with
a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT
send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
you need better performing during debugging). Note no commands
are here to get the log info, you must just use kgdb.
- The timer code needs to be aware of if it needs to call
sctp_sack_check() to slide the maps and adjust the cum-ack.
This is because it may be out of sync cum-ack wise.
- Added threshold managment logging.
- If the user picked just the right size, that just filled the send
window minus one mtu, we would enter a forever loop not copying and
at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
to 1.
- My rwnd control was not being used to control the rwnd properly (we
did not add and subtract to it :-() this is now fixed so we handle
small messages (1 byte etc) better to bring our rwnd down more
slowly.

Approved by: re@freebsd.org (Bruce Mah)


# 171531 21-Jul-2007 rrs

- remove duplicate code from sctp_asconf.c
- remove duplicate #include <sys/priv.h> that is not under
#ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
o some SA_IGNORE comments
o some checks for NULL before unlock.
o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this
we pass a NULL in to bind on implicit assoc setup and crash :-(
Approved by: re@freebsd.org(Ken Smith)


# 171477 17-Jul-2007 rrs

- added pre-checks to the bindx call.
- use proper tick gathering macro instead of ticks directly.
- Placed reasonable boundaries on sets that a user can do
that are converted to ticks from ms.
- Fix CMT_PF to always check to be sure CMT is on.
- Fix ticks use of CMT_PF.
- put back code to allow asconfs to be queued while INITs are in flight
and before the assoc is established.
- During window probes, an ack'd packet might be left with the window
probe mark on it causing it to be retransmitted. Change so that
the flight decrease macro clears the window_probe mark.
- Additional logging flight size/reading and ASOC LOG. This
is only enabled if you manually insert things into opt_sctp.h
since its a set of debug code only.
- Found an interesting SMP race in the way data was appended which
could cause a reader to lose a part of a message, had to
reorder when we marked the message was complete to after
the data was appended.
- bug in ADD-IP for the subset bound socket case when the peer has only
one address
- fix ASCONF implicit success/error handling case
- proper support of jails in Freebsd 6>
- copy out the timeval for the 64 bit sparc world on cookie-echo
alignment error crashes without this).
Approved by: re(Ken Smith)


# 171167 03-Jul-2007 gnn

Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing


# 171139 01-Jul-2007 gnn

Fix a dangling netinet6 to netipsec transition for SCTP include files.

Approved by: re


# 171133 01-Jul-2007 gnn

Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by: bz
Approved by: re
Supported by: Secure Computing


# 170806 15-Jun-2007 rrs

- Matthew's changes to get inlines out, plus a few of my own
to deal with the VRF inline function -> becomes a macro now.
Submitted by: Matthew Jacobs


# 170747 15-Jun-2007 rrs

KTR_GEN -> KTR_SUBSYS (for Kris).


# 170744 14-Jun-2007 rrs

- Fix so ifn's are properly deleted when the ref count goes to 0.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
of the logging #ifdef's with a few exceptions reducing
the number of config options for SCTP.


# 170606 12-Jun-2007 rrs

- Restructure so bindx functions are not done inline to socket option
but are a seperate call that can be re-used if needed.
- 64 bit issues
o re-arrange cookie so it is better 64 bit aligned
o For wire level things we need the packed attribute.


# 170205 02-Jun-2007 rrs

- fix initial pcb vrf setting when the initial vrf is not the
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
unique across systems (including different VRFs) this makes it easier
to do ifn lookups.


# 170181 01-Jun-2007 rrs

- Take out the broken table-id concept. Panda Routers have a M-VRF
concept that is NOT well thought out for a multi-homed transport
protocol. So the useless table-id entries passed around need to
be removed.
- Add a event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
with the largest ssthresh (the compare was wrong).


# 170138 30-May-2007 rrs

- Fix a memory overwrite when the mapping array
is expanded, size of expansion was not taken int consideration.
- Fix so vtag hash is 1 bigger so that it modulo's out
correctly, avoids a panic when restart with right modulo happens.
- do not dereference stcb when control->do_not_ref_stcb is set
- Fix up packet logging to not often use a lock and also to
add to options.
- Fix some logging option duplication in the sctputil.h


# 170091 29-May-2007 rrs

- Fixes so we won't try to start a timer when we
hold a wq lock for the iterator. Panda uses a
silly recursive lock they hold through the timer.
- Add poor mans wireshark compile option..
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.

Reviewed by: gnn


# 169420 09-May-2007 rrs

Two major items here:
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
a macro that does all of this. This removes all printfs from
the code and makes the code more portable and easier to
read.
- Static Analysis (cisco) - found a few bugs, but mostly we
add checks for NULL pointers and such to make the tool
happy. We now pass the Cisco SA tools checks except for
where it does not understand tailq/lists. We still need
to look at the coverity tools output too (this is like
the cisco SA tool) and see if it wants us to fix any other
items. Hopefully this will be the last major churn in the
code other than bug fixes.


# 169382 08-May-2007 rrs

- Copyright change, cisco's silly tool wants it to say:
"Copyright (c) 2001-2007, by Cisco Systems,"
instead of
*Copyright (c) 2001-2007, Cisco Systems,"

- Also fix a few straglers that were still in 2006.


# 169352 07-May-2007 rrs

- More macros for OS compatabilty
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth
of TSN's (1 byte msgs).. this left the peer hopelessly
out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now
that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for
BSD, but may be other things later when BSD
fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open
side false notifications.
- Fix so sendfile and SENDALL will work properly (missing
flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can
reuse the helpers.
- When an address is added during system init (non-dynamic mode) make
sure that the "defer use" flag is not set.
** its compiling on XR now :-D **

Reviewed by: gnn


# 169208 02-May-2007 rrs

- Somehow the disable fragment option got lost. We could
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.

Obtained from: (some of the Listen changes from Weongyo Jeong)


# 168299 03-Apr-2007 rrs

- fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
packet (1000 addresses).
- flight size correcting updated to include one message only
and to handle case where the peer does not cumack the
next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
since we tried to unlock the destroyed mutex. Fixes
so we properly exit when we need to destroy an assoc.
(Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by: gnn


# 168124 31-Mar-2007 rrs

- Found bug in min split point bundling which caused
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
both how big a msg must be as well as how much needs
to be left over.
- With our new algo in place, we need to implicitly
set "end of msg" on the sp-> structure otherwise we
end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
will kill the assoc via the kill timer and send an
abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
us from fragmenting a small complete message when
we needed to.
- Window probes were not marked back to unsent and
flight adjusted when a sack came in with no
window change or accepting of the probe data.
We now fix this with having a mark on the net and
the chunk so we can clear it out when the sack arrives
forcing it to retran just like it was "new" this
improves the handling of window probes, which were
dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange


# 167736 20-Mar-2007 rrs

- window update sacks sent incorrectly after
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
express sack handling since it refered to an unused
variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.


# 167698 19-Mar-2007 rrs

Adds a hash table to speed local address lookup
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by: gnn


# 167695 19-Mar-2007 rrs

- errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.


# 167598 15-Mar-2007 rrs

- Sysctl's move to seperate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and loosing potential to
send when we inflate the flight size. We now inflate
the cwnd too and deflate it later when the revoked
chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vfr
via compile switch that is scrubbed from BSD but we won't
need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
a #def (30).

Reviewed by: gnn


# 166675 12-Feb-2007 rrs

- Copyright updates (aka 2007)
- ZONE get now also take a type cast so it does the
cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just
LIST_EMPTY
- Removal of const in some of the static hmac functions
(not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the
RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered
count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear
after my testing completes. It allows us to keep a
small log on each assoc of the last 40 TSN's in/out and
stream assignment. It is NOT in options and so is only
good for private builds.
- Some CMT changes in prep for Jana fixing his problem
with reneging when CMT is enabled (Concurrent Multipath
Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code
with the old format (in case the peer does not yet support
the latest auth code).
- Nonce sum was incorrectly being set in when ecn_nonce was
NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using
just data pointers. The mbufs were used to harmonize
NetBSD code since both Net and Open used this method. We
have decided to move away from that and more conform to
FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The
cookie_how collision case tracking had an endless loop in
it if you got a second retransmission of a cookie collision
case. This would lock up a CPU .. ugly..
- auth function goes to using size_t instead of int which
conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you
get the data chunk, deliver it and due to the reference to a ch->
that every now and then has been deleted (depending on the postion
in the mbuf) you have an invalid ch->ch.flags.. and thus you don't
advance the stream sequence number.. so you block the stream
permanently. The fix is to make local variables of these guys
and set them up before you have any chance of trimming the
mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in
the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where
we would get an error on receiving with this. Thats because
it was NOT padded to the same size as the snd_rcv info. We
increase (add the pad) so the two structs are the same size
in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for
socket options was to cast the pointer and validate the size.
This as been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing
flag type (the next msg is a notification) and a missing
scope recovery was also fixed.

Reviewed by: gnn


# 166086 18-Jan-2007 rrs

- most all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
- Count error fixed in mapping array which might
cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using
a pre-formed connection and then completing the
work so accept won't happen on a non-formed
association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.

Approved by: gnn


# 166023 15-Jan-2007 rrs

- Macroizes the V6ONLY flag check.
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
unsigned long and really is a uint32_t
- Got rid of the user of MHeaders until they
are truely needed by lower layers.
- Fixed an initialization problem in the readq structure
(ordering was off).
- Found yet another collision bug when the random number
generator returns two numbers on one side (during a collision)
that are the same. Also added some tracking of cookies
that will go away when we know that we have the last collision
bug gone.
- Fixed an init bug for book_size_scale, that was causing
Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
Early FR but due to above bug also effected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
shuting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was a
ordering of locks problem in the way we did timers. It
now conforms to the timeout(9) manual (except for the
_drain part, we had to do this a different way due
to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET
depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
error when the connection was reset.
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
error when the connection was reset.
Approved by: gnn


# 165647 29-Dec-2006 rrs

a) macro-ization of all mbuf and random number
access plus timers. This makes the code
more portable and able to change out the
mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only
the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to <b> aka
worrying about pkthdr's :-)
d) There was one last reorder problem it looks where
if a restart occur's and we release and relock (at
the point where we setup our alias vtag) we would
end up possibly getting the wrong TSN in place. The
code that fixed the TSN's just needed to be shifted
around BEFORE the release of the lock.. also code that
set the state (since this also could contribute).
Approved by: gnn


# 165220 14-Dec-2006 rrs

1) Fixes on a number of different collision case LOR's.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to
the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by: gnn


# 163954 03-Nov-2006 rrs

Opps... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I
inserted a few to the new files.. but I falied to
add the #include <sys/cdef.h>

Which causes a compile error.. sorry about that... got it
now :-)

Approved by:gnn


# 163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn