History log of /freebsd-10-stable/sys/netinet6/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
343139 18-Jan-2019 hselasky

MFC r342884:
Fix loopback traffic when using non-lo0 link local IPv6 addresses.

The loopback interface can only receive packets with a single scope ID,
namely the scope ID of the loopback interface itself. To mitigate this
packets which use the scope ID are appearing as received by the real
network interface, see "origifp" in the patch. The current code would
drop packets which are designated for loopback which use a link-local
scope ID in the destination address or source address, because they
won't match the lo0's scope ID. To fix this restore the network
interface pointer from the scope ID in the destination address for
the problematic cases. See comments added in patch for a more detailed
description.

This issue was introduced with route caching by karels@ .

Reviewed by: bz (network)
Differential Revision: https://reviews.freebsd.org/D18769
Sponsored by: Mellanox Technologies

338985 27-Sep-2018 gordon

There are various cases where we modify the inp_vflag and inp_inc.inc_flags
fields during a syscall, but don't restore those fields if the operation
fails. This can leave the inp structure in an inconsistent state and cause
various problems.

Restore the inp_vflag and inp_inc.inc_flags fields when the underlying
operation fails and the inp could be in an inconsistent state.

This is a direct commit to the branch as the code is different enough in
the other branches to make it difficult to resolve a merge.

Submitted by: jtl@
Reported by: Jakub Jirasek, Secunia Research at Flexera
Reviewed by: jhb@
Approved by: so
Security: FreeBSD-EN-18:11.listen
Security: CVE-2018-6925

329158 12-Feb-2018 ae

MFC r328876:
Modify ip6_get_prevhdr() to be able use it safely.

Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.

In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.

Reported by: Maxime Villard <max at m00nbsd dot net>

328878 05-Feb-2018 ae

MFC r328770:
Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.

Reported by: Maxime Villard <max at m00nbsd dot net>

327550 04-Jan-2018 pfg

MFC r327295:
Start syncing changes from OpenBSD's ip6_id.c instead of ip_id.c.

correct non-repetitive ID code, based on comments from niels provos.
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
the repetition period to 30000)

Obtained from: OpenBSD (CVS rev. 1.2)

321134 18-Jul-2017 ngie

MFC r318255:

Add missing braces around MCAST_EXCLUDE check when KTR support is
compiled into the kernel

This ensures that .iss_asm (the number of ASM listeners) isn't incorrectly
decremented for MLD-layer source datagrams when inspecting im*s_st[1]
(the second state in the structure).

PR: 217509 [1]

317335 23-Apr-2017 kp

MFC r317186

pf: Fix possible incorrect IPv6 fragmentation

When forwarding pf tracks the size of the largest fragment in a fragmented
packet, and refragments based on this size.
It failed to ensure that this size was a multiple of 8 (as is required for all
but the last fragment), so it could end up generating incorrect fragments.

For example, if we received an 8 byte and 12 byte fragment pf would emit a first
fragment with 12 bytes of payload and the final fragment would claim to be at
offset 8 (not 12).

We now assert that the fragment size is a multiple of 8 in ip6_fragment(), so
other users won't make the same mistake.

Reported by: Antonios Atlasis <aatlasis at secfu net>

314829 07-Mar-2017 ae

MFC r314430:
When IPv6 fragments reassembly is complete, update mbuf's csum_data
and csum_flags using information from all fragments. This fixes
dropping of reassembled packets due to wrong checksum when the IPv6
checksum offloading is enabled on a network card.

314667 04-Mar-2017 avg

MFC r283291: don't use CALLOUT_MPSAFE with callout_init()

The main purpose of this MFC is to reduce conflicts for other merges.
Parts of the original change have already "trickled down" via individual MFCs.


/freebsd-10-stable/sys/amd64/amd64/mp_watchdog.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
/freebsd-10-stable/sys/cddl/dev/profile/profile.c
/freebsd-10-stable/sys/compat/ndis/subr_ntoskrnl.c
/freebsd-10-stable/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
/freebsd-10-stable/sys/dev/altera/jtag_uart/altera_jtag_uart_tty.c
/freebsd-10-stable/sys/dev/ath/if_ath.c
/freebsd-10-stable/sys/dev/ce/if_ce.c
/freebsd-10-stable/sys/dev/cp/if_cp.c
/freebsd-10-stable/sys/dev/ctau/if_ct.c
/freebsd-10-stable/sys/dev/cx/if_cx.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_main.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/dcons/dcons_os.c
/freebsd-10-stable/sys/dev/drm2/drm_irq.c
/freebsd-10-stable/sys/dev/drm2/i915/intel_display.c
/freebsd-10-stable/sys/dev/glxsb/glxsb.c
/freebsd-10-stable/sys/dev/gxemul/cons/gxemul_cons.c
/freebsd-10-stable/sys/dev/hifn/hifn7751.c
/freebsd-10-stable/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
/freebsd-10-stable/sys/dev/if_ndis/if_ndis.c
/freebsd-10-stable/sys/dev/isci/isci_io_request.c
/freebsd-10-stable/sys/dev/mfi/mfi.c
/freebsd-10-stable/sys/dev/mwl/if_mwl.c
/freebsd-10-stable/sys/dev/nand/nandsim_chip.c
/freebsd-10-stable/sys/dev/ntb/ntb_hw/ntb_hw.c
/freebsd-10-stable/sys/dev/nxge/if_nxge.c
/freebsd-10-stable/sys/dev/oce/oce_if.c
/freebsd-10-stable/sys/dev/patm/if_patm_attach.c
/freebsd-10-stable/sys/dev/rndtest/rndtest.c
/freebsd-10-stable/sys/dev/safe/safe.c
/freebsd-10-stable/sys/dev/sound/midi/mpu401.c
/freebsd-10-stable/sys/dev/sound/pci/atiixp.c
/freebsd-10-stable/sys/dev/sound/pci/es137x.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdac.c
/freebsd-10-stable/sys/dev/sound/pci/via8233.c
/freebsd-10-stable/sys/dev/twa/tw_osl_freebsd.c
/freebsd-10-stable/sys/dev/tws/tws.c
/freebsd-10-stable/sys/dev/ubsec/ubsec.c
/freebsd-10-stable/sys/dev/virtio/random/virtio_random.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/nfs/nfs_commonport.c
/freebsd-10-stable/sys/gdb/gdb_cons.c
/freebsd-10-stable/sys/geom/gate/g_gate.c
/freebsd-10-stable/sys/geom/journal/g_journal.c
/freebsd-10-stable/sys/geom/mirror/g_mirror.c
/freebsd-10-stable/sys/geom/raid3/g_raid3.c
/freebsd-10-stable/sys/geom/sched/gs_rr.c
/freebsd-10-stable/sys/i386/i386/mp_watchdog.c
/freebsd-10-stable/sys/kern/init_main.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/kern_thread.c
/freebsd-10-stable/sys/kern/subr_vmem.c
/freebsd-10-stable/sys/kern/uipc_domain.c
/freebsd-10-stable/sys/mips/cavium/octe/ethernet.c
/freebsd-10-stable/sys/mips/cavium/octeon_rnd.c
/freebsd-10-stable/sys/mips/nlm/dev/net/xlpge.c
/freebsd-10-stable/sys/mips/rmi/dev/xlr/rge.c
/freebsd-10-stable/sys/net/if_spppsubr.c
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_node.c
/freebsd-10-stable/sys/net80211/ieee80211_proto.c
/freebsd-10-stable/sys/netgraph/netflow/ng_netflow.c
/freebsd-10-stable/sys/netgraph/netgraph.h
/freebsd-10-stable/sys/netinet/in_pcb.c
/freebsd-10-stable/sys/netinet/ip_mroute.c
/freebsd-10-stable/sys/netinet/tcp_hostcache.c
/freebsd-10-stable/sys/netinet/tcp_subr.c
in6_rmx.c
/freebsd-10-stable/sys/netpfil/ipfw/ip_dummynet.c
/freebsd-10-stable/sys/netpfil/ipfw/ip_fw_dynamic.c
/freebsd-10-stable/sys/netpfil/pf/if_pfsync.c
/freebsd-10-stable/sys/ofed/include/linux/timer.h
/freebsd-10-stable/sys/ofed/include/linux/workqueue.h
/freebsd-10-stable/sys/powerpc/mambo/mambo_console.c
/freebsd-10-stable/sys/powerpc/pseries/phyp_console.c
/freebsd-10-stable/sys/sys/callout.h
/freebsd-10-stable/sys/vm/uma_core.c
/freebsd-10-stable/sys/x86/x86/mca.c
309108 24-Nov-2016 jch

MFC r286227, r286443:

r286227:

Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:

- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.

r286443:

Fix a kernel assertion issue introduced with r286227:
Avoid too strict INP_INFO_RLOCK_ASSERT checks due to
tcp_notify() being called from in6_pcbnotify().

Reported by: Larry Rosenman <ler@lerctr.org>
Submitted by: markj, jch

303459 28-Jul-2016 sbruno

MFC r299829
Use Node Information flag names instead of hard-coding their values.

303458 28-Jul-2016 sbruno

MFC r296063 r297397 r299213

296063:
Lock the NDP default router list and count defrouter references.

This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.

297397
Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.

When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.

299213
Clean up callers of nd6_prelist_add().

nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.

Submitted by: Jason Wolfe <j@nitrology.com>

299145 05-May-2016 markj

MFC r295583, r295584, r295729, r295730:
NDP code cleanup changes.

MFC r295732:
Fix an IPv6 DAD reference count leak.

299014 03-May-2016 markj

MFC r295575, r295576, r295578, r295579, r295580:
Various NDP cleanups. No functional change intended.

297445 31-Mar-2016 ae

MFC r296984:
Change in6_selectsrc() to allow usage of non-local IPv6 addresses in
IPV6_PKTINFO ancillary data when IPV6_BINDANY socket option is set.

296052 25-Feb-2016 tuexen

MFC r295549:
Loopback addresses are 127.0.0.0/8, not 127.0.0.1/32.

MFC r295668:
Improve the teardown of the SCTP stack.

MFC r295670:
Whitespace changes.

MFC r295708:
Address a warning reported by D5245 / PVS.

MFC r295709:
Code cleanup which will silence a warning in PVS / D5245.

MFC r295710:
Add protection code for issues reported by PVS / D5245.

MFC r295771:
Fix reporting of mapped addressed in getpeername() and getsockname() for
IPv6 SCTP sockets.
This bugs were found because of an issue reported by PVS / D5245.

MFC r295772:
Add some protection code.

MFC r295773:
Add protection code.

MFC r295805:
Use the SCTP level pointer, not the interface level.

MFC r295929:
Don't leak an address in an error path.

Approved by: re (marius)

295389 08-Feb-2016 bz

MFC r292601,292654:

Since r256624 (head) we have been leaking routing table allocations
on vnet enabled jail shutdown. Call the provided cleanup
routines for IP versions 4 and 6 to plug these leaks.

Sponsored by: The FreeBSD Foundation
Reviewed by: gnn
Differential Revision:https://reviews.freebsd.org/D4530

Approved by: re (gjb)

294503 21-Jan-2016 bz

MFC 292953:

This code is not in modules that need KPI stability so no need to use
the wrapper functions as used in r252511 (head). We can directly use
the locking macros.

294215 17-Jan-2016 tuexen

MFC r291904:
Fix the allocation of outgoing streams:
* When processing a cookie, use the number of
streams announced in the INIT-ACK.
* When sending an INIT-ACK for an existing
association, use the value from the association,
not from the end-point.

294142 16-Jan-2016 tuexen

MFC r285877:
Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows to build multistack with SCTP support.

293897 14-Jan-2016 glebius

o Fix SCTP ICMPv6 error message vulnerability. [SA-16:01.sctp]
o Fix Linux compatibility layer incorrect futex handling. [SA-16:03.linux]
o Fix Linux compatibility layer setgroups(2) system call. [SA-16:04.linux]
o Fix TCP MD5 signature denial of service. [SA-16:05.tcp]
o Fix insecure default bsnmpd.conf permissions. [SA-16:06.bsnmpd]

Security: FreeBSD-SA-16:01.sctp, CVE-2016-1879
Security: FreeBSD-SA-16:03.linux, CVE-2016-1880
Security: FreeBSD-SA-16:04.linux, CVE-2016-1881
Security: FreeBSD-SA-16:05.tcp, CVE-2016-1882
Security: FreeBSD-SA-16:06.bsnmpd, CVE-2015-5677

293358 07-Jan-2016 wollman

MFH r292836:

in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces

bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than
IFT_ETHER, even though they only support Ethernet-style links. This
caused in6_if2idlen to emit an "unknown link type (209)" warning to
the console every time it was called. Add IFT_BRIDGE to the case
statement in the appropriate place, indicating that it uses the same
IPv6 address format as other Ethernet-like interfaces.

292566 21-Dec-2015 kp

MFC r292219:

inet6: Do not assume every interface has ip6 enabled.

Certain interfaces (e.g. pfsync0) do not have ip6 addresses (in other words,
ifp->if_afdata[AF_INET6] is NULL). Ensure we don't panic when the MTU is
updated.

pfsync interfaces will never have ip6 support, because it's explicitly disabled
in in6_domifattach().

PR: 205194

291987 08-Dec-2015 ae

Fix typo in r291986.

(this is derect commit to stable/10)

291986 08-Dec-2015 ae

MFC r291578:
mld_v2_dispatch_general_query() is used by mld_fasttimo_vnet() to send
a reply to the MLDv2 General Query. In case when router has a lot of
multicast groups, the reply can take several packets due to MTU limitation.
Also we have a limit MLD_MAX_RESPONSE_BURST == 4, that limits the number
of packets we send in one shot. Then we recalculate the timer value and
schedule the remaining packets for sending.
The problem is that when we call mld_v2_dispatch_general_query() to send
remaining packets, we queue new reply in the same mbuf queue. And when
number of packets is bigger than MLD_MAX_RESPONSE_BURST, we get endless
reply of MLDv2 reports.
To fix this, add the check for remaining packets in the queue.

PR: 204831

290348 04-Nov-2015 hrs

MFC r288600:

- Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This
catches cases that DAD probes cannot be sent because of
IFF_UP && !IFF_DRV_RUNNING.

- nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of
calling it before nd6_dad_starttimer().

- Do not release an entry in dadq when a duplicate entry is being
added.

288109 22-Sep-2015 garga

Remove extra space introduced in r287734. This is a stable/10 only fix
since original commit (r287094) is correct.

Approved by: loos
Sponsored by: Rubicon Communications (Netgate)

287734 13-Sep-2015 hrs

MFC 287094:

- Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa(). An address
which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
when the interface has ND6_IFF_IFDISABLED. And Do IFF_UP and
IFF_DRV_RUNNING check consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.

287733 13-Sep-2015 hrs

MFC 287095, 287610, 287611, 287617:

Remove obsolete API (SIOCGDRLST_IN6 and SIOCGPRLST_IN6) support.

287732 13-Sep-2015 hrs

MFC 287609:

Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD.

287731 13-Sep-2015 hrs

MFC 287608:

Remove IN6_IFF_NOPFX. This flag was no longer used.

286316 05-Aug-2015 ae

MFC r285710:
Invoke LLE event handler when entry is deleted.

285825 23-Jul-2015 hrs

MFC r282805:

- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Approved by: re (gjb)

285822 23-Jul-2015 hrs

MFC r273992:

Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when
the newly-added IPv6 address was /128.

Approved by: re (gjb)

284633 20-Jun-2015 tuexen

MFC r284515:
Add FIB support for SCTP.
This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379

PR: 200379

284576 18-Jun-2015 kp

Merge r281234

Evaluate packet size after the firewall had its chance

Defer the packet size check until after the firewall has had a look at it. This
means that the firewall now has the opportunity to (re-)fragment an oversized
packet.

Differential Revision: https://reviews.freebsd.org/D2821
Reviewed by: gnn

284575 18-Jun-2015 kp

Merge r281165

Remove duplicate code

We'll just fall into the same local delivery block under the
'if (m->m_flags & M_FASTFWD_OURS)'.

Suggested by: ae
Differential Revision: https://reviews.freebsd.org/D2820
Reviewed by: gnn

284572 18-Jun-2015 kp

Merge r280955

Preserve IPv6 fragment IDs accross reassembly and refragmentation

When forwarding fragmented IPv6 packets and filtering with PF we
reassemble and refragment. That means we generate new fragment headers
and a new fragment ID.

We already save the fragment IDs so we can do the reassembly so it's
straightforward to apply the incoming fragment ID on the refragmented
packets.

Differential Revision: https://reviews.freebsd.org/D2817
Reviewed by: gnn

284570 18-Jun-2015 kp

Merge r278842

Factor out ip6_fragment() function, to be used in IPv6 stack and pf(4).

Differential Revision: https://reviews.freebsd.org/D2815
Reviewed by: gnn

284568 18-Jun-2015 kp

Merge r278828, r278832

- Factor out ip6_deletefraghdr() function, to be shared between IPv6 stack and pf(4).
- Move ip6_deletefraghdr() to frag6.c. (Suggested by bz)

Differential Revision: https://reviews.freebsd.org/D2813
Reviewed by: gnn

284072 06-Jun-2015 ae

MFC r276148:
Remove in_gif.h and in6_gif.h files. They only contain function
declarations used by gif(4). Instead declare these functions in C files.
Also make some variables static.

MFC r276215:
Extern declarations in C files loses compile-time checking that
the functions' calls match their definitions. Move them to header files.

284066 06-Jun-2015 ae

MFC r274246:
Overhaul if_gre(4).

Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.

gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagramms (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagramms and check
for inconming datagramms;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
work at system startup. But this also removes support for having tunnels with
the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD.
Use our standard ioctls for tunnels.

me(4):
* implementation conform to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);

PR: 164475

MFC r274289 (by bz):
gcc requires variables to be initialised in two places. One of them
is correctly used only under the same conditional though.

For module builds properly check if the kernel supports INET or INET6,
as otherwise various mips kernels without IPv6 support would fail to build.

MFC r274964:
Add ip_gre.h to ObsoleteFiles.inc.

284016 05-Jun-2015 ae

Rework r281868 to not skip RTM announces for tunneling interfaces.
This is direct commit to stable/10.

Tested by: tuexen@

283901 02-Jun-2015 ae

MFC r275392:
Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by: Yandex LLC

283852 31-May-2015 ae

MFC r282965:
Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision: https://reviews.freebsd.org/D2004
Sponsored by: Yandex LLC

283822 31-May-2015 tuexen

MFC r283650:

Fix and cleanup the debug information. This has no user-visible changes.
Thanks to Irene Ruengeler for proving a patch.

283708 29-May-2015 tuexen

MFC r276914:

Minimize the usage of SCTP_BUF_IS_EXTENDED.
This should help Robert...

283703 29-May-2015 tuexen

MFC r275868:

Plug a memory leak in an error code path.

Reported by: Coverity
CID: 1018936

282894 14-May-2015 ae

MFC r282578:
Mark data checksum as valid for multicast packets, that we send back
to myself via simloop.
Also remove duplicate check under #ifdef DIAGNOSTIC.

PR: 180065

282807 12-May-2015 hrs

MFC r274223 (by glebius):

Remove VNET_SYSCTL_ARG(). The generic sysctl(9) code handles that.

A panic could occur by "sysctl -a" when using VIMAGE-enabled stable/10
kernel after r262734 because of this missing MFC.

282622 08-May-2015 hiren

MFC r261708, r261847, r268525, r274316, r274347, r275593,
r276844, r276847, r279531, r279559, r279564, r279676

A bunch of IPv6 fixes by melifaro, hrs and ae

Major changes:
Simplify nd6_output_lle()
Add refcounting to DAD and fix races and other errors
Implement Enhanced DAD algorithm for IPv6

Suggested by: ae
Tested by: Jason Wolfe <j at nitrology.com>
Sponsored by: Limelight Networks

282445 05-May-2015 markj

MFC r281483:
Fix a possible refcount leak in regen_tmpaddr().

281955 24-Apr-2015 hiren

MFC r275358 r275483 r276982 - Removing M_FLOWID by hps@

r275358:
Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

r275483:
Remove M_FLOWID from SCTP code.

r276982:
Remove no longer used "M_FLOWID" flag from mbuf.h and update the netisr
manpage.

Note: The FreeBSD version has been bumped.

Reviewed by: hps, tuexen
Sponsored by: Limelight Networks


/freebsd-10-stable/share/man/man9/netisr.9
/freebsd-10-stable/sys/dev/bxe/bxe.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/cxgbe/t4_main.c
/freebsd-10-stable/sys/dev/cxgbe/t4_sge.c
/freebsd-10-stable/sys/dev/e1000/if_igb.c
/freebsd-10-stable/sys/dev/ixgbe/ixgbe.c
/freebsd-10-stable/sys/dev/ixgbe/ixv.c
/freebsd-10-stable/sys/dev/ixl/ixl_txrx.c
/freebsd-10-stable/sys/dev/mxge/if_mxge.c
/freebsd-10-stable/sys/dev/netmap/netmap_freebsd.c
/freebsd-10-stable/sys/dev/oce/oce_if.c
/freebsd-10-stable/sys/dev/qlxgbe/ql_isr.c
/freebsd-10-stable/sys/dev/qlxgbe/ql_os.c
/freebsd-10-stable/sys/dev/qlxge/qls_isr.c
/freebsd-10-stable/sys/dev/qlxge/qls_os.c
/freebsd-10-stable/sys/dev/sfxge/sfxge_rx.c
/freebsd-10-stable/sys/dev/sfxge/sfxge_tx.c
/freebsd-10-stable/sys/dev/virtio/network/if_vtnet.c
/freebsd-10-stable/sys/dev/vmware/vmxnet3/if_vmx.c
/freebsd-10-stable/sys/dev/vxge/vxge.c
/freebsd-10-stable/sys/net/flowtable.c
/freebsd-10-stable/sys/net/ieee8023ad_lacp.c
/freebsd-10-stable/sys/net/if_lagg.c
/freebsd-10-stable/sys/net/if_lagg.h
/freebsd-10-stable/sys/net/netisr.c
/freebsd-10-stable/sys/netinet/in_pcb.h
/freebsd-10-stable/sys/netinet/ip_output.c
/freebsd-10-stable/sys/netinet/sctp_indata.c
/freebsd-10-stable/sys/netinet/sctp_input.c
/freebsd-10-stable/sys/netinet/sctp_output.c
/freebsd-10-stable/sys/netinet/sctp_pcb.c
/freebsd-10-stable/sys/netinet/sctp_structs.h
/freebsd-10-stable/sys/netinet/sctputil.c
/freebsd-10-stable/sys/netinet/tcp_input.c
/freebsd-10-stable/sys/netinet/tcp_syncache.c
sctp6_usrreq.c
/freebsd-10-stable/sys/ofed/drivers/net/mlx4/en_rx.c
/freebsd-10-stable/sys/ofed/drivers/net/mlx4/en_tx.c
/freebsd-10-stable/sys/sys/mbuf.h
/freebsd-10-stable/sys/sys/param.h
281911 24-Apr-2015 ae

MFC r281380:
Fix the IPV6_MULTICAST_IF sockopt handling. RFC 3493 says when the
interface index is specified as zero, the system should select the
interface to use for outgoing multicast packets. Even the comment
for the in6p_set_multicast_if() function says about index of zero.
But in fact for zero index the function just returns EADDRNOTAVAIL.

I.e. if you first set some interface and then will try reset it
with zero ifindex, you will get EADDRNOTAVAIL.

Reset im6o_multicast_ifp to NULL when interface index specified as
zero. Also return EINVAL in case when ifnet_byindex() returns NULL.
This will be the same behaviour as when ifindex is bigger than
V_if_index. And return EADDRNOTAVAIL only when interface is not
multicast capable.

281868 22-Apr-2015 ae

MFC r274988 (with modification):
Skip L2 addresses lookups for tunneling interfaces.

PR: 197286

281866 22-Apr-2015 ae

MFC r281309:
Fix the check for maximum mbuf's size needed to send ND6 NA and NS.
It is acceptable that the size can be equal to MCLBYTES. In the later
KAME's code this check has been moved under DIAGNOSTIC ifdef, because
the size of NA and NS is much smaller than MCLBYTES. So, it is safe to
replace the check with KASSERT.

PR: 199304

281230 07-Apr-2015 delphij

Improve patch for SA-15:04.igmp to solve a potential buffer overflow.

Fix multiple vulnerabilities of ntp. [SA-15:07]

Fix bsdinstall(8) insecure default GELI keyfile permissions. [SA-15:08]

Fix Denial of Service with IPv6 Router Advertisements. [SA-15:09]

280705 26-Mar-2015 ae

MFC r280236:
To avoid a possible race, release the reference to ifa after return
from nd6_dad_na_input().

279911 12-Mar-2015 ae

MFC r279588:
Fix deadlock in IPv6 PCB code.

When several threads are trying to send datagram to the same destination,
but fragmentation is disabled and datagram size exceeds link MTU,
ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all
sockets wanted to know MTU to this destination. And since all threads
hold PCB lock while sending, taking the lock for each PCB in the
in6_pcbnotify() leads to deadlock.

RFC 3542 p.11.3 suggests notify all application wanted to receive
IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message.
But it doesn't require this, when we don't receive ICMPv6 message.

Change ip6_notify_pmtu() function to be able use it directly from
ip6_output() to notify only one socket, and to notify all sockets
when ICMPv6 packet too big message received.

MFC r279684:
tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify().
Check cmdarg isn't NULL before dereference, this check was in the
ip6_notify_pmtu() before r279588.

PR: 197059
Sponsored by: Yandex LLC

278801 15-Feb-2015 rrs

MFC of r278472
This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Sponsored by: Netflix Inc.

278620 12-Feb-2015 ae

MFC r278268:
Print IPv6 address in log message instead of address of pointer.

277789 27-Jan-2015 bryanv

MFC r272886:

Add context pointer and source address to the UDP tunnel callback

These are needed for the forthcoming vxlan implementation. The context
pointer means we do not have to use a spare pointer field in the inpcb,
and the source address is required to populate vxlan's forwarding table.

276149 23-Dec-2014 ae

MFC r273087 (with modifications):
Overhaul if_gif(4):
o convert to if_transmit;
o use rmlock to protect access to gif_softc;
o use sx lock to protect from concurrent ioctls;
o remove a lot of unneeded and duplicated code;
o remove cached route support (it won't work with concurrent io);
o style fixes.

MFC r273090:
Move memset under ifdef INET6.

MFC r273091:
Add more ifdefs. SIOC*_IN6 are defined only with INET6.

MFC r273121:
Add inet/inet6 to the dependency list. Without them if_gif is useless.

MFC r273209 by bz:
After r273087,r273090,r273091,r273121 changes to gif(4) try to fix
NOIP builds for real.

MFC r273587:
Remove redundant check and m_pullup() call.

275828 16-Dec-2014 ae

MFC r275394:
Remove unneded check. No need to do m_pullup to the size that we prepended.

Sponsored by: Yandex LLC

274755 20-Nov-2014 ae

MFC r274434:
Fix ips_out_nosa errors accounting.

MFC r274454:
ipsec6_process_packet is called before ip6_output fixes ip6_plen.
Update ip6_plen before bpf processing to be able see correct value.

MFC r274455:
We don't return sp pointer, thus NULL assignment isn't needed.
And reference to sp will be freed at the end.

MFC r274465:
Remove redundant ip6_plen initialization.

MFC r274466:
Strip IP header only when we act in tunnel mode.

MFC r274467:
Count statistics for the specific address family.

Sponsored by: Yandex LLC

274266 08-Nov-2014 bryanv

MFC r272844:

Add missing UDP multicast receive dtrace probes

274265 08-Nov-2014 bryanv

MFC r272801:

Move the calls to u_tun_func() into udp6_append()

A similar cleanup for UDPv4 was performed in r220620.

274168 06-Nov-2014 ae

MFC r273855:
Fix mbuf leak in IPv6 multicast code.
When multicast capable interface goes away, it leaves multicast groups,
this leads to generate MLD reports, but MLD code does deffered send and
MLD reports are queued in the in6_multi's in6m_scq ifq. The problem is
that in6_multi structures are freed when interface leaves multicast groups
and thread that does deffered send will not take these queued packets.

PR: 194577

MFC r273857:
Move ifq drain into in6m_purge().

Suggested by: bms

Sponsored by: Yandex LLC

274132 05-Nov-2014 ae

MFC r266800 by vanhu:
IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels.
For IPv6-in-IPv4, you may need to do the following command
on the tunnel interface if it is configured as IPv4 only:
ifconfig <interface> inet6 -ifdisabled

Code logic inspired from NetBSD.
PR: kern/169438

MC r266822 by bz:
Use IPv4 statistics in ipsec4_process_packet() rather than the IPv6
version. This also unbreaks the NOINET6 builds after r266800.

MFC r268083 by zec:
The assumption in ipsec4_process_packet() that the payload may be
only IPv4 is wrong, so check the IP version before mangling the
payload header.

MFC r272394:
Do not strip outer header when operating in transport mode.
Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP
encapsulation, it will be handled with tunneling interface. And thus proper
interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling
interface will see packets in both directions.

PR: 194761

272859 09-Oct-2014 hrs

MFC r269054:

Fix EtherIP. TOS field must be initialized when the inner protocol is
PF_LINK, and multicast/broadcast flag should always be dropped because
the outer protocol uses unicast even when the inner address is not for
unicast. It had been broken since r236951 when gif_output() started to
use IFQ_HANDOFF().

272857 09-Oct-2014 hrs

MFC r266248:

Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as
possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally
set after a DAD started, TENTATIVE could be cleared because no NA was
received due to IFDISABLED, and as a result it could prevent DAD when
manually clearing IFDISABLED after that.

272847 09-Oct-2014 hrs

MFC r266857:

- Add rwlock to struct dadq. A panic could occur when a large number of
addresses performed DAD at the same time.

272790 09-Oct-2014 ae

MFC r271307:
Add the ability to set `prefer_source' flag to an IPv6 address.
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows override the last rule ("longest matching prefix") for
choosing among equivalent addresses. The address with `prefer_source'
will be preferred source address.

272754 08-Oct-2014 tuexen

MFC r272706:
Fix a bug introduced in
https://svnweb.freebsd.org/base?view=revision&revision=272347

272664 06-Oct-2014 tuexen

MFC r272469:
UDP/IPv6 and UDPLite/IPv6 require a checksum. So check for it.

272663 06-Oct-2014 tuexen

MFC r272408:
Check for UDP/IPv6 packets that the length in the UDP header is at least
the minimum. Make the check similar to the one for UDPLite/IPv6.

272662 06-Oct-2014 tuexen

MFC r272404:
Fix the checksum computation for UDPLite/IPv6. This requires the
usage of a function computing the checksum only over a part of the function.
Therefore introduce in6_cksum_partial() and implement in6_cksum() based
on that.
While there, ensure that the UDPLite packet contains at least enough bytes
to contain the header.

272661 06-Oct-2014 tuexen

MFC r272347:
The default for UDPLITE_RECV_CSCOV is zero. RFC 3828 recommend
that this means full checksum coverage for received packets.
If an application is willing to accept packets with partial
coverage, it is expected to use the socket option and provide
the minimum coverage it accepts.

272645 06-Oct-2014 tuexen

MFC r272323:
If the checksum coverage field in the UDPLITE header is the length
of the complete UDPLITE packet, the packet has full checksum coverage.
So fix the condition.

272628 06-Oct-2014 tuexen

MFC r272296:
When plen != ulen, it should only be checked when this is UDP.

The commit is from kevlo and he agreed that I MFC it as part of the
UDPLite fixes.

271746 18-Sep-2014 tuexen

MFC r270673:
Announce SCTP support in the kern.features sysctl variables.

MFC r270859:
Enable SCTP support. It runs perfectly fine on a Wandboard quad.

MFC r271204 with manual intervention:
Fix the handling of sysctl variables when used with VIMAGE.
While there do some cleanup of the code.

MFC r271209:
Fix a leak of an address, if the address is scheduled for removal
and the stack is torn down.
Thanks to Peter Bostroem and Jiayang Liu from Google for reporting the
issue.

MFC r271219:
Use SYSCTL_PROC instead of SYSCTL_VNET_PROC.
Suggested by: glebius@

MFC r271221:
Use union sctp_sockstore instead of struct sockaddr_storage. This
eliminates some warnings when building in userland.
Thanks to Patrick Laimbock for reporting this issue.
Remove also some unnecessary casts.
There should be no functional change.

MFC r271228:
Address another warnings reported by Patrick Laimbock when compiling
in userspace. While there, improve consistency.

MFC r271230:
Address warnings generated by the clang analyzer.

Approved by: re (kib)

271288 08-Sep-2014 ae

MFC r270927:
Add the reverse part to rule #9. Also change its description in the
netstat(8) output.

Approved by: re (gjb)

271236 07-Sep-2014 rodrigc

MFC r262351:

Remove KASSERT from in6p_lookup_mcast_ifp().

When the devel/jenkins port, version 1.551 was started,
the kernel would panic if INVARIANTS was enabled in the kernel config.

Suggested by: bms

Approved by: re (gjb)

271185 06-Sep-2014 markj

MFC r270348:
Add some missing checks for unsupported interfaces (e.g. pflog(4)) when
handling ioctls. While here, remove duplicated checks for a NULL ifp in
in6_control(): this check is already done near the beginning of the
function.

MFC r270349:
Suppress warnings when retrieving protocol stats from interfaces that
don't support IPv6 (e.g. pflog(4)).

PR: 189117
Approved by: re (gjb)

270923 01-Sep-2014 ae

MFC r257985:
Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already
unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps
simplify the code and remove several now unused variables.

PR: 156283

270044 16-Aug-2014 bz

MFC r259884:

Correct warnings comparing unsigned variables < 0 constantly reported
while building kernels. All instances removed are indeed unsigned so
the expressions could not be true.

269944 13-Aug-2014 ae

MFC r269306:
Add new rule to source address selection algorithm. It prefers address
with better virtual status. Use ifa_preferred() to choose better address.

PR: 187341

268052 30-Jun-2014 ume

MFC r267801: Make nd6_gctimer tunable.

265946 13-May-2014 kevlo

MFC r264212,r264213,r264248,r265776,r265811,r265909:

- Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].
[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by: jhb, glebius, adrian

- Fix a logic bug which prevented the sending of UDP packet with 0 checksum.

- Disable TX checksum offload for UDP-Lite completely. It wasn't used for
partial checksum coverage, but even for full checksum coverage it doesn't
work.

264873 24-Apr-2014 ae

MFC r264582:
Remove unused variable.

PR: 173521

264722 21-Apr-2014 ae

MFC r264364:
Properly release the in6_multi lock.

Sponsored by: Yandex LLC

264224 07-Apr-2014 ae

MFC r263969,263971:
Don't generate an ICMPv6 error message if packet was consumed by filter.
Remove unused label.

Sponsored by: Yandex LLC

263478 21-Mar-2014 glebius

Merge r262763, r262767, r262771, r262806 from head:
- Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

263307 18-Mar-2014 glebius

Merge r263091: fix mbuf flags clash that lead to failure of operation
of IPSEC and packet filters.

PR: kern/185876
PR: kern/186755

263069 12-Mar-2014 brueffer

MFC: r261710

Only count table lookups when we're actually processing packets.

PR: 183462
Submitted by: Sven-Thorsten Dietrich <thebigcorporation at gmail.com>
Reviewed by: bms

262743 04-Mar-2014 glebius

Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,
r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029,
r262030, r262162 from head.

Large flowtable revamp. See commit messages for merged revisions for
details.

Sponsored by: Netflix

262256 20-Feb-2014 ae

MFC r261835:
Drop packets to multicast address whose scop field contains the
reserved value 0.

Sponsored by: Yandex LLC

261716 10-Feb-2014 ae

MFC r261400:
Take exclusive lock only when lle isn't NULL. We don't need write access
to lle in most cases.

MFC r261583:
Unlock entry before retry.

Sponsored by: Yandex LLC

261218 28-Jan-2014 ae

MFC r260485,260496:
Remove extra nesting from X_ip6_mforward() function.
Also remove disabled definitions from ip6_mroute.h.

PR: 185148

260712 16-Jan-2014 ae

MFC r260481:
Add MRT6_DLOG() macro for debugging.
Reduce number of MRT6DEBUG ifdefs and fix some broken format strings.

260504 10-Jan-2014 ae

MFC r260151 (by adrian):
Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().

This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.

MFC r260187:
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

MFC r260217:
Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

259983 28-Dec-2013 dim

MFC r259840:

In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used
whenever KTR is defined, so put it between #ifdef KTR guards. This
avoids a warning about a unused function if KTR is not enabled.

258454 21-Nov-2013 tuexen

MFC r256556:
Remove a buggy comparision when setting manually the path MTU.
After fixing, the comparision would have become redundant.
Thanks to Andrew Galante for reporting the issue.

MFC r257272:
Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined.
The issue was reported by Andrew Galante.

MFC r257274:
Fix the value of *optlen when calling getsockopt() for
SCTP_REMOTE_UDP_ENCAPS_PORT.
This issue was reported by Andrew Galante.

MFC r257359:
Terminate a debug output with a \n.

MFC r257555:
Changes from upstream to improve compilation when INET or INET6
or none of them is defined.

MFC r257574:
Unlock the lock before destroying it.
This issue was reported by Andrew Galante.

MFC r257800:
Use htons()/ntohs() appropriately.
These issues were reported by Andrew Galante.

MFC r257803:
Make sure that we don't try to build an ASCONF-ACK chunk
larger than what fits in the the mbuf cluster.
This issue was reported by Andrew Galante.

MFC r257804:
Get rid of the artification limitation enforced by
SCTP_AUTH_RANDOM_SIZE_MAX.
This was suggested by Andrew Galante.

MFC r258221:
Cleanups which result in fixes which have been made upstream
and where partially suggested by Andrew Galante.
There is no functional change in FreeBSD.

MFC r258224:
When determining if an address belongs to an stcb, take the address family
into account for wildcard bound endpoints.

MFC r258228:
Remove a stray write operation.

MFC r258235:
Use SCTP_PR_SCTP_TTL when the user provides a positive
timetolive in sctp_sendmsg().

Approved by: re@

257961 11-Nov-2013 ae

MFC r257084:
Initialize inc_fibnum for properly handling ICMP6_PACKET_TOO_BIG
errors in multifib environment.

PR: 183265
Approved by: re (hrs)

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256258 10-Oct-2013 hrs

Do not try to detach if the interface does not support IPv6.

Tested by: hselasky
PR: usb/182820
Approved by: re (glebius)


256108 07-Oct-2013 glebius

Fix mbuf leak.

Submitted by: Loganaden Velvindron <logan elandsys.com>
Obtained from: NetBSD
Approved by: re (kib)


255792 22-Sep-2013 bz

Update comment from draft to RFC number.

Submitted by: Loganaden Velvindron (logan elandsys.com)
Approved by: re (gjb)
MFC after: 6 days


255523 13-Sep-2013 trociny

Unregister inet/inet6 pfil hooks on vnet destroy.

Discussed with: andre
Approved by: re (rodrigc)


255442 10-Sep-2013 des

Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: re


255248 05-Sep-2013 jhb

Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This
matches the types used when computing hash indices and the type of the
maximum size of mfchashtbl[].

PR: kern/181821
Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4)
MFC after: 1 week


254925 26-Aug-2013 jhb

Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by: bde
Tested by: exp-run by bdrewery


254889 25-Aug-2013 markj

Implement the ip, tcp, and udp DTrace providers. The probe definitions use
dynamic translation so that their arguments match the definitions for
these providers in Solaris and illumos. Thus, existing scripts for these
providers should work unmodified on FreeBSD.

Tested by: gnn, hiren
MFC after: 1 month


254854 25-Aug-2013 tuexen

Provide human readable debug output.


254834 25-Aug-2013 andre

For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits.
The upper 32bits are not occupied for now.

Sponsored by: The FreeBSD Foundation


254804 24-Aug-2013 andre

Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features. The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
layer specific union PH_loc for local use. Protocols can flexibly overlay
their own 8 to 64 bit fields to store information while the packet is
worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
rsstype field. Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
with the packet. It is not yet supported in any drivers but allows us to
get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
from the start of the packet. This is important for various offload
capabilities and to relieve the drivers from having to parse the packet
and protocol headers to find out location of checksums and other
information. Header parsing in drivers is a lot of copy-paste and
unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
packet information, like ether_vtag, tso_segsz and csum fields.
Depending on the csum_flags settings some fields may have different usage
making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
stack) and inbound (up the stack) use. The CSUM flags used to be a bit
chaotic and rather poorly documented leading to incorrect use in many
places. Bring clarity into their use through better naming.
Compatibility mappings are provided to preserve the API. The drivers
can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by: The FreeBSD Foundation


254629 22-Aug-2013 delphij

Fix an integer overflow in computing the size of a temporary buffer
can result in a buffer which is too small for the requested
operation.

Security: CVE-2013-3077
Security: FreeBSD-SA-13:09.ip_multicast


254523 19-Aug-2013 andre

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


254519 19-Aug-2013 andre

Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific
flag instead. The flag is only used within the IP and IPv6 layer 3
protocols.

Because some firewall packages treat IPv4 and IPv6 packets the same the
flag should have the same value for both.

Discussed with: trociny, glebius


254441 17-Aug-2013 hrs

Return 0 in nbi->expire when la_expire == 0. Conversion from time_uptime to
time_second should not be performed in this case.


253999 06-Aug-2013 hrs

Fix incompatibility in ICMPV6CTL_ND6_PRLIST sysctl, and SIOCGPRLST_IN6,
SIOCGDRLST_IN6, and SIOCGNBRINFO_IN6 ioctl. These userland interfaces
treat expiration times in time_second, not time_uptime.


253970 05-Aug-2013 hrs

- Use time_uptime instead of time_second in data structures for
PF_INET6 in kernel. This fixes various malfunction when the wall time
clock is changed. Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after: 1 month


253950 05-Aug-2013 hrs

Fix a panic in tmpaddrtimer.


253841 31-Jul-2013 hrs

Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable
interfaces. This eliminates unnecessary IPv6 processing for non-IPv6
interfaces.

MFC after: 3 days


253571 23-Jul-2013 ae

Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by: Taku YAMAMOTO, Maciej Milewski


253282 12-Jul-2013 trociny

A complete duplication of binding should be allowed if on both new and
duplicated sockets a multicast address is bound and either
SO_REUSEPORT or SO_REUSEADDR is set.

But actually it works for the following combinations:

* SO_REUSEPORT is set for the fist socket and SO_REUSEPORT for the new;
* SO_REUSEADDR is set for the fist socket and SO_REUSEADDR for the new;
* SO_REUSEPORT is set for the fist socket and SO_REUSEADDR for the new;

and fails for this:

* SO_REUSEADDR is set for the fist socket and SO_REUSEPORT for the new.

Fix the last case.

PR: 179901
MFC after: 1 month


253101 09-Jul-2013 ae

Correct the size of allocated memory to store array of counters.


253086 09-Jul-2013 ae

Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters.


253085 09-Jul-2013 ae

Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.


253081 09-Jul-2013 ae

Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with: arch@


252710 04-Jul-2013 trociny

In r227207, to fix the issue with possible NULL inp_socket pointer
dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR
for multicast), INP_REUSEPORT flag was introduced to cache the socket
option. It was decided then that one flag would be enough to cache
both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR
setsockopt(2), it was checked if it was called for a multicast address
and INP_REUSEPORT was set accordingly.

Unfortunately that approach does not work when setsockopt(2) is called
before binding to a multicast address: the multicast check fails and
INP_REUSEPORT is not set.

Fix this by adding INP_REUSEADDR flag to unconditionally cache
SO_REUSEADDR.

PR: 179901
Submitted by: Michael Gmelin freebsd grem.de (initial version)
Reviewed by: rwatson
MFC after: 1 week


252511 02-Jul-2013 hrs

- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week


252141 24-Jun-2013 qingli

Delete the nd6 entries associated with an off-link prefix
if the same prefix cannot be found on an alternative
interface.

Reviewed by: hrs
MFC after: 1 week


252026 20-Jun-2013 ae

Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after: 2 weeks


252009 19-Jun-2013 ae

Use PIM6STAT_INC() and MRT6STAT_INC() macros for IPv6 multicast
statistics accounting.

MFC after: 2 weeks


252007 19-Jun-2013 ae

Use RIP6STAT_INC() macro for raw ip6 statistics accounting.

MFC after: 2 weeks


251995 19-Jun-2013 ae

Use ICMP6STAT_INC() macro for ICMPv6 errors accounting.

MFC after: 2 weeks


250815 19-May-2013 melifaro

Really fix netmask address family this time.

MFC with: r250813


250813 19-May-2013 melifaro

Finish r85740 : Make IPv6 netmask has address family set.
This pleases routing daemons like bird.

MFC after: 2 weeks


250700 16-May-2013 julian

Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after: never


250466 10-May-2013 tuexen

Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY
socket option for SCTP sockets in the same way as for UDP or TCP
sockets.

MFC after: 2 weeks


250251 04-May-2013 hrs

Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address. Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally does
/96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one. When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by: ume
Based on work by: Thomas Scheffler
PR: conf/174957
MFC after: 2 weeks


249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


249837 24-Apr-2013 ae

Remove unused variable.

MFC after: 1 week


249742 21-Apr-2013 oleg

Plug static llentry leak (ipv4 & ipv6 were affected).

PR: kern/172985
MFC after: 1 month


249552 16-Apr-2013 tijl

Fix build after r249543.


249546 16-Apr-2013 ae

Fix accounting after the r249528, also add several another counters to
the statistics.


249544 16-Apr-2013 ae

Use IP6S_M2MMAX macro.


249543 16-Apr-2013 ae

Replace hardcoded numbers.


249528 15-Apr-2013 ae

The source address selection algorithm tries to apply several rules
for the set of IPv6 addresses. Now each attempt goes into IPv6 statistics,
even if given rule did not won. Change this and take into account only
those rules, that won. Also add accounting for cases, when algorithm
fails to select an address.


249398 12-Apr-2013 ae

Free memory after deleting an address policy entry.

MFC after: 1 week


249294 09-Apr-2013 ae

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


248608 22-Mar-2013 kevlo

Clean up some unused leftover code.

Pointed out by: ae


248607 22-Mar-2013 kevlo

Remove unused global variables.

Reviewed by: ae, glebius


248328 15-Mar-2013 glebius

- Use m_getcl() instead of hand allocating.
- Do not calculate constant length values at run time,
CTASSERT() their sanity.
- Remove superfluous cleaning of mbuf fields after allocation.
- Replace compat macros with function calls.

Sponsored by: Nginx, Inc.


248321 15-Mar-2013 glebius

- Use m_getcl() instead of hand allocating.
- Use m_get()/m_gethdr() instead of macros.
- Remove superfluous cleaning of mbuf fields after allocation.

Sponsored by: Nginx, Inc.


248320 15-Mar-2013 glebius

Use m_getcl() instead of hand made allocation.

Sponsored by: Nginx, Inc.


248180 12-Mar-2013 ae

Take the inpcb rlock before calculating checksum, it was accidentally
moved in r191672.

Obtained from: Yandex LLC
MFC after: 1 week


245925 26-Jan-2013 np

Generate lle_event in the IPv6 neighbor discovery code too.

Reviewed by: bz@


245922 25-Jan-2013 np

Avoid NULL dereference in nd6_storelladdr when no mbuf is provided. It
is called this way from a couple of places in the OFED code. (toecore
calls it too but that's going to change shortly).

Reviewed by: bz@


245244 10-Jan-2013 ae

Simplify in6_setscope() function to get better performance.
Currently we use interface indeces as zone IDs for link-local and
interface-local scopes, and since we don't have any tool to configure
zone IDs, there is no need to acquire the afdata lock several times per
packet only to read if_index value.
So, now in6_setscope reads zone IDs for interface-local, link-local and
global scopes without a lock.

Sponsored by: Yandex LLC
MFC after: 2 weeks


245233 09-Jan-2013 ae

Remove unneeded variable.

MFC after: 1 week


245230 09-Jan-2013 ume

Add no_prefer_iface option.
It stops treating the address on the interface as special by source
address selection rule even when the interface is outgoing interface.
This is desired in some situation.

Requested by: hrs
Reviewed by: IHANet folks including hrs
MFC after: 1 week


245199 09-Jan-2013 ae

The in6_setscope() function determines the scope zone id of an address
and embeds it into address. Inside the kernel we keep addresses with
embedded zone id only for two scopes: link-local and interface-local.

For other scopes this function is nop in most cases. To reduce an
overhead of locking, first check that address is capable for embedding.
Also, handle the loopback address before acquire the lock.

Sponsored by: Yandex LLC
MFC after: 1 week


244989 03-Jan-2013 peter

Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.


244678 25-Dec-2012 glebius

The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.


244441 19-Dec-2012 ae

When we have some address to forward (e.g. it was specified with ipfw fwd),
we should pass it as first argument into in6_selectroute_fib function to
initiate new route lookup.

MFC after: 1 week


244440 19-Dec-2012 ae

Make dst_sa initialization only when it is actually needed.

MFC after: 1 week


244439 19-Dec-2012 ae

The selectroute functions does own account of EHOSTUNREACH errors,
no need to do it twice.

MFC after: 1 week


244360 17-Dec-2012 ae

Use M_PROTO7 flag for M_IP6_NEXTHOP, because M_PROTO2 was used for
M_AUTHIPHDR.

Pointy hat to: ae
Reported by: Vadim Goncharov
MFC after: 3 days


244272 15-Dec-2012 ae

In additional to the tailq of IPv6 addresses add the hash table.
For now use 256 buckets and fnv_hash function. Use xor'ed 32-bit
s6_addr32 parts of in6_addr structure as a hash key. Update
in6_localip and in6_is_addr_deprecated to use hash table for fastest
lookup.

Sponsored by: Yandex LLC
Discussed with: dwmalone, glebius, bz


244183 13-Dec-2012 glebius

Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by: Ian FREISLICH <ianf clue.co.za>


243903 05-Dec-2012 hrs

- Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243336 20-Nov-2012 ae

Remove opt_inet.h, it isn't required here.

MFC after: 1 week


243235 18-Nov-2012 hrs

Check if an extracted zoneid is equal to the non-zero sin6_scope_id only when
it is link-local or MC interface-local.


243186 17-Nov-2012 tuexen

Add support for SCTP/UDP/IPV6.
This completes the support of
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps

MFC after: 1 week


243148 16-Nov-2012 ae

Reduce the overhead of locking, use IF_AFDATA_RLOCK() when we are doing
simple lookups.

Sponsored by: Yandex LLC
MFC after: 1 week


243031 14-Nov-2012 ae

if_afdata lock was converted from mutex to rwlock a long ago, so we can
replace IF_AFDATA_LOCK() macro depending to the access type.

Sponsored by: Yandex LLC
MFC after: 1 week


243029 14-Nov-2012 ae

SCOPE6_LOCK protects V_sid_default, no need to acquire it without
any access to V_sid_default.

Sponsored by: Yandex LLC
MFC after: 1 week


243028 14-Nov-2012 ae

zoneid has unsigned type.

MFC after: 1 week


242938 13-Nov-2012 obrien

Use consistent style.


242463 02-Nov-2012 ae

Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by: andre


242327 29-Oct-2012 tuexen

Whitespace changes due to upstream integration of SCTP changes in the
FreeBSD code base.


242079 25-Oct-2012 ae

Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks


241916 22-Oct-2012 delphij

Remove __P.

Submitted by: kevlo
Reviewed by: md5(1)
MFC after: 2 months


241913 22-Oct-2012 glebius

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


241884 22-Oct-2012 melifaro

Eliminate code checking if found IPv6 rte is dynamic. IPv6 redirects
are using (different) ND-based approach described in RFC 4861. This change
is similar to r241406 which conditionally skips the same check in IPv4.

This change is part of bigger patch eliminating rte locking.

Sponsored by: Yandex LLC.
OK'd by: hrs
MFC after: 2 weeks


241686 18-Oct-2012 andre

Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.


241502 13-Oct-2012 melifaro

Cleanup documentation: cloning route support has been removed in r186119.

MFC after: 2 weeks


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241349 08-Oct-2012 avg

ip6_ipsec_output: fix a typo in r241344

Acting as a remote drone of glebius.


241344 08-Oct-2012 glebius

After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
- ip_output(), ipfilter(4) not changed, since already call
in_delayed_cksum() with header in net byte order.
- divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
there and back.
- mrouting code and IPv6 ipsec now need to switch byte order there and
back, but I hope, this is temporary solution.
- In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().


240233 08-Sep-2012 glebius

Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by: Florian Smeets <flo freebsd.org>
Tested by: Chekaluk Vitaly <artemrts ukr.net>
Tested by: Ben Wilber <ben desync.com>
Tested by: Ian FREISLICH <ianf cloudseed.co.za>


239383 19-Aug-2012 trociny

In ip6_ctloutput() guard inp_flags modifications with INP_WLOCK.

MFC after: 2 weeks


238990 02-Aug-2012 glebius

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


238967 01-Aug-2012 glebius

Some more whitespace cleanup.


238960 31-Jul-2012 bz

In case of IPsec he have to do delayed checksum calculations before
adding any extension header, or rather before calling into IPsec
processing as we may send the packet and not return to IPv6 output
processing here.

PR: kern/170116
MFC After: 3 days


238945 31-Jul-2012 glebius

Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>


238935 31-Jul-2012 bz

Properly apply #ifdef INET and leave a comment that we are (will) apply
delayed IPv6 checksum processing in ip6_output.c when doing IPsec.

PR: kern/170116
MFC after: 3 days


238934 31-Jul-2012 bz

Improve the should-never-hit printf to ease debugging in case we'd ever hit
it again when doing the delayed IPv6 checksum calculations.

MFC after: 3 days


238878 29-Jul-2012 bz

For consistency put the IPsec comment iside the #fidef section.

MFC after: 3 days


238877 29-Jul-2012 bz

Fix a comment that we do not have an SA yet but need to acquire one.

MFC after: 3 days


238501 15-Jul-2012 tuexen

Changes which improve compilation if neither INET nor INET6 is defined.

MFC after: 3 days


238475 15-Jul-2012 tuexen

#ifdef INET and INET6 consistently. This also fixes a bug, where
it was done wrong.

MFC after: 3 days


238273 09-Jul-2012 hrs

Remove "prefer_source" address selection option. FreeBSD has had an
implementation of RFC 3484 for this purpose for a long time and "prefer_source"
was never implemented actually. ND6_IFF_PREFER_SOURCE macro is left intact.


238248 08-Jul-2012 bz

Implement handling of "atomic fragements" as outlined in
draft-gont-6man-ipv6-atomic-fragments to mitigate one class of
possible fragmentation-based attacks.

MFC after: 5 days


238222 08-Jul-2012 bz

As mentioned in the commit message of r237571 (copied from a prototype
patch of mine) also check if the 2nd in6_setscope() failed and return
the error in that case.

MFC after: 5 days


238092 04-Jul-2012 glebius

When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
but empty route should use RO_RTFREE() to free results of
lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
ro->ro_rt == NULL.

Tested by: tuexen (SCTP part)


238016 02-Jul-2012 glebius

Remove route caching from IP multicast routing code. There is no
reason to do that, and also, cached route never got unreferenced,
which meant a reference leak.

Reviewed by: bms


238003 02-Jul-2012 tuexen

Move common code parts to sctp_common_input_processing().

MFC after: 3 days


237736 28-Jun-2012 bms

Kick the current-state report timer when a V1 group report would
be triggered.

Submitted by: rpaulo@
MFC after: 3 days


237735 28-Jun-2012 bms

Fix a typo in MLD query exponent processing.

Submitted by: rpaulo@
MFC after: 3 days


237734 28-Jun-2012 bms

In MLDv2 general query processing, do not enforce the strict check
on query origins.

Submitted by: Gu Yong
MFC after: 3 days


237715 28-Jun-2012 tuexen

Pass the src and dst address of a received packet explicitly around.

MFC after: 3 days


237571 25-Jun-2012 delphij

Fix a LOR acquiring the if_afdata lock while holding an rtentry lock.
Possibly do some entra work in case we would not get into the
ifa0 != NULL paths later as we already do for the mltaddr before.

XXX We should possibly error in case in6_setscope fails.

Reference: http://lists.freebsd.org/pipermail/freebsd-net/2011-September/029829.html

Submitted by: bz
MFC after: 1 week


237569 25-Jun-2012 tuexen

Unify sctp_input() and sctp6_input().

MFC after: 3 days


237565 25-Jun-2012 tuexen

Whitespace cleanup.

MFC after: 3 days


237542 24-Jun-2012 tuexen

Pass the packet length explicitly around.

MFC after: 3 days


237540 24-Jun-2012 tuexen

Do packet logging in a consistent way.

MFC after: 3 days


237459 22-Jun-2012 bz

Just add a comment to further investigate when being closer to that code
again next time. The condition of the 2nd if() is very unlikely ever met.


237049 14-Jun-2012 tuexen

Pass flowid explicitly through the stack instead of taking it from
the mbuf chain at different places.
While there: Fix several bugs related to VRFs.

MFC after: 3 days


236958 12-Jun-2012 tuexen

Deliver IPV6_TCLASS, IPV6_HOPLIMIT and IPV6_PKTINFO cmsgs (if
requested) on IPV6 sockets, which have been marked to be not IPV6_V6ONLY,
for each received IPV4 packet.

MFC after: 3 days


236615 05-Jun-2012 bz

Plug two interface address refcount leaks in early error return cases
in the ioctl path.

Reported by: rpaulo
Reviewed by: emax
MFC after: 3 days


236501 03-Jun-2012 emax

Plug reference leak.

Interface routes are refcounted as packets move through the stack,
and there's garbage collection tied to it so that route changes can
safely propagate while traffic is flowing. In our setup, we weren't
changing or deleting any routes, but the refcounting logic in
ip6_input() was wrong and caused a reference leak on every inbound
V6 packet. This eventually caused a 32bit overflow, and the resulting
0 value caused the garbage collection to run on the active route.
That then snowballed into the panic.

Reviewed by: scottl
MFC after: 3 days


236332 30-May-2012 tuexen

Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170


236327 30-May-2012 emax

When we return deprecated addresses, we need to reference them.

Reviewed by: bz, scottl
MFC after: 3 days


236170 28-May-2012 bz

It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading. Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by: gallatin, dim, ..
Reviewed by: gallatin (glanced at?)
MFC after: 3 days
X-MFC with: r235961,235959,235958


236130 27-May-2012 bz

Correctly get the payload length in host byte order. While we
already plan to support >64k payload here, the IPv6 header payload
length obviously is only 16 bit and the calculations need to be right.

Reported by: dim
Tested by: dim
MFC after: 1 day
X-MFC: with r235958


236087 26-May-2012 tuexen

Get rid of SCTP specific code to avoid CRC32C computations on loopback.
Just just offloading.
MFC after: 3 days


235986 25-May-2012 bz

MFp4 bz_ipv6_fast:

Use M_ZERO with malloc rather than calling bzero() ourselves.

Change if () panic() checks to KASSERT()s as they are only
catching invariants in code flow but not dependent on network
input/output.

Move initial assigments indirecting pointers after the lock
has been aquired.

Passing layer boundries, reset M_PROTOFLAGS.

Remove a NULL assignment before free.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235962 25-May-2012 bz

MFp4 bz_ipv6_fast:

Factor out Hop-By-Hop option processing. It's still not heavily used,
it reduces the footprint of ip6_input() and makes ip6_input() more
readable.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235959 25-May-2012 bz

MFp4 bz_ipv6_fast:

Defer checksum calulations on UDP6 output and respect the mbuf
flags set by NICs having done checksum validation for us already,
thus saving the computing time in the input path as well.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235958 25-May-2012 bz

MFp4 bz_ipv6_fast:

Add support for delayed checksum calculations in the IPv6
output path. We currently cannot offload to the card if we
add extension headers (which incl. fragmentation).

Fix two SCTP offload support copy&paste bugs: calculate
checksums if fragmenting and no need to flag IPv4 header
checksums in the IPv6 forwarding path.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235956 25-May-2012 bz

MFp4 bz_ipv6_fast:

Hide the ip6aux functions. The only one referenced outside ip6_input.c
is not compiled in yet (__notyet__) in route6.c (r235954). We do have
accessor functions that should be used.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days
X-MFC: KPI?


235955 25-May-2012 bz

MFp4 bz_ipv6_fast:

Simplify the code removing a return from an earlier else case,
not differing from the default function return called now.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235954 25-May-2012 bz

MFp4 bz_ipv6_fast:

We currently nowhere set IP6A_SWAP making the entire check useless
with the current code. Keep around but do not compile in.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235953 25-May-2012 bz

MFp4 bz_ipv6_fast:

No need to hold the (expensive) rt lock over (expensive) logging.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235924 24-May-2012 bz

MFp4 bz_ipv6_fast:

Introduce a (for now copied stripped down) in6_cksum_pseudo()
function. We should be able to use this from in6_cksum() but
we should also ponder possible MD specific improvements.
It takes an extra csum argument to allow for easy checks as
will be done by the upper layer protocol input paths.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235921 24-May-2012 bz

MFp4 bz_ipv6_fast:

Optimize in6_cksum(), re-ordering work and limiting variable
initialization, removing a bzero() for mostly re-initialized
struct values, making use of the newly introduced in6_getscope(),
as well as converting an if/panic to a KASSERT().

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235916 24-May-2012 bz

MFp4 bz_ipv6_fast:

Introduce in6_getscope() to allow more effective checksum
computations without the need to copy the address to clear the
scope.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235828 23-May-2012 tuexen

Use consistent text at the begining of the files.

MFC after: 3 days


235681 20-May-2012 marius

Rewrite nd6_sysctl_{d,p}rlist() to avoid misaligned accesses to char arrays
casted to structs by getting rid of these buffers entirely. In r169832, it
was tried to paper over this issue by 32-bit aligning the buffers. Depending
on compiler optimizations that still was insufficient for 64-bit architectures
with strong alignment requirements though.
While at it, add comments regarding the total lack of locking in this area.

Tested by: bz
Reviewed by: bz (slightly earlier version), yongari (earlier version)
MFC after: 1 week


235415 13-May-2012 tuexen

Missed to commit this in r235414.

MFC after: 3 days


235403 13-May-2012 tuexen

Use ECONNABORTED in cases where the ABORT was sent to the peer.

MFC after: 3 days


235360 12-May-2012 tuexen

Provide in the association change notification the received ABORT chunk
if case of SCTP_COMM_LOST or SCTP_CANT_STR_ASSOC as required by RFC 6458.

MFC after: 3 days


233272 21-Mar-2012 glebius

in6_pcblookup_local() still can return a pcb with NULL
inp_socket. To avoid panic, do not dereference inp_socket,
but obtain reuse port option from inp_flags2, like this
is done after next call to in_pcblookup_local() a few lines
down below.

Submitted by: rwatson


233005 15-Mar-2012 tuexen

Clean up, no functional change.

MFC after: 3 days.


232514 04-Mar-2012 bz

In nd6_options() ignore the RFC 6106 options completely rather than printing
them if nd6_debug is enabled as unknown. Leave a comment about the RFC4191
option as I am undecided so far.

Discussed with: hrs
MFC after: 3 days


232379 02-Mar-2012 hrs

Allow to configure net.inet6.ip6.{accept_rtadv,no_radr} by the loader tunables
as well because they have to be configured before interface initialization for
AF_INET6.


232378 02-Mar-2012 hrs

Remove a redundant check.


232127 24-Feb-2012 bz

In selectroute() add a missing fibnum argument to an in6_rtalloc()
call in an #if 0 section.

In in6_selecthlim() optimize a case where in6p cannot be NULL due to an
earlier check.

More consistently use u_int instead of int for fibnum function arguments.

Sponsored by: Cisco Systems, Inc.
MFC after: 3 days


232054 23-Feb-2012 kmacy

When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after: 2 weeks


231895 18-Feb-2012 tuexen

Remove two clang warnings.

MFC after: 1 month.


231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


230584 26-Jan-2012 glebius

Remove casts from inet6 address testing macros, thus preserving
qualifier from original argument.

Obtained from: NetBSD, r. 1.67
Submitted by: maxim


230531 25-Jan-2012 pluknet

Remove unused variable.

The actual ia6->ia6_lifetime access is hidden in
IFA6_IS_INVALID/IFA6_IS_DEPRECATED macros since a long time ago
(see netinet6/nd6.c, r1.104 of KAME for the reference).

MFC after: 3 days


230506 24-Jan-2012 bz

Plug a possible ifa_ref leak in case of premature return from in6_purgeaddr().

Reviewed by: rwatson
MFC after: 3 days


230496 24-Jan-2012 pluknet

Remove the stale XXX rt_newaddrmsg comment.
A routing socket message is generated since r192282.

Reviewed by: bz
MFC after: 3 days


230494 24-Jan-2012 bz

Remove unnecessary line break.

MFC after: 3 days


230442 22-Jan-2012 bz

Clean up some #endif comments removing from short sections. Add #endif
comments to longer, also refining strange ones.

Properly use #ifdef rather than #if defined() where possible. Four
#if defined(PCBGROUP) occurances (netinet and netinet6) were ignored to
avoid conflicts with eventually upcoming changes for RSS.

Reported by: bde (most)
Reviewed by: bde
MFC after: 3 days


230138 15-Jan-2012 tuexen

Small cleanup, no functional change.


229805 08-Jan-2012 tuexen

Add an SCTP sysctl "blackhole", similar to the one for TCP.
If set to 1, no ABORT is sent back in response to an incoming
INIT. If set to 2, no ABORT is sent back in response to
an out of the blue packet. If set to 0 (the default), ABORTs
are sent.
Discussed with rrs@.

MFC after: 1 month.


229621 05-Jan-2012 jhb

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


229547 05-Jan-2012 bz

Mark a couple of file local functions static and stop exporting them.

MFC after: 1 week


229546 05-Jan-2012 bz

Convert an #ifdef DIAGNOSTIC if/panic to a KASSERT.

MFC after: 1 week


229479 04-Jan-2012 jhb

Use the mli_relinmhead list normally used to defer calls to
in6m_release_locked() to defer calls to mld_v1_transmit_report() until
after the IF_ADDR_LOCK is dropped. This removes a race where the lock
is dropped and reacquired while attempting to walk an interface's
address list.

Reviewed by: bz
MFC after: 1 week


229465 04-Jan-2012 glebius

Use correct locking when traversing interface address list.

Reviewed by: bz


229420 03-Jan-2012 jhb

When cancelling multicast timers on an interface, don't release the
reference on a group in the leaving state while iterating over the loop.
Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach()
of placing the groups to free on pending release list and then releasing
the references after dropping the IF_ADDR_LOCK. This closes an ugly race
where the code was dropping the lock in the middle of iterating over the
list. It also fixes some additional potential use-after-free bugs since
the cancellation routine also applied other changes to the group after
dropping the reference. Now those changes are performed before the
reference is dropped and the group is potentially freed.

Prodded to fix by: glebius
Reviewed by: bz
MFC after: 1 week


229414 03-Jan-2012 jhb

Grab a reference on the matching interface address (ifa) in the handling
of the SIOC[DG]LIFADDR icotls before dropping the IF_ADDR_LOCK() and
release the reference after using it. This prevents the address from
being potentially freed out from under the ioctl handler.

Reviewed by: bz
MFC after: 1 week


229390 03-Jan-2012 jhb

Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that
do not modify the queues they iterate over.

Submitted by: glebius


229276 02-Jan-2012 bz

Remove an uneeded inpcb forward declaration and align the function
declaration following to match the style in the rest of the file.

MFC after: 3 days


229127 31-Dec-2011 bz

Remove a declaration to a non-existent function.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


228966 29-Dec-2011 jhb

Use queue(3) macros instead of home-rolled versions in several places in
the INET6 code. This includes retiring the 'ndpr_next' and 'pfr_next'
macros.

Submitted by: pluknet (earlier version)
Reviewed by: pluknet


228907 27-Dec-2011 tuexen

Address issues found by clang. While there, fix also some style
issues.

MFC after: 3 months.


228866 24-Dec-2011 jhb

Fix a bug where TAILQ_FIRST(&V_ifnet) was accessed without holding the
proper lock.

Reviewed by: bz
MFC after: 1 week


228768 21-Dec-2011 glebius

Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by: brooks


228700 19-Dec-2011 maxim

o Convert IPv6 read-only stats sysctls to the read-write ones.
o Teach netstat(1) -z to reset these stats sysctls.

PR: bin/153206
Reviewed by: glebuis
Sponsored by: NGINX, Inc.
MFC after: 1 month


228653 17-Dec-2011 tuexen

Fix unused parameter warnings.
While there, fix some whitespace issues.

MFC after: 3 months.


228571 16-Dec-2011 glebius

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


228321 07-Dec-2011 glebius

Fix double free.

PR: kern/163089
Submitted by: Herbie Robinson <Herbie.Robinson stratus.com>


227481 13-Nov-2011 bz

Return the correct value for the IPV6_MULTICAST_HOPS getsockopt() call.

Submitted by: rpaulo
MFC after: 3 days


227460 11-Nov-2011 qingli

A default route learned from the RAs could be deleted manually
after its installation. This removal may be accidental and can
prevent the default route from being installed in the future if
the associated default router has the best preference. The cause
is the lack of status update in the default router on the state
of its route installation in the kernel FIB. This patch fixes
the described problem.

Reviewed by: hrs, discussed with hrs
MFC after: 5 days


227449 11-Nov-2011 trociny

Fix false positive EADDRINUSE that could be returned by bind, due to
the typo made in r227207.

Reported by: kib
Tested by: kib


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227308 07-Nov-2011 glebius

In icmp6_redirect_input:

- Assert that we got a valid mbuf with rcvif pointer. [1]
- Use __func__ in logging.

Submitted by: prabhakar lakhera <prabhakar.lakhera gmail.com> [1]
Submitted by: Kristof Provost <kristof sigsegv.be> [1]


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227207 06-Nov-2011 trociny

Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid
inp_socket->so_options dereference when we may not acquire the lock on
the inpcb.

This fixes the crash due to NULL pointer dereference in
in_pcbbind_setup() when inp_socket->so_options in a pcb returned by
in_pcblookup_local() was checked.

Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com>
Suggested by: rwatson
Glanced by: rwatson
Tested by: dave jones <s.dave.jones@gmail.com>


227206 06-Nov-2011 trociny

Before dereferencing intotw() check for NULL, the same way as it is
done for in_pcb (see r157474).

MFC after: 1 week


227055 03-Nov-2011 pluknet

Remove a couple of write-only variables.


226453 16-Oct-2011 qingli

The code change made in r226040 was incomplete and resulted in
routes such as fe80::1%lo0 no being installed. This patch completes
the original intended fix.

Reviewed by: hrs, bz
MFC after: 3 days


226451 16-Oct-2011 qingli

The IPv6 code was influx at the time of r196865 due to the L2/L3
separation rewrite changes. r196865 was committed to fix a scope
violation problem in the following test scenario:

box-1# ifconfig em0 inet6 2001:db8:1:: prefixlen 64 anycast
box-1# ifconfig em1 inet6 2001:db8:2::1 prefixlen 64

box-2# ifconfig re0 inet6 2001:db8:1::6 prefixlen 64

em0 and re0 are on the same link.

box-2# ping6 2001:db8:1::
PING6(56=40+8+8 bytes) 2001:db8:1::6 --> 2001:db8:1::

the ICMPv6 response should have a source address of em1, which
is 2001:db8:2::1, not the link-local address of em0.

That code is no longer necessary and breaks the IPv6-Ready logo
testing, so revert it now.

Reviewed by: hrs
MFC after: 3 days


226446 16-Oct-2011 hrs

Fix a problem that an interface unexpectedly becomes IFF_UP by
just doing "ifconfing inet6 -ifdisabled" when the interface has
ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.


226340 13-Oct-2011 glebius

Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation.


226338 13-Oct-2011 glebius

Restore functions in6_ifaddloop() and in6_ifremloop() that were
inlined by Qing Li in his big new-ARP commit. I am going to utilize
them in my newcarp work, and also these functions left declared
in in6_var.h for all the time they were absent.

Reviewed by: bz


226040 05-Oct-2011 qingli

The IFA_RTSELF instead of the IFA_ROUTE flag should be checked to
determine if a loopback route should be installed for an interface
IPv6 address. Another condition is the address must not belong to a
looopback interface.

Reviewed by: hrs
MFC after: 3 days


225885 30-Sep-2011 bz

Fix an obvious bug from r186196 shadowing a variable, not correctly
appending the new mbuf to the chain reference but possibly causing an mbuf
nextpkt loop leading to a memory used after handoff (or having been freed)
and leaking an mbuf here.

Reviewed by: rwatson, brooks
MFC after: 3 days


225698 20-Sep-2011 kmacy

Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.

Reviewed by: rwatson, bz
Approved by: re (kib)


225682 20-Sep-2011 hrs

Copy ip6po_minmtu and ip6po_prefer_tempaddr in ip6_copypktopts(). This fixes
inconsistency when options are specified by both setsockopt() and ancillary
data types.

PR: kern/158307
Approved by: re (bz)


225521 13-Sep-2011 hrs

Add $ipv6_cpe_wanif to enable functionality required for IPv6 CPE
(r225485). When setting an interface name to it, the following
configurations will be enabled:

1. "no_radr" is set to all IPv6 interfaces automatically.

2. "-no_radr accept_rtadv" will be set only for $ipv6_cpe_wanif. This is
done just before evaluating $ifconfig_IF_ipv6 in the rc.d scripts (this
means you can manually supersede this configuration if necessary).

3. The node will add RA-sending routers to the default router list
even if net.inet6.ip6.forwarding=1.

This mode is added to conform to RFC 6204 (a router which connects
the end-user network to a service provider network). To enable
packet forwarding, you still need to set ipv6_gateway_enable=YES.

Note that accepting router entries into the default router list when
packet forwarding capability and a routing daemon are enabled can
result in messing up the routing table. To minimize such unexpected
behaviors, "no_radr" is set on all interfaces but $ipv6_cpe_wanif.

Approved by: re (bz)


225096 22-Aug-2011 pluknet

Fix if_addr_mtx recursion in mld6.

mld_set_version() is called only from mld_v1_input_query() and
mld_v2_input_query() both holding the if_addr_mtx lock, and then calling
into mld_v2_cancel_link_timers() acquires it the second time, which results
in mtx recursion. To avoid that, delay if_addr_mtx acquisition until after
mld_set_version() is called; while here, further reduce locking scope
to protect only the needed pieces: if_multiaddrs, in6m_lookup_locked().

PR: kern/158426
Reported by: Thomas <tps vr-web.de>,
Tom Vijlbrief <tom.vijlbrief xs4all.nl>
Tested by: Tom Vijlbrief
Reviewed by: bz
Approved by: re (kib)


225044 20-Aug-2011 bz

Add support for IPv6 to ipfw fwd:
Distinguish IPv4 and IPv6 addresses and optional port numbers in
user space to set the option for the correct protocol family.
Add support in the kernel for carrying the new IPv6 destination
address and port.
Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change
the address in the IP header.
Add support for IPv6 forwarding to a non-local destination.
Add a regession test uitilizing VIMAGE to check all 20 possible
combinations I could think of.

Obtained from: David Dolson at Sandvine Incorporated
(original version for ipfw fwd IPv6 support)
Sponsored by: Sandvine Incorporated
PR: bin/117214
MFC after: 4 weeks
Approved by: re (kib)


225043 20-Aug-2011 bz

Add an in6_localip() helper function as in6_localaddr() is not doing what
people think: returning true for an address in any connected subnet, not
necessarily on the local machine.

Sponsored by: Sandvine Incorporated
MFC after: 2 weeks
Approved by: re (kib)


224641 03-Aug-2011 tuexen

The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.

Approved by: re@
MFC after: 2 months.


223963 12-Jul-2011 tuexen

The socket API only specifies SCTP for SOCK_SEQPACKET and
SOCK_STREAM, but not SOCK_DGRAM. So don't register it for
SOCK_DGRAM.
While there, fix some indentation.


223862 08-Jul-2011 zec

Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after: 3 days


223637 28-Jun-2011 bz

Update packet filter (pf) code to OpenBSD 4.5.

You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by: mlaier
Submitted by: eri


222856 08-Jun-2011 bz

Add the missing call to ip6_ipsec_filtertunnel() to be able to control
whether decapsulated IPsec packets will be passed to pfil again depending
on the setting of the net.ip6.ipsec6.filtertunnel sysctl.

PR: kern/157670
Submitted by: Manuel Kasper (mk neon1.net)
MFC after: 2 weeks


222845 08-Jun-2011 bz

Correct comments and debug logging in ipsec to better match reality.

MFC after: 3 days


222748 06-Jun-2011 rwatson

Implement a CPU-affine TCP and UDP connection lookup data structure,
struct inpcbgroup. pcbgroups, or "connection groups", supplement the
existing inpcbinfo connection hash table, which when pcbgroups are
enabled, might now be thought of more usefully as a per-protocol
4-tuple reservation table.

Connections are assigned to connection groups base on a hash of their
4-tuple; wildcard sockets require special handling, and are members
of all connection groups. During a connection lookup, a
per-connection group lock is employed rather than the global pcbinfo
lock. By aligning connection groups with input path processing,
connection groups take on an effective CPU affinity, especially when
aligned with RSS work placement (see a forthcoming commit for
details). This eliminates cache line migration associated with
global, protocol-layer data structures in steady state TCP and UDP
processing (with the exception of protocol-layer statistics; further
commit to follow).

Elements of this approach were inspired by Willman, Rixner, and Cox's
2006 USENIX paper, "An Evaluation of Network Stack Parallelization
Strategies in Modern Operating Systems". However, there are also
significant differences: we maintain the inpcb lock, rather than using
the connection group lock for per-connection state.

Likewise, the focus of this implementation is alignment with NIC
packet distribution strategies such as RSS, rather than pure software
strategies. Despite that focus, software distribution is supported
through the parallel netisr implementation, and works well in
configurations where the number of hardware threads is greater than
the number of NIC input queues, such as in the RMI XLR threaded MIPS
architecture.

Another important difference is the continued maintenance of existing
hash tables as "reservation tables" -- these are useful both to
distinguish the resource allocation aspect of protocol name management
and the more common-case lookup aspect. In configurations where
connection tables are aligned with hardware hashes, it is desirable to
use the traditional lookup tables for loopback or encapsulated traffic
rather than take the expense of hardware hashes that are hard to
implement efficiently in software (such as RSS Toeplitz).

Connection group support is enabled by compiling "options PCBGROUP"
into your kernel configuration; for the time being, this is an
experimental feature, and hence is not enabled by default.

Subject to the limited MFCability of change dependencies in inpcb,
and its change to the inpcbinfo init function signature, this change
in principle could be merged to FreeBSD 8.x.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222734 06-Jun-2011 hrs

Do not activate automatic LL addr configuration when 0/1->1 transition of
ND6_IFF_IFDISABLED flag.


222730 06-Jun-2011 hrs

- Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens. The changes are:

1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
an attempt to clear the ND6_IFF_IFDISABLED flag.

2. When an AF_INET6 address is added successfully to an interface and
it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
ND6_IFF_IFDISABLED happens.

This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
in most cases manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
a link-local address.

This change ensures link-local address configuration when "ifconfig IF inet6"
command is invoked. For example, "ifconfig IF inet6 auto_linklocal" now
always try to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
set to 1 (i.e. down/up cycle is no longer needed).

Reviewed by: bz


222728 06-Jun-2011 hrs

- Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.

- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
This controls if accepting a route in an RA message as the default route.
The default value for each interface can be set by net.inet6.ip6.no_radr.
The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif. This controls if setting R-bit in
NA on RA accepting interfaces. The default is 0 (R-bit is set based on
net.inet6.ip6.forwarding).

Background:

IPv6 host/router model suggests a router sends an RA and a host accepts it for
router discovery. Because of that, KAME implementation does not allow
accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can
make the routing table confused since it can change the default router
unintentionally.

However, in practice there are cases where we cannot distinguish a host from
a router clearly. For example, a customer edge router often works as a host
against the ISP, and as a router against the LAN at the same time. Another
example is a complex network configurations like an L2TP tunnel for IPv6
connection to Internet over an Ethernet link with another native IPv6 subnet.
In this case, the physical interface for the native IPv6 subnet works as a
host, and the pseudo-interface for L2TP works as the default IP forwarding
route.

Problem:

Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
accepting them when net.inet6.ip6.forward=0 cause the following practical
issues:

- A router cannot perform SLAAC. It becomes a problem if a box has
multiple interfaces and you want to use SLAAC on some of them, for
example. A customer edge router for IPv6 Internet access service
using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
physical interface for administration purpose; updating firmware
and so on (link-local addresses can be used there, but GUAs by
SLAAC are often used for scalability).

- When a host has multiple IPv6 interfaces and it receives multiple RAs on
them, controlling the default route is difficult. Router preferences
defined in RFC 4191 works only when the routers on the links are
under your control.

Details of Implementation Changes:

Router Advertisement messages will be accepted even when
net.inet6.ip6.forwarding=1. More precisely, the conditions are as
follow:

(ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
=> Normal RA processing on that interface. (as IPv6 host)

(ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
=> Accept RA but add the router to the defroute list with
rtlifetime=0 unconditionally. This effectively prevents
from setting the received router address as the box's
default route.

(!ACCEPT_RTADV)
=> No RA processing on that interface.

ACCEPT_RTADV and NO_RADR are per-interface knob. In short, all interface
are classified as "RA-accepting" or not. An RA-accepting interface always
processes RA messages regardless of ip6.forwarding. The difference caused by
NO_RADR or ip6.forwarding is whether the RA source address is considered as
the default router or not.

R-bit in NA on the RA accepting interfaces is set based on
net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests
a router should disable the R-bit completely even when the box has
net.inet6.ip6.forwarding=1, I believe there is no technical reason with
doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
(the default is 0).

Usage:

# ifconfig fxp0 inet6 accept_rtadv
=> accept RA on fxp0
# ifconfig fxp0 inet6 accept_rtadv no_radr
=> accept RA on fxp0 but ignore default route information in it.
# sysctl net.inet6.ip6.norbit_no_radr=1
=> R-bit in NAs on RA accepting interfaces will always be set to 0.


222712 05-Jun-2011 hrs

Use uint8_t for sockaddr sa_len.

Reviewed by: bz


222691 04-Jun-2011 rwatson

Add _mbuf() variants of various inpcb-related interfaces, including lookup,
hash install, etc. For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).

Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.

(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI. However, it won't be useful without
other previous possibly less MFCable changes.)

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222488 30-May-2011 rwatson

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222272 25-May-2011 bz

Add FEATURE() definitions for IPv4 and IPv6 so that we can use
feature_present(3) to dynamically decide whether to use one or the
other family.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days


222215 23-May-2011 rwatson

Move from passing a wildcard boolean to a general set up lookup flags into
in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly
for IPv6 functions. In the future, we would like to support other flags
relating to locking strategy.

This change doesn't appear to modify the KBI in practice, as callers already
passed in INPLOOKUP_WILDCARD rather than a simple boolean.

MFC after: 3 weeks
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222143 20-May-2011 qingli

The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by: delphij
MFC after: 5 days


221411 03-May-2011 tuexen

Remove code with any effect.


221249 30-Apr-2011 tuexen

Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
IPv6.
* Checking for additional addresses takes the correct address
into account and also does not do more comparisons than
necessary.

This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.

MFC after: 1 week


221248 30-Apr-2011 bz

Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only
as well compiling out most functions adding or extending #ifdef INET
coverage.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


221247 30-Apr-2011 bz

Make the PCB code compile without INET support by adding #ifdef INETs
and correcting few #includes.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


221129 27-Apr-2011 bz

Make IPsec compile without INET adding appropriate #ifdef checks.

Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


221009 25-Apr-2011 ticso

correct variable type name in comment


220881 20-Apr-2011 bz

MFp4 CH=191760,191770:

Not compiling in and not initializing from inetsw from in_proto.c for
IPv6 only, we need to initialize upper layer protocols from inet6sw.
Make sure to not initialize them twice in a Dual-Stack
environment but only conditionally on no INET as we have done for
TCP for a long time. Otherwise we would leak resources.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 3 days


220743 17-Apr-2011 bz

Fix IPv6 ND. After r219562 we in nd6_ns_input() were erroneously always
passing the cached proxydl reference (sockaddr_dl initialized or not) to
nd6_na_output(). nd6_na_output() will thus assume a proxy NA. Revert to
conditionally passing either &proxydl or NULL if no proxy case desired.

Tested by: ipv6gw and ref9-i386
Reported by: Pete French (petefrench ingresso.co.uk on stable)
Reported by: bz, simon on Y! cluster
Reported by: kib
PR: kern/151908
MFC after: 3 days


220463 09-Apr-2011 bz

Remove a check in udp6_send() that prevented v4-mapped v6 addresses from
working. We store v4 and v6 addresses as a union but for v4-mapped
addresses only store the 32bits w/o the ::ffff: word. That failed the
check as for example 127.0.0.1 would be ::7f00:1 rather than ::ffff:7f00:1
and the IN6_IS_ADDR_V4MAPPED() never worked here. Given we can hardly get
here with an unbound local address or invalid inp_vflags remove the check.

Reported by: tuexen
Reviewed by: tuexen
MFC after: 3 days


220462 09-Apr-2011 bz

After r219579 and r219779 unbreak v4-mapped v6 sockets for UDP
some more. Similar to what we do for TCP check for v4-mapped
addresses and then handle them or the normal v6 address case.
For either set inp_vflags before calling into the pcb connect
function so that we have an unambiguous view in case we need to
set the local address or port.

Looked at: tuexen (as part of more)
MFC after: 3 days


219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


219579 12-Mar-2011 bz

Merge the two identical implementations for local port selections from
in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport().

MFC after: 2 weeks


219570 12-Mar-2011 bz

Push a possible "unbind" in some situation from in6_pcbsetport() to
callers. This also fixes a problem when the prison call could set
the inp->in6p_laddr (laddr) and a following priv_check_cred() call
would return an error and will allow us to merge the IPv4 and IPv6
implementation.

MFC after: 2 weeks


219562 12-Mar-2011 bz

Make sure the locally cached value of rt->rt_gateway stays stable,
even after dropping the reference and unlocking. Previously we
have dereferenced a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.

PR: kern/151908
Submitted by: chenyl (netstar2008 126.com) (initial version)
MFC after: 2 weeks


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218400 07-Feb-2011 tuexen

Fix bugs related to M_FLOWID:
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initilize flowid correctly.
* Put test code under INVARIANTS.
MFC after: 3 months.


218319 05-Feb-2011 rrs

1) Typo correction in comments and one spacing change.
2) Mass update to all copyrights.
MFC after: 3 Months


216669 22-Dec-2010 tuexen

Improve plausibility check in sctp_handle_sack().
Allow cmt_on_off to support values 0 (no CMT), 1 (CMT), and 2 (CMT/RP).

MFC after: 3 months.


216650 22-Dec-2010 jhay

Add IFT_L2VLAN to the list that is capable of supplying the ingredients
of the EUI64 part of an IPv6 address. Otherwise vlans will all use the
MAC address of the first ethernet interface of the system.

MFC after: 1 week


216277 07-Dec-2010 bz

Loosen the locking in nd6-free() again after r216022 to avoid
a LOR and a recursed lock.

Reported by: delphij
Tested by: delphij
PR: kern/148857
MFC After: 3 days


216022 29-Nov-2010 bz

Plug well observed races on la_hold entries with the callout handler.

Call the handler function with the lock held, return unlocked as we
might free the entry. Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.

Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.

We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there. This also resulted in a push down of local
variable scopes into smaller blocks.

Reported by: many
PR: kern/148857
Submitted by: Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After: 4 days


215956 27-Nov-2010 brucec

Fix more continuous/contiguous typos (cf. r215955)


215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


215559 20-Nov-2010 bz

In case of an early return from the function there is no need to zero
the route upfront, so defer as long as we can.

MFC after: 3 days


215423 17-Nov-2010 bz

Do not initialize flag variables before needed.
Consistently use the LLE_ prefix for lla_lookup() and the ND6_ prefix
for nd6_lookup() even though both are defined the same. Use the right
flag variable when checking each.

No real functional change.

MFC after: 4 days


215418 17-Nov-2010 bz

No need to re-initialize the callout. We initially do it in in6_lltable_new()
right after allocation. Worse, we are losing the right flags here.

MFC after: 4 days


215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


214250 23-Oct-2010 bz

Make the IPsec SADB embedded route cache a union to be able to hold both the
legacy and IPv6 route destination address.
Previously in case of IPv6, there was a memory overwrite due to not enough
space for the IPv6 address.

PR: kern/122565
MFC After: 2 weeks


213766 13-Oct-2010 rpaulo

Purposely tell the compiler that we ignore the return value of ADDCARRY()
in the REDUCE macro.

Reviewed by: dim, rdivacky


213225 27-Sep-2010 delphij

Add a bandaid for a long-standing race condition during route entry
un-expiring.

The previous version of code have no locking when testing rt_refcnt.
The result of the lack of locking may result in a condition where
a routing entry have a reference count but at the same time have
RTPRF_OURS bit set and an expiration timer. These would eventually
lead to a panic:

panic: rtqkill route really not free

When the system have ICMP redirects accepted from local gateway
in a moderate frequency, for instance.

Commit this workaround for now until we have some better solution.

PR: kern/149804
Reviewed by: bz
Tested by: Zhao Xin, Pete French
MFC after: 2 weeks


213101 24-Sep-2010 attilio

IP_BINDANY is not correctly handled in getsockopt() case.
Fix it by specifying the correct bits.

Sponsored by: Sandvine Incorporated
Reviewed by: bz, emaste, rstone
Obtained from: Sandvine Incorporated
MFC after: 10 days


212699 15-Sep-2010 tuexen

Remove unused variables.

MFC after: 2 weeks.


212155 02-Sep-2010 bz

MFp4 CH=183052 183053 183258:

In protosw we define pr_protocol as short, while on the wire
it is an uint8_t. That way we can have "internal" protocols
like DIVERT, SEND or gaps for modules (PROTO_SPACER).
Switch ipproto_{un,}register to accept a short protocol number(*)
and do an upfront check for valid boundries. With this we
also consistently report EPROTONOSUPPORT for out of bounds
protocols, as we did for proto == 0. This allows a caller
to not error for this case, which is especially important
if we want to automatically call these from domain handling.

(*) the functions have been without any in-tree consumer
since the initial introducation, so this is considered save.

Implement ip6proto_{un,}register() similarly to their legacy IP
counter parts to allow modules to hook up dynamically.

Reviewed by: philip, will
MFC after: 1 week


211969 29-Aug-2010 tuexen

Fix the the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.

MFC after: 4 weeks


211944 28-Aug-2010 tuexen

Fix the switching on/off of CMT using sysctl and socket option.
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.

MFC after: 4 weeks


211530 20-Aug-2010 ume

optp may be NULL.


211520 19-Aug-2010 anchie

Fix mbuf leakages and remove unneccessary duplicate mbuf frees.
Use the right copy of an mbuf for the IP6_EXTHDR_CHECK.

Reported by: zec, hrs
Approved by: bz (mentor)


211501 19-Aug-2010 anchie

MFp4: anchie_soc2009 branch:

Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space. The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by: bz (mentor)


211435 17-Aug-2010 ume

Make `ping6 -I' work with net.inet6.ip6.use_defaultzone=1.

MFC after: 2 weeks


211301 14-Aug-2010 bz

In rip6_input(), in case of multicast, we might skip the normal processing
and go to the next iteration early if multicast filtering would decide that
this socket shall not receive the data.
Unlock the pcb in that case or we leak the read lock and next time trying
to get a write lock, would hang forever.

PR: kern/149608
Submitted by: Chris Luke (chrisy flirble.org)
MFC after: 3 days


211193 11-Aug-2010 will

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


211157 11-Aug-2010 will

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


211115 09-Aug-2010 bz

MFp4 CH180235:

Add proto spacers to inet6sw like we have for legacy IP. This allows us
to dynamically pf_proto_register() for INET6 from modules, needed by
upcoming CARP changes and SeND.
MC and SCTP could make use of it as well in theory in the future after
upcoming VIMAGE vnet teardown work.

Discussed with: will, anchie
MFC after: 10 days


210703 31-Jul-2010 bz

Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer. It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.

Reviewed by: rwatson
MFC after: 6 days


210350 21-Jul-2010 bz

Since r186119 IP6 input counters for octets and packets were not
working anymore. In addition more checks and operations were missing.

In case lla_lookup results in a match, get the ifaddr to update the
statistics counters, and check that the address is neither tentative,
duplicate or otherwise invalid before accepting the packet. If ok,
record the address information in the mbuf. [ as is done in case
lla_lookup does not return a result and we go through the FIB ].

Reported by: remko
Tested by: remko
MFC after: 2 weeks


208284 19-May-2010 alfred

Fix our version of IPv6 address representation.

We do not respect rules 3 and 4 in the required list:

1. omit leading zeros

2. "::" used to their maximum extent whenever possible

3. "::" used where shortens address the most

4. "::" used in the former part in case of a tie breaker

5. do not shorten one 16 bit 0 field

6. use lower case

http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04.html

Submitted by: Kalluru Abhiram @ Juniper Networks
Obtained from: Juniper Networks
Reviewed by: hrs, dougb


208171 16-May-2010 kmacy

allocate ipv6 flows from the ipv6 flow zone

reported by: rrs@

MFC after: 3 days


208043 13-May-2010 kmacy

do a proper fix

Pointed out by: np@

MFC after: 3 days


208042 13-May-2010 kmacy

fix compile error on some builds by doing the equivalent of
an "extern VNET_DEFINE" without "__used"

MFC after: 3 days


207949 12-May-2010 kmacy

try working around panic by validating rt and lle

MFC after: 3 days


207902 10-May-2010 kmacy

boot time size the flowtable

MFC after: 3 days


207828 09-May-2010 kmacy

Add flowtable support to IPv6

Tested by: qingli@

Reviewed by: qingli@
MFC after: 3 days


207369 29-Apr-2010 bz

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


207277 27-Apr-2010 bz

Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by: (ten 211.ru)
Tested by: (ten 211.ru)
MFC after: 4 days


207276 27-Apr-2010 bz

Make sure IPv6 source address selection does not change interface
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.

Reported by: Pieter de Boer (pieter thedarkside.nl)
Tested by: Pieter de Boer (pieter thedarkside.nl)
MFC after: 4 days


207268 27-Apr-2010 kib

Provide 32bit compat for SIOCGDEFIFACE_IN6.

Based on submission by: pluknet gmail com
Reviewed by: emaste
MFC after: 2 weeks


206481 11-Apr-2010 bz

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


206454 10-Apr-2010 bms

When embedding the scope ID in MLDv1 output, check if the scope of the address
being embedded is in fact link-local, before attempting to embed it.

Note that this operation is a side-effect of trying to avoid recursion on
the IN6 scope lock.

PR: 144560
Submitted by: Petr Lampa
MFC after: 3 days


206137 03-Apr-2010 tuexen

* Fix some race condition in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing
FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.


205637 25-Mar-2010 bz

We are holding a write lock here so avoid aquiring it twice calling
the "locked" version rather than the wrapper function.

MFC after: 6 days


205104 12-Mar-2010 rrs

The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
PR: 144529
MFC after: 2 weeks


205075 12-Mar-2010 rrs

With the recent change of the sctp checksum to support offload,
no delayed checksum was added to the ip6 output code. This
causes cards that do not support SCTP checksum offload to
have SCTP packets that are IPv6 NOT have the sctp checksum
performed. Thus you could not communicate with a peer. This
adds the missing bits to make the checksum happen for these cards.

PR: 144529
MFC after: 2 weeks


204402 27-Feb-2010 qingli

Use reference counting instead of locking to secure an address while
that address is being used to generate temporary IPv6 address. This
approach is sufficient and avoids recursive locking.

MFC after: 3 days


204072 18-Feb-2010 pjd

No need to include security/mac/mac_framework.h here.


202915 24-Jan-2010 bz

Correct a typo.

Submitted by: kensmith
MFC after: 3 days


202469 17-Jan-2010 bz

Garbage collect references to the no longer implemented tcp_fasttimo().

Discussed with: rwatson
MFC after: 5 days


202468 17-Jan-2010 bz

Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]
MFC After: 2 weeks
Asked for by: Jase Thew (bazerka beardz.net)


201794 08-Jan-2010 trasz

Replace several instances of 'if (!a & b)' with 'if (!(a &b))' in order
to silence newer GCC versions.


201688 06-Jan-2010 bz

Correct a typo.

Submitted by: sn_ (sn_ gmx.net) on hackers@
MFC after: 3 days


201543 04-Jan-2010 qingli

The IFA_RTSELF address flag marks a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion causing the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

MFC after: 5 days


201284 30-Dec-2009 qingli

Multiple IPv6 addresses of the same prefix can be installed on the
same interface. The first address will install the prefix route into
the kernel routing table and that prefix will be marked as on-link.
Without RADIX_MPATH enabled, the other address aliases of the same
prefix will update the prefix reference count but no other routes
will be installed. Consequently the prefixes associated with these
addresses would not be marked as on-link. As such, incoming packets
destined to these address aliases will fail the ND6 on-link check
on input. This patch fixes the above problem by searching the kernel
routing table and try to find an on-link prefix on the given interface.

MFC after: 5 days


201282 30-Dec-2009 qingli

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


200871 22-Dec-2009 bms

Use ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES to signal a join or leave
with SSM MLDv2 by default.
This is current practice and complies with RFC 4604, as well as being
required by production IPv6 networks in Japan.
The behaviour may be disabled by setting the net.inet6.mld.use_allow
sysctl/tunable to 0.

Requested by: Hideki Yamamoto
MFC after: 1 week


200572 15-Dec-2009 bms

Add missing #include <sys/ktr.h>.

Submitted by: Hideki Yamamoto
MFC after: 1 week


200473 13-Dec-2009 bz

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days


199528 19-Nov-2009 bms

Adapt r197136 to IPv6 stack:
Comment some flawed assumptions in in6p_join_group() about
mixing SSM full-state and delta-based APIs.

MFC after: 1 day


199527 19-Nov-2009 bms

Adapt r197135 to IPv6 stack:
Don't allow joins w/o source on an existing group.
This is almost always pilot error.

We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IPV6_MSFILTER
is also disallowed.

MFC after: 1 day


199526 19-Nov-2009 bms

Adapt r197132 to IPv6 stack:
Tighten input checking in in6p_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive
mode group, this is an error, as it would change the filter mode.

Fix a problem with the handling of in6_mfilter for new memberships:
* Do not rely on im6f being NULL; it is explicitly initialized to a
non-NULL pointer when constructing a membership.
* Explicitly initialize *im6f to EX mode when the source address
is unspecified.

This fixes a problem with in_mfilter slot recycling in the join path.

MFC after: 1 day


199523 19-Nov-2009 bms

Adapt r197314 to IPv6 stack:
Return ENOBUFS consistently if user attempts to exceed
in_mcast_maxsocksrc resource limit.

MFC after: 1 day


199522 19-Nov-2009 bms

Adapt r197130 to IPv6 stack:
Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.

MFC after: 1 day


199518 19-Nov-2009 bms

Adapt the fix for IGMPv2 in r199287 for the IPv6 stack.
Only multicast routing is affected by the issue.

MFC after: 1 day


199225 12-Nov-2009 ume

- We are not guaranteed that we're not dropping a reference that
we did not add. Call LLE_REMREF() only when callout_stop()
actually canceled a pending callout.
- callout_reset() may cancel a pending callout. When
callout_reset() canceled a pending callout, call LLE_REMREF()
to drop a reference for the canceled callout.

MFC after: 1 week


199173 11-Nov-2009 ume

CURVNET_RESTORE() was not called in certain cases.

MFC after: 3 days


198993 06-Nov-2009 ume

Make nd6_llinfo_timer() does its job, again. ln->la_expire was
greater than time_second, in most cases.

MFC after: 3 days


198976 06-Nov-2009 ume

Don't call LLE_FREE() after nd6_free().

MFC after: 3 days


198418 23-Oct-2009 qingli

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
MFC after: 3 days


198076 14-Oct-2009 bz

Explicitly compare to a return code.

Discussed with: philip (after we both misread the logic there the 1st time)
MFC after: 6 weeks


197996 12-Oct-2009 hrs

- Do not assign a link-local address when ND6_IFF_IFDISABLED.
Adding a tentative address is useless.

- Comment out a confused warning message when
in6_ifattach_linklocal() fails. This can occur when the
interface does not support ioctl(SIOCAIFADDR) (interfaces
associated with 802.11 wireless network device drivers, for
example).


197952 11-Oct-2009 julian

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


197703 02-Oct-2009 hrs

Enable adding a link-local address even if ND6_IFF_IFDISABLED.
Note that when the interface has ND6_IFF_IFDISABLED, a newly-added
address is always marked as IN6_IFF_TENTATIVE so that the interface
can perform DAD after the ND6_IFF_IFDISABLED is cleared.


197288 17-Sep-2009 rrs

Support for VNET in SCTP (hopefully)


197227 15-Sep-2009 qingli

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


197138 12-Sep-2009 hrs

Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
global knob. The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
sysctl net.inet6.ip6.auto_linklocal to one for its default
value. The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust. It can be used to disable
IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
ND6_IFF_ACCEPT_RTADV is set on that interface. The former
condition will be revisited later to support a "host + router" box
like IPv6 CPE router. The current behavior is compatible with
the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
"prefer_source", and "disabled" in ndp(8). The ndp(8) now
supports "auto_linklocal".

Discussed with: bz and jinmei
Reviewed by: bz
MFC after: 3 days


196871 05-Sep-2009 qingli

The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by: bz
MFC after: immediately


196865 05-Sep-2009 qingli

This patch fixes an address scope violation. Considering the
scenario where an anycast address is assigned on one interface,
and a global address with the same scope is assigned on another
interface. In other words, the interface owns the anycast
address has only the link-local address as one other address.
Without this patch, "ping6" the anycast address from another
station will observe the source address of the returned ICMP6
echo reply has the link-local address, not the global address
that exists on the other interface in the same node.

Reviewed by: bz
MFC after: immediately


196864 05-Sep-2009 qingli

This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by: bz
MFC after: immediately


196649 30-Aug-2009 qingli

Prefix on-link verification is being performed on statically
configured prefixes. Since these statically configured prefixes
do not have any associated advertising routers, these prefixes
are treated as unreachable and those prefix routes are deleted
from the routing table. Therefore bypass prefixes that are not
learned from router advertisements during prefix on-link check.

Reviewed by: hrs


196569 26-Aug-2009 qingli

When multiple interfaces exist in the system, with each interface having
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact, instead of using the
loopback interface, the input code performs a search to find the right
interface that owns the given IPv6 address.

Reviewed by: bz, gnn, kmacy
MFC after: immediately


196535 25-Aug-2009 rwatson

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy
MFC after: 3 days


196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


196152 12-Aug-2009 qingli

A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
let the code added by r192011 to install the loopback route for that IPv6
interface address.

Reviewed by: bz
Approved by: re


196039 02-Aug-2009 rwatson

Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.

Reviewed by: bz
Approved by: re (kib)


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195914 27-Jul-2009 qingli

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


195837 23-Jul-2009 rwatson

Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.

Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)


195814 21-Jul-2009 bz

sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by: rwatson
Approved by: re (kib)


195782 20-Jul-2009 rwatson

Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by: bz
Approved by: re (vimage blanket)


195760 19-Jul-2009 rwatson

Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by: bz
Approved by: re (kib)


195755 18-Jul-2009 bms

Fix a problem, whereby misbehaving IPv6 applications, which don't include
a valid zone ID or interface identifier in a v6 multicast leave, would
trigger a fairly paranoid KASSERT().

Observed with Boost++ regression tests on ref8.freebsd.org.

Approved by: re (kib)


195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195643 12-Jul-2009 qingli

This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link type) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route was explicitly created when
processing the IPv6 address assignment. This loopback host route is
deleted when that IPv6 address is removed from the interface.

Reviewed by: bz, gnn
Approved by: re


195157 29-Jun-2009 rwatson

Fix "options VIMAGE_GLOBALS" build following introduction of
in6_ifaddrhead.

Approved by: re (kib)


195102 27-Jun-2009 rwatson

In in6_update_ifa(), jump to 'cleanup' rather than returning directly
in one additional case, avoiding an ifaddr reference leak.

Defer releasing the in6_ifaddr's in6_ifaddrhead reference until the
end of in6_unlink_ifa(), as callers are inconsistent regarding whether
or not they hold a reference across the call. This avoids using the
ifaddr after it may have been freed.

Reported by: tegge
Reviewed by: tegge
Approved by: re (blanket)
MFC after: 6 weeks


194971 25-Jun-2009 rwatson

Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by: bz
MFC after: 6 weeks


194943 25-Jun-2009 rwatson

Clean up reference management in in6_update_ifa and in6_unlink_ifa, and
in particular, add a reference for in6_ifaddrhead since we do remove a
reference for it when an IPv6 address is removed. This fixes ifconfig
delete of an IPv6 alias.

Reported by: tegge
MFC after: 6 weeks


194907 24-Jun-2009 rwatson

Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt
the code styles and conventions present in netinet where possible.

Reviewed by: gnn, bz
MFC after: 6 weeks (possibly not MFCable?)


194777 23-Jun-2009 bz

Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.

Asked for by: rwatson
Reviewed by: rwatson


194760 23-Jun-2009 rwatson

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


194739 23-Jun-2009 bz

After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.


194714 23-Jun-2009 bz

In r194702 I meant to remove vnet.h which is no longer needed, not route.h.


194702 23-Jun-2009 bz

in6_rtqdrain() has been unused. Cleanup.
As this was the only consumer of net/route.h left remove that as well.


194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


194581 21-Jun-2009 rdivacky

Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by: rwatson
Approved by: ed (mentor)


194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


194118 13-Jun-2009 jamie

Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by: bz (mentor)


194052 12-Jun-2009 zec

Remove unnecessary #ifdef lines and code.

Approved by: julian (mentor)


193893 10-Jun-2009 cperciva

Prevent integer overflow in direct pipe write code from circumventing
virtual-to-physical page lookups. [09:09]

Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10]

Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11]

Approved by: so (cperciva)
Approved by: re (not really, but SVN wants this...)
Security: FreeBSD-SA-09:09.pipe
Security: FreeBSD-SA-09:10.ipv6
Security: FreeBSD-SA-09:11.ntpd


193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


193731 08-Jun-2009 zec

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


193664 07-Jun-2009 hrs

Fix and add a workaround on an issue of EtherIP packet with reversed
version field sent via gif(4)+if_bridge(4). The EtherIP
implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had
an interoperability issue because it sent the incorrect EtherIP
packets and discarded the correct ones.

This change introduces the following two flags to gif(4):

accept_rev_ethip_ver: accepts both correct EtherIP packets and ones
with reversed version field, if enabled. If disabled, the gif
accepts the correct packets only. This flag is enabled by
default.

send_rev_ethip_ver: sends EtherIP packets with reversed version field
intentionally, if enabled. If disabled, the gif sends the correct
packets only. This flag is disabled by default.

These flags are stored in struct gif_softc and can be set by
ifconfig(8) on per-interface basis.

Note that this is an incompatible change of EtherIP with the older
FreeBSD releases. If you need to interoperate older FreeBSD boxes and
new versions after this commit, setting "send_rev_ethip_ver" is
needed.

Reviewed by: thompsa and rwatson
Spotted by: Shunsuke SHINOMIYA
PR: kern/125003
MFC after: 2 weeks


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193274 01-Jun-2009 zec

V_loif is not an array but a pure pointer, so treat it as such.

Reviewed by: bz
Approved by: julian (mentor)


193266 01-Jun-2009 zec

Remove an #undef MIN that slipped under the radar and led me to
hastily introduce an #define MIN() a few lines below in r191816.

Approved by: julian (mentor)
Discussed with: bz


193232 01-Jun-2009 bz

Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by: julian, rwatson, zec
X-MFC: not possible


193219 01-Jun-2009 rwatson

Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
workstream, or set of per-protocol queues. Threads may be bound
to specific CPUs, or allowed to migrate, based on a global policy.

In the future it would be desirable to support topology-centric
policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
currently be one of:

NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
an implicit or explicit source (such as an interface or socket).

NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
as well as allowing protocols to provide a flow generation function
for mbufs without flow identifers (m2flow). Falls back on
NETISR_POLICY_SOURCE if now flow ID is available.

NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
being used, as well as a mapping function from workstream to CPU ID,
which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
get and clear drop counters, which query data or apply changes across
all workstreams.

- Add a more extensible netisr registration interface, in which
protocols declare 'struct netisr_handler' structures for each
registered NETISR_ type. These include name, handler function,
optional mbuf to flow ID function, optional mbuf to CPU ID function,
queue limit, and ordering policy. Padding is present to allow these
to be expanded in the future. If no queue limit is declared, then
a default is used.

- Queue limits are now per-workstream, and raised from the previous
IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
with the exception of netnatm, default queue limits. Most protocols
register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
NETISR_POLICY_FLOW, and will therefore take advantage of driver-
generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
the netisr, rather than having polling pretend to be two protocols.
Provide two explicit hooks in the netisr worker for start and end
events for runs: netisr_poll() and netisr_pollmore(), as well as a
function, netisr_sched_poll(), to allow the polling code to schedule
netisr execution. DEVICE_POLLING still embeds single-netisr
assumptions in its implementation, so for now if it is compiled into
the kernel, a single and un-bound netisr thread is enforced
regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking. It will be enabled once further rmlock
optimization has taken place. However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by: bz


193217 01-Jun-2009 pjd

- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent
with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option
as there is no more space for more SOL_SOCKET options, but this option also
fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove sysctl to turn this functionality on and off.
- Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this
functionality (currently only unjail root can use it).

Discussed with: julian, adrian, jhb, rwatson, kmacy


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192923 27-May-2009 bms

Merge final round of MLD changes from p4:
ip6_input.c, in6.h:
* Add netinet6-specific mbuf flag M_RTALERT_MLD, shadowing M_PROTO6.
* Always set this flag if HBH Router Alert option is present for MLD,
even when not forwarding.

icmp6.c:
* In icmp6_input(), spell m->m_pkthdr.rcvif as ifp to be consistent.
* Use scope ID for verifying input. Do not apply SSM filters here, no inpcb.
* Check for M_RTALERT_MLD when validating MLD traffic, as we can't see
IPv6 hop options outside of ip6_input().

in6_mcast.c:
* Use KAME scope/zone ID in in6_multi.
* Update net.inet6.ip6.mcast.filters implementation to use scope IDs
for comparisons.
* Fix scope ID treatment in multicast socket option processing.
Scope IDs passed in from userland will be ignored as other less
ambiguous APIs exist for specifying the link.
* Tighten userland input checks in IPv6 SSM delta and full-state ops.
* Source filter embedded scope IDs need to be revisited, for now
just clear them and ignore them on input.
* Adapt KAME behaviour of looking up the scope ID in the default zone
for multicast leaves, when the interface is ambiguous.

mld6.c:
* Tighten origin checks on MLD traffic as per RFC3810 Section 6.2:
* ip6_src MAY be the unspecified address for MLDv1 reports.
* ip6_src MAY have link-local address scope for MLDv1 reports,
MLDv1 queries, and MLDv2 queries.
* Perform address field validation *before* accepting queries.
* Use KAME scope/zone ID in query/report processing.
* Break const correctness for mld_v1_input_report(), mld_v1_input_query()
as we temporarily modify the input mbuf chain.
* Clear the scope ID before handoff to userland MLD daemon.
* Fix MLDv1 old querier present timer processing.
With the protocol defaults, hosts should revert to MLDv2 after 260s.
* Add net.inet6.mld.v1enable sysctl, default to on.

ifmcstat.c:
* Use sysctl by default; -K requests kvm(3) if so compiled.

mld.4:
* Connect man page to build.

Tested using PCS.


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


192649 23-May-2009 bz

Implement UDP control block support.

So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel
tunneling callbacks. Move that into the udpcb and add a field for flags
there to be used by upcoming changes instead of sticking udp only flags
into in_pcb flags2.

Bump __FreeBSD_version for ports to detect it and because of vnet* struct
size changes.

Submitted by: jhb (7.x version)
Reviewed by: rwatson


192648 23-May-2009 bz

Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL
kernel option.
This also permits tuning of the option per virtual network stack, as
well as separately per inet, inet6.

The kernel option is left for a transition period, marked deprecated,
and will be removed soon.

Initially requested by: phk (1 year 1 day ago)
MFC after: 4 weeks


192562 21-May-2009 bms

Pullup from p4 tip:
* Fix MLDv2 general query timer (fallout from automated refactoring).
* Refactor MLDv1 timer.
MLDv2 query processing is now working.


192547 21-May-2009 bms

Pullup svn source to p4 top of tree:
* Fix LOR in MLDv2 query input path.
* Strip embedded KAME scope IDs for on-wire IPv6 address comparisons.


192476 20-May-2009 qingli

When an interface address is removed and the last prefix
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.

Reviewed by: kmacy


192318 18-May-2009 bz

Add two missing INIT_VNET_INET6(curvnet) to make VIMAGE kernels happier.


192282 18-May-2009 qingli

This patch resolves the following issues:

-- A routing socket message is not generated when an IPv6 address is
either inserted or deleted from an interface. The missing routing
message problem was discovered by Randall Stewart and Michael Tuxen
during SCTP testing.

-- Previously when an IPv6 address is configured on an interface, if the
prefix length is /128, then a host route is instaleld in the kernel
for this address. But this host route is not deleted when that IPv6
address is removed from the interface.

-- Routes to the link-local all-nodes multicast address and the
interface-local all-nodes multicast address are not removed when
the last IPv6 address is removed from an interface.

Reviewed by: bz, gnn


191942 09-May-2009 imp

Implement RFC 5095 more fully. Rather than marking this no-op code as
BURN_BRIDGES, just remove it. Adjust comments.

Reviewed by: dwhite, emaste, battlez


191846 06-May-2009 zec

Remove unnecessary CURVNET_SET() calls where curvnet context is
(i.e. seems to be) already set.

This should reduce console noise due to curvnet recursion reports.

This change has no impact on nooptions VIMAGE builds.
Approved by: julian (mentor)


191828 05-May-2009 kan

Silence unsolicited spam printed out when KTR_MLD happens to be
in KTR_COMPILE mask. Compiling KTR trace points in does not necessarily
mean enabling them, use proper check against ktr_mask instead.


191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


191738 02-May-2009 zec

Make indentation more uniform accross vnet container structs.

This is a purely cosmetic / NOP change.

Reviewed by: bz
Approved by: julian (mentor)
Verified by: svn diff -x -w producing no output


191718 01-May-2009 bms

Limit scope of acquisition of INP_RLOCK for multicast input filter
to the scope of its use, even though this may thrash the lock if
the INP is referenced for other purposes.

Tested by: David Wolfskill


191688 30-Apr-2009 zec

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


191672 29-Apr-2009 bms

Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:

* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.

NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.


191666 29-Apr-2009 bms

Add MLDv2 protocol header, but do not connect it to the build.


191665 29-Apr-2009 bms

Import IPv6 SSM module but do not connect it to the build.


191662 29-Apr-2009 bms

Add IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT, in6addr_linklocal_allv2routers
for use by MLDv2.
Add IPv6 SSM socket layer membership vector size constants and
tree bounds.
Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes
straight to the RFC 3678 socket options.


191548 26-Apr-2009 zec

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


191433 23-Apr-2009 bz

Compare protosw pointer with NULL.

MFC after: 1 month


191341 20-Apr-2009 rwatson

Assert the interface address list lock in IFP_TO_IA6(), as it will
iterate the interface address list. Marginally expand IF_ADDR_LOCK()
coverage in mld6.c to make sure it's held when IFP_TO_IA6() is called.

MFC after: 2 weeks


191340 20-Apr-2009 rwatson

Prefer structure fields (ifa_link) to macro aliases for them
(ifa_list).

MFC after: 2 weeks


191337 20-Apr-2009 rwatson

Acquire interface address list lock around access to if_addrhead,
closing several writer-writer races, and some read-write races.

MFC after: 2 weeks


191336 20-Apr-2009 rwatson

Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually
accessing queue(9) structure fields for if_addrhead.

Prefer FreeBSD field name if_addrhead to compatibility macro
if_addrlist.

MFC after: 2 weeks


191323 20-Apr-2009 rwatson

Close some but not all writer-writer races when maintaining IPv6
interface address lists by locking the interface address list lock.

MFC after: 2 weeks


191317 20-Apr-2009 rwatson

Lock interface address lists before iterating over them in nd6.

MFC after: 2 weeks


191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


191117 15-Apr-2009 kmacy

add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry


190964 12-Apr-2009 rwatson

Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel. This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after: 3 days


190963 12-Apr-2009 rwatson

Commit file omitted in r190962:

Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel. This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after: 3 days


190909 11-Apr-2009 zec

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


190787 06-Apr-2009 zec

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


190012 19-Mar-2009 bms

Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
* Split IPv4 and IPv6 MROUTING support. The static compile-time
kernel option remains the same, however, the modules may now
be built for IPv4 and IPv6 separately as ip_mroute_mod and
ip6_mroute_mod.
* Clean up the IPv4 multicast forwarding code to use BSD queue
and hash table constructs. Don't build our own timer abstractions
when ratecheck() and timevalclear() etc will do.
* Expose the multicast forwarding cache (MFC) and virtual interface
table (VIF) as sysctls, to reduce netstat's dependence on libkvm
for this information for running kernels.
* bandwidth meters however still require libkvm.
* Make the MFC hash table size a boot/load-time tunable ULONG,
net.inet.ip.mfchashsize (defaults to 256).
* Remove unused members from struct vif and struct mfc.
* Kill RSVP support, as no current RSVP implementation uses it.
These stubs could be moved to raw_ip.c.
* Don't share locks or initialization between IPv4 and IPv6.
* Don't use a static struct route_in6 in ip6_mroute.c.
The v6 code is still using a cached struct route_in6, this is
moved to mif6 for the time being.
* More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by: Pavlin Radoslavov


189851 15-Mar-2009 rwatson

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


189848 15-Mar-2009 rwatson

Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted. Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

INP_TIMEWAIT
INP_SOCKREF
INP_ONESBCAST
INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after: 1 week (or after dependencies are MFC'd)
Reviewed by: bz


189494 07-Mar-2009 marius

On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR: 131921
MFC after: 1 week


189303 03-Mar-2009 bz

Start removing IPv6 Type 0 Routing header code.
RH0 was deprecated by RFC 5095.

While most of the code had been disabled by #if 0 already, leave a
bit of infrastructure for possible RH2 code and a log message under
BURN_BRIDGES in case a user still tries to send RH0 packets.

Reviewed by: gnn (a bit back, earlier version)


189225 01-Mar-2009 bz

Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by: peter


189106 27-Feb-2009 bz

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


189103 27-Feb-2009 bz

Shuffle the vimage.h includes or add where missing.


188963 23-Feb-2009 rwatson

Assert the radix head lock in in6_rtqkill().

MFC after: 3 days


188306 08-Feb-2009 bz

Try to remove/assimilate as much of formerly IPv4/6 specific
(duplicate) code in sys/netipsec/ipsec.c and fold it into
common, INET/6 independent functions.

The file local functions ipsec4_setspidx_inpcb() and
ipsec6_setspidx_inpcb() were 1:1 identical after the change
in r186528. Rename to ipsec_setspidx_inpcb() and remove the
duplicate.

Public functions ipsec[46]_get_policy() were 1:1 identical.
Remove one copy and merge in the factored out code from
ipsec_get_policy() into the other. The public function left
is now called ipsec_get_policy() and callers were adapted.

Public functions ipsec[46]_set_policy() were 1:1 identical.
Rename file local ipsec_set_policy() function to
ipsec_set_policy_internal().
Remove one copy of the public functions, rename the other
to ipsec_set_policy() and adapt callers.

Public functions ipsec[46]_hdrsiz() were logically identical
(ignoring one questionable assert in the v6 version).
Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(),
the public function to ipsec_hdrsiz(), remove the duplicate
copy and adapt the callers.
The v6 version had been unused anyway. Cleanup comments.

Public functions ipsec[46]_in_reject() were logically identical
apart from statistics. Move the common code into a file local
ipsec46_in_reject() leaving vimage+statistics in small AF specific
wrapper functions. Note: unfortunately we already have a public
ipsec_in_reject().

Reviewed by: sam
Discussed with: rwatson (renaming to *_internal)
MFC after: 26 days
X-MFC: keep wrapper functions for public symbols?


188151 05-Feb-2009 jamie

Don't bother null-checking the thread pointer before the prison checks
in udp6_connect (td is already dereferenced elsewhere without such a
check). This makes the conversion from a sockaddr to a sockaddr_in6
always happen, so convert once at the beginning of the function rather
than twice in the middle.

Approved by: bz (mentor)


188148 05-Feb-2009 jamie

Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of
prison_local_ip6 in in6_pcbbind.

Approved by: bz (mentor)


188144 05-Feb-2009 jamie

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


188113 04-Feb-2009 bz

When iterating through the list trying to find a router in
defrouter_select(), NULL the cached llentry after unlocking
as we are no longer interested in it and with the second
iteration would try to unlock it again resulting in
panic: Lock (rw) lle not locked @ ...

Reported by: Mark Atkinson <m.atkinson@f5.com>
Tested by: Mark Atkinson <m.atkinson@f5.com>
PR: kern/128247 (in follow-up, unrelated to original report)


188067 03-Feb-2009 rrs

- Cleanup checksum code.
- Prepare for CRC offloading, add MIB counters (RS/MT).
- Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT).
- Bugfix: Handle close() with SO_LINGER correctly when notifications
are generated during the close() call(MT).
- Bugfix: Generate DRY event when sender is dry during subscription.
Only for 1-to-1 style sockets (RS/MT)
- Bugfix: Put vtags for the correct amount of time into time-wait (MT).
- Bugfix: Clear vtag entries correctly on expiration (MT).
- Bugfix: shutdown() indicates ENOTCONN when called for unconnected
1-to-1 style sockets (MT).
- Bugfix: In sctp Auth code (PL).
- Add support for devices that support SCTP csum offload (igb).
- Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS)
Obtained from: With help from Peter Lei and Michael Tuexen


187989 01-Feb-2009 bz

Remove the single global unlocked route cache ip6_forward_rt
from the inet6 stack along with statistics and make sure we
properly free the rt in all cases.

While the current situation is not better performance wise it
prevents panics seen more often these days.
After more inet6 and ipsec cleanup we should be able to improve
the situation again passing the rt to ip6_forward directly.

Leave the ip6_forward_rt entry in struct vinet6 but mark it
for removal.

PR: kern/128247, kern/131038
MFC after: 25 days
Committed from: Bugathon #6
Tested by: Denis Ahrens <denis@h3q.com> (different initial version)


187958 31-Jan-2009 bz

Remove unused local MACROs.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
MFC after: 2 weeks


187949 31-Jan-2009 bz

Coalesce two consecutive #ifdef IPSEC blocks.
Move the skip_ipsec: label below the goto as we can never have
ipsecrt set if we get to that label so there is no need to check.

MFC after: 2 weeks


187947 31-Jan-2009 bz

Remove dead code from #if 0:
we do not have an ipsrcchk_rt anywhere else.

MFC after: 2 weeks


187946 31-Jan-2009 bz

Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code. Also check that the rt is valid before freeing it
either way there.

PR: kern/129793
Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after: 2 weeks
Committed from: Bugathon #6


187939 30-Jan-2009 bz

Remove 4 entirely unsued ip6 variables.
Leave then in struct vinet6 to not break the ABI with kernel modules
but mark them for removal so we can do it in one batch when the time
is right.

MFC after: 1 month


187684 25-Jan-2009 bz

For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by: jamie (as part of a larger patch)
MFC after: 1 week


187380 18-Jan-2009 sam

remove too noisy DIAGNOSTIC code

Reviewed by: qingli


187094 12-Jan-2009 qingli

Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.


186980 09-Jan-2009 bz

Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR: kern/68189
Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with: rwatson [2]
Reviewed by: rwatson
MFC after: 4 weeks


186948 09-Jan-2009 bz

Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related
jail-aware. Up to now we returned the first address of the interface
for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for
programs querying for an address but running inside a jail, as the
address returned usually did not belong to the jail.
Like for v6, if there was an ifr_addr given on v4, you could probe
for more addresses on the interfaces that you were not allowed to see
from inside a jail. Return an error (EADDRNOTAVAIL) in that case
now unless the address is on the given interface and valid for the
jail.

PR: kern/114325
Reviewed by: rwatson
MFC after: 4 weeks


186821 06-Jan-2009 rrs

Addresses Roberts comments on comments. Also adds
the KASSERT and checks suggested.

Reviewed by: The udp tunneling was discussed on net@ under the
thread entitled "Heads up -- Thinking about UDP and tunneling"


186813 06-Jan-2009 rrs

Add the ability of an alternate transport protocol
to easily tunnel over udp by providing a hook
function that will be called instead of appending
to the socket buffer.


186791 05-Jan-2009 bz

Switch the last protosw* structs to C99 initializers.

Reviewed by: ed, julian, Christoph Mallon <christoph.mallon@gmx.de>
MFC after: 2 weeks


186751 04-Jan-2009 rwatson

Unlike with struct protosw, several instances of struct ip6protosw
did not use C99-style sparse structure initialization, so remove
NULL assignments for now-removed pr_usrreq function pointers.

Reported by: Chris Ruiz <yr.retarded at gmail.com>


186750 04-Jan-2009 rwatson

struct ip6protosw is a copy of struct protosw, so remove pr_usrreq there
to reflect removal from struct protosw.

Spotted by: ed


186708 03-Jan-2009 qingli

Some modules such as SCTP supplies a valid route entry as an input argument
to ip_output(). The destionation is represented in a sockaddr{} object
that may contain other pieces of information, e.g., port number. This
same destination sockaddr{} object may be passed into L2 code, which
could be used to create a L2 entry. Since there exists a L2 table per
address family, the L2 lookup function can make address family specific
comparison instead of the generic bcmp() operation over the entire
sockaddr{} structure.

Note in the IPv6 case the sin6_scope_id is not compared because the
address is currently stored in the embedded form inside the kernel.
The in6_lltable_lookup() has to account for the scope-id if this
storage format were to change in the future.


186500 26-Dec-2008 qingli

This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), an sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" command follows a convention where a RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian


186468 24-Dec-2008 kmacy

avoid lock recursion by deferring the link check until after LLE lock is dropped


186393 22-Dec-2008 bz

Correct variable name in comment.

MFC after: 4 weeks


186392 22-Dec-2008 qingli

Similar to the INET case, do not destroy the nd6 entries for
interface addresses until those addresses are removed. I already
made the patch in INET but forgot to bring the code over for
INET6.


186293 18-Dec-2008 bz

Only unlock the llentry if it is actually valid.

Reported by: ed


186223 17-Dec-2008 bz

Another step assimilating IPv[46] PCB code:
normalize IN6P_* compat flags usage to their equialent
INP_* counterpart.

Discussed with: rwatson
Reviewed by: rwatson
MFC after: 4 weeks


186222 17-Dec-2008 bz

Use inc_flags instead of the inc_isipv6 alias which so far
had been the only flag with random usage patterns.
Switch inc_flags to be used as a real bit field by using
INC_ISIPV6 with bitops to check for the 'isipv6' condition.

While here fix a place or two where in case of v4 inc_flags
were not properly initialized before.[1]

Found by: rwatson during review [1]
Discussed with: rwatson
Reviewed by: rwatson
MFC after: 4 weeks


186217 17-Dec-2008 qingli

Remove the rt argument from nd6_storelladdr() because
rt is no longer accessed.


186216 17-Dec-2008 qingli

A couple of files were not meant to be committed.


186215 17-Dec-2008 qingli

in6_clsroute() was applied to prefix routes causing some
of them to expire. in6_clsroute() was only applied to
cloned routes that are no longer applicable after the
arp-v2 commit.


186198 17-Dec-2008 kmacy

* Compare pointer with NULL
* Remove trailing whitespace (added in r186162)
* Reduce indentation by rephrasing test

Submitted by: Christopher Mallon (christoph dot mallon at gmx dot de)


186196 16-Dec-2008 kmacy

- Simplify handling of the deferring of mbuf transmit until after lle lock drop
- add a couple of comments to clarify intent


186170 16-Dec-2008 kmacy

check pointers against NULL


186163 16-Dec-2008 kmacy

convert more pointer validation checks to checking against NULL


186162 16-Dec-2008 kmacy

simplify locking in find_pfxlist_reachable_router


186160 16-Dec-2008 kmacy

explicitly check return of lla_lookup against NULL


186159 16-Dec-2008 kmacy

advance tail pointer in nd6_output_lle and check lla_output return against NULL


186158 16-Dec-2008 kmacy

check return from lla_lookup against NULL not zero


186157 16-Dec-2008 kmacy

make sure redirect doesn't return without dropping the lock


186156 16-Dec-2008 kmacy

need to check that lle is not null before unlock if the break condition is not met
also fix the break condition to explicitly check against NULL


186155 16-Dec-2008 kmacy

unlock the llentry after use in find_pfxlist_reachable_router


186153 16-Dec-2008 qingli

Initialize the variable "router", and apply "static_route" flag
across the entire nd6_cache_lladdr() function.


186150 16-Dec-2008 kmacy

unlock and destroy an llentry's lock before freeing

Found by: sam


186148 16-Dec-2008 kmacy

unlock looked up llentrys in defrouter_select


186147 16-Dec-2008 kmacy

fix two use after frees in nd6_cache_lladdr caused by last minute unlock shuffling


186141 15-Dec-2008 bz

Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().

Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.

Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)


186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


186051 13-Dec-2008 kmacy

in6_addroute is called through rnh_addadr which is always called with the radix node head lock held
exclusively. Pass RTF_RNH_LOCKED to rtalloc so that rtalloc1_fib will not try to re-acquire the lock.


186048 13-Dec-2008 bz

Second round of putting global variables, which were virtualized
but formerly missed under VIMAGE_GLOBAL.

Put the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

Sponsored by: The FreeBSD Foundation


185965 12-Dec-2008 kmacy

RTF_RNH_LOCKED needs to be passed in the flags arg not report,
apologies to thompsa


185964 12-Dec-2008 thompsa

Pass RTF_RNH_LOCKED to rtalloc1 sunce the node head is locked, this avoids a
recursive lock panic on inet6 detach.

Reviewed by: kmacy


185937 11-Dec-2008 bz

Put a global variables, which were virtualized but formerly
missed under VIMAGE_GLOBAL.

Start putting the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

While there garbage collect a few dead externs from ip6_var.h.

Sponsored by: The FreeBSD Foundation


185895 10-Dec-2008 zec

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185751 08-Dec-2008 imp

Add missing include to sys/lock.h before sys/rwlock.h


185747 07-Dec-2008 kmacy

- convert radix node head lock from mutex to rwlock
- make radix node head lock not recursive
- fix LOR in rtexpunge
- fix LOR in rtredirect

Reviewed by: sam


185694 06-Dec-2008 rrs

Code from the hack-session known as the IETF (and a
bit of debugging afterwards):
- Fix protection code for notification generation.
- Decouple associd from vtag
- Allow vtags to have less strigent requirements in non-uniqueness.
o don't pre-hash them when you issue one in a cookie.
o Allow duplicates and use addresses and ports to
discriminate amongst the duplicates during lookup.
- Add support for the NAT draft draft-ietf-behave-sctpnat-00, this
is still experimental and needs more extensive testing with the
Jason Butt ipfw changes.
- Support for the SENDER_DRY event to get DTLS in OpenSSL working
with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon).
- Update the support of SCTP-AUTH by Peter Lei.
- Use macros for refcounting.
- Fix MTU for UDP encapsulation.
- Fix reporting back of unsent data.
- Update assoc send counter handling to be consistent with endpoint sent counter.
- Fix a bug in PR-SCTP.
- Fix so we only send another FWD-TSN when a SACK arrives IF and only
if the adv-peer-ack point progressed. However we still make sure
a timer is running if we do have an adv_peer_ack point.
- Fix PR-SCTP bug where chunks were retransmitted if they are sent
unreliable but not abandoned yet.

With the help of: Michael Teuxen and Peter Lei :-)
MFC after: 4 weeks


185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


185435 29-Nov-2008 bz

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


185419 28-Nov-2008 zec

Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185370 27-Nov-2008 bz

Merge in6_pcbfree() into in_pcbfree() which after the previous
IPsec change in r185366 only differed in two additonal IPv6 lines.
Rather than splattering conditional code everywhere add the v6
check centrally at this single place.

Reviewed by: rwatson (as part of a larger changset)
MFC after: 6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.


185366 27-Nov-2008 bz

Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy.
Ignoring different names because of macros (in6pcb, in6p_sp) and
inp vs. in6p variable name both functions were entirely identical.

Reviewed by: rwatson (as part of a larger changeset)
MFC after: 6 weeks (*)
(*) possibly need to leave a stub wrappers in 7 to keep the symbols.


185348 26-Nov-2008 zec

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185344 26-Nov-2008 bz

Remove in6_pcbdetach() as it is exactly the same function
as in_pcbdetach() and we don't need the code twice.

Reviewed by: rwatson
MFC after: 6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.


185333 26-Nov-2008 bz

Unify the v4 and v6 versions of pcbdetach and pcbfree as good
as possible so that they are easily diffable.

No functional changes.

Reviewed by: rwatson
MFC after: 6 weeks


185332 26-Nov-2008 bz

Plug a credential leak in case the inpcb is freed by
in6_pcbfree() instead of in_pcbfree(); missed in r183606.

Reviewed by: rwatson
MFC after: 3 days (instantly for 7.1-RC?)


185088 19-Nov-2008 zec

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


184307 26-Oct-2008 rwatson

Add a MAC label, MAC Framework, and MAC policy entry points for IPv6
fragment reassembly queues.

This allows policies to label reassembly queues, perform access
control checks when matching fragments to a queue, update a queue
label when fragments are matched, and label the resulting
reassembled datagram.

Obtained from: TrustedBSD Project


184214 23-Oct-2008 des

Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


184096 20-Oct-2008 bz

Bring over the change switching from using sequential to random
ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143
(initially from OpenBSD) and follow-up commits during the last four and
a half years including rev. 1.157, 1.162 and 1.199.
This now is relying on the same infrastructure as has been implemented
in in_pcb.c since rev. 1.199.

Reviewed by: silby, rpaulo, mlaier
MFC after: 2 months


183923 15-Oct-2008 bz

Check that the mbuf len is positive (like we do in the v4 case).

Read the other way round this means that even with the checks
the m_len turned negative in some cases which led to panics.
The reason to my understanding seems to be that the checks are wrong
(also for v4) ignoring possible padding when checking cmsg_len or
padding after data when adjusting the mbuf.
Doing proper cheks seems to break applications like named so
further investigation and regression tests are needed.

PR: kern/119123
Tested by: Ashish Shukla wahjava gmail.com
MFC after: 3 days


183807 12-Oct-2008 rwatson

When disconnecting a UDPv6 socket, acquire the socket lock around the
changing of the so_state field, as is done in UDPv4. Remove XXX
locking comment.

MFC after: 3 days


183611 04-Oct-2008 bz

Style changes: compare pointer to NULL and move a }.

MFC after: 6 weeks


183606 04-Oct-2008 bz

Cache so_cred as inp_cred in the inpcb.
This means that inp_cred is always there, even after the socket
has gone away. It also means that it is constant for the lifetime
of the inp.
Both facts lead to simpler code and possibly less locking.

Suggested by: rwatson
Reviewed by: rwatson
MFC after: 6 weeks
X-MFC Note: use a inp_pspare for inp_cred


183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


183529 02-Oct-2008 cperciva

Default to ignoring potentially evil IPv6 Neighbor Solicitation
messages.

Approved by: so (cperciva)
Approved by: re (kensmith)
Security: FreeBSD-SA-08:10.nd6
Thanks to: jinmei, bz


183265 22-Sep-2008 rwatson

When invoking the udp_send() from udp6_send() due to use of a v6-mapped
IPv4 address, first drop the udbinfo and inpcb locks, which will otherwise
be recursed. This leads to a potential minor race, but is preferable to a
deadlock when acquiring a read lock after a write lock on the inpcb.

MFC after: 3 days
Reported by: Norbert Papke <fbsd-ml@scrapper.ca>, lioux


182915 10-Sep-2008 bz

mld_timerresid() returns ms so instead of doing the maths in usec
and then dividing down to ms, do the maths in ms.

Obtained from: NetBSD mld6.c rev. 1.47
MFC after: 2 months


182740 03-Sep-2008 simon

- Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by: kib [08:07], csjp [08:08], bz [08:09]
Reviewed by: peter [08:07], jhb [08:07]
Reviewed by: jinmei [08:09], rwatson [08:09]
Approved by: re (SA blanket)
Approved by: so (simon)
Security: FreeBSD-SA-08:07.amd64
Security: FreeBSD-SA-08:08.nmount
Security: FreeBSD-SA-08:09.icmp6


182713 03-Sep-2008 bz

Fix a bug, when a specially crafted ICMPV6 MLD packet could lead
to an integer divide by zero panic in the kernel, if the kernel was
run with hz<1000.
Neither i386, pc98, amd64 or sparc64 are affected in the currently
supported branches and default configuration.

Submitted by: Miikka Saukko, Ossi Herrala and Jukka Taimisto from
the CROSS project at Codenomicon Ltd. via CERT-FI.
Reviewed by: bz, rwatson
Security: CVE-2008-2464
MFC after: 8 hours


182537 31-Aug-2008 rwatson

In UDPv6, reduce scope of global udbinfo lock during append to last
matching socket by dropping it before udp6_append(), and remove
duplicate unlocks of udbinfo and inpcb in sysctl return path.

MFC after: 3 days


182150 25-Aug-2008 julian

another missed V_


181888 20-Aug-2008 julian

Fix some of the formatting fixes.. It's amazing how some thing stand out
in a commit message.


181887 20-Aug-2008 julian

A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.


181832 18-Aug-2008 bz

As part of step 1.5 of the vimage framework resolve conflicts with
file local static globals which would be folded onto the same name
with the V_ macros.

Reviewed by: kris, brooks, simon


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181782 16-Aug-2008 bz

Fix a regression introduced in r179289 splitting up ip6_savecontrol()
into v4-only vs. v6-only inp_flags processing.
When ip6_savecontrol_v4() is called from ip6_savecontrol() we
were not passing back the **mp thus the information will be missing
in userland.
Istead of going with a *** as suggested in the PR we are returning
**mp now and passing in the v4only flag as a pointer argument.

PR: kern/126349
Reviewed by: rwatson, dwmalone


180990 30-Jul-2008 rwatson

Adopt the slightly weaker consistency locking approach used in IPv4 raw
sockets for IPv6 raw sockets: separately lock the inpcb for determining
the destination address for a connect()'d raw socket at the rip6_send()
layer, and then re-acquire the inpcb lock in the rip6_output() layer to
query other options on the socket. Previously, the global raw IP socket
lock was used, which while correct and marginally more consistent, could
add significantly to global raw IP socket lock contention.

MFC after: 1 week


180968 29-Jul-2008 rwatson

When copying in and out current ICMPv6 filters on a raw IPv6 socket,
lock the inpcb and use a local stack variable to copy to/from userspace
so that sooptcopyin()/sooptcopyout() aren't called while holding an
rwlock.

While here, fix a bug in which a failed sooptcopyin() might lead to
partially consistent ICMPv6 filters on the socket by not ignoring the
error returned by sooptcopyin().

MFC after: 2 weeks


180965 29-Jul-2008 rwatson

Since we fail IPv6 raw socket allocation if inp->in6p_icmp6filt can't
be allocated, there's no need to conditionize use and freeing of it
later.

MFC after: 1 week


180957 29-Jul-2008 rwatson

Marginally decomplicate set/getsockopt code in ip6_output.c by simply
using the passed arguments explicitly and unconditionally rather than
testing them and calling panic(). The result is the same but easier
to read.

MFC after: 3 days


180932 28-Jul-2008 mav

Move inpcb lock higher to protect some nonbinding fields reading.
It fixes nothing at this time, but decided to be more correct.


180850 27-Jul-2008 mav

According to in_pcb.h protocol binding information has double locking.
It allows access it while list travercing holding only global pcbinfo lock.


180427 10-Jul-2008 bz

Pass the ucred along into in{,6}_pcblookup_local for upcoming
prison checks.

Reviewed by: rwatson


180425 10-Jul-2008 bz

For consistency take lport as u_short in in{,6}_pcblookup_local.
All callers either pass in an u_short or u_int16_t.

Reviewed by: rwatson


180387 09-Jul-2008 rrs

1) Adds the rest of the VIMAGE change macros
2) Adds some __UserSpace__ on some of the common defines that
the user space code needs
3) Fixes a bug when we send up data to a user that failed. We
need to a) trim off the data chunk headers, if present, and
b) make sure the frag bit is communicated properly for the
msgs coming off the stream queues... i.e. we see if some
of the msg has been taken.

Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julain!


180386 09-Jul-2008 bz

Document required locking in in6_sleectsrc() in case an inp is
passed in by adding an assert.

Requested by: rwatson
Reviewed by: rwatson


180371 08-Jul-2008 bz

Change the parameters to in6_selectsrc():
- pass in the inp instead of both in6p_moptions and laddr.
- pass in cred for upcoming prison checks.

Reviewed by: rwatson


180365 08-Jul-2008 rwatson

Use soreceive_dgram() and sosend_dgram() with UDPv6, as we do with UDPv4.

Tested by: ps
MFC after: 3 months


180343 07-Jul-2008 rwatson

Drop read lock on udbinfo earlier during delivery to the last matching
UDP socket for a datagram; the inpcb read lock is sufficient to provide
inpcb stability during udp6_append().

MFC after: 1 month


180305 05-Jul-2008 rwatson

Improve approximation of style(9) in raw socket code.


180291 05-Jul-2008 rwatson

Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks


180239 04-Jul-2008 rwatson

Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred). Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired. This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by: bz
MFC after: 1 month
X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
but the rest can go back.


180214 03-Jul-2008 rwatson

Remove GIANT_REQUIRED from IPv6 input, forward, and frag6 code. The frag6
code is believed to be MPSAFE, and leaving aside the IPv6 route cache in
forwarding, Giant appears not to adequately synchronize the data structures
in the input or forwarding paths.


180197 02-Jul-2008 rwatson

Set the IPv6 netisr handler as NETISR_MPSAFE on the basis that, despite
there still being some well-known races in mld6 and nd6, running with
Giant over the netisr handler provides little or not additional
synchronization that might cause mld6 and nd6 to behave better.


180090 29-Jun-2008 bz

Try to fix errors introduced in svn180085/cvs rev. 1.10:

* Include ip6_var.h for ip6stat.
* Use the correct name under ip6stat: `ip6s_cantforward' instead
of its IPv4 counterpart.

MFC after: 10 days


180088 29-Jun-2008 kan

Repair botched variable rename.

Pointy hat to: julian


180085 29-Jun-2008 julian

Oops, we've been incrementing the wrong cantforward variable.

Obtained from: vimage tree


180084 29-Jun-2008 julian

Rename two vars so that they are different from the same vars in ipv4.
They are static so it was not a problem 'per se' but it was confusing to
the reader.

Obtained from: vimage tree


179783 14-Jun-2008 rrs

- Macro-izes the packed declaration in all headers.
- Vimage prep - these are major restructures to move
all global variables to be accessed via a macro or two.
The variables all go into a single structure.
- Asconf address addition tweaks (add_or_del Interfaces)
- Fix rwnd calcualtion to be more conservative.
- Support SACK_IMMEDIATE flag to skip delayed sack
by demand of peer.
- Comment updates in the sack mapping calculations
- Invarients panic added.
- Pre-support for UDP tunneling (we can do this on
MAC but will need added support from UDP to
get a "pipe" of UDP packets in.
- clear trace buffer sysctl added when local tracing on.

Note the majority of this huge patch is all the vimage prep stuff :-)


179412 29-May-2008 rwatson

Employ read locks on UDP inpcbs, rather than write locks, when
monitoring UDP connections using sysctls. In some cases, add
previously missing locking of inpcbs, as inp_socket is followed,
which also allows us to drop global locks more quickly.

MFC after: 1 week


179289 24-May-2008 bz

Factor out the v4-only vs. the v6-only inp_flags processing in
ip6_savecontrol in preparation for udp_append() to no longer
need an WLOCK as we will no longer be modifying socket options.

Requested by: rwatson
Reviewed by: gnn
MFC after: 10 days


179157 20-May-2008 rrs

- Adds support for the multi-asconf (From Kozuka-san)
- Adds some prepwork (Not all yet) for vimage in particular
support the delete the sctppcbinfo.xx structs. There is
still a leak in here if it were to be called plus we stil
need the regrouping (From Me and Michael Tuexen)
- Adds support for UDP tunneling. For BSD there is no
socket yet setup so its disabled, but major argument
changes are in here to emcompass the passing of the port
number (zero when you don't have a udp tunnel, the default
for BSD). Will add some hooks in UDP here shortly (discussed
with Robert) that will allow easy tunneling. (Mainly from
Peter Lei and Michael Tuexen with some BSD work from me :-D)
- Some ease for windows, evidently leave is reserved by their
compile move label leave: -> out:

MFC after: 1 week


178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


178419 22-Apr-2008 rwatson

Acquire a read lock, rather than a write lock, on a UDPv6 inpcb when
delivering to the socket or extracting socket details for monitoring
purposes.

MFC after: 3 months


178378 21-Apr-2008 rwatson

In ICMPv6, read lock rather than write lock the inpcb on receive.

MFC after: 3 months


178377 21-Apr-2008 rwatson

With IPv4 raw sockets, read lock rather than write lock the inpcb when
receiving or transmitting.

With IPv6 raw sockets, read lock rather than write lock the inpcb when
receiving. Unfortunately, IPv6 source address selection appears to
require a write lock on the inpcb for the time being.

MFC after: 3 months


178320 19-Apr-2008 rwatson

When querying a local or remote address on an IPv6 socket, use only a
read lock on the inpcb.

MFC after: 3 months


178285 17-Apr-2008 rwatson

Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after: 3 months
Tested by: kris (superset of committered patch)


178201 14-Apr-2008 rrs

- Have SCTP use the new pru_flush functionality

PR: 122710
MFC after: 1 week


178167 13-Apr-2008 qingli

This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by: robert, sam, gnn, julian, kmacy


177961 06-Apr-2008 rwatson

In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and
eliminate unnecessary local variable caching of the list head pointer,
making the code a bit easier to read.

MFC after: 3 weeks


177599 25-Mar-2008 ru

Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by: arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.


177175 14-Mar-2008 bz

Correct IPsec behaviour with a 'use' level in SP but no SA available.
In that case return an continue processing the packet without IPsec.

PR: 121384
MFC after: 5 days
Reported by: Cyrus Rahman (crahman gmail.com)
Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]


177167 14-Mar-2008 bz

Correct reference counting on the SP for outgoing IPv6 IPsec connections.

PR: 121374
Reported by: Cyrus Rahman (crahman gmail.com)
Tested by: Cyrus Rahman (crahman gmail.com)
MFC after: 5 days


177166 14-Mar-2008 bz

#if 0 out a currently unsued (and incomplete) function: ip6_ipsec_mtu().
No need to compile 'dead' code.
I am leaving it in because we will have to review the concept and
should use the common function in various places.

MFC after: 5 days


177165 14-Mar-2008 bz

Replace the function name in two identical printfs
by __func__, __LINE__ so we can distinguish them
when people report a problem.

PR: 121373
MFC after: 5 days


175892 02-Feb-2008 bz

Rather than passing around a cached 'priv', pass in an ucred to
ipsec*_set_policy and do the privilege check only if needed.

Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.

Reviewed by: rwatson


175630 24-Jan-2008 bz

Replace the last susers calls in netinet6/ with privilege checks.

Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by: rwatson (older version, before addressing his comments)


175512 20-Jan-2008 bz

Correct the commented out debugging printf()s in REPLACE and NEXT macros.
ip6_sprintf() needs a buffer as first argument these days.

MFC after: 2 weeks


175162 08-Jan-2008 obrien

un-__P()


174717 17-Dec-2007 rwatson

Fix leaking MAC labels for IPv6 inpcbs by adding missing MAC label
destroy call; this transpired because the inpcb alloc path for IPv4/IPv6
is the same code, but IPv6 has a separate free path. The results was
that as new IPv6 TCP connections were created, kernel memory would
gradually leak.

MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi


174510 10-Dec-2007 obrien

Clean up VCS Ids.


174376 06-Dec-2007 julian

Remove more dup'd code
MFC After: 1 week


174375 06-Dec-2007 julian

remove duped code

Reviewed By: gnn
MRC after: 1 week


173824 21-Nov-2007 mtm

Instead of manually freeing the packet options structure (and not even doing
a good job of it) in the copypktopts() function, just call ip6_clearpktopts()
directly. Otherwise, the callers of this function would end up freeing the
memory twice.

Reviewed by: jinmei
PR: kern/116360


173095 28-Oct-2007 rwatson

Move towards more explicit support for various network protocol stacks
in the TrustedBSD MAC Framework:

- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.

- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.

- Add expliict entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.

- Remove previous genering link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.

- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.

- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.

Obtained from: TrustedBSD Project


173018 26-Oct-2007 rwatson

Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as
we move towards netinet as a pseudo-object for the MAC Framework.

Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


172885 22-Oct-2007 jhb

Close a race when trying to lookup a gateway route in rt_check().
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route. The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow. Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock. Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread. In practice this would probably just
fix a lost reference that would result in a route never being freed.

This fixes panics observed in rt_check() and rtexpunge().

MFC after: 1 week
PR: kern/112490
Insight from: mehuljv at yahoo.com
Reviewed by: ru (found the "not-setting it to NULL" part)
Tested by: several


172156 13-Sep-2007 rrs

- Incorrect error EAGAIN returned for invalid send on a locked
stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of
static (this is needed for Panda).

Approved by: re@freebsd.org (B Mah)


172091 08-Sep-2007 rrs

- send call has a reference to uio->uio_resid in
the recent send code, but uio may be NULL on sendfile
calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
and needs a small tweak to look if the msg would
ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
cause a panic. This is a follow on to the codenomicon
fix.
- PDAPI level 1 and 2 do not work unless the reader
gets his returned buffer full. Fix so we can break
out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
accepted sockets
- Fix sctp_peeloff() system call when no true system call
exists to screen arguments for errors. In cases where a
real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
socket is closing.
- When deleting an address verify the interface is correct
before allowing the delete to process. This protects panda
and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
the assoc has not gone to about to be freed. If so
(in the middle) abort out. Note this especially effects
MAC I think due to the lock/unlock they do (or with
LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
destinations. This especially effect CMT but also makes a
impact on regular SCTP as well.
- During init collision when we detect seq number out
of sync we need to treat it like Case C and discard
the cookie (no invarient needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
into the time wait hash to prevent further use. When
the tag is put into the assoc hash, we need to remove it
from the twait hash (where it will surely be). This prevents
duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
as src addr.
- in icmp handling fixed so we actually look at the icmp codes
to figure out what to do.
- Modified mobility code.
Reception of DELETE IP ADDRESS for a primary destination and
SET PRIMARY for a new primary destination is used for
retransmission trigger to the new primary destination.
Also, in this case, destination of chunks in send_queue are
changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
mode set upon it.

Approved by: re@freebsd.org (B Mah)


172090 08-Sep-2007 rrs

- Locking compatiability changes. This involves adding
additional flags to many function calls. The flags only
get used in BSD when we compile with lock testing. These
flags allow apple to escape the "giant" lock it holds on
the socket and have more fine-grained locking in the NKE.
It also allows us to test (with witness) the locking used
by apple via a compile switch (manually applied).

Approved by: re@freebsd.org(B Mah)


172087 08-Sep-2007 rwatson

Continue UDP/UDPv6 synchronization project:

- Fix copyrights, comments in UDPv6.
- Remove macro defines for in6pcb and udp6stat.
- Consistently refer to inpcbs as 'inp' and not also 'in6p'.

Reviewed by: gnn, jinmei, bz
Approved by: re (bmah)


171990 27-Aug-2007 rrs

- During shutdown pending, when the last sack came in and
the last message on the send stream was "null" but still
there, a state we allow, we could get hung and not clean
it up and wait for the shutdown guard timer to clear the
association without a graceful close. Fix this so that
that we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
(so far) accept input of these and cannot yet generate
a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
ABORT in this case.
- According to the Kyoto summit of socket api developers
(Solaris, Linux, BSD). We need to have:
o non-eeor mode messages be atomic - Fixed
o Allow implicit setup of an assoc in 1-2-1 model if
using the sctp_**() send calls - Fixed
o Get rid of HAVE_XXX declarations - Done
o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks just always checksums same for
shutdown-complete.
- inpcb_free front state had a bug where in queue
data could wedge an assoc. We need to just abandon
ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
assemble a response packet which may be larger than
64k. This then would be dropped by IP. Instead make
a "minimum" size for us 64k-2k (we want at least
2k for our initack). If we receive such an init
discard it early without all the processing.
- When we peel off we must increment the tcb ref count
to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
when given faulty data, fixed so can't happen and we
also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
(in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
to null if the iterator is calling the output routine. This
means we would possibly crash when we gather the MTU info.
Fix so we only do the gather where we have a src address
cached.
- we need to be sure to set abort flag on conn state when
we receive an abort.
- peeloff could leak a socket. Moved code so the close will
find the socket if the peeloff fails (uipc_syscalls.c)

Approved by: re@freebsd.org(Ken Smith)


171943 24-Aug-2007 rrs

- Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
So that it does not send the "null" data mbuf out and cause
it to get freed twice.
- Fix so auto-asconf sysctl actually effect the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed) submitted
found and reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked >= not just > to conform to socket api
draft this fixes sctp_indata.c in the two places need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
that is not in the association with a valid assoc_id. This meant
you got or set the stcb level values instead of the destination
you thought you were going to get/set. Now validate if the
stcb is non-null and the net is NULL that the sa_family is
set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
at the exact time it was running. rework the worker thread to
use the increment/decrement to prevent this and no longer use
the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
had a locking problem where locks were not in place where they
should have been.
- Free association calls were not testing the return value in
sctp_inpcb_free() properly... others should be cast void returns
where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
we should not do the actual free.. but instead let the timer
free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
flag is set, we must NOT process the packet but handle it like
ootb. This is because while freeing an assoc we release the
locks to get all the higher order locks so we can purge all
the hash tables. This leaves a hole if a packet comes in
just at that point. Now sctp_common_input_processing() will
call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
it so we don't have a conflict (I think this is a covertity change).
We made this change AFTER some conversation and looking to make sure
that M_PROTO5 does not have a problem between SCTP and the 802.11
stuff (which is the only other place its used).
- Fixed lock order reversal and missing atomic protection around
locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to a ABORT() with bad checksum with
a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT
send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
you need better performing during debugging). Note no commands
are here to get the log info, you must just use kgdb.
- The timer code needs to be aware of if it needs to call
sctp_sack_check() to slide the maps and adjust the cum-ack.
This is because it may be out of sync cum-ack wise.
- Added threshold managment logging.
- If the user picked just the right size, that just filled the send
window minus one mtu, we would enter a forever loop not copying and
at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
to 1.
- My rwnd control was not being used to control the rwnd properly (we
did not add and subtract to it :-() this is now fixed so we handle
small messages (1 byte etc) better to bring our rwnd down more
slowly.

Approved by: re@freebsd.org (Bruce Mah)


171732 05-Aug-2007 bz

Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
Also rename the related functions in a similar way.
There are no functional changes.

For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.

With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.

The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.

Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)


171609 27-Jul-2007 rwatson

Continue effort to improve parity between UDPv4 and UDPv6: add a missing
scope security check for the UDPv6 socket credential lookup service,
allowing security policies to bound access to credential information.
While not an immediate issue for Jail, which doesn't allow use of UDPv6,
this may be relevant to other security policies that may wish to control
ident lookups.

While here, eliminate a very unlikely panic case, in which a socket in
the process of being freed is inspected by the sysctl.

Approved by: re (kensmith)
Reviewed by: bz


171572 24-Jul-2007 rrs

- take out a needless panic under invariants for sctp_output.c
- Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than
SCTP_SMALL_IOVEC_SIZE
- re-add back inpcb_bind local address check bypass capability
- Fix it so sctp_opt_info is independant of assoc_id postion.
- Fix cookie life set to use MSEC_TO_TICKS() macro.
- asconf changes
o More comment changes/clarifications related to the old local address
"not" list which is now an explicit restricted list.

o Rename some functions for clarity:
- sctp_add/del_local_addr_assoc to xxx_local_addr_restricted()
- asconf related iterator functions to sctp_asconf_iterator_xxx()

o Fix bug when the same address is deleted and added (and removed from
the asconf queue) where the ifa is "freed" twice refcount wise,
possibly freeing it completely.

o Fix bug in output where the first ASCONF would not go out after the
last address is changed (e.g. only goes out when retransmitted).

o Fix bug where multiple ASCONFs can be bundled in the same packet with
the and with the same serial numbers.

o Fix asconf stcb iterator to not send ASCONF until after all work
queue entries have been processed.

o Change behavior so that when the last address is deleted (auto asconf
on a bound all endpoint) no action is taken until an address is
added; at that time, an ASCONF add+delete is sent (if the assoc
is still up).

o Fix local address counting so that address scoping is taken into
account.

o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending
of ASCONF (after an RTO). The default now is to send
ASCONF immediately (except for the case of changing/deleting the
last usable address).
Approved by: re(ken smith)@freebsd.org


171552 23-Jul-2007 rwatson

Continue effort to align UDPv4 and UDPv6 implementations by merging
udp6_output() from udp6_output.c to udp6_usrreq.c, matching the UDPv4
structure, and allowing us to remove udp6_output.c.

Reviewed by: bz, gnn
Approved by: re (bmah)


171531 21-Jul-2007 rrs

- remove duplicate code from sctp_asconf.c
- remove duplicate #include <sys/priv.h> that is not under
#ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
o some SA_IGNORE comments
o some checks for NULL before unlock.
o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this
we pass a NULL in to bind on implicit assoc setup and crash :-(
Approved by: re@freebsd.org(Ken Smith)


171508 19-Jul-2007 rwatson

Attempt to improve feature parity between UDPv4 and UDPv6 by merging
UDPv4 features to UDPv6:

- Add MAC checks on delivery and MAC labeling on transmit.
- Check for (and reject) datagrams with destination port 0.
- For multicast delivery, check the source port only if the socket being
considered as a destination has been connected.
- Implement UDP blackholing based on net.inet.udp.blackhole.
- Add a new ICMPv6 unreachable reply rate limiting category for failed
delivery attempts and implement rate limiting for UDPv6 (submitted by
bz).

Approved by: re (kensmith)
Reviewed by: bz


171496 19-Jul-2007 bz

Restore behavior changed with rev. 1.46 and make
IPV6_IPSEC_POLICY always visible again. This unbreaks some
third party user space applications.

PR: 114491
Reported by: sumikawa
Reviewed by: sumikawa
Approved by: re (hrs)


171477 17-Jul-2007 rrs

- added pre-checks to the bindx call.
- use proper tick gathering macro instead of ticks directly.
- Placed reasonable boundaries on sets that a user can do
that are converted to ticks from ms.
- Fix CMT_PF to always check to be sure CMT is on.
- Fix ticks use of CMT_PF.
- put back code to allow asconfs to be queued while INITs are in flight
and before the assoc is established.
- During window probes, an ack'd packet might be left with the window
probe mark on it causing it to be retransmitted. Change so that
the flight decrease macro clears the window_probe mark.
- Additional logging flight size/reading and ASOC LOG. This
is only enabled if you manually insert things into opt_sctp.h
since its a set of debug code only.
- Found an interesting SMP race in the way data was appended which
could cause a reader to lose a part of a message, had to
reorder when we marked the message was complete to after
the data was appended.
- bug in ADD-IP for the subset bound socket case when the peer has only
one address
- fix ASCONF implicit success/error handling case
- proper support of jails in Freebsd 6>
- copy out the timeval for the 64 bit sparc world on cookie-echo
alignment error crashes without this).
Approved by: re(Ken Smith)


171440 14-Jul-2007 rrs

- Modular congestion control, with RFC2581 being the default.
- CMT_PF states added (w/sysctl to turn the PF version on)
- sctp_input.c had a missing incr of cookie case when the
auth was bad. This meant a free was called without an
increment to refcnt, added increment like rest of code.
- There was a case, unlikely, when the scope of the destination
changed (this is a TSNH case). In that case, it would not free
the alloc'ed asoc (in sctp_input.c).
- When listed addresses found a colliding cookie/Init, then
the collided upon tcb was not unlocked in sctp_pcb.c
- Add error checking on arguments of sctp_sendx(3) to prevent it from
referencing a NULL pointer.
- Fix an error return of sctp_sendx(3), it was returing
ENOMEM not -1.
- Get assoc id was changed to use the sanctified socket api
method for getting a assoc id (PEER_ADDR_INFO instead of
PEER_ADDR_PARAMS).
- Fix it so a peeled off socket will get a proper error return
if it trys to send to a different address then it is connected to.
- Fix so that select_a_stream can avoid an endless loop that
could hang a caller.
- time_entered (state set time) was not being set in all cases
to the time we went established.
Approved by: re(ken smith)


171328 09-Jul-2007 rwatson

General style, white space, and comment cleanup; move to ANSI C
prototypes, don't use register, etc. Synchronize structure and
layout to the IPv4 versions of these functions to a greater extent,
making visual comparison easier.

Remove now stale or incorrect comments.

Enable full lock assertions, and correct one exception handling
case where the wrong label was jumped to.

Tested by: bz
Approved by: re (bmah)


171260 05-Jul-2007 delphij

Space cleanup

Approved by: re (rwatson)


171259 05-Jul-2007 delphij

ANSIfy[1] plus some style cleanup nearby.

Discussed with: gnn, rwatson
Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1]
Approved by: re (rwatson)


171237 05-Jul-2007 peter

Fix a stray splx() that caused a new warning.

Approved by: re (rwatson)


171232 05-Jul-2007 peter

Fix 'assignment used as truth value' warning

Approved by: re (rwatson)


171198 04-Jul-2007 gnn

Remove a last, dangling, file from the Kame IPsec code.

Approved by: re
Spotted by: rwatson, bz


171173 03-Jul-2007 mlaier

Link pf 4.1 to the build:
- move ftp-proxy from libexec to usr.sbin
- add tftp-proxy
- new altq mtag link

Approved by: re (kensmith)


171167 03-Jul-2007 gnn

Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing


171148 02-Jul-2007 gnn

Removing old, dead, KAME IPsec files as part of the move to the
new FAST_IPSEC based IPsec stack.

Approved by: re
Reviewed by: bz


171136 01-Jul-2007 gnn

Follow on cleanup and removal of two unnecessary include files.

Reviewed by: bz
Approved by: re
Supported by: Secure Computing


171133 01-Jul-2007 gnn

Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by: bz
Approved by: re
Supported by: Secure Computing


170862 17-Jun-2007 mjacob

gcc4.2 somehow doesn't believe that finaldst can stay stable between
where it's initialized and where it's checked twice such that the
origingal destination address is saved. Make it happier and trim
things down a bit.


170859 17-Jun-2007 rrs

- For sctp_input/sctp6_input add announcment when a packet arrives (debug)
- re-factor the packet drop in sctp_output a bit more, we don't need the
trim after all, but the size calc is now corrected.
- When a assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user
closes, it should not matter if data is queued, the assoc should be
purged.
- In error leg a missing free_chunk when iph comes in NULL (should not
happen but just in case).


170801 15-Jun-2007 mjacob

Garbage collect unused variables.


170744 14-Jun-2007 rrs

- Fix so ifn's are properly deleted when the ref count goes to 0.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
of the logging #ifdef's with a few exceptions reducing
the number of config options for SCTP.


170689 13-Jun-2007 rwatson

Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by: kris


170613 12-Jun-2007 bms

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


170606 12-Jun-2007 rrs

- Restructure so bindx functions are not done inline to socket option
but are a seperate call that can be re-used if needed.
- 64 bit issues
o re-arrange cookie so it is better 64 bit aligned
o For wire level things we need the packed attribute.


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170275 04-Jun-2007 jinmei

cleanup about the reassembly structures and routine:
- removed unused structure members
- fixed a minor bug that the ECN code point may not be restored correctly

Approved by: ume (mentor)
MFC after: 1 week


170205 02-Jun-2007 rrs

- fix initial pcb vrf setting when the initial vrf is not the
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
unique across systems (including different VRFs) this makes it easier
to do ifn lookups.


170202 02-Jun-2007 jinmei

fixed memory leak for IPv6 multicast membership information associated
with interface addresses.

Approved by: gnn (mentor)
MFC after: 1 week


170200 02-Jun-2007 jinmei

simplified the fix in rev. 1.69 by replacing RT_REMREF+RT_UNLOCK with
RTFREE_LOCKED.

Approved by: gnn (mentor)


170181 01-Jun-2007 rrs

- Take out the broken table-id concept. Panda Routers have a M-VRF
concept that is NOT well thought out for a multi-homed transport
protocol. So the useless table-id entries passed around need to
be removed.
- Add a event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
with the largest ssthresh (the compare was wrong).


170091 29-May-2007 rrs

- Fixes so we won't try to start a timer when we
hold a wq lock for the iterator. Panda uses a
silly recursive lock they hold through the timer.
- Add poor mans wireshark compile option..
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.

Reviewed by: gnn


170056 28-May-2007 rrs

- fixed autclose to not allow setting on 1-2-1 model.
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
no chance of a endless loop. Only one call to random per bind
as well.
- fixes so set_peer_primary pre-screens addresses to be
valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
to thus track and use an association value instead of a inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on off not
the timer.. since the timer may still run if unconf address
are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
of the non-pad version. This saves about 92 bytes from each struct
by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
tcb's value.
- The delayed ack time could be under a tick, this fixes so
it bounds it to at least 1 tick for platforms whos tick
is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Defered stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details

Reviewed by: gnn


169975 25-May-2007 jinmei

do not directly call rtfree() to meet an assumption in the callee.
(this fix suppresses a warning message appearing in the boot time on
IPv6-enabled systems)

Approved by: gnn (mentor)


169832 21-May-2007 cognet

Force the alignment of the chars arrays, as they are casted later to
structs.
gcc 4.2 doesn't do it by default, and that results in unaligned access on
arm.

Reviewed by: gnn, imp


169664 17-May-2007 jinmei

- Disabled responding to NI queries from a global address by default as
specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the
feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
flags is clearer (i.e., defined specific macro names instead of using
hard-coded values).

Approved by: gnn (mentor)
MFC after: 1 week


169655 17-May-2007 rrs

- Fixed 1-2-1 model to not worry about associd in sockopts
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if its 0, On the assoc
level set we changed it to 0 opps.
- Two more static analysis issues identified by the cisco
tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
input argument.
- When sending failed due to a no route to host, we leaked
the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by: gnn


169557 14-May-2007 jinmei

handle IPv6 router alert option contained in an incoming packet per
option value so that unrecognized options are ignored as specified in RFC2711.
(packets containing an MLD router alert option are passed to the upper layer
as before).

Approved by: gnn (mentor), ume (mentor)


169462 11-May-2007 rwatson

Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.


169432 09-May-2007 mjacob

Need sys/cdevs.h for the macro FBSDID to work.


169425 09-May-2007 gnn

Integrate the Camellia Block Cipher. For more information see RFC 4132
and its bibliography.

Submitted by: Tomoyuki Okazaki <okazaki at kick dot gr dot jp>
MFC after: 1 month


169420 09-May-2007 rrs

Two major items here:
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
a macro that does all of this. This removes all printfs from
the code and makes the code more portable and easier to
read.
- Static Analysis (cisco) - found a few bugs, but mostly we
add checks for NULL pointers and such to make the tool
happy. We now pass the Cisco SA tools checks except for
where it does not understand tailq/lists. We still need
to look at the coverity tools output too (this is like
the cisco SA tool) and see if it wants us to fix any other
items. Hopefully this will be the last major churn in the
code other than bug fixes.


169388 08-May-2007 gnn

Reduce the default number of header options that the IPv6 protocol
stack will process from 50 to 15. As this is a sysctl variable it
can be tuned up or down at the user/administrator's whim.

Submitted by: itojun
MFC after: 1 day


169382 08-May-2007 rrs

- Copyright change, cisco's silly tool wants it to say:
"Copyright (c) 2001-2007, by Cisco Systems,"
instead of
*Copyright (c) 2001-2007, Cisco Systems,"

- Also fix a few straglers that were still in 2006.


169380 08-May-2007 rrs

- Get rid of the sctp_inpcb_free() "magic numbers", now they
are sensible defines that tell what you are directing
the function to do.


169378 08-May-2007 rrs

- Static analyisis fixes for cisco's commit (this is equivilant
to the coverity tool.. may even be the same one.. not sure).
- A bug in the way sctp_abort() and friends were
setting the IP_CLOSE flag.. and NOT passing the
last argument as a (,1)... so that things would
get freed..


169352 08-May-2007 rrs

- More macros for OS compatabilty
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth
of TSN's (1 byte msgs).. this left the peer hopelessly
out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now
that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for
BSD, but may be other things later when BSD
fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open
side false notifications.
- Fix so sendfile and SENDALL will work properly (missing
flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can
reuse the helpers.
- When an address is added during system init (non-dynamic mode) make
sure that the "defer use" flag is not set.
** its compiling on XR now :-D **

Reviewed by: gnn


169273 05-May-2007 suz

some minor modification to the previous commit to sys/netinet6/nd6.c and nd6_nbr.c.
- added some clarification comments
- removed an unnecesary code

Obtained from: KAME
MFC after: 1 week


169241 04-May-2007 suz

fixed a memory leak in unresolved ND queue processing

Obtained from: KAME
MFC after: 1 week


169208 02-May-2007 rrs

- Somehow the disable fragment option got lost. We could
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.

Obtained from: (some of the Listen changes from Weongyo Jeong)


169179 01-May-2007 rwatson

Remove unused pcbinfo arguments to in_setsockaddr() and
in_setpeeraddr().


169154 30-Apr-2007 rwatson

Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches. Clean up and comment structure
fields in structure definition.


168970 23-Apr-2007 gnn

Turn off route header processing for now due to issues pointed out
by Philippe Biondi and Arnaud Ebalard. This is a temporary fix
until more discussion can be had on the exact risks involved in
allowing source routing in IPv6

Submitted by: itojun
Reviewed by: jinmei
MFC after: 1 day


168932 21-Apr-2007 rwatson

Teach netinet6 to use PRIV_NETINET_REUSEPORT.


168709 14-Apr-2007 rrs

- fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
if we failed to get init/or/cookie to bring up an assoc. Change
so we don't just give a generic "comm lost" but look at actual
states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truely portable
for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
during window probes on the sent queue. This would then
cause an out-of-order problem and assure that the flight
size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
overruns, flight size off as well as false retransmissions.g
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
was active, fix to make strict sacks a bit smarter about what
the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
commit of at least m_last().. need to adjust for 6.2 as
well.. since m_last won't exist.
Reviewed by: gnn


168627 11-Apr-2007 rwatson

Remove obsolete comment about privileges: SUSER_ALLOWJAIL is no longer set
in this code.


168299 03-Apr-2007 rrs

- fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
packet (1000 addresses).
- flight size correcting updated to include one message only
and to handle case where the peer does not cumack the
next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
since we tried to unlock the destroyed mutex. Fixes
so we properly exit when we need to destroy an assoc.
(Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by: gnn


168191 31-Mar-2007 jhb

Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to that rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd


168124 31-Mar-2007 rrs

- Found bug in min split point bundling which caused
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
both how big a msg must be as well as how much needs
to be left over.
- With our new algo in place, we need to implicitly
set "end of msg" on the sp-> structure otherwise we
end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
will kill the assoc via the kill timer and send an
abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
us from fragmenting a small complete message when
we needed to.
- Window probes were not marked back to unsent and
flight adjusted when a sack came in with no
window change or accepting of the probe data.
We now fix this with having a mark on the net and
the chunk so we can clear it out when the sack arrives
forcing it to retran just like it was "new" this
improves the handling of window probes, which were
dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange


167729 20-Mar-2007 bms

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


167695 19-Mar-2007 rrs

- errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.


167598 15-Mar-2007 rrs

- Sysctl's move to seperate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and loosing potential to
send when we inflate the flight size. We now inflate
the cwnd too and deflate it later when the revoked
chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vfr
via compile switch that is scrubbed from BSD but we won't
need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
a #def (30).

Reviewed by: gnn


167125 28-Feb-2007 bms

Add comments about common idioms for cleanup pass at a later date.


167119 28-Feb-2007 bms

Remove code which would never be used, viz a viz Quality-of-Service;
the token bucket filter got killed in netinet, so it gets killed here
too. Correct comments.


167118 28-Feb-2007 bms

Add a comment about a struct which needs to be global.
Remove an unused global variable.
Staticize variables which do not need to be global.


166948 24-Feb-2007 bms

Fix tinderbox. ip6_mrouter should be defined in raw_ip6.c as it is
tested to determine if the userland socket is open; this, in turn, is
used to determine if the module has been loaded.

Tested with: LINT


166938 24-Feb-2007 bms

Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).


166842 20-Feb-2007 rwatson

Rename two identically named log_in_vain variables: tcp_input.c's static
log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to
udp_log_in_vain.

MFC after: 1 week


166675 12-Feb-2007 rrs

- Copyright updates (aka 2007)
- ZONE get now also take a type cast so it does the
cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just
LIST_EMPTY
- Removal of const in some of the static hmac functions
(not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the
RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered
count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear
after my testing completes. It allows us to keep a
small log on each assoc of the last 40 TSN's in/out and
stream assignment. It is NOT in options and so is only
good for private builds.
- Some CMT changes in prep for Jana fixing his problem
with reneging when CMT is enabled (Concurrent Multipath
Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code
with the old format (in case the peer does not yet support
the latest auth code).
- Nonce sum was incorrectly being set in when ecn_nonce was
NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using
just data pointers. The mbufs were used to harmonize
NetBSD code since both Net and Open used this method. We
have decided to move away from that and more conform to
FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The
cookie_how collision case tracking had an endless loop in
it if you got a second retransmission of a cookie collision
case. This would lock up a CPU .. ugly..
- auth function goes to using size_t instead of int which
conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you
get the data chunk, deliver it and due to the reference to a ch->
that every now and then has been deleted (depending on the postion
in the mbuf) you have an invalid ch->ch.flags.. and thus you don't
advance the stream sequence number.. so you block the stream
permanently. The fix is to make local variables of these guys
and set them up before you have any chance of trimming the
mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in
the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where
we would get an error on receiving with this. Thats because
it was NOT padded to the same size as the snd_rcv info. We
increase (add the pad) so the two structs are the same size
in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for
socket options was to cast the pointer and validate the size.
This as been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing
flag type (the next msg is a notification) and a missing
scope recovery was also fixed.

Reviewed by: gnn


166619 10-Feb-2007 bms

In the ICMP6 path to handle FQDN 'who-are-you' queries, check that the
packet header mbuf is non-NULL before trying to create a duplicate of it.

PR: 95957
Reviewed by: ume
MFC after: 3 days


166511 05-Feb-2007 bms

MFC after: 3 days


166456 03-Feb-2007 ume

ng_iface requiers neighbor cache as well.

MFC after: 3 days


166266 26-Jan-2007 bmah

Revert nd6.c revs. 1.67, 1.68, 1.69, 1.70 in an attempt to unbreak
IPv6 over point-to-point gif(4) tunnels.

These revisions caused a host route to the destination of a
point-to-point gif(4) interface to not get installed when the interface
and destination addresses were assigned. This caused
"no route to host" errors when trying to send traffic over the
interface. The first packet arriving inbound over the tunnel,
however, would cause the correct route to get installed, allowing
subsequent outbound traffic to be routed correctly.

gif(4) interfaces with prefix lengths of less than 128 bits
(i.e. no explicit destination address assigned) were not affected
by this bug.

This bug fix is a possible candidate for a 6.2-RELEASE errata note.

Approved by: jhay (original committer)
Discussed with: jhay, JINMEI Tatuya
MFC after: 3 days


166086 18-Jan-2007 rrs

- most all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
- Count error fixed in mapping array which might
cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using
a pre-formed connection and then completing the
work so accept won't happen on a non-formed
association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.

Approved by: gnn


166046 16-Jan-2007 ume

Avoid infinite loop if nicmp6 and nip6 are not on the same mbuf.
NetBSD PR 34994+35333

MFC after: 3 days


166023 15-Jan-2007 rrs

- Macroizes the V6ONLY flag check.
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
unsigned long and really is a uint32_t
- Got rid of the user of MHeaders until they
are truely needed by lower layers.
- Fixed an initialization problem in the readq structure
(ordering was off).
- Found yet another collision bug when the random number
generator returns two numbers on one side (during a collision)
that are the same. Also added some tracking of cookies
that will go away when we know that we have the last collision
bug gone.
- Fixed an init bug for book_size_scale, that was causing
Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
Early FR but due to above bug also effected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
shuting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was a
ordering of locks problem in the way we did timers. It
now conforms to the timeout(9) manual (except for the
_drain part, we had to do this a different way due
to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET
depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
error when the connection was reset.
- Purged an unused clone macro.
- Fixed a read erro code issue where we were NOT getting the proper
error when the connection was reset.
Approved by: gnn


165965 12-Jan-2007 imp

Marked these as packed correctly


165647 29-Dec-2006 rrs

a) macro-ization of all mbuf and random number
access plus timers. This makes the code
more portable and able to change out the
mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only
the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to <b> aka
worrying about pkthdr's :-)
d) There was one last reorder problem it looks where
if a restart occur's and we release and relock (at
the point where we setup our alias vtag) we would
end up possibly getting the wrong TSN in place. The
code that fixed the TSN's just needed to be shifted
around BEFORE the release of the lock.. also code that
set the state (since this also could contribute).
Approved by: gnn


165287 16-Dec-2006 bz

In ip6_sprintf print the addresses in a more common/readable
format eliminating leading zeros like in :0001 -> :1.

Reviewed by: mlaier


165220 14-Dec-2006 rrs

1) Fixes on a number of different collision case LOR's.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to
the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by: gnn


165118 12-Dec-2006 bz

MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.


164604 25-Nov-2006 ru

- In nd6_rtrequest(), when caching an rtentry, don't forget
to add a reference to it; otherwise, we could later access
a freed memory. This is believed to fix panics some users
were observing when running route6d(8), and is similar to
the fix in sys/netinet/if_ether.c,v 1.139 by glebius@.

PR: kern/93910, kern/105437
Testing by: Wojciech Puchar (still ongoing)

- Add rtentry locking to nd6_output() similar to rt_check().

MFC after: 4 days


164085 08-Nov-2006 rrs

-Fixes first of all the getcred on IPv6 and V4. The
copy's were incorrect and so was the locking.
-A bug was also found that would create a race and
panic when an abort arrived on a socket being read
from.
-Also fix the reader to get MSG_TRUNC when a partial
delivery is aborted.
-Also addresses a couple of coverity caught error path
memory leaks and a couple of other valid complaints
Approved by: gnn


164039 06-Nov-2006 rwatson

Convert three new suser(9) calls introduced between when the priv(9)
patch was prepared and committed to priv(9) calls. Add XXX comments
as, in each case, the semantics appear to differ from the TCP/UDP
versions of the calls with respect to jail, and because cr_canseecred()
is not used to validate the query.

Obtained from: TrustedBSD Project


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163996 05-Nov-2006 rrs

Tons of fixes to get all the 64bit issues removed.
This also moves two 16 bit int's to become 32 bit
values so we do not have to use atomic_add_16.
Most of the changes are %p, casts and other various
nasty's that were in the orignal code base. With this
commit my machine will now do a build universe.. however
I as yet have not tested on a 64bit machine .. it may not work :-(


163954 03-Nov-2006 rrs

Opps... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I
inserted a few to the new files.. but I falied to
add the #include <sys/cdef.h>

Which causes a compile error.. sorry about that... got it
now :-)

Approved by:gnn


163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163308 13-Oct-2006 ume

Make net.inet6.ip6.auto_linklocal tunable. Someone may want to
enable/disable auto_linklocal even in single user mode.

Discussed with: re@, gnn@
MFC after: 3 days


163306 13-Oct-2006 ume

Revert the default value of net.inet6.ip6.auto_linklocal to 1.
If ipv6_enable is not set to "YES", net.inet6.ip6.auto_linklocal
is turned to 0 at boot.

Discussed with: re@, gnn@
MFC after: 3 days


162973 02-Oct-2006 jhay

Hopefully the last tweak in trying to make it possible to add ipv6 direct
host routes without side effects.

Submitted by: JINMEI Tatuya
MFC after: 4 days


162949 02-Oct-2006 gnn

Turn off automatic link local address if ipv6_enable is not set to YES
in rc.conf

Reviewed by: KAME core team, cperciva
MFC after: 3 days


162899 30-Sep-2006 jhay

A better fix is to check if it is a host route.

Submitted by: ume
MFC after: 5 days


162863 30-Sep-2006 jhay

My previous commit broke "route add -inet6 <network_addr> -interface gif0".
Fix that by excluding point-to-point interfaces.

MFC after: 5 days


162797 29-Sep-2006 bms

Nits.

Submitted by: ru


162794 29-Sep-2006 bms

Push removal of mrouted down to the rest of the tree.


162540 22-Sep-2006 suz

fixed a bug that IPv6 packets arriving to stf are not accepted.
(a degrade introduced in in6.c Rev 1.61)

PR: kern/103415
Submitted by: JINMEI Tatuya
MFC after: 1 week


162343 16-Sep-2006 jhay

Make it possible to add an IPv6 host route to a host directly connected.

Use something like this:
route add -inet6 <dest_addr> <my_addr_on_that_interface> -interface -llinfo

This is usefull for wireless adhoc mesh networks.

MFC after: 5 days


162119 07-Sep-2006 jhay

All multicast listeners on a port should get one copy of the packet. This
was broken during the locking changes.


162084 06-Sep-2006 andre

First step of TSO (TCP segmentation offload) support in our network stack.

o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
o add CSUM_TSO flag to mbuf pkthdr csum_flags field
o add tso_segsz field to mbuf pkthdr
o enhance ip_output() packet length check to allow for large TSO packets
o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
o adjust all callers of tcp_maxmtu[46]() accordingly

Discussed on: -current, -net
Sponsored by: TCP/IP Optimization Fundraise 2005


162045 05-Sep-2006 jhay

Use net.inet6.ip6.redirect / ip6_sendredirects as part of the decision
to generate icmp6 redirects. Now it is possible to switch redirects off.

MFC after: 1 week


160981 04-Aug-2006 brooks

With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.


160591 23-Jul-2006 rwatson

Align IPv6 socket locking with IPv4 locking: lock socket buffer explicitly
and use _locked variants to avoid extra lock and unlock operations.

Reviewed by: gnn
MFC after: 1 week


160562 22-Jul-2006 gnn

The KAME project ceased work on IPv6 and IPSec in March of 2006.
Remove the README file which warns against cosmetic or local only
changes. FreeBSD committers should now feel free to work on the
IPv6 and IPSec code without fetters. The KAME mailing lists still
exist and it is always a good idea to ask questions about this code
on the snap-users@kame.net mailing list.

Reviewed by: rwatson, brooks


160549 21-Jul-2006 rwatson

Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket. pru_abort is now a
notification of close also, and no longer detaches. pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket. This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree(). With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by: gnn


160491 18-Jul-2006 ups

Fix race conditions on enumerating pcb lists by moving the initialization
( and where appropriate the destruction) of the pcb mutex to the init/finit
functions of the pcb zones.
This allows locking of the pcb entries and race condition free comparison
of the generation count.
Rearrange locking a bit to avoid extra locking operation to update the generation
count in in_pcballoc(). (in_pcballoc now returns the pcb locked)

I am planning to convert pcb list handling from a type safe to a reference count
model soon. ( As this allows really freeing the PCBs)

Reviewed by: rwatson@, mohans@
MFC after: 1 week


160123 05-Jul-2006 oleg

Complete timebase (time_second -> time_uptime) conversion.

PR: kern/94249
Reviewed by: andre (few months ago)
Approved by: glebius (mentor)


160051 30-Jun-2006 yar

We needn't check "m" for NULL here because "off" should be within
the mbuf chain. If we ever get a buggy caller, a bogus "off" should
be caught by the sanity check at the function entry. Null "m" here
means a very unusual condition of a totally broken mbuf chain (wrong
m_pkthdr.len or whatever), so we can just page fault later.

Found by: Coverity Prevent(tm)
CID: 825


160038 29-Jun-2006 yar

There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru


160031 29-Jun-2006 yar

Use queue(3) macros instead of accessing list/queue internals directly.


160024 29-Jun-2006 bz

Use INPLOOKUP_WILDCARD instead of just 1 more consistently.

OKed by: rwatson (some weeks ago)


159977 27-Jun-2006 pjd

- Use suser_cred(9) instead of directly comparing cr_uid.
- Compare pointer with NULL, instead of 0.

Reviewed by: rwatson


159976 27-Jun-2006 pjd

- Use suser_cred(9) instead of directly checking cr_uid.
- Change the order of conditions to first verify that we actually need
to check for privileges and then eventually check them.

Reviewed by: rwatson


159925 25-Jun-2006 rwatson

Use suser_cred() instead of a direct comparison of cr_uid with 0 in
rip6_output().

MFC after: 1 week


159390 08-Jun-2006 gnn

Fix spurious warnings from neighbor discovery when working with IPv6 over
point to point tunnels (gif).

PR: 93220
Submitted by: Jinmei Tatuya
MFC after: 1 week


158843 23-May-2006 tanimura

Avoid spurious release of an rtentry.


158765 20-May-2006 bz

In IN6_IS_ADDR_V4MAPPED case instead of returning directly set error and
goto out so that locks will be dropped.

Reviewed by: rwatson, gnn


158500 12-May-2006 mlaier

Remove ip6fw. Since ipfw has full functional IPv6 support now and - in
contrast to ip6fw - is properly lockes, it is time to retire ip6fw.


158295 04-May-2006 bz

Assert ip6_forward_rt protected by Giant adding GIANT_REQUIRED to
functions not yet asserting it but working on global ip6_forward_rt
route cache which is not locked and perhaps should go away in the
future though cache hit/miss ration wasn't bad.

It's #if 0ed in frag6 because the code working on ip6_forward_rt is.


158237 01-May-2006 rwatson

Break out socket access control and delivery logic from udp6_input()
into its own function, udp6_append(). This mirrors a similar structure
in udp_input() and udp_append(), and makes the whole thing a lot more
readable.

While here, add missing inpcb locking in UDP6 input path.

Reviewed by: bz
MFC after: 3 months


158011 25-Apr-2006 rwatson

Move lock assertions to top of in6_pcbladdr(): we still want them to run
even if we're going to return an argument-based error.

Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since
they walk pcbinfo inpcb lists.

Assert inpcb and pcbinfo locks in in6_pcbsetport(), since
port reservations are changing.

MFC after: 3 months


157978 23-Apr-2006 rwatson

Modify in6_pcbpurgeif0() to accept a pcbinfo structure rather than a pcb
list head structure; this improves congruence to IPv4, and also allows
in6_pcbpurgeif0() to lock the pcbinfo. Modify in6_pcbpurgeif0() to lock
the pcbinfo before iterating the pcb list, use queue(9)'s LIST_FOREACH()
for the iteration, and to lock individual inpcb's while manipulating
them.

MFC after: 3 months


157927 21-Apr-2006 ps

Allow for nmbclusters and maxsockets to be increased via sysctl.
An eventhandler is used to update all the various zones that depend
on these values.


157767 15-Apr-2006 rwatson

Mirror IPv4 pcb locking into in6_setsockaddr() and in6_setpeeraddr():
acquire inpcb lock when reading inpcb port+address in order to prevent
races with other threads that may be changing them.

MFC after: 3 months


157679 12-Apr-2006 rwatson

Assert the inpcb lock in udp6_output(), as we dereference various
fields.

MFC after: 3 months


157678 12-Apr-2006 rwatson

Add comment to udp6_input() that locking is missing from multicast
UDPv6 delivery.

Lock the inpcb of the UDP connection being delivered to before
processing IPSEC policy and other delivery activities.

MFC after: 3 months


157677 12-Apr-2006 rwatson

Add udbinfo locking in udp6_input() to protect lookups of the inpcb
lists during UDPv6 receipt.

MFC after: 3 months


157676 12-Apr-2006 rwatson

Don't use spl around call to in_pcballoc() in IPv6 raw socket support;
all necessary synchronization appears present.

MFC after: 3 months


157675 12-Apr-2006 rwatson

Remove one remaining use of spl in the IPv6 fragmentation code, as
this code appears properly locked.

MFC after: 3 months


157674 12-Apr-2006 rwatson

Add missing locking to udp6_getcred(), remove spl use.

MFC after: 3 months


157673 12-Apr-2006 rwatson

Remove spl use from IPv6 inpcb code.

In various inpcb methods for IPv6 sockets, don't check of so_pcb is NULL,
assert it isn't.

MFC after: 3 months


157633 10-Apr-2006 suz

ip6_mrouter_done(): use if_allmulti(0) for disabling the multicast promiscuous mode

Obtained from: KAME
MFC after: 2 days


157607 09-Apr-2006 rwatson

Fix assertion description: !=, not ==.

Submitted by: pjd
MFC after: 3 months


157374 01-Apr-2006 rwatson

Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.

MFC after: 3 months


157373 01-Apr-2006 rwatson

Break out in_pcbdetach() into two functions:

- in_pcbdetach(), which removes the link between an inpcb and its
socket.

- in_pcbfree(), which frees a detached pcb.

Unlike the previous in_pcbdetach(), neither of these functions will
attempt to conditionally free the socket, as they are responsible only
for managing in_pcb memory. Mirror these changes into in6_pcbdetach()
by breaking it into in6_pcbdetach() and in6_pcbfree().

While here, eliminate undesired checks for NULL inpcb pointers in
sockets, as we will now have as an invariant that sockets will always
have valid so_pcb pointers.

MFC after: 3 months


157370 01-Apr-2006 rwatson

Chance protocol switch method pru_detach() so that it returns void
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.

MFC after: 3 months


157366 01-Apr-2006 rwatson

Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.

MFC after: 3 months


157209 28-Mar-2006 dwmalone

This comment on various IPPORT_ defines was copied from in.h and
probably never fully applied to IPv6. Over time it has become more
stale, so replace it with something more up to date.

Reviewed by: ume
MFC after: 1 month


157207 28-Mar-2006 rwatson

Remove manual assignment of m_pkthdr from one mbuf to another in
ipsec_copypkt(), as this is already handled by the call to M_MOVE_PKTHDR(),
which also knows how to correctly handle MAC m_tags. This corrects a panic
when running with MAC and KAME IPSEC.

PR: kern/94599
Submitted by: zhouyi zhou <zhouyi04 at ios dot cn>
Reviewed by: bz
MFC after: 3 days


157097 24-Mar-2006 suz

fixed a memory leak when net.inet6.icmp6.nd6_maxqueuelen is greater than 1

Obtained from: KAME
MFC after: 3 days


156877 19-Mar-2006 dwmalone

Make net.inet.ip.portrange.reservedhigh and
net.inet.ip.portrange.reservedlow apply to IPv6 aswell as IPv4.

We could have made new sysctls for IPv6, but that potentially makes
things complicated for mapped addresses. This seems like the least
confusing option and least likely to cause obscure problems in the
future.

This change makes the mac_portacl module useful with IPv6 apps.

Reviewed by: ume
MFC after: 1 month


156871 19-Mar-2006 suz

implements section 2.2 of RFC4191, regarding the reserved preference value (10)

Obtained from: KAME
MFC after: 1 day


156865 19-Mar-2006 suz

updates net.inet6.ip6.kame_version as the proof of the latest KAME merge

Reviewed by: KAME
MFC after: 2 days


156274 04-Mar-2006 suz

fixed a bug that an MLD report is not advertised when group-specific MLD query is received.

PR: kern/93526
Obtained from: KAME
MFC after: 1 day


155575 12-Feb-2006 ume

avoided the use of purged address structure when an address became
invalid in nd6_timer().

PR: kern/93170
Reported by: kris
Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Confirmed by: kris
Obtained from: KAME
MFC after: 2 days


155454 08-Feb-2006 gnn

Fix for an inappropriate bzero of the ICMPv6 stats. The code was zero'ing the wrong structure member but setting the correct one.

Submitted by: James dot Juran at baesystems dot com
Reviewed by: gnn
MFC after: 1 week


155333 05-Feb-2006 ume

shut up strict-aliasing rules warning.


155217 02-Feb-2006 ume

make IPV6_V6ONLY socket option work for UDP as well.

PR: ports/92620
Reported by: Kurt Miller <kurt__at__intricatesoftware.com>
MFC after: 1 week


155201 02-Feb-2006 csjp

Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:

- Drop the variables stored in the phil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be ran, and returns if not. This check is already performed by the
IP stacks when they call:

if (!PFIL_HOOKED(ph))
goto skip_hooks;

- Drop in assertion which makes sure that the number of hooks never drops
below 0 for good measure. This in theory should never happen, and if it
does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros

Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API


155037 30-Jan-2006 glebius

Add some initial locking to gif(4). It doesn't covers the whole driver,
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:

- Add per-softc mutex.
- Hold the mutex on output.

The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.

Reported by: many
Tested by: Alexander Shiryaev <aixp mail.ru>


154804 25-Jan-2006 ume

don't embed scope id before running packet filters.

Reported by: YAMAMOTO Takashi <yamt__at__mwd.biglobe.ne.jp>
Obtained from: NetBSD
MFC after: 1 week


154667 22-Jan-2006 rwatson

Convert in6_cksum() to ANSI C function declaration.

MFC after: 1 week


154324 14-Jan-2006 rwatson

When storing the results of malloc() in a pointer to a pointer, check
the pointer to a pointer for NULL, not the pointer for NULL.

Noticed by: Coverity Prevent analysis tool
MFC after: 3 days


154322 13-Jan-2006 rwatson

In ipcomp6_input(), check 'md' not 'm' after a call to m_pulldown(): 'm'
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.

Noticed by: Coverity Prevent analysis tool
MFC after: 3 days


154131 09-Jan-2006 suz

added a note about the assumption for m->m_pkthdr.rcvif

Obtained from: KAME
MFC After: 1 day


153621 21-Dec-2005 thompsa

Add RFC 3378 EtherIP support. This change makes it possible to add gif
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.

Obtained from: NetBSD
MFC after: 2 weeks


153257 09-Dec-2005 suz

fixed a kernel crash at the initialization time of PIM-SM register interface

MFC after: 2 days


153227 08-Dec-2005 ume

the response NS to a DAD NS was not sent correctly due to the
invalid destination address.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
MFC after: 1 day


152524 16-Nov-2005 suz

fixed a kernel crash due to an improper removal of callout-timer
(ToDo: similar fix is necessary for other NDP-related callout-timers
in netinet6/nd6*.c)

PR: kern/88725
MFC after: 1 month


152242 09-Nov-2005 ru

Use sparse initializers for "struct domain" and "struct protosw",
so they are easier to follow for the human being.


151915 31-Oct-2005 suz

statically configured IPv6 address is properly added/deleted now

Obtained from: KAME
Reported in: freebsd-net@freebsd
MFC after: 1 day


151546 22-Oct-2005 suz

fixed a compilation failure on amd64/sparc64/ia64

Submitted by: max
MFC after: 2 month


151540 21-Oct-2005 suz

nuked non-existing commands


151539 21-Oct-2005 suz

sync with KAME regarding NDP

- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-comformant DAD advertisement
- updated RFC/internet-draft revisions

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 month


151537 21-Oct-2005 suz

perform NUD on an IPv6-aware point-to-point interface

Obtained from: KAME
MFC after: 1 week


151536 21-Oct-2005 suz

sync with KAME (renamed a macro IPV6_DADOUTPUT to IPV6_UNSPECSRC)

Obtained from: KAME


151479 19-Oct-2005 suz

sync with KAME (nuked unused code, use NULL to denote a NULL pointer)

Obtained from: KAME
Reviewed by: ume, gnn


151478 19-Oct-2005 suz

sync with KAME (removed a unnecesary non-standard macro)

Obtained from: KAME
Reviewd by: ume, gnn


151477 19-Oct-2005 suz

sync with KAME regarding the following clarification in RFC3542:
- disable IPv6 operation if DAD fails for some EUI-64 link-local addresses.
- export get_hw_ifid() (and rename it) as a subroutine for this process.

Obtained from: KAME
Reviewd by: ume, gnn
MFC after: 2 week


151475 19-Oct-2005 suz

sync with KAME (don't respond to NI_QTYPE_IPV4ADDR)

Obtained from: KAME
Reviewed by: ume, gnn


151474 19-Oct-2005 suz

supported an ndp command suboption to disable IPv6 in the given interface

Obtained from: KAME
Reviewd by: ume, gnn
MFC after: 2 week


151468 19-Oct-2005 suz

added an ioctl option in kernel so that ndp/rtadvd can change some NDP-related kernel variables based on their configurations (RFC2461 p.43 6.2.1 mandates this for IPv6 routers)

Obtained from: KAME
Reviewd by: ume, gnn
MFC after: 2 weeks


151465 19-Oct-2005 suz

sync with KAME in the following points:
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessay white space

Obtained from: KAME
MFC after: 1 day


151459 19-Oct-2005 suz

Raw IPv6 checksum must use the protocol number of the last header, instead of the first next-header value.

Obtained from: KAME
MFC after: 1 day


151413 17-Oct-2005 suz

fixed a kernel crash when IPv6 PIM-SM routing is enabled and a PIM register message is received

Obtained from: KAME
MFC After: 3 days


151362 15-Oct-2005 suz

added a missing unlock

Submitted by: JINMEI Tatuya
MFC After: 1 day


151253 12-Oct-2005 ume

AES counter mode uses 8byte IV, not 16 bytes.

Obtained from: NetBSD


150351 19-Sep-2005 andre

Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.


150202 16-Sep-2005 suz

plugged a possible memory leak

Obtained from: KAME
MFC after: 1 day


149849 07-Sep-2005 obrien

IPv6 was improperly defining its malloc type the same as IPv4 (M_IPMADDR,
M_IPMOPTS, M_MRTABLE). Thus we had conflicting instantiations.
Create an IPv6-specific type to overcome this.


149829 06-Sep-2005 thompsa

Add support for multicast to the bridge and allow inet6 addresses to be
assigned to the interface.

IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.

An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...

Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week


149635 30-Aug-2005 andre

Use the correct mbuf type for MGET().


149224 18-Aug-2005 suz

added a missing unlock (just do the same thing as in netinet/raw_ip.c)

Obtained from: KAME
MFC after: 3 days


149200 17-Aug-2005 ume

- fix race condition using sx lock.
- use TAILQ_FOREACH() for readability.

Suggested by: jhb


149148 16-Aug-2005 ume

avoid exclusive sleep mutex.


149033 13-Aug-2005 ume

added a knob to enable path MTU discovery for multicast packets.
(by default, it is disabled)

Submitted by: suz
Obtained from: KAME


148987 12-Aug-2005 ume

- fix typo in comment.
- nuke unused code.

Submitted by: suz
Obtained from: KAME


148954 11-Aug-2005 glebius

o Make rt_check() function more strict:
- rt0 passed to rt_check() must not be NULL, assert this.
- rt returned by rt_check() must be valid locked rtentry,
if no error occured.
o Modify callers, so that they never pass NULL rt0
to rt_check().

Reviewed by: sam, ume (nd6.c)


148953 11-Aug-2005 ume

create sysctl tree dynamically. it is required to share
net.inet6.ip6.fw with upcomming ipfw2 improvement for IPv6.

Requested by: bz


148940 10-Aug-2005 ume

removed RFC1885-related code. it was obsoleted by RFC2463, and the
code was #ifdef'ed out for a long time.

Submitted by: suz
Obtained from: KAME


148921 10-Aug-2005 suz

supports stealth forwarding in IPv6, as well as in IPv4

PR: kern/54625
MFC after: 1 week


148920 10-Aug-2005 obrien

Remove public declarations of variables that were forgotten when they were
made static.


148917 10-Aug-2005 obrien

Style nit.


148914 10-Aug-2005 suz

fixed a kernel crash at the start-up time of an IPv6 multicast daemons o
(e.g. pim6dd, pim6sd)

MFC after: 3 days


148892 09-Aug-2005 ume

corrected the fourth argument to ni6_addrs().


148887 09-Aug-2005 rwatson

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days


148883 09-Aug-2005 glebius

In preparation for fixing races in ARP (and probably in other
L2/L3 mappings) make rt_check() return a locked rtentry.


148882 09-Aug-2005 glebius

- Use 'error' variable to store error value, instead of 'i'.
- Push 'i' into the only block where it is used.
- Remove redundant check for rt being NULL. If rt_check() hasn't
returned an error, then rt is valid.

Reviewed by: gnn


148653 02-Aug-2005 rwatson

Modify network protocol consumers of the ifnet multicast address lists
to lock if_addr_mtx.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week


148488 28-Jul-2005 ume

simplied the fix to FreeBSD-SA-04:06.ipv6. The previous one worried
too much even though we actually validate the parameters. This code
also is more compatible with other *BSDs, which do copyin within
setsockopt().

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Reviewed by: security-officer (nectar)
Obtained from: KAME


148436 27-Jul-2005 cperciva

Correct a buffer overflow which can occur when decompressing a
carefully crafted deflated data stream. [1]

Correct problems in the AES-XCBC-MAC IPsec authentication algorithm. [2]

Submitted by: suz [2]
Security: FreeBSD-SA-05:18.zlib [1], FreeBSD-SA-05:19.ipsec [2]


148417 26-Jul-2005 ume

nuke duplicate inclusion of scope6_var.h.


148399 25-Jul-2005 ume

oops, make it compilable. i need sleep. X-(


148396 25-Jul-2005 ume

restore locks which disappeared wrongly by my previous commit.


148385 25-Jul-2005 ume

scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME


148250 21-Jul-2005 ume

always copy ip6_pktopt. remove needcopy and needfree
argument/structure member accordingly.

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


148247 21-Jul-2005 ume

simplified udp6_output() and rip6_output(): do not override
in6p_outputopts at the entrance of the functions. this trick was
necessary when we passed an in6 pcb to in6_embedscope(), within which
the in6p_outputopts member was used, but we do not use this kind of
interface any more.

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


148242 21-Jul-2005 ume

be consistent on naming advanced API functions; use ip6_XXXpktopt(s).

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


148241 21-Jul-2005 ume

NULL is not zero.

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


148210 20-Jul-2005 ume

do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling. makes it possible to take advantage of 9k ether
frames.

Obtained from: NetBSD


148169 20-Jul-2005 ume

update comments:
- RFC2292bis -> RFC3542
- typo fixes

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


147744 02-Jul-2005 thompsa

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.

IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)


147507 20-Jun-2005 ume

fix IP(v4) over IPv6 tunneling most likely broken with ifnet changes.

Submitted by: bz
Approved by: re (dwhite)


147306 12-Jun-2005 brooks

Fix IPv6 neighbor discovery by using IF_LLADDR to get the mac address
instead of a particularly ugly cast + pointer math hack.

Reported by: kuriyama, kris


147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


146883 02-Jun-2005 iedowse

Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.


146857 01-Jun-2005 rwatson

Lock udbinfo and inp before calling in6_pcbdetach() from udp6_abort().

MFC after: 1 week


146228 15-May-2005 gnn

Fixes for various nits found by the Coverity tool.

In particular 2 missed return values and an inappropriate bcopy from
a possibly NULL pointer.

Reviewed by: jake
Approved by: rwatson
MFC after: 1 week


145246 18-Apr-2005 brooks

Add IPv6 support to IPFW and Dummynet.

Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)


145065 14-Apr-2005 gnn

Remove dead code which would never execute.
i.e. checking to see if a cluster was every less than 48 bytes,
a rather unlikely case.

Check return value of m_dup_pkthdr() calls.

Found by: Coverity
Reviewed by: rwatson (mentor), Keiichi Shima (for Kame)
Approved by: rwatson (mentor)


144261 29-Mar-2005 sam

check for malloc failure (also move malloc up to simplify error recovery)

Noticed by: Coverity Prevent analysis tool
Reviewed by: gnn


143881 20-Mar-2005 glebius

ifma_protospec is a pointer. Use NULL when assigning or compating it.


143675 16-Mar-2005 sam

correct bounds check

Noticed by: Coverity Prevent analysis tool


143406 11-Mar-2005 ume

refer opencrypto/cast.h directly.


143322 09-Mar-2005 ume

reported from VANHULLEBUS Yvan [remote kernel crash may result]

Submitted by: itojun
Obtained from: KAME
MFC after: 1 day


142987 02-Mar-2005 suz

ignores ICMPv6 code field in case of ICMPv6 Packet-Too-Big (as specified in RFC2463 and draft-ietf-ipngwg-icmp-v3-06.txt)

Obtained from: KAME
MFC after: 1 day


142681 27-Feb-2005 ume

icmp6_notify_error uses IP6_EXTHDR_CHECK, which in turn calls
m_pullup. icmp6_notify_error continued to use the old pointer,
which after the m_pullup is not suitable as a packet header any
longer (see m_move_pkthdr).
and this is what causes the kernel panic in sbappendaddr later on.

PR: kern/77934
Submitted by: Gerd Rausch <gerd@juniper.net>
MFC after: 2 days


142679 27-Feb-2005 ume

fix typo.

MFC after: 2 days


142674 27-Feb-2005 ume

initialized the last arg to ip6_process_hopopts(), because the recent
code requires it to be 0 when a jumbo payload option is contained.

PR: kern/77934
Submitted by: Gerd Rausch <gerd@juniper.net>
Obtained from: KAME
MFC after: 2 days


142520 25-Feb-2005 sam

remove dead code

Noticed by: Coverity Prevent analysis tool


142337 23-Feb-2005 sam

eliminate dead code

Noticed by: Coverity Prevent analysis tool


142215 22-Feb-2005 glebius

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


141553 09-Feb-2005 rwatson

Add missed merge of ripcbinfo extern. Given how widely used
ripcbinfo is, we should probably add it to an include file.

Spotted by: mux


141545 08-Feb-2005 rwatson

Lock raw IP socket pcb list and PCBs when processing input via
icmp6_rip6_input().

Reviewed by: gnn
MFC after: 1 week


141422 06-Feb-2005 rwatson

Remove a comment from the raw IPv6 output function regarding
M_TRYWAIT allocations: M_PREPEND() now uses M_DONTWAIT.

MFC after: 3 days


140588 21-Jan-2005 ume

we don't need to make fake sockaddr_in6 to compare subject address.

MFC after: 1 week


139826 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes, separate for KAME


138653 10-Dec-2004 glebius

In certain cases ip_output() can free our route, so check
for its presence before RTFREE().

Noticed by: ru


138620 09-Dec-2004 glebius

style the last change


138619 09-Dec-2004 glebius

MFinet4:

- Make route cacheing optional, configurable via IFF_LINK0 flag.
- Turn it off by default.

Reminded by: suz


138184 29-Nov-2004 gnn

Reviewed by: SUZUKI Shinsuke <suz@kame.net>
Approved by: Robert Watson <rwatson@freebsd.org>

Add locking to the IPv6 scoping code.

All spl() like calls have also been removed.

Cleaning up the handling of ifnet data will happen at a later date.


137396 08-Nov-2004 suz

support TCP-MD5(IPv4) in KAME-IPSEC, too.

MFC after: 3 week


137386 08-Nov-2004 phk

Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.


137011 28-Oct-2004 suz

fixed a bug that incorrect IPsec request level may be returned for proto AH

Obtained from: KAME


136689 19-Oct-2004 andre

Be more careful to only index valid IP protocols and be more verbose with
comments.


136682 18-Oct-2004 rwatson

Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>


136184 06-Oct-2004 suz

fixed too delayed routing cache expiry. (tvtohz() converts a time interval to ticks, whereas hzto() converts an absolute time to ticks)

Obtained from: KAME


136076 03-Oct-2004 green

Prevent reentrancy of the IPv6 routing code (leading to crash with
INVARIANTS on, who knows what with it off).


136069 02-Oct-2004 dwhite

Disable MTU feedback in IPv6 if the sender writes data that must be fragmented.
Discussed extensively with KAME. The API author's intent isn't clear at this
point, so rather than remove the code entirely, #if 0 out and put a big
comment in for now. The IPV6_RECVPATHMTU sockopt is available if the
application wants to be notified of the path MTU to optimize packet sizes.

Thanks to JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> for putting up
with my incessant badgering on this issue, and fenner for pointing out
the API issue and suggesting solutions.


135920 29-Sep-2004 mlaier

Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a seperate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
forseeable future.

Suggested by: rwatson
A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by: rwatson, csjp
Tested by: -pf, -ipfw, LINT, csjp and myself
MFC after: 3 days

LOR IDs: 14 - 17 (not fixed yet)


135577 22-Sep-2004 stefanf

Prefer C99's __func__ over GCC's __FUNCTION__.


134822 05-Sep-2004 rwatson

Call callout_init() on nd6_slowtimo_ch before setting it going; otherwise,
the flags field will be improperly initialized resulting in inconsistent
operation (sometimes with Giant, sometimes without, et al).

RELENG_5 candidate.


134655 02-Sep-2004 rwatson

Unlock rather than lock the ripcbinfo lock at the end of rip6_input().

RELENG_5 candidate.

Foot provided by: Patrick Guelat <pg at imp dot ch>


134445 28-Aug-2004 rwatson

Mark Netgraph TTY, KAME IPSEC, and IPX/SPX as requiring Giant for correct
operation using NET_NEEDS_GIANT(). This will result in a boot-time
restoration of Giant-enabled network operation, or run-time warning on
dynamic load (applicable only to the Netgraph component). Additional
components will likely need to be marked with this in the future.


134391 27-Aug-2004 andre

Apply error and success logic consistently to the function netisr_queue() and
its users.

netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.

Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.

PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days


134383 27-Aug-2004 andre

Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.


134188 23-Aug-2004 rwatson

Remove in6_prefix.[ch] and the contained router renumbering capability.
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files. Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.

This functionality has also been removed from NetBSD and OpenBSD.

Submitted by: George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by: suz, keiichi at kame.net, core at kame.net


134121 21-Aug-2004 rwatson

When notifying protocol components of an event on an in6pcb, use the
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
and already unlocked in6pcb.

Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd


133720 14-Aug-2004 dwmalone

Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD
have already done this, so I have styled the patch on their work:

1) introduce a ip_newid() static inline function that checks
the sysctl and then decides if it should return a sequential
or random IP ID.

2) named the sysctl net.inet.ip.random_id

3) IPv6 flow IDs and fragment IDs are now always random.
Flow IDs and frag IDs are significantly less common in the
IPv6 world (ie. rarely generated per-packet), so there should
be smaller performance concerns.

The sysctl defaults to 0 (sequential IP IDs).

Reviewed by: andre, silby, mlaier, ume
Based on: NetBSD
MFC after: 2 months


133592 12-Aug-2004 rwatson

When allocating the IPv6 header to stick in front of raw packet being
sent via a raw IPv6 socket, use M_DONTWAIT not M_TRYWAIT, as we're
holding the raw pcb mutex.

Reported, tested by: kuriyama


133192 06-Aug-2004 rwatson

Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events. This prevents inpcb
assertions from firing when IPv6 generates and delievers event
notifications for inpcbs.

Reported by: kuriyama
Tested by: kuriyama


132794 28-Jul-2004 yar

Disallow a particular kind of port theft described by the following scenario:

Alice is too lazy to write a server application in PF-independent
manner. Therefore she knocks up the server using PF_INET6 only
and allows the IPv6 socket to accept mapped IPv4 as well. An evil
hacker known on IRC as cheshire_cat has an account in the same
system. He starts a process listening on the same port as used
by Alice's server, but in PF_INET. As a consequence, cheshire_cat
will distract all IPv4 traffic supposed to go to Alice's server.

Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections. After this
change, the above scenario will be impossible. In the same setting,
the user who attempts to start his server last will get EADDRINUSE.

Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.

This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code. It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.

MFC after: 1 month


132714 27-Jul-2004 rwatson

Commit a first pass at in6pcb and pcbinfo locking for IPv6,
synchronizing IPv6 protocol control blocks and lists. These changes
are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang,
and committed by Jeffrey Hsu. With these locking changes, IPv6 use of
inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking
assertions to be run in the presence of IPv6 compiled into the kernel.


132699 27-Jul-2004 yar

Don't consider TCP connections beyond LISTEN state
(i.e. with the foreign address being not wildcard) when checking
for possible port theft since such connections cannot be stolen.

The port theft check is FreeBSD-specific and isn't in the KAME tree.

PR: bin/65928 (in the audit trail)
Reviewed by: -net, -hackers (silence)
Tested by: Nick Leuta <skynick at mail.sc.ru>
MFC after: 1 month


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


130416 13-Jun-2004 mlaier

Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT


130388 12-Jun-2004 rwatson

Missed directory in previous commit; need to hold SOCK_LOCK(so)
before calling sotryfree().

-- Body of earlier bulk commit this belonged with --

Log:
Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.

Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS


130002 02-Jun-2004 ume

do not check super user privilege in ip6_savecontrol. It is
meaningless and can even be harmful.

Obtained from: KAME
MFC after: 3 days


129880 30-May-2004 phk

add missing #include <sys/module.h>


129196 14-May-2004 wpaul

Fix a bug which I discovered recently while doing IPv6 testing at
Wind River. In the IPv4 output path, one of the tests in ip_output()
checks how many slots are actually available in the interface output
queue before attempting to send a packet. If, for example, we need
to transmit a packet of 32K bytes over an interface with an MTU of
1500, we know it's going to take about 21 fragments to do it. If
there's less than 21 slots left in the output queue, there's no point
in transmitting anything at all: IP does not do retransmission, so
sending only some of the fragments would just be a waste of bandwidth.
(In an extreme case, if you're sending a heavy stream of fragmented
packets, you might find yourself sending nothing by the first fragment
of all your packets.) So if ip_output() notices there's not enough
room in the output queue to send the frame, it just dumps the packet
and returns ENOBUFS to the app.

It turns out ip6_output() lacks this code. Consequently, this caused
the netperf UDPIPV6_STREAM test to produce very poor results with large
write sizes. This commit adds code to check the remaining space in the
output queue and junk fragmented packets if they're too big to be
sent, just like with IPv4. (I can't imagine anyone's running an NFS
server using UDP over IPv6, but if they are, this will likely make them
a lot happier. :)


128666 26-Apr-2004 luigi

fix the change of interface in nd6_storelladdr for multicast
addresses too.

Reported by: Jun Kuriyama


128636 25-Apr-2004 luigi

This commit does two things:

1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.

There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.

The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
rt_check() cleanup

net/if_ethersubr.c
rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
rt_check() cleanup

netatalk/aarp.c
arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
arpcom untangling

netinet/if_ether.c
rt_check() cleanup (change arpresolve)

netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)


128422 19-Apr-2004 luigi

ifp has the same value as rt->rti_ifp so remove the dependency
on the route entry to locate the necessary information.


128421 19-Apr-2004 luigi

Remove a tail-recursive call in nd6_output.
This change is functionally identical to the original code, though
I have no idea if that was correct in the first place (see comment
in the commit).


128397 18-Apr-2004 luigi

Replace Bcopy/Bzero with 'the real thing' as in the rest of the file.


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127711 01-Apr-2004 suz

UDP checksum is mandatory in IPv6 (RFC2460 p.28)

Obtained from: KAME


127505 27-Mar-2004 pjd

Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
- in_pcbbind_setup(),
- in_pcbconnect(),
- in_pcbconnect_setup(),
- in6_pcbbind(),
- in6_pcbconnect(),
- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by: rwatson
Reviewed by: rwatson, ume


127504 27-Mar-2004 pjd

Remove unused argument.

Reviewed by: ume


127502 27-Mar-2004 pjd

Remove unused prototype.

Reviewed by: ume


127463 26-Mar-2004 ume

Validate IPv6 socket options more carefully to avoid a panic.

PR: kern/61513
Reviewed by: cperciva, nectar


126794 10-Mar-2004 rwatson

Move the AH algorithm list from a static local function variable to
a static const global variable in ah_core.c. This makes it more clear
that this array does not require synchronization, as well as
synchronizing the layout to the ESP algorithm list. This is the
version of my patch that Itojun committed to the KAME tree.

Obtained from: me, via KAME


126603 04-Mar-2004 ume

move in6_addmulti()/in6_delmulti() into mld6.c

Obtained from: KAME


126595 04-Mar-2004 ume

missing splx().

Obtained from: KAME
MFC after: 3 days


126552 03-Mar-2004 ume

- stlye and comments
- variable name change (scopeid -> zoneid)
- u_short -> u_int16_t, u_char -> u_int8_t

Obtained from: KAME


126508 02-Mar-2004 mlaier

Move PFIL_HOOKS and ipfw past the scope checks to allow easy redirection to
linklocal.

Obtained from: OpenBSD
Reviewed by: ume
Approved by: bms(mentor)


126489 02-Mar-2004 ume

scope awareness of ff01:: is not merged, yet. So, clear
embeded form of scopeid for ff01:: for now.

Pointed out by: mlaier


126444 01-Mar-2004 ume

- reject incoming packets to an interface-local multicast address from
the wire.
- added a generic scope check, and removed checks for loopback src/dst
addresses.

Obtained from: KAME


126264 26-Feb-2004 mlaier

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


126263 26-Feb-2004 mlaier

Tweak existing header and other build infrastructure to be able to build
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).

Approved by: bms(mentor)


126194 24-Feb-2004 ume

in icmp6_mtudisc_update(), use ND link mtu to detect if the path MTU
should be updated.

Helped by: andre


126184 24-Feb-2004 cperciva

Fix array overflow: If len=128, don't access [16] of a 16-byte IPv6
address, even if we subsequently ignore its value by applying a >>8
to it.

Reported by: "Ted Unangst" <tedu@coverity.com>
Approved by: rwatson (mentor), {ume, suz} (KAME)


126006 19-Feb-2004 ume

- call ip6_output() instead of nd6_output() when ipsec tunnel
mode is applied, since tunneled packets are considered to be
generated packets from a tunnel encapsulating node.
- tunnel mode may not be applied if SA mode is ANY and policy
does not say "tunnel it". check if we have extra IPv6 header
on the packet after ipsec6_output_tunnel() and call ip6_output()
only if additional IPv6 header is added.
- free the copyed packet before returning.

Obtained from: KAME


125941 17-Feb-2004 ume

IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


125878 16-Feb-2004 ume

correct function name in comment.

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


125874 16-Feb-2004 ume

nuke unused functions.

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


125873 16-Feb-2004 ume

we don't need to include ipsec.h.

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


125777 13-Feb-2004 ume

- wrap mappedaddr block by #ifdef INET for IPv6-only kernel in future.
- rejects IPv6 packet toward IPv4-mapped address if its source address
is not an IPv4-mapped IPv6 address, since the converted IPv4 packets
would have an unexpected IPv4 source address.
- when V6ONLY socket option is set, discard packets destined to a
v4/ipv4 mapped ipv6 address.
- have PULLDOWN_TEST codepath.
- get rid of in6_mcmatch().

Obtained from: KAME


125776 13-Feb-2004 ume

supported IPV6_RECVPATHMTU socket option.

Obtained from: KAME


125680 11-Feb-2004 bms

Initial import of RFC 2385 (TCP-MD5) digest support.

This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.

For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.

Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.

There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.

Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.

This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.

Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.

Sponsored by: sentex.net


125626 09-Feb-2004 ume

fix build with FAST_IPSEC.

Reported by: cjc


125595 08-Feb-2004 ume

- obey ip6po_minmtu.
- notify a proper path MTU to applications.

Obtained from: KAME


125436 04-Feb-2004 ume

KNF

Obtained from: KAME


125396 03-Feb-2004 ume

pass pcb rather than so. it is expected that per socket policy
works again.


125147 28-Jan-2004 ume

protect access to ifnet structure with mutex.


124465 13-Jan-2004 ume

call ipsec_pcbconn()/ipsec_pcbdisconn() from in6_pcbconnect().

Obtained from: KAME


124456 13-Jan-2004 ume

correct spelling

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Reviewed by: itojun


124455 13-Jan-2004 ume

fix potential 'cannot-happen' memory leak

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Reviewed by: itojun


124337 10-Jan-2004 ume

try rtinit() only when the route is not installed.
this allows, e.g., duplicated attempts of 'ifconfig lo0 ::1'
like for IPv4.

Obtained from: KAME
MFC after: 1 week


124333 10-Jan-2004 truckman

Don't execute the code in in6_ifdetach() that removes the link-local
allnodes multicast route if the routing table has not been initialized.
This avoids a panic during boot if an interface detaches before the
routing table is initialized.

Submitted by: sam


124332 10-Jan-2004 ume

in set{peer, sock}addr, do not convert the unspecified
address (::) to the mapped address form.

PR: kern/22868
Obtained from: KAME
MFC after: 3 days


123842 25-Dec-2003 dwmalone

When calculating the sequence number to use in an ip6fw reset, remember to
add one if the SYN flag was set in the original packet. This seems to make
ip6fw reset work correctly for new and in-progress connections. Update
the man page to reflect the fact it now seems to work.

Glanced at by: ume
MFC after: 2 weeks


123760 23-Dec-2003 ume

Catch a few places where NULL (pointer) was used where 0 (integer) was
expected (fix build).


123740 23-Dec-2003 peter

Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.


123713 22-Dec-2003 suz

fixed a bug that IPv6 routing header does not work properly if specified from userland application

reviewed by: ume


123587 17-Dec-2003 suz

fixed an IPv6 path MTU discovery failure owing to a lack of initialization

Reviewed by: ume
Approved by: re (scottl)
MFC after: 1 day


123396 10-Dec-2003 ume

validate the argument for multicast routing socket options
correctly.

Obtained from: KAME
MFC after: 3 days


123296 08-Dec-2003 ume

- changed the logic in nd6_is_addr_neighbor(); check on-link prefixes
(not interface addresses) to see if a given address is on-link.
- skip offlink prefixes in neighbor determination in nd6_is_addr_neighbor.
- in nd6_is_addr_neighbor, regarded every address as on-link when the
default router list is empty. otherwise, we'd not be able make a neighbor
cache for the address.
this algorithm is applied to hosts only.
- in nd6_is_addr_neighbor, check if the default interface is equal to
the interface in question in addition to check if the default router
list is empty.

Obtained from: KAME


122991 26-Nov-2003 sam

Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by: rwatson
Approved by: re (rwatson)


122970 24-Nov-2003 ume

pktopt may be null.

Approved by: re (rwatson)


122927 20-Nov-2003 andre

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


122922 20-Nov-2003 andre

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


122921 20-Nov-2003 andre

Remove RTF_PRCLONING from routing table and adjust users of it
accordingly. The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


122875 18-Nov-2003 rwatson

Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


122846 17-Nov-2003 ume

correct to look right interface.


122743 15-Nov-2003 ume

- m_cat() may free the mbuf on 2nd arg, so m_pkthdr manipulation has
to happen before the call to m_cat().
- correct signedness mixups.
- remove variable that is only assigned too but not referenced.

Obtained from: KAME


122742 15-Nov-2003 ume

oops, correct wrong change in previous commit.


122740 15-Nov-2003 ume

increase AH_MAXSUMSIZE for hmac-sha2-512

Obtained from: KAME


122739 15-Nov-2003 ume

preparation for 64bit sequence number.

Obtained from: KAME


122738 15-Nov-2003 ume

fixed a bug comparing sav->key_auth and SADB_AALG_NONE.

Obtained from: KAME


122581 12-Nov-2003 ume

reflect ip6_pktopts and ip6_moptions into embeded scope of
destination address. it makes `ping6 -I <if> <link-local>'
work again. since we don't merge scope cleanup yet, we need
this for workaround.


122509 11-Nov-2003 ume

cleanup rijndael API.
since there are naming conflicts with opencrypto, #define was
added to rename functions intend to avoid conflicts.

Obtained from: KAME


122412 10-Nov-2003 ume

enable aes-xcbc-mac and aes-ctr, again.


122334 08-Nov-2003 sam

replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by: FreeBSD Foundation


122320 08-Nov-2003 sam

o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by: FreeBSD Foundation


122246 07-Nov-2003 ume

nuke obsoleted ipsec_gethist(). it just did panic to notify user
that it was obsoleted. it is better to fail than just hiding use
of ipsec_gethist() at build.

Sugessted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>


122174 06-Nov-2003 ume

correct behavior when ipv6mr_interface is 0. Matthias Drochner

Notified by: itojun
Obtained from: NetBSD


122128 05-Nov-2003 ume

byebye in6_ifawithscope(). it was a function for old source
address selection.

Obtained from: KAME


122122 05-Nov-2003 ume

make sure to treat destrination address as KAME internal form
of embedscope.


122077 04-Nov-2003 ume

source address selection part of RFC3484.
TODO: since there is scope issue to be solved, multicast and
link-local address are treated as special for workaround for
now.

Obtained from: KAME


122062 04-Nov-2003 ume

- cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.

Tested by: nork
Obtained from: KAME


122059 04-Nov-2003 ume

use nd6log().

Obtained from: KAME


122058 04-Nov-2003 ume

- update comments to refrect recent BSDs.
- nuke unused macro PSUEDO_SET().
- I believe our if_xname stuff is nothing strange against other BSDs.

Obtained from: KAME


121901 02-Nov-2003 ume

rename variables.

Obtained from: KAME


121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


121814 31-Oct-2003 ume

correct stat to increment.

Obtained from: KAME


121811 31-Oct-2003 ume

do not insert a dest option header (even specified by a user) that
should be placed before a routing header, unless a routing header
really exists.

Obtained from: KAME


121809 31-Oct-2003 ume

(icmp6_rip6_input) if the received data is small enough but in an
mbuf cluster, copy the data to a separate mbuf that do not use a
cluster. this change will reduce the possiblity of packet loss
in the socket layer.

Obtained from: KAME


121808 31-Oct-2003 ume

rename MLD6_* to MLD_*.

Obtained from: KAME


121807 31-Oct-2003 ume

use arc4random.

Obtained from: KAME


121806 31-Oct-2003 ume

initialize in6_tmpaddrtimer_ch.

Obtained from: KAME


121805 31-Oct-2003 ume

nuku unused functions in6_nigroup_attach() and
in6_nigroup_detach().

Obtained from: KAME


121770 30-Oct-2003 sam

Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by: FreeBSD Foundation


121765 30-Oct-2003 sam

use a local variable to avoid holding a lock across a call out of view

Supported by: FreeBSD Foundation


121750 30-Oct-2003 ume

- unlock on error.
- don't call malloc with M_WAITOK within lock context.


121742 30-Oct-2003 ume

add management part of address selection policy described in
RFC3484.

Obtained from: KAME


121716 29-Oct-2003 sam

correct LOR by using a local variable to hold result
instead of holding a lock while calling out of view

Supported by: FreeBSD Foundation


121684 29-Oct-2003 ume

add ECN support in layer-3.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.

Obtained from: KAME


121674 29-Oct-2003 ume

ip6_savecontrol() argument is redundant


121673 29-Oct-2003 ume

hide m_tag, again.

Requested by: sam


121631 28-Oct-2003 ume

make sure to accept only IPv6 packet.

Obtained from: KAME


121630 28-Oct-2003 ume

cleanup use of m_tag.

Obtained from: KAME


121607 27-Oct-2003 ume

M_DONTWAIT was passed into malloc().

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>


121578 26-Oct-2003 ume

re-add wrongly disappered IPV6_CHECKSUM stuff by introducing
ip6_raw_ctloutput().

Obtained from: KAME


121576 26-Oct-2003 ume

drop unused defines.


121575 26-Oct-2003 ume

drop unused fields.


121569 26-Oct-2003 ume

use uint32_t instead of u_int32_t for newly introduced
struct definition.


121499 25-Oct-2003 ume

revert following unwanted changes:
- __packed to __attribute__((__packed__)
- uintN_t back to u_intN_t

Reported by: bde


121498 25-Oct-2003 ume

correct namespace pollution.

Submitted by: bde


121478 24-Oct-2003 ume

remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{}
according to rfc2292bis.

Obtained from: KAME


121472 24-Oct-2003 ume

Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542
(aka RFC2292bis). Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.

Obtained from: KAME


121445 23-Oct-2003 sam

check return result from rtalloc1 before invoking RTUNLOCK


121358 22-Oct-2003 ume

we have ppsratecheck().


121355 22-Oct-2003 ume

IP6Q_LOCK_CHECK -> IP6Q_LOCK_ASSERT.

Sugested by: sam


121353 22-Oct-2003 ume

drop the code of HAVE_NRL_INPCB part. our system doesn't
use NRL style INPCB.


121346 22-Oct-2003 ume

pretect ip6 reassemble queue by use of mutex.

Submitted by: rwatson (with modification)


121345 22-Oct-2003 ume

- implement lock around IPv6 reassembly, to avoid panic due to
frag6_drain (mutex version will come later).
- limit number of fragments (not fragment queues) in kernel.

Obtained from: KAME


121343 22-Oct-2003 ume

protect sid_default and sid.

Submitted by: rwatson (with modification)


121342 22-Oct-2003 ume

reduce calling in6_addr2zoneid().


121335 22-Oct-2003 suz

more strict sanity check for ESP tail

Obtained from: KAME


121315 21-Oct-2003 ume

- change scope to zone.
- change node-local to interface-local.
- better error handling of address-to-scope mapping.
- use in6_clearscope().

Obtained from: KAME


121283 20-Oct-2003 ume

correct linkmtu handling.

Obtained from: KAME


121257 19-Oct-2003 ume

- revert to old rijndael code. new rijndael code broke gbde.
- since aes-xcbc-mac and aes-ctr require functions in new
rijndael code, aes-xcbc-mac and aes-ctr are disabled for now.


121214 18-Oct-2003 ume

rtfree() must be called in lock context.

Reported by: jhay


121168 17-Oct-2003 ume

nuke duplicate function and unused function.

Obtained from: KAME


121167 17-Oct-2003 ume

revert wrongly dropped null check by previous commit.


121161 17-Oct-2003 ume

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


121144 16-Oct-2003 sam

fix horribly botched MFp4 merge


121143 16-Oct-2003 sam

pfil hooks can modify packet contents so check if the destination
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.

Submitted by: Pyun YongHyeon
Supported by: FreeBSD Foundation


121092 14-Oct-2003 sam

MFp4: correct locking issues in nd6_lookup

Supported by: FreeBSD Foundation


121072 13-Oct-2003 ume

use BF_ecb_encrypt().

Obtained from: KAME


121071 13-Oct-2003 ume

- support AES counter mode for ESP.
- use size_t as return type of schedlen(), as there's no error
check needed.
- clear key schedule buffer before freeing.

Obtained from: KAME


121062 13-Oct-2003 ume

support AES XCBC MAC for AH.

Obtained from: KAME


121061 13-Oct-2003 ume

- support AES XCBC MAC for AH
- correct SADB_X_AALG_RIPEMD160HMAC to 8

Obtained from: KAME


121046 12-Oct-2003 ume

include opencrypto/rmd160.h


121041 12-Oct-2003 ume

remove unused variable.

Obtained from: KAME


121033 12-Oct-2003 ume

- avoid hardcoded values.
- correct signedness mixups.
- log fix.
- preparation for 64bit sequence number.
introduce SA id (unique ID for SA - SPI is useless as duplicated
SPI is allowed)
- no need to malloc/free cksum buffer.

Obtained from: KAME


121027 12-Oct-2003 ume

- always check for optlen overrun.
- panic if NULL is passed to ah_sumsiz (as we never do it,
and callers do not properly check negative returns).

Obtained from: KAME


121025 12-Oct-2003 ume

- correct signedness mixups.
- avoid assuming result buffer size

Obtained from: KAME


121023 12-Oct-2003 ume

avoid hardcoding MD5 result length (16)

Obtained from: KAME


121021 12-Oct-2003 ume

- RIPEMD160 support
- pass size arg to ah->result (avoid assuming result buffer size)

Obtained from: KAME


120978 10-Oct-2003 ume

fixed an endian bug on fragment header scanning

Obtained from: KAME


120971 10-Oct-2003 ume

nuke SCOPEDROUTING. Though it was there for a long time,
it was never enabled.


120970 10-Oct-2003 ume

switch cast128 implementation to implementation by Steve Reid;
smaller footprint.

Obtained from: KAME


120943 09-Oct-2003 ume

- typo. found by markus@openbsd
- correct signedness mixup in pointer passing.
- drop meaningless variable.

Obtained from: KAME


120941 09-Oct-2003 ume

- typo in comment
- style
- ANSIfy
(there is no functional change.)

Obtained from: KAME


120913 08-Oct-2003 ume

- fix typo in comments.
- style.
- NULL is not 0.
- some variables were renamed.
- nuke unused logic.
(there is no functional change.)

Obtained from: KAME


120894 07-Oct-2003 sam

must lock route when the caller provided a route but not
an interface; otherwise the subsequent unlock blows up

Suffered by: Marcel Moolenaar <marcel@xcllnt.net>
Supported by: FreeBSD Foundation


120893 07-Oct-2003 ume

indent


120892 07-Oct-2003 ume

style and indent. no functional change.

Obtained from: KAME


120891 07-Oct-2003 ume

- fix typo in comment.
- style.

Obtained from: KAME


120890 07-Oct-2003 ume

nuke unused CTL_IPV6PROTO_NAMES macro.


120856 06-Oct-2003 ume

return(code) -> return (code)
(reduce diffs against KAME)


120727 04-Oct-2003 sam

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


120652 01-Oct-2003 ume

add randomtab for ip6_randomflowlabel().

Obtained from: KAME


120649 01-Oct-2003 ume

randomize IPv6 flowlabel when RANDOM_IP_ID is defined.

Obtained from: KAME


120648 01-Oct-2003 ume

use arc4random()


120645 01-Oct-2003 ume

- include opt_random_ip_id.h
- we don't need to obtain microtime when using ip6_randomid.


120643 01-Oct-2003 ume

we don't need ip6_id when RANDOM_IP_ID is defined.


120642 01-Oct-2003 ume

include opt_random_ip_id.h


120641 01-Oct-2003 ume

Don't compiled ip6_randomid() in if RANDOM_IP_ID is not defined.


120640 01-Oct-2003 ume

Obey RANDOM_IP_ID.

Requested by: sam


120639 01-Oct-2003 ume

randomize IPv6 fragment ID.

Obtained from: KAME


120593 30-Sep-2003 sam

Correct pfil_run_hooks return handling: if the return value is non-zero
then the mbuf has been consumed by a hook; otherwise beware of a null
mbuf return (gack). In particular the bridge was doing the wrong thing.
While in the ipv6 code make it's handling of pfil_run_hooks identical
to netbsd.

Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>


120386 23-Sep-2003 sam

o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)


120049 14-Sep-2003 mdodd

Enable IPv6 for Token Ring.


120041 13-Sep-2003 wpaul

The in6_ifattach() routine contains the following code:

in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);

The problem here is that udbinfo.listhead and ripcbinfo.listhead are
not initialized during the device probe/attach phase of the kernel
boot process. So if, for example, a network driver calls ether_ifattach()
in its foo_attach() routine and then decides that something is wrong
and calls ether_ifdetach() to reverse the process, we will panic trying
to dereference the uninitialized list head pointers. (Though the
same sequence of events performed after the kernel has come up works
file, i.e. doing kldload if_foo from multiuser.)

Change this to:

if (udbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
if (ripcbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);

to avoid the NULL pointer dereferences.


119995 11-Sep-2003 ru

Fix a bunch of off-by-one errors in the range checking code.


118498 05-Aug-2003 ume

introduced a flag bit "ND6_IFF_ACCEPT_RTADV" in the nd_ifinfo structure to
control whether to accept RAs per-interface basis.
the new stuff ensures the backward compatibility;
- the kernel does not accept RAs on any interfaces by default.
- since the default value of the flag bit is on, the kernel accepts RAs
on all interfaces when net.inet6.ip6.accept_rtadv is 1.

Obtained from: KAME
MFC after: 1 week


118171 29-Jul-2003 ume

Cleanup useless break.

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>


118091 27-Jul-2003 ume

ip6fw does not handle ESP correctly

PR: kern/54874
Submitted by: JINMEI Tatuya <jinmei@shuttle.wide.toshiba.co.jp>
MFC after: 1 week


116453 17-Jun-2003 cognet

Do not attempt to access to inp_socket fields if the socket is in the TIME_WAIT
state, as inp_socket will then be NULL. This fixes a panic that occurs when one
tries to bind a port that was previously binded with remaining TIME_WAIT
sockets.


114259 29-Apr-2003 mdodd

Add definitions for IN6ADDR_LINKLOCAL_ALLMDNS_INIT and INADDR_ALLMDNS_GROUP.


114205 29-Apr-2003 suz

panic() doesn't need \n

Obtained from: KAME
MFC after: 2 days


114155 28-Apr-2003 suz

sync with the latest KAME (just a cosmetic change)

MFC after: 1 day


113799 21-Apr-2003 obrien

Explicitly declare 'int' parameters.


112781 29-Mar-2003 suz

fixed a mbuf leak when an IP packet from ESP tunnel is redirected

obtained from: KAME


112678 26-Mar-2003 ume

made sure to keep the current stored lifetime when it was not updated
by an RA.
(a detailed description of this issue is found at the following URL.)
http://www.tahi.org/report/freebsd/freebsd48-rc2-20030316/host/lcna-stateless-addrconf/38.html

Reported by: Ozoe Nobumichi <ozoe@tahi.org>
through a periodic TAHI test
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
Obtained from: KAME


112130 12-Mar-2003 sam

correct malloc flag argument

Reported by: Kris Kennaway <kris@obsecurity.org>


111888 04-Mar-2003 jlemon

Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs


111397 24-Feb-2003 jlemon

Fix another case for timewait.


111186 20-Feb-2003 jlemon

Remove unused variables in the IPSEC case.

Submitted by: Lars Eggert <larse@ISI.EDU>


111145 19-Feb-2003 jlemon

Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block. Allow the socket and tcpcb structures to be freed
earlier than inpcb. Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111070 18-Feb-2003 sam

M_MOVE_PKTHDR must happen before any cluster is attached

Submitted by: Harti Brandt <brandt@fokus.fraunhofer.de>
MFC after: 1 day


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


108955 08-Jan-2003 ume

"struct route" is not sufficient. NetBSD PR 18751

Obtained from: KAME
MFC after: 1 days


108825 06-Jan-2003 sam

don't reference a pkthdr after M_MOVE_PKTHDR has "remove it"; instead
reference the pkthdr now in the destination of the move

Sponsored by: Vernier Networks


108824 06-Jan-2003 sam

purge extraneous clears of M_PKTHDR since M_MOVE_PKTHDR does this already


108797 06-Jan-2003 mike

Bah, just use %zu for printing size_t.


108765 06-Jan-2003 mike

Cast return values of sizeof() to int so they can be printed with %d.
The size of this struct is unlikely to ever grow beyond what an int
can represent.

Noticed by: alpha tinderbox


108741 05-Jan-2003 sam

correct pkthdr length calculation for ipv6 echo packets; after moving a packet
header with M_MOVE_PKTHDR one should not reference the packet header in the
original packet; in this case the code was assuming that m_adj would alter
m_pkthdr.len which stopped happening because M_MOVE_PKTHDR removes the
M_PKTHDR bit from m_flags

Submitted by: Bill Fenner <fenner@research.att.com>


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


108466 30-Dec-2002 sam

Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>


108269 25-Dec-2002 ru

If the caller of rtrequest*(RTM_DELETE, ...) asked for a copy of
the entry being removed (ret_nrt != NULL), increment the entry's
rt_refcnt like we do it for RTM_ADD and RTM_RESOLVE, rather than
messing around with 1->0 transitions for rtfree() all over.


108250 24-Dec-2002 hsu

SMP locking for radix nodes.


108172 22-Dec-2002 hsu

SMP locking for ifnet list.


108143 20-Dec-2002 sam

define HAVE_PPSRATECHECK now that we have this stuff in the kernel
(probably belongs elsewhere; add it this way for now so the system
will build)


108107 19-Dec-2002 bmilekic

o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed
to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
flag (and only one of those two).

Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)


108033 18-Dec-2002 hsu

Lock up ifaddr reference counts.


107925 16-Dec-2002 suz

fixed a bug that IPv6 multicast packet is not forwarded if its packet size is equal to the outgoing interface's MTU

Approved by: re
Obtained from: KAME
MFC after: 3 days


106259 31-Oct-2002 ume

plugged memory leakage in some erroneous cases

Obtained from: KAME
MFC after: 1 week


105340 17-Oct-2002 ume

last arg of in6?_gif_output() is not used any more.

Obtained from: KAME
MFC after: 3 weeks


105295 16-Oct-2002 ume

use encapcheck.

Obtained from: KAME
MFC after: 3 weeks


105293 16-Oct-2002 ume

- after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly.
- set IFF_UP on SIOCSIFADDR. be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the
kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c

Obtained from: KAME
MFC after: 3 weeks


105199 16-Oct-2002 sam

Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas. Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by: KAME, rwatson
Approved by: silence
Supported by: Vernier Networks


105194 16-Oct-2002 sam

Replace aux mbufs with packet tags:

o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month


103842 23-Sep-2002 alfred

s/__attribute__((__packed__))/__packed/g


102397 25-Aug-2002 cjc

Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and
firewall logging on and off when at elevated securelevel(8). It would
be nice to be able to only lock these at securelevel >= 3, like rules
are, but there is no such functionality at present. I don't see reason
to be adding features to securelevel(8) with MAC being merged into 5.0.

PR: kern/39396
Reviewed by: luigi
MFC after: 1 week


102347 24-Aug-2002 ume

check packet length before fetching ESP crypto checksum.

Obtained from: KAME
MFC after: 2 days


102291 22-Aug-2002 archie

Replace (ab)uses of "NULL" where "0" is really meant.


102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


102218 21-Aug-2002 truckman

Create new functions in_sockaddr(), in6_sockaddr(), and
in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in*
structure and initialize it with the address and port information passed
as arguments. Use calls to these new functions to replace code that is
replicated multiple times in in_setsockaddr(), in_setpeeraddr(),
in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and
in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that
we can call in_sockaddr() with temporary copies of the address and port
after the PCB is unlocked.

Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC()
inside in6_mapped_peeraddr() while the PCB is locked) by changing
the implementation of tcp6_usr_accept() to match tcp_usr_accept().

Reviewed by: suz


102131 19-Aug-2002 jmallett

Enclose IPv6 addresses in brackets when they are displayed printable with a
TCP/UDP port seperated by a colon. This is for the log_in_vain facility.

Pointed out by: Edward J. M. Brocklesby
Reviewed by: ume
MFC after: 2 weeks


102052 18-Aug-2002 sobomax

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


101240 02-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

When generating nd6 output on an interface, label the packet
appropriately.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100688 25-Jul-2002 ume

correct comment for setsockopt arg size.

Reported by: Martin Laabs <martin@martin.erfurt.thur.de>
Obtained from: KAME
MFC after: 1 week


100683 25-Jul-2002 ume

cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,
ip6_mapped_addr_on is unified into ip6_v6only.

MFC after: 1 week


100676 25-Jul-2002 ume

Change the default setting of an IPv4-mapped IPv6 address to off.

Requested by: many people


100629 24-Jul-2002 ume

make sure to set/unset INP_IPV4 according to a value
of IN6P_IPV6_V6ONLY

Reviewed by: Keiichi SHIMA <keiichi@iij.ad.jp>


100508 22-Jul-2002 ume

do not refer to IN6P_BINDV6ONLY anymore.

Obtained from: KAME
MFC after: 1 week


100503 22-Jul-2002 ume

sin6_len is not an address family. I believe this doesn't
break compatibility with POSIX.1-2001.


100277 18-Jul-2002 ume

fixed to make mbuf chain.

Obtained from: KAME
MFC after: 1 week


100132 15-Jul-2002 ume

- fixed a bug that we can't send a packet to ipv4mapped ipv6 address
using a udp6 socket without bind(2)ing.
- fbsd4/430 reported from the FreeBSD team.
- this fix is different from the fix reported in the above PR. i think
this better, but we need some test.

Obtained from: KAME
MFC after: 3 weeks


98211 14-Jun-2002 hsu

Notify functions can destroy the pcb, so they have to return an
indication of whether this happenned so the calling function
knows whether or not to unlock the pcb.

Submitted by: Jennifer Yang (yangjihui@yahoo.com)
Bug reported by: Sid Carter (sidcarter@symonds.net)


98141 12-Jun-2002 hsu

As a stop-gap measure, add one INP_LOCK_DESTROY() to in6_pcbdetach() to
get kernel compiled with INET6 to boot.


98102 10-Jun-2002 hsu

Lock up inpcb.

Submitted by: Jennifer Yang <yangjihui@yahoo.com>


97676 31-May-2002 ume

__FreeBSD__ is not a compiler constant. We must use
__FreeBSD_version here.

Submitted by: rwatson


97658 31-May-2002 tanimura

Back out my lats commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


97181 23-May-2002 mike

o Conditionalize sections for POSIX.1-2001 compatibility.
o Use POSIX spelling for types, where possible.
o Define size_t in the __BSD_VISIBLE case (this isn't really needed
for standards conformance, but follows the tradition of not
requiring <sys/types.h> as a prerequisite).
o Use _BYTE_ORDER and friends instead of BYTE_ORDER and friends, since
there may not be enough pollution in order for the latter to work.
o Add an XXX note about the missing IPPROTO_IPV6 macro.


96972 20-May-2002 tanimura

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


96457 12-May-2002 ume

Recent zlib does not like Z_FLUSH at the end of inflate().

Reported by: quak@mydiax.ch
Obtained from: KAME
MFC after: 2 days
and approved by re


96116 06-May-2002 ume

Revised MLD-related definitions
- Used mld_xxx and MLD_xxx instead of mld6_xxx and MLD6_xxx according
to the official defintions in rfc2292bis
(macro definitions for backward compatibility were provided)
- Changed the first member of mld_hdr{} from mld_hdr to mld_icmp6_hdr
to avoid name space conflict in C++

This change makes ports/net/pchar compilable again under -CURRENT.

Obtained from: KAME


95759 30-Apr-2002 tanimura

Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.

Requested by: bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.


95395 24-Apr-2002 ume

Correct timer management (deprecated) in nd6_timer.

Obtained from: KAME
MFC after: 3 days


95023 19-Apr-2002 suz

just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week


94357 10-Apr-2002 mike

Unconditionalize the definition of INET_ADDRSTRLEN and
INET6_ADDRSTRLEN. Doing this helps expose bogus redefinitions in 3rd
party software.


93920 06-Apr-2002 mdodd

Use <net/fddi.h> rather than <netinet/if_fddi.h>.


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93539 01-Apr-2002 ume

In nd6_lookup(), check if rt_llinfo is non-NULL to avoid returning an
entry that has the LLINFO flag but is not a neighbor cache entry.

Obtained from: KAME
MFC after: 1 week


93388 29-Mar-2002 ume

Fix cached route problem.

Submitted by: Keiichi SHIMA <keiichi@iij.ad.jp> (KAME)
Reviewed by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> (KAME)
MFC after: 1 week


93364 29-Mar-2002 ume

double m_free() - not critical. from niklas@openbsd

Obtained from: KAME
MFC after: 1 week


93133 25-Mar-2002 ume

Corrected arguments to key_allocsa called from
{esp6, ah6}_ctlinput. Previous ones were uninitialized
auto variables, which were completely bogus.

Obtained from: KAME
MFC after: 1 week


93128 25-Mar-2002 ume

3rd arg to bcmp() was wrong. From: David Wang <dsw@juniper.net>

Obtained from: KAME
MFC after: 1 week


92767 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92733 19-Mar-2002 peter

Pacify gcc-3.1.


92716 19-Mar-2002 alfred

Remove duplicate extern declarations to silence warnings.


92700 19-Mar-2002 darrenr

put an extern for ip6_protox in here where it is only used for kernel compiling


92699 19-Mar-2002 darrenr

put an extern for inet6sw in here and make it active only for kernel compiling


91984 10-Mar-2002 mike

o Add INET_ADDRSTRLEN and INET6_ADDRSTRLEN defines to <arpa/inet.h>
for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent
redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.


91713 05-Mar-2002 ume

- use des_ecb3_encrypt().
- style: added spaces after /* and before */

Obtained from: KAME
MFC after: 2 weeks


91712 05-Mar-2002 ume

Oops, now, encription and decription are separate function.

MFC after: 2 weeks


91671 05-Mar-2002 ume

- Speedup 3DES by using assembly code for i386.
- Sync des/blowfish to more recent openssl.

Obtained from: KAME/NetBSD
MFC after: 2 weeks


91491 28-Feb-2002 ume

- In nd6_rtrequest(), ignored a route when it is created by cloning and
is not a neighbor. see comments for the detailed reason.

- Rejected the process of nd6_rtrequest() when the request is RESOLVE and
the interface does not need neighbor caches.

Obtained from: KAME
MFC After: 1 week


91453 28-Feb-2002 peter

Fix another boatload of warnings (missing include) and a cosmetic
-Wuninitialized warning.


91354 27-Feb-2002 dd

Introduce a version field to `struct xucred' in place of one of the
spares (the size of the field was changed from u_short to u_int to
reflect what it really ends up being). Accordingly, change users of
xucred to set and check this field as appropriate. In the kernel,
this is being done inside the new cru2x() routine which takes a
`struct ucred' and fills out a `struct xucred' according to the
former. This also has the pleasant sideaffect of removing some
duplicate code.

Reviewed by: rwatson


91346 27-Feb-2002 alfred

Fix warnings caused by discarding const.

Hairy Eyeball At: peter


91327 26-Feb-2002 brooks

Fix warnings in the gif(4) driver so it compiles with -Werror.


90868 18-Feb-2002 mike

o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on: alpha, i386
Reviewed by: bde, jake, tmm


90199 04-Feb-2002 ume

Corrected an argument to in6_pcbnotify().

Obtained from: KAME
MFC after: 1 week


89623 21-Jan-2002 ume

- Check the address family of a cached destination, in case of
sharing the cache with IPv4.
- Check if the cached route is up in in6_selectsrc().

Obtained from: KAME


89069 08-Jan-2002 msmith

Initialise the intrq_present fields at runtime, not link time. This allows
us to load protocols at runtime, and avoids the use of common variables.

Also fix the ip6_intrq assignment so that it works at all.


89067 08-Jan-2002 msmith

Staticise the fw chain.


88069 17-Dec-2001 sumikawa

Back out cometic changes. This is for easily syncing with KAME in other BSDs.


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87565 09-Dec-2001 arr

- Replace M_WAIT with M_TRYWAIT since the M_WAIT flag is deprecated.

Spotted by: bde


86975 27-Nov-2001 ume

fixed the cast128 calculation with a short cipher key length.
the memory was overridden when the key length was less than 16 bytes.

Obtained from: KAME
MFC after: 1 week


86764 22-Nov-2001 jlemon

Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby (in a previous iteration)
Sponsored by: DARPA, NAI Labs


86487 17-Nov-2001 dillon

Give struct socket structures a ref counting interface similar to
vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.


86183 08-Nov-2001 rwatson

o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests. This permits
sysctl handlers to have access to the current thread, permitting work
on implementing td->td_ucred, migration of suser() to using struct
thread to derive the appropriate ucred, and allowing struct thread to be
passed down to other code, such as network code where td is not currently
available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
are not currently KSE-adapted.

Reviewed by: julian
Obtained from: TrustedBSD Project


86159 06-Nov-2001 ume

Fixed the behavior when there is no inbound policy for the ipsec
tunneled packet.
When there is no suitable inbound policy for the packet of the ipsec
tunnel mode, the kernel never decapsulate the tunneled packet
as the ipsec tunnel mode even when the system wide policy is "none".
Then the kernel leaves the generic tunnel module to process this
packet. If there is no rule of the generic tunnel, the packet
is rejected and the statistics will be counted up.

Obtained from: KAME
MFC after: 1 week


85675 29-Oct-2001 sumikawa

Fix fragmented packet handling.

Obtained from: KAME
MFC after: 3 weeks


85074 17-Oct-2001 ru

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


85072 17-Oct-2001 ru

Pull fix for memory leak in in6_losing() from netinet/in_pcb.c,v 1.85.

MFC after: 1 week


85055 17-Oct-2001 ume

Fixed to process a IPv6 packet when ah transport after esp tunnel
should be applied. the SA of AH transport could not be selected
from the SAD because of this bug.

Obtained from: KAME
MFC after: 1 week


84994 15-Oct-2001 darrenr

catch forwarded ipv6 packets with pfil_hooks for outbound things too


83934 25-Sep-2001 brooks

Make faith loadable, unloadable, and clonable.


83709 20-Sep-2001 sumikawa

Removed a wrong comment.

Obtained from: KAME
MFC after: 1 week


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83187 07-Sep-2001 julian

Patches from KAME to remove usage of Varargs in existing
IPV4 code. For now they will still have some in the developing stuff (IPv6)

Submitted by: Keiichi SHIMA / <keiichi@iij.ad.jp>
Obtained from: KAME


83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


82884 03-Sep-2001 julian

Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.


82657 31-Aug-2001 jlemon

Add missing "opt_inet6.h" header.


81990 20-Aug-2001 fenner

Fix fencepost error causing creation of 0-length mbufs when the boundary
between header and data fell on the boundary between two mbufs.


81369 10-Aug-2001 simokawa

Fix unaligned access (fault) on alpha with ndp -p/-r and sysctl -a.
Discussed on users@jp.ipv6.org

MFC candidate.


81127 04-Aug-2001 ume

When running aplication joined multicast address,
removing network card, and kill aplication.
imo_membership[].inm_ifp refer interface pointer
after removing interface.
When kill aplication, release socket,and imo_membership.
imo_membership use already not exist interface pointer.
Then, kernel panic.

PR: 29345
Submitted by: Inoue Yuichi <inoue@nd.net.fujitsu.co.jp>
Obtained from: KAME
MFC after: 3 days


81115 03-Aug-2001 ume

When global anycast address was assigned to lo0, wrong source
address was selected.

Reported by: Shingo WATANABE <nabe@nabechan.org>
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
MFC after: 3 days


80406 26-Jul-2001 ume

move ipsec security policy allocation into in_pcballoc, before
making pcbs available to the outside world. otherwise, we will see
inpcb without ipsec security policy attached (-> panic() in ipsec.c).

Obtained from: KAME
MFC after: 3 days


79763 15-Jul-2001 ume

do not M_WAITOK in in6_update_ifa(), since this function can be called
under splnet(). (some comment was added by KAME)

PR: 28927
MFC after: 1 week


79426 08-Jul-2001 ume

soopt_mcopyout() frees mbuf if error occurs, and DOES NOT free it if it is
successful.
This part was lacked during merge.

Obtained from: KAME
MFC after: 1 week


79404 07-Jul-2001 ume

The m_free call in the ip6_fw_ctl_ptr == NULL case apparently
tries to free uninitialized mbuf.
This was my mistake during recent KAME merge. This part is for
*BSD other than FreeBSD.

Submitted by: Alexander N. Kabaev <ak03@gte.com>


79197 04-Jul-2001 ume

When the link-layer address of a router changes, select the
best router again. In particular, when the neighbor entry is newly
created, it might affect the selection policy.

Obtained from: KAME
MFC after: 1 week


79139 03-Jul-2001 ume

use TAILQ_FOREACH() in searching address list

Obtained from: KAME
MFC after: 1 week


79106 02-Jul-2001 brooks

gif(4) and stf(4) modernization:

- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.

PR: kern/27983
Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


78931 28-Jun-2001 ume

- create an entry of IPV6CTL_STATS sysctl.
- fix the problem that netstat doesn't show raw6 and icmp6 pcblist.
- make netstat use sysctl to retreive stats of ipv6 and icmpv6
instead of kread.

Obtained from: KAME
MFC after: 1 week


78914 28-Jun-2001 kuriyama

Fix typo (s/=/+=/) in previous commit.


78891 27-Jun-2001 ume

refresh default router list on nd6_purge(), only if we are an
autoconfigured host.

Obtained from: KAME


78731 24-Jun-2001 kuriyama

Merge from netinet/ip_fw.c (1.117 -> 1.118).
o Use syslog(3) interface for logging.

Reviewed by: ume
MFC after: 10 days


78725 24-Jun-2001 ume

remove IN6_IS_ADDR_ANY macro (outside of standard, #if 0'ed for a long time)

Obtained from: KAME
MFC after: 10 days


78721 24-Jun-2001 ume

disallow setsockopt(IPV6_V6ONLY) for already bound sockets.

Obtained from: KAME
MFC after: 10 days


78704 24-Jun-2001 ume

on icmp6 node information query (FQDN), do not return hostnames with
two dots (like "foo..bar"). 0-length labels are not distinguishable
with multiple name replies.

Obtained from: KAME
MFC after: 10 days


78703 24-Jun-2001 ume

decrease warning

Obtained from: KAME
MFC after: 10 days


78702 24-Jun-2001 ume

Nuke the comment about MIP6. We don't have MIP6 code, yet.

MFC after: 10 days


78468 19-Jun-2001 sumikawa

Add IFT_L2VLAN for supported NDP type. IPv6 over VLAN works now.

Obtained from: KAME
MFC after: 2 weeks


78407 18-Jun-2001 ume

call pfxlist_onlink_check() at the end of in6_tmpifadd(), to make sure
a temporary address generated from a detached public one also detached.

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
Obtained from: KAME


78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


77969 10-Jun-2001 jesper

Make the default value of net.inet.ip.maxfragpackets and
net.inet6.ip6.maxfragpackets dependent on nmbclusters,
defaulting to nmbclusters / 4

Reviewed by: bde
MFC after: 1 week


77574 01-Jun-2001 kris

Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.
This closes a minor information leak which allows a remote observer to
determine the rate at which the machine is generating packets, since the
default behaviour is to increment a counter for each packet sent.

Reviewed by: -net
Obtained from: OpenBSD


77572 01-Jun-2001 obrien

Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile.


77546 31-May-2001 jesper

Change the default value of net.inet6.ip6.maxfragpackets from
200 to NMBCLUSTERS/4 to match the IPv4 case.

MFC after: 1 week


77065 23-May-2001 ume

Fix memory leak.

Submitted by: itojun


77003 22-May-2001 ume

M_COPY_PKTHDR has to be done before MCLGET.

Obtained from: KAME


76899 20-May-2001 sumikawa

Plug memoly leak in overlaps fragment cases.

Obtained from: KAME


75729 20-Apr-2001 ume

Fix typo in previous commit.

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>


75719 19-Apr-2001 ume

- Fix to receive icmp6 echo reply within the host itself to ff02::1.
- Fix to receive icmp6 echo reply to link-local of itself.

Reported by: Eriya Akasaka <eakasaka@rodfbs.org>
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>


75246 05-Apr-2001 ume

- correct logic of per-address input packet counts for lo0
- reject packets to fe80::xxxx%lo0 (xxxx != 1)

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>


74955 28-Mar-2001 ume

Make per-address input packet counts for lo0 work.

Reported by: bmah
Submitted by: Noriyasu KATO <noriyasu.kato@toshiba.co.jp> (via itojun)


74356 16-Mar-2001 ume

nuke IPSEC_SRCSEL which does not do the right thing.
adjust state->ro if the tunnel endpoint is offlink.
KAME PR 233.

PR: kern/21079


74336 16-Mar-2001 kuriyama

Merge from kame (1.175 -> 1.176):
cope with freebsd4 bridge code.


74093 11-Mar-2001 bmilekic

Plug several mbuf leaks in error cases (in nd6)

Submitted by: jhay


73059 26-Feb-2001 kris

More IP option length validation.
Includes the following revisions from KAME (two of these were actually
committed previously but the CVS revisions weren't documented):

1.40 kame/kame/sys/netinet6/ah_core.c (committed in previous rev)
1.41 kame/kame/sys/netinet6/ah_core.c
1.28 kame/kame/sys/netinet6/ah_output.c (committed in previous rev)
1.29 kame/kame/sys/netinet6/ah_output.c
1.30 kame/kame/sys/netinet6/ah_output.c
1.129 kame/kame/sys/netinet6/nd6.c
1.130 kame/kame/sys/netinet6/nd6.c
1.24 kame/kame/sys/netinet6/dest6.c
1.25 kame/kame/sys/netinet6/dest6.c

Obtained from: KAME


72758 20-Feb-2001 simokawa

Better detection of duplicated initialization.

Obtained from: KAME


72739 20-Feb-2001 kris

Correct IPv4 option processing.

Submitted by: itojun
Obtained from: KAME


72650 18-Feb-2001 green

Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by: bde


72093 06-Feb-2001 asmodai

Fix typo: compatability -> compatibility.

Compatability is not an existing english word.


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the english language.


72084 06-Feb-2001 phk

Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh


71794 29-Jan-2001 peter

Yikes, these files bogusly #include "loop.h" but didn't use the value.
My searching for NLOOP missed them. :-(


71450 23-Jan-2001 kris

Fix the vulnerability with TCP ECE packets recently fixed in ipfw.
This is untested, but believed to work.


71389 22-Jan-2001 ume

avoid conflicting #define symbol (s/FW_IFNLEN/IP6&/).

Obtained from: KAME


71375 22-Jan-2001 ume

on in6_ifdetach(), do not remove default route mistakenly

Obtained from: KAME


71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


71334 21-Jan-2001 ume

permit icmp6 type <= 256 (was 32).

Obtained from: KAME


71305 20-Jan-2001 ume

When ip6_fw_ctl() or soopt_mcopyout() return without success,
don't free mbuf. It is already freed by these routins.

PR: kern/24248


71207 18-Jan-2001 itojun

workaround; be sure to initialize nd6 interface information when IPv6
interface address gets added. this will avoid presenting EMSGSIZE when
outgoing interface is down (and never brought up).

sync with kame.


70601 02-Jan-2001 ume

do not touch ra_addr if it is NULL. from IIJ SEIL team

Obtained from: KAME


70254 21-Dec-2000 bmilekic

* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.


69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


69152 25-Nov-2000 jlemon

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


68620 11-Nov-2000 bmilekic

Change check from mbuf->m_ext.ext_free to use the new ext_type in order
to determine whether the given mbuf has a cluster (or some other type of
external storage) attached to it.

Note: This code should eventually be made to use M_WRITABLE() to determine
whether or not a copy should be made.

Reviewed by: jlemon


68532 09-Nov-2000 ume

backout my previous commit (KAME PR 296). foo != TUNNEL will
forbid "ANY" SA from being used for tnunel mode.

Reported by: Chris Cason <casonc@netplex.aussie.org>


68277 03-Nov-2000 ume

check whether the packet is tunnel mode. reported from <larse@ISI.EDU>

Obtained from: KAME


67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


67833 29-Oct-2000 joe

Count per-address statistics for IP fragments.

Requested by: ru
Obtained from: BSD/OS


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


67456 23-Oct-2000 itojun

be careful on mbuf overrun on ctlinput.
short icmp6 packet may be able to panic the kernel.
sync with kame.


67455 23-Oct-2000 itojun

kame 1.32 -> 1.33
in add_m6fc(), set interface list for all cases.
in response to a report from Hoerdt Mickael.

kame 1.31 -> 1.32
discard PIM register if the version of the inner packet is incorrect (i.e. IPv6)
(according to clarfication of recent discussion in the IETF pim ML)


67334 19-Oct-2000 joe

Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.


66890 09-Oct-2000 archie

Fix broken const'ness in declaration of sha1_loop().


66586 03-Oct-2000 itojun

make pr_type type meet with struct protosw. sync with kame


66504 01-Oct-2000 itojun

add missing \n. sync with kame.


66303 23-Sep-2000 ume

Make ip6fw as loadable module.


65895 15-Sep-2000 ume

examined the gateway (from the routing table) only when the address
family of the gateway is AF_INET6.

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>


65837 14-Sep-2000 ru

Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.

Requested by: wollman


65637 09-Sep-2000 itojun

add attrbute(packed) to union def with specific align constraitn.
move file static variable to auto variable, make in6_cksum() work better in
kernel-MP environment. sync with kame.

From: Alfred Perlstein <bright@wintelcom.net>


65406 03-Sep-2000 itojun

repair type 0 routing header support. it was caused by RFC2292/2292bis
difference. from: jinmei@kame.net


65124 27-Aug-2000 itojun

warn that setsockopt/sysctl # spaces are shared among *BSD, and should better
be consulted with KAME guys if you want a number.


64837 19-Aug-2000 dwmalone

Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR: 19866
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: alfred (glanced at by others on -net)


64701 16-Aug-2000 itojun

add missing splx(), when outgoing interface queue is full on tunnelled
IPsec packet output. KAME PR 280.


64558 12-Aug-2000 ume

Make compilable with -DIPFILTER.
Because I don't use ipfilter at all, this is not tested.
I don't know if ipfilter is work for IPv6.

Submitted by: yoshiaki@kt.rim.or.jp


64541 11-Aug-2000 itojun

backout ND6_USE_RTSOCK change in previous


64540 11-Aug-2000 itojun

avoid duplicated rtfree() on default router list change (could cause panic).
sync with kame 1.46 -> 1.47.


64503 10-Aug-2000 ume

Make ip6fw zero work.

PR: bin/20522


64118 02-Aug-2000 peter

GRRR! Fix the 'panic: ip6_init' caused by darrenr's incomplete changes
for the pfil hooks. The protosw and ip6protosw structures were out of
sync with each other. :-(


64060 31-Jul-2000 darrenr

activate pfil_hooks and covert ipfilter to use it


63256 16-Jul-2000 itojun

s/IPSEC_IPV6FWD/IPSEC/. this avoids unexpected behavior on ipv6 fowarding.
(even if you ask for tunnel-mode encryption packets will go out in clear)
sync with kame.


63024 12-Jul-2000 itojun

remove m_pulldown statistics, which is highly experimental and does not
belong to *bsd-merged tree


63001 12-Jul-2000 itojun

correct rtentry reference count in in6_ifloop_request().
if you reconfigure inet6 too much, the reference count can go
into negative by mistake. KAME in6.c 1.98 -> 1.99.


62744 07-Jul-2000 grog

Suppress a warning message about trigraphs.

Approved-by: itojun


62648 05-Jul-2000 itojun

add list of KAME files - may not be 100% correct


62604 05-Jul-2000 itojun

split net.inet6.ip6.rtexpire (and others) from net.inet.ip.*.
From: Andrzej Bialecki <abial@webgiro.com>


62601 05-Jul-2000 itojun

correct compilation with IPSEC_IPV6FWD.
From: Ollivier Robert <roberto@keltia.freenix.fr>


62587 04-Jul-2000 itojun

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


61965 22-Jun-2000 ume

Inhibit successful DAD messages and "no default interface" messages.
It seems that people find them too noisy.
(ND6_DEBUG will enable them)

Obtained from: KAME Project


61958 22-Jun-2000 itojun

correct bad TTL with packets generated by v4 mapped udp. from kame


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60889 24-May-2000 archie

Just need to pass the address family to if_simloop(), not the whole sockaddr.


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60393 11-May-2000 bde

Fixed missing prototype for inet6_rthdr_reverse().


60265 09-May-2000 ps

Add missing include machine/in_cksum.h.

Submitted by: n_hibma


59760 29-Apr-2000 phk

Remove unneeded #include <sys/kernel.h>


59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


59333 17-Apr-2000 sumikawa

even if nd6_nud_hint is called, do not change a neighbor's status
unless the old status is probably reachable (i.e. the link-layer address
has already been resolved).

Obtained from: KAME Project


58907 01-Apr-2000 shin

Support per socket based IPv4 mapped IPv6 addr enable/disable control.

Submitted by: ume


58452 22-Mar-2000 green

in6_pcb.c:
Remove a bogus (redundant, just weird, etc.) key_freeso(so).
There are no consumers of it now, nor does it seem there
ever will be.

in6?_pcb.c:
Add an if (inp->in6?p_sp != NULL) before the call to
ipsec[46]_delete_pcbpolicy(inp). In low-memory conditions
this can cause a crash because in6?_sp can be NULL...


57972 13-Mar-2000 shin

Backout the previous change to __KAME_VERSION (FreeBSD4.x addition),
because this is now 5.0-current.


57943 12-Mar-2000 shin

Change __KAME_VERSION value. Added the word "FreeBSD4.x" to identify the
system with other platform and/or other version of FreeBSD, which is also
integrated KAME code based on another date.

Approved by: jkh


57917 11-Mar-2000 shin

Forbid include of netinet6/ip6.h from user-land, and if included,
print an error message which say, "include netinet/ip6.h".
This is postponed to apply to avoid tcpdump compile error.
Now apply this because tcpdump has been already fixed.

Approved by: jkh

Obtained from: KAME project


57912 11-Mar-2000 shin

Replace m_pkthdr.rcvif with oif when oif is not NULL, to count
icmp6 error statistics based on sending interface.
This also prevent kernel panic when rcvif is not initialized after M_PKTHDR().
(The initialization issue also need to be fixed in the future.)

Approved by: jkh

Submitted by: k-sugyou@kame.net


57855 09-Mar-2000 shin

Initialize mbuf pointer at getting ipsec policy.
Without this, kernel will panic at getsockopt() of IPSEC_POLICY.
Also make compilable libipsec/test-policy.c which tries getsockopt() of
IPSEC_POLICY.

Approved by: jkh

Submitted by: sakane@kame.net


57851 09-Mar-2000 shin

Update icmp node info query message bit order of query types,
according to draft-ietf-ipngwg-icmp-name-lookups-04 to 05 change.
This is necessary before 4.0, because,
-This change is non backword compatible
-Other KAME derived platforms applied 05
-Author of the draft said he never do backword imcompatible changes
again.

Approved by: jkh

Obtained from: KAME project


57719 03-Mar-2000 shin

CMSG_XXX macros alignment fixes to follow RFC2292.

Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun


57535 27-Feb-2000 shin

At detaching IPv6 raw socket, also finish IPv6 multicast router.

Approved by: jkh

Submitted by: fenner


57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


57121 10-Feb-2000 shin

Prototype fix for IPsec authentication related functions

Some of IPsec authentication related functions should have
'const' for its 2nd argument, but not now.
But if someone try to use them, and passed const data for
those functions, then much bogus compile warnings will be
generated.
So those funcs prototype should be modified.

Requested by: archie
Approved by: jkh


57120 10-Feb-2000 shin

Forbid include of soem inet6 header files from wrong place

KAME put INET6 related stuff into sys/netinet6 dir, but IPv6
standard API(RFC2553) require following files to be under sys/netinet.
netinet/ip6.h
netinet/icmp6.h
Now those header files just include each following files.
netinet6/ip6.h
netinet6/icmp6.h

Also KAME has netinet6/in6.h for easy INET6 common defs
sharing between different BSDs, but RFC2553 requires only
netinet/in.h should be included from userland.
So netinet/in.h also includes netinet6/in6.h inside.

To keep apps portability, apps should not directly include
above files from netinet6 dir.
Ideally, all contents of,
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h
should be moved into
netinet/ip6.h
netinet/icmp6.h
netinet/in.h
but to avoid big changes in this stage, add some hack, that
-Put some special macro define into those files under neitnet
-Let files under netinet6 cause error if it is included
from some apps, and, if the specifal macro define is not
defined.
(which should have been defined if files under netinet is
included)
-And let them print an error message which tells the
correct name of the include file to be included.

Also fix apps which includes invalid header files.

Approved by: jkh

Obtained from: KAME project


57018 07-Feb-2000 shin

IPv6 prefix assignment bug fixes.

(1)When all related IPv6 addresses are removed,
then remove the associated IPv6 prefix.
(2)When multiple IPv6 link local addrs exist for a same
interface , then let its IPv6 prefix have multiple
interface id, and create multiple IPv6 global addrs with same
interface id.
(3)When a new IPv6 link local addr is assigned for an
interface, then let its IPv6 prefix also have the
interface id of the new IPv6 link local addr, and
create new IPv6 global addrs with same interface id.

Approved by: jkh


57017 07-Feb-2000 shin

Permit site local addr in IPv6 source address selection rule.

KAME source addr selection rule had a problem to treat IPv6 site
local addr.
The rule is completely rewritten recently and the above problem
is also fixed, but rewriting same code part in freebsd4.0 is too
dangerous in this stage, so just add workaround to avoid
the problem. Just add code for IPv6 site local addresses into IPv6
source addr selection algorythm part.


56815 29-Jan-2000 shin

Add ip6fw.
Yes it is almost code freeze, but as the result of many thought, now I
think this should be added before 4.0...

make world check, kernel build check is done.

Reviewed by: green
Obtained from: KAME project


56743 28-Jan-2000 shin

Backout diffs which should not be included.


56738 28-Jan-2000 shin

#This is a null commit to give correct description for the previous change.
#Please forget the strange log message of the previous commit .

IPv6 multicast routing.
kernel IPv6 multicast routing support.
pim6 dense mode daemon
pim6 sparse mode daemon
netstat support of IPv6 multicast routing statistics

Merging to the current and testing with other existing multicast routers
is done by Tatsuya Jinmei <jinmei@kame.net>, who writes and maintainances
the base code in KAME distribution.

Make world check and kernel build check was also successful.

Obtained from: KAME project


56723 28-Jan-2000 shin

Sorry I didn't commit these files at the commit just a few minutes before.
(IPv6 multicast routing)
I think I mistakenly touched TAB and the last arg sys/netinet6 to
the cvs commit changed to sys/netinet6/in6_proto.c.


56722 28-Jan-2000 shin

IPv6 multicast routing.
kernel IPv6 multicast routing support.
pim6 dense mode daemon
pim6 sparse mode daemon
netstat support of IPv6 multicast routing statistics

Merging to the current and testing with other existing multicast routers
is done by Tatsuya Jinmei <jinmei@kame.net>, who writes and maintainances
the base code in KAME distribution.

Make world check and kernel build check was also successful.


56669 27-Jan-2000 shin

Added ip6_forwarding check when prefix related ioctl is called.
(prefix related ioctl should only be called on router,
because host use dynamic address and prefix configuration mechanism,
and those prefix are managed separately with ones whih are assined
manually.)


56555 24-Jan-2000 brian

Move the *intrq variables into net/intrq.c and unconditionally
include this in all kernels. Declare some const *intrq_present
variables that can be checked by a module prior to using *intrq
to queue data.

Make the if_tun module capable of processing atm, ip, ip6, ipx,
natm and netatalk packets when TUNSIFHEAD is ioctl()d on.

Review not required by: freebsd-hackers


56228 18-Jan-2000 shin

Merge a bug fix from freebsd-current; check m != NULL before touching it,
at udp6_ctlinput().
There should be kernel panic at PCCARD suspend etc, before this bug fix.

Submitted by: Hajimu UMEMOTO <ume@mahoroba.org>


56117 16-Jan-2000 shin

fix kernel panic at rtfree() in INET6 enabled envrionment.
This is probably due to twice rtfree() in in6_pcbdetach(),
one for inp->in6p_route.ro_rt, and another one for inp->inp_route.ro_rt.
But these 2 are actually shared in inpcb, so 2nd rtfree() is not necessary.


56041 15-Jan-2000 shin

Fixed the problem that IPsec connection hangs when bigger data is sent.
-opt_ipsec.h was missing on some tcp files (sorry for basic mistake)
-made buildable as above fix
-also added some missing IPv4 mapped IPv6 addr consideration into
ipsec4_getpolicybysock


56018 15-Jan-2000 shin

wrapped prototype declarations by __P(())

Submitted by: bde


56016 15-Jan-2000 shin

add forward declarations, and small cosmetic changes.

Submitted by: bde


55919 13-Jan-2000 shin

fix wrong name which is hidden by wrong ifdef.
Sorry for build failure. There was a mistake when I moved the patch
from my build check machine to commit machine.

Specified by: peter


55917 13-Jan-2000 shin

Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.


55873 13-Jan-2000 shin

removed an ours case which think a packet destined to loopback interface
with IPv6 loopback addr for its dest or src addr as ours.


55872 13-Jan-2000 shin

added missing IPV6_PORTRANGE case.


55871 13-Jan-2000 shin

remove unnecessary "$ifdef INET6"


55679 09-Jan-2000 shin

tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


55344 03-Jan-2000 shin

prevent kernel panic at suspend/resume.

confirmed by: sanpei, joe

PR: kern/15742


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


55163 28-Dec-1999 shin

Getaddrinfo(), getnameinfo(), and etc support in libc/net.
Several udp and raw apps IPv6 support.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


55009 22-Dec-1999 shin

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


54952 21-Dec-1999 eivind

Change incorrect NULLs to 0s


54799 19-Dec-1999 green

M_PREPEND-related cleanups (unregisterifying struct mbuf *s).


54350 09-Dec-1999 shin

rtcalloc() is removed because it turned out not to be necessary for FreeBSD.
(It was added as a part of KAME patch)

Specified by: jdp@polstra.com


54263 07-Dec-1999 shin

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


53957 30-Nov-1999 shin

Just to avoid warning message about trigraph.

Commented by: green


53877 29-Nov-1999 itojun

there's no memcmp() in kernel, use bcmp() instead.
in userland memcmp() is preferred for ANSI preference.
(from KAME repository)


53626 23-Nov-1999 shin

Removed IPSEC and IPV6FIREWALL because they are not ready yet.


53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


52904 05-Nov-1999 shin

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project