	Implementation Note

	KAME Project
	http://www.kame.net/
	$KAME: IMPLEMENTATION,v 1.216 2001/05/25 07:43:01 jinmei Exp $
	$FreeBSD$

NOTE: This document tries to describe the behaviors and implementation
choices of the latest KAME/*BSD stack.  The description here may not be
applicable to KAME-integrated *BSD releases, as there are a certain number
of differences between them.  Still, some of the content can be useful for
KAME-integrated *BSD releases.

Table of Contents

	1. IPv6
	1.1 Conformance
	1.2 Neighbor Discovery
	1.3 Scope Zone Index
	1.3.1 Kernel internal
	1.3.2 Interaction with API
	1.3.3 Interaction with users (command line)
	1.4 Plug and Play
	1.4.1 Assignment of link-local, and special addresses
	1.4.2 Stateless address autoconfiguration on hosts
	1.4.3 DHCPv6
	1.5 Generic tunnel interface
	1.6 Address Selection
	1.6.1 Source Address Selection
	1.6.2 Destination Address Ordering
	1.7 Jumbo Payload
	1.8 Loop prevention in header processing
	1.9 ICMPv6
	1.10 Applications
	1.11 Kernel Internals
	1.12 IPv4 mapped address and IPv6 wildcard socket
	1.12.1 KAME/BSDI3 and KAME/FreeBSD228
	1.12.2 KAME/FreeBSD[34]x
	1.12.2.1 KAME/FreeBSD[34]x, listening side
	1.12.2.2 KAME/FreeBSD[34]x, initiating side
	1.12.3 KAME/NetBSD
	1.12.3.1 KAME/NetBSD, listening side
	1.12.3.2 KAME/NetBSD, initiating side
	1.12.4 KAME/BSDI4
	1.12.4.1 KAME/BSDI4, listening side
	1.12.4.2 KAME/BSDI4, initiating side
	1.12.5 KAME/OpenBSD
	1.12.5.1 KAME/OpenBSD, listening side
	1.12.5.2 KAME/OpenBSD, initiating side
	1.12.6 More issues
	1.12.7 Interaction with SIIT translator
	1.13 sockaddr_storage
	1.14 Invalid addresses on the wire
	1.15 Node's required addresses
	1.15.1 Host case
	1.15.2 Router case
	1.16 Advanced API
	1.17 DNS resolver
	2. Network Drivers
	2.1 FreeBSD 2.2.x-RELEASE
	2.2 BSD/OS 3.x
	2.3 NetBSD
	2.4 FreeBSD 3.x-RELEASE
	2.5 FreeBSD 4.x-RELEASE
	2.6 OpenBSD 2.x
	2.7 BSD/OS 4.x
	3. Translator
	3.1 FAITH TCP relay translator
	3.2 IPv6-to-IPv4 header translator
	4. IPsec
	4.1 Policy Management
	4.2 Key Management
	4.3 AH and ESP handling
	4.4 IPComp handling
	4.5 Conformance to RFCs and IDs
	4.6 ECN consideration on IPsec tunnels
	4.7 Interoperability
	4.8 Operations with IPsec tunnel mode
	4.8.1 RFC2401 IPsec tunnel mode approach
	4.8.2 draft-touch-ipsec-vpn approach
	5. ALTQ
	6. Mobile IPv6
	6.1 KAME node as correspondent node
	6.2 KAME node as home agent/mobile node
	6.3 Old Mobile IPv6 code
	7. Coding style
	8. Policy on technology with intellectual property right restriction

1. IPv6

1.1 Conformance

The KAME kit conforms, or tries to conform, to the latest set of IPv6
specifications.  For future reference we list some of the relevant documents
below (NOTE: this is not a complete list - a complete one is too hard to
maintain).  For details please refer to the specific chapter in this
document, the RFCs, the manpages that come with KAME, or the comments in the
source code.

Conformance tests have been performed on past and latest KAME STABLE kits
at the TAHI project.  Results can be viewed at
http://www.tahi.org/report/KAME/.
We also attended the Univ. of New Hampshire IOL tests
(http://www.iol.unh.edu/) in the past, with our past snapshots.

RFC1639: FTP Operation Over Big Address Records (FOOBAR)
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then RFC1639 if that fails.
RFC1886: DNS Extensions to support IPv6
RFC1933: (see RFC2893)
RFC1981: Path MTU Discovery for IPv6
RFC2080: RIPng for IPv6
    * KAME-supplied route6d, bgpd and hroute6d support this.
RFC2283: Multiprotocol Extensions for BGP-4
    * so-called "BGP4+".
    * KAME-supplied bgpd supports this.
RFC2292: Advanced Sockets API for IPv6
    * see RFC3542
RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
    * RFC2362 defines the packet formats and the protocol of PIM-SM.
RFC2373: IPv6 Addressing Architecture
    * KAME supports node required addresses, and conforms to the scope
      requirement.
RFC2374: An IPv6 Aggregatable Global Unicast Address Format
    * KAME supports 64-bit length of Interface ID.
RFC2375: IPv6 Multicast Address Assignments
    * Userland applications use the well-known addresses assigned in the RFC.
RFC2428: FTP Extensions for IPv6 and NATs
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then RFC1639 if that fails.
RFC2460: IPv6 specification
RFC2461: Neighbor discovery for IPv6
    * See 1.2 in this document for details.
RFC2462: IPv6 Stateless Address Autoconfiguration
    * See 1.4 in this document for details.
RFC2463: ICMPv6 for IPv6 specification
    * See 1.9 in this document for details.
RFC2464: Transmission of IPv6 Packets over Ethernet Networks
RFC2465: MIB for IPv6: Textual Conventions and General Group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2466: MIB for IPv6: ICMPv6 group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2467: Transmission of IPv6 Packets over FDDI Networks
RFC2472: IPv6 over PPP
RFC2492: IPv6 over ATM Networks
    * only PVC is supported.
RFC2497: Transmission of IPv6 packet over ARCnet Networks
RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
RFC2553: (see RFC3493)
RFC2671: Extension Mechanisms for DNS (EDNS0)
    * see USAGE for how to use it.
    * not supported on kame/freebsd4 and kame/bsdi4.
RFC2673: Binary Labels in the Domain Name System
    * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
    * KAME apps/bind8 repository has resolver library with partial A6, DNAME
      and binary label support.
RFC2675: IPv6 Jumbograms
    * See 1.7 in this document for details.
RFC2710: Multicast Listener Discovery for IPv6
RFC2711: IPv6 router alert option
RFC2732: Format for Literal IPv6 Addresses in URL's
    * The spec is implemented in programs that handle URLs
      (like freebsd ftpio(3) and fetch(1), or netbsd ftp(1))
RFC2874: DNS Extensions to Support IPv6 Address Aggregation and Renumbering
    * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
    * KAME apps/bind8 repository has resolver library with partial A6, DNAME
      and binary label support.
RFC2893: Transition Mechanisms for IPv6 Hosts and Routers
    * IPv4 compatible address is not supported.
    * automatic tunneling (4.3) is not supported.
    * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
      and it covers "configured tunnel" described in the spec.
      See 1.5 in this document for details.
RFC2894: Router renumbering for IPv6
RFC3041: Privacy Extensions for Stateless Address Autoconfiguration in IPv6
RFC3056: Connection of IPv6 Domains via IPv4 Clouds
    * So-called "6to4".
    * "stf" interface implements it.  Be sure to read
      draft-itojun-ipv6-transition-abuse-01.txt
      below before configuring it, as there can be security issues.
RFC3142: An IPv6-to-IPv4 transport relay translator
    * FAITH tcp relay translator (faithd) implements this.  See 3.1 for more
      details.
RFC3152: Delegation of IP6.ARPA
    * libinet6 resolvers contained in the KAME snaps support the use of
      the ip6.arpa domain (with the nibble format) for IPv6 reverse
      lookups.
RFC3484: Default Address Selection for IPv6
    * the selection algorithm for both source and destination addresses
      is implemented based on the RFC, though some rules are still omitted.
RFC3493: Basic Socket Interface Extensions for IPv6
    * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
      socket (3.8) are:
	- supported and turned on by default on KAME/FreeBSD[34]
	  and KAME/BSDI4,
	- supported but turned off by default on KAME/NetBSD and KAME/FreeBSD5,
	- not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3.
      see 1.12 in this document for details.
    * The AI_ALL and AI_V4MAPPED flags are not supported.
RFC3542: Advanced Sockets API for IPv6 (revised)
    * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
    * Some of the updates in the draft are not implemented yet.  See
      TODO.2292bis for more details.
RFC4007: IPv6 Scoped Address Architecture
    * some part of the documentation (especially about the routing
      model) is not supported yet.
    * zone indices that contain scope types are not supported yet.

draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP
draft-ietf-ipv6-router-selection-07.txt:
	Default Router Preferences and More-Specific Routes
    * router-side: both router preference and specific routes are supported.
    * host-side: only router preference is supported.
draft-ietf-pim-sm-v2-new-02.txt
	A revised version of RFC2362, which includes the IPv6 specific
	packet format and protocol descriptions.
draft-ietf-dnsext-mdns-00.txt: Multicast DNS
    * kame/mdnsd has a test implementation, which will not be built in
      default compilation.  The draft will experience a major change in the
      near future, so don't rely upon it.
draft-ietf-ipngwg-icmp-v3-02.txt: ICMPv6 for IPv6 specification (revised)
    * See 1.9 in this document for details.
draft-itojun-ipv6-tcp-to-anycast-01.txt:
	Disconnecting TCP connection toward IPv6 anycast address
draft-ietf-ipv6-rfc2462bis-06.txt: IPv6 Stateless Address
	Autoconfiguration (revised)
draft-itojun-ipv6-transition-abuse-01.txt:
	Possible abuse against IPv6 transition technologies (expired)
    * KAME does not implement RFC1933/2893 automatic tunnel.
    * "stf" interface implements some address filters.  Refer to stf(4)
      for details.  Since there's no way to make 6to4 interface 100% secure,
      we do not include "stf" interface into GENERIC.v6 compilation.
    * kame/openbsd completely disables IPv4 mapped address support.
    * kame/netbsd makes IPv4 mapped address support off by default.
    * See sections 1.12.6 and 1.14 for more details.
draft-itojun-ipv6-flowlabel-api-01.txt: Socket API for IPv6 flow label field
    * no consideration is made against the use of routing headers and such.

1.2 Neighbor Discovery

Our implementation of Neighbor Discovery is fairly stable.  Currently
Address Resolution, Duplicate Address Detection, and Neighbor
Unreachability Detection are supported.  In the near future we will be
adding an Unsolicited Neighbor Advertisement transmission command as
an administration tool.

Duplicate Address Detection (DAD) will be performed when an IPv6 address
is assigned to a network interface, or the network interface is enabled
(ifconfig up).  It is documented in RFC2462 5.4.
If DAD fails, the address will be marked "duplicated" and a message will be
generated to syslog (and usually to the console).  The "duplicated" mark
can be checked with ifconfig.  It is the administrator's responsibility to
check for and recover from DAD failures.  We may try to improve failure
recovery in future KAME code.

A successor version of RFC2462 (called rfc2462bis) clarifies the
behavior when DAD fails (i.e., a duplicate is detected): if the
duplicate address is a link-local address formed from an interface
identifier based on a hardware address which is supposed to be
uniquely assigned (e.g., EUI-64 for an Ethernet interface), IPv6
operation on the interface should be disabled.  The KAME
implementation supports this as follows: if this type of duplicate is
detected, the kernel sets a "disabled" mark in the ND specific data
structure for the interface.  Every IPv6 I/O operation in the kernel
checks this mark, and the kernel will drop packets received on or
being sent to the "disabled" interface.  Whether IPv6 operation is
disabled or not can be confirmed with the ndp(8) command.  See the man
page for more details.

The DAD procedure may not be effective on certain network interfaces/drivers.
If a network driver needs a long initialization time (this situation is
common with wireless network interfaces), and the driver mistakenly raises
IFF_RUNNING before it becomes ready, the DAD code will try to transmit
DAD probes to the not-really-ready network driver and the packets will not
go out from the interface.  In such cases, the network drivers should be
corrected.

Some network drivers loop multicast packets back to themselves,
even if instructed not to do so (especially in promiscuous mode).  In
such cases DAD may fail, because the DAD engine sees an inbound NS packet
(actually from the node itself) and considers it a sign of a
duplicate.  In this case, drivers should be corrected to honor
IFF_SIMPLEX behavior.  For example, you may need to check the source MAC
address on an inbound packet, and reject it if it is from the node
itself.
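
As a minimal sketch of the source MAC check mentioned above (the
structure and function names are hypothetical and are not taken from
any real driver), a receive path could compare the source address of
each inbound frame against the interface's own address and discard
frames that match:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/* Minimal Ethernet header layout: destination, source, EtherType. */
struct ether_hdr_sketch {
	uint8_t  dst[ETHER_ADDR_LEN];
	uint8_t  src[ETHER_ADDR_LEN];
	uint16_t type;
};

/*
 * Return true if the inbound frame carries our own MAC address as its
 * source, i.e. the hardware echoed back a frame that we transmitted.
 */
static bool
frame_is_own_echo(const struct ether_hdr_sketch *eh,
    const uint8_t own_mac[ETHER_ADDR_LEN])
{
	assert(eh != NULL && own_mac != NULL);
	return memcmp(eh->src, own_mac, ETHER_ADDR_LEN) == 0;
}
```

A driver would run such a check before passing the frame up the stack,
so that the DAD engine never sees the node's own NS.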

The Neighbor Discovery specification (RFC2461) does not talk about neighbor
cache handling in the following cases:
(1) when there was no neighbor cache entry, and the node received an
    unsolicited RS/NS/NA/redirect packet without a link-layer address
(2) neighbor cache handling on a medium without link-layer addresses
    (we need a neighbor cache entry for the IsRouter bit)
For (1), we implemented a workaround based on discussions on the IETF ipngwg
mailing list.  For more details, see the comments in the source code and the
email thread starting from (IPng 7155), dated Feb 6 1999.

The IPv6 on-link determination rule (RFC2461) is quite different from
the assumptions in the BSD IPv4 network code.  To implement the behavior in
RFC2461 section 6.3.6 (3), the kernel needs to know the default
outgoing interface.  To configure the default outgoing interface, use
commands like "ndp -I de0" as root.  Then the kernel will have a
"default" route to the interface with the cloning "C" bit being on.
This default route will cause a neighbor cache entry to be made for every
destination that does not match an explicit route entry.

Note that we intentionally disable configuring the default interface
by default.  This is because we found it sometimes caused inconvenient
situations while it was rarely useful in practice.  For example,
consider a destination that has both IPv4 and IPv6 addresses but is
only reachable via IPv4.  Since our getaddrinfo(3) prefers IPv6 by
default, a (TCP) application using the library with PF_UNSPEC first
tries to connect to the IPv6 address.  If we turn on RFC 2461 6.3.6
(3), we have to wait for quite a long period before the first attempt
to make a connection fails.  If we turn it off, the first attempt will
immediately fail with EHOSTUNREACH, and then the application can try
the next, reachable address.
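
The application-side behavior described above can be sketched as
follows.  This is only an illustration that assumes a plain TCP client;
the function name is hypothetical.  It walks the getaddrinfo(3) result
list in order, so a quick EHOSTUNREACH on the IPv6 address lets it move
on to the IPv4 address without a long wait:

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try every address returned by getaddrinfo(3) in order and return a
 * connected TCP socket, or -1 if resolution fails or no address is
 * reachable.
 */
static int
connect_first_reachable(const char *host, const char *service)
{
	struct addrinfo hints, *res, *ai;
	int s = -1;

	assert(service != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;	/* accept both IPv6 and IPv4 */
	hints.ai_socktype = SOCK_STREAM;

	if (getaddrinfo(host, service, &hints, &res) != 0)
		return -1;

	for (ai = res; ai != NULL; ai = ai->ai_next) {
		s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (s < 0)
			continue;
		if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
			break;		/* connected */
		/* e.g. EHOSTUNREACH: fall through to the next address */
		close(s);
		s = -1;
	}
	freeaddrinfo(res);
	return s;
}
```

The caller owns the returned descriptor and should close(2) it when
done.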

The notion of the default interface is also disabled when the node is
acting as a router.  The reason is that routers tend to control all
routes stored in the kernel, and an automatically installed default
route would rather confuse the routers.  Note that the spec misuses
the words "host" and "node" in several places in Section 5.2 of RFC
2461.  We basically read the word "node" in this section as "host,"
and thus believe the implementation policy does not break the
specification.

To avoid possible DoS attacks and infinite loops, the KAME stack will accept
only 10 options on an ND packet.  Therefore, if you have 20 prefix options
attached to an RA, only the first 10 prefixes will be recognized.
If this troubles you, please contact the KAME team and/or modify
nd6_maxndopt in sys/netinet6/nd6.c.  If there is high demand we may
provide a sysctl knob for the variable.
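
The effect of the limit can be illustrated with the following sketch.
It is not the KAME parser: the function name and the constant are
hypothetical, and only the option layout (a one-byte type followed by a
one-byte length in units of 8 octets) comes from RFC2461.  It also
shows why a zero-length option has to be rejected to avoid an infinite
loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ND_OPTIONS 10	/* mirrors the limit described above */

/*
 * Walk the option area of an ND packet and return the number of
 * options accepted, or -1 if the option area is malformed.  Options
 * beyond MAX_ND_OPTIONS are silently ignored.
 */
static int
count_nd_options(const uint8_t *opts, size_t len)
{
	size_t off = 0;
	int accepted = 0;

	assert(opts != NULL || len == 0);
	while (off < len && accepted < MAX_ND_OPTIONS) {
		size_t optlen;

		if (len - off < 2)
			return -1;	/* truncated option header */
		optlen = (size_t)opts[off + 1] * 8;
		if (optlen == 0)
			return -1;	/* zero length would loop forever */
		if (optlen > len - off)
			return -1;	/* option runs past the packet */
		accepted++;
		off += optlen;
	}
	return accepted;
}
```

Given a buffer holding 20 well-formed prefix options, the function
stops after the first 10, which corresponds to the behavior described
above.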

Proxy Neighbor Advertisement support is implemented in the kernel.
For instance, you can configure it by using the following command:
	# ndp -s fe80::1234%ne0 0:1:2:3:4:5 proxy
where ne0 is the interface which attaches to the same link as the
proxy target.
There are certain limitations, though:
- It does not send an unsolicited multicast NA on configuration.  This is
  MAY behavior in RFC2461.
- It does not add a random delay before transmission of a solicited NA.
  This is SHOULD behavior in RFC2461.
- We cannot configure proxy NDP for an off-link address.  The target address
  for proxying must be a link-local address, or must be in one of the
  prefixes configured on the node which does proxy NDP.
- RFC2461 is unclear about whether it is legal for a host to perform proxy
  ND.  We do not prohibit hosts from doing proxy ND, but there will be very
  limited use for it.
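
The third limitation above amounts to an on-link test for the proxy
target.  The following is a rough sketch of such a test, not the KAME
code: the prefix record and both function names are hypothetical, and
the list of configured prefixes is assumed to be supplied by the
caller.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-link prefix record used only by this sketch. */
struct onlink_prefix {
	struct in6_addr prefix;
	unsigned int plen;		/* prefix length in bits, 0..128 */
};

/* Compare the first 'plen' bits of two IPv6 addresses. */
static bool
prefix_match(const struct in6_addr *a, const struct in6_addr *b,
    unsigned int plen)
{
	unsigned int bytes = plen / 8, bits = plen % 8;

	if (plen > 128)
		return false;
	if (memcmp(a->s6_addr, b->s6_addr, bytes) != 0)
		return false;
	if (bits != 0) {
		uint8_t mask = (uint8_t)(0xff << (8 - bits));

		if ((a->s6_addr[bytes] & mask) != (b->s6_addr[bytes] & mask))
			return false;
	}
	return true;
}

/* A proxy target must be link-local or covered by a configured prefix. */
static bool
proxy_target_ok(const struct in6_addr *target,
    const struct onlink_prefix *pfx, size_t npfx)
{
	size_t i;

	assert(target != NULL);
	if (IN6_IS_ADDR_LINKLOCAL(target))
		return true;
	for (i = 0; i < npfx; i++)
		if (prefix_match(target, &pfx[i].prefix, pfx[i].plen))
			return true;
	return false;
}
```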

Starting mid March 2000, we support Neighbor Unreachability Detection
(NUD) on p2p interfaces, including tunnel interfaces (gif).  NUD is
turned on by default.  Before March 2000 the KAME stack did not
perform NUD on p2p interfaces.  If the change raises any
interoperability issues, you can turn NUD off/on on a per-interface
basis.  Use "ndp -i interface -nud" to turn it off.  Consult ndp(8)
for details.

RFC2461 specifies an upper-layer reachability confirmation hint.  Whenever
an upper-layer reachability confirmation hint comes, the ND process can use
it to optimize the neighbor discovery process - the ND process can omit the
real ND exchange and keep the neighbor cache state in REACHABLE.
We currently have two sources for hints: (1) setsockopt(IPV6_REACHCONF)
defined by the RFC3542 API, and (2) hints from tcp(6)_input.

It is questionable whether they are really trustworthy.  For example, a
rogue userland program can use IPV6_REACHCONF to confuse the ND
process.  The neighbor cache is a system-wide information pool, and it is
bad to allow a single process to affect others.  Also, tcp(6)_input
can be hosed by hijack attempts.  It is wrong to allow hijack attempts
to affect the ND process.

Starting June 2000, the ND code has a protection mechanism against
incorrect upper-layer reachability confirmation.  The ND code counts
consecutive upper-layer hints.  If the number of hints reaches the
maximum, the ND code will ignore further upper-layer hints and run
the real ND process to confirm reachability to the peer.  sysctl
net.inet6.icmp6.nd6_maxnudhint defines the maximum number of consecutive
upper-layer hints to be accepted.
(From April 2000 to June 2000, we rejected setsockopt(IPV6_REACHCONF) from
non-root processes.  After a local discussion, it appears that hints are not
that trustworthy even if they are from privileged processes.)
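
The counting scheme can be sketched as follows.  The structure and
function names are hypothetical and only illustrate the idea; the
assumption here is that the counter is reset whenever a real ND
exchange confirms reachability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-neighbor state used only by this sketch. */
struct nbr_sketch {
	int byhint;	/* hints accepted since the last real ND confirmation */
};

/*
 * Called when an upper layer reports forward progress.  Returns true
 * if the hint is accepted (the entry may stay REACHABLE), false if the
 * limit has been reached and real ND must confirm reachability.
 */
static bool
nud_hint_accept(struct nbr_sketch *n, int maxnudhint)
{
	assert(n != NULL);
	if (n->byhint >= maxnudhint)
		return false;
	n->byhint++;
	return true;
}

/* Called when a real ND exchange confirms reachability. */
static void
nud_confirmed_by_nd(struct nbr_sketch *n)
{
	assert(n != NULL);
	n->byhint = 0;
}
```

With a limit of 2, the third consecutive hint is refused until a real
ND confirmation resets the counter.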

If inbound ND packets carry invalid values, the KAME kernel will
drop these packets and increment a statistics variable.  See the icmp6
section of "netstat -sn".  For a detailed debugging session, you can
turn on syslog output from the kernel on errors by turning on the sysctl
MIB net.inet6.icmp6.nd6_debug.  nd6_debug can be turned on at bootstrap
time by defining the ND6_DEBUG kernel compilation option (so you can
debug behavior during bootstrap).  The nd6_debug configuration should
only be used for test/debug purposes - for a production environment,
nd6_debug must be set to 0.  If you leave it at 1, malicious parties
can inject broken packets and fill up the /var/log partition.

1.3 Scope Zone Index

IPv6 uses scoped addresses.  It is therefore very important to
specify the scope zone index (link index for a link-local address, or
site index for a site-local address) with an IPv6 address.  Without a
zone index, a scoped IPv6 address is ambiguous to the kernel, and
the kernel would not be able to determine the outbound zone for a
packet to the scoped address.  KAME code tries to address the issue in
several ways.

The entire architecture of scoped addresses is documented in RFC4007.
One non-trivial point of the architecture is that the link scope is
(theoretically) larger than the interface scope.  That is, two
different interfaces can belong to the same link.  However, in
normal operation, we can assume that there is a 1-to-1 relationship
between links and interfaces.  In other words, we can usually put
links and interfaces in the same scope type.  The current KAME
implementation assumes the 1-to-1 relationship.  In particular, we use
interface names such as "ne1" as unique link identifiers.  This is
much more human-readable and intuitive than numeric identifiers,
but please keep in mind the theoretical difference between links
and interfaces.

Site-local addresses are very vaguely defined in the specs, and both
the specification and the KAME code need tons of improvements to
enable their actual use.  For example, it is still very unclear how we
define a site, or how we resolve host names in a site.  There is work
underway to define the behavior of routers at a site border, but we have
almost no code for site boundary node support (neither forwarding nor
routing) and we bet almost no one has.  We recommend, at this moment,
that you use global addresses for experiments - there are way too many
pitfalls if you use site-local addresses.

1.3.1 Kernel internal

In the kernel, the link index for a link-local scope address is
embedded into the 2nd 16bit-word (the 3rd and 4th bytes) in the IPv6
address.
For example, you may see something like:
	fe80:1::200:f8ff:fe01:6317
in the routing table and the interface address structure (struct
in6_ifaddr).  The address above is a link-local unicast address which
belongs to a network link whose link identifier is 1 (note that it
equals the interface index by the assumption of our
implementation).  The embedded index enables us to identify IPv6
link-local addresses over multiple links effectively and with only a
little code change.

The use of the internal format must be limited to inside the kernel.  In
particular, addresses sent by an application should not contain the
embedded index (except via some very special APIs such as routing
sockets).  Instead, the index should be specified in the sin6_scope_id
field of a sockaddr_in6 structure.  Obviously, packets sent to or
received from the wire must not contain the embedded index either, since
the index is meaningful only within the sending/receiving node.
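
From the application side, this means the address stays in its plain
form and the zone travels separately.  A minimal userland sketch,
assuming the zone is identified by an interface name and using a
hypothetical helper name, could look like this:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill 'sin6' with a link-local destination on the link reached through
 * interface 'ifname'.  The zone goes into sin6_scope_id; sin6_addr keeps
 * the plain form without any embedded index.  Returns 0 on success, or
 * -1 if the address or the interface name is invalid.
 */
static int
make_linklocal_dst(struct sockaddr_in6 *sin6, const char *addr,
    const char *ifname, unsigned short port)
{
	unsigned int idx;

	assert(sin6 != NULL && addr != NULL && ifname != NULL);
	memset(sin6, 0, sizeof(*sin6));
#ifdef SIN6_LEN
	sin6->sin6_len = sizeof(*sin6);	/* only on systems that have it */
#endif
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	if (inet_pton(AF_INET6, addr, &sin6->sin6_addr) != 1)
		return -1;
	idx = if_nametoindex(ifname);
	if (idx == 0)
		return -1;		/* no such interface */
	sin6->sin6_scope_id = idx;
	return 0;
}
```

The resulting structure can be passed to connect(2) or sendto(2) as
is; converting it to the kernel-internal form is the kernel's job.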

In order to deal with the differences, several kernel routines are
provided.  These are available by including <netinet6/scope6_var.h>.
The following functions are the most commonly used:

- int sa6_embedscope(struct sockaddr_in6 *sa6, int defaultok);
  Embed sa6->sin6_scope_id into sa6->sin6_addr.  If sin6_scope_id is
  0, defaultok is non-0, and the default zone ID (see RFC4007) is
  configured, the default ID will be used instead of the value of the
  sin6_scope_id field.  On success, sa6->sin6_scope_id will be reset
  to 0.

  This function returns 0 on success, or a non-0 error code otherwise.

- int sa6_recoverscope(struct sockaddr_in6 *sa6);
  Extract the embedded zone ID in sa6->sin6_addr and set
  sa6->sin6_scope_id to that ID.  The embedded ID will be cleared to
  0.

  This function returns 0 on success, or a non-0 error code otherwise.

- int in6_clearscope(struct in6_addr *in6);
  Reset the embedded zone ID in 'in6' to 0.  This function never fails, and
  returns 0 if the original address is intact or non-0 if the address is
  modified.  The return value doesn't matter in most cases; currently, the
  only point where we care about the return value is ip6_input(), for
  checking whether the source or destination address of the incoming packet
  is in the embedded form.

- int in6_setscope(struct in6_addr *in6, struct ifnet *ifp,
                   u_int32_t *zoneidp);
  Embed the zone ID determined by the address scope type for 'in6' and the
  interface 'ifp' into 'in6'.  If zoneidp is non-NULL, *zoneidp will
  also have the zone ID.

  This function returns 0 on success, or a non-0 error code otherwise.

The typical usage of these functions is as follows:

sa6_embedscope() will be used at the socket or transport layer to
convert a sockaddr_in6 structure passed by an application into the
kernel-internal form.  In this usage, the second argument is often the
'ip6_use_defzone' global variable.

sa6_recoverscope() will also be used at the socket or transport layer
to convert an in6_addr structure with the embedded zone ID into a
sockaddr_in6 structure with the corresponding ID in the sin6_scope_id
field (and without the embedded ID in sin6_addr).

in6_clearscope() will be used just before sending a packet to the wire
to remove the embedded ID.  In general, this must be done at the last
stage of an output path, since otherwise the address would lose the ID
and could be ambiguous with regard to scope.

in6_setscope() will be used when the kernel receives a packet from the
wire to construct the kernel internal form for each address field in
the packet (typical examples are the source and destination addresses
of the packet).  In the typical usage, the third argument 'zoneidp'
will be NULL.  A non-NULL value will be used when the validity of the
zone ID must be checked, e.g., when forwarding a packet to another
link (see ip6_forward() for this usage).
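
To make the contract of the first two functions more concrete, the
following userland mock imitates the embed/recover round trip on a
sockaddr_in6 structure.  It is not the kernel code: the sketch_* names
and the default-zone variable are hypothetical, only link-local
addresses are handled, and reporting an undeterminable zone as an error
is an assumption of this sketch.  Touching the embedded bytes directly
is acceptable here only because the mock stands in for the scope6.c
internals.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the configured default zone ID (0 = none). */
static uint32_t sketch_default_zone = 0;

/* Imitates the documented behavior of sa6_embedscope(). */
static int
sketch_embedscope(struct sockaddr_in6 *sa6, int defaultok)
{
	uint32_t zone;

	assert(sa6 != NULL);
	if (!IN6_IS_ADDR_LINKLOCAL(&sa6->sin6_addr))
		return 0;		/* this sketch only handles link-local */
	zone = sa6->sin6_scope_id;
	if (zone == 0 && defaultok)
		zone = sketch_default_zone;
	if (zone == 0 || zone > 0xffff)
		return -1;		/* ambiguous or not representable */
	sa6->sin6_addr.s6_addr[2] = (uint8_t)(zone >> 8);
	sa6->sin6_addr.s6_addr[3] = (uint8_t)(zone & 0xff);
	sa6->sin6_scope_id = 0;		/* the ID now lives in the address */
	return 0;
}

/* Imitates the documented behavior of sa6_recoverscope(). */
static int
sketch_recoverscope(struct sockaddr_in6 *sa6)
{
	assert(sa6 != NULL);
	if (!IN6_IS_ADDR_LINKLOCAL(&sa6->sin6_addr))
		return 0;
	sa6->sin6_scope_id = ((uint32_t)sa6->sin6_addr.s6_addr[2] << 8) |
	    sa6->sin6_addr.s6_addr[3];
	sa6->sin6_addr.s6_addr[2] = 0;	/* clear the embedded ID */
	sa6->sin6_addr.s6_addr[3] = 0;
	return 0;
}
```

Embedding followed by recovery returns the structure to the form an
application would have passed in, with the zone back in sin6_scope_id.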

An application, when sending a packet, is basically assumed to specify
the appropriate scope zone of the destination address in the
sin6_scope_id field (this might be done transparently to the
application with getaddrinfo() and the extended textual format - see
below), or at least the default scope zone(s) must be configured as a
last resort.  In some cases, however, an application could specify an
address that is ambiguous with regard to scope, expecting it to be
disambiguated in the kernel by some other means.  A typical usage is to
specify the outgoing interface through another API, which can disambiguate
the unspecified scope zone.  Such usage is not recommended, but the
kernel implements a trick to deal with even this case.

A rough sketch of the trick can be summarized as the following
sequence.

   sa6_embedscope(dst, ip6_use_defzone);
   in6_selectsrc(dst, ..., &ifp, ...);
   in6_setscope(&dst->sin6_addr, ifp, NULL);

530148394Sumesa6_embedscope() first tries to convert sin6_scope_id (or the default
531148394Sumezone ID) into the kernel-internal form.  This can fail with an
532148394Sumeambiguous destination, but it still tries to get the outgoing
533148394Sumeinterface (ifp) in the attempt of determining the source address of
534148394Sumethe outgoing packet using in6_selectsrc().  If the interface is
535148394Sumedetected, and the scope zone was originally ambiguous, in6_setscope()
536148394Sumecan finally determine the appropriate ID with the address itself and
537148394Sumethe interface, and construct the kernel-internal form.  See, for
538148394Sumeexample, comments in udp6_output() for more concrete example.
539148394Sume
In any case, kernel routines except ones in netinet6/scope6.c MUST NOT
directly refer to the embedded form.  They MUST use the above
interface functions.  In particular, kernel routines MUST NOT contain
a code fragment like the following:

	/* This is a bad practice.  Don't do this */
	if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		sin6->sin6_addr.s6_addr16[1] = htons(ifp->if_index);

This is bad for several reasons.  First, address ambiguity is not
specific to link-local addresses (any non-global multicast address
is inherently ambiguous, and this is particularly true for
interface-local addresses).  Secondly, this is vulnerable to future
changes of the embedded form (the embedded position may change, or the
zone ID may not actually be the interface index).  Only scope6.c
routines should know the details.

The above code fragment should thus actually be as follows:

	/* This is correct. */
	in6_setscope(&sin6->sin6_addr, ifp, NULL);
	/* (and catch errors if possible and necessary) */
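
To make the two forms concrete, the following userland sketch models
what in6_setscope() and in6_clearscope() are described as doing for a
link-local unicast address.  It is not the kernel code:
model_setscope() and model_clearscope() are hypothetical names, and
the sketch assumes that the zone ID is a 16-bit value kept in the
second 16-bit word, as in the embedded form discussed above.  Kernel
routines must still go through the scope6.c functions.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/*
 * Userland model only.  These helpers are hypothetical; they mimic the
 * effect attributed to in6_setscope()/in6_clearscope() on a link-local
 * unicast address and nothing else.
 */

/* Embed a 16-bit zone ID into the second 16-bit word (bytes 2 and 3). */
int
model_setscope(struct in6_addr *addr, uint16_t zoneid)
{
	if (!IN6_IS_ADDR_LINKLOCAL(addr))
		return (-1);		/* this model handles fe80::/10 only */
	addr->s6_addr[2] = (uint8_t)(zoneid >> 8);
	addr->s6_addr[3] = (uint8_t)(zoneid & 0xff);
	return (0);
}

/* Remove the embedded zone ID again, as done just before transmission. */
void
model_clearscope(struct in6_addr *addr)
{
	if (IN6_IS_ADDR_LINKLOCAL(addr)) {
		addr->s6_addr[2] = 0;
		addr->s6_addr[3] = 0;
	}
}
```

In this model, embedding zone ID 2 into fe80::fedc yields
fe80:2::fedc, and clearing the scope restores fe80::fedc.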
1.3.2 Interaction with API

There are several candidate APIs for dealing with scoped addresses
without ambiguity.

The IPV6_PKTINFO ancillary data type or socket option defined in the
advanced API (RFC2292 or RFC3542) can specify the outgoing interface
of a packet.  Similarly, the IPV6_PKTINFO or IPV6_RECVPKTINFO socket
options tell the kernel to pass the incoming interface to user
applications.

These options are enough to disambiguate scoped addresses of an
incoming packet, because we can uniquely identify the corresponding
zone of the scoped address(es) by the incoming interface.  However,
they are too strong for outgoing packets.  For example, consider a
multi-sited node and suppose that more than one interface of the node
belongs to the same site.  When we want to send a packet to the site,
we can only specify one of the interfaces for the outgoing packet with
these options; we cannot just say "send the packet to (one of the
interfaces of) the site."
58357522Sshin
Another candidate is to use the sin6_scope_id member in the
sockaddr_in6 structure, defined in RFC2553.  The KAME kernel
interprets the sin6_scope_id field properly in order to disambiguate
scoped addresses.  For example, if an application passes a
sockaddr_in6 structure that has a non-zero sin6_scope_id value to the
sendto(2) system call, the kernel should send the packet to the
appropriate zone according to the sin6_scope_id field.  Similarly,
when the source or the destination address of an incoming packet is a
scoped one, the kernel should detect the correct zone identifier based
on the address and the receiving interface, fill the identifier in the
sin6_scope_id field of a sockaddr_in6 structure, and then pass the
packet to an application via the recvfrom(2) system call, etc.
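
As a minimal sketch of the sending side, the helper below fills a
sockaddr_in6 whose zone is carried in sin6_scope_id rather than in the
address itself.  fill_scoped_dest() is a hypothetical name, not a KAME
function; we assume the caller obtains the zone index for a link-local
destination from if_nametoindex(3) on the outgoing interface.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Fill "sin6" for a scoped destination.  "zone" is stored in
 * sin6_scope_id; the address bytes are left free of any embedded ID.
 * Returns 0 on success, -1 if "text" is not a numeric IPv6 address.
 */
int
fill_scoped_dest(const char *text, uint32_t zone, uint16_t port,
    struct sockaddr_in6 *sin6)
{
	memset(sin6, 0, sizeof(*sin6));
#ifdef SIN6_LEN
	sin6->sin6_len = sizeof(*sin6);	/* BSD-style length field */
#endif
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_scope_id = zone;	/* the zone goes here ... */
	if (inet_pton(AF_INET6, text, &sin6->sin6_addr) != 1)
		return (-1);		/* ... never into sin6_addr */
	return (0);
}
```

The resulting structure can be handed to sendto(2) as is, for example
after fill_scoped_dest("fe80::1", if_nametoindex("ne0"), 9, &dst).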
59678064Sume
However, the semantics of sin6_scope_id are still vague and on the
way to standardization.  Additionally, not many operating systems
support the behavior above at this moment.

In summary,
- If your target system is limited to KAME based ones (i.e. BSD
  variants and KAME snaps), use the sin6_scope_id field assuming the
  kernel behavior described above.
- Otherwise, (i.e. if your program should be portable to systems
  other than BSDs)
  + Use the advanced API to disambiguate scoped addresses of incoming
    packets.
  + To disambiguate scoped addresses of outgoing packets,
    * if it is okay to just specify the outgoing interface, use the
      advanced API.  This would be the case, for example, when you
      should only consider link-local addresses and your system
      assumes a 1-to-1 relationship between links and interfaces.
    * otherwise, sorry but you lose.  Please rush the IETF IPv6
      community into standardizing the semantics of the sin6_scope_id
      field.
61778064Sume
Routing daemons and configuration programs, like route6d and ifconfig,
will need to manipulate the "embedded" zone index.  These programs use
routing sockets and ioctls (like SIOCGIFADDR_IN6) and the kernel API
will return IPv6 addresses with the second 16-bit word filled in.  The
APIs are for manipulating kernel internal structure.  Programs that
use these APIs have to be prepared for differences between kernels
anyway.

getaddrinfo(3) and getnameinfo(3) support an extended numeric IPv6
syntax, as documented in RFC4007.  You can specify the outgoing link,
by using the name of the outgoing interface as the link, like
"fe80::1%ne0" (again, note that we assume there is a 1-to-1
relationship between links and interfaces).  This way you will be able
to specify a link-local scoped address without much trouble.

Other APIs like inet_pton(3) and inet_ntop(3) are inherently
unfriendly to scoped addresses, since they are unable to annotate
addresses with a zone identifier.
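
The extended format can be parsed with getaddrinfo(3) alone, as in the
following sketch.  parse_scoped() is a hypothetical wrapper; the only
library calls used are getaddrinfo(3) and freeaddrinfo(3).  The zone
after "%" may be an interface name such as "ne0" or, as RFC4007
requires implementations to accept, a decimal zone index.

```c
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Parse a numeric IPv6 address that may carry a "%zone" suffix.
 * On success the zone ends up in sin6->sin6_scope_id.
 * Returns 0 on success, or a getaddrinfo() error code.
 */
int
parse_scoped(const char *text, struct sockaddr_in6 *sin6)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET6;
	hints.ai_socktype = SOCK_DGRAM;
	hints.ai_flags = AI_NUMERICHOST;	/* numeric only, no DNS lookup */

	error = getaddrinfo(text, NULL, &hints, &res);
	if (error != 0)
		return (error);
	memcpy(sin6, res->ai_addr, sizeof(*sin6));
	freeaddrinfo(res);
	return (0);
}
```

inet_pton(3) offers no equivalent, because its output is a bare
struct in6_addr with no place to return the zone.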
63678064Sume
1.3.3 Interaction with users (command line)

Most user applications now support the extended numeric IPv6
syntax.  In this case, you can specify the outgoing link by using the
name of the outgoing interface, like "fe80::1%ne0" (sorry for the
duplicated notice, but please recall again that we assume a 1-to-1
relationship between links and interfaces).  This is even the case for
some management tools such as route(8) or ndp(8).  For example, to
install the IPv6 default route by hand, you can type
	# route add -inet6 default fe80::9876:5432:1234:abcd%ne0
(We suggest, however, that you run dynamic routing instead of static
routes, in order to avoid configuration mistakes.)

Some applications have command line options for specifying an
appropriate zone of a scoped address (like "ping6 -I ne0 ff02::1" to
specify the outgoing interface).  However, you can't always expect such
options.  Additionally, specifying the outgoing "interface" is in
theory an overspecification as a way to specify the outgoing "link"
(see above).  Thus, we recommend that you use the extended format
described above.  This applies even to the case where the outgoing
interface is specified.

In any case, when you specify a scoped address on the command line,
NEVER write the embedded form (such as ff02:1::1 or fe80:2::fedc),
which should only be used inside the kernel (see Section 1.3.1), and
is not supposed to work.
66362588Sitojun
1.4 Plug and Play

The KAME kit implements most of the IPv6 stateless address
autoconfiguration in the kernel.
Neighbor Discovery functions are implemented in the kernel as a whole.
Router Advertisement (RA) input for hosts is implemented in the
kernel.  Router Solicitation (RS) output for end hosts, RS input
for routers, and RA output for routers are implemented in
userland.
67357522Sshin
1.4.1 Assignment of link-local, and special addresses

An IPv6 link-local address is generated from the IEEE802 address
(ethernet MAC address).  Each interface is assigned an IPv6 link-local
address automatically when the interface comes up (IFF_UP).  Also, a
direct route for the link-local address is added to the routing table.

Here is sample output of the netstat command:

Internet6:
Destination                   Gateway                   Flags      Netif Expire
fe80::%ed0/64                 link#1                    UC           ed0
fe80::%ep0/64                 link#2                    UC           ep0

Interfaces that have no IEEE802 address (pseudo interfaces like tunnel
interfaces, or ppp interfaces) will borrow an IEEE802 address from other
interfaces, such as ethernet interfaces, whenever possible.
If there is no IEEE802 hardware attached, a last-resort pseudorandom
value derived from MD5(hostname) will be used as the source of the
link-local address.  If it is not suitable for your usage, you will need
to configure the link-local address manually.

If an interface is not capable of handling IPv6 (for example, because it
lacks multicast support), a link-local address will not be assigned to
that interface.  See section 2 for details.
69957522Sshin
Each interface joins the solicited-node multicast address and the
link-local all-nodes multicast address (e.g.  ff02::1:ff01:6317
and ff02::1, respectively, on the link the interface is attached to).
In addition to a link-local address, the loopback address (::1) will be
assigned to the loopback interface.  Also, ::1/128 and ff01::/32 are
automatically added to the routing table, and the loopback interface
joins the node-local multicast group ff01::1.
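
The address derivation can be sketched in userland as follows.  The
helper names are hypothetical and this is not the KAME code; we assume
the usual modified EUI-64 mapping from a 48-bit IEEE802 address, and
the standard solicited-node prefix ff02::1:ff00:0/104.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/* Build fe80::/64 plus a modified EUI-64 interface ID from a 48-bit MAC. */
void
linklocal_from_mac(const uint8_t mac[6], struct in6_addr *ll)
{
	memset(ll, 0, sizeof(*ll));
	ll->s6_addr[0] = 0xfe;
	ll->s6_addr[1] = 0x80;
	ll->s6_addr[8] = mac[0] ^ 0x02;	/* invert the universal/local bit */
	ll->s6_addr[9] = mac[1];
	ll->s6_addr[10] = mac[2];
	ll->s6_addr[11] = 0xff;		/* ff:fe is inserted in the middle */
	ll->s6_addr[12] = 0xfe;
	ll->s6_addr[13] = mac[3];
	ll->s6_addr[14] = mac[4];
	ll->s6_addr[15] = mac[5];
}

/* Solicited-node multicast: ff02::1:ff00:0/104 + low 24 bits of "addr". */
void
solicited_node(const struct in6_addr *addr, struct in6_addr *mc)
{
	memset(mc, 0, sizeof(*mc));
	mc->s6_addr[0] = 0xff;
	mc->s6_addr[1] = 0x02;
	mc->s6_addr[11] = 0x01;
	mc->s6_addr[12] = 0xff;
	memcpy(&mc->s6_addr[13], &addr->s6_addr[13], 3);
}
```

For the MAC address 00:11:22:01:63:17 these helpers give the
link-local address fe80::211:22ff:fe01:6317 and the solicited-node
group ff02::1:ff01:6317.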
70757522Sshin
1.4.2 Stateless address autoconfiguration on hosts

In the IPv6 specification, nodes are separated into two categories:
routers and hosts.  Routers forward packets addressed to others; hosts do
not forward packets.  net.inet6.ip6.forwarding defines whether this
node is a router or a host (router if it is 1, host if it is 0).

It is NOT recommended to change net.inet6.ip6.forwarding while the node
is in operation.  The IPv6 specification defines behavior for "host" and
"router" quite differently, and switching from one to the other can cause
serious trouble.  It is recommended to configure the variable at
bootstrap time only.

The first step in stateless address configuration is Duplicate Address
Detection (DAD).  See 1.2 for more detail on DAD.
72262588Sitojun
When a host hears a Router Advertisement from a router, the host may
autoconfigure itself by stateless address autoconfiguration.  This
behavior can be controlled by the net.inet6.ip6.accept_rtadv sysctl
variable and a per-interface flag managed in the kernel.  The latter,
which we call "if_accept_rtadv" here, can be changed by the ndp(8)
command (see the manpage for more details).  When the sysctl variable
is set to 1, and the flag is set, the host autoconfigures itself.  By
autoconfiguration, network address prefixes for the receiving
interface (usually global address prefix) are added.  The default
route is also configured.

Routers periodically generate Router Advertisement packets.  To
request an adjacent router to generate an RA packet, a host can transmit
a Router Solicitation.  To generate an RS packet at any time, use the
"rtsol" command.  The "rtsold" daemon is also available.  "rtsold"
generates Router Solicitations whenever necessary, and it works well
for nomadic usage (notebooks/laptops).  If one wishes to ignore Router
Advertisements, use sysctl to set net.inet6.ip6.accept_rtadv to 0.
Additionally, the ndp(8) command can be used to control the behavior
on a per-interface basis.

To generate Router Advertisements from a router, use the "rtadvd" daemon.
74557522Sshin
Note that the IPv6 specification assumes the following items and that
nonconforming cases are left unspecified:
- Only hosts will listen to router advertisements
- Hosts have a single network interface (except loopback)
It is therefore unwise to enable net.inet6.ip6.accept_rtadv on routers
or multi-interface hosts.  A misconfigured node can behave strangely
(KAME code allows nonconforming configuration, for those who would like
to do some experiments).
75457522Sshin
To summarize the sysctl knobs:
	accept_rtadv	forwarding	role of the node
	---		---		---
	0		0		host (to be manually configured)
	0		1		router
	1		0		autoconfigured host
					(spec assumes that hosts have a single
					interface only; autoconfigured hosts
					with multiple interfaces are
					out-of-scope)
	1		1		invalid, or experimental
					(out-of-scope of spec)
76757522Sshin
The if_accept_rtadv flag is consulted only when accept_rtadv is 1 (the
latter two cases).  The flag does not have any effect when the sysctl
variable is 0.
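
The combination of the two sysctl values and the per-interface flag
can be restated as a small decision function.  This is only a sketch
of the rules given in this section; node_role() and ra_honored() are
hypothetical names and do not exist in the kernel.

```c
/* Role of the node for a given pair of sysctl values. */
const char *
node_role(int accept_rtadv, int forwarding)
{
	if (!accept_rtadv)
		return (forwarding ? "router" : "manually configured host");
	return (forwarding ? "invalid or experimental" : "autoconfigured host");
}

/*
 * Whether an RA received on an interface is used for autoconfiguration:
 * the per-interface flag matters only while the sysctl variable is 1.
 */
int
ra_honored(int accept_rtadv, int if_accept_rtadv)
{
	return (accept_rtadv && if_accept_rtadv);
}
```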
771151539Ssuz
See 1.2 in the document for the relationship between DAD and
autoconfiguration.

1.4.3 DHCPv6

We supply a tiny DHCPv6 server/client in kame/dhcp6.  However, the
implementation is premature (for example, this does NOT implement
address lease/release), and it is not in the default compilation tree on
some platforms.  If you want to do some experiments, compile it on your
own.

DHCPv6 and autoconfiguration also need more work.  "Managed" and "Other"
bits in RA have no special effect on the stateful autoconfiguration
procedure in the DHCPv6 client program (the "Managed" bit actually
prevents stateless autoconfiguration, but no special action will be taken
for the DHCPv6 client).
78657522Sshin
1.5 Generic tunnel interface

GIF (Generic InterFace) is a pseudo interface for configured tunnels.
Details are described in the gif(4) manpage.
Currently
	v6 in v6
	v6 in v4
	v4 in v6
	v4 in v4
are available.  Use "gifconfig" to assign physical (outer) source
and destination addresses to gif interfaces.
A configuration that uses the same address family for the inner and outer
IP header (v4 in v4, or v6 in v6) is dangerous.  It is very easy to
configure interfaces and routing tables to perform infinite levels
of tunneling.  Please be warned.

gif can be configured to be ECN-friendly.  See 4.5 for ECN-friendliness
of tunnels, and the gif(4) manpage for how to configure it.

If you would like to configure an IPv4-in-IPv6 tunnel with a gif interface,
read gif(4) carefully.  You may need to remove the IPv6 link-local address
automatically assigned to the gif interface.
80957522Sshin
1.6 Address Selection

1.6.1 Source Address Selection

The KAME kernel chooses the source address for an outgoing packet
sent from a user application as follows:

1. if the source address is explicitly specified via an IPV6_PKTINFO
   ancillary data item or the socket option of that name, just use it.
   Note that this item/option overrides the bound address of the
   corresponding (datagram) socket.

2. if the corresponding socket is bound, use the bound address.

3. otherwise, the kernel first tries to find the outgoing interface of
   the packet.  If it fails, the source address selection also fails.
   If the kernel can find an interface, choose the most appropriate
   address based on the algorithm described in RFC3484.

   The policy table used in this algorithm is stored in the kernel.
   To install or view the policy, use the ip6addrctl(8) command.  The
   kernel does not have a pre-installed policy.  It is expected that
   the default policy described in RFC3484 is installed at bootstrap
   time using this command.

   RFC3484 allows an implementation to add implementation-specific
   rules with higher precedence than the rule "Use longest matching
   prefix."  KAME's implementation has the following additional rules
   (which apply in the order listed):

   - prefer addresses on alive interfaces, that is, interfaces with
     the UP flag being on.  This rule is particularly useful for
     routers, since some routing daemons stop advertising prefixes
     (addresses) on interfaces that have gone down.

   - prefer addresses on "preferred" interfaces.  "Preferred"
     interfaces can be specified by the ndp(8) command.  By default,
     no interface is preferred, that is, this rule does not apply.
     Again, this rule is particularly useful for routers, since there
     is a convention, among router administrators, of assigning
     "stable" addresses on a particular interface (typically a
     loopback interface).

   In any case, addresses that break the scope zone of the
   destination, or addresses whose zone does not contain the outgoing
   interface are never chosen.

When the procedure above fails, the kernel usually returns
EADDRNOTAVAIL to the application.
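
The precedence of the three steps can be sketched as below.  This is
not in6_selectsrc(); struct src_inputs and choose_source() are
hypothetical, and the RFC3484 search of step 3 is represented only by
its result.

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>

/* Hypothetical inputs to the decision; field names are illustrative. */
struct src_inputs {
	const struct in6_addr *pktinfo_src;	/* from IPV6_PKTINFO, or NULL */
	const struct in6_addr *bound_src;	/* bound address, or NULL */
	const struct in6_addr *selected_src;	/* step 3 result, or NULL if no
						 * interface/address was found */
};

/* Return 0 and set *src, or EADDRNOTAVAIL when every step fails. */
int
choose_source(const struct src_inputs *in, const struct in6_addr **src)
{
	if (in->pktinfo_src != NULL)		/* step 1: beats the bound address */
		*src = in->pktinfo_src;
	else if (in->bound_src != NULL)		/* step 2 */
		*src = in->bound_src;
	else if (in->selected_src != NULL)	/* step 3 */
		*src = in->selected_src;
	else
		return (EADDRNOTAVAIL);
	return (0);
}
```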
85962588Sitojun
In some cases, the specification explicitly requires the
implementation to choose a particular source address.  The source
address for a Neighbor Advertisement (NA) message is an example.
Under the spec (RFC2461 7.2.2) NA's source should be the target
address of the corresponding NS.  In this case we follow the
spec rather than the above rule.

If you would like to prohibit the use of deprecated addresses for some
reason, configure net.inet6.ip6.use_deprecated to 0.  The issue
related to deprecated addresses is described in RFC2462 5.5.4 (NOTE:
there is some debate underway in IETF ipngwg on how to use
"deprecated" addresses).

As documented in the source address selection document, temporary
addresses for privacy extension are less preferred to public addresses
by default.  However, for administrators who are particularly
concerned about privacy, there is a system-wide sysctl(3) variable
"net.inet6.ip6.prefer_tempaddr".  When the variable is set to
non-zero, the kernel will prefer temporary addresses instead.  The
default value of this variable is 0.
880122115Sume
1.6.2 Destination Address Ordering

KAME's getaddrinfo(3) supports the destination address ordering
algorithm described in RFC3484.  getaddrinfo(3) needs to know the
source address for each destination address and policy entries
(described in the previous section) for the source and destination
addresses.  To get the source address, the library function opens a
UDP socket and tries to connect(2) to the destination.  To get the
policy entry, the function issues sysctl(3).
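
The source address lookup can be reproduced from an application with
the same technique.  The following is a sketch of the idea, not the
library code: probe_source() is a hypothetical name, and we assume the
caller puts a non-zero dummy port into "dst", since some stacks refuse
connect(2) to port 0.  Connecting a UDP socket sends no packet.

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Ask the kernel which source address it would use for "dst" by
 * connecting a UDP socket and reading back the local address.
 * Returns 0 on success, -1 on failure.
 */
int
probe_source(const struct sockaddr_in6 *dst, struct sockaddr_in6 *src)
{
	socklen_t len = sizeof(*src);
	int s, ret = -1;

	if ((s = socket(AF_INET6, SOCK_DGRAM, 0)) < 0)
		return (-1);
	if (connect(s, (const struct sockaddr *)dst, sizeof(*dst)) == 0 &&
	    getsockname(s, (struct sockaddr *)src, &len) == 0)
		ret = 0;
	close(s);
	return (ret);
}
```

A failure here simply means that no source address is available for
that destination.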
890122115Sume
1.7 Jumbo Payload

KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
packets with payloads longer than 65,535 octets.  But since currently
KAME does not support any physical interface whose MTU is more than
65,535, such payloads can be seen only on the loopback interface (i.e.
lo0).

If you want to try jumbo payloads, you first have to reconfigure the
kernel so that the MTU of the loopback interface is more than 65,535
bytes; add the following to the kernel configuration file:
	options		"LARGE_LOMTU"		#To test jumbo payload
and recompile the new kernel.

Then you can test jumbo payloads by the ping6 command with -b and -s
options.  The -b option must be specified to enlarge the size of the
socket buffer and the -s option specifies the length of the packet,
which should be more than 65,535.  For example, type as follows:
	% ping6 -b 70000 -s 68000 ::1
91057522Sshin
The IPv6 specification requires that the Jumbo Payload option must not
be used in a packet that carries a fragment header.  If this condition
is broken, an ICMPv6 Parameter Problem message must be sent to the
sender.  KAME kernel follows the specification, but you cannot usually
see an ICMPv6 error caused by this requirement.

When KAME kernel receives an IPv6 packet, it checks the frame length of
the packet and compares it to the length specified in the payload
length field of the IPv6 header or in the value of the Jumbo Payload
option, if any.  If the former is shorter than the latter, KAME kernel
discards the packet and increments the statistics.  You can see the
statistics as output of the netstat command with the `-s -p ip6' option:
	% netstat -s -p ip6
	ip6:
		(snip)
		1 with data size < data length

So, KAME kernel does not send an ICMPv6 error unless the erroneous
packet is an actual Jumbo Payload, that is, its packet size is more
than 65,535 bytes.  As described above, KAME kernel currently does not
support physical interfaces with such a huge MTU, so it rarely returns an
ICMPv6 error.
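
The length comparison described above can be sketched as follows.
Both helpers are hypothetical and only model the check; they are not
taken from ip6_input().  The sketch assumes that "frame_len" counts
the 40-byte IPv6 header plus everything after it.

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_HDRLEN	40	/* fixed IPv6 header size in octets */

/*
 * Length the packet claims for everything after the IPv6 header.  For a
 * jumbogram the 16-bit payload length field is 0 and the Jumbo Payload
 * option carries the real value.
 */
uint32_t
declared_payload_len(uint16_t ip6_plen, int has_jumbo, uint32_t jumbo_len)
{
	return (has_jumbo ? jumbo_len : ip6_plen);
}

/* Non-zero when the received frame is shorter than the packet claims. */
int
frame_too_short(size_t frame_len, uint32_t declared)
{
	if (frame_len < IP6_HDRLEN)
		return (1);
	return (frame_len - IP6_HDRLEN < declared);
}
```

A frame for which frame_too_short() is non-zero corresponds to the
"with data size < data length" counter shown above.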
93357522Sshin
TCP/UDP over jumbogram is not supported at this moment.  This is because
we have no medium (other than loopback) to test this.  Contact us if you
need this.

IPsec does not work on jumbograms.  This is due to some specification twists
in supporting AH with jumbograms (AH header size influences payload length,
and this makes it really hard to authenticate an inbound packet with the
jumbo payload option as well as AH).

There are fundamental issues in *BSD support for jumbograms.  We would like to
address those, but we need more time to finalize the task.  To name a few:
- mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot hold
  a jumbogram with len > 2G on 32bit architecture CPUs.  If we would like to
  support jumbograms properly, the field must be expanded to hold 4G +
  IPv6 header + link-layer header.  Therefore, it must be expanded to at least
  int64_t (u_int32_t is NOT enough).
- We mistakenly use "int" to hold packet length in many places.  We need
  to convert them into a larger numeric type.  This needs great care, as we
  may experience overflow during packet length computation.
- We mistakenly check the ip6_plen field of the IPv6 header for packet payload
  length in various places.  We should be checking mbuf pkthdr.len instead.
  ip6_input() will perform a sanity check on the jumbo payload option on input,
  and we can safely use mbuf pkthdr.len afterwards.
- TCP code needs careful updates in a bunch of places, of course.
95857522Sshin
1.8 Loop prevention in header processing

The IPv6 specification allows an arbitrary number of extension headers to
be placed onto packets.  If we implement IPv6 packet processing
code in the way BSD IPv4 code is implemented, the kernel stack may
overflow due to a long function call chain.  KAME sys/netinet6 code
is carefully designed to avoid kernel stack overflow.  Because of
this, KAME sys/netinet6 code defines its own protocol switch
structure, as "struct ip6protosw" (see netinet6/ip6protosw.h).

In addition to this, we restrict the number of extension headers
(including the IPv6 header) in each incoming packet, in order to
prevent a DoS attack that tries to send packets with a massive number
of extension headers.  The upper limit can be configured by the sysctl
value net.inet6.ip6.hdrnestlimit.  In particular, if the value is 0,
the node will allow an arbitrary number of headers.  As of writing this
document, the default value is 50.
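
An iterative walk with a nesting limit can be sketched in userland as
below.  walk_headers() is a hypothetical model, not the ip6_input()
loop: it understands only the hop-by-hop, routing, fragment and
destination options headers, and how exactly the kernel counts headers
is not reproduced.  The point is that the chain is followed in a loop,
without recursion, and is abandoned once the limit is exceeded.

```c
#include <stddef.h>
#include <stdint.h>

/* Next-header values for the extension headers this sketch understands. */
#define NH_HOPOPTS	0
#define NH_ROUTING	43
#define NH_FRAGMENT	44
#define NH_DSTOPTS	60

/*
 * Walk the header chain that follows the 40-byte IPv6 header in "pkt".
 * The count starts at 1 for the IPv6 header itself; limit 0 means no
 * limit.  Returns the upper-layer protocol number, or -1 if the chain is
 * truncated or nests deeper than "limit".
 */
int
walk_headers(const uint8_t *pkt, size_t len, unsigned limit)
{
	size_t off = 40, hlen;
	unsigned nest = 1;
	int nxt;

	if (len < 40)
		return (-1);
	nxt = pkt[6];			/* Next Header field of the IPv6 header */
	while (nxt == NH_HOPOPTS || nxt == NH_ROUTING ||
	    nxt == NH_FRAGMENT || nxt == NH_DSTOPTS) {
		if (limit != 0 && ++nest > limit)
			return (-1);	/* too many headers: drop */
		if (len - off < 8)
			return (-1);
		/*
		 * The fragment header is fixed at 8 octets; the others carry
		 * a length in 8-octet units, not counting the first 8.
		 */
		hlen = (nxt == NH_FRAGMENT) ? 8 :
		    ((size_t)pkt[off + 1] + 1) * 8;
		if (len - off < hlen)
			return (-1);
		nxt = pkt[off];
		off += hlen;
	}
	return (nxt);
}
```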
97678064Sume
The IPv4 part (sys/netinet) remains untouched for compatibility.
Because of this, if you receive an IPsec-over-IPv4 packet with a massive
number of IPsec headers, the kernel stack may blow up.  IPsec-over-IPv6 is
okay.
98057522Sshin
1.9 ICMPv6

After RFC2463 was published, IETF ipngwg decided to disallow ICMPv6 error
packets against ICMPv6 redirects, to prevent an ICMPv6 storm on a network
medium.  KAME already implements this in the kernel.

RFC2463 requires rate limitation for ICMPv6 error packets generated by a
node, to avoid possible DoS attacks.  KAME kernel implements two
rate-limitation mechanisms, tunable via sysctl:
- Minimum time interval between ICMPv6 error packets
	KAME kernel will generate no more than one ICMPv6 error packet
	during the configured time interval.  net.inet6.icmp6.errratelimit
	controls the interval (default: disabled).
- Maximum ICMPv6 error packet-per-second
	KAME kernel will generate no more than the configured number of
	packets in one second.  net.inet6.icmp6.errppslimit controls the
	maximum packet-per-second value (default: 200pps)
Basically, we need to pick values that are suitable against the bandwidth
of link layer devices directly attached to the node.  In some cases the
default values may not fit well.  We are still unsure if the default value
is sane or not.  Comments are welcome.
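
The packet-per-second mechanism can be modeled as a counter that is
reset every second.  The sketch below is a userland model under our
own assumptions, not the kernel implementation: struct pps_state and
pps_allow() are hypothetical, time is passed in as whole seconds, and
treating a negative limit as "no limit" is a choice made for this
sketch only.

```c
#include <stdint.h>

/* State for a packets-per-second limiter. */
struct pps_state {
	int64_t window;		/* second in which "count" was accumulated */
	int count;		/* errors sent during that second */
};

/*
 * Return 1 if an ICMPv6 error may be sent at time "now" (in seconds),
 * 0 if it should be suppressed.  A negative "maxpps" disables the limit.
 */
int
pps_allow(struct pps_state *st, int64_t now, int maxpps)
{
	if (maxpps < 0)
		return (1);
	if (st->window != now) {	/* a new one-second window begins */
		st->window = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return (0);
	st->count++;
	return (1);
}
```

With maxpps set to 200, the 201st error generated within the same
second is suppressed and sending resumes in the next second.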
100262588Sitojun
1.10 Applications

For userland programming, we support the IPv6 socket API as specified in
RFC2553/3493, RFC3542 and upcoming internet drafts.

TCP/UDP over IPv6 is available and quite stable.  You can enjoy "telnet",
"ftp", "rlogin", "rsh", "ssh", etc.  These applications are protocol
independent.  That is, they automatically choose IPv4 or IPv6
according to DNS.
101257522Sshin
1.11 Kernel Internals

 (*) TCP/UDP part is handled differently between operating system platforms.
     See 1.12 for details.

The current KAME has escaped from the IPv4 netinet logic.  While
ip_forward() calls ip_output(), ip6_forward() directly calls
if_output() since routers must not divide IPv6 packets into fragments.

An ICMPv6 error should contain as much of the original packet as
possible, up to 1280 bytes.  UDP6/IP6 port unreach, for instance, should
contain all extension headers and the *unchanged* UDP6 and IP6 headers.
So, all IP6 functions except TCP6 never convert network byte
order into host byte order, to save the original packet.

tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6
header immediately precedes the transport headers, due to extension
headers.  So, in6_cksum() was implemented to handle packets whose IP6
header and transport header are not contiguous.  Neither a TCP/IP6 nor
a UDP/IP6 header structure exists for checksum calculation.
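
The following userland sketch shows how such a checksum can be
computed when the IP6 header and the transport header are given as two
separate regions.  It is a model of the computation only, not
in6_cksum() itself: model_in6_cksum() and sum_bytes() are hypothetical
names, the data is assumed to be in flat buffers rather than mbufs,
and the pseudo-header layout is the one defined by the IPv6
specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Add "len" bytes to a running one's-complement sum (big-endian words). */
static uint32_t
sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	return (sum);
}

/*
 * Upper-layer checksum over the IPv6 pseudo-header plus transport data.
 * "ip6" points at the 40-byte IPv6 header and "ulp" at the transport
 * header, which may sit anywhere behind extension headers.  The checksum
 * field inside "ulp" must be zero when generating; the result is 0 when
 * verifying a packet whose checksum field is already filled in.
 */
uint16_t
model_in6_cksum(const uint8_t *ip6, uint8_t nxt, const uint8_t *ulp,
    uint32_t ulp_len)
{
	uint8_t tail[8];
	uint32_t sum;

	sum = sum_bytes(0, ip6 + 8, 32);	/* source + destination address */
	tail[0] = (uint8_t)(ulp_len >> 24);	/* 32-bit upper-layer length */
	tail[1] = (uint8_t)(ulp_len >> 16);
	tail[2] = (uint8_t)(ulp_len >> 8);
	tail[3] = (uint8_t)ulp_len;
	tail[4] = tail[5] = tail[6] = 0;	/* three zero octets */
	tail[7] = nxt;				/* upper-layer protocol number */
	sum = sum_bytes(sum, tail, sizeof(tail));
	sum = sum_bytes(sum, ulp, ulp_len);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Because only the addresses are read from the IP6 header, the
extension headers between the two regions never enter the sum.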
103357522Sshin
103457522SshinTo process IP6 header, extension headers and transport headers easily,
103557522SshinKAME requires network drivers to store packets in one internal mbuf or
103657522Sshinone or more external mbufs.  A typical old driver prepares two
103762588Sitojuninternal mbufs for 100 - 208 bytes data, however, KAME's reference
103857522Sshinimplementation stores it in one external mbuf.
103957522Sshin
104057522Sshin"netstat -s -p ip6" tells you whether or not your driver conforms
104157522SshinKAME's requirement.  In the following example, "cce0" violates the
104257522Sshinrequirement. (For more information, refer to Section 2.)
104357522Sshin
104457522Sshin        Mbuf statistics:
104557522Sshin                317 one mbuf
104657522Sshin                two or more mbuf::
104757522Sshin                        lo0 = 8
104857522Sshin			cce0 = 10
104957522Sshin                3282 one ext mbuf
105057522Sshin                0 two or more ext mbuf
105157522Sshin
Each input function calls IP6_EXTHDR_CHECK at the beginning to check
whether the region between the IP6 header and the header it processes
is contiguous.  IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has
the M_LOOP flag, that is, the packet comes from the loopback
interface.  m_pullup() is never called for packets coming from physical
network interfaces.
105857522Sshin
TCP6 reassembly makes use of the IP6 header to store reassembly
information.  The IP6 header is not necessarily located just before the
TCP6 header, so the ip6tcpreass structure has a pointer to the TCP6
header.  It also has a pointer back to the mbuf, to avoid m_pullup().

Like TCP6, both IP and IP6 reassembly functions never call m_pullup().
106562588Sitojun
xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR.  We think this is
one of the 4.4BSD implementation flaws.  Since 4.4BSD keeps ia_multiaddrs
in in_ifaddr{}, it can't use the multicast feature if the interface has
no unicast address.  So, if an application joins a group on an interface
and then all unicast addresses are removed from the interface, the
application can't send/receive any multicast packets.  Moreover, if a new
unicast address is assigned to the interface, in_mrejoin() must be
called.  KAME's interfaces, however, ALWAYS have one link-local unicast
address.  These extensions have thus not been implemented in KAME.
107562588Sitojun
107657522Sshin1.12 IPv4 mapped address and IPv6 wildcard socket
107757522Sshin
1078122115SumeRFC2553/3493 describes IPv4 mapped address (3.7) and special behavior
107957522Sshinof IPv6 wildcard bind socket (3.8).  The spec allows you to:
108057522Sshin- Accept IPv4 connections by AF_INET6 wildcard bind socket.
108157522Sshin- Transmit IPv4 packet over AF_INET6 socket by using special form of
108257522Sshin  the address like ::ffff:10.1.1.1.
108357522Sshinbut the spec itself is very complicated and does not specify how the
108457522Sshinsocket layer should behave.
108557522SshinHere we call the former one "listening side" and the latter one "initiating
108657522Sshinside", for reference purposes.
108757522Sshin
Almost all KAME implementations treat the tcp/udp port number space
separately between IPv4 and IPv6.  You can perform a wildcard bind on both
of the address families, on the same port.

There are some OS-platform differences in KAME code, as we use tcp/udp
code from different origins.  The following table summarizes the behavior.
109457522Sshin
109557522Sshin		listening side		initiating side
109662588Sitojun		(AF_INET6 wildcard	(connection to ::ffff:10.1.1.1)
109757522Sshin		socket gets IPv4 conn.)
109857522Sshin		---			---
109962588SitojunKAME/BSDI3	not supported		not supported
110062588SitojunKAME/FreeBSD228	not supported		not supported
110162588SitojunKAME/FreeBSD3x	configurable		supported
110257522Sshin		default: enabled
110362588SitojunKAME/FreeBSD4x	configurable		supported
110462588Sitojun		default: enabled
110562588SitojunKAME/NetBSD	configurable		supported
1106148394Sume		default: disabled
110762588SitojunKAME/BSDI4	enabled			supported
110862588SitojunKAME/OpenBSD	not supported		not supported
110957522Sshin
111057522SshinThe following sections will give you more details, and how you can
111157522Sshinconfigure the behavior.
111257522Sshin
111357522SshinComments on listening side:
111457522Sshin
It appears that RFC2553/3493 says too little about the wildcard bind
issue, specifically about (1) the port space issue, (2) failure modes,
(3) the relationship between AF_INET/INET6 wildcard binds, such as
ordering constraints, and (4) behavior when a conflicting socket is
opened/closed.  There can be several separate interpretations of this
RFC which conform to it but behave differently.
So, to implement a portable application you should assume nothing
about the behavior in the kernel.  Using getaddrinfo() is the safest way.
Port number space and wildcard bind issues were discussed in detail
on the ipv6imp mailing list in mid March 1999, and it appears that there
is no concrete consensus (meaning it is up to implementers).  You may
want to check the mailing list archives.
We supply a tool called "bindtest" that explores the behavior of
kernel bind(2).  The tool is not compiled by default.
112857522Sshin
If a server application would like to accept both IPv4 and IPv6
connections, it should use an AF_INET socket and an AF_INET6 socket
(you'll need two sockets).  Call getaddrinfo() with AI_PASSIVE set in
ai_flags, and call socket(2) and bind(2) for all the addresses returned.
By opening multiple sockets, you can accept connections on the socket
with the proper address family.  IPv4 connections will be accepted by the
AF_INET socket, and IPv6 connections will be accepted by the AF_INET6
socket (NOTE: the KAME/BSDI4 kernel sometimes violates this - we will
fix it).
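
A minimal sketch of the multiple-socket approach is shown below.  The
function name listen_all() and the backlog value are arbitrary.  Setting
IPV6_V6ONLY on the AF_INET6 socket is an assumption added here for
kernels where the two wildcard sockets would otherwise compete for the
same port; it is not required by the text above, and its failure is
ignored.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open one listening TCP socket per address returned by getaddrinfo().
 * Stores up to maxfds descriptors in fds[] and returns how many were
 * opened, or -1 if getaddrinfo() failed.
 */
static int
listen_all(const char *service, int *fds, int maxfds)
{
	struct addrinfo hints, *res, *ai;
	int n = 0;

	assert(service != NULL && fds != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* do not hardcode an address family */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE;	/* wildcard addresses for bind(2) */
	if (getaddrinfo(NULL, service, &hints, &res) != 0)
		return -1;

	for (ai = res; ai != NULL && n < maxfds; ai = ai->ai_next) {
		int s = socket(ai->ai_family, ai->ai_socktype,
		    ai->ai_protocol);

		if (s < 0)
			continue;	/* family not supported here */
		if (ai->ai_family == AF_INET6) {
			int on = 1;

			/* assumption: keep this socket IPv6-only */
			(void)setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
			    &on, sizeof(on));
		}
		if (bind(s, ai->ai_addr, ai->ai_addrlen) < 0 ||
		    listen(s, 5) < 0) {
			close(s);
			continue;
		}
		fds[n++] = s;
	}
	freeaddrinfo(res);
	return n;
}
```

The caller would then wait on all returned descriptors, for example with
select(2), and accept(2) on whichever becomes readable.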
113757522Sshin
113862588SitojunIf you try to support IPv6 traffic only and would like to reject IPv4
113962588Sitojuntraffic, always check the peer address when a connection is made toward
114057522SshinAF_INET6 listening socket.  If the address is IPv4 mapped address, you may
114157522Sshinwant to reject the connection.  You can check the condition by using
114262588SitojunIN6_IS_ADDR_V4MAPPED() macro.  This is one of the reasons the author of
114362588Sitojunthe section (itojun) dislikes special behavior of AF_INET6 wildcard bind.
114457522Sshin
114557522SshinComments on initiating side:
114657522Sshin
Advice to application implementers: to implement a portable IPv6
application (one which works on multiple IPv6 kernels), we believe that
the following are the keys to success:
- NEVER hardcode AF_INET nor AF_INET6.
- Use getaddrinfo() and getnameinfo() throughout the system.
  Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
- If you would like to connect to a destination, use getaddrinfo() and
  try all the destinations returned, like telnet does.
- Some IPv6 stacks are shipped with a buggy getaddrinfo().  Ship a
  minimal working version with your application and use that as a last
  resort.
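
The "try all the destinations" advice can be sketched as below.  The
function name connect_any() is hypothetical, and error reporting is
reduced to a single -1 return value for brevity.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve host/service with getaddrinfo() and try each returned address
 * in order.  Returns a connected socket, or -1 if every attempt failed.
 */
static int
connect_any(const char *host, const char *service)
{
	struct addrinfo hints, *res, *ai;
	int s = -1;

	assert(host != NULL && service != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* no hardcoded address family */
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(host, service, &hints, &res) != 0)
		return -1;

	for (ai = res; ai != NULL; ai = ai->ai_next) {
		s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (s < 0)
			continue;
		if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
			break;		/* connected */
		close(s);
		s = -1;
	}
	freeaddrinfo(res);
	return s;
}
```

Because the address family is taken from each addrinfo entry, the same
code works whether the destination has IPv4 addresses, IPv6 addresses,
or both.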
115757522Sshin
115857522SshinIf you would like to use AF_INET6 socket for both IPv4 and IPv6 outgoing
115962588Sitojunconnection, you will need tweaked implementation in DNS support libraries,
1160122115Sumeas documented in RFC2553/3493 6.1.  KAME libinet6 includes the tweak in
116162588Sitojungetipnodebyname().  Note that getipnodebyname() itself is not recommended as
116262588Sitojunit does not handle scoped IPv6 addresses at all.  For IPv6 name resolution
116362588Sitojungetaddrinfo() is the preferred API.  getaddrinfo() does not implement the
116462588Sitojuntweak.
116557522Sshin
When writing applications that make outgoing connections, things become
much simpler if you treat AF_INET and AF_INET6 as totally separate
address families.  {set,get}sockopt issues and DNS issues both become
simpler.  We do not recommend that you rely upon IPv4 mapped addresses.
117057522Sshin
117162588Sitojun1.12.1 KAME/BSDI3 and KAME/FreeBSD228
117257522Sshin
117362588SitojunThe platforms do not support IPv4 mapped address at all (both listening side
117462588Sitojunand initiating side).  AF_INET6 and AF_INET sockets are totally separated.
117557522Sshin
117662588SitojunPort number space is totally separate between AF_INET and AF_INET6 sockets. 
117757522Sshin
117878064SumeIt should be noted that KAME/BSDI3 and KAME/FreeBSD228 are not conformant
1179122115Sumeto RFC2553/3493 section 3.7 and 3.8.  It is due to code sharing reasons.
118078064Sume
118162588Sitojun1.12.2 KAME/FreeBSD[34]x
118257522Sshin
118362588SitojunKAME/FreeBSD3x and KAME/FreeBSD4x use shared tcp4/6 code (from
118462588Sitojunsys/netinet/tcp*) and shared udp4/6 code (from sys/netinet/udp*).
118562588SitojunThey use unified inpcb/in6pcb structure.
118657522Sshin
118762588Sitojun1.12.2.1 KAME/FreeBSD[34]x, listening side
118857522Sshin
118962588SitojunThe platform can be configured to support IPv4 mapped address/special
119062588SitojunAF_INET6 wildcard bind (enabled by default).  There is no kernel compilation
119162588Sitojunoption to disable it.  You can enable/disable the behavior with sysctl
119262588Sitojun(per-node), or setsockopt (per-socket).
119362588Sitojun
119462588SitojunWildcard AF_INET6 socket grabs IPv4 connection if and only if the following 
119557522Sshinconditions are satisfied:
119657522Sshin- there's no AF_INET socket that matches the IPv4 connection
119757522Sshin- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
119878064Sume  getsockopt(IPV6_V6ONLY) returns 0.
119957522Sshin
120062588Sitojun(XXX need checking)
120157522Sshin
120262588Sitojun1.12.2.2 KAME/FreeBSD[34]x, initiating side
120357522Sshin
120462588SitojunKAME/FreeBSD3x supports outgoing connection to IPv4 mapped address
120562588Sitojun(::ffff:10.1.1.1), if the node is configured to accept IPv4 connections
120662588Sitojunby AF_INET6 socket.
120762588Sitojun
120862588Sitojun(XXX need checking)
120962588Sitojun
121062588Sitojun1.12.3 KAME/NetBSD
121162588Sitojun
121262588SitojunKAME/NetBSD uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
121362588Sitojunudp4/6 code (from sys/netinet/udp*).  The implementation is made differently
121462588Sitojunfrom KAME/FreeBSD[34]x.  KAME/NetBSD uses separate inpcb/in6pcb structures,
121562588Sitojunwhile KAME/FreeBSD[34]x uses merged inpcb structure.
121662588Sitojun
121778064SumeIt should be noted that the default configuration of KAME/NetBSD is not
1218122115Sumeconformant to RFC2553/3493 section 3.8.  It is intentionally turned off by
1219122115Sumedefault for security reasons.
122078064Sume
122162588SitojunThe platform can be configured to support IPv4 mapped address/special AF_INET6
122262588Sitojunwildcard bind (disabled by default).  Kernel behavior can be summarized as
122362588Sitojunfollows:
122462588Sitojun- default: special support code will be compiled in, but is disabled by
122578064Sume  default.  It can be controlled by sysctl (net.inet6.ip6.v6only),
122678064Sume  or setsockopt(IPV6_V6ONLY).
1227122115Sume- add "INET6_BINDV6ONLY": No special support code for AF_INET6 wildcard socket
122862588Sitojun  will be compiled in.  AF_INET6 sockets and AF_INET sockets are totally
  separate.  The behavior is similar to what is described in 1.12.1.
123062588Sitojun
The sysctl setting affects per-socket configuration at in6pcb creation
time only.  In other words, per-socket configuration is copied from the
sysctl configuration at in6pcb creation time.  To change per-socket
behavior, you must perform setsockopt or reopen the socket.  A change in
sysctl configuration will not change the behavior of sockets that are
already open.
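
The per-socket control can be sketched as follows, assuming a platform
that provides the IPV6_V6ONLY socket option at the IPPROTO_IPV6 level.
The two wrapper names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set IPV6_V6ONLY on an AF_INET6 socket; returns 0 on success. */
static int
set_v6only(int s, int on)
{
	assert(s >= 0);
	return setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Read the current setting back: 1, 0, or -1 on error. */
static int
get_v6only(int s)
{
	int on = 0;
	socklen_t len = sizeof(on);

	assert(s >= 0);
	if (getsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, &len) < 0)
		return -1;
	return on ? 1 : 0;
}
```

An application that depends on a particular behavior should set the
option explicitly on each socket before bind(2), rather than rely on
the node-wide default.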
123662588Sitojun
1237122115Sume1.12.3.1 KAME/NetBSD, listening side
1238122115Sume
123962588SitojunWildcard AF_INET6 socket grabs IPv4 connection if and only if the following 
124062588Sitojunconditions are satisfied:
124162588Sitojun- there's no AF_INET socket that matches the IPv4 connection
124262588Sitojun- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
124378064Sume  getsockopt(IPV6_V6ONLY) returns 0.
124462588Sitojun
You cannot bind(2) to an IPv4 mapped address.  This is a workaround for
port number duplication and other twists.
124762588Sitojun
124862588Sitojun1.12.3.2 KAME/NetBSD, initiating side
124962588Sitojun
When getsockopt(IPV6_V6ONLY) is 0 for a socket, you can send outgoing
traffic to an IPv4 destination over an AF_INET6 socket, using an IPv4
mapped address as the destination (::ffff:10.1.1.1).

When getsockopt(IPV6_V6ONLY) is 1 for a socket, you cannot use an IPv4
mapped address for outgoing traffic.
1256122115Sume
125762588Sitojun1.12.4 KAME/BSDI4
125862588Sitojun
KAME/BSDI4 uses NRL-based TCP/UDP stack and inpcb source code,
which was derived from NRL IPv6/IPsec stack.  We guess it supports IPv4 mapped
address and special AF_INET6 wildcard bind.  The implementation is, again,
different from other KAME/*BSDs.
126362588Sitojun
126462588Sitojun1.12.4.1 KAME/BSDI4, listening side
126562588Sitojun
126662588SitojunNRL inpcb layer supports special behavior of AF_INET6 wildcard socket.
126762588SitojunThere is no way to disable the behavior.
126862588Sitojun
126962588SitojunWildcard AF_INET6 socket grabs IPv4 connection if and only if the following 
127062588Sitojuncondition is satisfied:
127162588Sitojun- there's no AF_INET socket that matches the IPv4 connection
127262588Sitojun
127362588Sitojun1.12.4.2 KAME/BSDI4, initiating side
127462588Sitojun
KAME/BSDI4 supports connection initiation to IPv4 mapped address
127662588Sitojun(like ::ffff:10.1.1.1).
127762588Sitojun
127862588Sitojun1.12.5 KAME/OpenBSD
127962588Sitojun
128062588SitojunKAME/OpenBSD uses NRL-based TCP/UDP stack and inpcb source code,
128162588Sitojunwhich was derived from NRL IPv6/IPsec stack.
128262588Sitojun
1283122115SumeIt should be noted that KAME/OpenBSD is not conformant to RFC2553/3493 section
1284122115Sume3.7 and 3.8.  It is intentionally omitted for security reasons.
128578064Sume
128662588Sitojun1.12.5.1 KAME/OpenBSD, listening side
128762588Sitojun
KAME/OpenBSD disables special behavior on AF_INET6 wildcard bind for
security reasons (if IPv4 traffic toward AF_INET6 wildcard bind is allowed,
access control will become much harder).  KAME/BSDI4 uses the NRL-based
TCP/UDP stack as well; the behavior differs, however, because of OpenBSD's
security policy.
129362588Sitojun
129462588SitojunAs a result the behavior of KAME/OpenBSD is similar to KAME/BSDI3 and
129562588SitojunKAME/FreeBSD228 (see 1.12.1 for more detail).
129662588Sitojun
129762588Sitojun1.12.5.2 KAME/OpenBSD, initiating side
129862588Sitojun
129962588SitojunKAME/OpenBSD does not support connection initiation to IPv4 mapped address
130062588Sitojun(like ::ffff:10.1.1.1).
130162588Sitojun
130262588Sitojun1.12.6 More issues
130362588Sitojun
130462588SitojunIPv4 mapped address support adds a big requirement to EVERY userland codebase.
130562588SitojunEvery userland code should check if an AF_INET6 sockaddr contains IPv4
130662588Sitojunmapped address or not.  This adds many twists:
130762588Sitojun
- Access control code becomes harder to write.
130962588Sitojun  For example, if you would like to reject packets from 10.0.0.0/8,
131062588Sitojun  you need to reject packets to AF_INET socket from 10.0.0.0/8,
131162588Sitojun  and to AF_INET6 socket from ::ffff:10.0.0.0/104.
131262588Sitojun- If a protocol on top of IPv4 is defined differently with IPv6, we need to be
131362588Sitojun  really careful when we determine which protocol to use.
131462588Sitojun  For example, with FTP protocol, we can not simply use sa_family to determine
131562588Sitojun  FTP command sets.  The following example is incorrect:
131662588Sitojun	if (sa_family == AF_INET)
131762588Sitojun		use EPSV/EPRT or PASV/PORT;	/*IPv4*/
131862588Sitojun	else if (sa_family == AF_INET6)
131962588Sitojun		use EPSV/EPRT or LPSV/LPRT;	/*IPv6*/
132062588Sitojun	else
132162588Sitojun		error;
132278064Sume  The correct code, with consideration to IPv4 mapped address, would be:
132362588Sitojun	if (sa_family == AF_INET)
132462588Sitojun		use EPSV/EPRT or PASV/PORT;	/*IPv4*/
132562588Sitojun	else if (sa_family == AF_INET6 && IPv4 mapped address)
132662588Sitojun		use EPSV/EPRT or PASV/PORT;	/*IPv4 command set on AF_INET6*/
132762588Sitojun	else if (sa_family == AF_INET6 && !IPv4 mapped address)
132862588Sitojun		use EPSV/EPRT or LPSV/LPRT;	/*IPv6*/
132962588Sitojun	else
133062588Sitojun		error;
  It is too much to ask everybody to be careful like this.
  The problem is, we are not sure if the above code fragment is perfect for
  all situations.
- By enabling kernel support for IPv4 mapped address (outgoing direction),
  servers on the kernel can be hosed by an IPv6 native packet that has an
  IPv4 mapped address in the IPv6 header source, and can generate unwanted
  IPv4 packets.
  draft-itojun-ipv6-transition-abuse-01.txt,
  draft-cmetz-v6ops-v4mapped-api-harmful-00.txt, and
  draft-itojun-v6ops-v4mapped-harmful-01.txt
  have more on this scenario.
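
The access control and FTP examples in the list above can be written in
C roughly as follows.  All function and enum names are hypothetical, and,
as noted above, we do not claim that the logic is perfect for all
situations.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Extract the IPv4 address (host byte order) from an AF_INET sockaddr,
 * or from an AF_INET6 sockaddr that holds an IPv4 mapped address.
 * Returns 1 and fills *v4 if one was found, 0 otherwise.
 */
static int
embedded_v4(const struct sockaddr *sa, uint32_t *v4)
{
	assert(sa != NULL && v4 != NULL);
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;

		*v4 = ntohl(sin->sin_addr.s_addr);
		return 1;
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;

		if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr)) {
			uint32_t n;

			memcpy(&n, &sin6->sin6_addr.s6_addr[12], sizeof(n));
			*v4 = ntohl(n);
			return 1;
		}
	}
	return 0;
}

/* Access control: 1 if the peer is in 10.0.0.0/8 under either family. */
static int
from_net10(const struct sockaddr *sa)
{
	uint32_t v4;

	return embedded_v4(sa, &v4) && (v4 >> 24) == 10;
}

enum ftp_cmdset { FTP_CMDS_V4, FTP_CMDS_V6, FTP_CMDS_ERROR };

/* Pick the FTP command set with IPv4 mapped addresses taken into account. */
static enum ftp_cmdset
ftp_cmdset(const struct sockaddr *sa)
{
	uint32_t v4;

	if (embedded_v4(sa, &v4))
		return FTP_CMDS_V4;	/* EPSV/EPRT or PASV/PORT */
	if (sa->sa_family == AF_INET6)
		return FTP_CMDS_V6;	/* EPSV/EPRT or LPSV/LPRT */
	return FTP_CMDS_ERROR;
}
```

Routing both decisions through one helper keeps the IPv4 mapped address
test in a single place instead of repeating it at every call site.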
134062588Sitojun
Due to the above twists, some KAME userland programs have restrictions on
the use of IPv4 mapped addresses:
134362588Sitojun- rshd/rlogind do not accept connections from IPv4 mapped address.
134462588Sitojun  This is to avoid malicious use of IPv4 mapped address in IPv6 native
134562588Sitojun  packet, to bypass source-address based authentication.
134678064Sume- ftp/ftpd assume that you are on dual stack network.  IPv4 mapped address
134778064Sume  will be decoded in userland, and will be passed to AF_INET sockets
134878064Sume  (in other words, ftp/ftpd do not support SIIT environment).
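
Decoding an IPv4 mapped address in userland, as ftp/ftpd are described
as doing above, might look like the sketch below.  This is not the
ftp/ftpd source; the function name is hypothetical.  On platforms whose
sockaddr_in has a sin_len member, that member would also have to be set.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * If src holds an IPv4 mapped address, fill *dst with the equivalent
 * AF_INET address (same port) and return 1.  Otherwise return 0 and
 * leave *dst untouched.
 */
static int
unmap_v4(const struct sockaddr_in6 *src, struct sockaddr_in *dst)
{
	assert(src != NULL && dst != NULL);
	if (src->sin6_family != AF_INET6 ||
	    !IN6_IS_ADDR_V4MAPPED(&src->sin6_addr))
		return 0;
	memset(dst, 0, sizeof(*dst));
	dst->sin_family = AF_INET;
	dst->sin_port = src->sin6_port;	/* already in network byte order */
	memcpy(&dst->sin_addr, &src->sin6_addr.s6_addr[12],
	    sizeof(dst->sin_addr));
	return 1;
}
```

The resulting sockaddr_in can then be used with an ordinary AF_INET
socket.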
134962588Sitojun
135078064Sume1.12.7 Interaction with SIIT translator
135178064Sume
135278064SumeSIIT translator is specified in RFC2765.  KAME node cannot become a SIIT
135378064Sumetranslator box, nor SIIT end node (a node in SIIT cloud).
135478064Sume
135578064SumeTo become a SIIT translator box, we need to put additional code for that.
135678064SumeWe do not have the code in our tree at this moment.
135778064Sume
135878064SumeThere are multiple reasons that we are unable to become SIIT end node.
135978064Sume(1) SIIT translators require end nodes in the SIIT cloud to be IPv6-only.
136078064SumeSince we are unable to compile INET-less kernel, we are unable to become
136178064SumeSIIT end node.  (2) As presented in 1.12.6, some of our userland code assumes
136278064Sumedual stack network.  (3) KAME stack filters out IPv6 packets with IPv4
136378064Sumemapped address in the header, to secure non-SIIT case (which is much more
136478064Sumecommon).  Effectively KAME node will reject any packets via SIIT translator
136578064Sumebox.  See section 1.14 for more detail about the last item.
136678064Sume
There are documentation issues too - the SIIT document requires very
strange things.  For example, the SIIT document asks an IPv6-only
(meaning no IPv4 code) node to be able to construct IPv4 IPsec headers.
If a node knows how to construct IPv4 IPsec headers, it is not an
IPv6-only node; it is a dual-stack node.  The requirements imposed in the
SIIT document contradict other parts of the document itself.
137378064Sume
137457522Sshin1.13 sockaddr_storage
137557522Sshin
When RFC2553 was about to be finalized, there was discussion on how struct
sockaddr_storage members should be named.  One proposal was to prepend
"__" to the members (like "__ss_len") as they should not be touched.  The
other proposal was not to prepend it (like "ss_len") as we need to touch
those members directly.  There was no clear consensus on it.
138157522Sshin
138257522SshinAs a result, RFC2553 defines struct sockaddr_storage as follows:
138357522Sshin	struct sockaddr_storage {
138457522Sshin		u_char	__ss_len;	/* address length */
138557522Sshin		u_char	__ss_family;	/* address family */
138657522Sshin		/* and bunch of padding */
138757522Sshin	};
In contrast, the XNET draft defines it as follows:
138957522Sshin	struct sockaddr_storage {
139057522Sshin		u_char	ss_len;		/* address length */
139157522Sshin		u_char	ss_family;	/* address family */
139257522Sshin		/* and bunch of padding */
139357522Sshin	};
139457522Sshin
1395122115SumeIn December 1999, it was agreed that RFC2553bis (RFC3493) should pick the
1396122115Sumelatter (XNET) definition.
139757522Sshin
139857522SshinKAME kit prior to December 1999 used RFC2553 definition.  KAME kit after
139957522SshinDecember 1999 (including December) will conform to XNET definition,
1400122115Sumebased on RFC3493 discussion.
140157522Sshin
If you look at multiple IPv6 implementations, you will be able to see
both definitions.  As a userland programmer, the most portable way of
dealing with it is to:
(1) ensure ss_family and/or ss_len are available on the platform, by using
    GNU autoconf,
(2) have -Dss_family=__ss_family to unify all occurrences (including header
    file) into __ss_family, or
(3) never touch __ss_family.  Cast to sockaddr * and use sa_family like:
	struct sockaddr_storage ss;
	family = ((struct sockaddr *)&ss)->sa_family;
141257522Sshin
141362588Sitojun1.14 Invalid addresses on the wire
141462588Sitojun
Some IPv6 transition technologies embed an IPv4 address into an IPv6
address.  These specifications themselves are fine; however, certain
sets of attacks are enabled by them.  Recent specification documents
cover those issues, but there are already-published RFCs that do not
have protection against them (like using a source address of
::ffff:127.0.0.1 to bypass a "reject packet from remote" filter).
142162588Sitojun
142262588SitojunTo name a few, these address ranges can be used to hose an IPv6 implementation,
142362588Sitojunor bypass security controls:
142462588Sitojun- IPv4 mapped address that embeds unspecified/multicast/loopback/broadcast
142562588Sitojun  IPv4 address (if they are in IPv6 native packet header, they are malicious)
142662588Sitojun	::ffff:0.0.0.0/104	::ffff:127.0.0.0/104
142762588Sitojun	::ffff:224.0.0.0/100	::ffff:255.0.0.0/104 
142878064Sume- 6to4 (RFC3056) prefix generated from unspecified/multicast/loopback/
142978064Sume  broadcast/private IPv4 address
143062588Sitojun	2002:0000::/24		2002:7f00::/24		2002:e000::/24
143162588Sitojun	2002:ff00::/24		2002:0a00::/24		2002:ac10::/28	
143262588Sitojun	2002:c0a8::/32
143378064Sume- IPv4 compatible address that embeds unspecified/multicast/loopback/broadcast
143478064Sume  IPv4 address (if they are in IPv6 native packet header, they are malicious).
  Note that, since KAME does not support RFC1933/2893 auto tunnels, KAME nodes
143678064Sume  are not vulnerable to these packets.
143778064Sume	::0.0.0.0/104	::127.0.0.0/104	::224.0.0.0/100	::255.0.0.0/104 
143862588Sitojun
Also, since KAME does not support RFC1933/2893 auto tunnels, seeing IPv4
compatible addresses is very rare.  You should take caution if you see
those on the wire.

If we see IPv6 packets with an IPv4 mapped address (::ffff:0.0.0.0/96) in
the header in a dual-stack environment (not in a SIIT environment), they
indicate that someone is trying to impersonate an IPv4 peer.  The packets
should be dropped.
144578064Sume
144678064SumeIPv6 specifications do not talk very much about IPv6 unspecified address (::)
144778064Sumein the IPv6 source address field.  Clarification is in progress.
Here are a couple of comments:
144978064Sume- IPv6 unspecified address can be used in IPv6 source address field, if and
145078064Sume  only if we have no legal source address for the node.  The legal situations
145178064Sume  include, but may not be limited to, (1) MLD while no IPv6 address is assigned
145278064Sume  to the node and (2) DAD.
145378064Sume- If IPv6 TCP packet has IPv6 unspecified address, it is an attack attempt.
145478064Sume  The form can be used as a trigger for TCP DoS attack.  KAME code already
145578064Sume  filters them out.
145678064Sume- The following examples are seemingly illegal.  It seems that there's general
1457151539Ssuz  consensus among ipngwg for those.  (1) Mobile IPv6 home address option,
145878064Sume  (2) offlink packets (so routers should not forward them).
  KAME implements (2) already.
146078064Sume
KAME code is carefully written to avoid such incidents.  More specifically,
the KAME kernel will reject packets with certain source/destination
addresses in the IPv6 base header or IPv6 routing header.  Also, the KAME
default configuration file is written carefully, to avoid those attacks.

draft-itojun-ipv6-transition-abuse-01.txt,
draft-cmetz-v6ops-v4mapped-api-harmful-00.txt and
draft-itojun-v6ops-v4mapped-harmful-01.txt have more on this issue.
146962588Sitojun
147062588Sitojun1.15 Node's required addresses
147162588Sitojun
RFC2373 section 2.8 talks about required addresses for an IPv6
node.  This section describes how the KAME stack manages those required
addresses.
147562588Sitojun
147662588Sitojun1.15.1 Host case
147762588Sitojun
The following items are automatically assigned to the node (or the node will
automatically join the group), at bootstrap time:
148062588Sitojun- Loopback address
148162588Sitojun- All-nodes multicast addresses (ff01::1)
148262588Sitojun
148362588SitojunThe following items will be automatically handled when the interface becomes
148462588SitojunIFF_UP:
148562588Sitojun- Its link-local address for each interface
148662588Sitojun- Solicited-node multicast address for link-local addresses
148762588Sitojun- Link-local allnodes multicast address (ff02::1)
148862588Sitojun
148962588SitojunThe following items need to be configured manually by ifconfig(8) or prefix(8).
149062588SitojunAlternatively, these can be autoconfigured by using stateless address
149162588Sitojunautoconfiguration.
149262588Sitojun- Assigned unicast/anycast addresses
149362588Sitojun- Solicited-Node multicast address for assigned unicast address
149462588Sitojun
149562588SitojunUsers can join groups by using appropriate system calls like setsockopt(2).
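
For instance, joining a group with setsockopt(2) can be sketched as
below, assuming the IPV6_JOIN_GROUP option and struct ipv6_mreq of the
basic socket API.  The function name is hypothetical, and the caller is
expected to supply an already-parsed group address and an interface
index.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Join the IPv6 multicast group *group on the interface with index
 * ifindex.  Returns 0 on success, -1 on failure (including the case
 * where *group is not a multicast address).
 */
static int
join_group6(int s, const struct in6_addr *group, unsigned int ifindex)
{
	struct ipv6_mreq mreq;

	assert(group != NULL);
	if (!IN6_IS_ADDR_MULTICAST(group))
		return -1;
	memset(&mreq, 0, sizeof(mreq));
	mreq.ipv6mr_multiaddr = *group;
	mreq.ipv6mr_interface = ifindex;
	return setsockopt(s, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq,
	    sizeof(mreq));
}
```

The group address can be obtained from getaddrinfo(), in line with the
advice in 1.12 to avoid inet_*() functions.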
149662588Sitojun
149762588Sitojun1.15.2 Router case
149862588Sitojun
In addition to the above, routers need to handle the following items.
150062588Sitojun
150162588SitojunThe following items need to be configured manually by using ifconfig(8).
150262588Sitojuno The subnet-router anycast addresses for the interfaces it is configured
150362588Sitojun  to act as a router on (prefix::/64)
150462588Sitojuno All other anycast addresses with which the router has been configured
150562588Sitojun
150662588SitojunThe router will join the following multicast group when rtadvd(8) is available
150762588Sitojunfor the interface.
150862588Sitojuno All-Routers Multicast Addresses (ff02::2)
150962588Sitojun
151062588SitojunRouting daemons will join appropriate multicast groups, as necessary,
151162588Sitojunlike ff02::9 for RIPng.
151262588Sitojun
151362588SitojunUsers can join groups by using appropriate system calls like setsockopt(2).
151462588Sitojun
151578064Sume1.16 Advanced API
151678064Sume
Current KAME kernel implements RFC3542 API.  It also implements RFC2292 API,
for backward compatibility with the *BSD-integrated codebase.
KAME tree ships with RFC3542 headers.
*BSD-integrated codebase implements either RFC2292, or RFC3542, API.
See the "COVERAGE" document for detailed implementation status.
152278064Sume
Here are a couple of issues to mention:
- *BSD-integrated binaries, compiled for RFC2292, will work on KAME kernel.
  For example, OpenBSD 2.7 /sbin/rtsol will work on KAME/openbsd kernel.
- KAME binaries, compiled using RFC3542, will not work on *BSD-integrated
  kernel.  For example, KAME /usr/local/v6/sbin/rtsol will not work on
  OpenBSD 2.7 kernel.
- RFC3542 API is not compatible with RFC2292 API.  RFC3542 #define symbols
  conflict with RFC2292 symbols.  Therefore, if you compile programs that
  assume RFC2292 API, the compilation itself goes fine, but the compiled
  binary will not work correctly.  The problem is not a KAME issue, but an
  API issue.  For example, Solaris 8 implements RFC3542 API.  If you
  compile RFC2292-based code on Solaris 8, the binary can behave strangely.
153578064Sume
There are a few incompatible behaviors in the RFC2292 binary backward
compatibility support in the KAME tree.  To enumerate:
153878064Sume- Type 0 routing header lacks support for strict/loose bitmap.
153978064Sume  Even if we see packets with "strict" bit set, those bits will not be made
154078064Sume  visible to the userland.
154178064Sume  Background: RFC2292 document is based on RFC1883 IPv6, and it uses
1542122115Sume  strict/loose bitmap.  RFC3542 document is based on RFC2460 IPv6, and it has
154378064Sume  no strict/loose bitmap (it was removed from RFC2460).  KAME tree obeys
154478064Sume  RFC2460 IPv6, and lacks support for strict/loose bitmap.
154578064Sume
The RFC3542 document leaves some particular cases unspecified.  The
KAME implementation treats them as follows:
- The IPV6_DONTFRAG and IPV6_RECVPATHMTU socket options for TCP
  sockets are ignored.  That is, the setsockopt() call will succeed
  but the specified value will have no effect.
1551122115Sume
1552122115Sume1.17 DNS resolver
1553122115Sume
KAME ships with a modified DNS resolver, in libinet6.a.
libinet6.a has a couple of extensions against the libc DNS resolver:
- Can take "options insecure1" and "options insecure2" in /etc/resolv.conf,
  which toggle the RES_INSECURE[12] option flag bits.
- EDNS0 receive buffer size notification support.  It can be enabled by
  "options edns0" in /etc/resolv.conf.  See USAGE for details.
- IPv6 transport support (queries/responses over IPv6).  Most official BSD
  releases already have it.
- Partial A6 chain chasing/DNAME/bit string label support (KAME/BSDI4).
1563122115Sume
1564122115Sume
156557522Sshin2. Network Drivers
156657522Sshin
KAME requires the following items to be added into the standard drivers:
156857522Sshin
1569122115Sume(1) (freebsd[234] and bsdi[34] only) mbuf clustering requirement.
1570122115Sume    In this stable release, we changed MINCLSIZE into MHLEN+1 for all the
1571122115Sume    operating systems in order to make all the drivers behave as we expect.  
157257522Sshin
(2) multicast.  If "ifmcstat" yields no multicast group for an
    interface, that interface has to be patched.
157557522Sshin
To avoid trouble, we suggest that you comment out the device drivers
for unsupported/unnecessary cards in the kernel configuration file.
If you accidentally enable unsupported drivers, some of the userland
tools may not work correctly (routing daemons are a typical example).
158057522Sshin
158162588SitojunIn the following sections, "official support" means that KAME developers
158262588Sitojunare using that ethernet card/driver frequently.
158362588Sitojun
(NOTE: In the past we required all pcmcia drivers to have a call to
in6_ifattach().  We no longer have this requirement.)
158657522Sshin
158762588Sitojun2.1 FreeBSD 2.2.x-RELEASE
158862588Sitojun
Here is a list of FreeBSD 2.2.x-RELEASE drivers and their conditions:
159062588Sitojun
159162588Sitojun	driver	mbuf(1)		multicast(2)	official support?
159262588Sitojun	---	---		---		---
159362588Sitojun	(Ethernet)
159462588Sitojun	ar	looks ok	-		-
159562588Sitojun	cnw	ok		ok		yes (*)
159662588Sitojun	ed	ok		ok		yes
159762588Sitojun	ep	ok		ok		yes
159862588Sitojun	fe	ok		ok		yes
159962588Sitojun	sn	looks ok	-		-   (*)
160062588Sitojun	vx	looks ok	-		-
160162588Sitojun	wlp	ok		ok		-   (*)
160262588Sitojun	xl	ok		ok		yes
160362588Sitojun	zp	ok		ok		-
160462588Sitojun	(FDDI)
160562588Sitojun	fpa	looks ok	?		-
160662588Sitojun	(ATM)
160762588Sitojun	en	ok		ok		yes
160862588Sitojun	(Serial)
160962588Sitojun	lp	?		-		not work
161062588Sitojun	sl	?		-		not work
161162588Sitojun	sr	looks ok	ok		-   (**)
161262588Sitojun
You may want to add an invocation of "rtsol" in "/etc/pccard_ether",
if you are using a notebook computer with a PCMCIA ethernet card.
161562588Sitojun
161662588Sitojun(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).
161762588Sitojun
(**) There was a report that the kernel may hang if you bring the sr driver
up, down, and then up again.  We have disabled frame-relay support in the
sr driver, and since then it appears to be working fine.  If you need
frame-relay support to come back, please contact the KAME developers.
162262588Sitojun
162362588Sitojun2.2 BSD/OS 3.x
162462588Sitojun
The following lists BSD/OS 3.x device drivers and their conditions:
162662588Sitojun
162762588Sitojun	driver	mbuf(1)		multicast(2)	official support?
162862588Sitojun	---	---		---		---
162962588Sitojun	(Ethernet)
163062588Sitojun	cnw	ok		ok		yes
163162588Sitojun	de	ok		ok		-
163262588Sitojun	df	ok		ok		-
163362588Sitojun	eb	ok		ok		-
163462588Sitojun	ef	ok		ok		yes
163562588Sitojun	exp	ok		ok		-
163662588Sitojun	mz	ok		ok		yes
163762588Sitojun	ne	ok		ok		yes
163862588Sitojun	we	ok		ok		-
163962588Sitojun	(FDDI)
164062588Sitojun	fpa	ok		ok		-
164162588Sitojun	(ATM)
164262588Sitojun	en	maybe		ok		-
164362588Sitojun	(Serial)
164462588Sitojun	ntwo	ok		ok		yes
164562588Sitojun	sl	?		-		not work
164662588Sitojun	appp	?		-		not work
164762588Sitojun
164862588SitojunYou may want to use "@insert" directive in /etc/pccard.conf to invoke
164962588Sitojun"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
165062588Sitojun
165162588Sitojun2.3 NetBSD
165262588Sitojun
165362588SitojunThe following table lists the network drivers we have tried so far.
165462588Sitojun
165562588Sitojun	driver		mbuf(1)	multicast(2)	official support?
165662588Sitojun	---		---	---		---
165762588Sitojun	(Ethernet)
165862588Sitojun	awi pcmcia/i386	ok	ok		-
165962588Sitojun	bah zbus/amiga	NG(*)
166062588Sitojun	cnw pcmcia/i386	ok	ok		yes
166162588Sitojun	ep pcmcia/i386	ok	ok		-
1662151539Ssuz	fxp pci/i386	ok(*2)	ok		-
1663151539Ssuz	tlp pci/i386	ok	ok		-
166462588Sitojun	le sbus/sparc	ok	ok		yes
166562588Sitojun	ne pci/i386	ok	ok		yes
166662588Sitojun	ne pcmcia/i386	ok	ok		yes
1667151539Ssuz	rtk pci/i386	ok	ok		-
166862588Sitojun	wi pcmcia/i386	ok	ok		yes
166962588Sitojun	(ATM)
167062588Sitojun	en pci/i386	ok	ok		-
167162588Sitojun
167262588Sitojun(*) This may need some fix, but I'm not sure what arcnet interfaces assume...
167362588Sitojun
167462588Sitojun2.4 FreeBSD 3.x-RELEASE
167562588Sitojun
Here is a list of FreeBSD 3.x-RELEASE drivers and their conditions:
167762588Sitojun
167862588Sitojun	driver	mbuf(1)		multicast(2)	official support?
167962588Sitojun	---	---		---		---
168062588Sitojun	(Ethernet)
168162588Sitojun	cnw	ok		ok		-(*)
168262588Sitojun	ed	?		ok		-
168362588Sitojun	ep	ok		ok		-
168462588Sitojun	fe	ok		ok		yes
168562588Sitojun	fxp	?(**)
168662588Sitojun	lnc	?		ok		-
168762588Sitojun	sn	?		?		-(*)
168862588Sitojun	wi	ok		ok		yes
168962588Sitojun	xl	?		ok		-
169062588Sitojun
169162588Sitojun(*) These drivers are distributed with PAO as PAO3
169262588Sitojun    (http://www.jp.freebsd.org/PAO/).
(**) There were trouble reports with multicast filter initialization.
169462588Sitojun
More drivers will probably just work on KAME FreeBSD 3.x-RELEASE, but they
have not been checked yet.
169762588Sitojun
169878064Sume2.5 FreeBSD 4.x-RELEASE
169962588Sitojun
Here is a list of FreeBSD 4.x-RELEASE drivers and their conditions:
170178064Sume
170278064Sume	driver		multicast
170378064Sume	---		---
170478064Sume	(Ethernet)
170578064Sume	lnc/vmware	ok
170678064Sume
170778064Sume2.6 OpenBSD 2.x
170878064Sume
Here is a list of OpenBSD 2.x drivers and their conditions:
171062588Sitojun
171162588Sitojun	driver		mbuf(1)		multicast(2)	official support?
171262588Sitojun	---		---		---		---
171362588Sitojun	(Ethernet)
171462588Sitojun	de pci/i386	ok		ok		yes
171562588Sitojun	fxp pci/i386	?(*)
171662588Sitojun	le sbus/sparc	ok		ok		yes
171762588Sitojun	ne pci/i386	ok		ok		yes
171862588Sitojun	ne pcmcia/i386	ok		ok		yes
171962588Sitojun	wi pcmcia/i386	ok		ok		yes
172062588Sitojun
(*) There seems to be a problem in the driver with multicast filter
configuration.  This happens with certain revisions of the chipset on the
card.  It should be fixed by now by a workaround in sys/net/if.c, but we are
still not sure.
172462588Sitojun
172578064Sume2.7 BSD/OS 4.x
172662588Sitojun
The following lists BSD/OS 4.x device drivers and their conditions:
172862588Sitojun
172962588Sitojun	driver	mbuf(1)		multicast(2)	official support?
173062588Sitojun	---	---		---		---
173162588Sitojun	(Ethernet)
173262588Sitojun	de	ok		ok		yes
173362588Sitojun	exp	(*)
173462588Sitojun
173562588SitojunYou may want to use "@insert" directive in /etc/pccard.conf to invoke
173662588Sitojun"rtsol" command right after dynamic insertion of PCMCIA ethernet cards.
173762588Sitojun
(*) The exp driver has a serious conflict with the KAME initialization
sequence.  A workaround has been committed to sys/i386/pci/if_exp.c, and it
should be okay by now.
174062588Sitojun
1741151539Ssuz
174257522Sshin3. Translator
174357522Sshin
We categorize IPv4/IPv6 translators into four types.
174557522Sshin
174657522SshinTranslator A --- It is used in the early stage of transition to make
174757522Sshinit possible to establish a connection from an IPv6 host in an IPv6
174857522Sshinisland to an IPv4 host in the IPv4 ocean.
174957522Sshin
175057522SshinTranslator B --- It is used in the early stage of transition to make
175157522Sshinit possible to establish a connection from an IPv4 host in the IPv4
175257522Sshinocean to an IPv6 host in an IPv6 island.
175357522Sshin
175457522SshinTranslator C --- It is used in the late stage of transition to make it
175557522Sshinpossible to establish a connection from an IPv4 host in an IPv4 island
175657522Sshinto an IPv6 host in the IPv6 ocean.
175757522Sshin
175857522SshinTranslator D --- It is used in the late stage of transition to make it
175957522Sshinpossible to establish a connection from an IPv6 host in the IPv6 ocean
176057522Sshinto an IPv4 host in an IPv4 island.
176157522Sshin
KAME provides a TCP relay translator for category A.  This is called
"FAITH".  We also provide an IP header translator for category A.
176457522Sshin
176557522Sshin3.1 FAITH TCP relay translator
176657522Sshin
The FAITH system uses a TCP relay daemon called "faithd", helped by the KAME
kernel.  FAITH reserves an IPv6 address prefix, and relays TCP connections
toward that prefix to the corresponding IPv4 destination.
177057522Sshin
177157522SshinFor example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
177257522Sshinthe IPv6 destination for TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
177357522Sshinthe connection will be relayed toward IPv4 destination 163.221.202.12.
177457522Sshin
177557522Sshin	destination IPv4 node (163.221.202.12)
177657522Sshin	  ^
	  | IPv4 TCP toward 163.221.202.12
177857522Sshin	FAITH-relay dual stack node
177957522Sshin	  ^
178057522Sshin	  | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
178157522Sshin	source IPv6 node
178257522Sshin
faithd must be invoked on the FAITH-relay dual stack node.
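
The address mapping in the example above can be sketched in Python as
follows.  This is an illustration only, not faithd code: the function name
faith_ipv4_destination and the /96 prefix length are assumptions (the text
above gives the reserved prefix without a length), and the sketch only
assumes that the IPv4 destination is carried in the low-order 32 bits of
the IPv6 destination address.

```python
import ipaddress

# Assumption: the reserved FAITH prefix is treated as a /96, leaving the
# low-order 32 bits for the embedded IPv4 destination.
FAITH_PREFIX = ipaddress.IPv6Network("3ffe:0501:0200:ffff::/96")


def faith_ipv4_destination(dst6, prefix=FAITH_PREFIX):
    """Return the IPv4 relay destination for dst6, or None if dst6 is
    outside the reserved prefix (hypothetical helper, not a faithd API)."""
    addr = ipaddress.IPv6Address(dst6)
    if addr not in prefix:
        return None
    # Keep only the low-order 32 bits of the IPv6 destination.
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(faith_ipv4_destination("3ffe:0501:0200:ffff::163.221.202.12"))
```

Destinations outside the reserved prefix yield None here; the sketch does
not model what the relay node does with such traffic.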
178457522Sshin
1785151539SsuzFor more details, consult kame/kame/faithd/README and RFC3142.
178657522Sshin
178757522Sshin3.2 IPv6-to-IPv4 header translator
178857522Sshin
178978064Sume(to be written)
179057522Sshin
1791151539Ssuz
179257522Sshin4. IPsec
179357522Sshin
179462588SitojunIPsec is implemented as the following three components.
179557522Sshin
179657522Sshin(1) Policy Management
179757522Sshin(2) Key Management
179862588Sitojun(3) AH, ESP and IPComp handling in kernel
179957522Sshin
Note that KAME/OpenBSD does NOT include support for KAME IPsec code,
as the OpenBSD team has its own home-brew IPsec stack and has no plan
to replace it.  IPv6 support for IPsec is, therefore, lacking on KAME/OpenBSD.
180362588Sitojun
180478064Sumehttp://www.netbsd.org/Documentation/network/ipsec/ has more information
180578064Sumeincluding usage examples.
180678064Sume
180757522Sshin4.1 Policy Management
180857522Sshin
The kernel implements experimental policy management code.  There are two ways
to manage security policy.  One is to configure per-socket policy using
setsockopt(2).  In this case, policy configuration is described in
ipsec_set_policy(3).  The other is to configure kernel packet filter-based
policy using PF_KEY interface, via setkey(8).
181457522Sshin
Policy entries are matched in order, so the order of entries makes a
difference in behavior.
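
Why the order matters can be seen with a small Python model.  This is a
sketch under the assumption that the first matching entry wins; it is not
the kernel SPD code, and the function name, the documentation-prefix
addresses and the policy strings are all made up for illustration.

```python
import ipaddress


def first_match(policies, src, dst):
    """Return the action of the first entry whose selectors cover src/dst."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    for src_net, dst_net, action in policies:
        if s in ipaddress.ip_network(src_net) and d in ipaddress.ip_network(dst_net):
            return action
    return None  # no entry matched


# Hypothetical entries: a narrow exception and a broad rule.
specific_first = [
    ("2001:db8:1::/48", "2001:db8:2::1/128", "none"),
    ("2001:db8:1::/48", "2001:db8:2::/48", "ipsec esp/transport//require"),
]
general_first = list(reversed(specific_first))

print(first_match(specific_first, "2001:db8:1::10", "2001:db8:2::1"))
print(first_match(general_first, "2001:db8:1::10", "2001:db8:2::1"))
```

With the narrow entry first, the packet bypasses IPsec; with the broad
entry first, the narrow exception is never reached.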
181757522Sshin
181857522Sshin4.2 Key Management
181957522Sshin
182057522SshinThe key management code implemented in this kit (sys/netkey) is a
182157522Sshinhome-brew PFKEY v2 implementation.  This conforms to RFC2367.
182257522Sshin
The home-brew IKE daemon, "racoon", is included in the kit (kame/kame/racoon,
or usr.sbin/racoon).
Basically you'll need to run racoon as a daemon, then set up a policy
to require keys (like ping -P 'out ipsec esp/transport//use').
The kernel will contact the racoon daemon as necessary to exchange keys.
182857522Sshin
In the IKE spec, there is ambiguity about the interpretation of a "tunnel"
proposal.  For example, if we would like to propose the use of the following
packet:
	IP AH ESP IP payload
some implementations propose it as "AH transport and ESP tunnel", since
this is more logical from a packet construction point of view.  Other
implementations propose it as "AH tunnel and ESP tunnel".
Racoon follows the latter route (previously it followed the former, but
the latter interpretation seems to be the popular one/consensus).
This raises a real interoperability issue.  We hope it will be resolved
quickly.
183862588Sitojun
racoon does not implement byte lifetime for either phase 1 or phase 2
(RFC2409 page 35, Life Type = kilobytes).
1841122115Sume
184257522Sshin4.3 AH and ESP handling
184357522Sshin
184457522SshinIPsec module is implemented as "hooks" to the standard IPv4/IPv6
processing.  When sending a packet, ip{,6}_output() checks if ESP/AH
processing is required by checking if a matching SPD (Security
Policy Database) entry is found.  If ESP/AH is needed,
{esp,ah}{4,6}_output() will be called and the mbuf will be updated
accordingly.  When a packet is received, {esp,ah}4_input() will be
called based on protocol number, i.e. (*inetsw[proto])().
{esp,ah}4_input() will decrypt/check authenticity of the packet,
and strip off the daisy-chained header and padding for ESP/AH.  It is
185357522Sshinsafe to strip off the ESP/AH header on packet reception, since we
185457522Sshinwill never use the received packet in "as is" form.
185557522Sshin
185657522SshinBy using ESP/AH, TCP4/6 effective data segment size will be affected by
185757522Sshinextra daisy-chained headers inserted by ESP/AH.  Our code takes care of
185857522Sshinthe case.
185957522Sshin
Basic crypto functions can be found in directory "sys/crypto".  ESP/AH
transforms are listed in {esp,ah}_core.c with wrapper functions.  If you
wish to add an algorithm, add a wrapper function in {esp,ah}_core.c, and
add your crypto algorithm code into sys/crypto.
186457522Sshin
Tunnel mode works basically fine, but comes with the following restrictions:
- You cannot run a routing daemon across an IPsec tunnel, since we do not
  model IPsec tunnels as pseudo interfaces.
- Authentication model for AH tunnel must be revisited.  We'll need to
  improve the policy management engine, eventually.
- Path MTU discovery does not work across an IPv6 IPsec tunnel gateway due to
  insufficient code.
187257522Sshin
The AH specification does not talk much about the "multiple AH on a packet"
case.  We incrementally compute AH checksums, from inside to outside.  Also,
we treat inner AH headers as immutable.
187662588SitojunFor example, if we are to create the following packet:
187762588Sitojun	IP AH1 AH2 AH3 payload
187862588Sitojunwe do it incrementally.  As a result, we get crypto checksums like below:
187962588Sitojun	AH3 has checksum against "IP AH3' payload".
188062588Sitojun		where AH3' = AH3 with checksum field filled with 0.
188162588Sitojun	AH2 has checksum against "IP AH2' AH3 payload".
	AH1 has checksum against "IP AH1' AH2 AH3 payload".
188362588SitojunAlso note that AH3 has the smallest sequence number, and AH1 has the largest
188462588Sitojunsequence number.
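
The inside-to-outside computation can be modelled with a short Python
sketch.  It is a toy, not the KAME code: each "AH" is reduced to its
checksum field, HMAC-SHA1 truncated to 96 bits stands in for whatever
algorithm the SA uses, and the IP header is an opaque byte string (no
mutable-field handling).  The function names and keys are hypothetical.

```python
import hashlib
import hmac

ICV_LEN = 12  # 96-bit crypto checksum


def add_ah(key, ip_header, inner):
    """Prepend one toy AH (only its checksum) in front of `inner`.

    The checksum covers the IP header, this AH with its checksum field
    zeroed (AHn'), and everything inside it, left untouched."""
    zeroed = bytes(ICV_LEN)
    icv = hmac.new(key, ip_header + zeroed + inner, hashlib.sha1).digest()
    return icv[:ICV_LEN] + inner


def build(ip_header, payload, keys):
    """keys are ordered outermost first: [key for AH1, AH2, AH3]."""
    chain = payload
    for key in reversed(keys):  # AH3 is computed first, AH1 last
        chain = add_ah(key, ip_header, chain)
    return chain


packet = build(b"IP", b"payload", [b"k1", b"k2", b"k3"])
ah1, ah2, ah3 = packet[0:12], packet[12:24], packet[24:36]
```

In this model AH3 never sees AH1 or AH2, while AH1 covers the already
computed AH2 and AH3 as ordinary immutable data.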
188557522Sshin
To avoid traffic analysis on shorter packets, ESP output logic supports
random length padding.  By setting net.inet.ipsec.esp_randpad (or
net.inet6.ipsec6.esp_randpad) to a positive value N, you can ask the kernel
to randomly pad packets shorter than N bytes, to a random length smaller than
or equal to N.  Note that N does not include ESP authentication data length.
Also note that the random padding is not included in TCP segment
size computation.  A negative value will turn off the functionality.
The recommended value for N is something like 128 or 256.  If you use too
big a number as N, you may experience inefficiency due to fragmented packets.
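
The rule described above can be summarized by the following Python sketch.
It is a simplified model, not the kernel logic: the function name is
hypothetical, the padded length is assumed to be drawn uniformly between
the original length and N, and cipher block alignment is ignored.

```python
import random


def esp_randpad_target(plen, randpad, rng=random):
    """Return the length a packet of plen bytes would be padded to.

    randpad plays the role of the esp_randpad sysctl value (N)."""
    if randpad <= 0 or plen >= randpad:
        # Feature turned off, or the packet is not "short": no extra padding.
        return plen
    # Short packet: pick a random length, at most N.
    return rng.randint(plen, randpad)


rng = random.Random(1)  # seeded only to make the demonstration repeatable
print([esp_randpad_target(40, 128, rng) for _ in range(5)])
```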
189578064Sume
189662588Sitojun4.4 IPComp handling
189762588Sitojun
IPComp stands for IP payload compression protocol.  It is aimed at
payload compression, not header compression like PPP VJ compression.
This may be useful when you are using a slow serial link (say, a cell phone)
with a powerful CPU (well, recent notebook PCs are really powerful...).
The protocol design of IPComp is very similar to IPsec, though it was
defined separately from IPsec itself.
190462588Sitojun
190562588SitojunHere are some points to be noted:
- IPComp is treated as part of the IPsec protocol suite, and the SPI and
  CPI spaces are unified.  The spec says that there is no relationship
  between the two, so they are assumed to be separate in the specs.
190962588Sitojun- IPComp association (IPCA) is kept in SAD.
191062588Sitojun- It is possible to use well-known CPI (CPI=2 for DEFLATE for example),
191162588Sitojun  for outbound/inbound packet, but for indexing purposes one element from
191262588Sitojun  SPI/CPI space will be occupied anyway.
- pfkey is modified to support IPComp.  However, there's no official
  SA type number assignment yet.  Portability with other IPComp
  stacks is questionable (anyway, who else implements IPComp on UN*X?).
191662588Sitojun- Spec says that IPComp output processing must be performed before AH/ESP
191762588Sitojun  output processing, to achieve better compression ratio and "stir" data
191862588Sitojun  stream before encryption.  The most meaningful processing order is:
191962588Sitojun  (1) compress payload by IPComp, (2) encrypt payload by ESP, then (3) attach
192062588Sitojun  authentication data by AH.
192162588Sitojun  However, with manual SPD setting, you are able to violate the ordering
192262588Sitojun  (KAME code is too generic, maybe).  Also, it is just okay to use IPComp
192362588Sitojun  alone, without AH/ESP.
192462588Sitojun- Though the packet size can be significantly decreased by using IPComp, no
192562588Sitojun  special consideration is made about path MTU (spec talks nothing about MTU
192662588Sitojun  consideration).  IPComp is designed for serial links, not ethernet-like
192762588Sitojun  medium, it seems.
192862588Sitojun- You can change compression ratio on outbound packet, by changing
192962588Sitojun  deflate_policy in sys/netinet6/ipcomp_core.c.  You can also change outbound
193062588Sitojun  history buffer size by changing deflate_window_out in the same source code.
193162588Sitojun  (should it be sysctl accessible, or per-SAD configurable?)
- Tunnel mode IPComp is not working right.  A KAME box can generate tunneled
  IPComp packets, but cannot accept tunneled IPComp packets.
193462588Sitojun- You can negotiate IPComp association with racoon IKE daemon.
- KAME code does not attach Adler32 checksum to compressed data.
  See the ipsec wg mailing list discussion in Jan 2000 for details.
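
Two of the points above (compress before encrypting, and compressed data
without an Adler32 checksum) can be illustrated in Python with the standard
zlib module.  This is only a sketch: it uses raw DEFLATE streams (negative
window bits, so no zlib header or Adler-32 trailer is produced), and it
uses pseudo-random bytes as a stand-in for ESP ciphertext.  The helper
names and the sample payload are made up.

```python
import random
import zlib


def raw_deflate(data, level=6):
    """Compress to a raw DEFLATE stream (no zlib header, no Adler-32)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def raw_inflate(data):
    """Inverse of raw_deflate()."""
    return zlib.decompress(data, -15)


plaintext = b"GET /index.html HTTP/1.0\r\n" * 40
rng = random.Random(0)
# Stand-in for encrypted data: bytes with no exploitable redundancy.
ciphertext_like = bytes(rng.getrandbits(8) for _ in range(len(plaintext)))

print(len(plaintext), len(raw_deflate(plaintext)))
print(len(ciphertext_like), len(raw_deflate(ciphertext_like)))
```

Redundant plaintext shrinks considerably, while the ciphertext-like input
does not shrink at all, which is why applying IPComp after ESP gains
nothing.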
193762588Sitojun
193862588Sitojun4.5 Conformance to RFCs and IDs
193962588Sitojun
194057522SshinThe IPsec code in the kernel conforms (or, tries to conform) to the
194157522Sshinfollowing standards:
194257522Sshin    "old IPsec" specification documented in rfc182[5-9].txt
194378064Sume    "new IPsec" specification documented in:
1944122115Sume	rfc240[1-6].txt rfc241[01].txt rfc2451.txt rfc3602.txt
194562588Sitojun    IPComp:
194662588Sitojun	RFC2393: IP Payload Compression Protocol (IPComp)
194778064SumeIKE specifications (rfc240[7-9].txt) are implemented in userland
194878064Sumeas "racoon" IKE daemon.
194957522Sshin
195057522SshinCurrently supported algorithms are:
195157522Sshin    old IPsec AH
195257522Sshin	null crypto checksum (no document, just for debugging)
195357522Sshin	keyed MD5 with 128bit crypto checksum (rfc1828.txt)
195457522Sshin	keyed SHA1 with 128bit crypto checksum (no document)
195557522Sshin	HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
195657522Sshin	HMAC SHA1 with 128bit crypto checksum (no document)
1957121021Sume	HMAC RIPEMD160 with 128bit crypto checksum (no document)
195857522Sshin    old IPsec ESP
195957522Sshin	null encryption (no document, similar to rfc2410.txt)
196057522Sshin	DES-CBC mode (rfc1829.txt)
196157522Sshin    new IPsec AH
196257522Sshin	null crypto checksum (no document, just for debugging)
196357522Sshin	keyed MD5 with 96bit crypto checksum (no document)
196457522Sshin	keyed SHA1 with 96bit crypto checksum (no document)
	HMAC MD5 with 96bit crypto checksum (rfc2403.txt)
196657522Sshin	HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
1967151539Ssuz	HMAC SHA2-256 with 96bit crypto checksum (draft-ietf-ipsec-ciph-sha-256-00.txt)
196878064Sume	HMAC SHA2-384 with 96bit crypto checksum (no document)
196978064Sume	HMAC SHA2-512 with 96bit crypto checksum (no document)
1970121021Sume	HMAC RIPEMD160 with 96bit crypto checksum (RFC2857)
1971121071Sume	AES XCBC MAC with 96bit crypto checksum (RFC3566)
197257522Sshin    new IPsec ESP
197357522Sshin	null encryption (rfc2410.txt)
197457522Sshin	DES-CBC with derived IV
197557522Sshin		(draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
197657522Sshin	DES-CBC with explicit IV (rfc2405.txt)
197757522Sshin	3DES-CBC with explicit IV (rfc2451.txt)
197857522Sshin	BLOWFISH CBC (rfc2451.txt)
197957522Sshin	CAST128 CBC (rfc2451.txt)
1980121071Sume	RIJNDAEL/AES CBC (rfc3602.txt)
1981151539Ssuz	AES counter mode (rfc3686.txt)
1982121071Sume
1983151539Ssuz	each of the above can be combined with new IPsec AH schemes for
1984151539Ssuz	ESP authentication.
198562588Sitojun    IPComp
198662588Sitojun	RFC2394: IP Payload Compression Using DEFLATE
198757522Sshin
198857522SshinThe following algorithms are NOT supported:
198957522Sshin    old IPsec AH
199057522Sshin	HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
199157522Sshin		(rfc2085.txt)
199257522Sshin	keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)
199357522Sshin
199462588SitojunThe key/policy management API is based on the following document, with fair
199562588Sitojunamount of extensions:
199662588Sitojun	RFC2367: PF_KEY key management API
199757522Sshin
199862588Sitojun4.6 ECN consideration on IPsec tunnels
199957522Sshin
200057522SshinKAME IPsec implements ECN-friendly IPsec tunnel, described in
200162588Sitojundraft-ietf-ipsec-ecn-02.txt.
200257522SshinNormal IPsec tunnel is described in RFC2401.  On encapsulation,
200357522SshinIPv4 TOS field (or, IPv6 traffic class field) will be copied from inner
200457522SshinIP header to outer IP header.  On decapsulation outer IP header
200557522Sshinwill be simply dropped.  The decapsulation rule is not compatible
200657522Sshinwith ECN, since ECN bit on the outer IP TOS/traffic class field will be
200757522Sshinlost.
200857522SshinTo make IPsec tunnel ECN-friendly, we should modify encapsulation
200957522Sshinand decapsulation procedure.  This is described in
201062588Sitojundraft-ietf-ipsec-ecn-02.txt, chapter 3.3.
201157522Sshin
201257522SshinKAME IPsec tunnel implementation can give you three behaviors, by setting
201357522Sshinnet.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to some value:
201457522Sshin- RFC2401: no consideration for ECN (sysctl value -1)
201557522Sshin- ECN forbidden (sysctl value 0)
201657522Sshin- ECN allowed (sysctl value 1)
201757522SshinNote that the behavior is configurable in per-node manner, not per-SA manner
201862588Sitojun(draft-ietf-ipsec-ecn-02 wants per-SA configuration, but it looks too much
201962588Sitojunfor me).
202057522Sshin
202157522SshinThe behavior is summarized as follows (see source code for more detail):
202257522Sshin
202357522Sshin		encapsulate			decapsulate
202457522Sshin		---				---
202557522SshinRFC2401		copy all TOS bits		drop TOS bits on outer
202657522Sshin		from inner to outer.		(use inner TOS bits as is)
202757522Sshin
202857522SshinECN forbidden	copy TOS bits except for ECN	drop TOS bits on outer
202957522Sshin		(masked with 0xfc) from inner	(use inner TOS bits as is)
203057522Sshin		to outer.  set ECN bits to 0.
203157522Sshin
203257522SshinECN allowed	copy TOS bits except for ECN	use inner TOS bits with some
203357522Sshin		CE (masked with 0xfe) from	change.  if outer ECN CE bit
203457522Sshin		inner to outer.			is 1, enable ECN CE bit on
203557522Sshin		set ECN CE bit to 0.		the inner.
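
The table above can be restated as a small Python model.  It is a sketch of
the table only, not the kernel source: the function and constant names are
hypothetical, and the ECN CE bit is assumed to be the lowest bit of the
TOS/traffic class byte (0x01), consistent with the 0xfe and 0xfc masks
given in the table.

```python
# Values of net.inet.ipsec.ecn / net.inet6.ipsec6.ecn as described above.
ECN_COMPAT = -1     # "RFC2401": no consideration for ECN
ECN_FORBIDDEN = 0
ECN_ALLOWED = 1

ECN_CE = 0x01       # assumed position of the CE bit


def encapsulate_tos(mode, inner_tos):
    """Return the outer TOS/traffic class byte for a given inner byte."""
    if mode == ECN_ALLOWED:
        return inner_tos & 0xfe      # copy everything except CE
    if mode == ECN_FORBIDDEN:
        return inner_tos & 0xfc      # copy everything except the ECN bits
    return inner_tos & 0xff          # RFC2401: copy all TOS bits


def decapsulate_tos(mode, outer_tos, inner_tos):
    """Return the inner TOS/traffic class byte after decapsulation."""
    if mode == ECN_ALLOWED and (outer_tos & ECN_CE):
        return (inner_tos | ECN_CE) & 0xff   # propagate congestion mark
    return inner_tos & 0xff                  # outer TOS bits are dropped


print(hex(encapsulate_tos(ECN_ALLOWED, 0xff)))
print(hex(decapsulate_tos(ECN_ALLOWED, 0x03, 0x02)))
```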
203657522Sshin
203757522SshinGeneral strategy for configuration is as follows:
- if both IPsec tunnel endpoints are capable of ECN-friendly behavior,
  you'd better configure both ends to "ECN allowed" (sysctl value 1).
204057522Sshin- if the other end is very strict about TOS bit, use "RFC2401"
204157522Sshin  (sysctl value -1).
204257522Sshin- in other cases, use "ECN forbidden" (sysctl value 0).
204357522SshinThe default behavior is "ECN forbidden" (sysctl value 0).
204457522Sshin
204557522SshinFor more information, please refer to:
204662588Sitojun	draft-ietf-ipsec-ecn-02.txt
204757522Sshin	RFC2481 (Explicit Congestion Notification)
204857522Sshin	KAME sys/netinet6/{ah,esp}_input.c
204957522Sshin
(Thanks go to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis)
205157522Sshin
205262588Sitojun4.7 Interoperability
205357522Sshin
IPsec, IPComp (in kernel) and IKE (in userland as "racoon") have been tested
at several interoperability test events, and they are known to interoperate
well with many other implementations.  Also, KAME IPsec has quite wide
coverage of the IPsec crypto algorithms documented in RFCs (we do not cover
algorithms with intellectual property issues, though).
205962588Sitojun
Here are (some of) the platforms we have tested IPsec/IKE interoperability
with in the past, in no particular order.  Note that both ends (KAME and
others) may have modified their implementations, so use the following
list just for reference purposes.
2064151539Ssuz	6WIND, ACC, Allied-telesis, Altiga, Ashley-laurent (vpcom.com),
	BlueSteel, CISCO IOS, Checkpoint FW-1, Compaq Tru64 UNIX
2066151539Ssuz	X5.1B-BL4, Cryptek, Data Fellows (F-Secure), Ericsson,
2067151539Ssuz	F-Secure VPN+ 5.40, Fitec, Fitel, FreeS/WAN, HITACHI, HiFn,
2068151539Ssuz	IBM AIX 5.1, III, IIJ (fujie stack), Intel Canada, Intel
2069151539Ssuz	Packet Protect, MEW NetCocoon, MGCS, Microsoft WinNT/2000/XP,
2070151539Ssuz	NAI PGPnet, NEC IX5000, NIST (linux IPsec + plutoplus),
2071151539Ssuz	NetLock, Netoctave, Netopia, Netscreen, Nokia EPOC, Nortel
2072151539Ssuz	GatewayController/CallServer 2000 (not released yet),
2073151539Ssuz	NxNetworks, OpenBSD isakmpd on OpenBSD, Oullim information
2074151539Ssuz	technologies SECUREWORKS VPN gateway 3.0, Pivotal, RSA,
2075151539Ssuz	Radguard, RapidStream, RedCreek, Routerware, SSH, SecGo
2076151539Ssuz	CryptoIP v3, Secure Computing, Soliton, Sun Solaris 8,
	TIS/NAI Gauntlet, Toshiba, Trilogy AdmitOne 2.6, Trustworks
2078151539Ssuz	TrustedClient v3.2, USAGI linux, VPNet, Yamaha RT series,
2079151539Ssuz	ZyXEL
208057522Sshin
208162588SitojunHere are (some of) platforms we have tested IPComp/IKE interoperability
208262588Sitojunin the past, in no particular order.
2083151539Ssuz	Compaq, IRE, SSH, NetLock, FreeS/WAN, F-Secure VPN+ 5.40
208457522Sshin
VPNC (vpnc.org) provides IPsec conformance tests, using KAME and OpenBSD
IPsec/IKE implementations.  Their test results are available at
http://www.vpnc.org/conformance.html, and they may give you a better idea
of which implementations interoperate with the KAME IPsec/IKE implementation.
208978064Sume
2090122115Sume4.8 Operations with IPsec tunnel mode
2091122115Sume
First of all, IPsec tunnel is a very hairy thing.  It seems to do neat things
like VPN configuration or secure remote access; however, it comes with lots
of architectural twists.
2095122115Sume
RFC2401 defines IPsec tunnel mode, within the context of IPsec.  RFC2401
defines tunnel mode packet encapsulation/decapsulation on its own, and
does not refer to other tunnelling specifications.  Since RFC2401 advocates
filter-based SPD database matches, it would be natural for us to implement
IPsec tunnel mode as filters - not as pseudo interfaces.
2101122115Sume
2102122115SumeThere are some people who are trying to separate IPsec "tunnel mode" from
2103122115Sumethe IPsec itself.  They would like to implement IPsec transport mode only,
2104122115Sumeand combine it with tunneling pseudo devices.  The prime example is found
2105122115Sumein draft-touch-ipsec-vpn-01.txt.  However, if you really define pseudo
2106122115Sumeinterfaces separately from IPsec, IKE daemons would need to negotiate
2107122115Sumetransport mode SAs, instead of tunnel mode SAs.  Therefore, we cannot
2108122115Sumereally mix RFC2401-based interpretation and draft-touch-ipsec-vpn-01.txt
2109122115Sumeinterpretation.
2110122115Sume
The KAME stack can be configured in two ways.  You may need
to recompile your kernel to switch the behavior.
- RFC2401 IPsec tunnel mode approach (4.8.1)
- draft-touch-ipsec-vpn approach (4.8.2)
	Works in all kernel configurations, but racoon(8) may not interoperate.
2116122115Sume
2117122115SumeThere are pros and cons on these approaches:
2118122115Sume
2119122115SumeRFC2401 IPsec tunnel mode (filter-like) approach
2120122115Sume	PRO: SPD lookup fits nicely with packet filters (if you integrate them)
2121122115Sume	CON: cannot run routing daemons across IPsec tunnels
2122122115Sume	CON: it is very hard to control source address selection on originating
2123122115Sume		cases
2124122115Sume	???: IPv6 scope zone is kept the same
draft-touch-ipsec-vpn (transport mode + pseudo-interface) approach
2126122115Sume	PRO: run routing daemons across IPsec tunnels
2127122115Sume	PRO: source address selection can be done normally, by looking at
2128122115Sume		IPsec tunnel pseudo devices
2129122115Sume	CON: on outbound, possibility of infinite loops if routing setup
2130122115Sume		is wrong
2131122115Sume	CON: due to differences in encap/decap logic from RFC2401, it may not
2132122115Sume		interoperate with very picky RFC2401 implementations
2133122115Sume		(those who check TOS bits, for example)
2134122115Sume	CON: cannot negotiate IKE with other IPsec tunnel-mode devices
2135122115Sume		(the other end has to implement 
2136122115Sume	???: IPv6 scope zone is likely to be different from the real ethernet
2137122115Sume		interface
2138122115Sume
The recommendation is different depending on the situation you have:
- use draft-touch-ipsec-vpn if you have control over the other end.
  This one is the best in terms of simplicity.
- if the other end is a normal IPsec device with an RFC2401 implementation,
  you need to use RFC2401, otherwise you won't be able to run IKE.
- use the RFC2401 approach if you just want to forward packets back and forth
  and there's no plan to use the IPsec gateway itself as an originating device.
2146122115Sume
2147122115Sume4.8.1 RFC2401 IPsec tunnel mode approach
2148122115Sume
2149122115SumeTo configure your device as RFC2401 IPsec tunnel mode endpoint, you will
2150122115Sumeuse "tunnel" keyword in setkey(8) "spdadd" directives.  Let us assume the
2151122115Sumefollowing topology (A and B could be a network, like prefix/length):
2152122115Sume
2153122115Sume	((((((((((((The internet))))))))))))
2154122115Sume	  |			  |
2155122115Sume	  |C (global)		  |D
2156122115Sume	your device		peer's device
2157122115Sume	  |A (private)		  |B
2158122115Sume	==+===== VPN net	==+===== VPN net
2159122115Sume
2160122115SumeThe policy configuration directive is like this.  You will need manual
2161122115SumeSAs, or IKE daemon, for actual encryption:
2162122115Sume
2163122115Sume	# setkey -c <<EOF
2164122115Sume	spdadd A B any -P out ipsec esp/tunnel/C-D/use;
2165122115Sume	spdadd B A any -P in ipsec esp/tunnel/D-C/use;
	EOF
2167122115Sume
2168122115SumeThe inbound/outbound traffic is monitored/captured by SPD engine, which works
2169122115Sumejust like packet filters.
2170122115Sume
With this, the forwarding case should work flawlessly.  However, troubles
arise when you have one of the following requirements:
2173122115Sume- When you originate traffic from your VPN gateway device to VPN net on the
2174122115Sume  other end (like B), you want your source address to be A (private side)
2175122115Sume  so that the traffic would be protected by the policy.
2176122115Sume  With this approach, however, the source address selection logic follows
2177122115Sume  normal routing table, and C (global side) will be picked for any outgoing
2178122115Sume  traffic, even if the destination is B.  The resulting packet will be like
2179122115Sume  this:
2180122115Sume	IP[C -> B] payload
2181122115Sume  and will not match the policy (= sent in clear).
2182122115Sume- When you want to run routing protocols on top of the IPsec tunnel, it is
2183122115Sume  not possible.  As there is no pseudo device that identifies the IPsec tunnel,
2184122115Sume  you cannot identify where the routing information came from.  As a result,
2185122115Sume  you can't run routing daemons.
2186122115Sume
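To see why a gateway-originated packet escapes the policy, the selector check
can be modeled as a tiny packet filter.  This is only an illustrative sketch:
the names spd_entry, spd_match, IP4 and policy_out, and the example addresses
standing in for A, B and C, are assumptions made for this note and are not the
kernel's actual SPD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy selector: an IPv4 prefix pair only. */
struct spd_entry {
	uint32_t src, src_mask;
	uint32_t dst, dst_mask;
};

/* Build a host-order IPv4 address from its four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Return 1 if the packet's addresses fall inside the selector, else 0. */
int
spd_match(const struct spd_entry *sp, uint32_t src, uint32_t dst)
{
	assert(sp != NULL);
	return ((src & sp->src_mask) == (sp->src & sp->src_mask) &&
	    (dst & sp->dst_mask) == (sp->dst & sp->dst_mask));
}

/*
 * Outbound policy "spdadd A B any -P out ...", with assumed example
 * values A = 10.1.0.0/16 and B = 10.2.0.0/16.
 */
const struct spd_entry policy_out = {
	IP4(10, 1, 0, 0), IP4(255, 255, 0, 0),
	IP4(10, 2, 0, 0), IP4(255, 255, 0, 0)
};
```

With these assumed values, spd_match(&policy_out, IP4(10, 1, 0, 5),
IP4(10, 2, 0, 9)) returns 1, so forwarded A -> B traffic is tunneled.  A
packet originated by the gateway with a global source such as 192.0.2.1
(standing in for C) returns 0, which is the IP[C -> B] case described above.
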
4.8.2 draft-touch-ipsec-vpn approach

With this approach, you will configure gif(4) tunnel interfaces, as well as
IPsec transport mode SAs.

	# gifconfig gif0 C D
	# ifconfig gif0 A B
	# setkey -c <<EOF
	spdadd C D any -P out ipsec esp/transport//use;
	spdadd D C any -P in ipsec esp/transport//use;
	EOF

Since we have a pseudo-interface "gif0", and it affects the routes and
the source address selection logic, we can have source address A for
packets originated by the VPN gateway to B (and the VPN cloud).
We can also exchange routing information over the tunnel (gif0), as the tunnel
is represented as a pseudo interface (dynamic routes point to the
pseudo interface).

There is a big drawback, however: with this, you can use IKE if and only if
the other end is using the draft-touch-ipsec-vpn approach too.  Since
racoon(8) grabs phase 2 IKE proposals from the kernel SPD database, you will
be negotiating IPsec transport-mode SAs with the other end, not tunnel-mode
SAs.  Also, since the encapsulation mechanism is different from RFC2401, you
may not be able to interoperate with picky RFC2401 implementations: if the
other end checks certain outer IP header fields (like TOS), interoperation
will fail.


5. ALTQ

The KAME kit includes ALTQ, which supports FreeBSD3, FreeBSD4, FreeBSD5,
and NetBSD.  OpenBSD has ALTQ merged into pf, and its ALTQ code is not
compatible with other platforms, so KAME's ALTQ is not used for
OpenBSD.  For BSD/OS, ALTQ does not work.
ALTQ in KAME supports IPv6.
(Actually, ALTQ has been developed in the KAME repository since ALTQ 2.1,
Jan 2000.)

ALTQ occupies a single character device number.  For FreeBSD, it is officially
allocated.  For OpenBSD and NetBSD, we use a number which is not
currently allocated (it will eventually get an official number).
The character device is enabled for the i386 architecture only.  To enable and
compile an ALTQ-ready kernel for other architectures, take the following steps:
- assume that your architecture is FOOBAA.
- modify sys/arch/FOOBAA/FOOBAA/conf.c (or wherever cdevsw is defined)
  to include a line for ALTQ.  Look at sys/arch/i386/i386/conf.c for an
  example.  The major number must be the same as in the i386 case.
- copy a kernel configuration file (like ALTQ.v6 or GENERIC.v6) from i386,
  and modify it accordingly.
- build a kernel.
- before building userland, change netbsd/{lib,usr.sbin,usr.bin}/Makefile
  (or openbsd/foobaa) so that it will visit altq-related subdirectories.


6. Mobile IPv6

6.1 KAME node as correspondent node

The default installation recognizes the home address option (in the
destination options header).  No sub-options are supported.  Interaction
with IPsec, and/or the 2292bis API, needs further study.

6.2 KAME node as home agent/mobile node

The KAME kit includes the Ericsson mobile-ip6 code.  The integration has just
started (in Feb 2000), and we will need some more time to integrate it better.

See kame/mip6config/{QUICKSTART,README_MIP6.txt} for more details.

The Ericsson code implements revision 09 of the mobile-ip6 draft.  There
are other implementations available:
	NEC: http://www.6bone.nec.co.jp/mipv6/internal-dist/ (-13 draft)
	SFC: http://neo.sfc.wide.ad.jp/~mip6/ (-13 draft)

7. Coding style

The KAME developers basically do not fuss about coding
style.  However, there is still some agreement on the style, in order
to make the distributed development smooth.

- follow *BSD KNF where possible.  Note: there are multiple KNF standards.
- the tab character should be 8 columns wide (tabstops are at columns 8, 16,
  24, ...).  With vi, use ":set ts=8 sw=8".
  With GNU Emacs 20 and later, the easiest way is to use the "bsd" style of
  cc-mode with the variable "c-basic-offset" set to 8:
  (add-hook 'c-mode-common-hook
	    (function
	     (lambda ()
	       (c-set-style "bsd")
	       (setq c-basic-offset 8)  ; XXX for Emacs 20 only
	       )))
  The "bsd" style in GNU Emacs 21 sets the variable to 8 by default,
  so the line marked by "XXX" is not necessary if you only use GNU
  Emacs 21.
- each line should be within 80 characters.
- keep a single open/close bracket in a comment such as in the following
  line:
	putchar('(');	/* ) */
  Without this, some vi users would have a hard time matching a pair of
  brackets.  Although this type of bracket seems clumsy and is even
  harmful for some other types of vi users and Emacs users, the
  agreement among the KAME developers is to allow it.
- add the following line to the head of every KAME-derived file:
  /*	(dollar)KAME(dollar)	*/
  where "(dollar)" is the dollar character ($), and around "$" are tabs.
  (This is for C.  For other languages, you should use their own comment
  syntax.)
  Once committed to the CVS repository, this line will contain its
  version number (see, for example, the top of this file).  This
  makes it easy to report a bug.
- when creating a new file with the WIDE copyright, type "make copyright.c" at
  the top-level, and use copyright.c as a template.  The KAME RCS tag will be
  included automatically.
- when editing a third-party package, keep its own coding style as
  much as possible, even if the style does not follow the items above.
- it is recommended to always wrap an expression containing
  bitwise operators in parentheses, especially when the expression is
  combined with relational operators, in order to avoid an unintended
  grouping of operators.  Thus, we should write
	if ((a & b) == 0)	/* (A) */
  or
	if (a & (b == 0))	/* (B) */
  instead of
	if (a & b == 0)		/* (C) */
  even if the programmer's intention was (C), which is equivalent to
  (B) according to the grammar of the C language.
  Accordingly, code that tests whether a bit flag is set for a
  given variable should be written as follows:
	if ((flag & FLAG_A) == 0)	/* (D) the FLAG_A is NOT set */
	if ((flag & FLAG_A) != 0)	/* (E) the FLAG_A is set */
  Some developers in the KAME project prefer the following style:
	if (!(flag & FLAG_A))	/* (F) the FLAG_A is NOT set */
	if ((flag & FLAG_A))	/* (G) the FLAG_A is set */
  because it is more intuitive in terms of the relationship
  between the negation operator (!) and the semantics of the
  condition.  The KAME developers have discussed the style, and have
  agreed that all the styles from (D) to (G) are valid.  So, when you
  see styles like (D) and (E) in the KAME code and they look a bit strange,
  please just keep them.  They are intentional.
- When inserting a separate block just to define some intra-block
  variables, add the level of indentation as if the block was in a
  control statement such as if-else, for, or while.  For example,
	foo ()
	{
		int a;

		{
			int internal_a;
			...
		}
	}
  should be used, instead of
	foo ()
	{
		int a;

	    {
		int internal_a;
		...
	     }
	}
- Do not use printf() or log() in the packet input path of the kernel code.
  They can make the system vulnerable to packet flooding attacks (resulting
  in /var overflow).
- (not a style issue)
  To disable a module that was mistakenly imported (by CVS), just
  remove the source tree in the repository.  Note, however, that the
  removal might annoy other developers who have already checked the
  module out, so you should announce the removal as soon as possible.
  Also, be 100% sure not to remove other modules.

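For the bitwise-operator rule in the list above, the following sketch shows
what the unparenthesized form (C) really computes, and that styles (D) to (G)
agree with each other.  The flag values and the function names
(flag_a_clear, flag_a_clear_form_c, check_styles_agree) are hypothetical and
exist only for this illustration.

```c
#include <assert.h>

#define FLAG_A	0x01	/* hypothetical flag bits, for illustration only */
#define FLAG_B	0x02

/* Form (A)/(D): returns 1 when FLAG_A is NOT set in flag, else 0. */
int
flag_a_clear(unsigned int flag)
{
	return ((flag & FLAG_A) == 0);
}

/*
 * What form (C), "flag & FLAG_A == 0", actually computes.  "==" binds
 * tighter than "&", so it is written here as the equivalent form (B).
 * FLAG_A is nonzero, so (FLAG_A == 0) is 0 and the result is always 0.
 */
int
flag_a_clear_form_c(unsigned int flag)
{
	return (flag & (FLAG_A == 0));
}

/* Styles (D)/(E) and (F)/(G) must give the same answer for any flag. */
void
check_styles_agree(unsigned int flag)
{
	assert(((flag & FLAG_A) == 0) == !(flag & FLAG_A));	/* (D), (F) */
	assert(((flag & FLAG_A) != 0) == !!(flag & FLAG_A));	/* (E), (G) */
}
```

With flag set to FLAG_B only, flag_a_clear() returns 1 while the form (C)
variant returns 0, so a test written without the parentheses silently never
fires.
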
When you want to contribute something to the KAME project, and if *you
do not mind* the agreement, it would be helpful for the project if you
kept these rules.  Note, however, that we never intend to force
you to adopt our rules.  We would rather respect your own style,
especially when you have a policy about the style.


8. Policy on technology with intellectual property right restriction

There are quite a few IETF documents/whatever which have intellectual property
right (IPR) restrictions.  KAME's stance is stated below.

    The goal of KAME is to provide freely redistributable, BSD-licensed,
    implementation of Internet protocol technologies.
    For this purpose, we implement protocols that (1) do not need license
    contract with IPR holder, and (2) are royalty-free.
    The reason for (1) is, even if KAME contracts with the IPR holder in
    question, the users of KAME stack (usually implementers of some other
    codebase) would need to make a license contract with the IPR holder.
    It would damage the "freely redistributable" status of KAME codebase.

    By doing so KAME is (implicitly) trying to advocate no-license-contract,
    royalty-free, release of IPRs.

Note, however, that as documented in README, we do not guarantee that KAME
code is free of IPR infringement.  You MUST check this if you are to
integrate KAME into your product (or whatever):
    READ CAREFULLY: Several countries have legal enforcement for
    export/import/use of cryptographic software.  Check it before playing
    with the kit.  We do not intend to be your legalease clearing house
    (NO WARRANTY).  If you intend to include KAME stack into your product,
    you'll need to check if the licenses on each file fit your situations,
    and/or possible intellectual property right issues.

						 <end of IMPLEMENTATION>