	Implementation Note

	KAME Project
	http://www.kame.net/
	$KAME: IMPLEMENTATION,v 1.216 2001/05/25 07:43:01 jinmei Exp $
	$FreeBSD: head/share/doc/IPv6/IMPLEMENTATION 148394 2005-07-25 16:26:47Z ume $

NOTE: This document tries to describe the behaviors and implementation
choices of the latest KAME/*BSD stack (like KAME/NetBSD 1.5.1).  The
description here may not be applicable to KAME-integrated *BSD releases
(like stock NetBSD 1.5.1), as there are a certain number of differences
between them.  Still, some of the content can be useful for
KAME-integrated *BSD releases.

Table of Contents

	1. IPv6
	1.1 Conformance
	1.2 Neighbor Discovery
	1.3 Scope Zone Index
	1.3.1 Kernel internal
	1.3.2 Interaction with API
	1.3.3 Interaction with users (command line)
	1.4 Plug and Play
	1.4.1 Assignment of link-local, and special addresses
	1.4.2 Stateless address autoconfiguration on hosts
	1.4.3 DHCPv6
	1.5 Generic tunnel interface
	1.6 Address Selection
	1.6.1 Source Address Selection
	1.6.2 Destination Address Ordering
	1.7 Jumbo Payload
	1.8 Loop prevention in header processing
	1.9 ICMPv6
	1.10 Applications
	1.11 Kernel Internals
	1.12 IPv4 mapped address and IPv6 wildcard socket
	1.12.1 KAME/BSDI3 and KAME/FreeBSD228
	1.12.2 KAME/FreeBSD[34]x
	1.12.2.1 KAME/FreeBSD[34]x, listening side
	1.12.2.2 KAME/FreeBSD[34]x, initiating side
	1.12.3 KAME/NetBSD
	1.12.3.1 KAME/NetBSD, listening side
	1.12.3.2 KAME/NetBSD, initiating side
	1.12.4 KAME/BSDI4
	1.12.4.1 KAME/BSDI4, listening side
	1.12.4.2 KAME/BSDI4, initiating side
	1.12.5 KAME/OpenBSD
	1.12.5.1 KAME/OpenBSD, listening side
	1.12.5.2 KAME/OpenBSD, initiating side
	1.12.6 More issues
	1.12.7 Interaction with SIIT translator
	1.13 sockaddr_storage
	1.14 Invalid addresses on the wire
	1.15 Node's required addresses
	1.15.1 Host case
	1.15.2 Router case
	1.16 Advanced API
	1.17 DNS resolver
	2. Network Drivers
	2.1 FreeBSD 2.2.x-RELEASE
	2.2 BSD/OS 3.x
	2.3 NetBSD
	2.4 FreeBSD 3.x-RELEASE
	2.5 FreeBSD 4.x-RELEASE
	2.6 OpenBSD 2.x
	2.7 BSD/OS 4.x
	3. Translator
	3.1 FAITH TCP relay translator
	3.2 IPv6-to-IPv4 header translator
	4. IPsec
	4.1 Policy Management
	4.2 Key Management
	4.3 AH and ESP handling
	4.4 IPComp handling
	4.5 Conformance to RFCs and IDs
	4.6 ECN consideration on IPsec tunnels
	4.7 Interoperability
	4.8 Operations with IPsec tunnel mode
	4.8.1 RFC2401 IPsec tunnel mode approach
	4.8.2 draft-touch-ipsec-vpn approach
	5. ALTQ
	6. Mobile IPv6
	6.1 KAME node as correspondent node
	6.2 KAME node as home agent/mobile node
	6.3 Old Mobile IPv6 code
	7. Routing table extensions
	7.1 ART routing table lookup algorithm
	7.2 Multipath routing support
	8. Coding style
	9. Policy on technology with intellectual property right restriction

1. IPv6

1.1 Conformance

The KAME kit conforms, or tries to conform, to the latest set of IPv6
specifications.  For future reference we list some of the relevant documents
below (NOTE: this is not a complete list; a complete one would be too hard
to maintain).  For details, please refer to the specific chapters in this
document, the RFCs, the manpages that come with KAME, or the comments in the
source code.

Conformance tests have been performed on past and latest KAME STABLE kits
by the TAHI project.  Results can be viewed at http://www.tahi.org/report/KAME/.
In the past, we also took part in the University of New Hampshire IOL tests
(http://www.iol.unh.edu/) with our snapshots of that time.

RFC1639: FTP Operation Over Big Address Records (FOOBAR)
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then RFC1639 if that fails.
RFC1886: DNS Extensions to support IPv6
RFC1933: (see RFC2893)
RFC1981: Path MTU Discovery for IPv6
RFC2080: RIPng for IPv6
    * KAME-supplied route6d, bgpd and hroute6d support this.
RFC2283: Multiprotocol Extensions for BGP-4
    * so-called "BGP4+".
    * KAME-supplied bgpd supports this.
RFC2292: Advanced Sockets API for IPv6
    * see RFC3542
RFC2362: Protocol Independent Multicast-Sparse Mode (PIM-SM)
    * RFC2362 defines the packet formats and the protocol of PIM-SM.
RFC2373: IPv6 Addressing Architecture
    * KAME supports node required addresses, and conforms to the scope
      requirement.
RFC2374: An IPv6 Aggregatable Global Unicast Address Format
    * KAME supports 64-bit length of Interface ID.
RFC2375: IPv6 Multicast Address Assignments
    * Userland applications use the well-known addresses assigned in the RFC.
RFC2428: FTP Extensions for IPv6 and NATs
    * RFC2428 is preferred over RFC1639.  ftp clients will first try RFC2428,
      then RFC1639 if that fails.
RFC2460: IPv6 specification
RFC2461: Neighbor discovery for IPv6
    * See 1.2 in this document for details.
RFC2462: IPv6 Stateless Address Autoconfiguration
    * See 1.4 in this document for details.
RFC2463: ICMPv6 for IPv6 specification
    * See 1.9 in this document for details.
RFC2464: Transmission of IPv6 Packets over Ethernet Networks
RFC2465: MIB for IPv6: Textual Conventions and General Group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2466: MIB for IPv6: ICMPv6 group
    * Necessary statistics are gathered by the kernel.  Actual IPv6 MIB
      support is provided as a patchkit for ucd-snmp.
RFC2467: Transmission of IPv6 Packets over FDDI Networks
RFC2472: IPv6 over PPP
RFC2492: IPv6 over ATM Networks
    * only PVC is supported.
RFC2497: Transmission of IPv6 packet over ARCnet Networks
RFC2545: Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
RFC2553: (see RFC3493)
RFC2671: Extension Mechanisms for DNS (EDNS0)
    * see USAGE for how to use it.
    * not supported on kame/freebsd4 and kame/bsdi4.
RFC2673: Binary Labels in the Domain Name System
    * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
    * KAME apps/bind8 repository has resolver library with partial A6, DNAME
      and binary label support.
RFC2675: IPv6 Jumbograms
    * See 1.7 in this document for details.
RFC2710: Multicast Listener Discovery for IPv6
RFC2711: IPv6 router alert option
RFC2732: Format for Literal IPv6 Addresses in URL's
    * The spec is implemented in programs that handle URLs
      (like freebsd ftpio(3) and fetch(1), or netbsd ftp(1)).
RFC2874: DNS Extensions to Support IPv6 Address Aggregation and Renumbering
    * KAME/bsdi4 supports A6, DNAME and binary label to some extent.
    * KAME apps/bind8 repository has resolver library with partial A6, DNAME
      and binary label support.
RFC2893: Transition Mechanisms for IPv6 Hosts and Routers
    * IPv4 compatible address is not supported.
    * automatic tunneling (4.3) is not supported.
    * "gif" interface implements IPv[46]-over-IPv[46] tunnel in a generic way,
      and it covers "configured tunnel" described in the spec.
      See 1.5 in this document for details.
RFC2894: Router renumbering for IPv6
RFC3041: Privacy Extensions for Stateless Address Autoconfiguration in IPv6
RFC3056: Connection of IPv6 Domains via IPv4 Clouds
    * So-called "6to4".
    * "stf" interface implements it.  Be sure to read
      draft-itojun-ipv6-transition-abuse-01.txt below before configuring
      it; there can be security issues.
RFC3142: An IPv6-to-IPv4 transport relay translator
    * FAITH tcp relay translator (faithd) implements this.  See 3.1 for more
      details.
RFC3152: Delegation of IP6.ARPA
    * libinet6 resolvers contained in the KAME snaps support using
      the ip6.arpa domain (with the nibble format) for IPv6 reverse
      lookups.
RFC3484: Default Address Selection for IPv6
    * the selection algorithm for both source and destination addresses
      is implemented based on the RFC, though some rules are still omitted.
RFC3493: Basic Socket Interface Extensions for IPv6
    * IPv4 mapped address (3.7) and special behavior of IPv6 wildcard bind
      socket (3.8) are:
	- supported and turned on by default on KAME/FreeBSD[34]
	  and KAME/BSDI4,
	- supported but turned off by default on KAME/NetBSD and KAME/FreeBSD5,
	- not supported on KAME/FreeBSD228, KAME/OpenBSD and KAME/BSDI3.
      See 1.12 in this document for details.
    * The AI_ALL and AI_V4MAPPED flags are not supported.
RFC3542: Advanced Sockets API for IPv6 (revised)
    * For supported library functions/kernel APIs, see sys/netinet6/ADVAPI.
    * Some of the updates in the draft are not implemented yet.  See
      TODO.2292bis for more details.
draft-ietf-ipngwg-icmp-name-lookups-09: IPv6 Name Lookups Through ICMP
draft-ietf-ngtrans-tcpudp-relay-04.txt:
	An IPv6-to-IPv4 transport relay translator
    * FAITH tcp relay translator (faithd) implements this.  See 3.1 for more
      details.
draft-ietf-ipngwg-router-selection-01.txt:
	Default Router Preferences and More-Specific Routes
    * router-side only.
draft-ietf-ipngwg-scoping-arch-02.txt:
	The architecture, text representation, and usage of IPv6
	scoped addresses.
    * some parts of the document (especially the routing model) are
      not supported yet.
draft-ietf-pim-sm-v2-new-02.txt:
	A revised version of RFC2362, which includes the IPv6 specific
	packet format and protocol descriptions.
draft-ietf-dnsext-mdns-00.txt: Multicast DNS
    * kame/mdnsd has a test implementation, which is not built in the
      default compilation.  The draft will undergo a major change in the
      near future, so don't rely upon it.
draft-itojun-ipv6-tcp-to-anycast-01.txt:
	Disconnecting TCP connection toward IPv6 anycast address
draft-itojun-ipv6-transition-abuse-01.txt:
	Possible abuse against IPv6 transition technologies (expired)
    * KAME does not implement RFC1933/2893 automatic tunnel.
    * "stf" interface implements some address filters.  Refer to stf(4)
      for details.  Since there's no way to make 6to4 interface 100% secure,
      we do not include "stf" interface into GENERIC.v6 compilation.
    * kame/openbsd completely disables IPv4 mapped address support.
    * kame/netbsd makes IPv4 mapped address support off by default.
    * See sections 1.12.6 and 1.14 for more details.
draft-itojun-ipv6-flowlabel-api-01.txt: Socket API for IPv6 flow label field
    * no consideration is made against the use of routing headers and such.
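
To illustrate the nibble format mentioned in the RFC3152 entry above, the
following C sketch builds the ip6.arpa reverse-lookup name for an address.
The function ip6_arpa_name() and the IP6_ARPA_NAMELEN constant are
hypothetical and not part of libinet6; the sketch only shows how such a name
is formed: all 32 hex nibbles, least significant first, followed by
"ip6.arpa".

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define IP6_ARPA_NAMELEN 73	/* 32 * "x." + "ip6.arpa" + NUL */

/*
 * Hypothetical helper: write the nibble-format reverse-lookup name of the
 * IPv6 address 'text' into buf.  Bytes are walked from last to first, and
 * within each byte the low nibble is emitted before the high nibble.
 * Returns 0 on success, -1 on a bad address or a too-small buffer.
 */
static int
ip6_arpa_name(const char *text, char *buf, size_t buflen)
{
	static const char hex[] = "0123456789abcdef";
	struct in6_addr a;
	char *p = buf;
	int i;

	assert(text != NULL && buf != NULL);
	if (buflen < IP6_ARPA_NAMELEN || inet_pton(AF_INET6, text, &a) != 1)
		return -1;
	for (i = 15; i >= 0; i--) {
		*p++ = hex[a.s6_addr[i] & 0x0f];	/* low nibble first */
		*p++ = '.';
		*p++ = hex[a.s6_addr[i] >> 4];		/* then high nibble */
		*p++ = '.';
	}
	strcpy(p, "ip6.arpa");
	return 0;
}
```

For example, "2001:db8::1" yields a name that ends in
"8.b.d.0.1.0.0.2.ip6.arpa".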

1.2 Neighbor Discovery

Neighbor Discovery is fairly stable.  Currently Address Resolution,
Duplicate Address Detection, and Neighbor Unreachability Detection
are supported.  In the near future we will be adding an Unsolicited Neighbor
Advertisement transmission command as an admin tool.

Duplicate Address Detection (DAD) will be performed when an IPv6 address
is assigned to a network interface, or when the network interface is enabled
(ifconfig up).  It is documented in RFC2462 5.4.
If DAD fails, the address will be marked "duplicated" and a message will be
generated to syslog (and usually to the console).  The "duplicated" mark
can be checked with ifconfig.  It is the administrator's responsibility to
check for and recover from DAD failures.  We may try to improve failure
recovery in future KAME code.
The DAD procedure may not be effective on certain network interfaces/drivers.
If a network driver needs a long initialization time (a common situation with
wireless network interfaces), and the driver mistakenly raises
IFF_RUNNING before it becomes ready, the DAD code will try to transmit
DAD probes to the not-yet-ready network driver and the packets will not go
out from the interface.  In such cases, network drivers should be corrected.

Some network drivers loop multicast packets back to themselves,
even if instructed not to do so (especially in promiscuous mode).
In such cases DAD may fail, because the DAD engine sees an inbound NS packet
(actually from the node itself) and considers it a sign of a duplicate.
In this case, drivers should be corrected to honor IFF_SIMPLEX behavior.
For example, you may need to check the source MAC address of an inbound
packet, and reject it if it is from the node itself.
You may also want to look at the #if condition marked "heuristics" in
sys/netinet6/nd6_nbr.c:nd6_dad_timer() as a workaround (note that the code
fragment in the "heuristics" section is not spec conformant).
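
The source-address check suggested above could look like the following
sketch in a driver's receive path.  It is written as portable C rather than
against any particular driver; the function is_own_frame(), the MAC_LEN and
ETHER_HEADER_LEN constants, and the assumption of a plain Ethernet II header
are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAC_LEN = 6, ETHER_HEADER_LEN = 14 };

/*
 * Hypothetical receive-path check: return non-zero if an inbound Ethernet
 * frame carries our own MAC address as its source, i.e. it is one of our
 * own transmissions looped back by the hardware.  Dropping such frames
 * keeps DAD from mistaking our own NS for one sent by a duplicate.
 */
static int
is_own_frame(const uint8_t *frame, size_t len, const uint8_t own[MAC_LEN])
{
	assert(own != NULL);
	if (frame == NULL || len < ETHER_HEADER_LEN)
		return 0;
	/* Ethernet header layout: destination (6), source (6), type (2). */
	return memcmp(frame + MAC_LEN, own, MAC_LEN) == 0;
}
```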

The Neighbor Discovery specification (RFC2461) does not talk about neighbor
cache handling in the following cases:
(1) when there was no neighbor cache entry, and the node received an
    unsolicited RS/NS/NA/redirect packet without a link-layer address
(2) neighbor cache handling on a medium without link-layer addresses
    (we need a neighbor cache entry for the IsRouter bit)
For (1), we implemented a workaround based on discussions on the IETF
ipngwg mailing list.  For more details, see the comments in the source code
and the email thread started from (IPng 7155), dated Feb 6 1999.

The IPv6 on-link determination rule (RFC2461) is quite different from the
assumptions in the BSD IPv4 network code.  To implement the behavior in
RFC2461 section 5.2 (when the default router list is empty), the kernel
needs to know the default outgoing interface.  To configure the default
outgoing interface, use commands like "ndp -I de0" as root.  Note that the
spec misuses the words "host" and "node" in several places in the section.

To avoid possible DoS attacks and infinite loops, the KAME stack will accept
only 10 options on an ND packet.  Therefore, if you have 20 prefix options
attached to an RA, only the first 10 prefixes will be recognized.
If this troubles you, please contact the KAME team and/or modify
nd6_maxndopt in sys/netinet6/nd6.c.  If there is high demand we may
provide a sysctl knob for the variable.
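
The cap described above can be pictured with the following sketch of an ND
option walk.  It is not the KAME nd6_options() code: MAX_ND_OPTS stands in
for nd6_maxndopt, and the function name count_nd_options() and its return
convention are hypothetical.  Each option carries a type byte and a length
byte in units of 8 octets; RFC2461 requires an ND packet containing a
zero-length option to be discarded, which also prevents the loop from
spinning forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ND_OPTS 10	/* hypothetical stand-in for nd6_maxndopt */

/*
 * Walk the option area of an ND message and return the number of options
 * accepted, or -1 if an option has length 0 or overruns the buffer.
 * Options beyond MAX_ND_OPTS are ignored, bounding the work per packet.
 */
static int
count_nd_options(const uint8_t *opt, size_t len)
{
	int n = 0;

	assert(opt != NULL || len == 0);
	while (len > 0 && n < MAX_ND_OPTS) {
		size_t olen;

		if (len < 2)
			return -1;		/* truncated option header */
		olen = (size_t)opt[1] * 8;	/* length is in 8-octet units */
		if (olen == 0 || olen > len)
			return -1;		/* zero length or overrun */
		n++;
		opt += olen;
		len -= olen;
	}
	return n;
}
```

With 20 Prefix Information options (32 octets each) the function reports 10,
matching the behavior described above.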

Proxy Neighbor Advertisement support is implemented in the kernel.
For instance, you can configure it by using the following command:
	# ndp -s fe80::1234%ne0 0:1:2:3:4:5 proxy
where ne0 is the interface which attaches to the same link as the
proxy target.
There are certain limitations, though:
- It does not send unsolicited multicast NA on configuration.  This is a MAY
  behavior in RFC2461.
- It does not add a random delay before transmission of solicited NA.  This
  is a SHOULD behavior in RFC2461.
- We cannot configure proxy NDP for an off-link address.  The target address
  for proxying must be a link-local address, or must be in the prefixes
  configured on the node which does proxy NDP.
- RFC2461 is unclear about whether it is legal for a host to perform proxy
  ND.  We do not prohibit hosts from doing proxy ND, but it is of very
  limited use.

Starting mid March 2000, we support Neighbor Unreachability Detection (NUD)
on p2p interfaces, including tunnel interfaces (gif).  NUD is turned on by
default.  Before March 2000 the KAME stack did not perform NUD on p2p
interfaces.  If the change raises any interoperability issues, you can turn
NUD on or off on a per-interface basis.  Use "ndp -i interface -nud" to turn
it off.  Consult ndp(8) for details.

RFC2461 specifies an upper-layer reachability confirmation hint.  Whenever
such a hint arrives, the ND process can use it to optimize neighbor
discovery: it can omit the real ND exchange and keep the neighbor cache
state in REACHABLE.
We currently have two sources for hints: (1) setsockopt(IPV6_REACHCONF)
defined by the 2292bis API, and (2) hints from tcp_input.
It is questionable whether they are really trustworthy.  For example, a
rogue userland program can use IPV6_REACHCONF to confuse the ND process.
The neighbor cache is a system-wide information pool, and it is bad to allow
a single process to affect others.  Also, tcp_input can be hosed by hijack
attempts.  It is wrong to allow hijack attempts to affect the ND process.
Starting June 2000, the ND code has a protection mechanism against incorrect
upper-layer reachability confirmation.  The ND code counts subsequent
upper-layer hints.  If the number of hints reaches the maximum, the ND code
will ignore further upper-layer hints and run the real ND process to confirm
reachability to the peer.  The sysctl net.inet6.icmp6.nd6_maxnudhint
defines the maximum number of subsequent upper-layer hints to be accepted.
(From April 2000 to June 2000, we rejected setsockopt(IPV6_REACHCONF) from
non-root processes.  After local discussion, it appears that hints are not
that trustworthy even if they come from privileged processes.)

If inbound ND packets carry invalid values, the KAME kernel will
drop these packets and increment a statistics variable.  See
"netstat -sn", icmp6 section.  For a detailed debugging session, you can
turn on syslog output from the kernel on errors by turning on the sysctl MIB
net.inet6.icmp6.nd6_debug.  nd6_debug can be turned on at bootstrap
time by defining the ND6_DEBUG kernel compilation option (so you can
debug behavior during bootstrap).  The nd6_debug configuration should
only be used for test/debug purposes; in a production environment,
nd6_debug must be set to 0.  If you leave it at 1, malicious parties
can inject broken packets and fill up the /var/log partition.

1.3 Scope Zone Index

IPv6 uses scoped addresses.  It is therefore very important to
specify the scope zone index (link index for a link-local address, or
site index for a site-local address) with an IPv6 address.  Without a
zone index, a scoped IPv6 address is ambiguous to the kernel, and
the kernel would not be able to determine the outbound zone for a
packet to the scoped address.  KAME code tries to address the issue in
several ways.

The entire architecture of scoped addresses is documented in RFC4007.
One non-trivial point of the architecture is that the link scope is
(theoretically) larger than the interface scope.  That is, two
different interfaces can belong to the same single link.  However, in
normal operation, we can assume that there is a 1-to-1 relationship
between links and interfaces.  In other words, we can usually put
links and interfaces in the same scope type.  The current KAME
implementation assumes the 1-to-1 relationship.  In particular, we use
interface names such as "ne1" as unique link identifiers.  This is
much more human-readable and intuitive than numeric identifiers,
but please keep in mind the theoretical difference between links
and interfaces.

Site-local addresses are very vaguely defined in the specs, and both
the specification and the KAME code need many improvements to
enable their actual use.  For example, it is still very unclear how we
define a site, or how we resolve host names in a site.  There is work
underway to define the behavior of routers at a site border, but we have
almost no code for site boundary node support (neither forwarding nor
routing) and we bet almost no one has.  At this moment, we recommend
that you use global addresses for experiments; there are way too many
pitfalls if you use site-local addresses.

1.3.1 Kernel internal

In the kernel, the link index for a link-local scope address is
embedded into the 2nd 16-bit word (the 3rd and 4th bytes) of the IPv6
address.
For example, you may see something like:
	fe80:1::200:f8ff:fe01:6317
in the routing table and the interface address structure (struct
in6_ifaddr).  The address above is a link-local unicast address which
belongs to a network link whose link identifier is 1 (note that it
equals the interface index by the assumption of our
implementation).  The embedded index enables us to identify IPv6
link-local addresses over multiple links effectively and with only a
little code change.
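
Purely as an illustration of the layout (applications must never do this
themselves, as explained next), the following user-space sketch writes a
link index into the 3rd and 4th bytes of a link-local address and formats
the result with inet_ntop(3), reproducing the example above.  The function
embedded_form() is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: parse the link-local address 'text', store
 * 'ifindex' in its 2nd 16-bit word (bytes 2 and 3, network byte order)
 * the way the kernel-internal form does, and format the result into buf.
 * Returns 0 on success, -1 otherwise.
 */
static int
embedded_form(const char *text, uint16_t ifindex, char *buf, socklen_t len)
{
	struct in6_addr a;

	assert(text != NULL && buf != NULL);
	if (inet_pton(AF_INET6, text, &a) != 1 || !IN6_IS_ADDR_LINKLOCAL(&a))
		return -1;
	a.s6_addr[2] = (uint8_t)(ifindex >> 8);
	a.s6_addr[3] = (uint8_t)(ifindex & 0xff);
	return inet_ntop(AF_INET6, &a, buf, len) != NULL ? 0 : -1;
}
```

Calling embedded_form("fe80::200:f8ff:fe01:6317", 1, buf, sizeof(buf))
produces "fe80:1::200:f8ff:fe01:6317".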

The use of the internal format must be limited to the kernel.  In
particular, addresses passed to the kernel by an application should not
contain the embedded index (except via some very special APIs such as
routing sockets).  Instead, the index should be specified in the
sin6_scope_id field of a sockaddr_in6 structure.  Obviously, packets sent to
or received from the wire must not contain the embedded index either, since
the index is meaningful only within the sending/receiving node.

In order to deal with the differences, several kernel routines are
provided.  These are available by including <netinet6/scope_var.h>.
Typically, the following functions will be most generally used:

- int sa6_embedscope(struct sockaddr_in6 *sa6, int defaultok);
  Embed sa6->sin6_scope_id into sa6->sin6_addr.  If sin6_scope_id is
  0, defaultok is non-0, and the default zone ID (see RFC4007) is
  configured, the default ID will be used instead of the value of the
  sin6_scope_id field.  On success, sa6->sin6_scope_id will be reset
  to 0.

  This function returns 0 on success, or a non-0 error code otherwise.

- int sa6_recoverscope(struct sockaddr_in6 *sa6);
  Extract the embedded zone ID in sa6->sin6_addr and set
  sa6->sin6_scope_id to that ID.  The embedded ID will be cleared to
  0.

  This function returns 0 on success, or a non-0 error code otherwise.

- int in6_clearscope(struct in6_addr *in6);
  Reset the embedded zone ID in 'in6' to 0.  This function never fails, and
  returns 0 if the original address is intact or non-0 if the address is
  modified.  The return value doesn't matter in most cases; currently, the
  only point where we care about the return value is ip6_input(), for
  checking whether the source or destination address of the incoming packet
  is in the embedded form.

- int in6_setscope(struct in6_addr *in6, struct ifnet *ifp,
                   u_int32_t *zoneidp);
  Embed the zone ID determined by the address scope type of 'in6' and the
  interface 'ifp' into 'in6'.  If zoneidp is non-NULL, *zoneidp will
  also be set to the zone ID.

  This function returns 0 on success, or a non-0 error code otherwise.
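
The real routines are kernel-only and also consult interface and default
zone state.  The following user-space model only mirrors the contract
described above for link-local unicast addresses, to make the round trip
concrete: embedding moves sin6_scope_id into the address and resets the
field to 0, and recovery moves it back and clears the embedded bytes.  The
model_* names are hypothetical, and treating a zero or oversized
sin6_scope_id on a link-local address as EINVAL is this sketch's own
simplification.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Model of sa6_embedscope() for link-local unicast only (no default zone). */
static int
model_embedscope(struct sockaddr_in6 *sa6)
{
	uint32_t zone;

	assert(sa6 != NULL);
	if (!IN6_IS_ADDR_LINKLOCAL(&sa6->sin6_addr))
		return 0;			/* not scoped: nothing to embed */
	zone = sa6->sin6_scope_id;
	if (zone == 0 || zone > 0xffff)
		return EINVAL;			/* ambiguous, or does not fit */
	sa6->sin6_addr.s6_addr[2] = (uint8_t)(zone >> 8);
	sa6->sin6_addr.s6_addr[3] = (uint8_t)(zone & 0xff);
	sa6->sin6_scope_id = 0;			/* reset on success */
	return 0;
}

/* Model of sa6_recoverscope(): move the embedded ID to sin6_scope_id. */
static int
model_recoverscope(struct sockaddr_in6 *sa6)
{
	assert(sa6 != NULL);
	if (!IN6_IS_ADDR_LINKLOCAL(&sa6->sin6_addr))
		return 0;
	sa6->sin6_scope_id = ((uint32_t)sa6->sin6_addr.s6_addr[2] << 8) |
	    sa6->sin6_addr.s6_addr[3];
	sa6->sin6_addr.s6_addr[2] = 0;		/* embedded ID cleared to 0 */
	sa6->sin6_addr.s6_addr[3] = 0;
	return 0;
}
```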

The typical usage of these functions is as follows:

sa6_embedscope() will be used at the socket or transport layer to
convert a sockaddr_in6 structure passed by an application into the
kernel-internal form.  In this usage, the second argument is often the
'ip6_use_defzone' global variable.

sa6_recoverscope() will also be used at the socket or transport layer
to convert an in6_addr structure with the embedded zone ID into a
sockaddr_in6 structure with the corresponding ID in the sin6_scope_id
field (and without the embedded ID in sin6_addr).

in6_clearscope() will be used just before sending a packet to the wire
to remove the embedded ID.  In general, this must be done at the last
stage of an output path, since otherwise the address would lose the ID
and could be ambiguous with regard to scope.

in6_setscope() will be used when the kernel receives a packet from the
wire to construct the kernel internal form for each address field in
the packet (typical examples are the source and destination addresses
of the packet).  In the typical usage, the third argument 'zoneidp'
will be NULL.  A non-NULL value will be used when the validity of the
zone ID must be checked, e.g., when forwarding a packet to another
link (see ip6_forward() for this usage).

An application, when sending a packet, is basically assumed to specify
the appropriate scope zone of the destination address in the
sin6_scope_id field (this might be done transparently to the
application with getaddrinfo() and the extended textual format; see
below), or at least the default scope zone(s) must be configured as a
last resort.  In some cases, however, an application could specify an
address that is ambiguous with regard to scope, expecting it to be
disambiguated in the kernel by some other means.  A typical usage is to
specify the outgoing interface through another API, which can
disambiguate the unspecified scope zone.  Such usage is not recommended,
but the kernel implements a trick to deal with even this case.

A rough sketch of the trick can be summarized as the following
sequence.

   sa6_embedscope(dst, ip6_use_defzone);
   in6_selectsrc(dst, ..., &ifp, ...);
   in6_setscope(&dst->sin6_addr, ifp, NULL);

sa6_embedscope() first tries to convert sin6_scope_id (or the default
zone ID) into the kernel-internal form.  This can fail with an
ambiguous destination, but the kernel still tries to get the outgoing
interface (ifp) while determining the source address of the outgoing
packet with in6_selectsrc().  If the interface is found, and the scope
zone was originally ambiguous, in6_setscope() can finally determine the
appropriate ID from the address itself and the interface, and construct
the kernel-internal form.  See, for example, the comments in
udp6_output() for a more concrete example.
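
The decision made by that sequence can be summarized by the following
model.  It deliberately ignores routing and source selection: resolve_zone(),
its parameters and the convention that 0 means "still ambiguous" are
hypothetical, and 'chosen_ifindex' stands for whatever outgoing interface
in6_selectsrc() ends up with (for example, one given through another API
such as IPV6_PKTINFO).  Under the 1-to-1 link/interface assumption, the link
zone that in6_setscope() derives is that interface's index.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the sa6_embedscope() / in6_selectsrc() / in6_setscope()
 * sequence for a link-local destination.  Returns the zone (link) ID the
 * packet ends up with, or 0 if the destination remains ambiguous.
 */
static uint32_t
resolve_zone(uint32_t scope_id, int defaultok, uint32_t default_zone,
    uint32_t chosen_ifindex)
{
	/* Step 1, sa6_embedscope(): explicit ID first, then the default zone. */
	if (scope_id != 0)
		return scope_id;
	if (defaultok && default_zone != 0)
		return default_zone;

	/*
	 * Steps 2 and 3: in6_selectsrc() found an outgoing interface, and
	 * in6_setscope() derives the zone from it (link == interface here).
	 */
	return chosen_ifindex;
}
```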

In any case, kernel routines other than those in netinet6/scope6.c MUST NOT
directly refer to the embedded form.  They MUST use the above
interface functions.  In particular, kernel routines MUST NOT have the
following code fragment:

	/* This is a bad practice.  Don't do this */
	if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		sin6->sin6_addr.s6_addr16[1] = htons(ifp->if_index);

This is bad for several reasons.  First, address ambiguity is not
specific to link-local addresses (any non-global multicast addresses
are inherently ambiguous, and this is particularly true for
interface-local addresses).  Secondly, this is vulnerable to future
changes of the embedded form (the embedded position may change, or the
zone ID may not actually be the interface index).  Only scope6.c
routines should know the details.

The above code fragment should thus actually be as follows:

	/* This is correct. */
	error = in6_setscope(&sin6->sin6_addr, ifp, NULL);
	/* ...and catch errors (error != 0) if possible and necessary. */

1.3.2 Interaction with API

There are several candidate APIs for dealing with scoped addresses
without ambiguity.

The IPV6_PKTINFO ancillary data type or socket option defined in the
advanced API (RFC2292 or RFC3542) can specify the outgoing interface of
a packet.  Similarly, the IPV6_PKTINFO or IPV6_RECVPKTINFO socket options
tell the kernel to pass the incoming interface to user applications.
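
As a sketch of the outgoing side, the following C fragment attaches an
IPV6_PKTINFO ancillary object that pins the outgoing interface for a
subsequent sendmsg(2); on the receiving side, enabling IPV6_RECVPKTINFO with
setsockopt(2) makes recvmsg(2) deliver the same struct in6_pktinfo carrying
the incoming interface index.  The helper name set_outgoing_ifindex() is
hypothetical.  The _GNU_SOURCE definition matters only on glibc, which
declares struct in6_pktinfo only when that macro is defined.

```c
#define _GNU_SOURCE	/* glibc: needed for struct in6_pktinfo */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: attach an IPV6_PKTINFO object to 'msg' so that a
 * later sendmsg(2) leaves through interface 'ifindex'.  The unspecified
 * source address (::) lets the kernel choose the source itself.  'cbuf'
 * must be aligned for struct cmsghdr and stay valid until sendmsg().
 */
static void
set_outgoing_ifindex(struct msghdr *msg, void *cbuf, size_t cbuflen,
    unsigned int ifindex)
{
	struct in6_pktinfo pi;
	struct cmsghdr *cm;

	assert(msg != NULL && cbuf != NULL);
	assert(cbuflen >= CMSG_SPACE(sizeof(pi)));
	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	memset(&pi, 0, sizeof(pi));	/* ipi6_addr = :: */
	pi.ipi6_ifindex = ifindex;

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IPV6;
	cm->cmsg_type = IPV6_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
}
```

A union of a char buffer of size CMSG_SPACE(sizeof(struct in6_pktinfo))
with a struct cmsghdr is one simple way to get a suitably aligned 'cbuf'.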

These options are enough to disambiguate scoped addresses of an
incoming packet, because we can uniquely identify the corresponding
zone of the scoped address(es) by the incoming interface.  However,
they are too strong for outgoing packets.  For example, consider a
multi-sited node and suppose that more than one interface of the node
belongs to the same site.  When we want to send a packet to the site,
we can only specify one of the interfaces for the outgoing packet with
these options; we cannot just say "send the packet to (one of the
interfaces of) the site."

Another kind of candidate is the sin6_scope_id member of the
sockaddr_in6 structure, defined in RFC2553.  The KAME kernel
interprets the sin6_scope_id field properly in order to disambiguate
scoped addresses.  For example, if an application passes a sockaddr_in6
structure that has a non-zero sin6_scope_id value to the sendto(2)
system call, the kernel should send the packet to the appropriate zone
according to the sin6_scope_id field.  Similarly, when the source or
the destination address of an incoming packet is a scoped one, the
kernel should detect the correct zone identifier based on the address
and the receiving interface, fill the identifier in the sin6_scope_id
field of a sockaddr_in6 structure, and then pass the packet to an
application via the recvfrom(2) system call, etc.
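
A minimal sketch of the sending side described above, assuming the zone
index has already been obtained (for instance with if_nametoindex(3), which
under the 1-to-1 link/interface assumption gives the link index): the
address is filled in with inet_pton(3), which cannot parse a zone suffix,
and the zone goes into sin6_scope_id.  make_scoped_dst() is a hypothetical
name.  The SIN6_LEN test sets the BSD-only sin6_len field on systems that
define that macro.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build a destination for sendto(2).  'addr' must be
 * a plain numeric address (no "%zone" suffix); 'zone' is the zone index,
 * e.g. an interface index.  Returns 0 on success, -1 otherwise.
 */
static int
make_scoped_dst(struct sockaddr_in6 *dst, const char *addr, uint32_t zone,
    uint16_t port)
{
	assert(dst != NULL && addr != NULL);
	memset(dst, 0, sizeof(*dst));
#ifdef SIN6_LEN
	dst->sin6_len = sizeof(*dst);	/* BSD-only length field */
#endif
	dst->sin6_family = AF_INET6;
	dst->sin6_port = htons(port);
	dst->sin6_scope_id = zone;
	return inet_pton(AF_INET6, addr, &dst->sin6_addr) == 1 ? 0 : -1;
}
```

The filled structure is then passed to sendto(2) as
(struct sockaddr *)&dst with sizeof(dst).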

However, the semantics of sin6_scope_id are still vague and on the
way to standardization.  Additionally, not many operating systems
support the behavior above at this moment.

In summary,
- If your target system is limited to KAME-based ones (i.e. BSD
  variants and KAME snaps), use the sin6_scope_id field assuming the
  kernel behavior described above.
- Otherwise (i.e. if your program should be portable to systems other
  than BSDs):
  + Use the advanced API to disambiguate scoped addresses of incoming
    packets.
  + To disambiguate scoped addresses of outgoing packets,
    * if it is okay to just specify the outgoing interface, use the
      advanced API.  This would be the case, for example, when you
      only need to consider link-local addresses and your system
      assumes a 1-to-1 relationship between links and interfaces.
    * otherwise, sorry but you lose.  Please urge the IETF IPv6
      community to standardize the semantics of the sin6_scope_id
      field.

Routing daemons and configuration programs, like route6d and ifconfig,
will need to manipulate the "embedded" zone index.  These programs use
routing sockets and ioctls (like SIOCGIFADDR_IN6), and the kernel API
will return IPv6 addresses with the 2nd 16-bit word filled in.  These
APIs are for manipulating kernel internal structures.  Programs that
use these APIs have to be prepared for differences between kernels
anyway.

getaddrinfo(3) and getnameinfo(3) support an extended numeric IPv6
syntax, as documented in RFC4007.  You can specify the outgoing link
by using the name of the outgoing interface as the link, like
"fe80::1%ne0" (again, note that we assume there is a 1-to-1 relationship
between links and interfaces).  This way you will be able to specify a
link-local scoped address without much trouble.
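
For example, the following sketch lets getaddrinfo(3) parse the extended
syntax and reports the resulting sin6_scope_id; AI_NUMERICHOST avoids any
DNS lookup.  scope_id_of() is a hypothetical name.  An interface name such as
"ne0" only resolves if that interface exists on the system; it is assumed
here that the resolver, like many implementations, also accepts a numeric
zone index such as "fe80::1%1".

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: resolve a numeric IPv6 address that may carry a
 * "%zone" suffix and return its sin6_scope_id, or -1 on error.
 */
static long
scope_id_of(const char *text)
{
	struct addrinfo hints, *res;
	long zone;

	assert(text != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET6;
	hints.ai_socktype = SOCK_DGRAM;
	hints.ai_flags = AI_NUMERICHOST;	/* never query DNS */
	if (getaddrinfo(text, NULL, &hints, &res) != 0)
		return -1;
	zone = (long)((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id;
	freeaddrinfo(res);
	return zone;
}
```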
592
593Other APIs like inet_pton(3) and inet_ntop(3) are inherently
594unfriendly with scoped addresses, since they are unable to annotate
595addresses with zone identifier.
596
5971.3.3 Interaction with users (command line)
598
599Most of user applications now support the extended numeric IPv6
600syntax.  In this case, you can specify outgoing link, by using the name
601of the outgoing interface like "fe80::1%ne0" (sorry for the duplicated
602notice, but please recall again that we assume 1-to-1 relationship
603between links and interfaces).  This is even the case for some
604management tools such as route(8) or ndp(8).  For example, to install
605the IPv6 default route by hand, you can type like
606	# route add -inet6 default fe80::9876:5432:1234:abcd%ne0
607(Although we suggest you to run dynamic routing instead of static
608routes, in order to avoid configuration mistakes.)
609
610Some applications have command line options for specifying an
611appropriate zone of a scoped address (like "ping6 -I ne0 ff02::1" to
612specify the outgoing interface).  However, you can't always expect such
613options.  Additionally, specifying the outgoing "interface" is in
614theory an overspecification as a way to specify the outgoing "link"
615(see above).  Thus, we recommend you to use the extended format
616described above.  This should apply to the case where the outgoing
617interface is specified.
618
619In any case, when you specify a scoped address to the command line,
620NEVER write the embedded form (such as ff02:1::1 or fe80:2::fedc),
621which should only be used inside the kernel (see Section 1.3.1), and 
622is not supposed to work.

1.4 Plug and Play

The KAME kit implements most of IPv6 stateless address
autoconfiguration in the kernel.  Neighbor Discovery functions are
implemented entirely in the kernel.  Router Advertisement (RA) input
for hosts is implemented in the kernel.  Router Solicitation (RS)
output for end hosts, RS input for routers, and RA output for routers
are implemented in userland.

1.4.1 Assignment of link-local, and special addresses

An IPv6 link-local address is generated from the IEEE802 address
(Ethernet MAC address).  Each interface is assigned an IPv6 link-local
address automatically when the interface comes up (IFF_UP).  A direct
route for the link-local prefix is also added to the routing table.
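For Ethernet, the interface identifier is the modified EUI-64 form of
the MAC address (RFC2464, RFC4291 Appendix A): ff:fe is inserted in the
middle and the universal/local bit is inverted.  A minimal sketch,
assuming a colon-separated MAC string; the function name is
illustrative.

```python
import ipaddress


def link_local_from_mac(mac):
    """fe80::/64 followed by the modified EUI-64 ID of a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # insert ff:fe between the OUI and the NIC-specific part ...
    ifid = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # ... and invert the universal/local bit of the first octet
    ifid[0] ^= 0x02
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + bytes(ifid))
```

For example, MAC 00:11:22:33:44:55 yields fe80::211:22ff:fe33:4455.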

Here is an output of the netstat command:

Internet6:
Destination                   Gateway                   Flags      Netif Expire
fe80::%ed0/64                 link#1                    UC           ed0
fe80::%ep0/64                 link#2                    UC           ep0

Interfaces that have no IEEE802 address (pseudo interfaces like tunnel
interfaces, or ppp interfaces) will borrow an IEEE802 address from
other interfaces, such as Ethernet interfaces, whenever possible.
If there is no IEEE802 hardware attached, a last-resort pseudorandom
value, derived from MD5(hostname), will be used as the source of the
link-local address.  If this is not suitable for your usage, you will
need to configure the link-local address manually.
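The last-resort derivation can be sketched as follows.  This is an
illustration under stated assumptions, not the kernel's code: it
assumes the first 8 bytes of MD5(hostname) become the interface
identifier, and that the universal/local and individual/group bits are
then forced to "local" and "individual", as is usual for a locally
generated identifier.  The constant and function names are
hypothetical.

```python
import hashlib
import ipaddress

EUI64_GBIT = 0x01   # individual/group bit in the first ID octet (assumed)
EUI64_UBIT = 0x02   # universal/local bit in the first ID octet (assumed)


def link_local_from_hostname(hostname):
    """fe80::/64 plus an ID taken from the first 8 bytes of MD5(hostname)."""
    ifid = bytearray(hashlib.md5(hostname.encode()).digest()[:8])
    ifid[0] &= ~EUI64_GBIT & 0xFF   # individual, not group
    ifid[0] |= EUI64_UBIT           # locally generated, not universal
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + bytes(ifid))
```

The value is stable across reboots as long as the hostname does not
change, which is why it is usable as a fallback; it changes whenever
the hostname does.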

If an interface is not capable of handling IPv6 (for example, because
it lacks multicast support), no link-local address will be assigned to
that interface.  See section 2 for details.

Each interface joins the solicited-node multicast address and the
link-local all-nodes multicast address (e.g. ff02::1:ff01:6317 and
ff02::1, respectively, on the link the interface is attached to).
In addition to a link-local address, the loopback address (::1) will be
assigned to the loopback interface.  Also, ::1/128 and ff01::/32 are
automatically added to the routing table, and the loopback interface
joins the node-local multicast group ff01::1.
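The solicited-node group is formed from the fixed prefix ff02::1:ff00:0/104
and the low 24 bits of the unicast address.  A minimal sketch; the
function name is illustrative.

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr):
    """ff02::1:ffXX:XXXX, where XX:XXXX are the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)
```

For instance, an address ending in ...ff:fe01:6317 maps to the group
ff02::1:ff01:6317 used in the example above.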

1.4.2 Stateless address autoconfiguration on hosts

The IPv6 specification separates nodes into two categories: routers
and hosts.  Routers forward packets addressed to others; hosts do not.
net.inet6.ip6.forwarding defines whether this node is a router or a
host (router if it is 1, host if it is 0).

It is NOT recommended to change net.inet6.ip6.forwarding while the node
is in operation.  The IPv6 specification defines the behavior of "host"
and "router" quite differently, and switching from one to the other can
cause serious trouble.  It is recommended to configure the variable at
bootstrap time only.

The first step in stateless address configuration is Duplicate Address
Detection (DAD).  See 1.2 for more detail on DAD.

When a host hears a Router Advertisement from a router, it may
autoconfigure itself by stateless address autoconfiguration.  This
behavior is controlled by net.inet6.ip6.accept_rtadv (the host
autoconfigures itself if it is set to 1).  Through autoconfiguration,
a network address prefix for the receiving interface (usually a global
address prefix) is added.  The default route is also configured.

Routers periodically generate Router Advertisement packets.  To
request an adjacent router to generate an RA packet, a host can
transmit a Router Solicitation.  To generate an RS packet at any time,
use the "rtsol" command.  The "rtsold" daemon is also available;
"rtsold" generates Router Solicitations whenever necessary, and it
works well for nomadic usage (notebooks/laptops).  If one wishes to
ignore Router Advertisements, use sysctl to set
net.inet6.ip6.accept_rtadv to 0.
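rtsol(8) and rtsold(8) take care of this for you; the sketch below only
shows what goes on the wire, following the RS format of RFC2461 4.1.
It assumes a raw ICMPv6 socket (which needs root privileges), relies on
the kernel filling in the ICMPv6 checksum for raw ICMPv6 sockets, and
uses hypothetical function names.

```python
import socket
import struct

ND_ROUTER_SOLICIT = 133      # ICMPv6 type (RFC2461 4.1)
ND_OPT_SOURCE_LINKADDR = 1   # option type (RFC2461 4.6.1)
ALL_ROUTERS = "ff02::2"


def build_rs(mac=None):
    """Type, code, checksum (left 0 for the kernel), 4 reserved bytes,
    and an optional Source Link-Layer Address option."""
    msg = struct.pack("!BBHI", ND_ROUTER_SOLICIT, 0, 0, 0)
    if mac is not None:
        lladdr = bytes(int(part, 16) for part in mac.split(":"))
        # option length is in units of 8 octets: 2 + 6 bytes = 1 unit
        msg += struct.pack("!BB", ND_OPT_SOURCE_LINKADDR, 1) + lladdr
    return msg


def send_rs(ifname, mac=None):
    """Send one RS to all-routers on ifname (raw socket: needs root)."""
    ifindex = socket.if_nametoindex(ifname)
    s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, socket.IPPROTO_ICMPV6)
    try:
        # routers discard RSs whose hop limit is not 255 (RFC2461 6.1.1)
        s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 255)
        s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)
        s.sendto(build_rs(mac), (ALL_ROUTERS, 0, 0, ifindex))
    finally:
        s.close()
```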

To generate Router Advertisements from a router, use the "rtadvd" daemon.

Note that the IPv6 specification assumes the following items, and that
nonconforming cases are left unspecified:
- Only hosts will listen to router advertisements
- Hosts have a single network interface (except loopback)
It is therefore unwise to enable net.inet6.ip6.accept_rtadv on routers
or on multi-interface hosts.  A misconfigured node can behave strangely
(KAME code allows nonconforming configurations, for those who would like
to do some experiments).

To summarize the sysctl knobs:
	accept_rtadv	forwarding	role of the node
	---		---		---
	0		0		host (to be manually configured)
	0		1		router
	1		0		autoconfigured host
					(spec assumes that host has single
					interface only, autoconfigured host with
					multiple interfaces is out-of-scope)
	1		1		invalid, or experimental
					(out-of-scope of spec)

See 1.2 for the relationship between DAD and autoconfiguration.

1.4.3 DHCPv6

We supply a tiny DHCPv6 server/client in kame/dhcp6.  However, the
implementation is premature (for example, it does NOT implement
address lease/release), and it is not in the default compilation tree
on some platforms.  If you want to experiment with it, compile it on
your own.

DHCPv6 and autoconfiguration also need more work.  The "Managed" and
"Other" bits in RA have no special effect on the stateful
autoconfiguration procedure in the DHCPv6 client program (the "Managed"
bit actually prevents stateless autoconfiguration, but no special
action is taken for the DHCPv6 client).
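A DHCPv6 client that did want to react to these bits would first have
to extract them from the fixed part of the RA (RFC2461 4.2).  A minimal
sketch, assuming the caller already holds the ICMPv6 part of the RA as
bytes; the names are illustrative, and what to do with the bits is
deliberately left out.

```python
import struct

ND_ROUTER_ADVERT = 134
ND_RA_FLAG_MANAGED = 0x80    # "M" bit
ND_RA_FLAG_OTHER = 0x40      # "O" bit


def ra_flags(icmp6_body):
    """Return (managed, other) from an RA's 16-byte fixed part: type,
    code, checksum, cur hop limit, flags, router lifetime, reachable
    time, retrans timer."""
    if len(icmp6_body) < 16:
        raise ValueError("truncated Router Advertisement")
    (msg_type, _code, _cksum, _hoplim, flags,
     _lifetime, _reachable, _retrans) = struct.unpack("!BBHBBHII",
                                                      icmp6_body[:16])
    if msg_type != ND_ROUTER_ADVERT:
        raise ValueError("not a Router Advertisement")
    return bool(flags & ND_RA_FLAG_MANAGED), bool(flags & ND_RA_FLAG_OTHER)
```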

1.5 Generic tunnel interface

GIF (Generic InterFace) is a pseudo interface for configured tunnels.
Details are described in the gif(4) manpage.
Currently
	v6 in v6
	v6 in v4
	v4 in v6
	v4 in v4
are available.  Use "gifconfig" to assign the physical (outer) source
and destination addresses to gif interfaces.
A configuration that uses the same address family for the inner and
outer IP headers (v4 in v4, or v6 in v6) is dangerous.  It is very easy
to configure interfaces and routing tables to perform infinite levels
of tunneling (for example, when the route to the outer destination
points back into the gif interface itself).  Please be warned.

gif can be configured to be ECN-friendly.  See 4.5 for ECN-friendliness
of tunnels, and the gif(4) manpage for how to configure it.

If you would like to configure an IPv4-in-IPv6 tunnel with a gif
interface, read gif(4) carefully.  You may need to remove the IPv6
link-local address automatically assigned to the gif interface.

1.6 Address Selection

1.6.1 Source Address Selection

The KAME kernel chooses the source address for an outgoing packet
sent from a user application as follows:

1. If the source address is explicitly specified via an IPV6_PKTINFO
   ancillary data item or the socket option of that name, just use it.
   Note that this item/option overrides the bound address of the
   corresponding (datagram) socket.

2. If the corresponding socket is bound, use the bound address.

3. Otherwise, the kernel first tries to find the outgoing interface of
   the packet.  If it fails, the source address selection also fails.
   If the kernel can find an interface, it chooses the most appropriate
   address based on the algorithm described in RFC3484.

   The policy table used in this algorithm is stored in the kernel.
   To install or view the policy, use the ip6addrctl(8) command.  The
   kernel does not have a pre-installed policy.  It is expected that
   the default policy described in RFC3484 is installed at bootstrap
   time using this command.

   RFC3484 allows an implementation to add implementation-specific
   rules with higher precedence than the rule "Use longest matching
   prefix."  KAME's implementation has the following additional rules
   (applied in the order listed):

   - Prefer addresses on alive interfaces, that is, interfaces with
     the UP flag on.  This rule is particularly useful for routers,
     since some routing daemons stop advertising prefixes (addresses)
     on interfaces that have gone down.

   In any case, addresses that break the scope zone of the
   destination, or addresses whose zone does not contain the outgoing
   interface, are never chosen.
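The ordering in step 3 can be sketched as a pairwise comparison.  This
is a simplified model, not the kernel's code: it implements only RFC3484
rules 1 (same address), 2 (appropriate scope), 3 (avoid deprecated) and
8 (longest matching prefix), with KAME's extra "alive interface" rule
placed just before rule 8; rules 4 to 7, the policy table and zone
checks are omitted.  Candidate and select_source are hypothetical
names, and scope values follow the RFC4007 numbering.

```python
import ipaddress
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Candidate:
    address: str
    deprecated: bool = False
    iface_up: bool = True


def _scope(a):
    if a.is_multicast:
        return a.packed[1] & 0x0F      # scope field of a multicast address
    if a.is_link_local or a.is_loopback:
        return 0x2
    if a.is_site_local:
        return 0x5
    return 0xE                         # global


def _common_prefix_len(a, b):
    return 128 - (int(a) ^ int(b)).bit_length()


def _compare(ca, cb, dst):
    """<0 if ca is preferred, >0 if cb is preferred, 0 if tied."""
    a, b = ipaddress.IPv6Address(ca.address), ipaddress.IPv6Address(cb.address)
    if a == dst and b != dst:                          # rule 1
        return -1
    if b == dst and a != dst:
        return 1
    sa, sb, sd = _scope(a), _scope(b), _scope(dst)     # rule 2
    if sa < sb:
        return 1 if sa < sd else -1
    if sb < sa:
        return -1 if sb < sd else 1
    if ca.deprecated != cb.deprecated:                 # rule 3
        return 1 if ca.deprecated else -1
    if ca.iface_up != cb.iface_up:                     # KAME extra rule
        return -1 if ca.iface_up else 1
    return _common_prefix_len(b, dst) - _common_prefix_len(a, dst)  # rule 8


def select_source(candidates, destination):
    dst = ipaddress.IPv6Address(destination)
    ranked = sorted(candidates,
                    key=cmp_to_key(lambda x, y: _compare(x, y, dst)))
    return ranked[0].address if ranked else None
```

Because the "alive interface" rule runs before rule 8, an address on a
down interface loses even when it matches the destination prefix more
closely.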

When the procedure above fails, the kernel usually returns
EADDRNOTAVAIL to the application.

In some cases, the specification explicitly requires the
implementation to choose a particular source address.  The source
address for a Neighbor Advertisement (NA) message is an example.
Under the spec (RFC2461 7.2.2), the NA's source should be the target
address of the corresponding NS.  In this case we follow the spec
rather than the above rules.

If you would like to prohibit the use of deprecated addresses for some
reason, set net.inet6.ip6.use_deprecated to 0.  The issue related to
deprecated addresses is described in RFC2462 5.5.4 (NOTE: there is
some debate underway in the IETF ipngwg on how to use "deprecated"
addresses).

As documented in RFC3484, temporary addresses for the privacy
extension are less preferred than public addresses by default.
However, for administrators who are particularly concerned about
privacy, there is a system-wide sysctl(3) variable,
"net.inet6.ip6.prefer_tempaddr".  When the variable is set to a
non-zero value, the kernel will prefer temporary addresses instead.
The default value of this variable is 0.

1.6.2 Destination Address Ordering

KAME's getaddrinfo(3) supports the destination address ordering
algorithm described in RFC3484.  getaddrinfo(3) needs to know the
source address for each destination address, and the policy entries
(described in the previous section) for the source and destination
addresses.  To get the source address, the library function opens a
UDP socket and tries to connect(2) to the destination.  To get the
policy entry, the function issues sysctl(3).

1.7 Jumbo Payload

KAME supports the Jumbo Payload hop-by-hop option used to send IPv6
packets with payloads longer than 65,535 octets.  But since KAME
currently does not support any physical interface whose MTU is more
than 65,535, such payloads can be seen only on the loopback interface
(i.e. lo0).

If you want to try jumbo payloads, you first have to reconfigure the
kernel so that the MTU of the loopback interface is more than 65,535
bytes; add the following to the kernel configuration file:
	options		"LARGE_LOMTU"		#To test jumbo payload
and recompile the new kernel.

Then you can test jumbo payloads with the ping6 command and its -b and
-s options.  The -b option must be specified to enlarge the size of
the socket buffer, and the -s option specifies the length of the
packet, which should be more than 65,535.  For example, type:
	% ping6 -b 70000 -s 68000 ::1
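On the wire, such a packet carries a Hop-by-Hop Options header holding
only the Jumbo Payload option (RFC2675); the IPv6 Payload Length field
is set to 0 and the 32-bit option value counts everything after the
IPv6 header, including the Hop-by-Hop header itself.  A minimal sketch,
assuming "ping6 -s 68000" means 68000 data bytes after the 8-byte
ICMPv6 Echo header; the function name is hypothetical.

```python
import struct

IPPROTO_ICMPV6 = 58
IP6OPT_JUMBO = 0xC2          # Jumbo Payload option type (RFC2675)
ICMP6_ECHO_HDR_LEN = 8       # type, code, checksum, identifier, sequence


def jumbo_hbh(next_header, upper_len):
    """8-byte Hop-by-Hop header with only the Jumbo Payload option.
    The option starts at offset 2, satisfying its 4n+2 alignment."""
    jumbo_len = 8 + upper_len            # includes this 8-byte header
    if jumbo_len <= 0xFFFF:
        raise ValueError("Jumbo Payload option is only for lengths > 65535")
    # next header, hdr ext len (0 = 8 bytes), option type, opt data len
    return struct.pack("!BBBBI", next_header, 0, IP6OPT_JUMBO, 4, jumbo_len)
```

For the ping6 example, jumbo_hbh(IPPROTO_ICMPV6, 8 + 68000) encodes a
Jumbo Payload length of 68016.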

The IPv6 specification requires that the Jumbo Payload option must not
be used in a packet that carries a Fragment header.  If this condition
is broken, an ICMPv6 Parameter Problem message must be sent to the
sender.  The KAME kernel follows the specification, but you usually
cannot see an ICMPv6 error caused by this requirement.

When the KAME kernel receives an IPv6 packet, it checks the frame
length of the packet and compares it to the length specified in the
payload length field of the IPv6 header, or in the value of the Jumbo
Payload option, if any.  If the former is shorter than the latter, the
KAME kernel discards the packet and increments the statistics.  You can
see the statistics in the output of the netstat command with the
`-s -p ip6' options:
	% netstat -s -p ip6
	ip6:
		(snip)
		1 with data size < data length

So the KAME kernel does not send an ICMPv6 error unless the erroneous
packet is an actual Jumbo Payload, that is, its packet size is more
than 65,535 bytes.  As described above, the KAME kernel currently does
not support physical interfaces with such a huge MTU, so it rarely
returns an ICMPv6 error.

TCP/UDP over jumbograms is not supported at this moment.  This is
because we have no medium (other than loopback) to test this.  Contact
us if you need this.

IPsec does not work on jumbograms.  This is due to some specification
twists in supporting AH with jumbograms (the AH header size influences
the payload length, and this makes it really hard to authenticate an
inbound packet with both the Jumbo Payload option and AH).

There are fundamental issues in *BSD support for jumbograms.  We would
like to address those, but we need more time to finalize the task.  To
name a few:
- The mbuf pkthdr.len field is typed as "int" in 4.4BSD, so it cannot
  hold a jumbogram with len > 2G on 32-bit architecture CPUs.  If we
  would like to support jumbograms properly, the field must be expanded
  to hold 4G + IPv6 header + link-layer header.  Therefore, it must be
  expanded to at least int64_t (u_int32_t is NOT enough).
- We mistakenly use "int" to hold the packet length in many places.  We
  need to convert them into a larger numeric type.  This needs great
  care, as we may experience overflow during packet length computation.
- We mistakenly check the ip6_plen field of the IPv6 header for the
  packet payload length in various places.  We should be checking mbuf
  pkthdr.len instead.  ip6_input() will perform a sanity check on the
  Jumbo Payload option on input, and we can safely use mbuf pkthdr.len
  afterwards.
- TCP code needs careful updates in a bunch of places, of course.

1.8 Loop prevention in header processing

The IPv6 specification allows an arbitrary number of extension headers
to be placed onto packets.  If we implemented IPv6 packet processing
code the way BSD IPv4 code is implemented, the kernel stack might
overflow due to a long function call chain.  KAME sys/netinet6 code is
carefully designed to avoid kernel stack overflow: each header's input
routine returns the type of the next header to ip6_input(), which
dispatches it in a loop instead of recursing.  Because of this, KAME
sys/netinet6 code defines its own protocol switch structure,
"struct ip6protosw" (see netinet6/ip6protosw.h).

In addition to this, we restrict the number of extension headers
(including the IPv6 header) in each incoming packet, in order to
prevent a DoS attack that tries to send packets with a massive number
of extension headers.  The upper limit can be configured by the sysctl
value net.inet6.ip6.hdrnestlimit.  In particular, if the value is 0,
the node will allow an arbitrary number of headers.  As of writing this
document, the default value is 50.

The IPv4 part (sys/netinet) remains untouched for compatibility.
Because of this, if you receive an IPsec-over-IPv4 packet with a
massive number of IPsec headers, the kernel stack may blow up.
IPsec-over-IPv6 is okay.

1.9 ICMPv6

After RFC2463 was published, the IETF ipngwg decided to disallow
ICMPv6 error packets in response to ICMPv6 redirects, to prevent ICMPv6
storms on a network medium.  KAME already implements this in the
kernel.

RFC2463 requires rate limitation for ICMPv6 error packets generated by
a node, to avoid possible DoS attacks.  The KAME kernel implements two
rate-limitation mechanisms, tunable via sysctl:
- Minimum time interval between ICMPv6 error packets
	The KAME kernel will generate no more than one ICMPv6 error packet
	during the configured time interval.  net.inet6.icmp6.errratelimit
	controls the interval (default: disabled).
- Maximum ICMPv6 error packets per second
	The KAME kernel will generate no more than the configured number of
	packets in one second.  net.inet6.icmp6.errppslimit controls the
	maximum packets-per-second value (default: 200pps).
Basically, we need to pick values that suit the bandwidth of the link
layer devices directly attached to the node.  In some cases the default
values may not fit well.  We are still unsure whether the default value
is sane or not.  Comments are welcome.

1.10 Applications

For userland programming, we support the IPv6 socket API as specified
in RFC2553/3493, RFC3542 and upcoming internet drafts.

TCP/UDP over IPv6 is available and quite stable.  You can enjoy
"telnet", "ftp", "rlogin", "rsh", "ssh", etc.  These applications are
protocol independent.  That is, they automatically choose IPv4 or IPv6
according to DNS.

1.11 Kernel Internals

 (*) The TCP/UDP part is handled differently between operating system
     platforms.  See 1.12 for details.

The current KAME has escaped from the IPv4 netinet logic.  While
ip_forward() calls ip_output(), ip6_forward() directly calls
if_output() since routers must not divide IPv6 packets into fragments.

ICMPv6 errors should contain as much of the original packet as
possible, up to 1280 bytes in total.  A UDP6/IP6 port unreachable, for
instance, should contain all extension headers and the *unchanged*
UDP6 and IP6 headers.  So, all IP6 functions except TCP6 never convert
network byte order into host byte order, to preserve the original
packet.

tcp6_input(), udp6_input() and icmp6_input() can't assume that the IP6
header immediately precedes the transport header, due to extension
headers.  So, in6_cksum() was implemented to handle packets whose IP6
header and transport header are not contiguous.  Neither TCP/IP6 nor
UDP/IP6 header structures exist for checksum calculation.

To process the IP6 header, extension headers and transport headers
easily, KAME requires network drivers to store packets in one internal
mbuf or one or more external mbufs.  A typical old driver prepares two
internal mbufs for 100 - 208 bytes of data; however, KAME's reference
implementation stores it in one external mbuf.

"netstat -s -p ip6" tells you whether or not your driver conforms to
KAME's requirement.  In the following example, "cce0" violates the
requirement.  (For more information, refer to Section 2.)

        Mbuf statistics:
                317 one mbuf
                two or more mbuf:
                        lo0 = 8
                        cce0 = 10
                3282 one ext mbuf
                0 two or more ext mbuf

Each input function calls IP6_EXTHDR_CHECK at the beginning to check
whether the region between the IP6 header and its own header is
contiguous.  IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has the
M_LOOP flag, that is, the packet comes from the loopback interface.
m_pullup() is never called for packets coming from physical network
interfaces.

TCP6 reassembly makes use of the IP6 header to store reassembly
information.  IP6 is not supposed to be just before TCP6, so the
ip6tcpreass structure has a pointer to the TCP6 header.  Of course, it
also has a pointer back to the mbuf to avoid m_pullup().

Like TCP6, both IP and IP6 reassembly functions never call m_pullup().

xxx_ctlinput() calls in_mrejoin() on PRC_IFNEWADDR.  We think this is
one of the 4.4BSD implementation flaws.  Since 4.4BSD keeps
ia_multiaddrs in in_ifaddr{}, it can't use the multicast feature if the
interface has no unicast address.  So, if an application joins a group
on an interface and then all unicast addresses are removed from the
interface, the application can't send/receive any multicast packets.
Moreover, if a new unicast address is assigned to the interface,
in_mrejoin() must be called.  KAME's interfaces, however, ALWAYS have
one link-local unicast address.  These extensions have thus not been
implemented in KAME.

1.12 IPv4 mapped address and IPv6 wildcard socket

RFC2553/3493 describes the IPv4 mapped address (3.7) and special
behavior of the IPv6 wildcard bind socket (3.8).  The spec allows you
to:
- Accept IPv4 connections with an AF_INET6 wildcard bind socket.
- Transmit IPv4 packets over an AF_INET6 socket by using a special form
  of the address, like ::ffff:10.1.1.1.
However, the spec itself is very complicated and does not specify how
the socket layer should behave.
Here we call the former the "listening side" and the latter the
"initiating side", for reference purposes.

Almost all KAME implementations treat the tcp/udp port number space
separately between IPv4 and IPv6.  You can perform a wildcard bind on
both of the address families, on the same port.

There are some OS-platform differences in KAME code, as we use tcp/udp
code from different origins.  The following table summarizes the
behavior.

		listening side		initiating side
		(AF_INET6 wildcard	(connection to ::ffff:10.1.1.1)
		socket gets IPv4 conn.)
		---			---
KAME/BSDI3	not supported		not supported
KAME/FreeBSD228	not supported		not supported
KAME/FreeBSD3x	configurable		supported
		default: enabled
KAME/FreeBSD4x	configurable		supported
		default: enabled
KAME/NetBSD	configurable		supported
		default: disabled
KAME/BSDI4	enabled			supported
KAME/OpenBSD	not supported		not supported

The following sections give more details, and describe how you can
configure the behavior.

Comments on listening side:

It seems that RFC2553/3493 says too little on the wildcard bind issue,
specifically on (1) the port space issue, (2) the failure mode, (3) the
relationship between AF_INET/INET6 wildcard binds, such as ordering
constraints, and (4) the behavior when a conflicting socket is
opened/closed.  There can be several separate interpretations of this
RFC that conform to it but behave differently.  So, to implement a
portable application you should assume nothing about the behavior in
the kernel.  Using getaddrinfo() is the safest way.  Port number space
and wildcard bind issues were discussed in detail on the ipv6imp
mailing list in mid March 1999, and it seems that there is no concrete
consensus (meaning it is up to implementers).  You may want to check
the mailing list archives.
We supply a tool called "bindtest" that explores the behavior of
kernel bind(2).  The tool will not be compiled by default.

If a server application would like to accept IPv4 and IPv6 connections,
it should use AF_INET and AF_INET6 sockets (you'll need two sockets).
Use getaddrinfo() with AI_PASSIVE in ai_flags, and socket(2) and bind(2)
to all the addresses returned.
By opening multiple sockets, you can accept connections on the socket
with the proper address family.  IPv4 connections will be accepted by
the AF_INET socket, and IPv6 connections will be accepted by the
AF_INET6 socket (NOTE: the KAME/BSDI4 kernel sometimes violates this;
we will fix it).

If you try to support IPv6 traffic only and would like to reject IPv4
traffic, always check the peer address when a connection is made to an
AF_INET6 listening socket.  If the address is an IPv4 mapped address,
you may want to reject the connection.  You can check the condition by
using the IN6_IS_ADDR_V4MAPPED() macro.  This is one of the reasons the
author of the section (itojun) dislikes the special behavior of AF_INET6
wildcard bind.
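In Python, the ipaddress module offers the equivalent of
IN6_IS_ADDR_V4MAPPED() through IPv6Address.ipv4_mapped.  A minimal
sketch of the check at accept() time; the helper names are
illustrative.

```python
import ipaddress


def is_v4mapped(host):
    """Counterpart of IN6_IS_ADDR_V4MAPPED() for a textual address."""
    addr = ipaddress.ip_address(host.split("%", 1)[0])   # drop any zone
    return addr.version == 6 and addr.ipv4_mapped is not None


def accept_ipv6_only(listener):
    """accept() on an AF_INET6 socket, refusing peers that arrive as
    ::ffff:a.b.c.d.  Returns (conn, peer), or None if refused."""
    conn, peer = listener.accept()
    if is_v4mapped(peer[0]):
        conn.close()
        return None
    return conn, peer
```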

Comments on initiating side:

Advice to application implementers: to implement a portable IPv6
application (which works on multiple IPv6 kernels), we believe that
the following is the key to success:
- NEVER hardcode AF_INET or AF_INET6.
- Use getaddrinfo() and getnameinfo() throughout the system.
  Never use gethostby*(), getaddrby*(), inet_*() or getipnodeby*().
- If you would like to connect to a destination, use getaddrinfo() and
  try all the destinations returned, like telnet does.
- Some IPv6 stacks are shipped with a buggy getaddrinfo().  Ship a
  minimal working version with your application and use that as a last
  resort.
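The "try all the destinations returned" advice, sketched in Python (the
standard library's socket.create_connection() follows the same
pattern).  No address family is hardcoded; the function name and the
timeout are illustrative.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try every address getaddrinfo() returns, in order, until one
    connects; re-raise the last error if all of them fail."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        s = None
        try:
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            s.connect(sockaddr)
            return s
        except OSError as err:
            last_error = err
            if s is not None:
                s.close()
    raise last_error or OSError("getaddrinfo returned no addresses")
```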

If you would like to use an AF_INET6 socket for both IPv4 and IPv6
outgoing connections, you will need a tweaked implementation in the DNS
support libraries, as documented in RFC2553/3493 6.1.  KAME libinet6
includes the tweak in getipnodebyname().  Note that getipnodebyname()
itself is not recommended, as it does not handle scoped IPv6 addresses
at all.  For IPv6 name resolution, getaddrinfo() is the preferred API.
getaddrinfo() does not implement the tweak.

When writing applications that make outgoing connections, the story
becomes much simpler if you treat AF_INET and AF_INET6 as totally
separate address families.  The {set,get}sockopt issues become simpler,
and the DNS issues become simpler.  We do not recommend relying upon
IPv4 mapped addresses.

1.12.1 KAME/BSDI3 and KAME/FreeBSD228

These platforms do not support IPv4 mapped addresses at all (neither on
the listening side nor on the initiating side).  AF_INET6 and AF_INET
sockets are totally separated.

The port number space is totally separate between AF_INET and AF_INET6
sockets.

It should be noted that KAME/BSDI3 and KAME/FreeBSD228 are not
conformant to RFC2553/3493 sections 3.7 and 3.8.  This is due to code
sharing reasons.

1.12.2 KAME/FreeBSD[34]x

KAME/FreeBSD3x and KAME/FreeBSD4x use shared tcp4/6 code (from
sys/netinet/tcp*) and shared udp4/6 code (from sys/netinet/udp*).
They use a unified inpcb/in6pcb structure.

1.12.2.1 KAME/FreeBSD[34]x, listening side

The platform can be configured to support IPv4 mapped addresses/special
AF_INET6 wildcard bind (enabled by default).  There is no kernel
compilation option to disable it.  You can enable/disable the behavior
with sysctl (per-node), or setsockopt (per-socket).

A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the
following conditions are satisfied:
- there's no AF_INET socket that matches the IPv4 connection
- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
  getsockopt(IPV6_V6ONLY) returns 0.

(XXX need checking)

1.12.2.2 KAME/FreeBSD[34]x, initiating side

KAME/FreeBSD[34]x supports outgoing connections to IPv4 mapped addresses
(::ffff:10.1.1.1), if the node is configured to accept IPv4 connections
by AF_INET6 sockets.

(XXX need checking)

1.12.3 KAME/NetBSD

KAME/NetBSD uses shared tcp4/6 code (from sys/netinet/tcp*) and shared
udp4/6 code (from sys/netinet/udp*).  The implementation is made
differently from KAME/FreeBSD[34]x.  KAME/NetBSD uses separate
inpcb/in6pcb structures, while KAME/FreeBSD[34]x uses a merged inpcb
structure.

It should be noted that the default configuration of KAME/NetBSD is not
conformant to RFC2553/3493 section 3.8.  It is intentionally turned off
by default for security reasons.

The platform can be configured to support IPv4 mapped addresses/special
AF_INET6 wildcard bind (disabled by default).  Kernel behavior can be
summarized as follows:
- default: special support code will be compiled in, but is disabled by
  default.  It can be controlled by sysctl (net.inet6.ip6.v6only),
  or setsockopt(IPV6_V6ONLY).
- add "INET6_BINDV6ONLY": no special support code for AF_INET6 wildcard
  sockets will be compiled in.  AF_INET6 sockets and AF_INET sockets are
  totally separate.  The behavior is similar to what is described in
  1.12.1.

The sysctl setting affects the per-socket configuration at in6pcb
creation time only.  In other words, the per-socket configuration is
copied from the sysctl configuration at in6pcb creation time.  To change
per-socket behavior, you must perform setsockopt or reopen the socket.
A change in the sysctl configuration will not change the behavior of
sockets that are already open.

1.12.3.1 KAME/NetBSD, listening side

A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the
following conditions are satisfied:
- there's no AF_INET socket that matches the IPv4 connection
- the AF_INET6 socket is configured to accept IPv4 traffic, i.e.
  getsockopt(IPV6_V6ONLY) returns 0.

You cannot bind(2) with an IPv4 mapped address.  This is a workaround
for port number duplication and other twists.

1.12.3.2 KAME/NetBSD, initiating side

When getsockopt(IPV6_V6ONLY) is 0 for a socket, you can send outgoing
traffic to an IPv4 destination over an AF_INET6 socket, using an IPv4
mapped address destination (::ffff:10.1.1.1).

When getsockopt(IPV6_V6ONLY) is 1 for a socket, you cannot use an IPv4
mapped address for outgoing traffic.

1.12.4 KAME/BSDI4

KAME/BSDI4 uses an NRL-based TCP/UDP stack and inpcb source code,
which was derived from the NRL IPv6/IPsec stack.  We guess it supports
IPv4 mapped addresses and special AF_INET6 wildcard bind.  The
implementation is, again, different from other KAME/*BSDs.

1.12.4.1 KAME/BSDI4, listening side

The NRL inpcb layer supports the special behavior of AF_INET6 wildcard
sockets.  There is no way to disable the behavior.

A wildcard AF_INET6 socket grabs an IPv4 connection if and only if the
following condition is satisfied:
- there's no AF_INET socket that matches the IPv4 connection

1.12.4.2 KAME/BSDI4, initiating side

KAME/BSDI4 supports connection initiation to IPv4 mapped addresses
(like ::ffff:10.1.1.1).

1.12.5 KAME/OpenBSD

KAME/OpenBSD uses an NRL-based TCP/UDP stack and inpcb source code,
which was derived from the NRL IPv6/IPsec stack.

It should be noted that KAME/OpenBSD is not conformant to RFC2553/3493
sections 3.7 and 3.8.  This support is intentionally omitted for
security reasons.

1.12.5.1 KAME/OpenBSD, listening side

KAME/OpenBSD disables the special behavior of AF_INET6 wildcard bind
for security reasons (if IPv4 traffic toward an AF_INET6 wildcard bind
is allowed, access control becomes much harder).  KAME/BSDI4 uses an
NRL-based TCP/UDP stack as well; however, the behavior is different due
to OpenBSD's security policy.

As a result, the behavior of KAME/OpenBSD is similar to KAME/BSDI3 and
KAME/FreeBSD228 (see 1.12.1 for more detail).

1.12.5.2 KAME/OpenBSD, initiating side

KAME/OpenBSD does not support connection initiation to IPv4 mapped
addresses (like ::ffff:10.1.1.1).

1.12.6 More issues

IPv4 mapped address support adds a big requirement to EVERY userland
codebase.  Every piece of userland code should check whether an
AF_INET6 sockaddr contains an IPv4 mapped address or not.  This adds
many twists:

- Access control code becomes harder to write.
  For example, if you would like to reject packets from 10.0.0.0/8,
  you need to reject packets to AF_INET sockets from 10.0.0.0/8,
  and to AF_INET6 sockets from ::ffff:10.0.0.0/104.
- If a protocol on top of IPv4 is defined differently from that on
  IPv6, we need to be really careful when we determine which protocol
  to use.  For example, with the FTP protocol, we cannot simply use
  sa_family to determine the FTP command set.  The following example is
  incorrect:
	if (sa_family == AF_INET)
		use EPSV/EPRT or PASV/PORT;	/*IPv4*/
	else if (sa_family == AF_INET6)
		use EPSV/EPRT or LPSV/LPRT;	/*IPv6*/
	else
		error;
  The correct code, with consideration of IPv4 mapped addresses, would
  be:
	if (sa_family == AF_INET)
		use EPSV/EPRT or PASV/PORT;	/*IPv4*/
	else if (sa_family == AF_INET6 && IPv4 mapped address)
		use EPSV/EPRT or PASV/PORT;	/*IPv4 command set on AF_INET6*/
	else if (sa_family == AF_INET6 && !IPv4 mapped address)
		use EPSV/EPRT or LPSV/LPRT;	/*IPv6*/
	else
		error;
  It is too much to ask for everybody to be careful like this.
  The problem is, we are not sure if the above code fragment is perfect
  for all situations.
- By enabling kernel support for IPv4 mapped addresses (outgoing
  direction), servers on the kernel can be hosed by native IPv6 packets
  that have an IPv4 mapped address in the IPv6 header source, and can
  generate unwanted IPv4 packets.
  draft-itojun-ipv6-transition-abuse-01.txt,
  draft-cmetz-v6ops-v4mapped-api-harmful-00.txt, and
  draft-itojun-v6ops-v4mapped-harmful-01.txt have more on this scenario.
1281  has more on this scenario.
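
The two checks described in the list above can be sketched in C as follows.
This is an illustration only, not code from the KAME tree: from_net10() and
choose_ftp_cmdset() are hypothetical names, and the only library facility
assumed is the standard IN6_IS_ADDR_V4MAPPED() macro from <netinet/in.h>.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Access control: is the peer in 10.0.0.0/8?  The same IPv4 network has
 * to be recognized both as a plain AF_INET address and as an IPv4 mapped
 * address (::ffff:10.0.0.0/104) on an AF_INET6 socket.
 */
static int from_net10(const struct sockaddr *sa)
{
	const unsigned char *v4;

	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
		v4 = (const unsigned char *)&sin->sin_addr;	/* network byte order */
	} else if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
		if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
			return 0;		/* native IPv6 peer */
		v4 = &sin6->sin6_addr.s6_addr[12];	/* embedded IPv4 address */
	} else
		return 0;
	return v4[0] == 10;
}

/* FTP: which fallback command pair goes with EPSV/EPRT for this peer. */
enum ftp_cmdset { FTP_CMDS_V4, FTP_CMDS_V6, FTP_CMDS_NONE };

static enum ftp_cmdset choose_ftp_cmdset(const struct sockaddr *sa)
{
	if (sa->sa_family == AF_INET)
		return FTP_CMDS_V4;		/* PASV/PORT */
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
		/* IPv4 command set on AF_INET6 if the peer is really IPv4 */
		return IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) ?
		    FTP_CMDS_V4 : FTP_CMDS_V6;	/* LPSV/LPRT for IPv6 */
	}
	return FTP_CMDS_NONE;			/* error */
}
```

Since EPSV/EPRT work over both address families, the sketch only decides
which fallback pair (PASV/PORT or LPSV/LPRT) applies; an AF_INET6 peer
::ffff:10.1.2.3 is treated exactly like an AF_INET peer 10.1.2.3.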

Due to the above twists, some KAME userland programs have restrictions on
the use of IPv4 mapped addresses:
- rshd/rlogind do not accept connections from IPv4 mapped addresses.
  This is to prevent malicious use of IPv4 mapped addresses in native IPv6
  packets to bypass source-address based authentication.
- ftp/ftpd assume that you are on a dual stack network.  IPv4 mapped addresses
  will be decoded in userland, and will be passed to AF_INET sockets
  (in other words, ftp/ftpd do not support SIIT environments).

1.12.7 Interaction with SIIT translator

SIIT translator is specified in RFC2765.  A KAME node can become neither a SIIT
translator box nor a SIIT end node (a node in the SIIT cloud).

To become a SIIT translator box, we would need to add code for it.
We do not have such code in our tree at this moment.

There are multiple reasons why we are unable to become a SIIT end node.
(1) SIIT translators require end nodes in the SIIT cloud to be IPv6-only.
Since we are unable to compile an INET-less kernel, we are unable to become
a SIIT end node.  (2) As described in 1.12.6, some of our userland code assumes
a dual stack network.  (3) The KAME stack filters out IPv6 packets with IPv4
mapped addresses in the header, to secure the non-SIIT case (which is much more
common).  Effectively, a KAME node will reject any packet arriving through a
SIIT translator box.  See section 1.14 for more detail about the last item.

There are documentation issues too: the SIIT document requires very strange
things.  For example, it asks an IPv6-only node (meaning no IPv4 code) to be
able to construct IPv4 IPsec headers.  If a node knows how to construct
IPv4 IPsec headers, it is not an IPv6-only node, it is a dual-stack
node.  The requirements imposed by the SIIT document contradict other
parts of the document itself.

1.13 sockaddr_storage

When RFC2553 was about to be finalized, there was discussion on how struct
sockaddr_storage members should be named.  One proposal was to prepend "__"
to the members (like "__ss_len") as they should not be touched.  The other
proposal was not to prepend it (like "ss_len") as we need to touch those
members directly.  There was no clear consensus on it.

As a result, RFC2553 defines struct sockaddr_storage as follows:
	struct sockaddr_storage {
		u_char	__ss_len;	/* address length */
		u_char	__ss_family;	/* address family */
		/* and a bunch of padding */
	};
In contrast, the XNET draft defines it as follows:
	struct sockaddr_storage {
		u_char	ss_len;		/* address length */
		u_char	ss_family;	/* address family */
		/* and a bunch of padding */
	};

In December 1999, it was agreed that RFC2553bis (RFC3493) should pick the
latter (XNET) definition.

KAME kits prior to December 1999 used the RFC2553 definition.  KAME kits from
December 1999 onward (including December) conform to the XNET definition,
based on the RFC3493 discussion.

If you look at multiple IPv6 implementations, you will see
both definitions.  For a userland programmer, the most portable way of
dealing with it is to:
(1) ensure ss_family and/or ss_len are available on the platform, by using
    GNU autoconf,
(2) use -Dss_family=__ss_family to unify all occurrences (including header
    files) into __ss_family, or
(3) never touch __ss_family.  Cast to sockaddr * and use sa_family like:
	struct sockaddr_storage ss;
	int family;

	family = ((struct sockaddr *)&ss)->sa_family;
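
Approach (3) can be wrapped in small helpers so that the rest of a program
never names either member.  The sketch below is hypothetical (ss_family_of()
and ss_len_of() are invented names) and assumes only AF_INET and AF_INET6
need to be handled.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Read the family without naming ss_family or __ss_family: the family
 * field sits at the same offset in every sockaddr variant, so going
 * through struct sockaddr works with either definition.
 */
static int ss_family_of(const struct sockaddr_storage *ss)
{
	return ((const struct sockaddr *)ss)->sa_family;
}

/*
 * Length of the address actually stored, for platforms without
 * ss_len/sa_len.  Only AF_INET and AF_INET6 are handled here.
 */
static socklen_t ss_len_of(const struct sockaddr_storage *ss)
{
	switch (ss_family_of(ss)) {
	case AF_INET:
		return (socklen_t)sizeof(struct sockaddr_in);
	case AF_INET6:
		return (socklen_t)sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}
```

A typical use is right after accept(2) or recvfrom(2) has filled a struct
sockaddr_storage, before the address is handed to getnameinfo(3).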

1.14 Invalid addresses on the wire

Some IPv6 transition technologies embed an IPv4 address into an IPv6 address.
These specifications themselves are fine; however, they can enable a certain
set of attacks.  Recent specification documents cover those issues, but there
are already-published RFCs that have no protection against them (like using a
source address of ::ffff:127.0.0.1 to bypass a "reject packets from remote"
filter).

To name a few, these address ranges can be used to attack an IPv6
implementation, or to bypass security controls:
- IPv4 mapped addresses that embed unspecified/multicast/loopback/broadcast
  IPv4 addresses (if they are in a native IPv6 packet header, they are malicious)
	::ffff:0.0.0.0/104	::ffff:127.0.0.0/104
	::ffff:224.0.0.0/100	::ffff:255.0.0.0/104
- 6to4 (RFC3056) prefixes generated from unspecified/multicast/loopback/
  broadcast/private IPv4 addresses
	2002:0000::/24		2002:7f00::/24		2002:e000::/20
	2002:ff00::/24		2002:0a00::/24		2002:ac10::/28
	2002:c0a8::/32
- IPv4 compatible addresses that embed unspecified/multicast/loopback/broadcast
  IPv4 addresses (if they are in a native IPv6 packet header, they are
  malicious).  Note that, since KAME does not support RFC1933/2893 auto
  tunnels, KAME nodes are not vulnerable to these packets.
	::0.0.0.0/104	::127.0.0.0/104	::224.0.0.0/100	::255.0.0.0/104

Also, since KAME does not support RFC1933/2893 auto tunnels, seeing IPv4
compatible addresses is very rare.  Be cautious if you see them on the wire.

If we see IPv6 packets with an IPv4 mapped address (::ffff:0.0.0.0/96) in the
header in a dual-stack environment (not in a SIIT environment), they indicate
that someone is trying to impersonate an IPv4 peer.  The packet should be dropped.
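
The ranges listed above translate into simple byte tests on the 16-byte
address.  The following is a hedged sketch, not the KAME kernel's filter:
in6_src_is_suspicious() and its helpers are hypothetical names, the IPv4
compatible checks are left out (KAME does not accept those auto-tunnel
packets anyway), and every IPv4 mapped address on the wire is treated as
suspicious, which assumes a dual-stack rather than SIIT environment.

```c
#include <stdint.h>
#include <netinet/in.h>

/* Embedded IPv4 ranges that are never legitimate in an IPv6 header:
 * 0.0.0.0/8, 127.0.0.0/8, 224.0.0.0/4 (multicast) and 255.0.0.0/8. */
static int v4_is_bogus(const uint8_t *v4)
{
	return v4[0] == 0 || v4[0] == 127 || (v4[0] & 0xf0) == 0xe0 ||
	    v4[0] == 255;
}

/* Private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. */
static int v4_is_private(const uint8_t *v4)
{
	return v4[0] == 10 ||
	    (v4[0] == 172 && (v4[1] & 0xf0) == 0x10) ||
	    (v4[0] == 192 && v4[1] == 168);
}

/*
 * Returns 1 if an address seen in an IPv6 header is an IPv4 mapped
 * address, or a 6to4 address built from a bogus or private IPv4 address.
 */
static int in6_src_is_suspicious(const struct in6_addr *a)
{
	const uint8_t *b = a->s6_addr;

	if (IN6_IS_ADDR_V4MAPPED(a))
		return 1;			/* ::ffff:0.0.0.0/96 on the wire */
	if (b[0] == 0x20 && b[1] == 0x02)	/* 2002::/16, 6to4 */
		return v4_is_bogus(&b[2]) || v4_is_private(&b[2]);
	return 0;
}
```

A packet whose source (or destination) makes this return 1 would be dropped
before any upper-layer processing sees it.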

IPv6 specifications do not say much about the IPv6 unspecified address (::)
in the IPv6 source address field.  Clarification is in progress.
Here are a couple of comments:
- The IPv6 unspecified address can be used in the IPv6 source address field
  if and only if we have no legal source address for the node.  The legal
  situations include, but may not be limited to, (1) MLD while no IPv6
  address is assigned to the node and (2) DAD.
- If an IPv6 TCP packet has the unspecified address as its source, it is an
  attack attempt.  The form can be used as a trigger for TCP DoS attacks.
  KAME code already filters them out.
- Use of the unspecified address in the following cases seems to be illegal,
  and there seems to be general consensus in ipngwg on them:  (1) the
  mobile-ip6 home address option, (2) offlink packets (so routers should not
  forward them).  KAME implements (2) already.

KAME code is carefully written to avoid such incidents.  More specifically,
the KAME kernel will reject packets with certain source/destination addresses
in the IPv6 base header or IPv6 routing header.  Also, the KAME default
configuration file is written carefully, to avoid those attacks.

draft-itojun-ipv6-transition-abuse-01.txt,
draft-cmetz-v6ops-v4mapped-api-harmful-00.txt and
draft-itojun-v6ops-v4mapped-harmful-01.txt have more on this issue.

1.15 Node's required addresses

RFC2373 section 2.8 talks about required addresses for an IPv6
node.  This section describes how the KAME stack manages those required
addresses.

1.15.1 Host case

The following items are automatically assigned to the node (or the node
automatically joins the group) at bootstrap time:
- Loopback address
- Node-local all-nodes multicast address (ff01::1)

The following items will be automatically handled when the interface becomes
IFF_UP:
- Its link-local address for each interface
- Solicited-node multicast address for link-local addresses
- Link-local all-nodes multicast address (ff02::1)

The following items need to be configured manually by ifconfig(8) or prefix(8).
Alternatively, these can be autoconfigured by using stateless address
autoconfiguration.
- Assigned unicast/anycast addresses
- Solicited-node multicast address for assigned unicast address

Users can join groups by using appropriate system calls like setsockopt(2).
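
The solicited-node groups mentioned above are derived mechanically from each
address.  A minimal sketch of that derivation follows; in6_solicited_node()
is a hypothetical helper, not a KAME function.  Because only the low-order
24 bits are kept, a link-local and a global address sharing the same
interface ID map to the same group.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * Solicited-node multicast group for a unicast/anycast address:
 * ff02:0:0:0:0:1:ffXX:XXXX, where XX:XXXX are the low-order 24 bits
 * of the address (RFC2373 section 2.7.1).
 */
static void in6_solicited_node(const struct in6_addr *addr,
    struct in6_addr *group)
{
	memset(group, 0, sizeof(*group));
	group->s6_addr[0] = 0xff;	/* multicast */
	group->s6_addr[1] = 0x02;	/* link-local scope */
	group->s6_addr[11] = 0x01;
	group->s6_addr[12] = 0xff;
	memcpy(&group->s6_addr[13], &addr->s6_addr[13], 3);
}
```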

1.15.2 Router case

In addition to the above, routers need to handle the following items.

The following items need to be configured manually by using ifconfig(8).
o The subnet-router anycast addresses for the interfaces it is configured
  to act as a router on (prefix::/64)
o All other anycast addresses with which the router has been configured

The router will join the following multicast group when rtadvd(8) is enabled
for the interface.
o All-Routers multicast address (ff02::2)

Routing daemons will join appropriate multicast groups as necessary,
like ff02::9 for RIPng.

Users can join groups by using appropriate system calls like setsockopt(2).
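
For the setsockopt(2) route, a daemon such as a RIPng daemon joining ff02::9
might do something like the sketch below.  make_mreq() and join_group() are
hypothetical names; the sketch assumes the RFC3493 option name
IPV6_JOIN_GROUP and struct ipv6_mreq, and that the interface index has
already been obtained, for example with if_nametoindex(3).

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill an ipv6_mreq for a textual group (e.g. "ff02::9") on ifindex.
 * Returns 0 on success, -1 if the string is not a multicast address. */
static int make_mreq(struct ipv6_mreq *mreq, const char *group,
    unsigned int ifindex)
{
	memset(mreq, 0, sizeof(*mreq));
	if (inet_pton(AF_INET6, group, &mreq->ipv6mr_multiaddr) != 1)
		return -1;
	if (!IN6_IS_ADDR_MULTICAST(&mreq->ipv6mr_multiaddr))
		return -1;
	mreq->ipv6mr_interface = ifindex;
	return 0;
}

/* Join the group on an AF_INET6 socket s; returns setsockopt's result. */
static int join_group(int s, const char *group, unsigned int ifindex)
{
	struct ipv6_mreq mreq;

	if (make_mreq(&mreq, group, ifindex) != 0)
		return -1;
	return setsockopt(s, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq,
	    sizeof(mreq));
}
```

For instance, join_group(s, "ff02::9", if_nametoindex("ne0")) on an AF_INET6
datagram socket would join the RIPng group on ne0 (the interface name is only
an example).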

1.16 Advanced API

The current KAME kernel implements the RFC3542 API.  It also implements the
RFC2292 API, for backward compatibility with *BSD-integrated codebases.
The KAME tree ships with RFC3542 headers.
*BSD-integrated codebases implement either the RFC2292 or the RFC3542 API.
See the "COVERAGE" document for detailed implementation status.

Here are a couple of issues to mention:
- *BSD-integrated binaries, compiled for RFC2292, will work on the KAME kernel.
  For example, OpenBSD 2.7 /sbin/rtsol will work on a KAME/OpenBSD kernel.
- KAME binaries, compiled using RFC3542, will not work on a *BSD-integrated
  kernel.  For example, KAME /usr/local/v6/sbin/rtsol will not work on the
  OpenBSD 2.7 kernel.
- The RFC3542 API is not compatible with the RFC2292 API.  RFC3542 #define
  symbols conflict with RFC2292 symbols.  Therefore, if you compile programs
  that assume the RFC2292 API, the compilation itself goes fine; however, the
  compiled binary will not work correctly.  The problem is not a KAME issue,
  but an API issue.  For example, Solaris 8 implements the RFC3542 API.  If you
  compile RFC2292-based code on Solaris 8, the binary can behave strangely.
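
One way to keep a single source file working with both header sets is to
test for an RFC3542-only symbol at compile time.  The sketch below assumes
that IPV6_RECVPKTINFO is defined only by RFC3542 headers.  Under RFC2292,
setting IPV6_PKTINFO to a non-zero int asks for packet information on
received packets; under RFC3542, IPV6_PKTINFO is a sticky option taking a
struct in6_pktinfo, and receipt is requested with IPV6_RECVPKTINFO.
enable_recv_pktinfo() is a hypothetical name.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Ask for IPV6_PKTINFO ancillary data on received packets.
 * RFC3542: IPV6_RECVPKTINFO is the on/off switch (an int), while
 *          IPV6_PKTINFO itself takes a struct in6_pktinfo.
 * RFC2292: IPV6_PKTINFO is the on/off switch (an int).
 * Assumption: IPV6_RECVPKTINFO is defined only by RFC3542 headers.
 */
static int enable_recv_pktinfo(int s)
{
	int on = 1;

#ifdef IPV6_RECVPKTINFO
	return setsockopt(s, IPPROTO_IPV6, IPV6_RECVPKTINFO, &on, sizeof(on));
#else
	return setsockopt(s, IPPROTO_IPV6, IPV6_PKTINFO, &on, sizeof(on));
#endif
}
```

RFC2292 source that passes an int to IPV6_PKTINFO still compiles against
RFC3542 headers, but the call no longer means what the program intended,
which is the trap described above.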

There are a few incompatible behaviors in the RFC2292 binary backward
compatibility support in the KAME tree.  To enumerate:
- Type 0 routing header lacks support for the strict/loose bitmap.
  Even if we see packets with the "strict" bit set, those bits will not be made
  visible to the userland.
  Background: the RFC2292 document is based on RFC1883 IPv6, and it uses the
  strict/loose bitmap.  The RFC3542 document is based on RFC2460 IPv6, and it
  has no strict/loose bitmap (it was removed from RFC2460).  The KAME tree
  obeys RFC2460 IPv6, and lacks support for the strict/loose bitmap.

The RFC3542 document leaves some particular cases unspecified.  The
KAME implementation treats them as follows:
- The IPV6_DONTFRAG and IPV6_RECVPATHMTU socket options for TCP
  sockets are ignored.  That is, the setsockopt() call will succeed
  but the specified value will have no effect.

1.17 DNS resolver

KAME ships with a modified DNS resolver, in libinet6.a.
libinet6.a has a couple of extensions to the libc DNS resolver:
- Can take "options insecure1" and "options insecure2" in /etc/resolv.conf,
  which toggle the RES_INSECURE[12] option flag bits.
- EDNS0 receive buffer size notification support.  It can be enabled by
  "options edns0" in /etc/resolv.conf.  See USAGE for details.
- IPv6 transport support (queries/responses over IPv6).  Most BSD official
  releases now have it already.
- Partial A6 chain chasing/DNAME/bit string label support (KAME/BSDI4).


2. Network Drivers

KAME requires two items to be added into the standard drivers:

(1) (freebsd[234] and bsdi[34] only) mbuf clustering requirement.
    In this stable release, we changed MINCLSIZE into MHLEN+1 for all the
    operating systems in order to make all the drivers behave as we expect.

(2) multicast.  If "ifmcstat" yields no multicast group for an
    interface, that interface has to be patched.

To avoid trouble, we suggest that you comment out the device drivers
for unsupported/unnecessary cards in the kernel configuration file.
If you accidentally enable unsupported drivers, some of the userland
tools may not work correctly (routing daemons are a typical example).

In the following sections, "official support" means that KAME developers
are using that ethernet card/driver frequently.

(NOTE: In the past we required all pcmcia drivers to have a call to
in6_ifattach().  We have no such requirement any more.)

2.1 FreeBSD 2.2.x-RELEASE

Here is a list of FreeBSD 2.2.x-RELEASE drivers and their conditions:

	driver	mbuf(1)		multicast(2)	official support?
	---	---		---		---
	(Ethernet)
	ar	looks ok	-		-
	cnw	ok		ok		yes (*)
	ed	ok		ok		yes
	ep	ok		ok		yes
	fe	ok		ok		yes
	sn	looks ok	-		-   (*)
	vx	looks ok	-		-
	wlp	ok		ok		-   (*)
	xl	ok		ok		yes
	zp	ok		ok		-
	(FDDI)
	fpa	looks ok	?		-
	(ATM)
	en	ok		ok		yes
	(Serial)
	lp	?		-		does not work
	sl	?		-		does not work
	sr	looks ok	ok		-   (**)

You may want to add an invocation of "rtsol" in "/etc/pccard_ether",
if you are using a notebook computer and a PCMCIA ethernet card.

(*) These drivers are distributed with PAO (http://www.jp.freebsd.org/PAO/).

(**) There were some reports that, if you bring the sr driver up, down and
then up again, the kernel may hang.  We have disabled frame-relay support in
the sr driver, and after that it seems to be working fine.  If you need
frame-relay support back, please contact KAME developers.

2.2 BSD/OS 3.x

The following lists BSD/OS 3.x device drivers and their conditions:

	driver	mbuf(1)		multicast(2)	official support?
	---	---		---		---
	(Ethernet)
	cnw	ok		ok		yes
	de	ok		ok		-
	df	ok		ok		-
	eb	ok		ok		-
	ef	ok		ok		yes
	exp	ok		ok		-
	mz	ok		ok		yes
	ne	ok		ok		yes
	we	ok		ok		-
	(FDDI)
	fpa	ok		ok		-
	(ATM)
	en	maybe		ok		-
	(Serial)
	ntwo	ok		ok		yes
	sl	?		-		does not work
	appp	?		-		does not work

You may want to use the "@insert" directive in /etc/pccard.conf to invoke
the "rtsol" command right after dynamic insertion of PCMCIA ethernet cards.

2.3 NetBSD

The following table lists the network drivers we have tried so far.

	driver		mbuf(1)	multicast(2)	official support?
	---		---	---		---
	(Ethernet)
	awi pcmcia/i386	ok	ok		-
	bah zbus/amiga	NG(*)
	cnw pcmcia/i386	ok	ok		yes
	ep pcmcia/i386	ok	ok		-
	le sbus/sparc	ok	ok		yes
	ne pci/i386	ok	ok		yes
	ne pcmcia/i386	ok	ok		yes
	wi pcmcia/i386	ok	ok		yes
	(ATM)
	en pci/i386	ok	ok		-

(*) This may need some fixes, but I'm not sure what arcnet interfaces assume.

2.4 FreeBSD 3.x-RELEASE

Here is a list of FreeBSD 3.x-RELEASE drivers and their conditions:

	driver	mbuf(1)		multicast(2)	official support?
	---	---		---		---
	(Ethernet)
	cnw	ok		ok		-(*)
	ed	?		ok		-
	ep	ok		ok		-
	fe	ok		ok		yes
	fxp	?(**)
	lnc	?		ok		-
	sn	?		?		-(*)
	wi	ok		ok		yes
	xl	?		ok		-

(*) These drivers are distributed with PAO as PAO3
    (http://www.jp.freebsd.org/PAO/).
(**) There are trouble reports with multicast filter initialization.

Other drivers may simply work on KAME FreeBSD 3.x-RELEASE, but they have not
been checked yet.

2.5 FreeBSD 4.x-RELEASE

Here is a list of FreeBSD 4.x-RELEASE drivers and their conditions:

	driver		multicast
	---		---
	(Ethernet)
	lnc/vmware	ok

2.6 OpenBSD 2.x

Here is a list of OpenBSD 2.x drivers and their conditions:

	driver		mbuf(1)		multicast(2)	official support?
	---		---		---		---
	(Ethernet)
	de pci/i386	ok		ok		yes
	fxp pci/i386	?(*)
	le sbus/sparc	ok		ok		yes
	ne pci/i386	ok		ok		yes
	ne pcmcia/i386	ok		ok		yes
	wi pcmcia/i386	ok		ok		yes

(*) There seems to be some problem in the driver with multicast filter
configuration.  This happens with certain revisions of the chipset on the card.
It should be fixed by now by a workaround in sys/net/if.c, but we are not sure.

2.7 BSD/OS 4.x

The following lists BSD/OS 4.x device drivers and their conditions:

	driver	mbuf(1)		multicast(2)	official support?
	---	---		---		---
	(Ethernet)
	de	ok		ok		yes
	exp	(*)

You may want to use the "@insert" directive in /etc/pccard.conf to invoke
the "rtsol" command right after dynamic insertion of PCMCIA ethernet cards.

(*) The exp driver has a serious conflict with the KAME initialization sequence.
A workaround is committed into sys/i386/pci/if_exp.c, and it should be okay by
now.

3. Translator

We categorize IPv4/IPv6 translators into 4 types.

Translator A --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv6 host in an IPv6
island to an IPv4 host in the IPv4 ocean.

Translator B --- It is used in the early stage of transition to make
it possible to establish a connection from an IPv4 host in the IPv4
ocean to an IPv6 host in an IPv6 island.

Translator C --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv4 host in an IPv4 island
to an IPv6 host in the IPv6 ocean.

Translator D --- It is used in the late stage of transition to make it
possible to establish a connection from an IPv6 host in the IPv6 ocean
to an IPv4 host in an IPv4 island.

KAME provides a TCP relay translator for category A.  This is called
"FAITH".  We also provide an IP header translator for category A.

3.1 FAITH TCP relay translator

The FAITH system uses a TCP relay daemon called "faithd", assisted by the
KAME kernel.  FAITH reserves an IPv6 address prefix, and relays TCP
connections toward that prefix to an IPv4 destination.

For example, if the reserved IPv6 prefix is 3ffe:0501:0200:ffff::, and
the IPv6 destination for a TCP connection is 3ffe:0501:0200:ffff::163.221.202.12,
the connection will be relayed toward IPv4 destination 163.221.202.12.

	destination IPv4 node (163.221.202.12)
	  ^
	  | IPv4 TCP toward 163.221.202.12
	FAITH-relay dual stack node
	  ^
	  | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
	source IPv6 node

faithd must be invoked on the FAITH-relay dual stack node.
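
The address mapping in the example above can be sketched as follows.  This
is not faithd code: faith_map_v4() is a hypothetical helper, and it assumes,
as the example suggests, that the IPv4 destination occupies the low-order
32 bits of the IPv6 destination, with a reserved prefix of at most 96 bits.

```c
#include <string.h>
#include <netinet/in.h>

/*
 * If dst lies inside the FAITH prefix (prefix/plen), copy the IPv4
 * destination carried in its low-order 32 bits into *v4 and return 1;
 * otherwise return 0.  plen must be between 0 and 96.
 */
static int faith_map_v4(const struct in6_addr *dst,
    const struct in6_addr *prefix, int plen, struct in_addr *v4)
{
	int full, rest;

	if (plen < 0 || plen > 96)
		return 0;
	full = plen / 8;
	rest = plen % 8;
	if (memcmp(dst->s6_addr, prefix->s6_addr, (size_t)full) != 0)
		return 0;
	if (rest != 0) {
		unsigned char mask = (unsigned char)(0xff << (8 - rest));
		if ((dst->s6_addr[full] & mask) != (prefix->s6_addr[full] & mask))
			return 0;
	}
	/* bytes stay in network byte order */
	memcpy(&v4->s_addr, &dst->s6_addr[12], 4);
	return 1;
}
```

With the prefix 3ffe:0501:0200:ffff:: and an assumed prefix length of 96,
the destination 3ffe:0501:0200:ffff::163.221.202.12 yields 163.221.202.12.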

For more details, consult kame/kame/faithd/README and
draft-ietf-ngtrans-tcpudp-relay-04.txt.

3.2 IPv6-to-IPv4 header translator

(to be written)

4. IPsec

IPsec is implemented as the following three components.

(1) Policy Management
(2) Key Management
(3) AH, ESP and IPComp handling in kernel

Note that KAME/OpenBSD does NOT include support for KAME IPsec code,
as the OpenBSD team has their home-brew IPsec stack and they have no plan
to replace it.  IPv6 support for IPsec is, therefore, lacking on KAME/OpenBSD.

http://www.netbsd.org/Documentation/network/ipsec/ has more information
including usage examples.

4.1 Policy Management

The kernel implements experimental policy management code.  There are two ways
to manage security policy.  One is to configure per-socket policy using
setsockopt(2).  In this case, the policy configuration syntax is described in
ipsec_set_policy(3).  The other is to configure kernel packet filter-based
policy using the PF_KEY interface, via setkey(8).

Policy entries are matched in order, so the order of entries makes a
difference in behavior.
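
To make the ordering rule concrete, here is a first-match lookup over a
hypothetical, heavily simplified policy table.  It is not the KAME SPD code,
which matches on full selectors (source and destination, ports, direction);
struct spd_entry and spd_lookup() are invented for illustration, and the
"no match means bypass" default is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical, much simplified policy entry: destination prefix plus
 * upper-layer protocol (0 = any).  Real SPD entries carry more selectors. */
enum spd_action { SPD_BYPASS, SPD_DISCARD, SPD_IPSEC };

struct spd_entry {
	struct in6_addr dst;	/* destination prefix */
	int plen;		/* prefix length, 0..128 */
	int proto;		/* IPPROTO_xxx, or 0 for any */
	enum spd_action action;
};

static int prefix_match(const struct in6_addr *a, const struct in6_addr *p,
    int plen)
{
	int full = plen / 8, rest = plen % 8;
	unsigned char mask;

	if (memcmp(a->s6_addr, p->s6_addr, (size_t)full) != 0)
		return 0;
	if (rest == 0)
		return 1;
	mask = (unsigned char)(0xff << (8 - rest));
	return (a->s6_addr[full] & mask) == (p->s6_addr[full] & mask);
}

/* Walk the entries in order; the first one that matches decides. */
static enum spd_action spd_lookup(const struct spd_entry *spd, size_t n,
    const struct in6_addr *dst, int proto)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (spd[i].proto != 0 && spd[i].proto != proto)
			continue;
		if (prefix_match(dst, &spd[i].dst, spd[i].plen))
			return spd[i].action;
	}
	return SPD_BYPASS;	/* assumption: no entry means no IPsec */
}
```

With a catch-all ::/0 entry placed first, a more specific entry such as
2001::/16 placed after it is never consulted; reversing the two restores the
intended behavior.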

4.2 Key Management

The key management code implemented in this kit (sys/netkey) is a
home-brew PFKEY v2 implementation.  This conforms to RFC2367.

The home-brew IKE daemon, "racoon", is included in the kit (kame/kame/racoon,
or usr.sbin/racoon).
Basically you'll need to run racoon as a daemon, then set up a policy
to require keys (like ping -P 'out ipsec esp/transport//use').
The kernel will contact the racoon daemon as necessary to exchange keys.

In the IKE spec, there's ambiguity about the interpretation of a "tunnel"
proposal.  For example, if we would like to propose the use of the following
packet:
	IP AH ESP IP payload
some implementations propose it as "AH transport and ESP tunnel", since
this is more logical from the packet construction point of view.  Some
implementations propose it as "AH tunnel and ESP tunnel".
Racoon follows the latter route (previously it followed the former, and
the latter interpretation seems to be popular/consensus).
This raises a real interoperability issue.  We hope it will be resolved quickly.

racoon does not implement byte lifetime for either phase 1 or phase 2
(RFC2409 page 35, Life Type = kilobytes).

4.3 AH and ESP handling

The IPsec module is implemented as "hooks" to the standard IPv4/IPv6
processing.  When sending a packet, ip{,6}_output() checks if ESP/AH
processing is required by checking if a matching SPD (Security
Policy Database) entry is found.  If ESP/AH is needed,
{esp,ah}{4,6}_output() will be called and the mbuf will be updated
accordingly.  When a packet is received, {esp,ah}4_input() will be
called based on the protocol number, i.e. (*inetsw[proto])().
{esp,ah}4_input() will decrypt/check the authenticity of the packet,
and strip off the daisy-chained header and padding for ESP/AH.  It is
safe to strip off the ESP/AH header on packet reception, since we
will never use the received packet in "as is" form.

When ESP/AH is used, the TCP4/6 effective data segment size will be affected
by extra daisy-chained headers inserted by ESP/AH.  Our code takes care of
this case.

Basic crypto functions can be found in the directory "sys/crypto".  ESP/AH
transforms are listed in {esp,ah}_core.c with wrapper functions.  If you
wish to add an algorithm, add a wrapper function in {esp,ah}_core.c, and
add your crypto algorithm code into sys/crypto.

Tunnel mode works basically fine, but comes with the following restrictions:
- You cannot run a routing daemon across an IPsec tunnel, since we do not
  model IPsec tunnels as pseudo interfaces.
- The authentication model for AH tunnels must be revisited.  We'll need to
  improve the policy management engine, eventually.
- Path MTU discovery does not work across an IPv6 IPsec tunnel gateway due to
  insufficient code.

The AH specification does not say much about the "multiple AH on a packet"
case.  We incrementally compute the AH checksum, from inside to outside.
Also, we treat the inner AH as immutable.
For example, if we are to create the following packet:
	IP AH1 AH2 AH3 payload
we do it incrementally.  As a result, we get crypto checksums like below:
	AH3 has checksum against "IP AH3' payload".
		where AH3' = AH3 with checksum field filled with 0.
	AH2 has checksum against "IP AH2' AH3 payload".
	AH1 has checksum against "IP AH1' AH2 AH3 payload".
Also note that AH3 has the smallest sequence number, and AH1 has the largest
sequence number.

To avoid traffic analysis on shorter packets, ESP output logic supports
random length padding.  By setting net.inet.ipsec.esp_randpad (or
net.inet6.ipsec6.esp_randpad) to a positive value N, you can ask the kernel
to randomly pad packets shorter than N bytes to a random length smaller than
or equal to N.  Note that N does not include the ESP authentication data
length.  Also note that the random padding is not included in the TCP segment
size computation.  A negative value turns off the functionality.
Recommended values for N are around 128 or 256.  If you use too big a number
as N, you may experience inefficiency due to fragmented packets.
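
A hedged sketch of the padding arithmetic this implies is shown below.  It
assumes that "length" means the ESP payload plus the two fixed trailer bytes
(pad length and next header), and that extra random padding is added in
whole cipher blocks so that ESP's alignment rule still holds.  esp_padlen()
is a hypothetical helper, not the kernel's function, and the random number is
passed in by the caller so the arithmetic stays deterministic.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the esp_randpad arithmetic.
 * len:     ESP payload length in bytes
 * blk:     cipher block size (assumed to be between 4 and 256)
 * randpad: the sysctl value N; <= 0 disables random padding
 * r:       a random number supplied by the caller
 * Returns the number of ESP padding bytes (0..255).
 */
static size_t esp_padlen(size_t len, size_t blk, int randpad, unsigned int r)
{
	/* minimum padding so that payload + padding + 2 trailer bytes
	 * (pad length, next header) is a multiple of blk */
	size_t pad = (blk - (len + 2) % blk) % blk;

	if (randpad > 0 && len < (size_t)randpad) {
		size_t total = len + pad + 2;
		/* whole blocks that still fit under N */
		size_t room = total < (size_t)randpad ?
		    ((size_t)randpad - total) / blk : 0;

		pad += (r % (room + 1)) * blk;
	}
	while (pad > 255)	/* the pad length field is one byte */
		pad -= blk;
	return pad;
}
```

With an 8-byte block cipher and N = 64, a 10-byte payload gets 4 bytes of
padding at minimum and 52 bytes at most, so payload plus trailer ranges from
16 to 64 bytes in steps of 8.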

4.4 IPComp handling

IPComp stands for IP payload compression protocol.  It is aimed at
payload compression, not header compression like PPP VJ compression.
This may be useful when you are using a slow serial link (say, a cell phone)
with a powerful CPU (well, recent notebook PCs are really powerful...).
The protocol design of IPComp is very similar to IPsec, though it was
defined separately from IPsec itself.

Here are some points to be noted:
- IPComp is treated as part of the IPsec protocol suite, and the SPI and
  CPI spaces are unified.  The spec says that there's no relationship
  between the two, so they are assumed to be separate in the specs.
- IPComp association (IPCA) is kept in the SAD.
- It is possible to use a well-known CPI (CPI=2 for DEFLATE, for example)
  for outbound/inbound packets, but for indexing purposes one element from
  the SPI/CPI space will be occupied anyway.
- pfkey is modified to support IPComp.  However, there's no official
  SA type number assignment yet.  Portability with other IPComp
  stacks is questionable (anyway, who else implements IPComp on UN*X?).
- The spec says that IPComp output processing must be performed before AH/ESP
  output processing, to achieve a better compression ratio and "stir" the
  data stream before encryption.  The most meaningful processing order is:
  (1) compress payload by IPComp, (2) encrypt payload by ESP, then (3) attach
  authentication data by AH.
  However, with manual SPD settings, you are able to violate the ordering
  (KAME code is too generic, maybe).  Also, it is just okay to use IPComp
  alone, without AH/ESP.
- Though the packet size can be significantly decreased by using IPComp, no
  special consideration is made about path MTU (the spec says nothing about
  MTU consideration).  IPComp is designed for serial links, not ethernet-like
  media, it seems.
- You can change the compression ratio on outbound packets by changing
  deflate_policy in sys/netinet6/ipcomp_core.c.  You can also change the
  outbound history buffer size by changing deflate_window_out in the same
  source code.  (Should it be sysctl accessible, or per-SAD configurable?)
- Tunnel mode IPComp is not working right.  A KAME box can generate tunneled
  IPComp packets, but cannot accept tunneled IPComp packets.
- You can negotiate IPComp associations with the racoon IKE daemon.
- KAME code does not attach an Adler32 checksum to compressed data.
  See the ipsec wg mailing list discussion in Jan 2000 for details.

4.5 Conformance to RFCs and IDs

The IPsec code in the kernel conforms (or tries to conform) to the
following standards:
    "old IPsec" specification documented in rfc182[5-9].txt
    "new IPsec" specification documented in:
	rfc240[1-6].txt rfc241[01].txt rfc2451.txt rfc3602.txt
    IPComp:
	RFC2393: IP Payload Compression Protocol (IPComp)
IKE specifications (rfc240[7-9].txt) are implemented in userland
as the "racoon" IKE daemon.

Currently supported algorithms are:
    old IPsec AH
	null crypto checksum (no document, just for debugging)
	keyed MD5 with 128bit crypto checksum (rfc1828.txt)
	keyed SHA1 with 128bit crypto checksum (no document)
	HMAC MD5 with 128bit crypto checksum (rfc2085.txt)
	HMAC SHA1 with 128bit crypto checksum (no document)
	HMAC RIPEMD160 with 128bit crypto checksum (no document)
    old IPsec ESP
	null encryption (no document, similar to rfc2410.txt)
	DES-CBC mode (rfc1829.txt)
    new IPsec AH
	null crypto checksum (no document, just for debugging)
	keyed MD5 with 96bit crypto checksum (no document)
	keyed SHA1 with 96bit crypto checksum (no document)
	HMAC MD5 with 96bit crypto checksum (rfc2403.txt)
	HMAC SHA1 with 96bit crypto checksum (rfc2404.txt)
	HMAC SHA2-256 with 96bit crypto checksum (no document)
	HMAC SHA2-384 with 96bit crypto checksum (no document)
	HMAC SHA2-512 with 96bit crypto checksum (no document)
	HMAC RIPEMD160 with 96bit crypto checksum (RFC2857)
	AES XCBC MAC with 96bit crypto checksum (RFC3566)
    new IPsec ESP
	null encryption (rfc2410.txt)
	DES-CBC with derived IV
		(draft-ietf-ipsec-ciph-des-derived-01.txt, draft expired)
	DES-CBC with explicit IV (rfc2405.txt)
	3DES-CBC with explicit IV (rfc2451.txt)
	BLOWFISH CBC (rfc2451.txt)
	CAST128 CBC (rfc2451.txt)
	RIJNDAEL/AES CBC (rfc3602.txt)
	AES counter mode (draft-ietf-ipsec-ciph-aes-ctr-03.txt)

	each of the above can be combined with:
	    ESP authentication with HMAC-MD5(96bit)
	    ESP authentication with HMAC-SHA1(96bit)
    IPComp
	RFC2394: IP Payload Compression Using DEFLATE

The following algorithms are NOT supported:
    old IPsec AH
	HMAC MD5 with 128bit crypto checksum + 64bit replay prevention
		(rfc2085.txt)
	keyed SHA1 with 160bit crypto checksum + 32bit padding (rfc1852.txt)

The key/policy management API is based on the following document, with a fair
amount of extensions:
	RFC2367: PF_KEY key management API

4.6 ECN consideration on IPsec tunnels

KAME IPsec implements an ECN-friendly IPsec tunnel, described in
draft-ietf-ipsec-ecn-02.txt.
The normal IPsec tunnel is described in RFC2401.  On encapsulation, the
IPv4 TOS field (or IPv6 traffic class field) will be copied from the inner
IP header to the outer IP header.  On decapsulation, the outer IP header
will simply be dropped.  The decapsulation rule is not compatible
with ECN, since the ECN bits in the outer IP TOS/traffic class field will be
lost.
To make the IPsec tunnel ECN-friendly, we should modify the encapsulation
and decapsulation procedures.  This is described in
draft-ietf-ipsec-ecn-02.txt, chapter 3.3.

The KAME IPsec tunnel implementation can give you three behaviors, by setting
net.inet.ipsec.ecn (or net.inet6.ipsec6.ecn) to one of the following values:
- RFC2401: no consideration for ECN (sysctl value -1)
- ECN forbidden (sysctl value 0)
- ECN allowed (sysctl value 1)
Note that the behavior is configurable per node, not per SA
(draft-ietf-ipsec-ecn-02 wants per-SA configuration, but it looks too much
for me).

The behavior is summarized as follows (see source code for more detail):

		encapsulate			decapsulate
		---				---
RFC2401		copy all TOS bits		drop TOS bits on outer
		from inner to outer.		(use inner TOS bits as is)

ECN forbidden	copy TOS bits except for ECN	drop TOS bits on outer
		(masked with 0xfc) from inner	(use inner TOS bits as is)
		to outer.  set ECN bits to 0.

ECN allowed	copy TOS bits except for ECN	use inner TOS bits with some
		CE (masked with 0xfe) from	change.  if outer ECN CE bit
		inner to outer.			is 1, enable ECN CE bit on
		set ECN CE bit to 0.		the inner.

General strategy for configuration is as follows:
- if both IPsec tunnel endpoints are capable of ECN-friendly behavior,
  you'd better configure both ends to "ECN allowed" (sysctl value 1).
- if the other end is very strict about TOS bits, use "RFC2401"
  (sysctl value -1).
- in other cases, use "ECN forbidden" (sysctl value 0).
The default behavior is "ECN forbidden" (sysctl value 0).
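
The table above boils down to a few bit operations on the TOS/traffic class
byte.  The sketch below restates it in C using the RFC2481 bit assignment
(ECT = 0x02, CE = 0x01); the constant and function names are hypothetical,
and the KAME sources listed below remain the authority.

```c
#include <stdint.h>

/* ECN bits in the IPv4 TOS / IPv6 traffic class byte (RFC2481 layout). */
#define TOS_ECT	0x02
#define TOS_CE	0x01

/* Modes matching the net.inet.ipsec.ecn sysctl values described above. */
enum ecn_mode { ECN_RFC2401 = -1, ECN_FORBIDDEN = 0, ECN_ALLOWED = 1 };

/* Encapsulation: derive the outer TOS byte from the inner one. */
static uint8_t ecn_encap(enum ecn_mode mode, uint8_t inner)
{
	switch (mode) {
	case ECN_ALLOWED:
		return (uint8_t)(inner & ~TOS_CE);		/* mask 0xfe */
	case ECN_FORBIDDEN:
		return (uint8_t)(inner & ~(TOS_ECT | TOS_CE));	/* mask 0xfc */
	default:
		return inner;			/* RFC2401: copy all bits */
	}
}

/* Decapsulation: the outer header is dropped; only in "ECN allowed"
 * mode does a CE mark on the outer header carry over to the inner one. */
static uint8_t ecn_decap(enum ecn_mode mode, uint8_t outer, uint8_t inner)
{
	if (mode == ECN_ALLOWED && (outer & TOS_CE))
		return (uint8_t)(inner | TOS_CE);
	return inner;
}
```

For example, an inner TOS byte of 0x13 becomes 0x13, 0x10 or 0x12 on the
outer header in RFC2401, ECN forbidden and ECN allowed modes respectively; on
the way back, a CE mark on the outer header reaches the inner header only in
ECN allowed mode.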
1983
1984For more information, please refer to:
1985	draft-ietf-ipsec-ecn-02.txt
1986	RFC2481 (Explicit Congestion Notification)
1987	KAME sys/netinet6/{ah,esp}_input.c
1988
1989(Thanks goes to Kenjiro Cho <kjc@csl.sony.co.jp> for detailed analysis)
1990
4.7 Interoperability

IPsec, IPComp (in the kernel) and IKE (in userland, as "racoon") have been
tested at several interoperability test events, and are known to
interoperate well with many other implementations.  Also, KAME IPsec covers
a fairly wide range of the IPsec crypto algorithms documented in RFCs (we do
not cover algorithms with intellectual property issues, though).

Here are some of the platforms with which we have tested IPsec/IKE
interoperability in the past, in no particular order.  Note that both ends
(KAME and others) may have modified their implementations since, so use the
following list for reference purposes only.
	ACC, allied-telesis, Altiga, Ashley-laurent (vpcom.com), BlueSteel,
	CISCO IOS, Cryptek, Checkpoint FW-1, Data Fellows (F-Secure),
	Ericsson, Fitel, FreeS/WAN, HiFn, HITACHI, IBM AIX, IIJ, Intel Canada,
	Intel Packet Protect, MEW NetCocoon, MGCS, Microsoft WinNT/2000,
	NAI PGPnet, NetLock, NIST (linux IPsec + plutoplus), NEC IX5000,
	Netscreen, NxNetworks, OpenBSD isakmpd, Pivotal, Radguard, RapidStream,
	RedCreek, Routerware, RSA, SSH (both IPv4/IPv6), Secure Computing,
	Soliton, Sun Solaris8, TIS/NAI Gauntlet, Toshiba, VPNet,
	Yamaha RT series

Here are some of the platforms with which we have tested IPComp/IKE
interoperability in the past, in no particular order.
	IRE, SSH (both IPv4/IPv6), NetLock

VPNC (vpnc.org) provides IPsec conformance tests, using the KAME and OpenBSD
IPsec/IKE implementations.  Their test results are available at
http://www.vpnc.org/conformance.html, and may give you a better idea of
which implementations interoperate with the KAME IPsec/IKE implementation.

4.8 Operations with IPsec tunnel mode

First of all, IPsec tunnel mode is a very hairy thing.  It seems to do neat
things like VPN configuration or secure remote access; however, it comes
with lots of architectural twists.

RFC2401 defines IPsec tunnel mode within the context of IPsec.  RFC2401
defines tunnel mode packet encapsulation/decapsulation on its own, and
does not refer to other tunnelling specifications.  Since RFC2401 advocates
filter-based SPD (Security Policy Database) matching, it would be natural
for us to implement IPsec tunnel mode as filters, not as pseudo interfaces.

Some people are trying to separate IPsec "tunnel mode" from IPsec itself.
They would like to implement IPsec transport mode only, and combine it with
tunneling pseudo devices.  The prime example is found in
draft-touch-ipsec-vpn-01.txt.  However, if you really define pseudo
interfaces separately from IPsec, IKE daemons would need to negotiate
transport mode SAs instead of tunnel mode SAs.  Therefore, we cannot
really mix the RFC2401-based interpretation and the
draft-touch-ipsec-vpn-01.txt interpretation.

The KAME stack can be configured in two ways.  You may need to recompile
your kernel to switch the behavior.
- RFC2401 IPsec tunnel mode approach (4.8.1)
- draft-touch-ipsec-vpn approach (4.8.2)
	Works in all kernel configurations, but racoon(8) may not interoperate.

There are pros and cons to these approaches:

RFC2401 IPsec tunnel mode (filter-like) approach
	PRO: SPD lookup fits nicely with packet filters (if you integrate them)
	CON: cannot run routing daemons across IPsec tunnels
	CON: it is very hard to control source address selection for
		locally originated traffic
	???: IPv6 scope zone is kept the same
draft-touch-ipsec-vpn (transport mode + pseudo-interface) approach
	PRO: can run routing daemons across IPsec tunnels
	PRO: source address selection can be done normally, by looking at
		IPsec tunnel pseudo devices
	CON: on outbound, infinite loops are possible if the routing setup
		is wrong
	CON: due to differences in encap/decap logic from RFC2401, it may not
		interoperate with very picky RFC2401 implementations
		(those that check TOS bits, for example)
	CON: cannot negotiate IKE with other IPsec tunnel-mode devices
		(the other end has to implement draft-touch-ipsec-vpn as well)
	???: IPv6 scope zone is likely to be different from that of the real
		ethernet interface

The recommendation differs depending on your situation:
- use draft-touch-ipsec-vpn if you have control over the other end.
  This one is the best in terms of simplicity.
- if the other end is a normal IPsec device with an RFC2401 implementation,
  you need to use RFC2401, otherwise you won't be able to run IKE.
- use the RFC2401 approach if you just want to forward packets back and
  forth and there is no plan to use the IPsec gateway itself as an
  originating device.

4.8.1 RFC2401 IPsec tunnel mode approach

To configure your device as an RFC2401 IPsec tunnel mode endpoint, use the
"tunnel" keyword in setkey(8) "spdadd" directives.  Let us assume the
following topology (A and B could be networks, like prefix/length):

	((((((((((((The internet))))))))))))
	  |			  |
	  |C (global)		  |D
	your device		peer's device
	  |A (private)		  |B
	==+===== VPN net	==+===== VPN net

The policy configuration directives look like this.  You will need manual
SAs, or an IKE daemon, for actual encryption:

	# setkey -c <<EOF
	spdadd A B any -P out ipsec esp/tunnel/C-D/use;
	spdadd B A any -P in ipsec esp/tunnel/D-C/use;
	EOF

Inbound/outbound traffic is monitored/captured by the SPD engine, which
works just like a packet filter.

With this, the forwarding case should work flawlessly.  However, trouble
arises when you have one of the following requirements:
- When you originate traffic from your VPN gateway device to the VPN net on
  the other end (like B), you want your source address to be A (private
  side) so that the traffic will be protected by the policy.
  With this approach, however, the source address selection logic follows
  the normal routing table, and C (global side) will be picked for any
  outgoing traffic, even if the destination is B.  The resulting packet will
  look like this:
	IP[C -> B] payload
  and will not match the policy (= it is sent in the clear).
- You may want to run routing protocols on top of the IPsec tunnel, but this
  is not possible.  As there is no pseudo device that identifies the IPsec
  tunnel, you cannot identify where the routing information came from.  As a
  result, you can't run routing daemons.
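To see why the C -> B packet escapes protection, it helps to model the
outbound SPD lookup as a packet filter.  The C sketch below is a simplified,
hypothetical model (IPv4 only, prefix matching on source and destination,
first match wins, no upper-layer protocol checks); the struct and function
names are made up for illustration and do not correspond to the KAME kernel
code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified outbound SPD entry (IPv4 only). */
struct spd_entry {
	uint32_t src, src_mask;		/* host byte order */
	uint32_t dst, dst_mask;
	const char *action;		/* e.g. "esp/tunnel/C-D/use" */
};

static int
prefix_match(uint32_t addr, uint32_t net, uint32_t mask)
{
	return ((addr & mask) == (net & mask));
}

/* Return the first entry matching (src, dst), or NULL (= sent in clear). */
const struct spd_entry *
spd_lookup(const struct spd_entry *spd, size_t n, uint32_t src, uint32_t dst)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (prefix_match(src, spd[i].src, spd[i].src_mask) &&
		    prefix_match(dst, spd[i].dst, spd[i].dst_mask))
			return (&spd[i]);
	}
	return (NULL);
}
```

With a single entry for A = 10.0.1.0/24 and B = 10.0.2.0/24 (made-up
prefixes), a packet from 10.0.1.1 to 10.0.2.5 matches the entry, while a
packet whose source was selected as the global address C (say 192.0.2.1)
gets NULL, i.e. it would leave in the clear.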

4.8.2 draft-touch-ipsec-vpn approach

With this approach, you will configure gif(4) tunnel interfaces, as well as
IPsec transport mode SAs.

	# gifconfig gif0 C D
	# ifconfig gif0 A B
	# setkey -c <<EOF
	spdadd C D any -P out ipsec esp/transport//use;
	spdadd D C any -P in ipsec esp/transport//use;
	EOF

Since we have a pseudo-interface "gif0", and it affects the routes and
the source address selection logic, packets originated by the VPN gateway
toward B (and the VPN cloud) can have source address A.
We can also exchange routing information over the tunnel (gif0), as the
tunnel is represented as a pseudo interface (dynamic routes point to the
pseudo interface).
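The effect of gif0 on source address selection can be illustrated with a
toy model in which the source address is simply the address of the outgoing
interface chosen by a longest-prefix-match route lookup.  This is a
deliberate simplification (real source address selection has more rules),
and the route table, interface names and addresses are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical route entry: prefix/mask -> outgoing interface. */
struct toy_route {
	uint32_t prefix, mask;		/* host byte order, contiguous mask */
	const char *ifname;
	uint32_t ifaddr;		/* address assigned to ifname */
};

/* Longest-prefix match; returns NULL if no route matches. */
const struct toy_route *
route_lookup(const struct toy_route *rt, size_t n, uint32_t dst)
{
	const struct toy_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((dst & rt[i].mask) != (rt[i].prefix & rt[i].mask))
			continue;
		/* with contiguous masks, a larger mask is a longer prefix */
		if (best == NULL || rt[i].mask > best->mask)
			best = &rt[i];
	}
	return (best);
}

/* Toy source address selection: use the outgoing interface's address. */
uint32_t
select_source(const struct toy_route *rt, size_t n, uint32_t dst)
{
	const struct toy_route *r = route_lookup(rt, n, dst);

	return (r != NULL ? r->ifaddr : 0);
}
```

With only a default route through a hypothetical global interface ne0
(address C, say 192.0.2.1), traffic to B gets source C.  Adding a more
specific route for B (10.0.2.0/24) through gif0, whose local address is A
(say 10.0.1.1), makes select_source() return A for B, while other
destinations still get C.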

There is a big drawback, however: with this approach, you can use IKE if and
only if the other end is using the draft-touch-ipsec-vpn approach too.  Since
racoon(8) grabs phase 2 IKE proposals from the kernel SPD, you will be
negotiating IPsec transport-mode SAs with the other end, not tunnel-mode SAs.
Also, since the encapsulation mechanism is different from RFC2401, you may
not be able to interoperate with picky RFC2401 implementations: if the other
end checks certain outer IP header fields (like TOS), you will not be able
to interoperate.


5. ALTQ

The KAME kit includes ALTQ 2.1 code, which supports FreeBSD2, FreeBSD3,
NetBSD and OpenBSD.  ALTQ does not work on BSD/OS.
ALTQ in KAME supports (or tries to support) IPv6.
(Actually, ALTQ has been developed in the KAME repository since ALTQ 2.1,
Jan 2000.)

ALTQ occupies a single character device number.  For FreeBSD, it is
officially allocated.  For OpenBSD and NetBSD, we use a number that is not
currently allocated (it will eventually get an official number).
The character device is enabled for the i386 architecture only.  To enable it
and compile an ALTQ-ready kernel for other architectures, take the following
steps:
- assume that your architecture is FOOBAA.
- modify sys/arch/FOOBAA/FOOBAA/conf.c (or wherever cdevsw is defined)
  to include a line for ALTQ.  Look at sys/arch/i386/i386/conf.c for an
  example.  The major number must be the same as in the i386 case.
- copy a kernel configuration file (like ALTQ.v6 or GENERIC.v6) from i386,
  and modify it accordingly.
- build a kernel.
- before building userland, change netbsd/{lib,usr.sbin,usr.bin}/Makefile
  (or openbsd/foobaa) so that it will visit the altq-related subdirectories.

6. mobile-ip6

6.1 KAME node as correspondent node

The default installation recognizes the home address option (in the
destination options header).  No sub-options are supported.  Interaction
with IPsec and/or the 2292bis API needs further study.

6.2 KAME node as home agent/mobile node

The KAME kit includes the Ericsson mobile-ip6 code.  The integration has just
started (in Feb 2000), and we will need some more time to integrate it better.

See kame/mip6config/{QUICKSTART,README_MIP6.txt} for more details.

The Ericsson code implements revision 09 of the mobile-ip6 draft.  There
are other implementations available:
	NEC: http://www.6bone.nec.co.jp/mipv6/internal-dist/ (-13 draft)
	SFC: http://neo.sfc.wide.ad.jp/~mip6/ (-13 draft)

7. Coding style

The KAME developers basically do not make a fuss about coding
style.  However, there is still some agreement on the style, in order
to make distributed development smooth.

- follow *BSD KNF where possible.  Note: there are multiple KNF standards.
- the tab character should be 8 columns wide (tabstops are at columns 8,
  16, 24, ...).  With vi, use ":set ts=8 sw=8".
  With GNU Emacs 20 and later, the easiest way is to use the "bsd" style of
  cc-mode with the variable "c-basic-offset" set to 8:
  (add-hook 'c-mode-common-hook
	    (function
	     (lambda ()
	       (c-set-style "bsd")
	       (setq c-basic-offset 8)  ; XXX for Emacs 20 only
	       )))
  The "bsd" style in GNU Emacs 21 sets the variable to 8 by default,
  so the line marked by "XXX" is not necessary if you only use GNU
  Emacs 21.
- each line should be within 80 characters.
- keep a single open/close bracket in a comment such as in the following
  line:
	putchar('(');	/* ) */
  without this, some vi users would have a hard time matching a pair of
  brackets.  Although this type of bracket seems clumsy and is even
  harmful for some other vi users and Emacs users, the KAME developers
  have agreed to allow it.
- add the following line to the head of every KAME-derived file:
  /*	(dollar)KAME(dollar)	*/
  where "(dollar)" is the dollar character ($), and the separators around
  the tag are tabs.
  (This is for C.  For other languages, use the language's own comment
  syntax.)
  Once committed to the CVS repository, this line will contain the file's
  version number (see, for example, the top of this file).  This
  makes it easy to report bugs.
- when creating a new file with the WIDE copyright, run "make copyright.c"
  at the top level, and use copyright.c as a template.  The KAME RCS tag
  will be included automatically.
- when editing a third-party package, keep its own coding style as
  much as possible, even if the style does not follow the items above.
- it is recommended to always wrap an expression containing
  bitwise operators in parentheses, especially when the expression is
  combined with relational operators, in order to avoid unintentional
  mismatch of operators.  Thus, we should write
	if ((a & b) == 0)	/* (A) */
  or
	if (a & (b == 0))	/* (B) */
  instead of
	if (a & b == 0)		/* (C) */
  even if the programmer's intention was (C), which is equivalent to
  (B) according to the grammar of the C language.
  Thus, we should write code to test whether a bit-flag is set in a
  given variable as follows:
	if ((flag & FLAG_A) == 0)	/* (D) FLAG_A is NOT set */
	if ((flag & FLAG_A) != 0)	/* (E) FLAG_A is set */
  Some developers in the KAME project prefer the following style instead:
	if (!(flag & FLAG_A))	/* (F) FLAG_A is NOT set */
	if ((flag & FLAG_A))	/* (G) FLAG_A is set */
  because it would be more intuitive in terms of the relationship
  between the negation operator (!) and the semantics of the
  condition.  The KAME developers have discussed the style, and have
  agreed that all the styles from (D) to (G) are valid.  So, when you
  see styles like (D) and (E) in the KAME code and they look a bit
  strange to you, please just keep them.  They are intentional.
- When inserting a separate block just to define some intra-block
  variables, add a level of indentation as if the block were in a
  control statement such as if-else, for, or while.  For example,
	foo ()
	{
		int a;

		{
			int internal_a;
			...
		}
	}
  should be used, instead of
	foo ()
	{
		int a;

	    {
		int internal_a;
		...
	     }
	}
- Do not use printf() or log() in the packet input path of the kernel code.
  They can make the system vulnerable to packet flooding attacks (resulting
  in /var overflow).
- (not a style issue)
  To disable a module that was mistakenly imported into CVS, just
  remove the source tree in the repository.  Note, however, that the
  removal might annoy other developers who have already checked the
  module out, so you should announce the removal as soon as possible.
  Also, be 100% sure not to remove other modules.
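The precedence rule behind styles (A) to (C) in the list above can be
checked with a small C sketch.  Because == binds more tightly than & in C,
(C) parses as (B), not as (A).  The helper names and the FLAG_A value below
are hypothetical; note that many compilers (GCC with -Wall, for instance)
warn about the unparenthesized form in style_c().

```c
#include <stdbool.h>

#define FLAG_A	0x04	/* hypothetical flag bit */

/* (C) as written: parsed as a & (b == 0), rarely what was meant. */
int
style_c(int a, int b)
{
	return (a & b == 0);
}

/* (B): the same expression with the grouping spelled out. */
int
style_b(int a, int b)
{
	return (a & (b == 0));
}

/* (A): tests whether a and b have no bits in common. */
int
style_a(int a, int b)
{
	return ((a & b) == 0);
}

/* (E) and (G) are equivalent ways to test whether FLAG_A is set. */
bool
flag_a_set_e(int flag)
{
	return ((flag & FLAG_A) != 0);
}

bool
flag_a_set_g(int flag)
{
	return ((flag & FLAG_A));
}
```

For example, style_a(4, 2) returns 1, since 4 and 2 share no bits, whereas
style_c(4, 2) returns 0 because it computes 4 & (2 == 0), exactly like
style_b(4, 2).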

When you want to contribute something to the KAME project, and if *you
do not mind* the agreement, it would be helpful for the project if you
followed these rules.  Note, however, that we never intend to force
you to adopt our rules.  We would rather respect your own style,
especially when you have a policy about style.


9. Policy on technology with intellectual property right restriction

There are quite a few IETF documents (and others) that have intellectual
property right (IPR) restrictions.  KAME's stance is stated below.

    The goal of KAME is to provide freely redistributable, BSD-licensed
    implementations of Internet protocol technologies.
    For this purpose, we implement protocols that (1) do not need a license
    contract with the IPR holder, and (2) are royalty-free.
    The reason for (1) is that, even if KAME contracts with the IPR holder
    in question, the users of the KAME stack (usually implementers of some
    other codebase) would need to make a license contract with the IPR
    holder.  That would damage the "freely redistributable" status of the
    KAME codebase.

    By doing so, KAME is (implicitly) trying to advocate the
    no-license-contract, royalty-free release of IPRs.

Note, however, that as documented in README, we do not guarantee that KAME
code is free of IPR infringement; you MUST check it if you are to integrate
KAME into your product (or whatever):
    READ CAREFULLY: Several countries have legal enforcement for
    export/import/use of cryptographic software.  Check it before playing
    with the kit.  We do not intend to be your legalease clearing house
    (NO WARRANTY).  If you intend to include KAME stack into your product,
    you'll need to check if the licenses on each file fit your situations,
    and/or possible intellectual property right issues.

						 <end of IMPLEMENTATION>
