Requirements for Recursive Caching Resolver
	(a.k.a. Treeshrew, Unbound-C)
By W.C.A. Wijngaards, NLnet Labs, October 2006.

Contents
1. Introduction
2. History
3. Goals
4. Non-Goals
5. Choices


1. Introduction
---------------
This is the requirements document for a DNS name server and aims to
document the goals and non-goals of the project.  The DNS (the Domain
Name System) is a global, replicated database that uses a hierarchical
structure for queries.

Data in the DNS is stored in Resource Record sets (RR sets), and has a
time to live (TTL).  During this time the data can be cached.  It is
thus useful to cache data to speed up future lookups.  A server that
looks up data in the DNS for clients and caches previous answers to
speed up processing is called a caching, recursive nameserver.

This project aims to develop such a nameserver in modular components, so
that DNSSEC (secure DNS) validation and stub resolvers (which do not run
as a server, but are linked into an application) are also easily
possible.

The main components are the Validator that validates the DNSSEC
signatures on data sets, the Iterator that sends queries to the
hierarchical DNS servers that own the data, and the Cache that stores
data from previous queries.  The networking and query management code
then interfaces with the modules to perform the necessary processing.
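
As a rough illustration of this modular structure, here is a Python
sketch in which a validator sits in front of an iterator, and the
iterator consults a shared cache before asking the authorities.  The
class names, the `operate(qs, next_module)` interface and the
`ask_authority` callback are hypothetical and are not the Unbound module
API; the validator only marks where signature checking would happen.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    qname: str
    qtype: str
    trace: list = field(default_factory=list)  # modules that handled it


class Cache:
    """Shared store for previous answers, keyed by (qname, qtype)."""

    def __init__(self):
        self.answers = {}


class Iterator:
    def __init__(self, cache, ask_authority):
        self.cache = cache
        self.ask_authority = ask_authority  # hypothetical network callback

    def operate(self, qs, next_module):
        qs.trace.append("iterator")
        key = (qs.qname, qs.qtype)
        if key not in self.cache.answers:
            self.cache.answers[key] = self.ask_authority(qs.qname, qs.qtype)
        return self.cache.answers[key]


class Validator:
    def operate(self, qs, next_module):
        qs.trace.append("validator")
        answer = next_module(qs)
        # A real validator would check the signatures on `answer` here.
        return answer


def run(modules, qs):
    """Pass the query down the module list; each module may call the next."""
    def call(i, state):
        if i == len(modules):
            return None
        return modules[i].operate(state, lambda s: call(i + 1, s))
    return call(0, qs)
```

Picking and choosing modules then amounts to building a different list,
for example `[Iterator(...)]` alone for a non-validating setup.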

In Section 2 the origins of the Unbound project are documented.  Section
3 lists the goals, while Section 4 lists the explicit non-goals of the
project.  Section 5 discusses choices made during development.


2. History
----------
The Unbound resolver project was started by Bill Manning, David Blacka,
and Matt Larson (from the University of California and from Verisign),
who created a Java-based prototype resolver called Unbound.  This
prototype carried out the basic design decision of clean modules.

The Java prototype worked very well, with contributions from Geoff
Sisson and Roy Arends from Nominet.  Around 2006 the idea arose to create
a full-fledged C implementation ready for deployment.  NLnet Labs
volunteered to write this implementation.


3. Goals
--------
o A validating recursive DNS resolver.
o Code diversity in the DNS resolver monoculture.
o Drop-in replacement for BIND, apart from the configuration.
o DNSSEC support.
o Fully RFC compliant.
o High performance
	* even with validation.
o Used as
	* stub resolver.
	* full caching name server.
	* resolver library.
o Elegant design of validator, resolver, cache modules.
	* provide the ability to pick and choose modules.
o Robust.
o In C, open source: the BSD license.
o Highly portable; targets include modern Unix systems, such as *BSD,
  Solaris, Linux, and maybe also the Windows platform.
o As small a component as possible that does the job.
o Stub-zones can be configured (local data or AS112 zones).


4. Non-Goals
------------
o An authoritative name server.
o Too many features.


5. Choices
----------
o RFC 2181 discourages duplicate RRs in RRsets.  Unbound does not create
  duplicates, but when presented with duplicates on the wire from the
  authoritative servers, it does not perform duplicate removal.
  It does perform some RRSIG duplicate removal in the msgparser, for the
  DNSSEC qtypes RRSIG and ANY, because of the special RRSIG processing
  in the msgparser.
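
  A minimal sketch of this policy, assuming a parsed message is a list of
  `(owner, type, rdata)` tuples (a simplification: real RRSIG handling
  also keys on the covered type).  Duplicates are kept in general; only
  duplicate RRSIG rdata is dropped, and only when the query type is RRSIG
  or ANY.  The function name is hypothetical.

```python
def group_rrsets(records, qtype):
    """Group (owner, type, rdata) tuples into RRsets keyed by (owner, type)."""
    rrsets = {}
    for owner, rtype, rdata in records:
        rdatas = rrsets.setdefault((owner.lower(), rtype), [])
        if rtype == "RRSIG" and qtype in ("RRSIG", "ANY") and rdata in rdatas:
            continue  # duplicate signature: removed for these qtypes only
        rdatas.append(rdata)  # all other duplicates are kept as received
    return rrsets
```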
o The harden-glue feature: when set to yes, all out-of-zone glue is
  deleted; when set to no, out-of-zone glue is used for further
  resolving.  The actual behaviour is more complicated than that; see
  below.
  Main points:
	* RFC 2181 trust handling is used.
	* data is let through only in very specific cases.
	* spoofability remains possible.
  Not all glue is let through (despite the name of the option).  Only
  glue of type A or AAAA that is present in a delegation, and whose name
  is present in an NS record in the authority section, is let through.
  The glue that is let through is stored in the cache (marked as 'from
  the additional section') and will then be used as a destination for
  queries.  It will not be present in the reply to the client (if RD is
  off).
  A direct query for that name will attempt to get a message into the
  message cache.  Since A and AAAA queries are not synthesized by the
  unbound cache, this query will (eventually) be sent to the
  authoritative server, and its answer will be put in the cache, marked
  as 'from the answer section'.  This removes the 'from the additional
  section' data, and this record is returned to the client.
  The message has a TTL smaller than or equal to the TTL of the answer
  RR.  If cache memory is low, the answer RR may be dropped and a glue
  RR may be inserted within the message TTL, and thus the spoofed glue
  can be returned to a client.  When the message expires, it is
  refetched and the cached RR is updated with the correct content.
  The server can be spoofed by getting it to visit an especially prepared
  domain.  This domain then inserts an address for another authoritative
  server into the cache; when visiting that other domain, this address
  may then be used to send queries to, and fake answers may be returned.
  If the other domain is signed by DNSSEC, the fakes will be detected.

  In summary, the harden-glue feature presents a security risk if
  disabled.  Disabling the feature possibly leads to better performance,
  as more glue is present for the recursive service to use.  The feature
  is implemented so as to minimise the security risk, while trying to
  keep this performance gain.
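
  The glue filter and the trust ranking can be sketched as follows.  The
  record layout `(owner, type, rdata)`, the two numeric ranks, and the
  reading of "out of zone" as "not at or below the zone of the server
  that sent the referral" are assumptions for illustration; the function
  and class names are hypothetical.

```python
# Hypothetical trust ranks, loosely after RFC 2181 section 5.4.1:
# a higher number means more trusted data.
RANK_ADDITIONAL = 1
RANK_ANSWER = 2


def in_zone(name, zone):
    """True if name equals zone or is below it (names end in '.')."""
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)


def glue_to_keep(zone, authority, additional, harden_glue):
    """Filter the additional section of a referral from a server for `zone`."""
    ns_targets = {rdata.lower() for _, rtype, rdata in authority if rtype == "NS"}
    kept = []
    for owner, rtype, rdata in additional:
        if rtype not in ("A", "AAAA") or owner.lower() not in ns_targets:
            continue  # only address records for the listed nameservers
        if harden_glue and not in_zone(owner, zone):
            continue  # out-of-zone glue is deleted when hardened
        kept.append((owner, rtype, rdata))
    return kept


class RRsetCache:
    def __init__(self):
        self.entries = {}  # (owner, type) -> (rank, rdata)

    def store(self, owner, rtype, rdata, rank):
        key = (owner.lower(), rtype)
        old = self.entries.get(key)
        # Lower-ranked data never replaces higher-ranked data.
        if old is None or rank >= old[0]:
            self.entries[key] = (rank, rdata)
```

  Because answer-section data outranks additional-section data, a later
  direct query replaces the glue; only eviction of the answer RR (not
  modelled here) lets glue back in.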
o The method by which dnssec-lameness is detected is not secure.  A
  server is DNSSEC-lame when it has the zone in question but lacks
  DNSSEC data, such as signatures.  The method to detect dnssec-lameness
  looks at nonvalidated data from the parent of a zone.  By spoofing the
  parent, this can be used to create a false sense of dnssec-lameness in
  the child, or a false sense of dnssec-non-lameness in the child.  The
  first results in the server being marked lame and not used for 900
  seconds; the second results in a validator failure (SERVFAIL) when
  the query is validated later on.

  In conclusion, a spoof of the parent delegation can be used for many
  kinds of denial of service, e.g. a completely different NS set could
  be returned, or the information withheld.  All of these alterations
  can be caught by the validator if the parent is signed, and result in
  the zone being bogus for 900 seconds.
  The dnssec-lameness detection is used to detect operator failures
  before the validator will properly verify the messages.

  Dnssec-lameness detection is also enabled for zones for which no chain
  of trust exists, but for which the parent gives a DS record.  This
  delivers DNSSEC data to our clients when possible (for client
  validators).

  The following issue needs to be resolved:
	a server that serves both a parent and a child zone, where the
	parent is signed, but the child is not.  The server must not be
	marked lame for the parent zone because the child answer is not
	signed.
  We prefer false negatives over false positives: failure to detect
  dnssec-lameness is less of a problem than marking honest servers
  lame.  Dnssec-lameness is a configuration error and deserves the
  trouble.
  So, only messages that identify the zone are used to mark the zone
  lame.  The zone is identified by SOA or NS RRsets in the answer or
  authority section.  That includes almost all negative responses, and
  also answers for the A and AAAA qtypes; that covers most responses
  from servers.
  For referrals, delegations that add a single label can be checked to
  be from their zone; this covers most delegation-centric zones.

  So possibly, for complicated setups with multiple (parent-child) zones
  on one server, dnssec-lameness detection does not work and no
  dnssec-lameness is detected.  Instead, the zone that is dnssec-lame
  becomes bogus.
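
  The rule for which unsigned replies may mark a server DNSSEC-lame can be
  sketched as a predicate.  The message layout (a dict with `answer` and
  `authority` lists of `(owner, type)` pairs plus `is_referral` and
  `signed` flags) and all names are hypothetical simplifications.

```python
def in_zone(name, zone):
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)


def labels(name):
    name = name.rstrip(".")
    return 0 if not name else len(name.split("."))


def may_mark_dnssec_lame(msg, zone, parent_has_ds):
    """Decide whether an unsigned reply may mark a server DNSSEC-lame for zone."""
    if not parent_has_ds or msg["signed"]:
        return False
    zone = zone.lower()
    if msg["is_referral"]:
        # Accept only delegations exactly one label below the zone.
        return any(rtype == "NS" and in_zone(owner, zone)
                   and labels(owner) == labels(zone) + 1
                   for owner, rtype in msg["authority"])
    # Otherwise the reply must identify the zone with its SOA or NS RRset.
    return any(rtype in ("SOA", "NS") and owner.lower() == zone
               for owner, rtype in msg["answer"] + msg["authority"])
```

  An unsigned answer that names the child zone in its NS RRset is thus not
  held against the server for the signed parent zone.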

o Authority features.
  This is a recursive server, and authority features are out of scope.
  However, some authority features are expected in a recursor: things
  like localhost, reverse lookup for 127.0.0.1, or blocking AS112
  traffic.  Also, redirection of domain names with fixed data is needed
  by service providers.  Limited support is added specifically to
  address this.

  Adding full authority support requires much more code and more
  complex maintenance.

  The limited support allows adding some static data (for localhost and
  the like), and responding with a fixed rcode (NXDOMAIN) for domains
  (such as AS112 zones).

  You can put authority data on a separate server and configure that
  server in unbound.conf as a stub for those zones.  This allows clients
  to access data from the server without making unbound authoritative
  for the zones.
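
  A minimal sketch of the limited local-data support, assuming two tables:
  exact static records and zones that always answer NXDOMAIN.  Anything
  not matched falls through to normal recursion.  The class and field
  names are hypothetical and do not mirror unbound.conf syntax.

```python
def in_zone(name, zone):
    name, zone = name.lower(), zone.lower()
    return zone == "." or name == zone or name.endswith("." + zone)


class LocalData:
    def __init__(self):
        self.static = {}             # (name, type) -> list of rdata strings
        self.nxdomain_zones = set()  # e.g. AS112 reverse zones

    def answer(self, qname, qtype):
        """Return (rcode, rdata list), or None to resolve recursively."""
        qname = qname.lower()
        if (qname, qtype) in self.static:
            return "NOERROR", self.static[(qname, qtype)]
        if any(in_zone(qname, z) for z in self.nxdomain_zones):
            return "NXDOMAIN", []
        return None
```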

o The access control denies queries before any other processing.
  This denies all queries from such hosts, whether for non-authoritative
  (recursive) data, for version.bind, or for anything else.  It thus
  prevents cache-snooping (denied hosts cannot make non-recursive queries
  and get answers from the cache).
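
  The ordering matters more than the matching, but both can be sketched
  with the standard `ipaddress` module.  The rule format, the
  "most specific netblock wins" policy, the `allow`/`deny` action names
  and the `process` callback are assumptions for illustration.

```python
import ipaddress


class AccessControl:
    def __init__(self, rules, default="deny"):
        # rules: (netblock, action) pairs; the most specific match wins.
        self.rules = [(ipaddress.ip_network(n), a) for n, a in rules]
        self.default = default

    def action_for(self, addr):
        ip = ipaddress.ip_address(addr)
        best = None
        for net, action in self.rules:
            if net.version == ip.version and ip in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, action)
        return best[1] if best else self.default


def handle_query(acl, client_addr, process):
    """The ACL check runs first: a denied client never reaches the cache,
    local data or version.bind handling inside `process`."""
    if acl.action_for(client_addr) != "allow":
        return None   # dropped before any other processing
    return process()
```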

o If a client makes a query without the RD bit set, consider the case
  where the message returned from the cache is:
	answer section: empty
	auth section: NS record present, no SOA record, no DS record,
		maybe NSEC or NSEC3 records present.
	additional: A records or other relevant records.
  A SOA record would indicate that this was a NODATA answer.
  A DS record would indicate a referral.
  Absence of an NS record would indicate a NODATA answer as well.

  Then the receiver does not know whether this was a referral (with an
  attempt at a no-DS proof) or a NODATA answer (with an attempt at a
  NODATA proof).  It could be determined by attempting to prove either
  condition and checking whether only one is valid, but both proofs
  could be valid, or neither could be valid, which creates doubt.  This
  case is validated by unbound as a 'referral', which ascertains that
  RRSIGs are OK (and not omitted), but does not check NSEC/NSEC3.
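
  The decision rules above can be written as a small classifier over the
  RR types present in each section.  The function name, its string
  results and the input format (lists of type names) are hypothetical.

```python
def classify_cached_reply(answer_types, authority_types):
    """Classify a cached reply to a non-RD query by the RR types it holds."""
    if answer_types:
        return "answer"
    auth = set(authority_types)
    if "SOA" in auth or "NS" not in auth:
        return "nodata"
    if "DS" in auth:
        return "referral"
    # NS, but no SOA and no DS: referral or NODATA cannot be told apart.
    # Validate it as a referral: RRSIGs must be present and valid, while
    # NSEC/NSEC3 records are not used as a proof.
    return "referral (NSEC/NSEC3 not checked)"
```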

o Case preservation
  Unbound preserves the casing received from authority servers as well
  as possible.  Name compression is done without regard to case, so
  case can get lost there.  The casing from the query name is used in
  preference to the casing of the authority server.  This is the same as
  BIND.  RFC 4343 allows either behaviour.
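
  Why case-insensitive compression loses casing can be seen in a toy
  model.  Real DNS compression also points at shared suffixes; this
  hypothetical sketch only compresses whole names, and it approximates
  the wire length of a name as its text length plus one.

```python
def compress(names):
    """Whole-name compression with a case-insensitive lookup table.

    Each entry is ("name", text) or ("ptr", offset of an earlier name).
    """
    seen = {}          # lowercased name -> offset of its first occurrence
    out, offset = [], 0
    for name in names:
        key = name.lower()
        if key in seen:
            out.append(("ptr", seen[key]))
            offset += 2                    # a compression pointer is 2 bytes
        else:
            seen[key] = offset
            out.append(("name", name))
            offset += len(name) + 1        # approximate wire length
    return out


def decompress(entries):
    by_offset, names, offset = {}, [], 0
    for kind, value in entries:
        if kind == "name":
            by_offset[offset] = value
            names.append(value)
            offset += len(value) + 1
        else:
            names.append(by_offset[value])  # casing of the earlier name
            offset += 2
    return names
```

  The second name comes back with the casing of the first one, which is
  the loss described above.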

o Denial of service protection
  If many queries are made, and they are made to names for which the
  authority servers do not respond, then the requestlist for unbound
  fills up fast.  This results in denial of service for new queries.
  To combat this, queries in the first 50% of the requestlist can run to
  completion.  Queries in the last 50% of the requestlist get at least
  200 msec, and after that they can be replaced by newer queries (LIFO).
  When a new query comes in and a place in the first 50% is available,
  this is preferred.  Otherwise, it can replace older queries in the last
  50%.  Thus, even long queries get a 50% chance to be resolved, and many
  'short' one or two round-trip resolves can be done in the last 50% of
  the list.
  The timeout can be configured.
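
  A minimal sketch of this admission policy, assuming a list of `size`
  slots split into two halves and a 200 msec default timeout.  The class
  and method names are hypothetical; the sketch also assumes that, when
  the second half is full, its oldest entry is the one replaced once it
  has run long enough, and that the new query is dropped otherwise.

```python
import time


class RequestList:
    def __init__(self, size, jostle_timeout=0.200, clock=time.monotonic):
        self.half = size // 2
        self.complete = []   # first 50%: run to completion
        self.jostle = []     # last 50%: (start time, query), oldest first
        self.jostle_timeout = jostle_timeout
        self.clock = clock

    def add(self, query):
        """Admit a new query; return False if it has to be dropped."""
        if len(self.complete) < self.half:
            self.complete.append(query)   # a place in the first half is preferred
            return True
        now = self.clock()
        if len(self.jostle) < self.half:
            self.jostle.append((now, query))
            return True
        started, _ = self.jostle[0]
        if now - started >= self.jostle_timeout:
            self.jostle.pop(0)            # replace the oldest second-half query
            self.jostle.append((now, query))
            return True
        return False

    def done(self, query):
        if query in self.complete:
            self.complete.remove(query)
        else:
            self.jostle = [(t, q) for t, q in self.jostle if q != query]
```

  Injecting `clock` keeps the sketch testable without real sleeps.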

o EDNS fallback.  This is done according to the EDNS RFC (and the update
  draft-00).  Unbound assumes EDNS 0 support for the first query.  Then
  it can detect support (if the server replies) or non-support (on a
  NOTIMPL or FORMERR).
  Some middleboxes drop EDNS 0 queries, mainly when forwarding, not when
  routing packets.  To detect this, when timeouts keep happening, as the
  timeout approaches 5-10 seconds, and EDNS status has not been detected
  yet, a single probe query is sent.  This probe has a sub-second
  timeout, and if the server responds (quickly) without EDNS, this is
  cached for 15 min.  This works very well when detecting an address
  that you use often, like a forwarder address, which is where the
  middleboxes need to be detected.  Otherwise, it results in a 5 second
  wait time before the EDNS timeout is detected, which is slow, but at
  least it works.
  It minimizes the chances of a dropped query making a (DNSSEC) EDNS
  server falsely appear EDNS-nonsupporting, and thus DNSSEC-bogus; it
  works well with middleboxes, and it can detect the occasional
  authority that drops EDNS.
  For some boxes it is necessary to probe for every failing query: the
  reassurance that the DNS server does EDNS does not mean that the path
  can carry large DNS answers.
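
  The per-address EDNS state can be sketched as follows.  The initial
  timeout of 0.376 s, doubling backoff and the 5 s probe threshold are
  assumptions chosen for illustration; the sub-second probe timeout and
  the "probe on every failing query" refinement are not modelled, and
  all names are hypothetical.

```python
EDNS_UNKNOWN, EDNS_OK, EDNS_NONE = "unknown", "edns", "no-edns"
NO_EDNS_CACHE_TIME = 15 * 60   # seconds a no-EDNS result is remembered
PROBE_AT_TIMEOUT = 5.0         # hypothetical: probe once the timeout reaches 5 s


class ServerEdns:
    def __init__(self, initial_timeout=0.376):
        self.status = EDNS_UNKNOWN
        self.no_edns_until = 0.0
        self.timeout = initial_timeout
        self.probe_sent = False

    def use_edns(self, now):
        """EDNS 0 is assumed unless non-support was recorded recently."""
        return not (self.status == EDNS_NONE and now < self.no_edns_until)

    def _set_no_edns(self, now):
        self.status = EDNS_NONE
        self.no_edns_until = now + NO_EDNS_CACHE_TIME

    def on_edns_reply(self, rcode, now):
        """A reply arrived for a query that carried EDNS 0."""
        if rcode in ("FORMERR", "NOTIMPL"):
            self._set_no_edns(now)
        else:
            self.status = EDNS_OK

    def on_timeout(self):
        """Back off; return True if a single non-EDNS probe should be sent."""
        self.timeout *= 2
        if (self.status == EDNS_UNKNOWN and not self.probe_sent
                and self.timeout >= PROBE_AT_TIMEOUT):
            self.probe_sent = True
            return True
        return False

    def on_probe_reply(self, now):
        """The probe without EDNS was answered: remember non-support."""
        self._set_no_edns(now)
```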

o 0x20 backoff.
  The draft describes backing off to the next server, and going through
  all servers several times.  Unbound goes on to get the full list of
  nameserver addresses, and then makes 3 * (number of addresses) queries.
  They are sent to random servers, but to no single address more than 4
  times.  It succeeds if one reply has the 0x20 casing intact, or else if
  all replies are equal.
  Otherwise, SERVFAIL is returned to the client.
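
  The backoff can be sketched in three parts: random-case encoding, the
  query plan, and the final decision.  Reading "all are equal" as "all
  answers are identical" is an interpretation, and the function names and
  reply tuple layout are hypothetical.

```python
import random


def randomize_case(qname, rng=random):
    """0x20 encoding: flip the case of each letter at random."""
    return "".join(c.upper() if rng.random() < 0.5 else c.lower()
                   for c in qname)


def backoff_plan(addresses, rng=random):
    """Pick 3 * len(addresses) random targets, none more than 4 times."""
    uses = {a: 0 for a in addresses}
    plan = []
    for _ in range(3 * len(addresses)):
        a = rng.choice([x for x in addresses if uses[x] < 4])
        uses[a] += 1
        plan.append(a)
    return plan


def backoff_result(replies):
    """replies: (sent qname, echoed qname, answer) for every backoff query."""
    for sent, echoed, answer in replies:
        if sent == echoed:          # case-sensitive: 0x20 pattern intact
            return answer
    answers = [answer for _, _, answer in replies]
    if answers and all(a == answers[0] for a in answers):
        return answers[0]           # no intact casing, but all agree
    return "SERVFAIL"
```

  Since 3 * n never exceeds 4 * n, the list passed to `rng.choice` is
  never empty.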

o NXDOMAIN and SOA serial numbers.
  Unbound keeps TTL values for message formats, and thus for rcodes such
  as NXDOMAIN.  It also keeps the latest rrsets in the rrset cache.
  So it will faithfully negative cache an NXDOMAIN message for the exact
  TTL originally specified, but send a newer SOA record if one has been
  found in the meantime.  In principle, this could lead to a negatively
  cached NXDOMAIN reply with a SOA RR whose serial number indicates a
  zone version in which this domain is no longer NXDOMAIN.
  These situations become consistent once the original TTL expires.
  If the domain is DNSSEC signed, however, NSEC records are updated more
  carefully.  If one of the NSEC records in an NXDOMAIN is updated from
  another query, the NXDOMAIN is dropped from the cache and queried for
  again, so that its proof can be checked again.
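
  The interplay between the two caches can be sketched as a message cache
  that stores an rcode, a TTL and references to rrsets, while the rrset
  data itself lives in a shared, always-latest rrset cache.  The
  structure and names are hypothetical simplifications.

```python
class Caches:
    def __init__(self):
        self.rrsets = {}   # (owner, type) -> rdata (latest version wins)
        self.msgs = {}     # (qname, qtype) -> (rcode, rrset keys, expiry)

    def store_rrset(self, key, rdata):
        changed = key in self.rrsets and self.rrsets[key] != rdata
        self.rrsets[key] = rdata
        if changed and key[1] == "NSEC":
            # Drop messages whose proof used this NSEC so they are refetched.
            self.msgs = {q: m for q, m in self.msgs.items() if key not in m[1]}

    def store_msg(self, q, rcode, keys, ttl, now):
        # The message keeps its own TTL, and thus its rcode, independently.
        self.msgs[q] = (rcode, list(keys), now + ttl)

    def lookup(self, q, now):
        m = self.msgs.get(q)
        if m is None or now >= m[2]:
            return None
        rcode, keys, _ = m
        # RRsets come from the rrset cache, so a newer SOA may appear here.
        return rcode, [self.rrsets[k] for k in keys if k in self.rrsets]
```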

o SOA records in negative cached answers for DS queries.
  The current unbound code uses a negative cache for queries for type DS.
  This speeds up building chains of trust, and uses NSEC and NSEC3
  (opt-out) information to speed up lookups.  When used internally, the
  bare NSEC(3) information, probably picked up from a referral, is
  sufficient.  When answering clients, a SOA record is needed for the
  correct message format.  A SOA record is picked from the cache if
  available (it may not actually match the serial number of the SOA for
  which the NSEC and NSEC3 records were obtained); otherwise network
  queries are performed to get the data.
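
  The two uses of the DS negative cache can be contrasted in a short
  sketch.  The dict-based caches and the `fetch_soa` callback (standing in
  for the network queries) are hypothetical.

```python
def ds_proven_absent(qname, neg_cache):
    """Internal use (chain of trust): the bare NSEC/NSEC3 proof suffices."""
    return qname in neg_cache


def ds_negative_reply(qname, zone, neg_cache, rrset_cache, fetch_soa):
    """Build a client reply for a cached 'no DS' result, or return None.

    neg_cache maps a name to its NSEC/NSEC3 records; rrset_cache maps
    (owner, type) to rdata; fetch_soa is a hypothetical network lookup.
    """
    if qname not in neg_cache:
        return None
    soa = rrset_cache.get((zone, "SOA"))
    if soa is None:
        soa = fetch_soa(zone)              # network query for the missing SOA
        rrset_cache[(zone, "SOA")] = soa
    # The SOA need not match the serial the NSEC(3) records came from.
    return "NOERROR", [soa] + neg_cache[qname]
```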

o Parent and child with different nameserver information.
  A misconfiguration that sometimes happens is where the parent and child
  have different NS and glue information.  The child is authoritative,
  and unbound will not trust information from the parent nameservers as
  the final answer.  To help lookups, however, unbound will use the
  parent-side version of the glue as a last-resort lookup.  This resolves
  lookups for those misconfigured domains where the servers reported by
  the parent are the only ones working, and the servers reported by the
  child do not work.
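
  The last-resort ordering can be sketched as a function over two address
  lists and a set of addresses that already failed.  The function name
  and inputs are hypothetical; trust ranking of the parent-side data is
  not modelled.

```python
def target_addresses(child_side, parent_side, failed):
    """Order nameserver addresses: child-side data first, parent-side glue
    only as a last resort once every child-side address has failed."""
    remaining = [a for a in child_side if a not in failed]
    if remaining:
        return remaining
    return [a for a in parent_side if a not in failed and a not in child_side]
```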

o Failure of validation and probing.
  On a validation failure, a query is now retried up to 5 times, each
  time to a different nameserver IP (if possible), after which unbound
  gives up for that name, type, class entry in the message cache.  If,
  additionally, a DNSKEY or DS in the chain of trust fails after the
  probing, a bad key entry is created in the key cache that makes the
  entire zone bogus for 900 seconds.  This is a fixed value at this time
  and is conservative in sending probes.  It lessens the compound effect
  of many resolvers and makes it easier to handle, but penalizes
  individual resolvers by having fewer probes and a longer time before
  fixes are picked up.
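
  The retry loop and the bad key entry can be sketched as follows,
  assuming "5x" means up to five retries after the first failed attempt.
  The `fetch` and `validate` callbacks, the function and class names are
  hypothetical.

```python
MAX_RETRIES = 5     # retries after a validation failure
BAD_KEY_TTL = 900   # seconds a zone stays bogus after a chain-of-trust failure


def fetch_validated(servers, fetch, validate):
    """Return (reply or None, servers tried); None means give up."""
    tried = []
    for attempt in range(1 + MAX_RETRIES):
        fresh = [s for s in servers if s not in tried]
        # Prefer a nameserver IP not tried yet; reuse one if none are left.
        server = fresh[0] if fresh else servers[attempt % len(servers)]
        tried.append(server)
        reply = fetch(server)
        if validate(reply):
            return reply, tried
    return None, tried


class KeyCache:
    def __init__(self):
        self.bad_until = {}   # zone -> time until which it is bogus

    def mark_bad(self, zone, now):
        self.bad_until[zone] = now + BAD_KEY_TTL

    def is_bogus(self, zone, now):
        return now < self.bad_until.get(zone, float("-inf"))
```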