Requirements for Recursive Caching Resolver
        (a.k.a. Treeshrew, Unbound-C)
By W.C.A. Wijngaards, NLnet Labs, October 2006.

Contents
1. Introduction
2. History
3. Goals
4. Non-Goals
5. Choices


1. Introduction
---------------
This is the requirements document for a DNS name server and aims to
document the goals and non-goals of the project. The DNS (the Domain
Name System) is a global, replicated database that uses a hierarchical
structure for queries.

Data in the DNS is stored in Resource Record sets (RR sets), and has a
time to live (TTL). During this time the data can be cached. It is
thus useful to cache data to speed up future lookups. A server that
looks up data in the DNS for clients and caches previous answers to
speed up processing is called a caching, recursive nameserver.

This project aims to develop such a nameserver in modular components, so
that DNSSEC (secure DNS) validation and stub resolvers (which do not
run as a server, but are linked into an application) are also easily
possible.

The main components are the Validator, which validates the security
fingerprints on data sets; the Iterator, which sends queries to the
hierarchical DNS servers that own the data; and the Cache, which stores
data from previous queries. The networking and query management code
then interfaces with the modules to perform the necessary processing.

Section 2 documents the origins of the Unbound project. Section
3 lists the goals, while Section 4 lists the explicit non-goals of the
project. Section 5 discusses choices made during development.


2. History
----------
The Unbound resolver project was started by Bill Manning, David Blacka,
and Matt Larson (from the University of California and from Verisign),
who created a Java-based prototype resolver called Unbound. The basic
design decision of clean modules was executed.

The Java prototype worked very well, with contributions from Geoff
Sisson and Roy Arends from Nominet. Around 2006 the idea came to create
a full-fledged C implementation ready for deployed use. NLnet Labs
volunteered to write this implementation.


3. Goals
--------
o A validating recursive DNS resolver.
o Code diversity in the DNS resolver monoculture.
o Drop-in replacement for BIND apart from config.
o DNSSEC support.
o Fully RFC compliant.
o High performance
    * even with validation.
o Used as
    * stub resolver.
    * full caching name server.
    * resolver library.
o Elegant design of validator, resolver, cache modules.
    * provide the ability to pick and choose modules.
o Robust.
o In C, open source: The BSD license.
o Highly portable, targets include modern Unix systems, such as *BSD,
  Solaris, Linux, and maybe also the Windows platform.
o Smallest possible component that does the job.
o Stub-zones can be configured (local data or AS112 zones).


4. Non-Goals
------------
o An authoritative name server.
o Too many features.


5. Choices
----------
o RFC2181 discourages duplicate RRs in RRsets.
  Unbound does not create duplicates, but when presented with
  duplicates on the wire from the authoritative servers, does not
  perform duplicate removal.
  It does do some rrsig duplicate removal, in the msgparser, for dnssec
  qtype rrsig and any, because of special rrsig processing in the
  msgparser.
o The harden-glue feature is more complicated than 'when yes, all out
  of zone glue is deleted; when no, out of zone glue is used for further
  resolving'. See below.
  Main points:
    * RFC2181 trust handling is used.
    * data is let through only in very specific cases.
    * spoofability remains possible.
  Not all glue is let through (despite the name of the option). Only glue
  which is present in a delegation, of type A and AAAA, where the name is
  present in the NS record in the authority section is let through.
  The glue that is let through is stored in the cache (marked as 'from the
  additional section') and will then be used for sending queries to. It
  will not be present in the reply to the client (if RD is off).
  A direct query for that name will attempt to get a msg into the message
  cache. Since A and AAAA queries are not synthesized by the unbound cache,
  this query will be (eventually) sent to the authoritative server and its
  answer will be put in the cache, marked as 'from the answer section',
  thus removing the 'from the additional section' data, and this record is
  returned to the client.
  The message has a TTL smaller than or equal to the TTL of the answer RR.
  If the cache memory is low, the answer RR may be dropped, and a glue
  RR may be inserted, within the message TTL time, and thus return the
  spoofed glue to a client.
  When the message expires, it is refetched and the cached RR is
  updated with the correct content.
  The server can be spoofed by getting it to visit a specially prepared
  domain. This domain then inserts an address for another authoritative
  server into the cache. When visiting that other domain, this address
  may then be used to send queries to, and fake answers may be returned.
  If the other domain is signed by DNSSEC, the fakes will be detected.

  In summary, the harden glue feature presents a security risk if
  disabled. Disabling the feature leads to possibly better performance
  as more glue is present for the recursive service to use. The feature
  is implemented so as to minimise the security risk, while trying to
  keep this performance gain.
o The method by which dnssec-lameness is detected is not secure. DNSSEC lame
  is when a server has the zone in question, but lacks dnssec data, such as
  signatures. The method to detect dnssec lameness looks at nonvalidated
  data from the parent of a zone. This can be used, by spoofing the parent,
  to create a false sense of dnssec-lameness in the child, or a false sense
  of dnssec-non-lameness in the child. The first results in the server
  being marked lame, and not used for 900 seconds, and the second will
  result in a validator failure (SERVFAIL again), when the query is
  validated later on.

  Concluding, a spoof of the parent delegation can be used for many cases
  of denial of service. For example, a completely different NS set could
  be returned, or the information withheld. All of these alterations can
  be caught by the validator if the parent is signed, and result in
  900 seconds bogus.
  The dnssec-lameness detection is used to detect operator failures,
  before the validator will properly verify the messages.

  Also for zones for which no chain of trust exists, but a DS is given by
  the parent, dnssec-lameness detection is enabled. This delivers dnssec
  to our clients when possible (for client validators).

  The following issue needs to be resolved:
  a server that serves both a parent and child zone, where the
  parent is signed, but the child is not. The server must not be marked
  lame for the parent zone merely because the child answer is not signed.
  Instead of a false positive, we want false negatives; failure to
  detect dnssec-lameness is less of a problem than marking honest
  servers lame. dnssec-lameness is a config error and deserves the trouble.
  So, only messages that identify the zone are used to mark the zone
  lame. The zone is identified by SOA or NS RRsets in the answer/auth.
  That includes almost all negative responses and also A, AAAA qtypes.
  That would be most responses from servers.
  For referrals, delegations that add a single label can be checked to be
  from their zone; this covers most delegation-centric zones.

  So possibly, for complicated setups, with multiple (parent-child) zones
  on a server, dnssec-lameness detection does not work - no dnssec-lameness
  is detected. Instead the zone that is dnssec-lame becomes bogus.

o Authority features.
  This is a recursive server, and authority features are out of scope.
  However, some authority features are expected in a recursor. Things like
  localhost, reverse lookup for 127.0.0.1, or blocking AS112 traffic.
  Also redirection of domain names with fixed data is needed by service
  providers. Limited support is added specifically to address this.

  Adding full authority support requires much more code, and more complex
  maintenance.

  The limited support allows adding some static data (for localhost and
  so on), and responding with a fixed rcode (NXDOMAIN) for domains (such
  as AS112).

  You can put authority data on a separate server, and set the server in
  unbound.conf as stub for those zones. This allows clients to access data
  from the server without making unbound authoritative for the zones.

o The access control denies queries before any other processing.
  This denies queries that are not authoritative, or version.bind, or any.
  And thus prevents cache-snooping (denied hosts cannot make non-recursive
  queries and get answers from the cache).

o If a client makes a query without RD bit, in the case of a returned
  message from cache which is:
    answer section: empty
    auth section: NS record present, no SOA record, no DS record,
        maybe NSEC or NSEC3 records present.
    additional: A records or other relevant records.
  A SOA record would indicate that this was a NODATA answer.
  A DS record would indicate a referral.
  Absence of NS record would indicate a NODATA answer as well.

  Then the receiver does not know whether this was a referral
  (with attempt at no-DS proof) or a nodata answer with attempt
  at no-data proof.
  It could be determined by attempting to prove
  either condition and checking whether only one is valid, but both
  proofs could be valid, or neither could be valid, which creates
  doubt. This case is validated by unbound as a 'referral', which
  ascertains that RRSIGs are OK (and not omitted), but does not
  check NSEC/NSEC3.

o Case preservation
  Unbound preserves the casing received from authority servers as best
  as possible. It compresses without case, so case can get lost there.
  The casing from the query name is used in preference to the casing
  of the authority server. This is the same as BIND. RFC4343 allows either
  behaviour.

o Denial of service protection
  If many queries are made, and they are made to names for which the
  authority servers do not respond, then the requestlist for unbound
  fills up fast. This results in denial of service for new queries.
  To combat this the first 50% of the requestlist can run to completion.
  Queries in the last 50% of the requestlist get at least 200 msec, and
  once older than that they can be replaced by newer queries (LIFO).
  When a new query comes in, and a place in the first 50% is available, this
  is preferred. Otherwise, it can replace older queries out of the last 50%.
  Thus, even long queries get a 50% chance to be resolved. And many 'short'
  one or two round-trip resolves can be done in the last 50% of the list.
  The timeout can be configured.

o EDNS fallback.
  This is done according to the EDNS RFC (and update draft-00).
  Unbound assumes EDNS 0 support for the first query. Then it can detect
  support (if the server replies) or non-support (on a NOTIMP or FORMERR).
  Some middleboxes drop EDNS 0 queries, mainly when forwarding, not when
  routing packets. To detect this, when timeouts keep happening, as the
  timeout approaches 5-10 seconds, and EDNS status has not been detected
  yet, a single probe query is sent. This probe has a sub-second timeout,
  and if the server responds (quickly) without EDNS, this is cached for
  15 min.
  This works very well when detecting an address that you use much - like
  a forwarder address - which is where the middleboxes need to be detected.
  Otherwise, it results in a 5 second wait time before EDNS timeout is
  detected, which is slow but it works at least.
  It minimizes the chances of a dropped query making a (DNSSEC) EDNS server
  falsely EDNS-nonsupporting, and thus DNSSEC-bogus, works well with
  middleboxes, and can detect the occasional authority that drops EDNS.
  For some boxes it is necessary to probe for every failing query: a
  reassurance that the DNS server does EDNS does not mean that the path
  can take large DNS answers.

o 0x20 backoff.
  The draft describes backing off to the next server, and going through
  all servers several times. Unbound goes on to get the full list of
  nameserver addresses, and then makes 3 * number of addresses queries.
  They are sent to a random server, but no one address more than 4 times.
  It succeeds if one has 0x20 intact, or else all are equal.
  Otherwise, servfail is returned to the client.

o NXDOMAIN and SOA serial numbers.
  Unbound keeps TTL values for message formats, and thus rcodes, such
  as NXDOMAIN. Also it keeps the latest rrsets in the rrset cache.
  So it will faithfully negative cache for the exact TTL as originally
  specified for an NXDOMAIN message, but send a newer SOA record if
  this has been found in the meantime. In particular, this could lead to
  a negative cached NXDOMAIN reply with a SOA RR where the serial number
  indicates a zone version where this domain is no longer NXDOMAIN.
  These situations become consistent once the original TTL expires.
  If the domain is DNSSEC signed, by the way, then NSEC records are
  updated more carefully. If one of the NSEC records in an NXDOMAIN is
  updated from another query, the NXDOMAIN is dropped from the cache,
  and queried for again, so that its proof can be checked again.

o SOA records in negative cached answers for DS queries.
  The current unbound code uses a negative cache for queries for type DS.
  This speeds up building chains of trust, and uses NSEC and NSEC3
  (optout) information to speed up lookups. When used internally,
  the bare NSEC(3) information is sufficient, probably picked up from
  a referral. When answering to clients, a SOA record is needed for
  the correct message format. A SOA record is picked from the cache
  if available (and may not actually match the serial number of the SOA
  for which the NSEC and NSEC3 records were obtained); otherwise network
  queries are performed to get the data.

o Parent and child with different nameserver information.
  A misconfiguration that sometimes happens is where the parent and child
  have different NS and glue information. The child is authoritative, and
  unbound will not trust information from the parent nameservers as the
  final answer.
  To help lookups, unbound will however use the parent-side
  version of the glue as a last resort lookup. This resolves lookups for
  those misconfigured domains where the servers reported by the parent
  are the only ones working, and servers reported by the child do not.

o Failure of validation and probing.
  Retries on a validation failure are now 5x to a different nameserver IP
  (if possible), and then it gives up, for one name, type, class entry in
  the message cache. If a DNSKEY or DS fails in the chain of trust in the
  key cache additionally, after the probing, a bad key entry is created that
  makes the entire zone bogus for 900 seconds. This is a fixed value at
  this time and is conservative in sending probes. It makes the compound
  effect of many resolvers less and easier to handle, but penalizes
  individual resolvers by having fewer probes and a longer time before
  fixes are picked up.
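
  The retry and bad key behaviour in the last item can be sketched as
  follows. This is a minimal illustration in Python, not Unbound's code.
  The names BadKeyCache, fetch_key, query and validate are hypothetical,
  and the sketch assumes that "5x" means five attempts in total and that
  addresses are reused once every address has been tried.

```python
import time

MAX_ATTEMPTS = 5      # "5x" in the text; assumed to mean five attempts in total
BAD_KEY_TTL = 900     # seconds a zone stays bogus after its key fails


class BadKeyCache:
    """Hypothetical store of zones whose DNSKEY or DS failed validation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # zone name -> time at which the entry expires

    def mark_bogus(self, zone):
        self._expiry[zone] = self._clock() + BAD_KEY_TTL

    def is_bogus(self, zone):
        expiry = self._expiry.get(zone)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[zone]  # entry expired; the zone may be probed again
            return False
        return True


def fetch_key(zone, addresses, query, validate, bad_keys):
    """Fetch a key for zone, preferring a different address on every attempt.

    query(address) returns an answer and validate(answer) returns True or
    False; both are caller-supplied stand-ins for the real network and
    validator code. Returns the validated answer, or None on failure.
    """
    if not addresses or bad_keys.is_bogus(zone):
        return None  # no probes are sent while the bad key entry is live
    tried = []
    for attempt in range(MAX_ATTEMPTS):
        untried = [a for a in addresses if a not in tried]
        # Use a fresh address if possible, otherwise cycle through the list.
        address = untried[0] if untried else addresses[attempt % len(addresses)]
        tried.append(address)
        answer = query(address)
        if validate(answer):
            return answer
    bad_keys.mark_bogus(zone)  # give up: the whole zone is bogus for 900 s
    return None
```

  While the bad key entry is live, fetch_key returns at once without
  sending queries, which is the conservative probing the text describes:
  fewer probes overall, at the price of a longer wait before a fix at the
  authority is picked up.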