Name Server Selec+on of DNS Caching Resolvers
Yingdi Yu1, Duane Wessels2, Ma> Larson2, and Lixia Zhang1
1
1 UCLA 2 VeriSign Labs
1UCLA 2VeriSign Lab
Why Select Servers
• Query the fastest server – To minimize itera+ve query delays.
• Also query other servers from +me to +me – to tell which server is the fastest. – to avoid overloading fast servers. – to prevent from being a>acked, e.g., Kaminsky-‐style cache poisoning.
2
Ques+ons to Answer
• Do caching resolvers select the fastest server every +me?
• If some resolvers select slow servers: is that inten+onal, or by mistake?
• Can resolvers promptly detect changes in query response delays?
3
Tested Caching Resolvers
• BIND 9.7.3 • BIND 9.8.0 • PowerDNS 3.1.5 • Unbound 1.4.10 • DNSCache 1.05 • Windows DNS 6.1 (Windows Server 2008)
4
Measurement Testbed
5
Root
13 servers
SLD
⋯⋯ .com Network Emulator
Caching Resolver
Trace Replayer
Test domain
Redirect queries to the Test domain
Use Wildcard to terminate queries
Emulate different packet loss and delays to different servers.
Replay a DNS lookup trace
The tested caching resolvers (default cache size)
Query Load
• Trace data – From a large U.S. ISP – 10 minutes of resolver logs – About 3.5 million lookups – 408,808 unique DNS names – 154,165 Second Level Domains
• Average itera+ve query rate: – 250 queries/sec – Could be higher if
• Cache is small • TTL of DNS RRs are short
6
70ms%
Caching%%Resolver%
TLD%NS%1%
TLD%NS%2%
TLD%NS%3%
TLD%NS%13%
Test Scenarios • Scenario 1:
– Whether caching resolvers can tell the differences in RTT.
7
70ms%
Caching%%Resolver%
TLD%NS%1%
TLD%NS%2%
TLD%NS%3%
TLD%NS%13%
Test Scenarios • Scenario 1:
– Whether caching resolvers can tell the differences in RTT.
• Scenario 2: – How caching resolver handle a
unreachable server?
8
70ms%
Caching%%Resolver%
Become%reachable%again%a5er%5%mins%
TLD%NS%1%
TLD%NS%2%
TLD%NS%3%
TLD%NS%13%
Test Scenarios • Scenario 1:
– Whether caching resolvers can tell the differences in RTT.
• Scenario 2: – How caching resolver handle a
unreachable server?
• Scenario 3: – Whether caching resolvers can
detect server recovery in a short +me.
9
Server Selec+on • Strategy 1:
– Select the one with the least Smoothed RTT (SRTT). – Responses update SRTT of corresponding servers. – Other servers: SRTT decays exponen+ally. – Implementa+ons: BIND 9.8, PowerDNS
• Strategy 2: – Select a server based on an SRTT-‐related probability. – Responses update SRTT of corresponding servers. – Slow servers can be selected, but probabili+es may be small. – Implementa+ons: BIND 9.7, Unbound
10
Some prefer the fastest • PowerDNS
– Always selects the least SRTT server.
• BIND 9.7 – Selects sta+s+cally among all servers
with SRTT < 128ms. – Smaller SRTT, higher probability. – Decays all servers’ SRTT
• Even a server’s RTT > 128ms, it can s+ll be selected ajer certain +me.
12
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
5
10
15
20
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
20
40
60
80
100
Bind 9.7 in Scenario 1
PowerDNS in Scenario 1
Some do NOT prefer the fastest • DNSCache:
– Does not measure RTT. – Randomly selects among all
servers.
• Unbound: – Measures RTT. – Randomly selects among
servers with SRTT < 400ms.
13
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Unbound in Scenario 1 dnscache in Scenario 1 Windows DNS* in Scenario 1
Queries are distributed evenly among servers.
* Without source code, we do not know reasons for query distribu+on of Windows DNS.
Some sends more queries to slower
• BIND 9.8 – Always selects the least SRTT
server (same as PowerDNS).
• Measurements show that more queries are sent to slower servers.
• Why?
14
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
3
6
9
12
BIND 9.8 in Scenario 1
* ISC has been no+fied about this problem and we are awai+ng their response.
How does BIND 9.8 select a server? • The por+on of queries to a
name server NS is:
• Both tselect and tupdate are approximately equal to RTT.
• Faster decaying, smaller differences in tdecay!
• Decaying speed is coupled with query rate.
15
tdecay tselect tupdate
+me
SRTT
t1 t2
rmin
rrtt
t3 t4
Periodical SRTT varia+on of a name server NS
tselect
tdecay + tselect + tupdate Nquery =
If query rate is low
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
3
6
9
12
Name server indexed by RTT(ms)50 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
3
6
9
12
16
Brief Article
The Author
December 5, 2011
1 First section
Your text goes here.
SRTTn = � · SRTTn+1 (1)
1.1 A subsection
More text.
1
Brief Article
The Author
December 5, 2011
1 First section
Your text goes here.
SRTTn = � · SRTTn+1 (1)
1.1 A subsection
More text.
1
250 queries/second 25 queries/second
BIND 9.8 under different query rates in Scenario 1
Lower query rate Larger tdecay Fewer queries to slower servers
tselect
tdecay + tselect + tupdate Nquery =
What leads to high itera+ve query rate
• Factors at resolver side: – Cache is small in resolver. – A resolver is shared by a large number of users.
• Factors at domain side: – TTL of resource record is short. – Domain has a lot of records that are ojen queried.
17
If a server is unresponsive • BIND may send much more
queries to the unresponsive one. • Treat unresponsive as
“responsive” with a large SRTT. – Queries are s+ll sent to the
unresponsive server for tselect seconds. – tselect ≈ the value of +meout +mer.
• Longer tselect & shorter tdecay, larger Nquery.
18
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
5
10
15
20
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
5
10
15
20
25
Bind 9.7 in Scenario 2
Bind 9.8 in Scenario 2
tselect
tdecay + tselect + tupdate Nquery =
How others handle unresponsive servers
19
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
2
4
6
8
10
Unbound (✔): Can detect the unresponsive server, then use a few queries to probe it.
DNSCache (✖): No historical RTT record. Windows DNS (✔): Unknown
Name server indexed by RTT(ms)Inf. 60 70 80 90 100 110 120 130 140 150 160 170
Que
ries
rece
ived
(%)
0
20
40
60
80
100
PowerDNS (✔): Slow decaying reduces frequency of selec+ng slow servers.
Latency to detect server recovery
• Unbound: up to 15 minutes – Large interval between two consecu+ve probes.
• PowerDNS: 3 minutes – Caused by slow decaying
• BIND: depends on query rate • Windows DNS: about 1 second
20
Conclusions
• Comprehensive test is needed. • Some “seemingly sound” implementa+ons may not work as expected. – e.g. SRTT decaying.
• An unresponsive server should NOT be treated as a regular server with a large SRTT.
• Unresponsive servers can impact zone performance. – Should be repaired ASAP, even if others work well.
22
Top Related