BIND DS Servfail

SERVFAILS IN .NET AFTER PUBLICATION OF DS RECORD
DUANE WESSELS
VERISIGN LABS

Table of Contents

Introduction
Experiment Setup
    Phase 1: Pre-Signing
    Phase 2: Deliberately Unvalidatable
    Phase 3: DS in .NET
    Phase 4: Unblinding
    Phase 5: DS in root
SERVFAIL from BIND 9.7.0
Dependency on Phase Duration
When TTLs match
Earlier Versions of BIND
Fixed in BIND 9.7.1b1
Without Blinding
Workarounds

INTRODUCTION

On the day that the DS record for .NET was published in the root zone (December 9, 2010), a user reported a failure to resolve any .NET domain names for approximately two hours, until his nameserver was restarted.

The resolver was configured with a root zone trust anchor. The resolver software was BIND 9.7.0-P2.

This document describes an attempt to reproduce the reported problem in a lab environment. The experiment is designed to mimic the procedures used for signing the .NET zone. There are five phases:

1. Pre-signing. The .NET zone is not signed and does not contain DNSKEY or DS records.
2. Deliberately Unvalidatable. The .NET zone is signed, but DNSKEY records are “blinded.”
3. DS in .NET. DS records are published in the .NET zone.
4. Unblinding. The DNSKEY records in the .NET zone are unblinded.
5. DS in root. The DS record for .NET is published in the root zone.

During the actual .NET deployment, at least 48 hours passed between phases 4 and 5, allowing sufficient time for all DNSKEY records (which have a TTL of 24 hours) to expire from resolver caches.

EXPERIMENT SETUP

In these experiments, shorter TTLs are used. Real-world TTLs are divided by 1440, so that a 1-day TTL becomes a 1-minute TTL.
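The scaling can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `lab_ttl` is illustrative and does not come from the original experiment scripts.

```python
SCALE = 1440  # divisor stated in the text (minutes per day)
DAY = 86400   # seconds in one day

def lab_ttl(real_ttl_seconds: int) -> int:
    """Scale a real-world TTL (in seconds) down to its lab equivalent."""
    return real_ttl_seconds // SCALE

print(lab_ttl(DAY))      # 1-day DNSKEY TTL  -> 60 seconds
print(lab_ttl(2 * DAY))  # 2-day NS/glue TTL -> 120 seconds
```

With this divisor, one minute of lab time corresponds to one day of real time, which is why later durations are quoted as "minutes or days."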

A root zone is created with a single delegation to .NET.

A .NET zone is created with two delegations to UNSIGNED.NET and SIGNED.NET.

Zones are signed and served authoritatively by BIND-9.7.0 tools.

A Perl script using Net::DNS sends queries to the BIND resolver every 15 seconds and logs the query time, query name, response code, AD bit, RDATA, and TTL.
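The original Perl script is not reproduced in this document. As a rough illustration of what such a poller does, here is a standard-library Python sketch that builds a DNS query by hand and extracts the response code and AD bit from the reply header. It is not the author's script: the function names are hypothetical, the choice to set the AD bit in the query is an assumption, and RDATA and TTL logging are omitted for brevity.

```python
import socket
import struct
import time

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
FLAG_RD = 0x0100  # recursion desired
FLAG_AD = 0x0020  # authentic data

def build_query(qname, query_id, qtype=1):
    """Encode a single-question DNS query (qtype 1 = A, class IN)."""
    header = struct.pack("!HHHHHH", query_id, FLAG_RD | FLAG_AD, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)

def parse_header(response):
    """Return (rcode name, AD bit, answer count) from a raw DNS response."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), (flags & FLAG_AD) >> 5, ancount

def log_line(now, qname, response):
    """Format one log entry: time, name, response code, AD bit."""
    rcode, ad, _ancount = parse_header(response)
    return f"{int(now)} {qname} {rcode} AD={ad}"

def poll(resolver, names, interval=15):
    """Query each name in turn, forever, printing one log line per response."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    query_id = 0
    while True:
        for name in names:
            query_id = (query_id + 1) & 0xFFFF
            sock.sendto(build_query(name, query_id), (resolver, 53))
            try:
                response, _addr = sock.recvfrom(4096)
                print(log_line(time.time(), name, response))
            except socket.timeout:
                print(f"{int(time.time())} {name} TIMEOUT")
        time.sleep(interval)
```

A call such as `poll("127.0.0.1", ["www.signed.net", "www.unsigned.net"])` would produce lines in the same general shape as the log excerpts shown later in this document, minus the RDATA and TTL columns.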

PHASE 1: PRE-SIGNING

The root zone is signed with 1024-bit keys using algorithm 8. It contains only a single delegation to .NET, plus the necessary glue.

    .                   60 IN SOA ns1. dns. (
                            1292377267 ; serial
                            15         ; refresh (15 seconds)
                            10         ; retry (10 seconds)
                            420        ; expire (7 minutes)
                            60         ; minimum (1 minute)
                            )
                        60 RRSIG SOA 8 0 60 20110119191346 (
                            20101220191346 11325 .
                            DoREtOvBUCc18cuo8Jst7wJ046Ie3PoYNc4l
                            a7B5GGkoJEWY2YwL2vsyeOrSPBc+z+waLe2R
                            mfu3OdrYA6QfUMWX0Ej4Gh83+OsWXlbxnqla
                            +8dIYY5JZws7n64izxsbVXFATM3HutECdtxi
                            /q7JxKbD3A9PBdJhKADIwG1/CK4= )
                        360 NS a.root-servers.net.
    (etc)
    net.                120 IN NS a.gtld-servers.net.
    (etc)

PHASE 2: DELIBERATELY UNVALIDATABLE

Keys are generated for the .NET zone (1024-bit, algorithm 8). The zone is signed.

Before publishing, DNSKEY records are blinded using the zoneblind-private.pl script from the root-dnssec repository and the following commands:

    $ named-checkzone -i none -n ignore -o - net ../zones/net.signed \
        | perl zoneblind-private.pl \
        > net.blind
    $ scp -p net.blind tld:/etc/namedb/master/net.zone

DNSKEY records are given a 60 second TTL.

PHASE 3: DS IN .NET

The SIGNED.NET zone is signed (1024-bit, algorithm 8) and its DS records are added to the .NET zone.

The .NET zone is re-signed and re-blinded.

PHASE 4: UNBLINDING

The .NET zone is re-signed but no longer blinded.

PHASE 5: DS IN ROOT

DS records for .NET are added to the root zone, which is re-signed and published.

SERVFAIL FROM BIND 9.7.0

Following phase 5, after cached records expire, BIND 9.7.0-P2 may return SERVFAIL for the unsigned zone (and NOERROR for the signed zone). Here is the output from the query script:

    (in phase 4 here)
    1293039186 www.unsigned.net NOERROR AD=0 127.0.0.1 30
    1293039201 www.signed.net NOERROR AD=0 127.0.0.1 15
    1293039201 www.unsigned.net NOERROR AD=0 127.0.0.1 15
    1293039216 www.signed.net NOERROR AD=0 127.0.0.1 60
    1293039216 www.unsigned.net NOERROR AD=0 127.0.0.1 60
    1293039231 www.signed.net NOERROR AD=0 127.0.0.1 45
    1293039231 www.unsigned.net NOERROR AD=0 127.0.0.1 45
    1293039246 www.signed.net NOERROR AD=0 127.0.0.1 30
    1293039246 www.unsigned.net NOERROR AD=0 127.0.0.1 30
    1293039261 www.signed.net NOERROR AD=0 127.0.0.1 15
    1293039261 www.unsigned.net NOERROR AD=0 127.0.0.1 15
    (phase 5 begins here)
    1293039276 www.signed.net NOERROR AD=0 127.0.0.1 60
    1293039277 www.unsigned.net SERVFAIL AD=0
    1293039292 www.signed.net NOERROR AD=0 127.0.0.1 44
    1293039292 www.unsigned.net SERVFAIL AD=0
    1293039307 www.signed.net NOERROR AD=0 127.0.0.1 29
    1293039307 www.unsigned.net SERVFAIL AD=0
    1293039322 www.signed.net NOERROR AD=0 127.0.0.1 14
    1293039322 www.unsigned.net SERVFAIL AD=0
    (all cached records expired by now)
    1293039338 www.signed.net NOERROR AD=1 127.0.0.1 60
    1293039338 www.unsigned.net SERVFAIL AD=0
    1293039353 www.signed.net NOERROR AD=1 127.0.0.1 45
    1293039353 www.unsigned.net SERVFAIL AD=0
    1293039368 www.signed.net NOERROR AD=1 127.0.0.1 30
    1293039368 www.unsigned.net SERVFAIL AD=0
    1293039383 www.signed.net NOERROR AD=1 127.0.0.1 15
    1293039383 www.unsigned.net SERVFAIL AD=0
    1293039398 www.signed.net NOERROR AD=1 127.0.0.1 60
    1293039398 www.unsigned.net SERVFAIL AD=0
    1293039413 www.signed.net NOERROR AD=1 127.0.0.1 44
    1293039413 www.unsigned.net SERVFAIL AD=0

Experimentation showed that the SERVFAIL condition did not always happen and may depend on the duration of each phase of the deployment.
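When reading logs like the one above, it helps to reduce them to the moments at which each name changes state. The following Python sketch does that for a handful of lines copied from the output above. The function names are illustrative only and are not part of the original tooling.

```python
from collections import defaultdict

def parse_log(lines):
    """Yield (timestamp, qname, rcode, ad) from query-script log lines."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and annotations such as "(phase 5 begins here)".
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        yield int(fields[0]), fields[1], fields[2], int(fields[3].split("=")[1])

def transitions(lines):
    """Map each qname to the list of (timestamp, (rcode, ad)) state changes."""
    changes = defaultdict(list)
    for ts, qname, rcode, ad in parse_log(lines):
        state = (rcode, ad)
        if not changes[qname] or changes[qname][-1][1] != state:
            changes[qname].append((ts, state))
    return dict(changes)

sample = """\
(in phase 4 here)
1293039261 www.signed.net NOERROR AD=0 127.0.0.1 15
1293039261 www.unsigned.net NOERROR AD=0 127.0.0.1 15
(phase 5 begins here)
1293039276 www.signed.net NOERROR AD=0 127.0.0.1 60
1293039277 www.unsigned.net SERVFAIL AD=0
1293039338 www.signed.net NOERROR AD=1 127.0.0.1 60
1293039338 www.unsigned.net SERVFAIL AD=0
""".splitlines()

for name, changes in transitions(sample).items():
    print(name, changes)
```

On this sample, the unsigned name moves from NOERROR to SERVFAIL at 1293039277, while the signed name stays at NOERROR and gains the AD bit at 1293039338.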

DEPENDENCY ON PHASE DURATION

A script was written to repeatedly run the simulation, each time varying the durations of phases 3 and 4. One goal of this experiment is to discover whether the SERVFAIL condition can be avoided by using a longer phase 4 duration.

The duration of phase 3 is the time between publishing DS records in the NET zone and unblinding the NET DNSKEYs.

The duration of phase 4 is the time between unblinding the NET DNSKEYs and publishing the NET DS record in the root zone.

In this experiment, once the NET DS record is published, the script sleeps for 70 seconds to allow cached records to expire. It then uses dig to issue a query for WWW.UNSIGNED.NET and records the response code. The following figure shows the results:

[Figure: BIND 9.7.0 P2 DS Introduction Behavior. X-axis: Time between Unblinding and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and Unblinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

The phase 3 duration is represented on the Y-axis, and the phase 4 duration on the X-axis. Red triangles indicate SERVFAIL results, while green circles show cases where resolution of unsigned names was successful.

Recall that these experiments use TTLs equal to real-world TTLs divided by 1440. Thus, a TTL originally equal to one day becomes a 1-minute TTL in the experiment. The axis labels in the graph indicate that the axis values may be interpreted as either days or minutes.

The pattern appears to be related to the sum of the phase 3 and phase 4 durations, as shown here:

[Figure: BIND 9.7.0 P2 DS Introduction Behavior. X-axis: Time between Signing and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and Unblinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

Furthermore, the pattern has a period of 2 days, with a “good region” that is 1 day long followed by a “bad region” that is 1 day long. This leads us to believe that it is caused by the difference in NS/glue TTLs (2 days) versus DNSKEY TTLs (1 day).
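The sweep described in this section could be structured roughly as follows. This is a sketch under stated assumptions, not the script used for the experiment: the four entries in `steps` are hypothetical callables standing in for the zone publication steps, and the response code is taken from the `status:` field that dig prints in its header line.

```python
import re
import subprocess
import time

def rcode_from_dig(output):
    """Extract the status field from dig's header line, or None if absent."""
    match = re.search(r"status: ([A-Z0-9]+)", output)
    return match.group(1) if match else None

def query_rcode(name, resolver="127.0.0.1"):
    """Run dig against the resolver and return the response code string."""
    result = subprocess.run(
        ["dig", f"@{resolver}", name, "A"], capture_output=True, text=True
    )
    return rcode_from_dig(result.stdout)

def run_trial(phase3, phase4, steps, query=query_rcode, sleep=time.sleep):
    """Run one simulated deployment; durations are in seconds of lab time.

    steps maps hypothetical step names to callables that publish zones.
    """
    steps["reset"]()               # phases 1 and 2: unsigned, then blinded
    steps["publish_ds_in_net"]()   # phase 3 begins
    sleep(phase3)
    steps["unblind"]()             # phase 4 begins
    sleep(phase4)
    steps["publish_ds_in_root"]()  # phase 5 begins
    sleep(70)                      # let cached records expire
    return query("www.unsigned.net")

def sweep(phase3_values, phase4_values, steps, **kwargs):
    """Return {(phase3, phase4): rcode} for every combination of durations."""
    return {
        (p3, p4): run_trial(p3, p4, steps, **kwargs)
        for p3 in phase3_values
        for p4 in phase4_values
    }
```

Passing `query` and `sleep` as parameters keeps the trial logic testable without a resolver or real delays.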

WHEN TTLS MATCH

In the production NET zone, NS/glue records have 2-day TTLs and DNSKEY records have 1-day TTLs. In the experiments so far, they have been 2-minute and 1-minute TTLs.
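A quick way to see whether a zone has this kind of mismatch is to compare the TTLs at the zone apex. The sketch below assumes one record per line in the form `owner ttl IN type rdata`, which is an assumption about the input and not something the original document specifies. The DNSKEY line in the sample uses a placeholder where the key data would be.

```python
def ttls_by_type(zone_lines, owner):
    """Collect the set of TTLs seen per record type for one owner name."""
    ttls = {}
    for line in zone_lines:
        fields = line.split()
        if (len(fields) >= 4 and fields[0] == owner
                and fields[1].isdigit() and fields[2] == "IN"):
            ttls.setdefault(fields[3], set()).add(int(fields[1]))
    return ttls

def ttl_mismatch(zone_lines, owner="net."):
    """True if the NS and DNSKEY TTLs at the owner name differ."""
    ttls = ttls_by_type(zone_lines, owner)
    return ttls.get("NS", set()) != ttls.get("DNSKEY", set())

sample_zone = [
    "net. 120 IN NS a.gtld-servers.net.",
    "net. 60 IN DNSKEY 256 3 8 <key-data-omitted>",  # placeholder key data
]
print(ttl_mismatch(sample_zone))
```

This only looks at the NS RRset, not the glue addresses, so it is a partial check at best.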

When both DNSKEY and NS/glue TTLs are set to 1 day/minute, the following results are obtained:

[Figure: BIND 9.7.0 P2 DS Introduction Behavior, Matching TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and Unblinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

Clearly, when the TTLs match, BIND 9.7.0-P2 does not exhibit the SERVFAIL problem.

EARLIER VERSIONS OF BIND

The bug is present in versions as early as BIND 9.6.2-P2, which is one of the first versions to support the SHA-256 algorithm:

[Figure: BIND 9.6.2 P2 DS Introduction Behavior, Mismatched TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and Unblinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

FIXED IN BIND 9.7.1B1

Based on bug descriptions in the BIND CHANGES file, the following entry sounds like it could be the bug we are seeing here:

    2890. [bug] Handle the introduction of new trusted-keys and DS, DLV RRsets better. [RT #21097]

The same tests as above were made against BIND 9.7.1b1 with the following results:

[Figure: BIND 9.7.1B1 DS Introduction Behavior, Mismatched TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and Unblinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

WITHOUT BLINDING

In the following test, the .NET keys are not blinded, but the DNSKEY and NS TTLs remain different. In this case the bug still manifests, indicating that the introduction of the DS record in the root zone, rather than the blinding/unblinding process, is the likely cause.

[Figure: BIND 9.7.0 P2 DS Introduction Behavior, Without Blinding. X-axis: Time between No Blinding and DS in Root (minutes or days), 0 to 10. Y-axis: Time between Signing and No Blinding (minutes or days), 0 to 3. Legend: SERVFAIL, NOERROR.]

WORKAROUNDS

Based on these tests, there are three ways to work around this bug in BIND:

1. Upgrade resolver software to BIND 9.7.1b1 or later
2. Make the zone's DNSKEY and NS TTLs match
3. Restart the resolver after publication of the DS record

The workarounds are, for now, only enumerated here without further discussion as to their relative merits or operational impacts.

© 2011 Verisign, Inc. All rights reserved. VERISIGN and other trademarks, service marks, and designs are registered or unregistered trademarks of Verisign, Inc. and its subsidiaries in the United States and in foreign countries. All other trademarks are property of their respective owners.