Globally Synchronized time via Datacenter Networks

Page 1: Globally Synchronized time via Datacenter Networks

Ki Suh Lee, Cornell University

Joint work with Han Wang, Vishal Shrivastav and Hakim Weatherspoon

Page 2: Synchronized Clocks

• Fundamental for network and distributed systems
– OWD (one-way delay), monitoring, coordination, snapshots, updates, …

• Goal: minimized and bounded precision with scalability
– Minimized and bounded precision: hundreds of nanoseconds
– Scalability: the entire datacenter

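Page 3: Clock Synchronization Protocol

• Offset: the time difference between two clocks
• Precision: the worst-case offset

Figure: a client and a timeserver exchange messages. The client sends a request at $t_1$, the timeserver receives it at $t_2$ and replies at $t_3$, and the client receives the reply at $t_4$.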

Page 4: Clock Synchronization Protocol

• $\mathrm{RTT} = (t_4 - t_1) - (t_3 - t_2)$

• $\mathrm{Offset} = \frac{t_2 + t_3}{2} - \frac{t_1 + t_4}{2}$

• $\mathrm{Offset} = t_2 - t_1 - \mathrm{RTT}/2$
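
As a worked example of these formulas (a minimal sketch; the function name and the timestamp values are illustrative), the following computes RTT and offset from the four timestamps. The second call shows that an asymmetric path leaves the measured RTT unchanged while biasing the offset, which is the kind of error the following slides attribute to RTT uncertainty:

```python
def rtt_and_offset(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive.
    t1/t4 are read from the client clock, t2/t3 from the timeserver clock."""
    rtt = (t4 - t1) - (t3 - t2)        # time spent on the wire, both directions
    offset = (t2 - t1) - rtt / 2       # timeserver clock minus client clock
    return rtt, offset

# Timeserver is 100 ns ahead, 500 ns each way, 50 ns turnaround at the server.
print(rtt_and_offset(0, 600, 650, 1050))   # (1000, 100.0)

# Same true offset, but 700 ns out and 300 ns back: RTT is unchanged,
# yet the offset estimate is off by (700 - 300) / 2 = 200 ns.
print(rtt_and_offset(0, 800, 850, 1050))   # (1000, 300.0)
```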

Page 5

Current time protocols do NOT provide bounded precision, due to uncertainty in the measured RTT!

Page 6: Challenge: RTT is not accurate

• Errors from
– Oscillator skew
– Inaccurate timestamping
– Network stack
– Network jitter

Page 7: Challenge: RTT is not accurate

• Errors from
– Oscillator skew
– Inaccurate timestamping
– Network stack
– Network jitter

• PTP
– Hardware timestamping
– PTP-enabled switches
– Filtering/smoothing

Page 8: Challenge: Scalability

• Re-synchronization period vs. network overhead
• Limited number of clients

Page 9: Synchronization Protocols

Protocol | Precision | Scalability | Overhead | Extra hardware
NTP | µs | Good | Moderate | None
PTP | sub-µs | Good | Moderate | PTP-enabled devices
GPS | ns | Bad | None | Timing signal receivers, cables

Page 10: Solution: Use the PHY to synchronize clocks

• Protocol in the PHY
– Each physical link is already synchronized!
– No protocol stack overhead
– No network overhead
– Scalable: peer-to-peer and decentralized

Figure: the network stack (Application, Transport, Network, Data Link, Physical), with the protocol placed in the Physical layer.

Page 11: DTP: Datacenter Time Protocol

• Highly scalable with bounded precision!
– ~25 ns (4 clock ticks) between peers
– ~150 ns for a datacenter with six hops
– No network traffic
– Internal clock synchronization

• End-to-end: ~200 ns precision!

Page 12: Outline

• Introduction
• Design
• Evaluation
• Discussion
• Conclusion

Page 13: DTP: Datacenter Time Protocol

• 10G background
– Continuous idle characters (/I/s) when there is no packet
– At least 12 /I/s between two Ethernet frames

Figure: packets i, i+1 and i+2 in the Physical layer, separated by idle characters.

Page 14: DTP: Datacenter Time Protocol

• 10G background
– Continuous idle characters (/I/s) when there is no packet
– At least 12 /I/s between two Ethernet frames
– 1 control block (/E/, 66 bits) = 8 /I/s
– At least 1 /E/ between any two frames
– The PHY runs at 156.25 MHz, so the clock period is 6.4 ns

Figure: packets i, i+1 and i+2 separated by /E/ blocks; idle stretches are filled with consecutive /E/ blocks.
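
The clock arithmetic behind these numbers can be checked directly. This sketch assumes one 66-bit block (64 data bits plus the 2-bit sync header) per 6.4 ns PHY tick, which is consistent with the 10.3125 Gb/s line rate of 10GbE; the constant names are illustrative:

```python
from fractions import Fraction

PHY_CLOCK_HZ = 156_250_000    # 10GbE PHY clock from the slide
SYNC_BITS, DATA_BITS = 2, 64  # one 64b/66b block per tick (assumption stated above)

period_ns = Fraction(10**9, PHY_CLOCK_HZ)                     # exact: 32/5 ns
line_rate_gbps = PHY_CLOCK_HZ * (SYNC_BITS + DATA_BITS) / 1e9 # bits on the wire
data_rate_gbps = PHY_CLOCK_HZ * DATA_BITS / 1e9               # MAC payload bits

print(float(period_ns), line_rate_gbps, data_rate_gbps)   # 6.4 10.3125 10.0
```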

Page 15: DTP: Datacenter Time Protocol

• DTP overwrites /E/ blocks to send protocol messages
– Frequent messaging
– No overhead to Ethernet (L2)

Figure: packets i, i+1 and i+2, with some of the /E/ blocks between them replaced by DTP messages. DTP message format (one /E/ block):
– 2-bit sync header
– 8-bit block type
– 3-bit DTP message type
– 53-bit DTP payload
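
To make the 66-bit layout concrete, here is a sketch that packs and unpacks the four fields listed above into one integer. The sync header value `0b10` and block type `0x1E` follow the 10GBASE-R convention for a control block made of eight control characters; the message-type code `BEACON` is hypothetical, and the field order is a logical layout, not the on-the-wire bit order:

```python
SYNC_CONTROL = 0b10          # 10GBASE-R sync header for a control block
BLOCK_TYPE_E = 0x1E          # block type for eight control characters (/E/)
PAYLOAD_BITS = 53
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
BEACON = 0b010               # hypothetical 3-bit DTP message type

def pack_dtp(msg_type: int, payload: int) -> int:
    """Return a 66-bit integer: sync (2) | block type (8) | msg type (3) | payload (53)."""
    if not 0 <= msg_type < 1 << 3:
        raise ValueError("DTP message type must fit in 3 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("DTP payload must fit in 53 bits")
    return (SYNC_CONTROL << 64) | (BLOCK_TYPE_E << 56) | (msg_type << 53) | payload

def unpack_dtp(block: int) -> tuple[int, int, int, int]:
    """Split a 66-bit block back into (sync, block type, msg type, payload)."""
    return (block >> 64, (block >> 56) & 0xFF, (block >> 53) & 0b111, block & PAYLOAD_MASK)

blk = pack_dtp(BEACON, 123_456_789)
print(blk.bit_length(), unpack_dtp(blk))   # 66 (2, 30, 2, 123456789)
```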

Page 16: 10GbE Network Stack

8/26/16, SoNIC, NSDI 2013

Figure: the 10GbE network stack (Application, Transport, Network, Data Link, Physical). Application data gains an L3 header, an L2 header, and Ethernet framing (preamble, Ethernet header, CRC, gap), and is then carried by the Physical layer's 64/66b PCS as /S/ /D/ /D/ /D/ /D/ /T/ /E/ blocks, with idle characters (/I/) between frames. Transmit path: Encode, Scrambler, Gearbox, then PMA and PMD; receive path: Block sync, Descrambler, Decode. Each block is 64 bits plus a 2-bit sync header, the gearbox interface is 16 bits wide, and the line rate is 10.3125 Gigabits per second.

Page 17: DTP

Figure: DTP Rx, DTP Tx and DTP Control blocks added to the Physical layer's 64/66b PCS (Encode, Scrambler, Gearbox on transmit; Block sync, Descrambler, Decode on receive) above the PMA and PMD, together with a local counter, a delay register, and a synchronization FIFO between the local clock and the remote clock.

• local counter: 106-bit clock
– Frequently, synchronize the low 53 bits
– Occasionally, synchronize the high 53 bits

• delay: one-way delay to the peer
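
One way a receiver could use a frequent message carrying only the low 53 bits is to splice them onto its own high bits and pick the nearest candidate, which also handles wrap-around of the low half. This is a hypothetical sketch (`extend_low_bits` is not from the source) and assumes the remote counter stays within 2^52 ticks of the local one, which the occasional high-bit exchange would be expected to guarantee:

```python
COUNTER_BITS = 106
LOW_BITS = 53
LOW_MASK = (1 << LOW_BITS) - 1
RING = 1 << COUNTER_BITS            # counters wrap modulo 2**106

def ring_distance(x: int, y: int) -> int:
    """Distance between two counter values on the 106-bit ring."""
    d = (x - y) % RING
    return min(d, RING - d)

def extend_low_bits(local: int, remote_low: int) -> int:
    """Rebuild a full 106-bit remote counter from its low 53 bits."""
    base = local & ~LOW_MASK        # local value with the low 53 bits cleared
    candidates = [(base + k * (1 << LOW_BITS) + remote_low) % RING for k in (-1, 0, 1)]
    return min(candidates, key=lambda c: ring_distance(c, local))

local = (5 << LOW_BITS) + LOW_MASK - 10        # low half about to wrap
remote = local + 20                            # peer is 20 ticks ahead, past the wrap
print(extend_low_bits(local, remote & LOW_MASK) == remote)   # True
```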

Page 18: DTP

• Runs in two phases between two peers
– Init phase: measuring the OWD
– Beacon phase: re-synchronization

Figure: two peers, each with local and delay registers in its Physical layer.

Page 19: DTP: Init Phase

• $\mathit{delay} = (t_4 - t_1 - \alpha)/2$
– $\alpha = 3$: ensures the computed delay is always less than the actual delay

• Introduces 2 clock ticks of error
– Due to oscillator skew, timing, and the synchronization FIFO

Figure: an INIT exchange between two peers (local and delay registers in each Physical layer), with timestamps $t_1$, $t_2$, $t_3$ and $t_4$.
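
The init-phase formula uses only the local counter values at send ($t_1$) and at receipt of the peer's reply ($t_4$). A minimal sketch in clock ticks, assuming integer tick counts and rounding down with floor division, which errs on the side of underestimating the delay as $\alpha$ is meant to; `init_delay` is an illustrative name:

```python
ALPHA = 3   # from the slide: keeps the computed delay below the actual delay

def init_delay(t1: int, t4: int, alpha: int = ALPHA) -> int:
    """One-way delay in ticks from an INIT exchange.
    t1: local counter when INIT leaves; t4: local counter when the reply arrives."""
    if t4 - t1 < alpha:
        raise ValueError("reply arrived sooner than alpha ticks; counters inconsistent")
    return (t4 - t1 - alpha) // 2

print(init_delay(1_000, 1_029))   # 13
```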

Page 20: DTP: Beacon Phase

• local = max(local, remote + delay)
• Frequent messages
– About every 1.2 µs (~200 clock ticks) with MTU-sized packets
– About every 7.2 µs (~1200 clock ticks) with jumbo packets

• Introduces 2 clock ticks of error
– Total: 4 clock ticks of error

Figure: beacon messages between two peers (local and delay registers in each Physical layer), sent at $t_1$ and received at $t_2$.
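
The beacon rule can be exercised in a small discrete-time simulation. This is a simplified sketch, not the hardware design: two peers exchange beacons every 200 ticks over a link whose delay is known exactly (the real protocol deliberately underestimates it), and peer B's oscillator is made artificially fast with one extra tick every 10,000; all parameter values are illustrative. Because a counter only ever moves forward to `remote + delay`, the slower peer is pulled up to the faster one at each beacon:

```python
def simulate(steps=50_000, delay=5, beacon_every=200, skew_every=10_000, b_start=37):
    """Return the worst |A - B| counter offset seen after the first beacons land."""
    counters = {"A": 0, "B": b_start}
    in_flight = []                              # (arrival_step, destination, remote_value)
    worst = 0
    for step in range(1, steps + 1):
        counters["A"] += 1
        counters["B"] += 2 if step % skew_every == 0 else 1   # B's oscillator runs fast
        arrived = [m for m in in_flight if m[0] == step]
        in_flight = [m for m in in_flight if m[0] != step]
        for _, dest, remote in arrived:
            # beacon rule: local = max(local, remote + delay)
            counters[dest] = max(counters[dest], remote + delay)
        if step % beacon_every == 0:            # each peer sends its counter to the other
            in_flight.append((step + delay, "B", counters["A"]))
            in_flight.append((step + delay, "A", counters["B"]))
        if step > beacon_every + delay:         # skip the initial, unsynchronized interval
            worst = max(worst, abs(counters["A"] - counters["B"]))
    return worst

print(simulate())   # 1
```

In this run the peers never drift more than one tick apart, well inside the four-tick figure on the slide, because each extra tick on B is corrected at the next beacon.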

Page 21: DTP Switch

• global = max(local counters)
• Propagates global via Beacon messages

Figure: a switch whose ports each keep local and delay registers in the Physical layer; a max unit combines the per-port local counters into global.
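
Propagation of the maximum through switches can be illustrated on part of the evaluation tree. The sketch below ignores link delay and ticking and simply applies `counter = max(counter, neighbor)` once per round on every link, a hypothetical simplification of beacon exchange; after as many rounds as the network diameter (four hops here), every node holds the largest counter:

```python
def propagate(counters: dict[str, int], links: list[tuple[str, str]], rounds: int) -> dict[str, int]:
    """Apply the max rule across every link for a number of synchronous rounds."""
    c = dict(counters)
    for _ in range(rounds):
        prev = dict(c)                       # read last round's values: one hop per round
        for u, v in links:
            c[u] = max(c[u], prev[v])
            c[v] = max(c[v], prev[u])
    return c

LINKS = [("S0", "S1"), ("S0", "S2"), ("S0", "S3"),
         ("S1", "S4"), ("S1", "S5"), ("S2", "S7"), ("S2", "S8"),
         ("S3", "S10"), ("S3", "S11")]
start = {n: 1_000 for pair in LINKS for n in pair}
start["S4"] = 1_007                          # one leaf happens to be ahead

print(propagate(start, LINKS, rounds=3)["S7"])   # 1000: not reached yet
print(propagate(start, LINKS, rounds=4)["S7"])   # 1007: four hops from S4
```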

Page 22: DTP Daemon

• End-to-end precision
• Accesses the DTP counter via PCIe
• Estimates DTP time using the invariant TSC counter
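
The daemon's estimation step can be sketched as a least-squares line through paired (TSC, DTP) readings, after which DTP time can be read from the TSC alone without a PCIe access. The sampling scheme, the fit, the 2.5 GHz TSC rate, and the function names are assumptions for illustration; the slide states only that the invariant TSC is used:

```python
def fit_tsc_to_dtp(samples: list[tuple[int, int]]) -> tuple[float, float]:
    """Least-squares fit dtp ~= slope * tsc + intercept over (tsc, dtp) pairs."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t

def estimate_dtp(tsc: int, slope: float, intercept: float) -> float:
    """Predict the DTP counter from a TSC reading using the fitted line."""
    return slope * tsc + intercept

# Illustrative: a 2.5 GHz TSC and a 156.25 MHz DTP counter tick 16:1.
samples = [(tsc, 1_000 + tsc // 16) for tsc in range(0, 160_000, 16_000)]
slope, intercept = fit_tsc_to_dtp(samples)
print(round(estimate_dtp(200_000, slope, intercept)))   # 13500
```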

Page 23: DTP Property

• Bounded precision in hardware
– Bounded by 4T (= 25.6 ns, where T, the oscillator tick, is 6.4 ns)
– Network precision bounded by 4TD, where D is the network diameter in hops

• Requires NIC and switch modifications
– PTP also requires PTP-enabled devices
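
The bounds on this slide, and the end-to-end figure of roughly 200 ns quoted elsewhere in the talk, follow from simple arithmetic. A sketch using exact fractions; the daemon term reuses 4T as a typical (not guaranteed) access error at each endpoint, so the end-to-end number is an estimate rather than a hard bound:

```python
from fractions import Fraction

T_NS = Fraction(32, 5)            # oscillator tick: 6.4 ns
PEER_BOUND_TICKS = 4              # 4T between directly connected peers

def network_bound_ns(diameter_hops: int) -> Fraction:
    """Precision bound 4TD across a network of the given diameter."""
    return PEER_BOUND_TICKS * T_NS * diameter_hops

def end_to_end_estimate_ns(diameter_hops: int) -> Fraction:
    """Network bound plus a typical 4T daemon access error at each endpoint."""
    return network_bound_ns(diameter_hops) + 2 * PEER_BOUND_TICKS * T_NS

print(float(network_bound_ns(1)), float(network_bound_ns(6)), float(end_to_end_estimate_ns(6)))
# 25.6 153.6 204.8
```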

Page 24: DTP vs PTP

Source of error | PTP | DTP
Oscillator skew | |
Timestamping | HW timestamping | PHY timestamping
Network stack | Not involved | Not involved
Network jitter | Transparent clock, boundary clock | No jitter
Precision | Unbounded: tens to hundreds of ns (when idle) | Bounded

Page 25: DTP: Topics discussed in paper

• Handling failure
• Different standards: 1GbE, 25GbE, 40GbE, 100GbE, etc.
• External synchronization (i.e., synchronizing to true time)
• Incremental deployment

Page 26: Handling failure

• Bit errors
– Ignores bit errors in the MSBs
– Appends a checksum for the LSBs

• Faulty devices
– A device is treated as faulty when too many jumps fall outside the bound
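
The faulty-device rule can be sketched as a per-peer monitor that refuses beacon values implying a jump larger than the precision bound, which also discards a beacon corrupted by a flipped MSB, and flags the peer after repeated violations. The class name, the threshold of three violations, and the exact jump test are hypothetical; the slide specifies only that too many out-of-bound jumps indicate a faulty device:

```python
class PeerMonitor:
    """Hypothetical per-link guard around the beacon rule local = max(local, remote + delay)."""

    def __init__(self, bound_ticks: int = 4, max_violations: int = 3):
        self.bound_ticks = bound_ticks
        self.max_violations = max_violations
        self.violations = 0

    @property
    def faulty(self) -> bool:
        return self.violations >= self.max_violations

    def apply_beacon(self, local: int, remote: int, delay: int) -> int:
        """Return the new local counter, ignoring beacons outside the bound."""
        jump = remote + delay - local
        if abs(jump) > self.bound_ticks:
            self.violations += 1          # e.g. a bit error in the MSBs
            return local                  # ignore the message
        return max(local, remote + delay)

mon = PeerMonitor()
print(mon.apply_beacon(1_000, 999, 3))            # 1002: normal small correction
print(mon.apply_beacon(1_000, 999 | 1 << 50, 3))  # 1000: huge jump ignored
print(mon.faulty)                                 # False after one violation
```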

Page 27: Different Standards

Data rate | Encoding | Data width | Frequency | Period | Δ
1GbE | 8b/10b | 8 bit | 125 MHz | 8 ns | 25
10GbE | 64b/66b | 32 bit | 156.25 MHz | 6.4 ns | 20
40GbE | 64b/66b | 64 bit | 625 MHz | 1.6 ns | 5
100GbE | 64b/66b | 64 bit | 1562.5 MHz | 0.64 ns | 2
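
The Δ column is proportional to the period: in every row, Period = Δ × 0.32 ns. If Δ is read as the amount a counter advances per tick (an interpretation, not stated on the slide), links of different speeds count elapsed time in the same 0.32 ns units. A sketch that checks the table and shows all four link types agreeing after 32 ns; `counter_after` is an illustrative name:

```python
from fractions import Fraction

# (period in ns, Δ) per standard, copied from the table above
STANDARDS = {
    "1GbE":   (Fraction(8), 25),
    "10GbE":  (Fraction(32, 5), 20),    # 6.4 ns
    "40GbE":  (Fraction(8, 5), 5),      # 1.6 ns
    "100GbE": (Fraction(16, 25), 2),    # 0.64 ns
}
UNIT_NS = Fraction(8, 25)               # 0.32 ns

for name, (period, delta) in STANDARDS.items():
    assert period == delta * UNIT_NS, name

def counter_after(elapsed_ns: Fraction, period: Fraction, delta: int) -> int:
    """Counter value after elapsed_ns if it advances by delta once per whole tick."""
    return int(elapsed_ns // period) * delta

print({name: counter_after(Fraction(32), p, d) for name, (p, d) in STANDARDS.items()})
# {'1GbE': 100, '10GbE': 100, '40GbE': 100, '100GbE': 100}
```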

Page 28: External Synchronization

• A master server
– Connected to a reference time source
– Broadcasts the mapping between DTP and wall time

• Client servers
– Interpolate time using DTP counters
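
A client's interpolation step might look like the following sketch. It assumes (hypothetically) that the master broadcasts (DTP counter, wall-clock ns) pairs and that the client extrapolates linearly from the two most recent pairs, which also absorbs any rate difference between the DTP oscillators and true time; the message contents and names are not from the source:

```python
from fractions import Fraction

def dtp_to_wall_ns(dtp_now: int, older: tuple[int, int], newer: tuple[int, int]) -> Fraction:
    """Map a DTP counter value to wall-clock nanoseconds using two
    (dtp, wall_ns) pairs broadcast by the master."""
    (d0, w0), (d1, w1) = older, newer
    ns_per_tick = Fraction(w1 - w0, d1 - d0)   # measured rate, nominally 6.4 ns
    return w1 + (dtp_now - d1) * ns_per_tick

older = (1_000, 1_000_000_000)
newer = (1_000 + 156_250, 1_001_000_000)        # 156,250 ticks = 1 ms at 6.4 ns
print(dtp_to_wall_ns(newer[0] + 15_625, older, newer))   # 1001100000
```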

Page 29: Incremental Deployment

• Updates per rack
– DTP-enabled switch
– DTP-enabled NICs
– One server acting as a master for wall time

• Synchronizing racks
– DTP-enabled switch
– DTP beacon-join message for synchronizing DTP counters
– Select a new master

Page 30: Outline

• Introduction
• Design
• Evaluation
• Discussion
• Conclusion

Page 31: Evaluation

• DTP prototype
– Terasic DE5 board with Altera Stratix V
– Using Bluespec and the Connectal framework

Page 32: Evaluation: DTP Topology

Figure: evaluation topology built from DTP NICs, a tree with S0 at the root, S1, S2 and S3 below it, and S4 to S11 as leaves; offsets are measured between peers.

Page 33: Evaluation: Logger

• Offset between peers: $t_3 - t_2 - \mathrm{OWD}$
• Offset between SW and HW: $t_2 - t_1$

Figure: two peers, each running a DTP daemon above its Physical layer (local and delay registers); the timestamps $t_1$, $t_2$ and $t_3$ are logged.

Page 34: Evaluation: DTP Topology

Figure: the same DTP tree (S0 at the root, S1 to S3, S4 to S11). Offset = the difference between the DTP counter values (dtp) of the two peers being compared.

Page 35: Evaluation: PTP Topology

Figure: the same tree layout (S0 at the root, S1 to S3, S4 to S11) built from PTP switches and PTP NICs, plus a timeserver.

Page 38: PTP: Idle Network (No Traffic)

• Tens to hundreds of nanoseconds precision

Figure: PTP offset in nanoseconds (−600 to 600) vs. time (min).

Page 39: PTP: Medium Loaded (4 Gbps)

• Tens of microseconds precision

Figure: PTP offset in microseconds (−50 to 50) vs. time (min).

Page 40: PTP: Heavily Loaded (9 Gbps)

• Tens to hundreds of microseconds precision

Figure: PTP offset in microseconds (−150 to 150) vs. time (min).

Page 41: DTP: Heavily Loaded

• Always within 25.6 ns (4 clock ticks) between peers

Figure: DTP offset in nanoseconds (−32 to 32) and in clock ticks (−5 to 5) vs. time (min), for peer pairs S1-S4, S1-S5, S1-S0, S2-S7, S2-S8, S2-S0, S3-S10, S3-S11 and S3-S0.

Page 42: DTP Daemon

Page 43: DTP Daemon (after smoothing)

• The counter can usually be accessed with 25.6 ns precision

Figure: daemon offset in clock ticks (−20 to 20) and in nanoseconds (−128 to 128) vs. time (min).

Page 44: Outline

• Introduction
• Design
• Evaluation
• Discussion
• Conclusion

Page 45: Next Steps

• Integration with OSNT (Open Source Network Tester)
– NetFPGA SUME board with Xilinx Virtex-7

Page 46: Some Related Work

• Synchronous Ethernet (SyncE)
– Synchronizes the frequency of clocks
– DTP and PTP synchronize the time of clocks

• White Rabbit: PTP + SyncE
– Sub-nanosecond precision
– Only 1GbE so far

• Commercial PTP + SyncE
– Tens to hundreds of nanoseconds

Page 47: Conclusion

• DTP provides bounded precision and scalability
– Bounded precision: 4 clock ticks (25.6 ns) between peers
– Scalability: 153.6 ns for a datacenter with six hops
– Free: no network traffic
– Applications: usually within 25.6 ns (without bounds)
– End-to-end: 153.6 + 2 × 25.6 = 204.8 ns, i.e., about 200 ns!

Page 48: Questions?

• http://github.com/hanw/sonic-lite
• http://sonic.cs.cornell.edu
• Email: [email protected]

• Come to the poster session tomorrow!