Clocking and Timing in Fault-Tolerant Systems-on-Chip
![Page 1: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/1.jpg)
Clocking and Timing in Fault-Tolerant Systems-on-Chip
Andreas Steininger
![Page 2: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/2.jpg)
Outline
• The Clock as a Blessing
• The Clock as a Curse
• Alternative Synchronization Schemes
  - GALS
  - fully asynchronous
  - the DARTS approach
• Conclusion
2
![Page 3: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/3.jpg)
Contributors to this Work
The DARTS project team
TU Vienna: Gottfried Fuchs, Matthias Fuegger, Ulrich Schmid, Thomas Handl
RUAG Space: Gerald Kempf, Manfred Sust, Wolfgang Zangerl
3
![Page 4: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/4.jpg)
The Need for Fault Tolerance
miniaturization is key to progress in VLSI
=> smaller structures
=> lower voltage swing
=> smaller critical charge
=> higher operating frequencies
…result in higher susceptibility to faults (SET, EMI,…)
=> cannot avoid faults, need to tolerate them
4
![Page 5: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/5.jpg)
The Role of Time
“The only reason for time is so that everything doesn’t happen at once”, Albert Einstein
5
![Page 6: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/6.jpg)
The Need for Clocking
activities need to be co-ordinated
• on system level (braking of wheels, …)
• on algorithmic level (consensus, …)
• on communication level
• on logic level (state machine switching, …)
co-ordination in the time domain (synchronization) is an efficient way to attain this
=> need a global notion of time (discrete „ticks“)
6
![Page 7: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/7.jpg)
The Quality of Synchronization
[Figure: local time (number of ticks) plotted over real time, with the precision π marked]
7
![Page 8: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/8.jpg)
Typical Precision Values
on system level: ms … ms
on algorithm level: ms … ms
on communication level: ns … ms
on logic level: ps … ns
8
![Page 9: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/9.jpg)
Synchronization Requirements
9
phase synchronisation (for „hardware clock“ on logic level)
clock synchronisation (for distributed time base on algorithmic level)
1 µs is excellent precision for a distributed clock
at 1 GHz this means 1000 clock periods, i.e. 360,000° phase shift
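The relation between a time offset and the phase shift it causes is plain arithmetic: an offset t on a clock of frequency f spans t·f clock periods, each worth 360°. The sketch below evaluates this for offsets of 1 µs and 1 ms at 1 GHz. The function name and the choice of units (ns, MHz) are illustrative and not taken from the slides.

```python
def phase_shift_degrees(offset_ns: int, clock_mhz: int) -> float:
    """Phase angle (in degrees) that a time offset spans on a clock of the given frequency."""
    # ns * MHz = 1e-9 s * 1e6 / s = 1e-3 periods, hence the division by 1000
    periods = offset_ns * clock_mhz / 1000
    return periods * 360.0


print(phase_shift_degrees(1_000, 1_000))      # 1 µs offset at 1 GHz
print(phase_shift_degrees(1_000_000, 1_000))  # 1 ms offset at 1 GHz
```

Either way the offset covers many full clock periods, which is why clock synchronisation on algorithmic level and phase synchronisation on logic level are different problems.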
![Page 10: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/10.jpg)
Globally Synchronous Design
• whole design is „isochronic“ („perfect“ precision)
• time conveyed by clock transitions
• perfect co-ordination of all activities
• very efficient design
• can assume consistent states
• high level of abstraction
• very efficient implementation:
• single crystal oscillator
• single control line (clock net)
10
![Page 11: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/11.jpg)
„Isochronic“ Regions ?
speed of light (in medium) = 2 × 10⁸ m/s = 20 cm/ns
[Figure: 2 cm die with a reference point (Ref) and the situation at 1 GHz, 4 GHz and 8 GHz]
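To put the 20 cm/ns figure into perspective, the following sketch computes how far a signal can travel within one clock period at the three frequencies named on the slide. It is an optimistic upper bound under the stated propagation speed; real on-chip wires are typically slower. The function and constant names are illustrative.

```python
SIGNAL_SPEED_CM_PER_NS = 20.0  # 2 x 10^8 m/s, the value given on the slide


def reach_per_clock_period_cm(clock_ghz: float) -> float:
    """Distance a signal covers during one clock period at the given frequency."""
    period_ns = 1.0 / clock_ghz
    return SIGNAL_SPEED_CM_PER_NS * period_ns


for f_ghz in (1.0, 4.0, 8.0):
    print(f"{f_ghz:g} GHz: {reach_per_clock_period_cm(f_ghz):g} cm per clock period")
```

At 8 GHz the reach per period (2.5 cm) is already close to the 2 cm die size, so treating the whole die as one isochronic region becomes questionable.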
![Page 12: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/12.jpg)
The Variation Problem
[Figure: Designer (system model, projected conditions, worst case, safety margins) versus User (actual system, actual conditions); open questions: unknown conditions, imperfections]
Timing completely fixed after design
No way to react to actual conditions & system („PVT variations“)
![Page 13: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/13.jpg)
Fault-Tolerant Architectures
Duplication & Comparison
[Figure: two FUs feed a comparator (=?) that raises ERR on mismatch]
Triple-Modular Redundancy
[Figure: three FUs feed a voter that produces output Y]
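The two architectures can be sketched behaviourally as follows. This is a minimal software model, not a circuit description: the replicated function units are represented only by their output values, and the function names are made up for illustration.

```python
from collections import Counter


def duplicate_and_compare(out_a, out_b):
    """Duplication & comparison: forward one output and flag ERR if the replicas disagree."""
    err = out_a != out_b
    return out_a, err


def tmr_vote(out_a, out_b, out_c):
    """Triple-modular redundancy: return the value that at least two replicas agree on."""
    value, count = Counter([out_a, out_b, out_c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: more than one replica is faulty")
    return value


print(duplicate_and_compare(3, 4))  # disagreement is detected, but not corrected
print(tmr_vote(4, 4, 3))            # a single faulty replica is masked
```

Duplication can only detect a fault, whereas the voter masks one faulty replica. Both rely on the replicas delivering comparable values at comparable times, which is where clocking enters.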
![Page 14: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/14.jpg)
Lock-Step Operation
single clock
[Figure: three FUs and a voter with output Y; all replicas step from state „3“ to „4“]
• single point of failure
• good replica determinism
![Page 15: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/15.jpg)
Lock-Step Operation
independent clocks
[Figure: three FUs and a voter with output Y; the replicas step from state „3“ to „4“]
• single fault tolerant
• bad replica determinism
![Page 16: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/16.jpg)
Fault-Tolerant HW-Clocking
[Figure: three FUs and a voter with output Y, plus three clock voters (v)]
![Page 17: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/17.jpg)
Fault-Tolerant HW-Clocking
[Figure: three FUs and a voter with output Y, plus three clock voters (v); delay elements (D) and question marks (?) added]
![Page 18: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/18.jpg)
The Charm of SoCs
billions of transistors fit on one die
=> structuring into (IP) modules
„System-on-Chip“
BUT:
• large clock distribution networks => „isochronic“??
• FT clocking does not work with large skew
• may need individual clocks for function modules
=> clock-synchrony neither attainable nor desirable
18
![Page 19: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/19.jpg)
Co-ordination of Data Exchange
[Figure: SRC, f(x), SNK]
When can SNK use its input? When it is valid and consistent.
When can SRC apply the next input? When SNK has consumed the previous one.
![Page 20: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/20.jpg)
The Synchronous Approach
20
SRC SNK f(x)
co-ordination based on (global) time
![Page 21: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/21.jpg)
Alternative: Asynchronous Design
21
SRC SNK f(x)
co-ordination based on handshaking
REQ: „Data word valid, you can use it“
ACK: „Data word consumed, send the next“
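The REQ/ACK exchange can be sketched as a small sequential model. The slides do not say whether a two-phase or four-phase protocol is meant; the sketch below assumes a four-phase (return-to-zero) handshake, and all names in it are illustrative.

```python
class Channel:
    """Wires between SRC and SNK: a data word plus the REQ and ACK control lines."""

    def __init__(self):
        self.data = None
        self.req = False
        self.ack = False


def transfer(words):
    """Move each word from SRC to SNK using a four-phase handshake; return data and signal trace."""
    ch = Channel()
    received, trace = [], []
    for word in words:
        ch.data = word            # SRC: apply the data word first ...
        ch.req = True             # ... then signal "data word valid, you can use it"
        trace.append("REQ+")
        received.append(ch.data)  # SNK: consume the word while REQ is high
        ch.ack = True             # SNK: signal "data word consumed, send the next"
        trace.append("ACK+")
        ch.req = False            # SRC: withdraw the request
        trace.append("REQ-")
        ch.ack = False            # SNK: return to idle, ready for the next word
        trace.append("ACK-")
    return received, trace


data, signals = transfer([0x12, 0x34])
print(data, signals)
```

No step depends on elapsed time, only on the order of signal transitions, which is the sense in which co-ordination is based on handshaking rather than on a global clock.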
![Page 22: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/22.jpg)
Async. Design – Advantages
• closed-loop control makes timing much more robust and adaptive to PVT variations
• no need for worst-case timing
• local handshakes replace global clock
• activity only when needed
• beneficial for EMI
• tends to stop operation in case of fault
22
![Page 24: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/24.jpg)
Async. Design – Disadvantages
• Need to handle race between REQ and data
24
SRC SNK f(x)
REQ: „Data word valid, you can use it“
![Page 25: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/25.jpg)
Async. Design – Disadvantages
• Need to handle race between REQ and data
Solution 1: „Bundled Data“
25
SRC SNK f(x)
REQ: „Data word valid, you can use it“
![Page 26: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/26.jpg)
Async. Design – Disadvantages
• Need to handle race between REQ and data
Solution 2: „Delay Insensitive“ (Coding)
26
SRC SNK f(x)
REQ: „Data word valid, you can use it“
Completion detection
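As an illustration of delay-insensitive coding with completion detection, the sketch below uses dual-rail encoding. The slides do not name a specific code, so dual-rail is an assumption made here as one common choice; the function names are illustrative.

```python
NULL = (0, 0)  # spacer: this bit carries no data yet


def encode(bits):
    """Dual-rail encode: each bit becomes a (true_rail, false_rail) pair."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word):
    """Completion detection: every bit position shows exactly one active rail."""
    return all(t + f == 1 for t, f in word)


def decode(word):
    """Recover the bits; only meaningful once the whole word has arrived."""
    if not is_complete(word):
        raise ValueError("word not complete yet")
    return [t for t, _ in word]


word = encode([1, 0, 1])
partial = [word[0], NULL, word[2]]  # the middle bit is still in flight
print(is_complete(partial), is_complete(word), decode(word))
```

Because validity is carried by the data wires themselves, the receiver needs no timing assumption about a separate REQ line, at the price of two wires per bit and the detection logic.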
![Page 27: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/27.jpg)
Async. Design – Disadvantages
• Need to handle race between REQ and data
• significant HW overhead (coding, delay elements)
• „adaptive“ timing not as predictable
• more difficult to design
• classical fault-tolerance schemes not applicable
• tends to stop operation in case of fault
27
![Page 28: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/28.jpg)
Best of Both Worlds
GALS: Globally Asynchronous Locally Synchronous
28
retain efficiency of synchronous design wherever possible: „intra-module“
use asynchronous principle where clock distribution too cumbersome: „inter-module“
First mention in PhD thesis by Chapiro / Stanford 84
![Page 29: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/29.jpg)
A GALS Example
29
CPU: 2 GHz
PCI-IF: 533 MHz
DSP: 2.7 GHz
USB-IF: 24 MHz
![Page 30: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/30.jpg)
Communication in GALS
Shared Memory
producer writes to memory, consumer reads from there
pro: control flow stays independent
• shared single-port memory
• true dual-port memory

Direct Messages (Data words)
move data word from producer's output register to consumer's input register
• non-buffered / buffered (FIFO-queues)
• clock fixed, data-driven or pausible
30
![Page 31: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/31.jpg)
Shared Memory
decoupling of clock domains by memory acting as a third party
=> high area overhead
=> unusual
for single-port memory, arbitration is required
• arbitration problem (unbounded delay…)
• one side may block the other at the arbiter
for multiport memory, problems are confined to access to the same cell
• busy flag may become metastable
• blocking still possible for one specific address
31
![Page 32: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/32.jpg)
Shared Memory
[Figure: CPU (2 GHz) and DSP (2.7 GHz) access a shared memory (example address 0xff14) through arbitration logic]
• perfect decoupling of data path
• potential metastability problems at arbitration logic
• potential blocking through arbitration
![Page 33: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/33.jpg)
Direct Messages
clock domain boundary is between producer's output register and consumer's input register
in general a synchronizer is needed at consumer's input
• definitely for conventional (fixed) clock
• can be avoided by data-driven / pausible clocking
control flows of producer and consumer are strongly coupled: not maintaining the input/output register blocks the other party
buffers/queues/FIFOs can
• mitigate, but not avoid this problem (full/empty)
• compensate variations in the data rate on both sides, but not different average data rates
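The last point can be illustrated with a toy model of a bounded FIFO between a producer and a consumer. This is a discrete-time sketch under simplifying assumptions (one word per step, consumer served before producer within a step); the function name and the traffic patterns are made up for illustration.

```python
from collections import deque


def run_fifo(put_pattern, get_pattern, depth):
    """Count the steps in which the producer (FIFO full) or consumer (FIFO empty) is blocked."""
    fifo = deque()
    producer_blocked = consumer_blocked = 0
    for want_put, want_get in zip(put_pattern, get_pattern):
        if want_get:
            if fifo:
                fifo.popleft()
            else:
                consumer_blocked += 1
        if want_put:
            if len(fifo) < depth:
                fifo.append(1)
            else:
                producer_blocked += 1
    return producer_blocked, consumer_blocked


steady_consumer = [1, 0] * 40                  # one word every second step
bursty_producer = [1, 1, 1, 1, 0, 0, 0, 0] * 10  # same average rate, but in bursts
fast_producer = [1] * 80                       # twice the consumer's average rate

print(run_fifo(bursty_producer, steady_consumer, depth=4))
print(run_fifo(fast_producer, steady_consumer, depth=4))
```

With equal average rates the four-entry buffer absorbs the bursts and the producer is never blocked. With a faster producer the buffer fills up after a few steps and the producer is blocked from then on, regardless of the buffer depth chosen.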
![Page 34: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/34.jpg)
Direct Messages
data moving over clock domain boundary
metastability problems
=> need to insert handshake
… with synchronizers
… and (optional) buffers
[Figure: data word 0xff14 passed between CPU (2 GHz) and DSP (2.7 GHz), with synchronizers (S) at the clock domain boundary]
![Page 35: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/35.jpg)
Arbiter: Principle
purpose:
○ manage concurrent requests to shared resource
method:
○ handle pairs of request_in / grant_out
○ requests may arrive in any order
○ arbiter must activate only one grant_out at a time (respond to the first requester): Mutual Exclusion (MUTEX)
problem:
○ resolve concurrent requests
=> metastability problem
35
![Page 36: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/36.jpg)
Arbiter: Circuit
MUTEX-element: SR-latch
„Metastability filter“: e.g., hi-threshold inverter
[Figure: arbiter circuit with requests R1, R2, latch outputs G1’, G2’ and grants G1, G2; plot of Vout,FF over t with the levels Vth,inv and Vmeta]
[from D. J. Kinniment „Synchronization and Arbitration in Digital Systems“, Wiley]
![Page 37: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/37.jpg)
Arbiter: Operation
[Figure: arbiter circuit (R1, R2, G1’, G2’, G1, G2) with timing diagram of R1, G1, R2, G2]
![Page 38: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/38.jpg)
Muller C-Element
[Figure: C-element symbol (inputs a, b; output y) and a realization with an RS latch (set, reset)]
IF a = b
THEN y = a
ELSE hold y
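The behavioural rule of the Muller C-element translates directly into a small state-holding model. This is a software sketch of the rule stated on the slide, not of the transistor circuit; the class name and the initial output value are assumptions.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, otherwise it holds."""

    def __init__(self, initial_output=0):
        self.y = initial_output

    def update(self, a, b):
        if a == b:
            self.y = a  # both inputs agree: output takes their value
        return self.y   # inputs differ: output keeps its previous value


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, c.update(a, b))
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join two handshake signals.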
![Page 39: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/39.jpg)
Muller C-Element: Circuit
39
[Alan Martin, Caltech]
![Page 40: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/40.jpg)
Data-Driven Clocking
Principle:
○ as soon as new data arrive => start clocking
○ determine number k of clock cycles required to process new data
○ stop clocking after k cycles, wait for next data
Properties:
○ need to switch clock on and off => beware spurious clock pulses!
○ no metastability problem: data stable as soon as consumer clock starts
○ potential for power saving
○ useful for specific applications only (no pipe!)
40
![Page 41: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/41.jpg)
Data-Driven Clock: Circuit / 1
[Figure: oscillator loop with delay element D and output CLK out; waveform of CLK out]
CLK half period determined by D
![Page 42: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/42.jpg)
Data-Driven Clock: Circuit / 2
[Figure: circuit with delay element D and Muller C-element (C), signals REQ, ACK and CLK out; waveforms of REQ, ACK and CLK out]
transition on REQ answered by transition on CLK out
min CLK half period determined by D
![Page 43: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/43.jpg)
Pausible Clocking
Principle:
○ producer requests consumer's clock to pause
○ data provided to input register during idle time
○ consumer's clock may resume
  - free running („pausible clock“)
  - with one cycle only („stoppable clock“)
Properties:
○ need to switch clock on and off
  => beware spurious clock pulses!
  => beware of clock tree delays!
○ producer controls consumer's clock (blocking!)
○ applications must cope with paused clock
![Page 44: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/44.jpg)
Pausible Clock: Circuit / 1
[Figure: circuit with delay element D and Muller C-element (C), signals REQ, ACK and CLK out; waveforms of REQ, ACK and CLK out]
inverter generates next REQ from ACK
self-oscillation
![Page 45: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/45.jpg)
Pausible Clock: Circuit / 2
[Figure: pausible clock circuit with delay element D, Muller C-element (C) and arbiter (Arb); waveforms of CLK out, REQ’ and ACK’]
external unit can safely stop CLK by activating REQ’
… and gets ACK’ as a response
![Page 46: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/46.jpg)
Pausible Clock: Circuit / 3
[Figure: pausible clock circuit with delay element D, Muller C-element (C) and arbiters (Arb) for REQ1/ACK1 … REQn/ACKn; output CLK out]
for more external sources arbiters can be added and “anded” before the Muller C-Element
the two inverters can be eliminated by using a Muller C-Element with inverting output
![Page 47: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/47.jpg)
Advantages of GALS
• synchronous islands can be designed efficiently
• modules operate independently
• can use module-specific clock & timing
• clocking is no single point of failure
47
![Page 48: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/48.jpg)
Problems with GALS
• operation of modules not (inherently) co-ordinated
  synchrony for communication but not on system / algorithm level
• communication has to cross clock boundaries
• potential for metastability
  => performance penalty through synchronizers OR
  => module must handle irregular clocking
48
![Page 49: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/49.jpg)
The DARTS Idea
49
phase synchronisation
tick synchronisation
clock synchronisation
Distributed Algorithms for Robust Tick Synchronization
![Page 50: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/50.jpg)
The DARTS Approach
[Figure: function units Fu1, Fu2, Fu3 on a data bus, with TG-Algs connected by the TG-Net]
Concept: Multiple synchronized tick generators
Method: Distributed algorithm for fault-tolerant tick generation implemented in (asynchronous) digital logic
Advantages
- No crystal oscillator(s)
- No critical clock tree
- Clock is no single point of failure!
- Reasonable synchrony
50
![Page 51: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/51.jpg)
The DARTS Principle
[Figure: standard synchronous clocking (Fu1, Fu2, Fu3 on a data bus, fed by a clock tree) next to DARTS clocks (TG-Algs connected by the TG-Net)]
Every function unit Fui augmented with simple local clock unit (TG-Alg)
TG-Algs communicate over dedicated TG-Net to generate tick-synchronized local clock signals
Up to f TG-Algs can be Byzantine faulty => need n ≥ 3f + 2 TG-Algs
Formally proven synchronization properties
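The resilience condition n ≥ 3f + 2 can be turned around to give the number of Byzantine faults a given number of TG-Algs can tolerate. The helper below is a direct rearrangement of that inequality; its name is illustrative.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 2 (0 if n is too small to tolerate any fault)."""
    return max((n - 2) // 3, 0)


for n in (5, 8, 11, 12):
    print(f"n = {n:2d} TG-Algs tolerate up to f = {max_byzantine_faults(n)} Byzantine faults")
```

Each additional tolerated fault costs three more TG-Algs.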
![Page 52: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/52.jpg)
A Comparison
[Figure: three architectures side by side, each with Fu1, Fu2, Fu3 on a data bus. Synchronous SoC: one oscillator and a clock tree. GALS: one oscillator per function unit. DARTS: TG-Algs connected by the TG-Net, with Fu1 clk and Fu2 clk shown for tick(3) and tick(4).]
synchronous SoC: single point of failure
GALS: no single point of failure, NO (inherent) global synchrony
DARTS: no single point of failure
synchronous SoC and DARTS: global synchrony (the slide gives „< 1 tick“ and „potentially 1 tick“)
![Page 53: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/53.jpg)
The Distributed Algorithm
(1) Initially:
(2)   send tick(0) to all; clock := 0;
(3) “Relay Rule”
(4) If received tick(m) from at least f+1 remote nodes and m > clock:
(5)   send tick(clock+1), …, tick(m) to all [once]; clock := m;
(6) “Increment Rule”
(7) If received tick(m) from at least 2f+1 remote nodes and m >= clock:
(8)   send tick(m+1) to all [once]; clock := m+1;
[Srikanth & Toueg, 87]
[Figure: six TG-Algs (TG-Alg 1 to TG-Alg 6) connected by the TG-Net]
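The relay and increment rules can be tried out in a small event-driven simulation. The sketch below is a software model under several assumptions that are not from the slides: "received tick(m) from a node" is read cumulatively (the highest tick seen from that node is at least m), message delays are drawn at random from an assumed interval, faulty nodes are modelled as silent (crashed) rather than truly Byzantine, and a max_tick limit is added so the run terminates. All names are illustrative.

```python
import heapq
import random


def simulate_ticks(n, f, max_tick, crashed=frozenset(), seed=1):
    """Run the relay/increment rules until every correct node reaches max_tick.

    Returns (final clock per node, largest clock difference seen among correct nodes).
    """
    assert n >= 3 * f + 2 and len(crashed) <= f
    rng = random.Random(seed)
    clock = [0] * n
    # highest[p][q]: highest tick number node p has received from node q so far
    highest = [[-1] * n for _ in range(n)]
    events = []  # heap of (arrival time, sequence number, receiver, sender, tick)
    seq = 0

    def broadcast(now, sender, tick):
        nonlocal seq
        for receiver in range(n):
            if receiver != sender and receiver not in crashed:
                delay = rng.uniform(1.0, 2.0)  # assumed bounded wire delay
                heapq.heappush(events, (now + delay, seq, receiver, sender, tick))
                seq += 1

    correct = [p for p in range(n) if p not in crashed]
    for p in correct:  # lines (1)-(2): send tick(0) to all
        broadcast(0.0, p, 0)

    max_skew = 0
    while events:
        now, _, p, q, tick = heapq.heappop(events)
        highest[p][q] = max(highest[p][q], tick)
        fired = True
        while fired:
            fired = False
            ranked = sorted((highest[p][r] for r in range(n) if r != p), reverse=True)
            relay_m = ranked[f]  # largest m backed by at least f+1 remote nodes
            if relay_m > clock[p]:  # relay rule, lines (4)-(5)
                for m in range(clock[p] + 1, relay_m + 1):
                    broadcast(now, p, m)
                clock[p] = relay_m
                fired = True
            inc_m = ranked[2 * f]  # largest m backed by at least 2f+1 remote nodes
            if inc_m >= clock[p] and clock[p] < max_tick:  # increment rule, lines (7)-(8)
                broadcast(now, p, inc_m + 1)
                clock[p] = inc_m + 1
                fired = True
        values = [clock[c] for c in correct]
        max_skew = max(max_skew, max(values) - min(values))
    return clock, max_skew


clocks, skew = simulate_ticks(n=5, f=1, max_tick=20, crashed={4})
print(clocks, skew)
```

With n = 5 and one silent node, each correct node still has 2f + 1 = 3 correct remote nodes, which is exactly what the increment rule needs to keep making progress. The observed skew depends on the assumed delay range and says nothing about the precision of the real hardware.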
![Page 54: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/54.jpg)
Implementation Challenges
Software-based algorithm (rules (1) to (8) of the distributed algorithm shown above) versus hardware implementation:
• k-bit messages, k unbounded => replacement by zero-bit messages
• atomicity of actions => to be ensured by the architecture and delay constraints
• threshold functions for fault tolerance => glitch-free asynchronous implementation
[Figure: k-bit msg vs. zero-bit tick: TICK(0), TICK(1), ..., TICK(k-1), TICK(k)]
![Page 55: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/55.jpg)
The DARTS Prototype
55
ASIC design:
• radhard 180nm technology
• 2 designs:
  - flexible
  - fast
Prototype board: 8 chips plus fixed & programmable interconnect
![Page 56: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/56.jpg)
Proof of Concept
56
![Page 57: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/57.jpg)
Frequency Stability (Warm-up)
[Plot: frequency in MHz (axis from 53.15 to 53.45) over time in hours (axis from 0 to 18)]
![Page 58: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/58.jpg)
Frequency Stability (detail)
[Plots: frequency in MHz (axis from 51.94 to 52.0) and core voltage in V (axis from 1.7968 to 1.7974), both over time in min (axis from 0 to 15)]
![Page 59: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/59.jpg)
DARTS – General Properties
Fully asynchronous implementation, NO oscillators
Tolerates up to three Byzantine faulty nodes (configurable number of TG-Algs; 5 to 12)
Adapts to operating conditions (asynchronous logic)
59
![Page 60: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/60.jpg)
Still Room for Improvements
o Transient faults are permanently stored in the elastic pipelines
o No on-the-fly integration of TG-Alg
o Relatively low clock speed
o Interfacing to traditional synchronous designs
o Scaling with number of faults is costly
60
![Page 61: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/61.jpg)
Summary: Trends & Needs
• Proceeding miniaturization necessitates fault tolerance
• Co-ordination of activities is fundamental, thus tight synchrony is a desirable feature on all levels
• SoCs are large modular designs on a single die
61
![Page 62: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/62.jpg)
Summary: SoC Clocking
• globally synchronous clock:
  + ideal synchrony, efficient in design & implementation
  - isochrony unrealistic, single point of failure
• DARTS clock
  + best attainable global synchrony, adaptive timing, FT
  - high implementation efforts, frequency not stable
• GALS
  + uses best of syn & asyn, indep. & module-specific clock
  - no global synchrony, metastability issues
• asynchronous design
  + power-efficient, robust against faults & PVT
  - high overheads, difficult to design, timing hard to predict
62
![Page 63: Clocking and Timing in Fault-Tolerant Systems-on-Chip](https://reader035.fdocuments.in/reader035/viewer/2022062315/56816384550346895dd4696a/html5/thumbnails/63.jpg)
More information on DARTS
http://ti.tuwien.ac.at/ecs/research/projects/darts
63