Keynote SSS’08
Distributed Algorithms and VLSI
Ulrich Schmid, Vienna University of Technology
Keynote SSS'08 U. Schmid 2
Content
- Short introduction to Very Large Scale Integration (VLSI): A photo gallery …
  – Great perspectives
  – But …
- VLSI Circuits ↔ Distributed Algorithms
  – DAs and VLSI: Do’s and Don’ts
- Do’s – an Example: DARTS Fault-tolerant Clocks
  – Starting point: A simple distributed algorithm
  – How to implement it in VLSI?
  – Proofs
  – [Under the rug: Metastability …]
Short introduction to VLSI: A photo gallery …
VLSI Circuits
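Major Ingredients
- Transistors (nMOS): [Figure: cross-section with polysilicon gate on SiO2 insulator, n-doped source and drain in a p substrate, channel of length L and width W; circuit symbol with gate, source and drain]
- Interconnect (wires): form & connect gates [Figure: inverter]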
Miniaturization: Moore’s Law

| | Intel 4004 (1971) | Intel P4 (2001) |
|---|---|---|
| Transistors | 2,250 | 42,000,000 |
| Die area / feature size | 12 mm² / 10 µm | 217 mm² / 0.18 µm = 180 nm |
| Clock / power | 0.74 MHz, 1 W | 2 GHz, 50 W |
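The two data points above imply a doubling period for the transistor count. A minimal Python sketch, assuming purely exponential growth between 1971 and 2001 (an idealization; intermediate chips are not considered):

```python
import math


def doubling_time(count0: float, year0: int, count1: float, year1: int) -> float:
    """Years per doubling, assuming exponential growth between two data points."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


# Transistor counts from the slide: Intel 4004 (1971) vs. Intel P4 (2001)
t_double = doubling_time(2_250, 1971, 42_000_000, 2001)
print(f"{t_double:.2f} years per doubling")  # prints "2.11 years per doubling"
```

About 14.2 doublings in 30 years, i.e. roughly one doubling every two years.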
Multicore Processors
IBM POWER4 (dual-core)
IBM Cell (8-core)
Tilera TILE64
Today: < 45 nm
Systems-on-Chip (SoC)
- Assemble whole SoC from suitable components
- Market for “IP cores” from different vendors
- Sync/async interfaces
[Photo: Nvidia Tegra]
Great perspectives for VLSI circuits.
But …
Manufacturing Limitations
Images: VLSI Lab, Politecnico di Torino; Optical Proximity Correction, Intel Corp.
Defects (Electromigration)
P. Gutman, IBM T.J. Watson Research Center
M. Ohring, Reliability and Failure of Electronic Materials and Devices, 1998; ASM Corp. Shanghai
[Figure labels: whiskers, hillock, void]
Defects (Gate Oxide Breakdown)
K.-L. Pey, C.-H. Tung, Physical characterization of breakdown in metal-oxide-semiconductor transistors
Breakdown-induced thermochemical reactions in (a) poly-Si gate and (b) p-Si substrate of n-channel MOSFETs
Semitracks, Inc.
ESD-induced gate oxide breakdown (www.siliconfareast.com)
Power Dissipation Problems
A. Choudhary, UMass; small transistor dissipating 5 mW in an SOI wafer, University of Bolton
→ Reduce supply voltage!
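Why reduce the supply voltage? In the standard first-order model of CMOS switching power (not stated on the slide), dynamic power grows with the square of V_dd. A sketch with hypothetical parameter values; it ignores leakage, and in practice a lower V_dd also reduces transistor drive strength and hence the achievable clock frequency:

```python
def dynamic_power(alpha: float, c_switched: float, v_dd: float, f_clk: float) -> float:
    """First-order CMOS switching power in watts: P = alpha * C * V_dd^2 * f.

    alpha: activity factor, c_switched: switched capacitance (F),
    v_dd: supply voltage (V), f_clk: clock frequency (Hz).
    """
    return alpha * c_switched * v_dd ** 2 * f_clk


# Hypothetical chip: 10% activity, 1 nF switched capacitance, 2 GHz clock
p_nominal = dynamic_power(0.1, 1e-9, 1.2, 2e9)   # about 0.29 W
p_half_vdd = dynamic_power(0.1, 1e-9, 0.6, 2e9)  # same chip at half the supply
print(p_half_vdd / p_nominal)                    # about 0.25, i.e. a 4x reduction
```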
Radiation-induced Soft Errors
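[Images: SLAC National Accelerator Lab, Stanford; single-event transient (SET) and single-event upset (SEU); plot after Powell, 1959, over altitudes from 0 km to 10 km, axis values from 10⁻³ to 1]
Soft error rates dominate in VLSI!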
Slow Signal Propagation
- Transistors switch faster
- BUT:
  – Wires get thinner
  – Less transistor driving strength
  – RC signal propagation along wires dominates circuit speed
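The last bullet can be made concrete with the Elmore delay of a uniform RC wire: the far-end delay grows with the square of the wire length. A sketch modeling the wire as a ladder of N RC segments; the per-µm resistance and capacitance values are hypothetical, and driver and load are ignored:

```python
def elmore_wire_delay(length_um: float, r_per_um: float, c_per_um: float,
                      segments: int = 100) -> float:
    """Elmore delay (s) at the far end of a uniform RC wire.

    The wire is split into `segments` sections of resistance R/N and capacitance
    C/N; the capacitor of section i is charged through the i upstream resistors,
    giving R*C*(N+1)/(2N), which tends to R*C/2 for large N.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    return sum(i * r_seg * c_seg for i in range(1, segments + 1))


# Hypothetical wire: 0.1 ohm/um and 0.2 fF/um
d1 = elmore_wire_delay(1000, 0.1, 0.2e-15)   # 1 mm wire, about 10 ps
d2 = elmore_wire_delay(2000, 0.1, 0.2e-15)   # 2 mm wire
print(d2 / d1)                                # about 4: doubling length quadruples delay
```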
Clock Distribution Problem
Images: Circuit & physical design of the POWER4 microprocessor, IBM J. Res. Dev.; Cell processor
Synchronous design paradigm: [Figure: flip-flops FF1, FF2, …, FFk, FFm (inputs D and CLK) connected through combinational logic (gates); a common clock signal CLK reaches the flip-flops with clock propagation delay t_PD,CLK, while data from FF1, FF2, …, FFk reaches FFm after delays t_dly,DATA,1m, t_dly,DATA,2m, …, t_dly,DATA,km]
→ Synchronous abstraction increasingly difficult to maintain!
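The delays in this figure enter the usual setup and hold constraints of the synchronous paradigm: data launched by one flip-flop must arrive at the capturing flip-flop before the next clock edge (setup), but must not change so early that it corrupts the current one (hold). A simplified sketch with hypothetical numbers in picoseconds; it uses a single logic delay per path, whereas real timing analysis uses the maximum path delay for setup and the minimum for hold. It shows how clock skew (different clock arrival times at the two flip-flops) eats into both margins:

```python
def setup_slack(period: int, launch_clk: int, capture_clk: int,
                t_clk_to_q: int, t_logic: int, t_setup: int) -> int:
    """Setup slack: data must arrive t_setup before the *next* capture edge."""
    arrival = launch_clk + t_clk_to_q + t_logic
    required = capture_clk + period - t_setup
    return required - arrival


def hold_slack(launch_clk: int, capture_clk: int,
               t_clk_to_q: int, t_logic: int, t_hold: int) -> int:
    """Hold slack: data must stay stable until t_hold after the *same* capture edge."""
    arrival = launch_clk + t_clk_to_q + t_logic
    required = capture_clk + t_hold
    return arrival - required


# Hypothetical path: 1000 ps period, 100 ps clock-to-Q, 700 ps logic, 50 ps setup
print(setup_slack(1000, 0, 0, 100, 700, 50))     # 150: met without skew
print(setup_slack(1000, 0, -200, 100, 700, 50))  # -50: capture clock 200 ps early
# Short path (100 ps logic), 50 ps hold, capture clock arriving 300 ps late
print(hold_slack(0, 300, 100, 100, 50))          # -150: hold violation
```

A negative slack means the synchronous abstraction is violated for that path.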
Hence, deep submicron VLSI circuits …
… are in fact Fault-Tolerant (FT) Distributed Systems
- Spatial distribution
- Message-passing communication
- Massive concurrency
- Asynchrony
- Failures
- Security issues (IP cores!)
Worthwhile undertaking: explore the applicability of DA results & approaches to VLSI circuits …
Applying DA Research in VLSI?
2008 Dagstuhl Seminar “Distributed Algorithms in VLSI Chips” (B. Charron-Bost, J. Ebergen, S. Dolev, U. Schmid, http://www.dagstuhl.de/08371)
[Great place for such undertakings …]
DA and VLSI – Don’ts
- Apply standard DAs in the VLSI context: too heavyweight in terms of computation & communication
- Apply standard replication-based FT (for coping with “classic” VLSI faults): too heavyweight in terms of power & area penalties
BUT …
DA and VLSI – Do’s (I)
- Apply “lightweight” DAs for decentralized handling of [nowadays centralized] functions, e.g. in large multicores:
  – Memory access scheduling (Moscibroda & Mutlu, PODC’08)
  – Apply self-stabilizing algorithms for handling transient failures (S. Dolev & Haviv, IEEE ToC, 2006)
  – Fault-tolerant clock generation in SoCs (Függer, Schmid, Fuchs, Kempf, EDCC’06)
- Apply replication-based FT to cope with malicious failures in VLSI:
  – IP core security threats in SoCs
  – Inconsistently propagated errors in high-dependability applications
[Photo: Tilera TILE64]
DA and VLSI – Do’s (II)
- Apply VLSI results & approaches in DA research:
  – Error-correcting codes and asynchronous consensus (Friedman, Mostefaoui, Rajsbaum & Raynal, IEEE ToC, 2007)
  – Corruption-resilient codes (S. Dolev & Tzachar, DISC’08)
- Extend DA approaches to contribute to a (still lacking!) “Theory of Dependable VLSI Circuits”:
  – Early example: Arbiter problem (Lamport, ~1980)
  – Handle massive concurrency (continuously computing gates!)
  – Handle computation and communication resource restrictions
  – Handle “non-closed” specifications
  – Define suitable failure models
Do’s – an Example: DARTS Fault-tolerant Clocks
DARTS – Distributed Algorithms for Robust Tick Synchronization
Joint work with A. Steininger, M. Függer, G. Fuchs [and many others]
http://ti.tuwien.ac.at/ecs/research/projects/darts
Clocking in SoCs (I)
Classic synchronous paradigm
- Concept: common notion of time for the entire chip
- Method: single quartz oscillator; global, phase-accurate clock tree
- Disadvantages:
  – Cumbersome clock tree design
  – High power consumption
  – Clock is a single point of failure!
[Figure: SoC with DSP, WLAN, Video, GPRS and GPS modules]
Clocking in SoCs (II)
Alternative: DARTS clocks
- Concept: multiple synchronized tick generators
- Method: distributed FT tick generation algorithms (TG algs), interacting via a dedicated clock network (TG net)
- Advantages:
  – No quartz oscillator(s)
  – No critical clock tree
  – Clock is no single point of failure!
  – Reasonable synchrony
[Figure: SoC with DSP, WLAN, Video, GPRS and GPS modules]
Reasonable Synchrony?
- Phase synchronization
- Clock synchronization
- Tick synchronization
Annotated requirements: max. precision, min/max frequency
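The two requirements can be read off a recorded run. A sketch, assuming that a node’s clock value is its number of ticks so far, that precision is the largest difference between two nodes’ clock values at any time, and that the frequency bounds come from the shortest and longest tick period; node names and tick times are hypothetical:

```python
from bisect import bisect_right


def precision(ticks: dict[str, list[float]]) -> int:
    """Largest difference between two nodes' tick counts over all times.

    Counts are step functions that only change at tick times, so evaluating
    them right after every tick time covers all instants.
    """
    all_times = sorted(t for times in ticks.values() for t in times)
    worst = 0
    for t in all_times:
        counts = [bisect_right(times, t) for times in ticks.values()]
        worst = max(worst, max(counts) - min(counts))
    return worst


def frequency_bounds(ticks: dict[str, list[float]]) -> tuple[float, float]:
    """(min, max) tick frequency over all nodes, from consecutive tick periods."""
    periods = [b - a for times in ticks.values() for a, b in zip(times, times[1:])]
    return 1.0 / max(periods), 1.0 / min(periods)


# Hypothetical tick times (seconds) of two nodes
ticks = {"p": [0.0, 1.0, 2.0, 3.0], "q": [0.5, 1.5, 2.5, 3.5]}
print(precision(ticks))         # 1
print(frequency_bounds(ticks))  # (1.0, 1.0)
```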
Starting point: A Distributed Algorithm
A Distributed Algorithm (I)
Failure-free case (f = 0): simple barrier synchronization

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */
    Continuously do:
        If received tick(C) from all n processes:
            send tick(C+1) to all; C := C+1;

Failure case f > 0?

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */
    Continuously do:
        If received tick(C) from n − f different processes:
            send tick(C+1) to all; C := C+1;

(Modified) Srikanth & Toueg algorithm:

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */
    Continuously do:
        If received tick(X) from f + 1 different processes and X > C:
            send tick(C+1), …, tick(X) to all [once]; C := X;
        If received tick(C) from n − f different processes:
            send tick(C+1) to all [once]; C := C+1;
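A Distributed Algorithm (III)
For n ≥ 3f + 1 and up to f Byzantine failures, with end-to-end delays ∈ [d, d + ε]: suppose a correct process p sends tick(C+1) at time t. Then every correct process q also sends tick(C+1) by time t + d + 2ε.
⇒ Clock ticks occur approximately synchronously

    On booting: send tick(0) to all; C := 0;
    If got tick(X) from f + 1 procs and X > C: send tick(C+1), …, tick(X) to all [once]; C := X;
    If got tick(C) from n − f processes: send tick(C+1) to all [once]; C := C+1;

[Figure: p sends tick(C+1) at t after receiving tick(C) from n − f ≥ 2f + 1 processes, at least f + 1 of them correct; these f + 1 messages reach any q' within ε after t, so q' sends tick(C) by t + ε; the resulting tick(C) messages of all correct processes reach any q within a further d + ε, so q sends tick(C+1) by t + d + 2ε]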
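The timing argument can be exercised in a small event-driven simulation of the algorithm. A sketch under simplifying assumptions that are not taken from the slides: the f faulty processes merely crash (stay silent) instead of behaving Byzantine, every message delay is drawn uniformly from [d, d + ε] with ε < d, all processes boot within ε, and local actions are instantaneous. It records when each correct process sends each tick, so the bound d + 2ε on the spread of tick(k) can be checked:

```python
import heapq
import random


def simulate_tick_sync(n=4, f=1, d=1.0, eps=0.2, max_tick=20, seed=1):
    """Event-driven sketch of the tick algorithm; returns send_time[p][k].

    Simplifying assumptions: processes n-f..n-1 crash (stay silent) instead of
    being Byzantine, every delay is uniform in [d, d+eps] with eps < d, all
    processes boot within [0, eps], and local actions take no time.
    """
    rng = random.Random(seed)
    correct = range(n - f)
    C = {p: -1 for p in correct}          # last tick number sent (-1: none yet)
    got = {p: {} for p in correct}        # got[p][k] = set of senders of tick(k)
    send_time = {p: {} for p in correct}  # send_time[p][k] = when p sent tick(k)
    events = []                           # heap of (arrival, seq, receiver, sender, k)
    seq = 0

    def send_upto(p, x, t):
        """Send tick(C+1), ..., tick(x) to all, each tick exactly once ("[once]")."""
        nonlocal seq
        for k in range(C[p] + 1, x + 1):
            send_time[p][k] = t
            for q in correct:             # crashed processes need no messages
                heapq.heappush(events, (t + d + rng.uniform(0, eps), seq, q, p, k))
                seq += 1
        C[p] = max(C[p], x)

    for p in correct:                     # "On booting": send tick(0) to all
        send_upto(p, 0, rng.uniform(0, eps))

    while events:
        t, _, q, sender, k = heapq.heappop(events)
        got[q].setdefault(k, set()).add(sender)
        # f+1 rule: catch up to the largest tick(X), X > C, received from f+1 processes
        relay = [x for x, senders in got[q].items() if len(senders) >= f + 1 and x > C[q]]
        if relay:
            send_upto(q, max(relay), t)
        # n-f rule: advance once tick(C) has been received from n-f processes
        while C[q] < max_tick and len(got[q].get(C[q], ())) >= n - f:
            send_upto(q, C[q] + 1, t)
    return send_time


times = simulate_tick_sync()
spread = max(max(times[p][k] for p in times) - min(times[p][k] for p in times)
             for k in range(21))
print(spread <= 1.0 + 2 * 0.2)  # True: within the d + 2*eps bound
```

Crash faults are only the mildest case covered by the n ≥ 3f + 1 argument; injecting Byzantine senders would require modeling what faulty processes send.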
How to implement this DA in VLSI ?
Mind: We don’t have any clock available for a synchronous implementation …
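Asynchronous Basic Circuits
[Figure: gate with inputs a, b and output y (propagation delay t_prop); Muller C-gate with internal feedback loop (delay t_loop)]

Muller C-gate truth table:

| a | b | y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | y_old |
| 1 | 0 | y_old |
| 1 | 1 | 1 |

AND, OR, …; Muller C-Gate:
- Continuously computes y = y(a, b) [with delay t_prop]
- AND gate for signal transitions (barrier synchronization)
- Note: Inevitably involves a feedback loop [t_loop]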
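A behavioral sketch of the C-gate in Python, assuming zero delay and ignoring t_prop, t_loop and metastability; the stored output plays the role of the feedback loop. The usage lines show the barrier behavior: the output makes a transition only after both inputs have made it:

```python
class CElement:
    """Zero-delay Muller C-element: y follows a and b when they agree, else holds y_old."""

    def __init__(self, y: int = 0) -> None:
        self.y = y  # stored output value (the feedback loop)

    def update(self, a: int, b: int) -> int:
        if a == b:       # both inputs agree: the output takes their value
            self.y = a
        return self.y    # inputs disagree: keep y_old


c = CElement(y=0)
outputs = [c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)]
print(outputs)  # [0, 1, 1, 0]
```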
Asynchronous Communication
- Convey alternating up/down signal transitions only: FIFO “zero-bit message” channels [with delay]
- k-bit data transmission from sender to receiver is costly: additional circuitry + signal wires, with either
  – a performance penalty (serial data transmission), or
  – additional wires (parallel data transmission)
[Figure: sender and receiver connected by a k-bit channel]
Major Challenges

    If received tick(X) from f + 1 processes and X > C: send tick(C+1), …, tick(X) to all [once]; C := X
    If received tick(C) from n − f processes: send tick(C+1) to all [once]; C := C+1

- k-bit messages, k unbounded: to be replaced by zero-bit messages, with k kept at the receiver
- Threshold comparison: build suitable threshold circuits
- Atomicity of actions: to be ensured by architecture + path delay constraints
k-bit → Zero-bit Messages
[Figure: Counter Module consisting of a Remote Pipe (input R_remote,in), a Local Pipe (input LocalClk), a Diff-Gate and Pipe Compare Signal Generation (C-elements, NAND/NOR gates) producing GEQe, GRe, GEQo, GRo; TG algs 1 to 6 connected by the TG net]
- TG net feeds every clock signal to every TG alg (bus of width n)
- At every TG alg, n − 1 Counter Modules [one per remote TG alg], each an asynchronous up/down counter, maintain tick numbers
- Anonymous ticks ⇒ rules only distinguish
  – r_rem > r_loc (f + 1 rule, GR)
  – r_rem ≥ r_loc (n − f rule, GEQ)
On booting: send tick(0) to all; C := 0; If got tick(X) from f +1 procs and X > C: send tick(C+1),…, tick(X) to all [once]; C := X; If got tick(C) from n - f processes: send tick(C+1) to all [once]; C := C+1;
Move tick number maintenance from sender to receiver
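With anonymous ticks, each rule only needs per-sender tick counts at the receiver. A sketch of that bookkeeping with hypothetical names: `r_rem` holds how many ticks have arrived from each process (for simplicity including the process’s own ticks, an assumption; the hardware above uses n − 1 Counter Modules), and the loop fires one anonymous tick at a time, which reproduces the jump from C to X of the f + 1 rule:

```python
def ticks_to_generate(r_loc: int, r_rem: list[int], n: int, f: int) -> int:
    """Number of new (anonymous) ticks a TG alg generates from per-sender counts.

    r_loc: ticks generated so far, i.e. tick(0..r_loc-1) sent, so C = r_loc - 1.
    r_rem: ticks received so far from each process (r means tick(0..r-1) arrived).
    GR  (r > r_loc):  that process has sent some tick(X) with X > C.
    GEQ (r >= r_loc): that process's tick(C) has arrived.
    Assumes n > f, so the loop terminates once r_loc exceeds every count.
    """
    generated = 0
    while True:
        gr = sum(r > r_loc for r in r_rem)    # inputs of the f + 1 threshold gate
        geq = sum(r >= r_loc for r in r_rem)  # inputs of the n - f threshold gate
        if gr >= f + 1 or geq >= n - f:
            r_loc += 1                        # emit one anonymous tick
            generated += 1
        else:
            return generated


# n = 4, f = 1; this process has sent tick(0..2), so r_loc = 3 and C = 2
print(ticks_to_generate(3, [3, 3, 3, 0], n=4, f=1))  # 1: n - f rule sends tick(3)
print(ticks_to_generate(3, [6, 6, 0, 0], n=4, f=1))  # 3: f + 1 rule sends tick(3..5)
```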
Asynchronous Up/Down Counter
[Figure: Counter Module with Remote Pipe (input R_remote,in), Local Pipe (input LocalClk), Diff-Gate and Pipe Compare Signal Generation producing GEQe, GRe, GEQo, GRo]
Ingredients:
- Two elastic pipelines (= FIFO buffers for signal transitions) count remote and local clock ticks
- Common transitions are removed by the Diff-Gate
- GR and GEQ status signals are derived from the last stages
Metastability-free by construction [well, almost …]
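The pipeline-plus-Diff-Gate idea can be mimicked at the behavioral level: each pipe holds unmatched transitions, the Diff-Gate cancels transitions present in both, and the status signals depend only on which pipe is non-empty. A sketch; the class name and the `capacity` parameter are hypothetical, and it says nothing about asynchronous circuit timing or metastability:

```python
class CounterModule:
    """Behavioral sketch: unmatched transitions in a remote and a local pipe.

    After every tick the Diff-Gate cancels transitions present in both pipes,
    so at most one pipe is non-empty and its fill level equals |r_rem - r_loc|.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical pipeline depth
        self.remote = 0           # unmatched remote ticks
        self.local = 0            # unmatched local ticks

    def remote_tick(self) -> None:
        self.remote += 1
        self._diff_gate()

    def local_tick(self) -> None:
        self.local += 1
        self._diff_gate()

    def _diff_gate(self) -> None:
        common = min(self.remote, self.local)  # transitions present in both pipes
        self.remote -= common
        self.local -= common
        if max(self.remote, self.local) > self.capacity:
            raise OverflowError("pipeline capacity exceeded")

    @property
    def gr(self) -> bool:   # r_rem > r_loc
        return self.remote > 0

    @property
    def geq(self) -> bool:  # r_rem >= r_loc
        return self.local == 0


cm = CounterModule(capacity=2)
cm.remote_tick()
print(cm.gr, cm.geq)  # True True: remote process is one tick ahead
cm.local_tick()
print(cm.gr, cm.geq)  # False True: equal tick counts
cm.local_tick()
print(cm.gr, cm.geq)  # False False: local process is one tick ahead
```

The overflow check corresponds to the “bounded space (pipeline)” obligation: the proofs must show the difference never exceeds the pipeline depth.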
Atomicity of Actions
The gates making up the f + 1 and the n − f rule compute continuously and concurrently, hence they
– may both produce tick(k) for the same k;
– this must be prevented by all means [“once”].
How to ensure this atomicity?
– Use separate circuitry for generating up-transitions (odd k) and down-transitions (even k) → tick(k − 1) and tick(k) are never mixed up
– Ensure that the ratio of the maximum and minimum delay along certain paths is bounded (cp. Θ-Model [WLS05], ABC Model [RS08]) → tick(k − 2) and tick(k) are never mixed up
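A sketch of the kind of condition the last bullet relies on, assuming it can be stated as a bound Θ on the ratio between the largest and the smallest delay of the relevant paths (the precise path sets and bounds come from the proofs, not from this code; delays are hypothetical). Because only a ratio is constrained, uniformly faster or slower technology leaves the condition intact:

```python
def satisfies_theta(path_delays: list[float], theta: float) -> bool:
    """True if max(delay) / min(delay) <= theta for the given positive path delays."""
    return max(path_delays) <= theta * min(path_delays)


delays_ps = [100.0, 150.0, 180.0]                  # hypothetical path delays
print(satisfies_theta(delays_ps, theta=2.0))       # True
print(satisfies_theta(delays_ps + [250.0], 2.0))   # False: 250 > 2 * 100
scaled = [0.5 * x for x in delays_ps]              # e.g. after a technology shrink
print(satisfies_theta(scaled, theta=2.0))          # True: the ratio is unchanged
```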
Threshold Modules
[Figure: threshold gates fed by the Counter Module status signals]
- GR and GEQ status signals of the n − 1 Counter Modules are fed into f + 1 and n − f threshold gates
- Back-transition from status signals to transition signalling for generating tick(k)
Proofs
Proofs & Implementations (SW)
[Figure: abstraction levels specification, model (alg + sys) and implementation (SW), linked by proofs]
- Specification: tick-synchronized FT clocks (max. precision, min/max frequency)
- Model: distributed state machine with Byzantine failures; tick sync with n TG algs, f Byzantine:
On booting: send tick(0) to all; C := 0; If got tick(X) from f + 1 procs and X > C: send tick(C+1), …, tick(X) to all [once]; C := X; If got tick(C) from n − f processes: send tick(C+1) to all [once]; C := C+1;
- Implementation: executable machine code, real system; TTP implementation
Proof goals:
- Prove that the model meets the specification
- Minimize the “proof gap” between model and implementation
Proofs & Implementations (HW)
[Figure: abstraction levels specification, model (alg + sys) and implementation (SW, HW), linked by proofs; annotations: partitioning & constraints, HW capabilities]
On booting: send tick(0) to all; C := 0; If got tick(X) from f +1 procs and X > C: send tick(C+1),…, tick(X) to all [once]; C := X; If got tick(C) from n - f processes: send tick(C+1) to all [once]; C := C+1;
Hierarchical Proof
- Specification of low-level building blocks
- Up/down ticks correctly simulate tick(k) (tick-up/down interlocking proof)
- Synchronization properties (P), (U), (S)
- Bounded precision & frequency
- Bounded space (pipeline)
[Figure: proof hierarchy from tick-up/down via the interlocking proof to tick(k), tick(k+1), …, then to (P), (U), (S), then to precision & frequency and bounded space]
Interlocking Proof – “[once]”
On booting: send tick(0) to all; C := 0;
If got tick(X) from f + 1 procs and X > C: send tick(C+1), …, tick(X) to all [once]; C := X;
If got tick(C) from n − f processes: send tick(C+1) to all [once]; C := C+1;
[Figure: up/down transitions mapped to tick(k), tick(k+1), …; labels k − 2, k, k + 1, tick(C), tick(C+1)]
Higher-Level Properties
- (P) Progress. If all correct nodes send tick(k) by time t, then every correct node sends at least tick(k+1) by time $t + T^{+}$.
- (U) Unforgeability. If no correct node sends tick(k) by time t, then no correct node sends tick(k+1) by time $t + T^{-}_{\mathrm{first}}$.
- (S) Simultaneity. If some correct node sends tick(k) by time t, then every correct node sends at least tick(k) by time $t + T^{-}_{\mathrm{first}}$.
And, on top of those: precision & frequency, bounded pipeline size.
[Figure: proof hierarchy from tick(k), tick(k+1), … via (P), (U), (S) to precision & frequency and bounded pipes]
Prove elementary synchronization properties
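For a finite run, the three properties reduce to inequalities between the first and last time each tick(k) is sent by the correct nodes: (P) with t = last send of tick(k), (U) for every t just before the first send of tick(k), (S) with t = first send of tick(k). A sketch that checks them on a recorded trace, assuming the trace contains every correct node’s send of every listed tick and that ticks are sent in increasing order; `t_u` and `t_s` stand for the bounds used in (U) and (S), and all numbers are hypothetical:

```python
def check_properties(trace: dict[int, list[float]], t_plus: float,
                     t_u: float, t_s: float) -> tuple[bool, bool, bool]:
    """Check (P), (U), (S) on trace[k] = send times of tick(k) at all correct nodes."""
    first = {k: min(times) for k, times in trace.items()}
    last = {k: max(times) for k, times in trace.items()}
    nxt = [k for k in trace if k + 1 in trace]
    # (P): with t = last[k], every correct node sends tick(k+1) by last[k] + t_plus
    ok_p = all(last[k + 1] <= last[k] + t_plus for k in nxt)
    # (U): for every t < first[k], no tick(k+1) by t + t_u, i.e. first[k+1] >= first[k] + t_u
    ok_u = all(first[k + 1] >= first[k] + t_u for k in nxt)
    # (S): with t = first[k], every correct node sends tick(k) by first[k] + t_s
    ok_s = all(last[k] <= first[k] + t_s for k in trace)
    return ok_p, ok_u, ok_s


# Hypothetical run of three correct nodes
trace = {0: [0.0, 0.1, 0.2], 1: [1.0, 1.1, 1.3], 2: [2.0, 2.2, 2.4]}
print(check_properties(trace, t_plus=1.5, t_u=0.5, t_s=0.5))   # (True, True, True)
print(check_properties(trace, t_plus=1.5, t_u=0.5, t_s=0.25))  # (True, True, False)
```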
Complete Suite of Proofs
[EDCC’06]
Complete Implementation
[Figure: node p with 3f + 1 pipelines (Pipeline 1, …, Pipeline 3f + 1); each pipeline has a remote (external) pipe fed by remote clk_in with req/ack handshake signals, a local pipe, a Diff-Gate built from C-elements, and pipe compare signal generators producing GEQe, GRe, GEQo, GRo; threshold logic with thresholds f + 1 (GR signals) and 2f + 1 = n − f (GEQ signals) produces clk_out]
Implementation of the model only needs to
– implement the low-level building blocks as specified
– ensure the additional delay ratio bounds for the interlocking proof (place & route constraints)
[DFT’06]
DARTS - Lessons Learned
- Fault-tolerant distributed algorithms are indeed applicable in the VLSI context, but need “down-sizing”
- Distributed computing models with bounded delay ratio (Θ-Model, ABC model) are well-suited for the VLSI context (technology migration, re-use of models, etc.)
- A sole transition-logic approach is not sufficient for fault tolerance ⇒ need a model that integrates event and state representation
- Time-free models suffer from a large “proof gap” ⇒ need a model incorporating (continuous) time
- Failures raise new metastability concerns ⇒ metastability (MS) needs further investigation
Under the rug: Metastability …
[Stolen from Dagstuhl presentation of A. Steininger …]
Metastability
[Figure: bistable element (memory cell) with positive feedback: two cross-coupled inverters Inv 1 and Inv 2 with u_i,2 = u_o,1 and u_i,1 = u_o,2; their transfer curves intersect in two stable points (HI and LO) and one metastable point]
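How long a bistable element takes to leave the metastable point depends on how close to it it starts. A sketch with an assumed normalized inverter model (F(x) = −tanh(gain·x), switching threshold at 0, first-order dynamics with time constant τ), integrated with the explicit Euler method; it is an illustration, not a transistor-level model. In the linearized regime the output difference grows like e^{(gain − 1)t/τ}, so each 1000× smaller initial offset adds about τ/(gain − 1)·ln 1000 to the resolution time, and a perfectly balanced start never resolves:

```python
import math


def resolution_time(delta: float, gain: float = 2.0, tau: float = 1.0,
                    dt: float = 1e-3, threshold: float = 1.0,
                    t_max: float = 100.0) -> float:
    """Time until |v1 - v2| >= threshold for two cross-coupled inverters.

    Assumed model (normalized voltages, switching threshold at 0):
    tau * dv1/dt = -v1 + F(v2), tau * dv2/dt = -v2 + F(v1), F(x) = -tanh(gain * x).
    v1 = v2 = 0 is the metastable point; the element starts delta away from it.
    Returns inf if the element has not resolved by t_max.
    """
    v1, v2, t = delta, 0.0, 0.0
    while abs(v1 - v2) < threshold:
        if t >= t_max:
            return math.inf
        dv1 = (-v1 - math.tanh(gain * v2)) / tau
        dv2 = (-v2 - math.tanh(gain * v1)) / tau
        v1, v2 = v1 + dt * dv1, v2 + dt * dv2
        t += dt
    return t


for delta in (1e-3, 1e-6, 1e-9):
    print(delta, round(resolution_time(delta), 2))
# Each 1000x smaller offset adds roughly tau / (gain - 1) * ln(1000), about 6.9
print(resolution_time(0.0))  # inf: a perfectly balanced start never resolves
```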
Revisit Muller C-Element
[Figure: C-element with inputs a, b and output y; waveforms of a, x and y for normal operation, creeping and oscillation; annotations: pure delay at gate and interconnect, limited output slope]
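Error Containment
[Figure: nodes p, q and r, each with one counter per remote node (count pq and count pr at p, count qp and count qr at q, count rp and count rq at r), a threshold module (ThM) and a tick generator (TG)]
According to our proofs the wall holds – but we ignored metastability!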
The Counter Module
[Figure: nodes p, q and r with counters, ThM and TG, next to the Counter Module (Remote Pipe, Local Pipe, Diff-Gate, Pipe Compare Signal Generation producing GEQe, GRe, GEQo, GRo)]
- Purely combinational logic won’t hurt, BUT won’t help
- Muller C-element: metastable input may pass through!
The Threshold Module
[Figure: DARTS nodes p, q and r, each with counter modules, a threshold module (ThM) and a tick generator (TG).]
Threshold Module: purely combinational logic, so it will not create metastability.
BUT: it will propagate metastability while its inputs are near the threshold.
NO masking, NO protection.
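Under the same three-valued abstraction, assumed here and not taken from the slides, a threshold gate shows exactly the "near the threshold" effect. The gate is monotone, so two extreme resolutions suffice to determine its ternary output: every `M` read as 0, and every `M` read as 1.

Away from the threshold, a metastable input has no influence on the output. Right at the threshold it decides the output, so the module forwards the metastability instead of containing it. The function name `threshold` and the choice t = 2f+1 with f = 1 are illustrative assumptions.

```python
M = "M"  # hypothetical marker for a metastable input


def threshold(inputs, t):
    """Ternary threshold gate: output 1 iff at least t inputs are 1.

    Because the function is monotone, it suffices to check the two extreme
    resolutions: every M read as 0, and every M read as 1.
    """
    ones = sum(1 for v in inputs if v == 1)
    undecided = sum(1 for v in inputs if v == M)
    if ones >= t:
        return 1  # reached however the M inputs settle: M has no effect
    if ones + undecided < t:
        return 0  # unreachable even if every M settles to 1: M has no effect
    return M      # near the threshold: the output inherits the metastability


f = 1
print(threshold([1, 1, 1, M], 2 * f + 1))  # 1
print(threshold([1, 1, 0, M], 2 * f + 1))  # M
print(threshold([1, 0, 0, M], 2 * f + 1))  # 0
```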
Metastability Containment?
[Figure: DARTS nodes p, q and r, each with counter modules, a threshold module (ThM) and a tick generator (TG).]
The End … © 2007, WDR
Some References
[Bau05] R. Baumann. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Transactions on Device and Materials Reliability, 5(3):305-316, Sept. 2005.
[BJ83] J. C. Barros and B. W. Johnson. Equivalence of the arbiter, the synchronizer, the latch, and the inertial delay. IEEE Transactions on Computers, 32(7):603-614, 1983.
[BZMLCLD02] R. Bhamidipati, A. Zaidi, S. Makineni, K. Low, R. Chen, K.-Y. Liu, and J. Dalgrehn. Challenges and methodologies for implementing high-performance network processors. Intel Technology Journal, 6(3):83-92, Aug. 2002.
[BY07] A. Bink and R. York. ARM996HS, the first licensable, clockless 32-bit processor core. IEEE Micro, 27(2):58-68, Mar. 2007.
[Bor05] S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro, 25(6):10-16, Nov. 2005.
[Cha84] D. M. Chapiro. Globally-Asynchronous Locally-Synchronous Systems. PhD thesis, Stanford University, Oct. 1984.
[Con03] C. Constantinescu. Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4):14-19, July 2003.
[DH06a] S. Dolev and Y. Haviv. Self-stabilizing microprocessors: analyzing and overcoming soft errors. IEEE Transactions on Computers, 55(4):385-399, Apr. 2006.
[Dol00] S. Dolev. Self-Stabilization. MIT Press, 2000.
[DR98] C. Dyer and D. Rodgers. Effects on spacecraft & aircraft electronics. In Proceedings ESA Workshop on Space Weather, ESA WPP-155, pages 17-27, Noordwijk, The Netherlands, Nov. 1998. ESA.
[DT08] S. Dolev and N. Tzachar. Brief announcement: Corruption resilient fountain codes. In DISC, pages 502-503, 2008.
[FFSK06:DFT] M. Ferringer, G. Fuchs, A. Steininger, and G. Kempf. VLSI Implementation of a Fault-Tolerant Distributed Clock Generation. IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2006), pages 563-571, Oct. 2006.
[FMRR07] R. Friedman, A. Mostefaoui, S. Rajsbaum, and M. Raynal. Asynchronous agreement and its relation with error-correcting codes. IEEE Transactions on Computers, 56(7):865-875, 2007.
[Fri01] E. G. Friedman. Clock distribution networks in synchronous digital integrated circuits. Proceedings of the IEEE, 89(5):665-692, May 2001.
[FSFK06] M. Fuegger, U. Schmid, G. Fuchs, and G. Kempf. Fault-Tolerant Distributed Clock Generation in VLSI Systems-on-Chip. In Proceedings of the Sixth European Dependable Computing Conference (EDCC-6), pages 87-96. IEEE Computer Society Press, Oct. 2006.
[ITRS05] International Technology Roadmap for Semiconductors, 2005.
[KHP04] T. Karnik, P. Hazucha, and J. Patel. Characterization of soft errors caused by single event upsets in CMOS processes. IEEE Transactions on Dependable and Secure Computing, 1(2):128-143, Apr.-June 2004.
[KK98] I. Koren and Z. Koren. Defect tolerance in VLSI circuits: Techniques and yield analysis. Proceedings of the IEEE, 86(9):1819-1838, Sept. 1998.
[Lam84] L. Lamport. Buridan's principle. SRI technical report, 1984.
[Lam03] L. Lamport. Arbitration-free synchronization. Distributed Computing, 16(2/3):219-237, Sept. 2003.
[LP76] L. Lamport and R. Palais. On the glitch phenomenon. SRI technical report, 1976.
[LS03] G. Le Lann and U. Schmid. How to implement a timer-free perfect failure detector in partially synchronous systems. Technical Report 183/1-127, Department of Automation, Technische Universität Wien, Jan. 2003.
[Mar81] L. Marino. General theory of metastable operation. IEEE Transactions on Computers, C-30(2):107-115, Feb. 1981.
[MA01] M. S. Maza and M. L. Aranda. Analysis of clock distribution networks in the presence of crosstalk and ground bounce. In Proceedings International IEEE Conference on Electronics, Circuits, and Systems (ICECS), pages 773-776, 2001.
[Nic05] M. Nicolaidis. Design for soft error mitigation. IEEE Transactions on Device and Materials Reliability, 5(3):405-418, Sept. 2005.
[Nor96] E. Normand. Single-event effects in avionics. IEEE Transactions on Nuclear Science, 43(2):461-474, Apr. 1996.
[PB93] M. Peercy and P. Banerjee. Fault tolerant VLSI systems. Proceedings of the IEEE, 81(5):745-758, May 1993.
[Res01] P. J. Restle et al. A clock distribution network for microprocessors. IEEE Journal of Solid-State Circuits, 36(5):792-799, May 2001.
[RDS90] L. M. Reyneri, D. Del Corso, and B. Sacco. Oscillatory metastability in homogeneous and inhomogeneous flip-flops. IEEE Journal of Solid-State Circuits, SC-25(1):254-264, Feb. 1990.
[RS08] P. Robinson and U. Schmid. The Asynchronous Bounded-Cycle Model. In Proceedings SSS'08, 2008.
[SE02] I. E. Sutherland and J. Ebergen. Computers without Clocks. Scientific American, 287(2):62-69, Aug. 2002.
[Sut89] I. E. Sutherland. Micropipelines. Communications of the ACM (Turing Award lecture), 32(6):720-738, June 1989.
[WLS05] J. Widder, G. Le Lann, and U. Schmid. Failure detection with booting in partially synchronous systems. In Proceedings of the 5th European Dependable Computing Conference (EDCC-5), volume 3463 of LNCS, pages 20-37, Budapest, Hungary, Apr. 2005. Springer Verlag.
[WS05] J. Widder and U. Schmid. Achieving synchrony without clocks. Research Report 49/2005, Technische Universität Wien, Institut für Technische Informatik, 2005 (submitted).