Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi...

158
Synchronous Elastic Systems Synchronous Elastic Systems Mike Kishinevsky and Mike Kishinevsky and Jordi Cortadella Jordi Cortadella Universitat Universitat Politecnica Politecnica de Catalunya de Catalunya Barcelona, Spain Barcelona, Spain Intel Intel Strategic CAD Strategic CAD Labs Labs Hillsboro, USA Hillsboro, USA DAC Summer School DAC Summer School July 26, 2009 July 26, 2009
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    219
  • download

    1

Transcript of Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi...

Page 1: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Synchronous Elastic Systems Synchronous Elastic Systems

Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella

Universitat Politecnica Universitat Politecnica de Catalunya de Catalunya

Barcelona, SpainBarcelona, Spain

IntelIntelStrategic CAD LabsStrategic CAD Labs

Hillsboro, USAHillsboro, USA

DAC Summer SchoolDAC Summer SchoolJuly 26, 2009July 26, 2009

Page 2: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Contributors to SELF research

Micro-architectural pipelining, speculation Marc Galceran Oms, Timothy Kam Design experiments: Alexander GotmanovDesign experiments: Alexander GotmanovPerformance analysis: Jorge JPerformance analysis: Jorge Júúlvezlvez Theory of elastic machines:Sava Krstic and John O’LearyOptimization: Dmitry Bufistov, Josep CarmonaBill Grundmann

2

Page 3: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

AgendaAgenda

I.I. Basics of elastic systemsBasics of elastic systems

II.II. Why to study Why to study

III.III. Early evaluation and Early evaluation and performance analysisperformance analysis

IV.IV. Correct-by-construction pipeliningCorrect-by-construction pipelining

V.V. Communication fabricsCommunication fabrics

VI.VI. Open problemsOpen problems

3

Page 4: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Token (of data)

Synchronous Stream of DataSynchronous Stream of Data

… 147Clock cycle 012…

4

Page 5: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Token

Synchronous Elastic StreamSynchronous Elastic Stream

… 147012…

4 17

012… 345

Clock cycle

Clock cycle

…Bubble (no data)

5

Page 6: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Synchronous CircuitSynchronous Circuit

+… 147 … 348

201…

Latency = 0

6

Page 7: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Synchronous Elastic CircuitSynchronous Elastic Circuit

+

Latency = 0… 147

+Latency can vary

e

… 348201…

348…147…201…

7

Page 8: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Ordinary Synchronous SystemOrdinary Synchronous System

A C

DB

A C

DB

=

Changing latencies changes behavior

8

Page 9: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Synchronous Elastic Synchronous Elastic (characteristic property)(characteristic property)

A C

DB

A C

DB

=

Changing latencies does NOT change behavior = time elasticity

e

ee

e

ee

ee e

9

Page 10: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Elasticity?Elasticity refers to elasticity of time, i.e. tolerance to changes Elasticity refers to elasticity of time, i.e. tolerance to changes in timing parameters, not properties of materialsin timing parameters, not properties of materials

Luca Carloni et al. in the first systematic study of such Luca Carloni et al. in the first systematic study of such systems called them Latency Insensitive Systemssystems called them Latency Insensitive Systems

Other used names: Other used names: – Latency tolerant systemsLatency tolerant systems– Synchronous emulation of asynchronous systems Synchronous emulation of asynchronous systems – Synchronous handshake circuitsSynchronous handshake circuits

We use term “synchronous elastic” to link to asynchronous We use term “synchronous elastic” to link to asynchronous elastic systems that have been developed before elastic systems that have been developed before e.g., David Muller’s pipelines of late 1950se.g., David Muller’s pipelines of late 1950s Ivan Sutherland’s micro-pipelines 1989Ivan Sutherland’s micro-pipelines 1989Tolerate the variability of input data arrival and computation Tolerate the variability of input data arrival and computation

delaysdelays

Asynchronous elastic tolerate changes in continuous timeAsynchronous elastic tolerate changes in continuous timeSynchronous elastic - in discrete timeSynchronous elastic - in discrete time

10

Page 11: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

WhyWhy

ScalableScalable Modular (Plug & Play) Modular (Plug & Play)

Potential for better energy-delay trade-offs Potential for better energy-delay trade-offs – design for typical case instead of worst casedesign for typical case instead of worst case– can separate performance critical parts from non-critical and can separate performance critical parts from non-critical and

optimize in isolationoptimize in isolation

New micro-architectural opportunities New micro-architectural opportunities in digital designin digital design

Not asynchronous: use existing design experience, CAD Not asynchronous: use existing design experience, CAD tools and flows... but have some advantages of tools and flows... but have some advantages of asynchronousasynchronous

11

Page 12: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

What can we do with synchronous elastic systems?

12

Page 13: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Variable latency units

L = 1

L = 3

L = 2

L = 1

ALUALU

start done

13

Page 14: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

# adds

Benchmark“Patricia”from Media Bench

Statisticsof operandsizes

bits of adder used

1st operand 2nd o

pera

nd

Sign

ifica

nt b

its

# adds

12 bits of an adderdo 95% of additions

14

Page 15: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Power-delay for an adder

1 125 15

Compare 64 bits VLA andprefix adder

relative delay 15

Page 16: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Variable-latency cache hits

2-way associative 32KB

2-cycle hit

1-cycle hit

12-cycle miss

L1-cache

L2-cache

suggested by Joel Emer for ASIM experimentsuggested by Joel Emer for ASIM experiment

16

Page 17: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Variable-latency cache hits

Pseudo-associative32KB

{1-2} cycle hit

1-cycle hit

12-cycle miss

L1-cache

L2-cache

Sequential access: if hit in first access L = 1, if not – L=2Trade-off: faster, or larger, or less power cache

17

Page 18: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Variable-latency cache hits

Pseudo-associative64KB

{2-3} cycle hit

1-cycle hit

12-cycle miss

L1-cache

L2-cache

Sequential access: if hit in first access L = 1, if not – L=2Trade-off: faster, or larger, or less power cache

18

Page 19: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Motivation exampleMotivation example

19

9 8 6

4 4

39

10

4

10 - Combinational block with delay 10

- Initialized register (dot)

Cycle time is

Throughput is 1

Effective cycle time is 21

211916

Retiming can not do better!

Retiming and Recycling (R&R) can

Effective cycle time is 19

Effective cycle time is 16

Throughput is 4/5

Effective cycle time is 15

12

Find a minimal effective cycle time of the circuit represented as retiming graph (RG)!

The longest combinational path delayThe number of valid data/clock cyclecycle time/throughput

Retiming graph

5 registers, 4 tokens

Page 20: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Correct-by-construction automatic pipelining in presence of iteration dependencies

Transforms:– bypass– retiming– elasticize– early enabling– insert buffers

and negative tokens

– size elastic buffer capacity

ID E1 E2RFŸŸŸ

ID E1 E2RFŸŸŸ

ŸŸ

ŸŸ 1

01

0

-1

SPEC

IMP

Correct-by-construction

20

and correct-by-construction speculation

Page 21: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

21

How to Design Synchronous Elastic SystemsHow to Design Synchronous Elastic Systems

Example of the implementation:Example of the implementation:SELF = Synchronous Elastic FlowSELF = Synchronous Elastic Flow

Other implementations are possibleOther implementations are possible

Page 22: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

22

Pipelined communicationPipelined communicationsender receiver

DataData

What if the sender does not always send valid data?

Page 23: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

23

The Valid bitThe Valid bitsender receiver

Data Data

Valid Valid

What if the receiver is not always ready ?

Page 24: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

24

The Stop bitThe Stop bit

0000000000

sender

Data

Valid

Stop

receiver

Data

Valid

Stop

Page 25: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

25

The Stop bitThe Stop bit

1111000000

sender

Data

Valid

Stop

receiver

Data

Valid

Stop

Page 26: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

26

The Stop bitThe Stop bit

1111110000

sender

Data

Valid

Stop

receiver

Data

Valid

Stop

Page 27: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

27

The Stop bitThe Stop bit

1111111111

sender

Data

Valid

Stop

receiver

Data

Valid

Stop

Back-pressureBack-pressure

Page 28: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

28

The Stop bitThe Stop bit

1100000000

sender

Data

Valid

Stop

receiver

Data

Valid

Stop

Long combinational path

Page 29: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

29

Cyclic structuresCyclic structures

Data

Valid

Stop

Combinational cycle

One can build circuits with combinational cycles (constructive cycles by Berry), but synthesis and timing tools do not like them

Page 30: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

30

Example: pipelined linear communication chain Example: pipelined linear communication chain with transparent latcheswith transparent latches

sender receiverH L H L

½ cycle ½ cycle

Master and slave latches with independent control

Page 31: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

31

Shorthand notation Shorthand notation (clock lines not shown)(clock lines not shown)

D Q

clkEn

En

Page 32: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

32

SELF (linear communication)SELF (linear communication)sender receiver

V V V V

S S S S

En En En En

1 1

Data

Valid

Stop

Data

Valid

Stop

1 1

Page 33: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

33

SELFSELFsender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

Page 34: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

34

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 35: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

35

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 36: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

36

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 37: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

37

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 38: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

38

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

00

00

SELFSELF

Page 39: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

39

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

00

00

SELFSELF

Page 40: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

40

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

00

00

SELFSELF

Page 41: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

41

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

00

00

SELFSELF

Page 42: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

42

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

00

00

SELFSELF

Page 43: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

43

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

11

SELFSELF

Page 44: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

44

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 45: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

45

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 46: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

46

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 47: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

47

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 48: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

48

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 49: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

49

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 50: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

50

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 51: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

51

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

11

Data

Valid

Stop

SELFSELF

Page 52: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

52

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

11

00

Data

Valid

Stop

SELFSELF

Page 53: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

53

sender receiver

V V V V

S S S S

En En En En

11

00

Data

Valid

Stop

Data

Valid

Stop

SELFSELF

Page 54: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

54

sender receiver

V V V V

S S S S

En En En En

11

00

Data

Valid

Stop

Data

Valid

Stop

SELFSELF

Page 55: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

55

sender receiver

V V V V

S S S S

En En En En

11

00

Data

Valid

Stop

Data

Valid

Stop

SELFSELF

Page 56: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

56

sender receiver

V V V V

S S S S

En En En En

11

00

Data

Valid

Stop

Data

Valid

Stop

SELFSELF

Page 57: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

57

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 58: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

58

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 59: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

59

sender receiver

V V V V

S S S S

En En En En

Data

Valid

Stop

Data

Valid

Stop

11

00

SELFSELF

Page 60: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

60

Elastic channel and its protocol Elastic channel and its protocol

Idle Retry

Transfer

Valid * not Stop

not Valid Valid * Stop

SenderSender ReceiverReceiverData

Valid

Stop

Page 61: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

61

RetryRetry

TransferTransfer

Elastic channel protocolElastic channel protocol

SenderSender ReceiverReceiver

DataData

ValidValid

StopStop

DataData

ValidValid

StopStop

* D D * C C C B * A* D D * C C C B * A

0 1 1 0 1 1 1 1 0 10 1 1 0 1 1 1 1 0 1

0 0 1 0 0 1 1 0 0 00 0 1 0 0 1 1 0 0 0

IdleIdle

Page 62: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

62

Basic VS blockBasic VS block

SSii

EnEnii

VVii

SSi-1i-1

VVi-1i-1

VSSSii

EnEnii

VVii

SSi-1i-1

VVi-1i-1

VS block + data-path latch = elastic HALF-buffer (EHB)EHB + EHB = elastic buffer with capacity 2

Page 63: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Control specification of the EB

63

Page 64: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Two implementations

64

Page 65: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

65

Elastic buffer keeps data while stop is in flightElastic buffer keeps data while stop is in flight

W1R1

W2R1

W1R2

W2R2

W1R1 Cannot be done withSingle Edge Flopswithout double pumping

Can use latches inside Master-Slave as shown before

EBs = FIFOs with two parameters: Forward latency Capacity

Backward latency for stop propagation assumed (but need not be) equal to fwd latency

Typical case: (1,2) - 1 cycle forward latency with capacity of 2

Replaces “normal” registers Decoupling buffers

Page 66: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

66

JoinJoin

VS

+

V1

V2

S1

S2

V

S

VS

VS

Page 67: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

67

(Lazy) Fork(Lazy) Fork

V1

V2

S1

S2

V

S

Page 68: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

68

Eager ForkEager Fork

V1

V2

S1

S2

V

S

Page 69: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

69

Eager fork (another implementation)Eager fork (another implementation)

VS

VS VS

VSVS

Page 70: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

70

Variable Latency Units Variable Latency Units

[0 - k] cycles

[0 - k] cycles

V/S V/S

donego clear

Page 71: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Coarse grain control

71

Page 72: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

72

ElasticizationElasticization

Synchronous Elastic

Page 73: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

73

CLKCLK

Page 74: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

74

CLKCLK

PC

IF/ID ID/EX EX/MEM MEM/WB

JJOOIINN

JJOOIINN

FFOORRKK

FORKFORK

Page 75: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

75

V

S

CLKCLK

V

S

V

S

V

S

V

S

JOIN

JOIN

FORK

FORK

Page 76: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

76

1

0

CLKCLK

1

0

1

0

1

0

1

0

JOIN

JOIN

FORK

FORK 0

0

Page 77: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

77

1

0

1

0

1

0

1

0

1

0

Elastic control layerGeneration of gated clocks

CLKCLK

Page 78: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Equivalence

D: a b c d e d f g h i j …D: a b c d e d f g h i j …

Synchronous: stream of data

SELF: elastic stream of data

D: a * b * * c d e * d f * g h * * i j …D: a * b * * c d e * d f * g h * * i j …V: 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 …V: 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 …S: 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 … S: 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 …

Transfer sub-stream = original stream

Called: transfer equivalence, flow equivalence, or latency equivalence

78

Page 79: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

79

Marked Graph modelsMarked Graph modelsof elastic systemsof elastic systems

Page 80: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

80

Modelling elastic control with Petri netsModelling elastic control with Petri nets

data-tokendata-token

bubblebubble

data-tokendata-tokenbubblebubble

Page 81: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

81

Modelling elastic control with Petri netsModelling elastic control with Petri nets

data-tokendata-tokenbubblebubble 2 data-tokens2 data-tokens

Hiding internal transitions of elastic buffers

Page 82: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

82

Modelling elastic control with Marked GraphsModelling elastic control with Marked Graphs

Page 83: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

83

Forward Forward (Valid or Request)(Valid or Request)

Backward Backward (Stop or Acknowledgement)(Stop or Acknowledgement)

Modelling elastic control with Marked GraphsModelling elastic control with Marked Graphs

Page 84: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

84

Elastic control with Timed Marked Graphs.Elastic control with Timed Marked Graphs.Continuous time = asynchronousContinuous time = asynchronous

d=250ps d=151ps

250 151

Delays in time units

Page 85: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

85

Elastic control with Timed Marked Graphs.Elastic control with Timed Marked Graphs.Discrete time = synchronous elasticDiscrete time = synchronous elastic

d=1 d=1

1 1

Latencies in clock cycles

Page 86: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

86

Elastic control with Timed Marked Graphs.Elastic control with Timed Marked Graphs.Discrete time. Multi-cycle operationDiscrete time. Multi-cycle operation

d=2 d=1

2 1

Page 87: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

87

Elastic control with Timed Marked Graphs.Elastic control with Timed Marked Graphs.Discrete time. Variable latency operationDiscrete time. Variable latency operation

d {1,2} d=1

{1,2} 1

e.g. discrete probabilistic distribution: average latency 0.8*1 + 0.2*2 = 1.2

Page 88: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

88

Modeling forks and joinsModeling forks and joins

d=1

1

Page 89: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

89

Modelling combinational elastic blocksModelling combinational elastic blocks

d=1 d=0

1 0

Page 90: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

90

Elastic Marked GraphsElastic Marked Graphs

An Elastic Marked Graph (EMG) is a Timed MG such that for any arc a there exists a complementary arc a’ satisfying the following condition •a = a’• and •a’ = a•

Initial number of tokens on a and a’ (M0(a)+M0(a’)) = capacity of the corresponding elastic buffer

Similar forms of “pipelined” Petri Nets and Marked Graphs have been previously used for modeling pipelining in HW and SW (e.g. Patil 1974; Tsirlin, Rosenblum 1982)

Page 91: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

91

Reminder: Performance analysis of Marked graphsReminder: Performance analysis of Marked graphs

Efficient algorithms: (Karp 1978), (Dasdan,Gupta 1998)

A B C

Th(C)=2/5

Th(B)=3/5

Th(A)=3/7

Th=min(Th(A), Th(B), Th(C))=2/5

Th = operations / cycle = number of firings per time unit Th = operations / cycle = number of firings per time unit

The throughput is given by theminimum mean-weight cycle

Page 92: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

92

Early evaluationEarly evaluation

Naïve solution: introduce choice places Naïve solution: introduce choice places – issue tokens at choice node only into one (some) relevant pathissue tokens at choice node only into one (some) relevant path– problem: tokens can arrive to merge nodes out-of-order problem: tokens can arrive to merge nodes out-of-order

later token can overpass the earlier one later token can overpass the earlier one

Solution: change enabling rule Solution: change enabling rule – early evaluationearly evaluation– issue negative tokens to input places without tokens, issue negative tokens to input places without tokens,

i.e. keep the same firing rulei.e. keep the same firing rule– Add symmetric sub-channels with negative tokensAdd symmetric sub-channels with negative tokens– Negative tokens kill positive tokens when meetNegative tokens kill positive tokens when meet

Two related problems: Two related problems: Early evaluation and Exceptions (how to kill a data-token)Early evaluation and Exceptions (how to kill a data-token)

Page 93: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

93

Examples of early evaluationExamples of early evaluation

MULTIPLIER

a

bc if a = 0 then c := 0 -- don’t wait for b*

MULTIPLEXOR

a

bc

s

if s = T then c := a -- don’t wait for b else c := b -- don’t wait for a

T

F

Page 94: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

94

Related workRelated work

Petri netsPetri nets– Extensions to model OR causalityExtensions to model OR causality

Kishinevsky et al. Change Diagrams [e.g. book of 1994]Kishinevsky et al. Change Diagrams [e.g. book of 1994]Yakovlev et al. Causal Nets 1996Yakovlev et al. Causal Nets 1996

Asynchronous systemsAsynchronous systems– Reese et al 2002: Early evaluationReese et al 2002: Early evaluation– Brej 2003: Early evaluation with anti-tokensBrej 2003: Early evaluation with anti-tokens– Ampalan & Singh 2006: preemption using anti-tokensAmpalan & Singh 2006: preemption using anti-tokens

Page 95: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

95

Dual Marked GraphDual Marked GraphMarking: Marking: Arcs (places) Z(allow negative markings)Some nodes are labeled as early-enabling

Enabling rules for a node:– Positive enabling: M(a) > 0 for every input arc– Early enabling (for early enabling nodes):

M(a) > 0 for some input arcs– Negative enabling: M(a) < 0 for every output arc

Firing rule: the same as in regular MG

Page 96: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

96

Dual Marked GraphsDual Marked Graphs

Early enabling can be associated with an external guard that depends on data variables (e.g., a select signal of a multiplexor) Actual enabling guards are abstracted away (unless needed)Anti-token generation: When an early enabled node fires, it generates anti-tokens in the predecessor arcs that had no tokensAnti-token propagation counterflow: When negative enabled node fires, it propagates the anti-tokens from the successor to the predecessor arcs

Page 97: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

97

Dual Marked Graph modelDual Marked Graph model

-1

Enabled !

-1

-1

-1

-1

Page 98: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

98

Passive anti-tokenPassive anti-tokenPassive DMG = version of DMG without negative enablingPassive DMG = version of DMG without negative enablingNegative tokens can only be generated due to early Negative tokens can only be generated due to early enabling, but cannot propagateenabling, but cannot propagateLet Let DD be a strongly connected DMG such that all cycles be a strongly connected DMG such that all cycles have positive cumulative markinghave positive cumulative marking

Let Let DDpp be a corresponding passive DMGbe a corresponding passive DMG. .

If environment (consumers) never generate negative If environment (consumers) never generate negative tokens, and there are no multi-cycle operations then tokens, and there are no multi-cycle operations then throughput (throughput (DD) = throughput () = throughput (DDpp))

– If capacity of input places for early enabling transitions is unlimited, If capacity of input places for early enabling transitions is unlimited, then active anti-tokens do not improve performancethen active anti-tokens do not improve performance

– Active anti-tokens reduce activity in the data-path Active anti-tokens reduce activity in the data-path (good for power reduction)(good for power reduction)

Page 99: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

99

Properties of DMGsProperties of DMGsFiring invariant: Let node n be simultaneously positive (early) and negative enabled in marking M. Let M1 be the result of firing n from M due to positive (early) enabling. Let M2 be the result of firing n from M due to negative enabling. Then, M1 = M2

Token preservation. Let c be a cycle of a strongly connected DMG with initial marking M0. For every reachable marking M : M(c) = M0(c)

Liveness. A strongly connected passive DMG is live iff for every cycle c: M(c) > 0. – For DMGs this is a sufficient condition of liveness– It is also a necessary condition for positive liveness

Repetitive behavior. In a SC DMG: a firing sequence s from M leads to the same marking iff every node fires in s the same number of times

DMGs have properties similar to regular MGs

Page 100: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

100

Implementing early enablingImplementing early enabling

Page 101: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

101

How to implement anti-tokens ?How to implement anti-tokens ?

Positive tokens

Negative tokens

Page 102: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

102

How to implement anti-tokens ?How to implement anti-tokens ?

Positive tokens

Negative tokens

Page 103: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

103

How to implement anti-tokens ?How to implement anti-tokens ?

ValidValid++ ValidValid++

ValidValid––

ValidValid++

StopStop++ StopStop++

ValidValid––

StopStop––StopStop––

+

-

Page 104: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

104

Controller for elastic bufferController for elastic buffer

V

S

V

S

Data

H

H

L

L

L

H

V

S

V

S

En En

Page 105: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

105

Dual controller for elastic bufferDual controller for elastic buffer

S+

V+

V-

S-

S+

V+

V-

S-

En En

Page 106: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Dual Join and Fork

106

Page 107: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Join with early evaluation

107

Page 108: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Condition on Early Evaluation Function

Early evaluation function makes decision based on presence of valid bits, not on their absence

Formally: EE is positive unate with respect to data input

Example: legal EE function for a data-path MUX (s – select input)

108

Page 109: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

109

Passive anti-token (capacity one)Passive anti-token (capacity one)

Bigger capacity can be achieved by “injecting” anti-token up-down counters on elastic channels

Page 110: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Properties of elastic channels

Invariants: mutually exclusive Kill (V -) and Stop (S +) Valid (V +) and retain of a kill (S -)

110

Page 111: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

111

ConclusionsConclusions

Early evaluation can increase performance Early evaluation can increase performance beyond the min cycle ratiobeyond the min cycle ratio

The duality between positive and negative The duality between positive and negative tokens suggests a clean and effective tokens suggests a clean and effective implementationimplementation

Dual Marked Graphs is a formal model for Dual Marked Graphs is a formal model for analytical analysis and optimization methodsanalytical analysis and optimization methods

Page 112: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Performance analysis Performance analysis with early evaluationwith early evaluation

112

Page 113: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

113

Revisit Performance Analysis of Marked GraphsRevisit Performance Analysis of Marked Graphs

t

pttp )d(mm

0

1lim

The throughput can also be computed by means ofThe throughput can also be computed by means oflinear programminglinear programming

Average marking

ppmth min

Throughput

),min( 21 pp mmth

t1 t2

t3

p1 p2

[Campos, Chiola, Silva 1991]

Page 114: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

114

a

b

d c

p1 p2

p3 p4

p5

max th

mp1 = 1 + tb – ta

mp2 = 0 + ta – tb

mp3 = 1 + td – ta

mp4 = 0 + ta – tc

mp5 = 1 + tc – td

Th = 0.5Th = 0.5

reachability

th ≤ mp2 // transition b

th ≤ mp4 // transition c

th ≤ mp5 // transition d

th ≤ min(mp1, mp3) // transition a

th constraints

Revisit Performance Analysis of Marked GraphsRevisit Performance Analysis of Marked Graphs

Page 115: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

115

GMG = Multi-guarded Dual Marked GraphGMG = Multi-guarded Dual Marked Graph

Refinement of passive DMGsRefinement of passive DMGsEvery node has a set of guardsEvery node has a set of guardsEvery guard is a set of input places (arcs)Every guard is a set of input places (arcs)

Example:Example:t1 t2

t4

p1 p2

t3

p3

G(t4)={{p1,p3},{p2,p3}}

Page 116: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

116

Early evaluationEarly evaluation

1-

1-

Page 117: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

117

Early evaluationEarly evaluation

1

1

(0.43)(0.43) (0.60)(0.60) (0.40)(0.40)

0.600.600.540.540.490.490.460.460.440.440.430.431.01.0

0.540.540.510.510.480.480.460.460.440.440.430.430.80.8

0.490.490.480.480.470.470.450.450.440.440.430.430.60.6

0.450.450.450.450.450.450.440.440.440.440.430.430.40.4

0.430.430.430.430.420.420.420.420.420.420.420.420.20.2

0.400.400.400.400.400.400.400.400.400.400.400.400.00.0

1.01.00.80.80.60.60.40.40.20.20.00.0

0.600.600.540.540.490.490.460.460.440.440.430.431.01.0

0.540.540.510.510.480.480.460.460.440.440.430.430.80.8

0.490.490.480.480.470.470.450.450.440.440.430.430.60.6

0.450.450.450.450.450.450.440.440.440.440.430.430.40.4

0.430.430.430.430.420.420.420.420.420.420.420.420.20.2

0.400.400.400.400.400.400.400.400.400.400.400.400.00.0

1.01.00.80.80.60.60.40.40.20.20.00.0

Page 118: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

118

LP formulation for an upper bound of a LP formulation for an upper bound of a throughput (by example) throughput (by example)

Th = (2 - Th = (2 - αα) / (3 - ) / (3 - αα))

1-

a

b

d c

p1 p2

p3 p4

p5

max th

mp1 = 1 + tb – ta

mp2 = 0 + ta – tb

mp3 = 1 + td – ta

mp4 = 0 + ta – tc

mp5 = 1 + tc – td

th ≤ mp2

th ≤ mp4

th ≤ mp5

th = mp1 + (1-) mp3

Page 119: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

119

Averaging cycle throughput or cycle times Averaging cycle throughput or cycle times does not workdoes not work

Th = (2 - Th = (2 - αα) / (3 - ) / (3 - αα))

1-

a

b

d c

p1 p2

p3 p4

p5

1/21/2

2/32/3

Th’ = Th’ = αα 1/2 + (1- 1/2 + (1- αα) 2/3 = (4 - ) 2/3 = (4 - αα) / 6 ) / 6

1/Th” = 21/Th” = 2αα + (1- + (1- αα) 3/2) 3/2==(3 + (3 + αα) / ) / 22

Th” = 2/(3+ Th” = 2/(3+ αα))

Averaging throughput of individual cycles

Averaging effective cycle times of individual cycles

Page 120: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Correct-by-construction pipeliningCorrect-by-construction pipelining

121

Page 121: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Notation for elastic systems

Elastic buffer (latency=1, capacity=2)

with one token of information

Empty elastic buffer (latency=1, capacity=2)

Channel with an injector of k negative tokens -k

Empty elastic buffer (latency=0, capacity=m) m

Page 122: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Elastic transforms

=

m=

=-1 -1 -2

-1=

-k=...

k

...

k

Page 123: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

W R

wa ra

wd rd

= W R

wa ra

=

rdwd0

1

Bypass transform

Classic transform. Works for elastic systems

Page 124: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

RF

Convert to elastic form

Page 125: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

RF

Handshakes added to the environmentHandshakes added to the environment

Page 126: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

RF

Bypass 4 times

Page 127: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

Early evaluation elastic multiplexer

Only waits for the branch that is needed

Page 128: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

To simplify the presentation:

Assume only dependencies of depth 1

Prune away unneeded bypasses

Page 129: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

Page 130: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

Insert empty buffers

Page 131: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

Cannot retime!

Page 132: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

Insert empty buffers.

To enable retiming use a form with

negative tokens

Page 133: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

Page 134: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

Retime

Assume for now retiming is done like in

normal synchronous designs

Will come back to this later

Page 135: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

Page 136: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

Deadlock!

Page 137: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Why deadlock?

-3

Page 138: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Why deadlock?

-3

Positively live system all cycles have positive marking

Cycle with sum of tokens = -2

Page 139: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Transformations

Correct designs

?

Page 140: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Retiming of Elastic Buffers

F

-1

F

-1

A

A

Page 141: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

F-1

Deadlock!

F

-1

A

Retiming of Elastic Buffers

Retiming move removed required buffer capacity from the old location

Page 142: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

How to fix deadlock

-3

Extra capacity

Page 143: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

Size buffers

(by adding (0,k) buffers)

Page 144: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Pipelining by example

rawa

F2F1 F4F3

=

-3

3

Correct fully pipelined design

Page 145: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Transformations

Correct designsupsize

Page 146: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

F-1

F

-1

1

A

Retiming of Elastic Buffers

Would require solving capacity sizing problem for every retiming move

Page 147: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

F-1

F

-1

12

A

2

Retiming of Elastic Buffers

Conservatively preserve previous capacity

Page 148: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Transformations

Correct designs

upsize downsize

conservativelypreserve capacity

Page 149: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

150

Correctness (short story)Correctness (short story)

Developed theory of elastic machines Developed theory of elastic machines (for late evaluation)(for late evaluation)Verify correctness of any elastic implementation = check Verify correctness of any elastic implementation = check conformance with the definition of elastic machineconformance with the definition of elastic machineAll SELF controllers are verified for conformanceAll SELF controllers are verified for conformanceElasticization is correct-by-constructionElasticization is correct-by-construction

Theory for early evaluation and negative delays is more Theory for early evaluation and negative delays is more challengingchallenging– Sketch of a theory, but no fully satisfactory compositional Sketch of a theory, but no fully satisfactory compositional

properties found yetproperties found yet– Verification done on concrete systems and controllersVerification done on concrete systems and controllers

Page 150: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

What is a Communication Fabric?

Part of the design that pushes data aroundGlue between different IP blocksInclude not only wires, but also…– switches, arbiters, routers, buffers and queues,

addressing logic, logic managing credits, logic for cache coherency, starvation and deadlock prevention, clock and power down logic etc.

Often has regular parts (e.g. ring or mesh topology), but need not be

Elasticity is a natural requirement, but different notion of equivalence: only relative order matters

Page 151: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Many Communication Fabrics

High-end interconnect– Connects cores in high-end chips– Implements cache coherence

IO/Mem fabrics– PC MCH (Memory Control Hub), PCH, SCH

Implements PCI-compatible memory-mapped IO

– SOC chipsSystem Interconnect Memory ControllerOften simpler than PCI: no configuration, etc.

Message fabrics– Power messages, sideband wires, etc. in

most designs– Don’t care about performance

Core i7-based

Platform

Atom-based Platform

Page 152: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Tree topology NoC

R

R

RR

R

AGENT

AGENT

AGENT

AGENT

AGENT

AGENT

AGENT

[In collaboration with Ken Stevens, Charles Dike, Bill Grundmann] 153

Page 153: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Router node interface

RouterA

B

C

154

Page 154: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

NoC Router

EHB M

S

A

EHB

EHB

S

M

S

M

C

B

Relative order of tokens between agents is preserved

155

Page 155: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Switch and Merge

156

Page 156: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

Some open problems

Better performance analysis (bounds) for system with early evaluation

Given: The number and sizes of IP blocks & communication requirements & message ordering constraints & flow control rates

Find: Optimal floorplans & communication fabrics in (perf, area, energy) space

Compositional theory of elastic machines with early evaluation

Given: a class of communication fabric & message ordering constraints & flow control details

Prove: no deadlocks, every message gets delivered

Page 157: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

SummarySummary

SELF gives a low cost implementation of elastic machines SELF gives a low cost implementation of elastic machines

Functionality is correct when latencies change Functionality is correct when latencies change

New micro-architectural opportunities and new automatuion New micro-architectural opportunities and new automatuion methodsmethods

Compositional theory proving correctnessCompositional theory proving correctness

Early evaluation - mechanism for performance and power Early evaluation - mechanism for performance and power optimizationoptimization

Applications to design of NoCs and communication fabrics Applications to design of NoCs and communication fabrics

158

Page 158: Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Mike Kishinevsky and Jordi Cortadella Universitat Politecnica de Catalunya Barcelona,

See reference list for some relevant publications

159