Compositional Dataflow Circuits

Stephen A. Edwards, Richard Townsend, Martha A. Kim

Columbia University

MEMOCODE, Vienna, Austria, October 1, 2017

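gcd(a, b) = if a = b
              a
            else if a < b
              gcd(a, b − a)
            else
              gcd(a − b, b)

[Figure: dataflow circuit for gcd(a, b) with inputs a and b, built from mux, demux, fork, and discard blocks, two comparators (= and <), two subtractors (−), and an initial token (1).]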
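As a software reference point for the circuit, the recursive definition above can be written as a loop. This is only a sketch: the function name `gcd_steps` and the iteration counter are illustrative additions, and the comments relating the comparisons to the circuit are a loose reading of the figure, not a description of its exact wiring.

```python
from typing import Tuple

def gcd_steps(a: int, b: int) -> Tuple[int, int]:
    """Subtractive GCD for positive integers; returns (result, iterations)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    steps = 0
    while a != b:      # the "=" test decides whether another trip round the loop is needed
        if a < b:      # the "<" test chooses which subtraction to apply
            b = b - a
        else:
            a = a - b
        steps += 1
    return a, steps

print(gcd_steps(100, 2))  # prints (2, 49)
```

Each loop iteration corresponds to one recursive call in the definition, so gcd(100, 2) needs 49 subtractions before a = b.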

Townsend et al., CC 2017

Patience Through Handshaking

Want patient blocks to handle delays from

Memory systems
Data-dependent computations
Full buffers
Shared resources
Busy computational units

[Figure: an upstream block connected to a downstream block; data and valid travel downstream, ready travels upstream.]

valid  ready  Meaning
1      1      Token transferred
1      0      Token valid; held
0      −      No token to transfer

Latency-insensitive Design (Carloni et al.)
Elastic Circuits (Cortadella et al.)
FIFOs with backpressure
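The valid/ready table can be exercised with a small cycle-by-cycle model. This is a minimal sketch, not part of the talk: `handshake` and `run_channel` are hypothetical names, and the sender is assumed to assert valid whenever it still has a token to send.

```python
def handshake(valid: bool, ready: bool) -> str:
    """Classify one clock cycle on a channel, following the valid/ready table."""
    if valid and ready:
        return "transferred"
    if valid:
        return "held"          # token stays on the channel until ready rises
    return "no token"          # ready is a don't-care when valid is 0

def run_channel(tokens, ready_pattern):
    """Send tokens over one channel; the receiver is ready per ready_pattern."""
    pending = list(tokens)
    received, trace = [], []
    for ready in ready_pattern:
        valid = bool(pending)              # assumed: sender offers a token whenever it has one
        state = handshake(valid, ready)
        trace.append(state)
        if state == "transferred":
            received.append(pending.pop(0))
    return received, trace

received, trace = run_channel([7, 8], [False, True, True, True])
print(received, trace)
```

A stalled receiver only delays the tokens; none is lost or duplicated, which is the "patience" the slide asks for.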

Combinational Function Block

Strict/Unit Rate: All input tokens required to produce an output

[Figure: block f with inputs in0 and in1 and output out.]

Datapath: Combinational function ignores flow control

Valid network: Output valid if both inputs are valid

Ready network: Input tokens consumed if output token is consumed (output is valid and ready)
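The three networks of the function block can be sketched as one pure function evaluated per clock cycle. The name `function_block` and its argument order are assumptions made for illustration; only the three rules come from the slide.

```python
def function_block(f, in0, in0_valid, in1, in1_valid, out_ready):
    """One cycle of a two-input strict block.

    Returns (out, out_valid, in0_ready, in1_ready).
    """
    out = f(in0, in1)                    # datapath: computed regardless of flow control
    out_valid = in0_valid and in1_valid  # valid network: both inputs must hold a token
    consumed = out_valid and out_ready   # ready network: output token is taken this cycle
    return out, out_valid, consumed, consumed

add = lambda x, y: x + y
print(function_block(add, 3, True, 4, True, True))   # token produced and inputs consumed
print(function_block(add, 3, True, 4, True, False))  # output valid but held; inputs held too
```

The value on `out` is meaningful only when `out_valid` is true. Both ready outputs depend on valid inputs, so this block has combinational paths from valid to ready but none from ready to valid.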

Multiplexer Block

[Figure: multiplexer block with data inputs in0, in1, in2, a select input, and output out; internally, a decoder is driven by select.]

Demultiplexer Block

[Figure: demultiplexer block with data input in, a select input, and outputs out0, out1, out2; internally, a decoder is driven by select.]

Buffering a Linear Pipeline (Point 1/4)

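Combinational block

Data buffer: Pipeline register with valid, enable

Control Buffer: Register diverts token when downstream suddenly stops

Long Combinational Path (Data + Valid)

Long Combinational Path (Ready)

[Figure: linear pipeline of combinational blocks with a data buffer and a control buffer inserted; the long combinational paths through data/valid and through ready are highlighted.]

Cao et al., MEMOCODE 2015
Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)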
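One plausible behavioral reading of the two buffer types is sketched below. The slides do not give the circuits, so the class names, method names, and exact update rules here are assumptions: the data buffer is modeled as a register with a valid bit whose ready still passes through combinationally, and the control buffer as a "skid" register whose ready output comes from state alone.

```python
class DataBuffer:
    """Pipeline register with valid and enable: cuts the data/valid path."""
    def __init__(self):
        self.data, self.full = None, False

    def out(self):
        return self.data, self.full              # data and valid come from registers

    def in_ready(self, out_ready):
        return (not self.full) or out_ready      # ready still passes through combinationally

    def tick(self, in_data, in_valid, out_ready):
        if self.in_ready(out_ready):             # register enable
            self.data, self.full = in_data, in_valid


class ControlBuffer:
    """Diverts a token into a spare register when downstream stops: cuts the ready path."""
    def __init__(self):
        self.skid, self.full = None, False

    def in_ready(self):
        return not self.full                     # depends on state only, not on out_ready

    def out(self, in_data, in_valid):
        if self.full:
            return self.skid, True               # drain the diverted token first
        return in_data, in_valid                 # otherwise data/valid pass straight through

    def tick(self, in_data, in_valid, out_ready):
        if self.full:
            if out_ready:
                self.full = False                # diverted token has been taken
        elif in_valid and not out_ready:
            self.skid, self.full = in_data, True # downstream stalled: divert the token
```

Under these assumptions, each buffer breaks one direction of the handshake, which is why a loop needs both kinds to cut every long combinational path.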

The Problem with Fork

Combinational Block: inputs ready when both valid & output ready

Fork: outputs valid only when all are ready

Oops: Combinational Cycle
This is not compositional
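The cycle can be made concrete by writing down which signals depend combinationally on which and searching the resulting graph. This is an illustrative sketch: the signal names (`fork.out0_valid`, `f.in0_ready`, and so on) and the wiring of a two-output fork into both inputs of a block f are assumptions chosen to match the two rules on the slide.

```python
def has_cycle(deps):
    """deps maps a signal to the signals it combinationally depends on."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(s):
        color[s] = GREY
        for d in deps.get(s, ()):
            c = color.get(d, WHITE)
            if c == GREY or (c == WHITE and visit(d)):
                return True                      # reached a signal still being explored
        color[s] = BLACK
        return False

    return any(color.get(s, WHITE) == WHITE and visit(s) for s in deps)

naive = {
    # Fork: outputs valid only when the input is valid and all outputs are ready.
    "fork.out0_valid": ["src_valid", "fork.out0_ready", "fork.out1_ready"],
    "fork.out1_valid": ["src_valid", "fork.out0_ready", "fork.out1_ready"],
    # Wires between the fork outputs and the two inputs of block f.
    "f.in0_valid": ["fork.out0_valid"],
    "f.in1_valid": ["fork.out1_valid"],
    "fork.out0_ready": ["f.in0_ready"],
    "fork.out1_ready": ["f.in1_ready"],
    # Block f: inputs ready when both inputs are valid and the output is ready.
    "f.in0_ready": ["f.in0_valid", "f.in1_valid", "f.out_ready"],
    "f.in1_ready": ["f.in0_valid", "f.in1_valid", "f.out_ready"],
}

# Same wiring, but with fork valids that ignore ready.
no_ready_in_valid = dict(naive)
no_ready_in_valid["fork.out0_valid"] = ["src_valid"]
no_ready_in_valid["fork.out1_valid"] = ["src_valid"]

print(has_cycle(naive), has_cycle(no_ready_in_valid))
```

Each block is acyclic on its own; the loop only appears once the two are wired together, which is what makes the naive fork non-compositional.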

The Solution to Combinational Loops (Point 2/4)

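[Figure: valid and ready signals through a block; the combinational paths from ready back to valid are crossed out.]

Allowed: Combinational paths from valid to ready

Prohibited: Combinational paths from ready to valid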
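The rule is local to each block, so it can be checked one block at a time. The sketch below assumes a hypothetical description format in which each output signal of a block lists every input it depends on combinationally, with signal kinds identified by the `_valid` and `_ready` name suffixes.

```python
def ready_to_valid_paths(block_deps):
    """Return (ready_input, valid_output) pairs that break the rule."""
    return [(src, out)
            for out, srcs in block_deps.items() if out.endswith("_valid")
            for src in srcs if src.endswith("_ready")]

# Strict two-input function block: valid-to-ready paths only.
function_block_deps = {
    "out_valid": ["in0_valid", "in1_valid"],
    "in0_ready": ["in0_valid", "in1_valid", "out_ready"],
    "in1_ready": ["in0_valid", "in1_valid", "out_ready"],
}

# Fork whose outputs are valid only when all outputs are ready.
naive_fork_deps = {
    "out0_valid": ["in_valid", "out0_ready", "out1_ready"],
    "out1_valid": ["in_valid", "out0_ready", "out1_ready"],
    "in_ready": ["out0_ready", "out1_ready"],
}

print(ready_to_valid_paths(function_block_deps))  # no violations
print(ready_to_valid_paths(naive_fork_deps))      # four ready-to-valid paths
```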

The Solution to Fork: A Little State (Point 3/4)

[Figure: fork block with input in and outputs out0, out1, out2, with one flip-flop per output.]

Valid out ignores ready of other outputs

Flip-flop set after token sent suppresses duplicates

Input consumed once one token sent on every output
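The three rules for the stateful fork can be sketched as a small class with one "sent" bit per output. The class and method names are illustrative, and the exact reset behavior (all bits cleared in the cycle the input is consumed) is an assumption about how the flip-flops are managed.

```python
class Fork:
    """Fork with one state bit per output so outputs may fire in different cycles."""
    def __init__(self, n):
        self.sent = [False] * n                  # one flip-flop per output

    def out_valid(self, in_valid):
        # Depends only on in_valid and state, never on any ready signal.
        return [in_valid and not s for s in self.sent]

    def in_ready(self, out_ready):
        # Input is consumed once every output has sent, or is sending, its copy.
        return all(s or r for s, r in zip(self.sent, out_ready))

    def tick(self, in_valid, out_ready):
        if not in_valid:
            return
        if self.in_ready(out_ready):
            self.sent = [False] * len(self.sent) # token fully delivered; clear the bits
        else:
            # Remember which outputs have taken their copy to suppress duplicates.
            self.sent = [s or r for s, r in zip(self.sent, out_ready)]

fork = Fork(2)
print(fork.out_valid(True), fork.in_ready([True, False]))  # out0 can fire before out1
fork.tick(True, [True, False])
print(fork.out_valid(True), fork.in_ready([False, True]))  # only out1 still offers the token
```

Because `out_valid` never reads a ready signal, this fork obeys the prohibition on ready-to-valid paths.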

Nondeterministic Merge (Point 4/4)

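Share f with merge/demux

[Figure: three separate copies of f replaced by one shared f placed between a merge and a demux; the merge's select output drives the demux.]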

Two-Way Nondeterministic Merge Block w/ Select

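[Figure: merge block with inputs in0 and in1, outputs out and sel, a multiplexer, and an arbiter that chooses between the inputs.]

“Two-way fork with multiplexed output selected by an arbiter”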
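At the token level, sharing one block between two streams with a merge and a demux can be sketched as follows. This is a behavioral model, not the circuit: the function names are hypothetical, queues stand in for channels, and the alternating arbitration policy is an assumption (the slide does not say how the arbiter chooses).

```python
from collections import deque

def merge(in0, in1, prefer=0):
    """Take one token from a nonempty input; return (token, sel) or None."""
    queues = (in0, in1)
    for i in (prefer, 1 - prefer):
        if queues[i]:
            return queues[i].popleft(), i        # sel records which input won
    return None

def share(f, in0, in1):
    """Run both streams through a single f, routing results back by sel."""
    outs = (deque(), deque())
    prefer = 0
    while True:
        picked = merge(in0, in1, prefer)
        if picked is None:
            break
        token, sel = picked
        outs[sel].append(f(token))               # demux: sel steers the result
        prefer = 1 - sel                         # assumed policy: alternate between inputs
    return outs

out0, out1 = share(lambda x: x * x, deque([1, 2, 3]), deque([10]))
print(list(out0), list(out1))
```

Whatever order the arbiter picks, the select token lets each result return to the stream it came from, which is how the select output tames the nondeterminism.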

Experiments: Random Buffer Placement

[Figure: three plots of completion time (µs) versus number of buffer pairs (2 to 10): GCD(100,2) (7 buffers), 21-way Conveyor (80 buffers), and BSN (96 buffers).]

Best Buffering for GCD (Manually Obtained)

Each loop has one of each buffer

[Figure: GCD circuit annotated with the placement of data buffers and control buffers.]

Summary

Compositional Dataflow Networks as an IR

Patient dataflow blocks with valid/ready handshaking

1. Break downstream, upstream paths w/ two buffer types

2. Avoid comb. cycles: prohibit ready-to-valid paths

3. Add one state bit per output so forks may “race ahead”

4. Tame nondeterministic merge with a select output

Random buffer placement experiments show it works