Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab

Combinational Circuits in Bluespec. Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology. February 9, 2011. L03-1. http://csg.csail.mit.edu/6.375


Page 1: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Combinational Circuits in Bluespec

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

February 9, 2011 L03-1 http://csg.csail.mit.edu/6.375

Page 2: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Bluespec: Two-Level Compilation

Bluespec (Objects, Types, Higher-order functions)

Level 1 compilation (Lennart Augustsson @ Sandburst, 2000-2002)
• Type checking
• Massive partial evaluation and static elaboration

Rules and Actions (Term Rewriting System). Now we call this Guarded Atomic Actions.

Level 2 synthesis (James Hoe & Arvind @ MIT, 1997-2000)
• Rule conflict analysis
• Rule scheduling

Object code (Verilog/C)

Page 3: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Static Elaboration

At compile time
• Inline function calls and unroll loops
• Instantiate modules with specific parameters
• Resolve polymorphism/overloading, perform most data structure operations

[Figure: Software toolflow: source, compile, .exe, run w/params (run1, run1, ...). Hardware toolflow: source, elaborate w/params (design1, design2, design3), compile, run (run1.1 ..., run2.1 ..., run3.1 ...).]

Page 4: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Combinational IFFT

[Figure: inputs in0 ... in63 pass through three stages, each made of 16 Bfly4 nodes (x16) followed by a Permute block, producing out0 ... out63. Inset: a Bfly4 node with inputs t0 ... t3, built from multipliers (*), adders (+), subtractors (-) and one multiplication by j.]

All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...

Page 5: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

4-way Butterfly Node

function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);

BSV has a very strong notion of types
• Every expression has a type. Either it is declared by the user or automatically deduced by the compiler
• The compiler verifies that the type declarations are compatible

[Figure: Bfly4 node with inputs t0 ... t3 and twiddle factors k0 ... k3: four multipliers, two layers of adders (+) and subtractors (-), and one multiplication by i.]

Page 6: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

BSV code: 4-way Butterfly

function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);

  Vector#(4,Complex) m, y, z;

  m[0] = k[0] * t[0];
  m[1] = k[1] * t[1];
  m[2] = k[2] * t[2];
  m[3] = k[3] * t[3];

  y[0] = m[0] + m[2];
  y[1] = m[0] - m[2];
  y[2] = m[1] + m[3];
  y[3] = i*(m[1] - m[3]);

  z[0] = y[0] + y[2];
  z[1] = y[1] + y[3];
  z[2] = y[0] - y[2];
  z[3] = y[1] - y[3];

  return(z);
endfunction

Polymorphic code: works on any type of numbers for which *, + and - have been defined.

[Figure: the Bfly4 datapath, with the multiplier outputs labelled m and the two adder/subtractor layers labelled y and z.]

Note: Vector does not mean storage.
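The data flow of bfly4 can be checked with a small software model. The sketch below is Python, not BSV: Python's built-in complex type stands in for Complex, 1j stands in for the constant i, and idft4 is a reference function introduced only for this illustration. Under those assumptions, the model suggests that bfly4 computes an unscaled 4-point inverse DFT of the products k[n] * t[n].

```python
# Software model of the slide's bfly4; this is Python, not BSV.
# Python's complex type stands in for Complex and 1j for the constant i.

def bfly4(t, k):
    # Multiply each input by its twiddle factor.
    m = [k[n] * t[n] for n in range(4)]
    # First layer of adders/subtractors; one output is rotated by i.
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    # Second layer of adders/subtractors.
    z = [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]
    return z

def idft4(m):
    # Hypothetical reference: unscaled 4-point inverse DFT,
    # z[n] = sum over j of m[j] * i^(n*j).
    powers_of_i = [1, 1j, -1, -1j]
    return [sum(m[j] * powers_of_i[(n * j) % 4] for j in range(4))
            for n in range(4)]

if __name__ == "__main__":
    t = [1 + 2j, 3 - 1j, -2 + 0j, 0 + 4j]
    ones = [1, 1, 1, 1]
    print(bfly4(t, ones))
    print(idft4(t))
```

Because the first step is an element-wise product, swapping the two arguments does not change the result in this model.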

Page 7: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Complex Arithmetic

Addition
  zR = xR + yR
  zI = xI + yI

Multiplication
  zR = xR * yR - xI * yI
  zI = xR * yI + xI * yR

The actual arithmetic for FFT is different because we use a non-standard fixed-point representation.
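The two rules above can be tried out on concrete numbers. The following Python sketch is not BSV: it represents a complex number as a pair of integers (real, imaginary), the names cadd and cmul are made up for this illustration, and it ignores the non-standard fixed-point representation mentioned above.

```python
# Software model of the complex arithmetic rules on this slide.
# A complex number is a (real, imag) pair of Python ints.
# cadd and cmul are hypothetical names chosen for this sketch.

def cadd(x, y):
    # zR = xR + yR, zI = xI + yI
    (xr, xi), (yr, yi) = x, y
    return (xr + yr, xi + yi)

def cmul(x, y):
    # zR = xR*yR - xI*yI, zI = xR*yI + xI*yR
    (xr, xi), (yr, yi) = x, y
    return (xr * yr - xi * yi, xr * yi + xi * yr)

if __name__ == "__main__":
    print(cadd((1, 2), (3, 4)))  # (1+2i) + (3+4i)
    print(cmul((1, 2), (3, 4)))  # (1+2i) * (3+4i)
```

For example, (1+2i)(3+4i) = 3 + 4i + 6i + 8i^2 = -5 + 10i, which is what cmul returns as the pair (-5, 10).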

Page 8: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

BSV code for Addition

typedef struct {
  Int#(t) r;
  Int#(t) i;
} Complex#(numeric type t) deriving (Eq, Bits);

function Complex#(t) \+ (Complex#(t) x, Complex#(t) y);
  Int#(t) real = x.r + y.r;
  Int#(t) imag = x.i + y.i;
  return (Complex{r: real, i: imag});
endfunction

What is the type of this + ?

Page 9: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Combinational IFFT

[Figure: the combinational IFFT again. One stage (16 Bfly4 nodes followed by Permute) is marked as the stage_f function; the whole circuit repeats stage_f three times.]

stage_f function:

function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);

Repeat stage_f three times:

function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);

Page 10: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

BSV Code: Combinational IFFT

function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
  // Declare vectors
  Vector#(4, Vector#(64, Complex)) stage_data;
  stage_data[0] = in_data;
  for (Integer stage = 0; stage < 3; stage = stage + 1)
    stage_data[stage+1] = stage_f(stage, stage_data[stage]);
  return (stage_data[3]);
endfunction

The for-loop is unfolded and stage_f is inlined during static elaboration.

Note: there is no notion of loops or procedures during execution.

Page 11: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

BSV Code: Combinational IFFT - Unfolded

function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
  // Declare vectors
  Vector#(4, Vector#(64, Complex)) stage_data;
  stage_data[0] = in_data;
  for (Integer stage = 0; stage < 3; stage = stage + 1)
    stage_data[stage+1] = stage_f(stage, stage_data[stage]);
  return (stage_data[3]);
endfunction

The for-loop unfolds to:

  stage_data[1] = stage_f(0, stage_data[0]);
  stage_data[2] = stage_f(1, stage_data[1]);
  stage_data[3] = stage_f(2, stage_data[2]);

stage_f can be inlined now; it could also have been inlined before loop unfolding.

Does the order matter?
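The effect of loop unfolding can be imitated in software. The sketch below is a toy Python model, not BSV and not the elaborator itself: stage_f is a hypothetical stand-in for the real IFFT stage, and ifft_loop and ifft_unfolded are names invented here. It only illustrates that the loop and its unfolded form describe the same chain of three stage_f applications.

```python
# Toy model of loop unfolding; this is Python, not BSV.
# stage_f is a hypothetical stand-in, not the real IFFT stage.

def stage_f(stage, data):
    return [x * 2 + stage for x in data]

def ifft_loop(in_data):
    # Mirrors the slide's for-loop over stage_data.
    stage_data = [None] * 4
    stage_data[0] = in_data
    for stage in range(3):
        stage_data[stage + 1] = stage_f(stage, stage_data[stage])
    return stage_data[3]

def ifft_unfolded(in_data):
    # What remains after unfolding: three chained calls, no loop.
    s1 = stage_f(0, in_data)
    s2 = stage_f(1, s1)
    s3 = stage_f(2, s2)
    return s3

if __name__ == "__main__":
    print(ifft_loop([1, 2, 3]))
    print(ifft_unfolded([1, 2, 3]))
```

In hardware terms, only the unfolded form exists after elaboration: the loop variable is gone and each call becomes its own copy of the stage logic.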

Page 12: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Bluespec Code for stage_f

function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
  Vector#(64, Complex) stage_temp, stage_out;
  begin
    for (Integer i = 0; i < 16; i = i + 1)
    begin
      Integer idx = i * 4;
      let twid = getTwiddle(stage, fromInteger(i));
      // stage_in[idx:idx+3] is the slide's shorthand for elements idx .. idx+3
      let y = bfly4(twid, stage_in[idx:idx+3]);
      stage_temp[idx]   = y[0];
      stage_temp[idx+1] = y[1];
      stage_temp[idx+2] = y[2];
      stage_temp[idx+3] = y[3];
    end
    // Permutation
    for (Integer i = 0; i < 64; i = i + 1)
      stage_out[i] = stage_temp[permute[i]];
  end
  return (stage_out);
endfunction

The twid's are mathematically derivable constants.

Page 13: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Higher-order functions: Stage functions f0, f1 and f2

function f0(x); return (stage_f(0,x)); endfunction

function f1(x); return (stage_f(1,x)); endfunction

function f2(x); return (stage_f(2,x)); endfunction

What is the type of f0(x)?

function Vector#(64, Complex) f0 (Vector#(64, Complex) x);
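Fixing the first argument of stage_f to obtain f0, f1 and f2 is partial application. A minimal Python sketch of the same idea follows; it is not BSV, stage_f is a hypothetical placeholder that merely tags its input with the stage number, and compose3 and ifft_model are names introduced for this illustration.

```python
# Partial application of a stage function; Python model, not BSV.
from functools import partial

def stage_f(stage, x):
    # Hypothetical placeholder: tags each element with the stage number.
    return [(stage, v) for v in x]

# Each fN takes only the data vector; the stage number is fixed.
f0 = partial(stage_f, 0)
f1 = partial(stage_f, 1)
f2 = partial(stage_f, 2)

def compose3(a, b, c):
    # Returns a function that applies a, then b, then c.
    return lambda x: c(b(a(x)))

ifft_model = compose3(f0, f1, f2)

if __name__ == "__main__":
    print(ifft_model([5]))
```

Each fN maps a data vector to a data vector, which mirrors the BSV signature given for f0.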

Page 14: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Suppose we want to reuse some part of the circuit ...

[Figure: the combinational IFFT: in0 ... in63, three stages of 16 Bfly4 nodes (x16) each followed by Permute, out0 ... out63.]

Reuse the same circuit three times to reduce area.

But why?

Page 15: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter

Page 16: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

802.11a Transmitter Overview

[Figure: headers and data enter the chain Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT, Cyclic Extend. Tokens of 24 uncoded bits enter; one OFDM symbol (64 complex numbers) leaves the IFFT.]

IFFT transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers. It accounts for 85% of the area.

Must produce one OFDM symbol every 4 µsec.

Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol.

Page 17: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Preliminary results
[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind

Design Block     Lines of Code (BSV)   Relative Area
Controller        49                    0%
Scrambler         40                    0%
Conv. Encoder    113                    0%
Interleaver       76                    1%
Mapper           112                   11%
IFFT              95                   85%
Cyc. Extender     23                    3%

Complex arithmetic libraries constitute another 200 lines of code.

Page 18: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Combinational IFFT

[Figure: the combinational IFFT: in0 ... in63, three stages of 16 Bfly4 nodes (x16) each followed by Permute, out0 ... out63.]

Reuse the same circuit three times to reduce area.

Page 19: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Design Alternatives

Reuse a block over multiple cycles

[Figure: two separate blocks f and g, versus a single block f reused over multiple cycles followed by g.]

We expect:
• Throughput to decrease (less parallelism)
• Area to decrease (reusing a block)

The clock needs to run faster for the same throughput, which gives a hyper-linear increase in energy.

Page 20: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Circular pipeline: Reusing the Pipeline Stage

[Figure: in0 ... in63 feed a single stage (Bfly4 nodes followed by Permute) whose outputs loop back to its inputs; a Stage Counter controls the loop, and the results leave as out0 ... out63.]

Page 21: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Superfolded circular pipeline: Just one Bfly-4 node!

[Figure: in0 ... in63 enter through 64 2-way muxes; 4 16-way muxes select the inputs of the single Bfly4 node and 4 16-way demuxes distribute its outputs, followed by Permute and out0 ... out63. Control: Index counts 0 to 15, Stage counts 0 to 2, and the test "Index == 15?" marks the end of a stage.]

Page 22: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Pipelining a block

[Figure: three ways to connect inQ to outQ. Combinational (C): f1, f2, f3 back to back. Pipeline (P): f1, f2, f3 separated by registers. Folded pipeline (FP): a single block f reused.]

Clock? Area? Throughput?
• Clock: C < P ≈ FP
• Area: FP < C < P
• Throughput: FP < C < P

Page 23: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Inelastic pipeline

[Figure: x, inQ, f0, sReg1, f1, sReg2, f2, outQ]

rule sync_pipeline (True);
  inQ.deq();
  sReg1 <= f0(inQ.first());
  sReg2 <= f1(sReg1);
  outQ.enq(f2(sReg2));
endrule

This is real IFFT code; just replace f0, f1 and f2 with stage_f code.

This rule can fire only if
- inQ has an element
- outQ has space

Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated.
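The firing condition and the all-or-nothing update can be made concrete with a cycle-by-cycle software model. The sketch below is Python, not BSV, and rests on several assumptions made only for illustration: the registers reset to 0, outQ has a fixed capacity and is never drained, and run_inelastic is an invented name.

```python
# Cycle-level model of the inelastic pipeline rule; Python, not BSV.
from collections import deque

def run_inelastic(inputs, f0, f1, f2, out_capacity, max_cycles):
    inq, outq = deque(inputs), deque()
    sreg1 = sreg2 = 0  # assumed reset values (not given on the slide)
    for _ in range(max_cycles):
        # Guard: inQ has an element and outQ has space.
        if not inq or len(outq) >= out_capacity:
            continue  # rule does not fire; no state changes at all
        x = inq.popleft()                # inQ.deq()
        outq.append(f2(sreg2))           # outQ.enq(f2(sReg2)), old sReg2
        sreg1, sreg2 = f0(x), f1(sreg1)  # both reads use the old values
    return list(outq), sreg1, sreg2

if __name__ == "__main__":
    outs, s1, s2 = run_inelastic(
        [10, 20, 30, 40],
        lambda v: v + 1, lambda v: v * 2, lambda v: v - 3,
        out_capacity=8, max_cycles=10)
    print(outs, s1, s2)
```

In this model the first two outputs are computed from the reset values rather than from real inputs, and the last two inputs stay stuck in sReg1 and sReg2 once inQ runs empty.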

Page 24: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Stage functions f0, f1 and f2

function f0(x); return (stage_f(0,x)); endfunction

function f1(x); return (stage_f(1,x)); endfunction

function f2(x); return (stage_f(2,x)); endfunction

The stage_f function was given earlier.

Page 25: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Problem: What about pipeline bubbles?

[Figure: x, inQ, f0, sReg1, f1, sReg2, f2, outQ, with a red token and a green token held in the stage registers.]

rule sync_pipeline (True);
  inQ.deq();
  sReg1 <= f0(inQ.first());
  sReg2 <= f1(sReg1);
  outQ.enq(f2(sReg2));
endrule

Red and Green tokens must move even if there is nothing in the inQ!

Also, if there is no token in sReg2 then nothing should be enqueued in the outQ.

Modify the rule to deal with these conditions: use valid bits or the Maybe type.

Page 26: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

The Maybe type data in the pipeline

typedef union tagged {
  void Invalid;
  data_T Valid;
} Maybe#(type data_T);

Registers contain Maybe type values: the data plus a valid/invalid tag.

rule sync_pipeline (True);
  if (inQ.notEmpty())
  begin
    sReg1 <= tagged Valid f0(inQ.first());
    inQ.deq();
  end
  else
    sReg1 <= tagged Invalid;
  case (sReg1) matches
    tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1);
    tagged Invalid:    sReg2 <= tagged Invalid;
  endcase
  case (sReg2) matches
    tagged Valid .sx2: outQ.enq(f2(sx2));
  endcase
endrule

sx1 will get bound to the appropriate part of sReg1.
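The behaviour of the Maybe-based rule can be traced with a small software model. The sketch below is Python, not BSV, and makes simplifying assumptions for illustration: None stands for tagged Invalid (so real data must not be None), the registers start Invalid, outQ is unbounded so the rule fires every cycle, and run_maybe_pipeline is an invented name.

```python
# Cycle-level model of the pipeline with Maybe registers; Python, not BSV.
from collections import deque

def run_maybe_pipeline(inputs, f0, f1, f2, cycles):
    inq, outq = deque(inputs), deque()
    sreg1 = sreg2 = None  # None models tagged Invalid (assumed reset state)
    for _ in range(cycles):
        # All reads in a cycle see the register values from its start.
        old1, old2 = sreg1, sreg2
        if inq:
            sreg1 = f0(inq.popleft())  # tagged Valid f0(inQ.first())
        else:
            sreg1 = None               # bubble enters the pipeline
        sreg2 = f1(old1) if old1 is not None else None
        if old2 is not None:
            outq.append(f2(old2))      # enqueue only real tokens
    return list(outq)

if __name__ == "__main__":
    print(run_maybe_pipeline(
        [10, 20, 30, 40],
        lambda v: v + 1, lambda v: v * 2, lambda v: v - 3,
        cycles=6))
```

With four inputs and six cycles, all four results drain out even though inQ is empty during the last two cycles, and no junk values are enqueued while the pipeline fills.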

Page 27: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

When is this rule enabled?

rule sync_pipeline (True);
  if (inQ.notEmpty())
  begin
    sReg1 <= tagged Valid f0(inQ.first());
    inQ.deq();
  end
  else
    sReg1 <= tagged Invalid;
  case (sReg1) matches
    tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1);
    tagged Invalid:    sReg2 <= tagged Invalid;
  endcase
  case (sReg2) matches
    tagged Valid .sx2: outQ.enq(f2(sx2));
  endcase
endrule

inQ   sReg1   sReg2   outQ   Enabled?
NE    V       V       NF     Yes
NE    V       V       F      No
NE    V       I       NF     Yes
NE    V       I       F      Yes
NE    I       V       NF     Yes
NE    I       V       F      No
NE    I       I       NF     Yes
NE    I       I       F      Yes
E     V       V       NF     Yes
E     V       V       F      No
E     V       I       NF     Yes
E     V       I       F      Yes
E     I       V       NF     Yes
E     I       V       F      No
E     I       I       NF     Yes1
E     I       I       F      Yes

NE = not empty, E = empty, V = valid, I = invalid, NF = not full, F = full.

Yes1 = yes but no change
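The table follows one pattern: the rule is blocked only when sReg2 holds a valid token and outQ is full. The Python sketch below (not BSV) enumerates the sixteen cases under that reading; rule_enabled and enabled_table are names invented for this illustration, and the model reproduces the slide's table rather than any particular compiler's guard computation.

```python
# Enumerates the sixteen cases of the enable table; Python, not BSV.
from itertools import product

def rule_enabled(inq_not_empty, sreg1_valid, sreg2_valid, outq_not_full):
    # Only outQ.enq can block, and it is attempted only when sReg2 is Valid.
    # inQ and sReg1 do not affect the result under this reading.
    return (not sreg2_valid) or outq_not_full

def enabled_table():
    rows = []
    # product(...) yields the cases in the same order as the slide's table.
    for ne, v1, v2, nf in product([True, False], repeat=4):
        rows.append(("NE" if ne else "E",
                     "V" if v1 else "I",
                     "V" if v2 else "I",
                     "NF" if nf else "F",
                     "Yes" if rule_enabled(ne, v1, v2, nf) else "No"))
    return rows

if __name__ == "__main__":
    for row in enabled_table():
        print(*row)
```

The model does not distinguish the slide's Yes1 case, where the rule fires but no state changes.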

Page 28: Combinational Circuits in Bluespec Arvind  Computer Science & Artificial Intelligence Lab

Next lecture

Folded pipeline for FFT