Constructive Computer Architecture Combinational circuits Arvind
Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab
description
Transcript of Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab
Combinational Circuits in Bluespec
Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology
February 9, 2011 L03-1http://csg.csail.mit.edu/6.375
Bluespec: Two-Level Compilation
Object code(Verilog/C)
Rules and Actions(Term Rewriting System)
• Rule conflict analysis• Rule schedulingJames Hoe & Arvind@MIT 1997-2000
Bluespec(Objects, Types,
Higher-order functions)
Level 1 compilation• Type checking• Massive partial evaluation
and static elaboration
Level 2 synthesis
Lennart Augustsson@Sandburst 2000-2002
Now we call this Guarded Atomic Actions
February 9, 2011 L03-2http://csg.csail.mit.edu/6.375
Static Elaboration
.exe
compiledesign2 design3design1
elaborate w/params
run1run1run2.1…
run1run1run3.1…
run1run1run1.1…
run w/params
run w/params
run1run1…
run
At compile time Inline function calls and unroll loops Instantiate modules with specific parameters Resolve polymorphism/overloading, perform most
data structure operations
sourceSoftwareToolflow: source
HardwareToolflow:
February 9, 2011 L03-3http://csg.csail.mit.edu/6.375
Combinational IFFTin0
…
in1
in2
in63
in3in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3out4
Permute
Permute
Permute
All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ...
*
*
*
*
+
-
-
+
+
-
-
+
*jt2
t0
t3
t1
February 9, 2011 L03-4http://csg.csail.mit.edu/6.375
4-way Butterfly Node
function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex)
k);
BSV has a very strong notion of types Every expression has a type. Either it is declared by
the user or automatically deduced by the compiler The compiler verifies that the type declarations are
compatible
*
*
**
+
-
-+
+
-
-+
*i
t0
t1
t2
t3
k0
k1
k2
k3
February 9, 2011 L03-5http://csg.csail.mit.edu/6.375
BSV code: 4-way Butterflyfunction Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
Vector#(4,Complex) m, y, z;
m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3];
y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]);
z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3];
return(z);endfunction
Polymorphic code: works on any type of numbers for which *, + and - have been defined
*
*
**
+
-
-+
+
-
-+
*i
m y z
Note: Vector does not mean storageFebruary 9, 2011 L03-6http://csg.csail.mit.edu/6.375
Complex ArithmeticAddition
zR = xR + yR zI = xI + yI
Multiplication zR = xR * yR - xI * yI zI = xR * yI + xI * yR
The actual arithmetic for FFT is different because we use a non-standard fixed point representation
February 9, 2011 L03-7http://csg.csail.mit.edu/6.375
BSV code for Additiontypedef struct{ Int#(t) r; Int#(t) i;} Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag});endfunction
What is the type of this + ?
February 9, 2011 L03-8http://csg.csail.mit.edu/6.375
Combinational IFFTin0
…
in1
in2
in63
in3in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3out4
Permute
Permute
Permute
stage_f function
repeat stage_f three times
function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
February 9, 2011 L03-9http://csg.csail.mit.edu/6.375
BSV Code: Combinational IFFTfunction Vector#(64, Complex) ifft
(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]);return(stage_data[3]);
The for-loop is unfolded and stage_f is inlined during static elaboration
Note: no notion of loops or procedures during execution
February 9, 2011 L03-10http://csg.csail.mit.edu/6.375
BSV Code: Combinational IFFT- Unfoldedfunction Vector#(64, Complex) ifft
(Vector#(64, Complex) in_data);//Declare vectors Vector#(4,Vector#(64, Complex)) stage_data;
stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]);
Stage_f can be inlined now; it could have been inlined before loop unfolding also.
Does the order matter?
stage_data[1] = stage_f(0,stage_data[0]);stage_data[2] = stage_f(1,stage_data[1]);stage_data[3] = stage_f(2,stage_data[2]);
February 9, 2011 L03-11http://csg.csail.mit.edu/6.375
Bluespec Code for stage_ffunction Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; endreturn(stage_out); twid’s are
mathematically derivable constants
February 9, 2011 L03-12http://csg.csail.mit.edu/6.375
Higher-order functions:Stage functions f1, f2 and f3function f0(x); return (stage_f(0,x)); endfunction
function f1(x); return (stage_f(1,x)); endfunction
function f2(x); return (stage_f(2,x)); endfunction
What is the type of f0(x) ?function Vector#(64, Complex) f0
(Vector#(64, Complex) x);
February 9, 2011 L03-13http://csg.csail.mit.edu/6.375
Suppose we want to reuse some part of the circuit ...
in0
…
in1
in2
in63
in3in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3out4
Permute
Permute
Permute
Reuse the same circuit three times to reduce area
But why?
February 9, 2011 L03-14http://csg.csail.mit.edu/6.375
Architectural Exploration:Area-Performance tradeoff in 802.11a Transmitter
February 9, 2011 L03-15http://csg.csail.mit.edu/6.375
802.11a Transmitter Overview
Controller Scrambler Encoder
Interleaver Mapper
IFFT CyclicExtend
headers
data
IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain)
complex numbers accounts for 85% area
24 Uncoded
bits
One OFDM symbol (64 Complex Numbers)
Must produce one OFDM symbol every 4 msecDepending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol
February 9, 2011 L03-16http://csg.csail.mit.edu/6.375
Preliminary results[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
Design Lines of Block Code (BSV)Controller 49 Scrambler 40 Conv. Encoder 113 Interleaver 76 Mapper 112 IFFT 95 Cyc. Extender 23
Complex arithmetic libraries constitute another 200 lines of code
RelativeArea
0% 0% 0% 1%11%85% 3%
February 9, 2011 L03-17http://csg.csail.mit.edu/6.375
Combinational IFFTin0
…
in1
in2
in63
in3in4
Bfly4
Bfly4
Bfly4
x16
Bfly4
Bfly4
Bfly4
…
Bfly4
Bfly4
Bfly4
…
out0
…
out1
out2
out63
out3out4
Permute
Permute
Permute
Reuse the same circuit three times to reduce area
February 9, 2011 L03-18http://csg.csail.mit.edu/6.375
f g
Design AlternativesReuse a block over multiple cycles
we expect:Throughput toArea to
ff g
decrease – less parallelism
The clock needs to run faster for the same throughput hyper-linear increase in energy
decrease – reusing a block
February 9, 2011 L03-19http://csg.csail.mit.edu/6.375
Circular pipeline: Reusing the Pipeline Stage
in0
…
in1
in2
in63
in3in4
out0
…
out1
out2
out63
out3out4
…
Bfly4
Bfly4
Permute
Stage Counter
February 9, 2011 L03-20http://csg.csail.mit.edu/6.375
Superfolded circular pipeline: Just one Bfly-4 node!in0
…
in1
in2
in63
in3in4
out0
…
out1
out2
out63
out3out4
Bfly4
Permute
Index == 15?
Index: 0 to 15
64, 2-way M
uxes
4, 16-way M
uxes
4, 16-way DeM
uxes
Stage 0 to 2
February 9, 2011 L03-21http://csg.csail.mit.edu/6.375
Pipelining a block
inQ outQf2f1 f3
CombinationalC
inQ outQf2f1 f3 PipelineP
inQ outQf Folded
PipelineFP
Clock? Area? Throughput?Clock: C < P FP Area: FP < C < P Throughput: FP < C < PFebruary 9, 2011 L03-22http://csg.csail.mit.edu/6.375
Inelastic pipeline
xsReg1inQ
f0 f1 f2
sReg2 outQ
rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2));endrule
This is real IFFT code; just replace f0, f1 and f2 with stage_f code
This rule can fire only if
Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated
- inQ has an element- outQ has space
February 9, 2011 L03-23http://csg.csail.mit.edu/6.375
Stage functions f1, f2 and f3function f0(x); return (stage_f(0,x)); endfunction
function f1(x); return (stage_f(1,x)); endfunction
function f2(x); return (stage_f(2,x)); endfunction
The stage_f function was given earlier
February 9, 2011 L03-24http://csg.csail.mit.edu/6.375
Problem: What about pipeline bubbles?
xsReg1inQ
f0 f1 f2
sReg2 outQ
rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2));endrule
Red and Green tokens must move even if there is nothing in the inQ!
Modify the rule to deal with these conditions
Also if there is no token in sReg2 then nothing should be enqueued in the outQ
Valid bits orthe Maybe type
February 9, 2011 L03-25http://csg.csail.mit.edu/6.375
The Maybe type data in the pipelinetypedef union tagged {
void Invalid;data_T Valid;
} Maybe#(type data_T);
datavalid/invalid
Registers contain Maybe type values
rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcaseendrule
sx1 will get bound to the appropriate part of sReg1
February 9, 2011 L03-26http://csg.csail.mit.edu/6.375
When is this rule enabled?rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end else sReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcaseendrule
February 9, 2011 L03-27http://csg.csail.mit.edu/6.375
inQ sReg1 sReg2 outQNE V V NFNE V V FNE V I NFNE V I FNE I V NFNE I V FNE I I NFNE I I F
E V V NFE V V FE V I NFE V I FE I V NFE I V FE I I NFE I I F
yesNoYesYesYesNoYesyes
yesNoYesYesYesNoYes1yes
Yes1 = yes but no change
inQ sReg1 sReg2 outQ
Next lectureFolded pipeline for FFT
February 9, 2011 L03-28http://csg.csail.mit.edu/6.375