Bluespec Lectures 3 & 4 with some slides from Nikhil Rishiyur at Bluespec and Simon Moore at the University of Cambridge


Course Resources

• http://cas.ee.ic.ac.uk/~ssingh
• Lecture notes (PowerPoint, PDF)
• Example Bluespec programs used in lectures
• Complete Photoshop system (Bluespec)
• Links to Bluespec code samples
• User guide, reference guide: doc subdirectory of Bluespec installation
• More information at http://bluespec.com


Rules, not clock edges

• rules are atomic
  – they execute within one clock cycle

• structure:

    rule name (explicit conditions);
        statements;
    endrule

• conditions:
  – explicit – conditions (Boolean expression) provided
  – implicit – conditions that have to be met to allow the statements to fire, e.g. for fifo.enq only if fifo not full
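
The two kinds of condition can be seen side by side in a minimal sketch. The module name mkRuleConditions, the register count and the limit of 10 are assumptions made for illustration; only rule conditions and fifo.enq come from the slide.

```bsv
import FIFO::*;

(* synthesize *)
module mkRuleConditions (Empty);   // hypothetical module name

   Reg#(UInt#(8))  count <- mkReg (0);
   FIFO#(UInt#(8)) fifo  <- mkFIFO;

   // Explicit condition: (count < 10), written by the designer.
   // Implicit condition: fifo.enq is only ready when the FIFO is not full.
   // The rule fires only when both hold.
   rule produce (count < 10);
      fifo.enq (count);
      count <= count + 1;
   endrule

   // No explicit condition. The implicit conditions of fifo.first and
   // fifo.deq (FIFO not empty) still guard the rule.
   rule consume;
      $display ("dequeued %0d", fifo.first);
      fifo.deq;
   endrule

endmodule
```

Neither rule tests for full or empty itself; those checks are attached to the FIFO methods and are inherited by any rule that calls them.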


Rules: powerful alternative to always blocks

• rules for state updates instead of always blocks
• Simple concept: think if…then…

    rule ruleName (<boolean cond>);
        <state update(s)>
    endrule

• Rule can execute (or “fire”) only when its conditions are TRUE
• Every rule is atomic with respect to other rules
• Powerful ramifications:
  – Executable specification – design around operations as described in specs
  – Atomicity of rules dramatically reduces concurrency bugs
  – Automates management of shared resources – avoids many complex errors


Bits, Bools and conversion

• Bit#(width)
  – vector of bits
• Bool
  – single bit for Booleans (True, False)
• pack()
  – function to convert (pack) most things into a bit representation
• unpack()
  – opposite of pack()
• extend()
  – extend an integer (signed, unsigned, bits)
• truncate()
  – truncate an integer
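
A short sketch of these conversions in use. The module name mkConversions, the variable names and the literal 8'hF0 are illustrative assumptions; the functions are the ones listed above.

```bsv
(* synthesize *)
module mkConversions (Empty);   // hypothetical module name

   rule show;
      Bit#(8)   b  = 8'hF0;
      UInt#(8)  u  = unpack (b);      // same bits, viewed as unsigned
      Int#(8)   s  = unpack (b);      // same bits, viewed as signed
      Bool      t  = unpack (b[7]);   // one bit to Bool
      Bit#(1)   tb = pack (t);        // Bool back to one bit

      UInt#(16) uw = extend (u);      // zero-extends an unsigned value
      Int#(16)  sw = extend (s);      // sign-extends a signed value
      Bit#(4)   lo = truncate (b);    // keeps the low-order bits

      $display ("u=%0d s=%0d tb=%b uw=%0d sw=%0d lo=%h", u, s, tb, uw, sw, lo);
      $finish (0);
   endrule

endmodule
```

The result type chosen on the left-hand side decides what unpack, extend and truncate do, so the widths never need to be passed as arguments.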


Reg and Bit/UInt/Int types

• registers (initialised and uninitialised versions):

    Reg#(type) name0 <- mkReg(initial_value);
    Reg#(type) name1 <- mkRegU;

  – Reg#(type): interface type, with a type parameter (e.g. UInt#(8)) since Reg is generic
  – mkReg, mkRegU: name of module to “make” (i.e. instantiate)
  – N.B. modules are typically prefixed “mk”

• some types (unsigned and signed integer, and bits):
    UInt#(width), Int#(width), Bit#(width)

• example:

    Reg#(UInt#(8)) counter <- mkReg(0);

    rule count_up;
        counter <= counter + 1;
    endrule


Registers

    interface Reg#(type a);
        method Action _write (a x1);
        method a _read ();
    endinterface: Reg

• Polymorphic
• Just library elements
• In one cycle register reads must execute before register writes
• x <= y + 1 is syntactic sugar for x._write (y._read + 1)
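
To make the sugar concrete, the sketch below writes the same kind of update both ways. The module name mkRegSugar and the registers a and b are assumptions added so the two rules touch separate state.

```bsv
(* synthesize *)
module mkRegSugar (Empty);   // hypothetical module name

   Reg#(int) x <- mkReg (0);
   Reg#(int) y <- mkReg (5);
   Reg#(int) a <- mkReg (0);
   Reg#(int) b <- mkReg (5);

   // Sugared form: an implicit _read on the right, _write on the left.
   rule sugared;
      x <= y + 1;
   endrule

   // The same kind of update with the Reg interface methods written out.
   rule desugared;
      a._write (b._read + 1);
   endrule

endmodule
```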


Scheduling Annotations

    C     Conflict
    CF    Conflict free
    SB    Sequence before
    SBR   Sequence before restricted (cannot be in the same rule)
    SA    Sequence after
    SAR   Sequence after restricted (cannot be in the same rule)


Scheduling Annotations for a Register

             read   write
    read     CF     SB
    write    SA     SBR

• Two read methods are conflict-free (CF): multiple methods that read from the same register can be called in the same rule, sequenced in any order.
• A write is sequenced after (SA) a read.
• A read is sequenced before (SB) a write.
• If you have two write methods, one must be sequenced before the other, and they cannot be in the same rule, as indicated by the annotation SBR.


Updating Registers

    Reg#(int) x <- mkReg (0) ;

    rule countup (x < 30);
        int y = x + 1;
        x <= x + 1;
        $display ("x = %0d, y = %0d", x, y);
    endrule


Rules of Rules (The Three Basics)

1. Rules are atomic
2. Rules fire or don’t at most once per cycle
3. Rules don’t conflict with other rules

[Figure: registers x and y (D flip-flops on clk), each fed by the other through a +1 adder]

rule r1; x <= y + 1; endrule

rule r2; y <= x + 1; endrule

[Figure: registers x2 and y2 (D flip-flops on clk), each fed by the other through a +1 adder]

    (* synthesize *)
    module rules4 (Empty);

        Reg#(int) x <- mkReg (10);
        Reg#(int) y <- mkReg (100);

        rule r1; x <= y + 1; endrule

        rule r2; y <= x + 1; endrule

        rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule

    endmodule

    $ ./rules4 -m 5
    x, y = 10, 100
    x, y = 10, 11
    x, y = 10, 11
    x, y = 10, 11

[Figure: registers x and y (D flip-flops on clk), each fed by the other through a +1 adder]

    (* synthesize *)
    module rules5 (Empty);

        Reg#(int) x <- mkReg (10);
        Reg#(int) y <- mkReg (100);

        rule r ; x <= y + 1; y <= x + 1; endrule

        rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule

    endmodule

    $ ./rules5 -m 5
    x, y = 10, 100
    x, y = 101, 11
    x, y = 12, 102
    x, y = 103, 13

[Figure: registers x2 and y2 (D flip-flops on clk), each fed by the other through a +1 adder]

    (* synthesize *)
    module rules6 (Empty);

        Reg#(int) x <- mkReg (10);
        Reg#(int) y <- mkReg (100);

        rule r1; x <= y + 1; endrule

        rule r2; y <= x + 1; endrule

        (* descending_urgency = "r1, r2" *)
        rule monitor; $display ("x, y = %0d, %0d ", x, y); endrule

    endmodule

    $ ./rules6 -m 5
    x, y = 10, 100
    x, y = 101, 100
    x, y = 101, 100
    x, y = 101, 100

    interface Rules7_Interface ;
        method int readValue ;
        method Action setValue (int newXvalue) ;
        method ActionValue#(int) increment ;
    endinterface

    (* synthesize *)
    module rules7 (Rules7_Interface);

        Reg#(int) x <- mkReg (0);

        method readValue ;
            return x ;
        endmethod

        method Action setValue (int newXvalue);
            x <= newXvalue ;
        endmethod

        method ActionValue#(int) increment ;
            x <= x + 1 ;
            return x ;
        endmethod

    endmodule

    interface Rules7_Interface ;
        (* always_ready *)
        method int readResult ;
        (* always_enabled *)
        method Action setValues (int newX, int newY, int newZ) ;
    endinterface

    (* synthesize *)
    module rules7 (Rules7_Interface) ;

        Reg#(int) x <- mkReg (0) ;
        Reg#(int) y <- mkReg (0) ;
        Reg#(int) z <- mkReg (0) ;
        Reg#(int) result <- mkRegU ;

        Reg#(Bool) b <- mkReg (False) ;

        rule toggle ; b <= !b ; endrule

        rule r1 (b) ; result <= x * y ; endrule

        rule r2 (!b) ; result <= x * z ; endrule

        method readResult = result ;

        method Action setValues (int newX, int newY, int newZ) ;
            x <= newX ;
            y <= newY ;
            z <= newZ ;
        endmethod

    endmodule

Generated Verilog (excerpt):

    // remaining internal signals
    assign x_MUL_y___d8 = x * y ;
    assign x_MUL_z___d5 = x * z ;


    interface Rules8_Interface ;
        (* always_ready *)
        method int readResult ;
        (* always_enabled *)
        method Action setValues (int newX, int newY, int newZ) ;
    endinterface

    (* synthesize *)
    module rules8 (Rules8_Interface) ;

        Reg#(int) x <- mkReg (0) ;
        Reg#(int) y <- mkReg (0) ;
        Reg#(int) z <- mkReg (0) ;
        Wire#(int) t <- mkWire ;
        Reg#(int) result <- mkRegU ;

        Reg#(Bool) b <- mkReg (False) ;

        rule toggle ; b <= !b ; endrule

        rule computeT ;
            if (b)
                t <= y ;
            else
                t <= z ;
        endrule

        rule r1 (b) ; result <= x * t ; endrule

        method readResult = result ;

        method Action setValues (int newX, int newY, int newZ) ;
            x <= newX ;
            y <= newY ;
            z <= newZ ;
        endmethod

    endmodule

Generated Verilog (excerpt):

    // inlined wires
    assign t$wget = b ? y : z ;

    // remaining internal signals
    assign x_MUL_t_wget___d6 = x * t$wget ;


High Level Synthesis

• Most work on high level synthesis focuses on automating scheduling and allocation to achieve resource sharing.

• Perspective: high level synthesis in general applies to many aspects of converting high level descriptions into efficient circuits, but there has been an undue level of effort on resource sharing in an ASIC context.

• Bluespec automates many aspects of scheduling (it makes scheduling composable), but resource usage is under the explicit control of the designer.

• For FPGA-based design this is often a better fit as a programming model.


Simple example with concurrency and shared resources

Process 0: increments register x when cond0
Process 1: transfers a unit from register x to register y when cond1
Process 2: decrements register y when cond2

Each register can only be updated by one process on each clock. Priority: 2 > 1 > 0

Just like real applications, e.g. a bank account: 0 = deposit to checking, 1 = transfer from checking to savings, 2 = withdraw from savings

[Figure: processes 0, 1 and 2, guarded by cond0, cond1 and cond2, act on registers x and y. Process 0 applies +1 to x, process 1 applies -1 to x and +1 to y, process 2 applies -1 to y. Process priority: 2 > 1 > 0]


Fundamentally, we are scheduling three potentially concurrent atomic transactions that share resources.

What if the priorities changed: cond1 > cond2 > cond0?
What if the processes are in different modules?

Process priority: 2 > 1 > 0

    always @(posedge CLK) begin
        if (cond2)
            y <= y - 1;
        else if (cond1) begin
            y <= y + 1;
            x <= x - 1;
        end

        if (cond0 && !cond1)
            x <= x + 1;
    end

The if/else conditions are the resource-access scheduling logic, i.e. control logic.

* There are other ways to write this RTL, but all suffer from the same analysis

Better scheduling:

    always @(posedge CLK) begin
        if (cond2)
            y <= y - 1;
        else if (cond1) begin
            y <= y + 1;
            x <= x - 1;
        end

        if (cond0 && (!cond1 || cond2))
            x <= x + 1;
    end

Better scheduling


With Bluespec, the design is direct

    (* descending_urgency = "proc2, proc1, proc0" *)

    rule proc0 (cond0);
        x <= x + 1;
    endrule

    rule proc1 (cond1);
        y <= y + 1;
        x <= x - 1;
    endrule

    rule proc2 (cond2);
        y <= y - 1;
    endrule

Hand-written RTL:
• Explicit scheduling
• Complex clutter, unmaintainable

BSV:
• Functional correctness follows directly from rule semantics (atomicity)
• Executable spec (operation-centric)
• Automatic handling of shared resource control logic
• Same hardware as the RTL

Process priority: 2 > 1 > 0
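
For experimenting with the three rules, they can be wrapped in a self-contained module. This is a sketch under assumptions: the module name mkBank, the initial balances, and the 3-bit step register that drives cond0, cond1 and cond2 through all eight combinations are invented here as stimulus and are not part of the slide.

```bsv
(* synthesize *)
module mkBank (Empty);   // hypothetical module name

   Reg#(int)     x    <- mkReg (10);   // checking balance (assumed start value)
   Reg#(int)     y    <- mkReg (10);   // savings balance (assumed start value)
   Reg#(Bit#(3)) step <- mkReg (0);    // hypothetical stimulus counter

   // Assumed stimulus: one bit of step per condition.
   Bool cond0 = (step[0] == 1);
   Bool cond1 = (step[1] == 1);
   Bool cond2 = (step[2] == 1);

   (* descending_urgency = "proc2, proc1, proc0" *)
   rule proc0 (cond0);
      x <= x + 1;
   endrule

   rule proc1 (cond1);
      y <= y + 1;
      x <= x - 1;
   endrule

   rule proc2 (cond2);
      y <= y - 1;
   endrule

   rule advance;
      step <= step + 1;
   endrule

   rule monitor;
      $display ("step = %b: x = %0d, y = %0d", step, x, y);
   endrule

endmodule
```

The rule bodies are exactly the ones above; the priority is stated once, in the attribute, and the compiler derives the enable logic from it.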


Now, let’s make a small change: add a new process and insert its priority

[Figure: the same diagram with a new process 3, guarded by cond3, which applies +2 to x and -2 to y. Process priority: 2 > 3 > 1 > 0]

Changing the Bluespec design

Process priority: 2 > 3 > 1 > 0

Pre-Change:

    (* descending_urgency = "proc2, proc1, proc0" *)

    rule proc0 (cond0);
        x <= x + 1;
    endrule

    rule proc1 (cond1);
        y <= y + 1;
        x <= x - 1;
    endrule

    rule proc2 (cond2);
        y <= y - 1;
    endrule

After the change:

    (* descending_urgency = "proc2, proc3, proc1, proc0" *)

    rule proc0 (cond0);
        x <= x + 1;
    endrule

    rule proc1 (cond1);
        y <= y + 1;
        x <= x - 1;
    endrule

    rule proc2 (cond2);
        y <= y - 1;
        x <= x + 1;
    endrule

    rule proc3 (cond3);
        y <= y - 2;
        x <= x + 2;
    endrule

Changing the Verilog design

Process priority: 2 > 3 > 1 > 0

Pre-Change:

    always @(posedge CLK) begin
        if (!cond2 && cond1)
            x <= x - 1;
        else if (cond0)
            x <= x + 1;

        if (cond2)
            y <= y - 1;
        else if (cond1)
            y <= y + 1;
    end

After the change:

    always @(posedge CLK) begin
        if ((cond2 && cond0) || (cond0 && !cond1 && !cond3))
            x <= x + 1;
        else if (cond3 && !cond2)
            x <= x + 2;
        else if (cond1 && !cond2)
            x <= x - 1;

        if (cond2)
            y <= y - 1;
        else if (cond3)
            y <= y - 2;
        else if (cond1)
            y <= y + 1;
    end


Alternate RTL style (more common)

• Combinatorial explosion
• Case 3’b111 is subtle
• Many repetitions of update actions (cut-paste errors)
  – cf. “WTO Principle” (Write Things Once, Gerard Berry)
• Difficult to maintain/extend
• Difficult to modularize

Process priority: 2 > 1 > 0

    always @ (posedge clk)
        case ({cond0, cond1, cond2})
            3'b000: begin // nothing happens
                x <= x;
                y <= y;
            end
            3'b001: begin // proc2 fires
                y <= y-1;
            end
            3'b010: begin // proc1
                x <= x-1;
                y <= y+1;
            end
            3'b011: begin // proc2 fires (2>1)
                y <= y-1;
            end
            3'b100: begin // proc0
                x <= x+1;
            end
            3'b101: begin // proc2 + proc0
                x <= x+1;
                y <= y-1;
            end
            3'b110: begin // proc1 (1>0)
                x <= x-1;
                y <= y+1;
            end
            3'b111: begin // proc2 + proc0
                x <= x+1; // NOTE – subtle!
                y <= y-1;
            end
        endcase


Late Specifications

Late specification changes and feature enhancements are challenging to deal with.

Micro-architectural changes for timing/area/performance, e.g.:
• Adding a pipeline stage to an existing pipeline
• Adding a pipeline stage where pipelining was not anticipated
• Spreading a calculation over more clocks (longer iteration)
• Moving logic across a register stage (rebalancing)
• Restructuring combinational clouds for shallower logic

Fixing bugs

Bluespec makes it easier to try out multiple macro/micro-architectures earlier in the design cycle


Why Rule atomicity improves correctness

Correctness is often couched (formally or informally) as an invariant, e.g.:

    “# ingress packets - # egress packets == packet-count register value”

Rule atomicity improves thinking about (and formally proving) invariants, because invariants can be verified one rule at a time.

In contrast, in RTL and thread models, one must think of all possible interleavings (cf. The Problem With Threads, Edward A. Lee, IEEE Computer 39(5), May 2006, pp. 33-42).
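
A sketch of how the packet-count invariant can be checked one rule at a time. Everything here (the module name mkPacketCounter, the register names, the FIFO depth and the check rule) is an assumption made for illustration, not code from the lecture.

```bsv
import FIFO::*;

(* synthesize *)
module mkPacketCounter (Empty);   // hypothetical module name

   FIFO#(int)      buffer   <- mkSizedFIFO (4);
   Reg#(int)       nextPkt  <- mkReg (0);
   Reg#(UInt#(16)) nIngress <- mkReg (0);
   Reg#(UInt#(16)) nEgress  <- mkReg (0);
   Reg#(UInt#(16)) pktCount <- mkReg (0);

   // Both rules read and write pktCount, so they conflict and never fire
   // in the same cycle; the attribute makes the compiler's choice explicit.
   (* descending_urgency = "egress, ingress" *)
   rule ingress;
      buffer.enq (nextPkt);
      nextPkt  <= nextPkt + 1;
      nIngress <= nIngress + 1;   // invariant: +1 on the left-hand side ...
      pktCount <= pktCount + 1;   // ... and +1 on the right-hand side
   endrule

   rule egress;
      buffer.deq;
      nEgress  <= nEgress + 1;    // invariant: -1 on the left-hand side ...
      pktCount <= pktCount - 1;   // ... and -1 on the right-hand side
   endrule

   // Fires only if the invariant is ever broken.
   rule check (nIngress - nEgress != pktCount);
      $display ("invariant violated");
      $finish (1);
   endrule

endmodule
```

Because each rule is atomic, it is enough to read ingress on its own and egress on its own and confirm that each preserves the equality; no interleaving of the two needs to be considered.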


Bank Account: Key Benefits

• Executable specifications
• Rapid changes
• But, with fine-grained control of RTL:
  – Define the optimal architecture/micro-architecture
  – Debug at the source OR RTL level – designer understands both
  – The Quality of Results (QoR) of RTL!

A more complex example, from CPU design

Speculative, out-of-order
Many, many concurrent activities

[Figure: block diagram of a CPU with Instruction Memory, Fetch, Decode, Register File, Re-Order Buffer (ROB), Branch, ALU Unit, MEM Unit and Data Memory, connected to one another by FIFOs]

Many concurrent actions on common state: nightmare to manage explicitly

[Figure: the Re-Order Buffer as a circular table with Head and Tail pointers. Each entry holds State (E = Empty, W = Waiting), Instruction, Operand 1, Operand 2 and Result; instructions A to D are Waiting and all other entries are Empty. Surrounding units act on the buffer concurrently:
• Decode Unit: put an instr into ROB
• Register File: get operands for instr; writeback results
• ALU Unit(s): get a ready ALU instr; put ALU instr results in ROB
• MEM Unit(s): get a ready MEM instr; put MEM instr results in ROB
• Resolve branches]

In Bluespec…

..you can code each operation in isolation, as a rule

..the tool guarantees that operations are INTERLOCKED (i.e. each runs to completion without external interference)

Insert Instr in ROB
• Put instruction in first available slot
• Increment tail pointer
• Get source operands
  – RF <or> prev instr

Dispatch Instr
• Mark instruction dispatched
• Forward to appropriate unit

Write Back Results to ROB
• Write back results to instr result
• Write back to all waiting tags
• Set to done

Commit Instr
• Write results to register file (or allow memory write for store)
• Set to Empty
• Increment head pointer

Branch Resolution
• …
• …
• …


Some Verilog solutions

Process priority: 2 > 1 > 0

    always @(posedge CLK) begin
        if (!cond2 || cond1)
            x <= x - 1;
        else if (cond0)
            x <= x + 1;

        if (cond2)
            y <= y - 1;
        else if (cond1)
            y <= y + 1;
    end

    always @(posedge CLK) begin
        if (!cond2 && cond1)
            x <= x - 1;
        else if (cond0)
            x <= x + 1;

        if (cond2)
            y <= y - 1;
        else if (cond1)
            y <= y + 1;
    end

Which one is correct?

Functional code and scheduling code are deeply (inextricably) intertwined.

What’s required to verify that they’re correct?
What if the priorities changed: cond1 > cond2 > cond0?
What if the processes are in different modules?

Finite State Machines in Bluespec

for making composable, parallel, nested, suspendable/abortable FSMs

Features:
• FSMs automatically synthesized
• Complex FSMs expressed succinctly
• FSM actions have same atomic semantics as BSV rule bodies
• Well-behaved on shared resources: no surprises
• With standard BSV interfaces and BSV’s higher-order functions, you can write your own FSM generators

[Figure: FSMs composed by sequencing, sequential loops, if-then-else, parallel FSMs (fork-join), and hierarchy (with suspend and abort)]

This powerful capability is enabled by higher-order functions, polymorphic types, advanced parameterization and atomic transactions.

Enables exponentially smaller descriptions compared to flat FSMs.


FSM example (from testbench stimulus section)

    Stmt s =
        seq
            action
                rand_packets0.init;
                rand_packets1.init;
            endaction
            par
                for (j0 <= 0; j0 < n; j0 <= j0 + 1)
                    action
                        let pkt0 <- rand_packets0.next;
                        switch.ports[0].put (pkt0);
                    endaction
                for (j1 <= 0; j1 < n; j1 <= j1 + 1)
                    action
                        let pkt1 <- rand_packets1.next;
                        switch.ports[1].put (pkt1);
                    endaction
            endpar
            drain_switch;
        endseq;

    FSM fsm <- mkFSM (s);

    // start is a method of the FSM interface, not of the Stmt value
    rule go;
        fsm.start;
    endrule

Basic FSM statements are “Actions”, just like rule bodies, and have exactly the same atomic semantics. Thus, BSV FSMs are well-behaved with respect to concurrent resource contention and flow control.
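
The testbench fragment above depends on modules that are not shown, so here is a smaller self-contained sketch of the same StmtFSM style. The module name mkFsmSketch, the registers and the summing loop are assumptions; mkAutoFSM is used instead of mkFSM plus a start rule so that the machine starts by itself and ends the simulation when it completes.

```bsv
import StmtFSM::*;

(* synthesize *)
module mkFsmSketch (Empty);   // hypothetical module name

   Reg#(int) i   <- mkReg (0);
   Reg#(int) sum <- mkReg (0);

   Stmt s =
      seq
         sum <= 0;
         // Loop initialiser and increment are register writes (actions);
         // the middle expression is an ordinary Boolean test.
         for (i <= 1; i < 5; i <= i + 1)
            sum <= sum + i;
         $display ("sum = %0d", sum);
      endseq;

   // Runs s once after reset, then finishes the simulation.
   mkAutoFSM (s);

endmodule
```

Each statement in the seq block is an Action with the same atomic semantics as a rule body, which is what the slide's closing remark refers to.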


Strong support for multiple clock and reset domains

• Rich and mature support for MCD (multiple clock domains and resets)

• Clock is a first-class data type
  – Cannot accidentally mix clocks and ordinary signals
  – Strong static checking ensures that it is impossible to accidentally cross clock domain boundaries (i.e., without a synchronizer)
  – No need for linting tools to check domain discipline

• Clock manipulation
  – Clocks can be passed in and out of module interfaces
  – Library of clock dividers and other transformations
  – Module instantiation can specify an alternative clock (instead of inheriting parent’s default clock)

• (Similarly: Reset and reset domains)


Synthesis of Atomic Actions

[Figure: synthesis structure. The state is read; for each rule a combinational circuit computes its predicate (p1, p2, p3) and its potential update function, i.e. its next state (d1, d2, d3). The scheduler selects a maximal subset of applicable rules and outputs the enabled rules (f1, f2, f3). A selector (muxes and priority encoders) then applies the chosen updates to the state.]


Key Issue: How to select the maximal subset of rules for firing?

• Two rules R1 and R2 can execute simultaneously if they are “conflict free”, i.e.
  – R1 and R2 do not update the same state; and
  – Neither R1 nor R2 reads state that the other updates (“sequentially composable” rules)
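
The cases can be illustrated with a handful of rules over three registers. This is a sketch: the module name mkConflictSketch, the rule names and the condition b == 3 are assumptions chosen only to produce one example of each case.

```bsv
(* synthesize *)
module mkConflictSketch (Empty);   // hypothetical module name

   Reg#(int) a <- mkReg (0);
   Reg#(int) b <- mkReg (0);
   Reg#(int) c <- mkReg (0);

   // incA and incB touch disjoint state: conflict free, so both can
   // fire in every cycle.
   // incA and decA each read and write a: they conflict, so the
   // attribute says which one wins when both are enabled.
   (* descending_urgency = "decA, incA" *)
   rule incA;
      a <= a + 1;
   endrule

   rule incB;
      b <= b + 1;
   endrule

   rule decA (b == 3);
      a <= a - 1;
   endrule

   // copyA only reads a, which incA and decA write. It can fire in the
   // same cycle as either of them, logically ordered before the write.
   rule copyA;
      c <= a;
   endrule

endmodule
```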


Rules of Rules (The Details 1-5/10)

1. Rules are atomic: rules fire completely or not at all, and you can imagine that nothing else happens during their execution.

2. Explicit and implicit conditions may prevent rules from firing.

3. Every rule fires exactly 0 or 1 times every cycle (at this point in our product's history anyway ;)

4. Rules that conflict in some way may fire together in the same cycle, but only if the compiler can schedule them in a valid order to do so, that is, where the overall effect is as if they had happened one at a time as in (1) above.

5. Rules determine if they are going to fire or not before they actually do so. They are considered in their order of "urgency" (by a "greedy algorithm"): they "will fire" if they "can fire" and are not prevented by a conflict with a rule which has been selected already. It's OK to think of this phase as being completed (except for wires) before any rules are actually executed. This is what "urgency" is about.


Rules of Rules (The Details 6-10/10)

6. After determining which rules are going to fire, the simulator can then schedule their execution. (In hardware it's all done by combinational logic which has the same effect.) Rules do not need to execute in the same order as they were considered for deciding whether they "will fire". For example rule1 can have a higher urgency than rule2, but it is possible that rule2 executes its logic before rule1. Urgency is used to determine which rules "will fire". Earliness defines the order they fire in.

7. All reads from a register must be scheduled before any writes to the same register: any rule which reads from a register must be scheduled "earlier" than any other rule which writes to it.

8. Constants may be "read" at any time; a register *might* have a write but no read.

9. The compiler creates a sequence of steps, where each step is essentially a rule firing. Its inputs are valid at the beginning of the cycle, its outputs are valid at the end of the cycle. Data is not allowed to be driven "backwards" in the schedule: that is, no action may influence any action that happened "earlier" in the cycle. This would go against causality, and constitutes a "feedback" path that the compiler will not allow.

10. If the compiler is not told otherwise, methods have higher urgency than rules, and will execute earlier than rules, unless there's some reason to the contrary. There is a compiler switch to flip this around and make rules have higher urgency.

The Swap Conundrum

    (* synthesize *)
    module rules9 (Empty) ;

        Reg#(int) x <- mkReg (12) ;
        Reg#(int) y <- mkReg (17) ;

        rule r1 ; x <= y ; endrule

        rule r2 ; y <= x ; endrule

        rule monitor ; $display ("x, y = %0d, %0d", x, y) ; endrule

    endmodule

    $ ./rules9 -m 5
    x, y = 12, 17
    x, y = 12, 12
    x, y = 12, 12
    x, y = 12, 12

The Swap Conundrum

With the two rules of rules9 placed one after the other in the schedule:

    rule r1 (tick 1):  x._write (y._read ())    y read, x write
    rule r2 (tick 2):  y._write (x._read ())    x read, y write

PROBLEM: register x must be read before it is written, but here r2 reads x after r1 has written it.

    (* synthesize *)
    module rules10 (Empty) ;

        Reg#(int) x <- mkReg (12) ;
        Reg#(int) y <- mkReg (17) ;

        rule r ; x <= y ; y <= x ; endrule

        rule monitor ; $display ("x, y = %0d, %0d", x, y) ; endrule

    endmodule

    $ ./rules10 -m 5
    x, y = 12, 17
    x, y = 17, 12
    x, y = 12, 17
    x, y = 17, 12

Schedule-wise, step 1 reads x and y at the beginning and writes x and y at the end.


Wires

• In Bluespec from a scheduling perspective registers and wires are dual concepts.

• In one cycle all register reads must execute before register writes.

• In one cycle a wire must be written to (at most once) before it is read (any number of times).


Rules of Wires

• Wires truly become wires in hardware: they do not save “state” between cycles (compare to signal in VHDL).

• A wire’s schedule requires that it be written before it is read (as opposed to a register that is read before it is written).

• A wire can not be written more than once in a cycle.

    (* synthesize *)
    module rules11 (Empty) ;

        Reg#(int) x <- mkReg (12) ;
        Reg#(int) y <- mkReg (17) ;
        Wire#(int) xwire <- mkWire;

        rule r1 ; x <= y ; endrule

        rule r2 ; y <= xwire ; endrule

        rule driveX ; xwire <= x ; endrule

        rule monitor ; $display ("x, y = %0d, %0d", x, y) ; endrule

    endmodule

    $ ./rules11 -m 5
    x, y = 12, 17
    x, y = 17, 12
    x, y = 12, 17
    x, y = 17, 12

The schedule generated for rules11:

    $ cat rules11.sched
    === Generated schedule for rules11 ===

    Rule schedule
    -------------
    Rule: monitor
    Predicate: True
    Blocking rules: (none)

    Rule: driveX
    Predicate: True
    Blocking rules: (none)

    Rule: r2
    Predicate: xwire.whas
    Blocking rules: (none)

    Rule: r1
    Predicate: True
    Blocking rules: (none)

    Logical execution order: monitor, driveX, r1, r2

    =======================================

Question: is monitor, driveX, r2, r1 a valid schedule?


Wire

• Implements Reg interface (_read and _write methods).

• Implicit condition:
  – it is not ready if it has not been written

• In any cycle, if there is no write to a wire then any rule that reads that wire is blocked (it cannot fire).

    (* synthesize *)
    module rules12 (Empty) ;

        Reg#(int) y <- mkReg (17) ;
        Reg#(int) count <- mkReg (0) ;
        Wire#(int) x <- mkWire;

        rule producer ;
            if (count % 3 == 0)
                x <= count ;
        endrule

        rule consumer ;
            y <= x ;
            $display ("cycle %0d: y set to %0d", count, x) ;
        endrule

        rule counter ; count <= count + 1 ; endrule

    endmodule

    $ ./rules12 -m 9
    cycle 0: y set to 0
    cycle 3: y set to 3
    cycle 6: y set to 6


DWire

• A Wire with a default value.
• A DWire is always ready.
• If there is a write to a DWire in a cycle then, just like a Wire, it assumes that value.
• If there is no write to a DWire in a cycle it assumes a default value (given at instantiation time).

    (* synthesize *)
    module rules13 (Empty) ;

        Reg#(int) y <- mkReg (17) ;
        Reg#(int) count <- mkReg (0) ;
        Wire#(int) x <- mkDWire (42);

        rule producer ;
            if (count % 3 == 0)
                x <= count ;
        endrule

        rule consumer ;
            y <= x ;
            $display ("cycle %0d: y set to %0d", count, x) ;
        endrule

        rule counter ; count <= count + 1 ; endrule

    endmodule

    $
    cycle 1: y set to 42
    cycle 2: y set to 42
    cycle 3: y set to 3
    cycle 4: y set to 42
    cycle 5: y set to 42
    cycle 6: y set to 6
    cycle 7: y set to 42


BypassWire

• Closest thing to a wire in Verilog.
• A BypassWire is always ready.
• Rather than having a default value, the compiler must be able to statically determine that this wire is driven on every cycle.
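
A minimal sketch of a BypassWire, for comparison with the Wire and DWire examples. The module name mkBypassSketch and the rule names are assumptions. The driving rule has no explicit or implicit condition and writes unconditionally, which is what lets the compiler establish that the wire is driven on every cycle.

```bsv
(* synthesize *)
module mkBypassSketch (Empty);   // hypothetical module name

   Reg#(int)  count <- mkReg (0);
   Wire#(int) w     <- mkBypassWire;

   // Unconditional write in a rule that can always fire.
   rule drive;
      w <= count + 1;
   endrule

   // Reads the wire in the same cycle; no implicit condition is needed
   // because a BypassWire is always ready.
   rule consume;
      count <= w;
   endrule

endmodule
```

A conditional write such as the `if (count % 3 == 0)` in the producer rules shown earlier would not be accepted for a BypassWire, since the wire would be undriven in some cycles.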


FIFOs

• Lots and lots of FIFOs provided in FIFO, FIFOF, SpecialFIFOs libraries

• Examples (2 and 4 element FIFOs):

    FIFO#(UInt#(8)) myfifo <- mkFIFO;
    FIFO#(UInt#(8)) biggerfifo <- mkSizedFIFO(4);

• Example BypassFIFO (1 storage element, data passes straight through if enq and deq on same cycle when empty):

    FIFO#(UInt#(8)) bypassfifo <- mkBypassFIFO;

• Basic interfaces:
  – enq(value) // enqueue “value”
  – first // returns first element of fifo
  – deq // dequeue
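
The bypass behaviour can be tried with a short sketch. The module name mkBypassFifoSketch and the counter n are assumptions; mkBypassFIFO comes from the SpecialFIFOs library named above.

```bsv
import FIFO::*;
import SpecialFIFOs::*;

(* synthesize *)
module mkBypassFifoSketch (Empty);   // hypothetical module name

   Reg#(UInt#(8))  n <- mkReg (0);
   FIFO#(UInt#(8)) f <- mkBypassFIFO;

   // Enqueue a fresh value whenever the FIFO can accept one.
   rule send;
      f.enq (n);
      n <= n + 1;
   endrule

   // With a bypass FIFO, first and deq can see a value enqueued in the
   // same cycle, so this rule is scheduled after send.
   rule recv;
      $display ("received %0d", f.first);
      f.deq;
   endrule

endmodule
```

Swapping mkBypassFIFO for mkFIFO keeps the same three methods but adds a cycle of latency between enq and first, which is a quick way to compare the two.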

    import FIFO::*;

    (* synthesize *)
    module rules14 (Empty) ;

        Reg#(int) count <- mkReg (0) ;

        FIFO#(int) fifo <- mkSizedFIFO (30);

        rule producer (count < 5) ;
            fifo.enq (count*3) ;
            $display ("cycle %0d: enqueuing value %0d", count, count*3) ;
        endrule

        rule consumer (count > 5) ;
            int x = fifo.first ;
            fifo.deq ;
            $display ("cycle %0d: dequeued value %0d", count, x) ;
        endrule

        rule counter ; count <= count + 1 ; endrule

    endmodule

    $ ./rules14 -m 20
    cycle 0: enqueuing value 0
    cycle 1: enqueuing value 3
    cycle 2: enqueuing value 6
    cycle 3: enqueuing value 9
    cycle 4: enqueuing value 12
    cycle 6: dequeued value 0
    cycle 7: dequeued value 3
    cycle 8: dequeued value 6
    cycle 9: dequeued value 9
    cycle 10: dequeued value 12

    import FIFO::*;

    (* synthesize *)
    module rules15 (Empty) ;

        Reg#(int) count <- mkReg (0) ;

        FIFO#(int) fifo <- mkSizedFIFO (30);

        rule producer (count < 5) ;
            fifo.enq (count*3) ;
            $display ("cycle %0d: enqueuing value %0d", count, count*3) ;
        endrule

        rule consumer (count < 5) ;
            int x = fifo.first ;
            fifo.deq ;
            $display ("cycle %0d: dequeued value %0d", count, x) ;
        endrule

        rule counter ; count <= count + 1 ; endrule

    endmodule

?

    import GetPut::* ;
    import Connectable::* ;

    module mkProducer (Get#(int)) ;

        Reg#(int) i <- mkReg (0) ;

        rule incrementI ; i <= i + 1 ; endrule

        method ActionValue#(int) get () ;
            return i ;
        endmethod

    endmodule: mkProducer

    module mkConsumer (Put#(int)) ;

        Wire#(int) i <- mkWire ;

        rule report ; $display ("mkConsumer %d", i) ; endrule

        method Action put (int x) ;
            i <= x ;
        endmethod

    endmodule: mkConsumer

    (* synthesize *)
    module mkConnectableExample (Empty) ;

        Get#(int) p <- mkProducer ;
        Put#(int) c <- mkConsumer ;
        mkConnection (p, c) ;

    endmodule: mkConnectableExample

Higher Order Types: p and c are interfaces (bundles of methods) which are passed as arguments to mkConnection.
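
The same connection idiom is often applied to FIFOs through the GetPut library's toGet and toPut functions. The sketch below assumes a module name (mkFifoPipe), two FIFOs and a counter as stimulus; none of these appear in the lecture.

```bsv
import FIFO::*;
import GetPut::*;
import Connectable::*;

(* synthesize *)
module mkFifoPipe (Empty);   // hypothetical module name

   Reg#(int)  n   <- mkReg (0);
   FIFO#(int) src <- mkFIFO;
   FIFO#(int) dst <- mkFIFO;

   rule fill;
      src.enq (n);
      n <= n + 1;
   endrule

   // toGet wraps first + deq as a Get interface, toPut wraps enq as a
   // Put interface; mkConnection generates the rule that moves the data.
   mkConnection (toGet (src), toPut (dst));

   rule drain;
      $display ("received %0d", dst.first);
      dst.deq;
   endrule

endmodule
```

No rule is written for the transfer itself; it exists only as the result of passing two interfaces to mkConnection.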

ServerFarm

[Figure: ServerFarm information flow. A ServerFarm with request and response ports contains two DividerServer blocks, each with its own request and response ports.]


Conclusions

• Bluespec:
  – provides cleaner interfaces
    • quicker to create large systems from libraries of components
    • easier to refine design
  – creates most of the control for you (unless you don’t want it to)
    • less likely to get it wrong!
  – has strong typing
    • helps remove bugs
  – provides powerful static elaboration
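
As a small illustration of static elaboration, the sketch below generates four registers and four rules from one loop that is unrolled at compile time. The module name mkElabSketch, the vector size and the increments are assumptions.

```bsv
import Vector::*;

(* synthesize *)
module mkElabSketch (Empty);   // hypothetical module name

   // Four registers created by a single statement.
   Vector#(4, Reg#(int)) r <- replicateM (mkReg (0));

   // The loop runs during elaboration, not in hardware: it produces one
   // rule per index, each with its own constant increment.
   for (Integer i = 0; i < 4; i = i + 1)
      rule bump;
         r[i] <= r[i] + fromInteger (i);
      endrule

endmodule
```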