TKT-1212 Digitaalijärjestelmien suunnittelu (Digital Systems Design)
FSM implementations, Practical VHDL synthesis
Erno Salminen, 2011
[Figure: Moore machine block diagram (input, next state, current state, output)]
Acknowledgements
Prof. Pong P. Chu provided the "official" slides for the book, which is gratefully acknowledged
  See also: http://academic.csuohio.edu/chu_p/
Most slides were made by Ari Kulmala and other previous lecturers (Teemu Pitkänen, Konsta Punkka, Mikko Alho…)
Finite State Machines
All the previous teachings are still valid; only the description style changes
[Figure: state diagram of a player FSM. States and outputs: Stop (y=z1), Play (y=z2), Play x2 (y=z3), Rewplay x2 (y=z4), Next_track (y=z5), Prev_track (y=z6). Transition labels: in=pl, in=st, in=x2, in=-x2, in=next, in=prev, in=others, always]
Several Implementation styles
Two basic flavors: Mealy and Moore
In both cases, one must select whether to include the output registers or not
Moreover, you decide the VHDL presentation of the FSM
  Description style: how many processes
  Encoding of states, if not automated in synthesis
[Figure: block diagrams of the Mealy and Moore machines (input, next state, current state, output)]
Several Implementation styles (2)
1. 1 sequential process
2. 2 processes
  a) Seq: curr. state registers and output, Comb: next state logic
  b) Seq: curr state, Comb: next state, output
  c) Seq: next and curr state, Comb: output
3. 3 processes (Seq: curr state, Comb: output, Comb: next state logic separated)
[Figure: block diagrams of the Mealy and Moore machines (input, next state, current state, output)]
Implementations
General form:
  We define our own type for the state machine states
  ALWAYS use an enumeration type for state machine states
    Synthesis software, e.g. Quartus II, does not recognize the FSM otherwise

    architecture rtl of traffic_light is
      type states_type is (red, yellow, green);   -- enumeration type for states
      -- init state explicitly defined in reset, not here
      signal ctrl_r : states_type;                -- ctrl_r is the current state register
      ...
    begin -- rtl
Implementations: 2seg-Moore
Moore, two-segment coding style

    sync_ps : process (clk, rst_n)
    begin -- process sync_ps
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <Synchronous part of the FSM; assign next state to curr state>
      end if;
    end process sync_ps;

    comb_ns_output : process (ctrl_r, input)
    begin -- process comb_ns_output
      <Combinational part; define next state, define output>
    end process comb_ns_output;

    end rtl;
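The template above can be filled in as follows. This is only a minimal sketch, not the course's traffic_light.vhd: the entity name, the next_in request input, the next_ctrl signal and the one-hot light_out coding are assumptions made for illustration, and the yellow-light counter is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: name and ports are illustrative assumptions.
entity traffic_light_2seg is
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;
    next_in   : in  std_logic;                      -- assumed "advance" request
    light_out : out std_logic_vector(2 downto 0));  -- assumed coding: red, yellow, green
end traffic_light_2seg;

architecture rtl of traffic_light_2seg is
  type states_type is (red, yellow, green);
  signal ctrl_r    : states_type;   -- current state register
  signal next_ctrl : states_type;   -- combinational next state
begin  -- rtl

  -- Segment 1: state register only
  sync_ps : process (clk, rst_n)
  begin
    if rst_n = '0' then
      ctrl_r <= red;
    elsif clk'event and clk = '1' then
      ctrl_r <= next_ctrl;
    end if;
  end process sync_ps;

  -- Segment 2: next state and output logic.
  -- Moore: light_out depends on ctrl_r only, never directly on next_in.
  comb_ns_output : process (ctrl_r, next_in)
  begin
    next_ctrl <= ctrl_r;            -- default: stay, so no latch is inferred
    case ctrl_r is
      when red =>
        light_out <= "100";
        if next_in = '1' then next_ctrl <= green; end if;
      when green =>
        light_out <= "001";
        if next_in = '1' then next_ctrl <= yellow; end if;
      when yellow =>
        light_out <= "010";
        if next_in = '1' then next_ctrl <= red; end if;
    end case;
  end process comb_ns_output;

end rtl;
```

Every branch assigns both next_ctrl (through the default) and light_out, which keeps the combinational process free of latches.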
Implementations: 2seg-Mealy
Mealy, two-segment coding style

    sync_ps : process (clk, rst_n)
    begin -- process sync_ps
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <Synchronous part of the FSM; assign next state to curr state>
      end if;
    end process sync_ps;

    comb_ns_output : process (ctrl_r, input)
    begin -- process comb_ns_output
      <Combinational part; define next state, define output
       (Looks the same as Moore, but here the input is also considered for the output)>
    end process comb_ns_output;

    end rtl;
Coding style: 1seg-Moore/Reg-Mealy
1-segment style

    sync_all : process (clk, rst_n)
    begin -- process sync_all
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <define next state and assign next state to curr state>
        <define output>    -- output becomes a register!
      end if;
    end process sync_all;
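A possible filled-in version of the 1-segment template, including a counter that times the yellow light, is sketched below. The generic yellow_length_g appears in the synthesis log of the course example, but the entity name, the next_in input, the count_r signal and the light_out coding are assumptions, so this is not the actual traffic_light.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: name and ports are illustrative assumptions.
entity traffic_light_1seg is
  generic (
    yellow_length_g : integer := 10);               -- yellow time in clock cycles, >= 1
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;
    next_in   : in  std_logic;                      -- assumed "advance" request
    light_out : out std_logic_vector(2 downto 0));  -- assumed coding: red, yellow, green
end traffic_light_1seg;

architecture rtl of traffic_light_1seg is
  type states_type is (red, yellow, green);
  signal ctrl_r  : states_type;                             -- state register
  signal count_r : integer range 0 to yellow_length_g-1;    -- yellow timer
begin  -- rtl

  -- Everything is assigned inside the clocked branch, so ctrl_r, count_r
  -- and light_out all become registers (registered Mealy).
  sync_all : process (clk, rst_n)
  begin
    if rst_n = '0' then
      ctrl_r    <= red;
      count_r   <= 0;
      light_out <= "100";
    elsif clk'event and clk = '1' then
      case ctrl_r is
        when red =>
          if next_in = '1' then
            ctrl_r    <= green;
            light_out <= "001";
          end if;
        when green =>
          if next_in = '1' then
            ctrl_r    <= yellow;
            count_r   <= 0;
            light_out <= "010";
          end if;
        when yellow =>
          if count_r = yellow_length_g-1 then
            ctrl_r    <= red;
            light_out <= "100";
          else
            count_r <= count_r + 1;
          end if;
      end case;
    end if;
  end process sync_all;

end rtl;
```

The output is written together with the state transition, so it changes on the same clock edge as ctrl_r rather than one cycle later.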
Coding style: 3seg-Moore/Mealy
3-segment style

    sync_ps : process (clk, rst_n)
    begin -- process sync_ps
      if rst_n = '0' then
        <INIT STATE OF THE FSM>
      elsif clk'event and clk = '1' then
        <curr state assignment>
      end if;
    end process sync_ps;

    comb_ns : process (ctrl_r, input)
    begin -- process comb_ns
      <Combinational part; define next state>
    end process comb_ns;

    comb_output : process (ctrl_r, input)
    begin -- process comb_output
      <Combinational part; define output>
    end process comb_output;
Examples: Traffic light FSM implemented with various styles
Examples shown as VHDL code
They also show the usage of a counter in a state machine
  Acts as a timer for showing the yellow light
Output latency is larger for Moore and registered Mealy
However, all the implementations keep the yellow light on for the same amount of time
Comparison of implementation styles: Coding style
1-segment:
  Automatically inferred output registers
  Simple view to the design, everything in one place
  Safe; a registered Mealy machine is easy to implement with this style
  Recommended (as opposed to the course book!)
2-segment, 3-segment:
  The only way to implement unregistered outputs to FSMs
  Modular
  Long ago, synthesis tools did not recognize 1-segment FSMs correctly
    Not the case anymore
  Recommended style in many books, partially because of those limitations of the old tools
  Useful for quite simple control machines that do not include e.g. delay counters
  Complex state machines are cumbersome to read
    The code does not proceed smoothly; one has to jump around the code
    The same conditions may be repeated in many processes
Quartus II design flow after you've simulated and verified the design
[Figure: design flow; the steps are annotated as follows]
  Generic gate-level representation
  Places and routes the logic into a device
  Converts the post-fit netlist into an FPGA programming file
  Analyzes and validates the timing performance of all logic in a design
  Run on FPGA
Examples: state diagram, Tool A
[Figure: state diagram extracted by Tool A. Note the encoding.]

RTL synthesis result: Tool A (registered Mealy machine, traffic_light.vhd)
[Figure: RTL schematic with the following annotated parts]
  State register
  Register for output bit 2
  Registers for output bits 1..0
  Combinatorial output logic
  Next state logic, incl. counter for showing the yellow light long enough
  Comb path from input to output logic

Technology schematic, Tool A (registered Mealy machine, traffic_light.vhd)
[Figure: technology schematic with the following annotated parts]
  Single flip-flops
  Look-up tables, max 6 inputs
Synthesis result: Tool B (registered Mealy machine, traffic_light.vhd)
Same VHDL, different result

    # Info: [45144]: Extracted FSM in module work.traffic_light(rtl){generic map (n_colors_g => 3 yellow_length_g => 10)}, with state variable = ctrl_r[1:0], async set/reset state(s) = 00 , number of states = 3.
    # Info: [45144]: Preserving the original encoding in 3 state FSM
    # Info: [45144]: FSM: State encoding table.
    # Info: [40000]: FSM: Index Literal Encoding
    # Info: [40000]: FSM: 0 00 00
    # Info: [40000]: FSM: 1 01 01
    # Info: [40000]: FSM: 2 10 10

Note the different state encoding
Technology schematic, Tool B (registered Mealy machine, traffic_light.vhd)
[Figure: technology schematic with the following annotated parts]
  LUTs
  Multi-bit registers

Physical placement on-chip
The traffic_light.vhd placed and routed
Stratix 2S180, 143 000 ALUTs (~LUTs)
[Figure: chip floorplan] Most of the resources remain unused...
Implementation area and freq
Note that no strict generalization can be made about the "betterness"
Tool A
  Total ALUTs: 15
  ALUTs with register: 10
Tool B
  LUTs: 16
  Registers: 9
The one-register difference is due to the different state encoding
  The state encoding can be explicitly defined or left to the tool to choose (as in this case)
Synthesis of different VHDLs

    Style                             AREA [LUT]  AREA [reg]  Lines of Code
    mealy (single)                    16          9           104
    mealy (output separated)          13          6           126
    mealy_2proc. (out+ns separated)   11          6           125
    mealy_3proc                       11          6           150
    Moore                             11          6           108

Functionally equivalent
Timing aspects vary
  Different max frequency
Only the "Mealy single" has output registers (3 bit)
Coding style has an effect even with small designs
Readability of the code is just as crucial!
Synthesis summary
Different tools produce slightly different results even for small designs
  A notable effect is also achievable by tuning the tool settings
Synthesis tools are heuristic due to the very large design space
  Even a single tool may produce slightly different results on different runs!
  Optimization heuristics utilize randomness
However, no tool can convert a bad design into a good one!
Make sure that you are aware of which signals in the shown code have been implemented as registers!
Comparison of implementation styles: Moore and Mealy
Generally, we want the outputs to be registered
A Mealy machine is dangerous due to possibly long combinational paths (or loops)
For registered outputs, use a registered Mealy machine
  Outputs are registered, but it has shorter latency than a Moore machine with registered outputs
Otherwise, opt for a Moore machine
Logic synthesis

Foreword: VHDL and synthesis
The main goal of writing VHDL is to generate a synthesizable description
This lecture presents some practical examples of how to write code that is good for synthesis
The quality of the design is much affected by the coding style
  One must be able to choose structures that synthesize the best
Simplest example
In this course, we concentrate on RTL synthesis: how HDL is converted into a netlist of basic gates and flip-flops
  Technology mapping, routing and placement are beyond the scope of this course
Example: arith_result <= a + b + c - 1;
  The resulting combinatorial logic is straightforward
  Inclusion of DFFs depends on the context (inside a sync process or not)
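To illustrate how the context decides whether flip-flops are inferred, the sketch below places the same expression once in a concurrent assignment and once inside a synchronous process. The entity, port names, data width and the use of unsigned arithmetic from numeric_std are assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names and widths are illustrative assumptions.
entity arith_example is
  generic (
    data_width_g : integer := 8);
  port (
    clk      : in  std_logic;
    rst_n    : in  std_logic;
    a_in     : in  std_logic_vector(data_width_g-1 downto 0);
    b_in     : in  std_logic_vector(data_width_g-1 downto 0);
    c_in     : in  std_logic_vector(data_width_g-1 downto 0);
    comb_out : out std_logic_vector(data_width_g-1 downto 0);
    reg_out  : out std_logic_vector(data_width_g-1 downto 0));
end arith_example;

architecture rtl of arith_example is
  signal arith_result   : unsigned(data_width_g-1 downto 0);
  signal arith_result_r : unsigned(data_width_g-1 downto 0);
begin  -- rtl

  -- Outside a process: combinational adders only, no flip-flops.
  -- The result wraps around modulo 2**data_width_g.
  arith_result <= unsigned(a_in) + unsigned(b_in) + unsigned(c_in) - 1;
  comb_out     <= std_logic_vector(arith_result);

  -- Inside a synchronous process: the same logic followed by DFFs.
  sync : process (clk, rst_n)
  begin
    if rst_n = '0' then
      arith_result_r <= (others => '0');
    elsif clk'event and clk = '1' then
      arith_result_r <= unsigned(a_in) + unsigned(b_in) + unsigned(c_in) - 1;
    end if;
  end process sync;

  reg_out <= std_logic_vector(arith_result_r);

end rtl;
```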
Example: if-else
[Figure: conceptual structure of nested if-clauses in HDL, and the conceptual hardware realization]

Example: selected assignment
[Figure: Fig 1. Basic form of synthesized logic; Fig 2. Full logic]
Similar to if-else: note the similarity to an if-clause inside a process

Logic from case-clause
This example has 2 outputs, but again the logic is similar to the if-clause
[Figure: conceptual structure of case-clause in HDL, and the conceptual hardware realization]

More complex selected assignment
[Figure: Fig 1. Conceptual hardware realization; Fig 2. Full logic]
Example: loop
[Figure: VHDL for-loop and the equivalent unrolled assignments]
Bounds must be static, like here (3-1 downto 0)
The loop is "unrolled" in logic
Everything happens in parallel!
Hence, the loop is equivalent to one separate assignment per index value
Sidenote: y <= a xor b is even better with std_logic_vectors, but then we would not have an example of a loop
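The slide's code was an image and is not part of this transcript, so the following is a reconstruction under assumptions: a bitwise xor over 3-bit vectors with the static bounds 3-1 downto 0 mentioned above. The entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: reconstructed example, names are assumptions.
entity xor_loop is
  port (
    a : in  std_logic_vector(2 downto 0);
    b : in  std_logic_vector(2 downto 0);
    y : out std_logic_vector(2 downto 0));
end xor_loop;

architecture rtl of xor_loop is
begin  -- rtl

  comb : process (a, b)
  begin
    -- Static bounds: the synthesis tool unrolls the loop into three
    -- independent xor gates that all operate in parallel.
    for i in 3-1 downto 0 loop
      y(i) <= a(i) xor b(i);
    end loop;
    -- Equivalent unrolled form:
    --   y(2) <= a(2) xor b(2);
    --   y(1) <= a(1) xor b(1);
    --   y(0) <= a(0) xor b(0);
  end process comb;

end rtl;
```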
Loops: example 2 in SW
With software:

    for (i = 0; i < max_c; i++) {
      b[i] = a[i] + i;
    }

Iterative calculation for b(i) (simplified):
1. Calculate for-clause
2. Fetch a
3. Add
4. Store b
5. Increment i
6. Go back to 1
Takes a lot of clock cycles (several even with loop unrolling)
Loops: example 2 in HW
Hardware:

    add_i : for i in 0 to max_c-1 generate
      b(i) <= a(i) + i;
    end generate add_i;

Generates <max_c> parallel computation units
  High area overhead
  Result generated in 1 clock cycle, very fast!
However, in HW we can adjust the area-performance ratio
  Pipeline: e.g. half of the result on the first cycle, the rest on the second
  Fully sequential (the SW case), still faster than SW
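The fully sequential end of the area-performance range could look roughly like the sketch below: one adder is reused and one element of b is produced per clock cycle. The package loop_pkg, the 8-bit element width, max_c = 4 and the index register i_r are assumptions made to keep the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: array type and size are illustrative assumptions.
package loop_pkg is
  constant max_c : integer := 4;
  type word_array is array (0 to max_c-1) of unsigned(7 downto 0);
end loop_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.loop_pkg.all;

entity add_index_seq is
  port (
    clk   : in  std_logic;
    rst_n : in  std_logic;
    a     : in  word_array;
    b     : out word_array);
end add_index_seq;

architecture rtl of add_index_seq is
  signal i_r : integer range 0 to max_c-1;   -- loop index as a register
begin  -- rtl

  sync : process (clk, rst_n)
  begin
    if rst_n = '0' then
      i_r <= 0;
      b   <= (others => (others => '0'));
    elsif clk'event and clk = '1' then
      -- One addition per cycle: a single adder is shared by all indices,
      -- at the cost of multiplexing a(i_r) and taking max_c cycles in total.
      b(i_r) <= a(i_r) + i_r;
      if i_r = max_c-1 then
        i_r <= 0;
      else
        i_r <= i_r + 1;
      end if;
    end if;
  end process sync;

end rtl;
```

Compared with the generate version, the adder count drops from max_c to one, but multiplexers and an index counter are added, so the saving only pays off when the shared operator is expensive.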
Basic optimizations
Constant operands simplify Boolean equations
  For example, consider a 4-bit comparator
    a) x = y
    b) x = 0
The smallest possible data width is of course desired
Iterative algorithms trade area for delay
Even the most basic operations have different costs
One can share complex units via multiplexing
Sharing
Arithmetic operators
  Large implementation
  Limited optimization by synthesis software
  Data width has a major impact
"Optimization" can be achieved by "sharing" in RT-level coding
  Operator sharing
  Functionality sharing
Sharing 2
Possible when the "value expressions" in the priority network and multiplexing network are mutually exclusive:
  Only one result is routed to the output
The generic format of the conditional signal assignment guarantees this:

    sig_name <= value_expr_1 when boolean_expr_1 else
                value_expr_2 when boolean_expr_2 else
                value_expr_3 when boolean_expr_3 else
                . . .
                value_expr_n;
Sharing example 1
NOTE: Coded outside a process
Original code:

    r <= a+b when boolean_exp else
         a+c;

Revised code (enables sharing):

    src0 <= b when boolean_exp else
            c;
    r <= a + src0;
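[Figure: synthesized circuits of the original and the revised code]
Original: Area: 2 adders, 1 mux, Bool. Delay: adders and Bool in parallel, followed by the mux
Revised: Area: 1 adder, 1 mux, Bool. Delay: Bool, mux and adder in series
However, no free lunch in general: sharing reduces area A but increases delay T in this case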
Sharing example 2
NOTE: Coded inside a process; equivalent with the previous example
Original code:

    process(a,b,c,d,...)
    begin
      if boolean_exp then
        r <= a+b;
      else
        r <= a+c;
      end if;
    end process;

Revised code:

    process(a,b,c,d,...)
    begin
      if boolean_exp then
        src0 <= a;
        src1 <= b;
      else
        src0 <= a;
        src1 <= c;
      end if;
    end process;
    r <= src0 + src1;
Guidelines for synthesizable HDL

Logic design basics still apply
Modularize the design into components
  Easier to design single components
  Easier to upgrade
Use a version control system (e.g. SVN or git)
Asynchronous reset is used only to initialize
  Not part of the functionality
  Hence, you don't force reset from your code
  Use a separate clear signal or similar if needed. That is checked in the edge-sensitive if-branch (lec 12)
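A separate clear signal checked in the edge-sensitive branch might look like the sketch below. The counter, its width and the clear_in port are assumptions for illustration; the point is only that rst_n stays reserved for initialization.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a small counter with a synchronous clear.
entity clearable_counter is
  port (
    clk       : in  std_logic;
    rst_n     : in  std_logic;   -- asynchronous reset, initialization only
    clear_in  : in  std_logic;   -- synchronous clear, part of the functionality
    count_out : out std_logic_vector(3 downto 0));
end clearable_counter;

architecture rtl of clearable_counter is
  signal count_r : unsigned(3 downto 0);
begin  -- rtl

  sync : process (clk, rst_n)
  begin
    if rst_n = '0' then
      count_r <= (others => '0');
    elsif clk'event and clk = '1' then
      -- The clear is sampled on the clock edge like any other input.
      if clear_in = '1' then
        count_r <= (others => '0');
      else
        count_r <= count_r + 1;
      end if;
    end if;
  end process sync;

  count_out <= std_logic_vector(count_r);

end rtl;
```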
General guidelines and hints
Use only synthesizable code!
Use std_logic data types and the numeric_std package
Use only descending ranges in arrays (i.e. downto)

    signal write_r   : std_logic_vector(data_width_g-1 downto 0);  -- use this
    signal write_out : std_logic_vector(0 to data_width_g-1);      -- avoid this

Use parentheses to show the order of evaluation

    a and (x or b)

Check the VHDL coding rules used in the course
  Not just tidiness; they also affect the performance/area of the design
Points to note
Assignment delay, such as "a <= b after x ns", is problematic
  The assignment will be synthesized, but not the delay
  This example will produce a simple wire
  If you fix your code with such a delay, it won't work after synthesis
  The only place to use non-synthesizable code is testbenches
Variables are synthesizable, but…
  it is harder to figure out the resulting logic than with signals (lec 12)
High-impedance state 'Z' is synthesizable, but…
  simulation results and real HW do not always match (lec 12)
Synthesis tools are great, but…
  they behave differently. Some structures are not accepted by all tools
Notes on combinational circuit design
Always write a complete sensitivity list
In every comb. process invocation, every signal must be assigned a value
  Otherwise latches are generated to hold the previous values
  We practically never want to have latches from RTL comb. processes
  Usual suspect: incomplete if-else or such
Avoid combinational loops!
  The same signal on both sides of an assignment
  E.g. a <= a+1; -- aargh!
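The latch rule is easiest to see side by side. In this sketch (entity and signal names are assumptions) the first process has an incomplete if and therefore implies a latch, while the second adds a default assignment and stays purely combinational.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity used only to contrast the two processes.
entity latch_demo is
  port (
    sel_in      : in  std_logic;
    a_in        : in  std_logic;
    y_latch_out : out std_logic;
    y_comb_out  : out std_logic);
end latch_demo;

architecture rtl of latch_demo is
begin  -- rtl

  -- BAD: when sel_in = '0' nothing is assigned, so the old value must be
  -- remembered and the synthesis tool infers a latch.
  latchy : process (sel_in, a_in)
  begin
    if sel_in = '1' then
      y_latch_out <= a_in;
    end if;
  end process latchy;

  -- GOOD: the default assignment guarantees a value on every invocation.
  clean : process (sel_in, a_in)
  begin
    y_comb_out <= '0';
    if sel_in = '1' then
      y_comb_out <= a_in;
    end if;
  end process clean;

end rtl;
```

A default assignment at the top of the process is often less error-prone than adding an else-branch to every if, especially when several signals are assigned.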
Note on sequential synchronous circuit design
In a synchronous process, there are only two branches

    if rst_n = '0' then                   -- asynchronous reset (active low)
      <INIT>
    elsif clk'event and clk = '1' then    -- rising clock edge
      <Synchronous part>
    end if;

Only clk and rst_n in the sensitivity list!
No else-branch, no other elsif branches, and no code outside the branches
  Some tools support more branches and some don't => behavior undefined
The true Devil: never do!

    ENTITY bad_counter IS
      PORT (
        reset, clk, inc : IN     STD_LOGIC;
        cnt             : BUFFER INTEGER RANGE 0 TO 4);
    END bad_counter;

    ARCHITECTURE example OF bad_counter IS
    BEGIN -- example
      PROCESS (clk, reset, inc, cnt)
      BEGIN -- PROCESS
        IF reset = '0' THEN                   -- asynchronous reset (active low)
          cnt <= 0;
        ELSIF inc = '1' THEN
          cnt <= cnt+1;
        ELSIF clk'EVENT AND clk = '1' THEN    -- rising clock edge
          cnt <= cnt-1;
        END IF;
      END PROCESS;
    END example;

What is wrong?
Generates a pseudo-random machine.
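The bad_counter breaks several of the earlier rules at once: it has a third, level-sensitive branch (inc) between the reset and the clock edge, cnt <= cnt+1 in that branch is a combinational loop, and inc and cnt appear in the sensitivity list. One way to repair it is sketched below. The intended behaviour (count up when inc is high, otherwise down, both on the clock edge) and the saturation at 0 and 4 are assumptions, since the slide does not state them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical repaired version; intended behaviour is an assumption.
entity good_counter is
  port (
    clk     : in  std_logic;
    rst_n   : in  std_logic;
    inc     : in  std_logic;
    cnt_out : out integer range 0 to 4);
end good_counter;

architecture rtl of good_counter is
  signal cnt_r : integer range 0 to 4;
begin  -- rtl

  -- Exactly two branches: asynchronous reset and rising clock edge.
  sync : process (clk, rst_n)
  begin
    if rst_n = '0' then
      cnt_r <= 0;
    elsif clk'event and clk = '1' then
      if inc = '1' then
        if cnt_r < 4 then           -- assumed: saturate at the upper bound
          cnt_r <= cnt_r + 1;
        end if;
      else
        if cnt_r > 0 then           -- assumed: saturate at the lower bound
          cnt_r <= cnt_r - 1;
        end if;
      end if;
    end if;
  end process sync;

  cnt_out <= cnt_r;                 -- internal register instead of a BUFFER port

end rtl;
```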
Not supported by synthesis
Signals in packages (global signals)
Signal and variable initialization
  Typically ignored (there are exceptions, e.g. Xilinx FPGA synthesis)
Unconstrained while and for loops
More than one 'event in a process
Multiple wait statements
Physical types, for example time
Access types
File types
Guard expression
(Sensitivity lists, delays and asserts are ignored)
Types
Using your own types may significantly clarify the code
Declaration:

    TYPE location IS
      RECORD
        x     : INTEGER range 0 to location_max_c-1;
        y     : INTEGER range 0 to location_max_c-1;
        valid : std_logic;
      END RECORD;

    TYPE locations_type IS ARRAY (0 to 3) of location;
    SIGNAL loc_r : locations_type;

Usage:

    For i in 0 to 3 loop
      loc_r(i).x     <= i;
      loc_r(i).y     <= 3-i;
      loc_r(i).valid <= '1';
    End loop;

Resulting contents:

                x, y, valid
    loc_r(0)    0, 3, '1'
    loc_r(1)    1, 2, '1'
    loc_r(2)    2, 1, '1'
    loc_r(3)    3, 0, '1'
Types #2
Initialization of an array constant:

    constant a_bound_c : integer := 2;
    type vector_2d is array (0 to a_bound_c-1) of std_logic_vector(1 downto 0);
    type vector_3d is array (0 to a_bound_c-1) of vector_2d;
    constant initial_values_c : vector_3d := (("00", "01"),
                                              ("10", "11"));

You may split the initialization onto multiple lines to increase readability
Contents of initial_values_c (c0 = vertical, c1 = horizontal):

    c0\c1   0      1
    0       "00"   "01"
    1       "10"   "11"
Types #3
Special case: a single-element array has to be initialized with named association:

    constant ip_types_c : integer := 1;
    type ip_vect is array (0 to ip_types_c-1) of integer;
    constant ip_amount_c  : ip_vect := (0 => 1);  -- right way
    constant ip_amount2_c : ip_vect := (1);       -- does not work!
    constant ip_amount2_c : ip_vect := 1;         -- does not work!

    ** Error: rtm_pkg.vhd(20): Integer literal 1 is not of type ip_vect.

There is only a single value, but it is an array nonetheless
Coding style effect
Depends on the synthesis software used
However, if style x is clearly better than y in synthesis tool A, most probably it won't be much worse in B (although it may yield the same result)
Here, we use the Quartus II Altera FPGA synthesis tool as an example
  Used on the course
  Source: Quartus II Handbook (for ver. 6.1) + Ref 1.
Multiplexers
Multiplexers form a large portion of the logic utilization
  E.g. 30% of the Nios II/f soft-core processor area is muxes
An if-structure generates a priority multiplexer

    IF cond1 THEN z <= a;
    ELSIF cond2 THEN z <= b;
    ELSIF cond3 THEN z <= c;
    ELSE z <= d;
    END IF;

It is preferred to use case

    CASE sel IS
      WHEN cond1 => z <= a;
      WHEN cond2 => z <= b;
      WHEN cond3 => z <= c;
      WHEN OTHERS => z <= d;
    END CASE;

  Creates a balanced multiplexer tree
Multiplexers #2
Do not let the simplicity of VHDL trick you
Multiplexing four 32-bit words requires 130 input bits (2 control bits + 128 data bits) and 32 output bits
  A lot of routing
  32 x 4-to-1 muxes
  A 4-to-1 mux requires three 2-to-1 muxes
  One 2-to-1 mux is implementable in one basic logic element (LE)
  => 3*32 = 96 2-to-1 muxes required, 96 LEs consumed
Shifters
Variable-amount shifting is area-hungry
Assume a 32-bit vector that can be shifted by an arbitrary amount to the left or right
  Needs a 32-to-1 mux for every result bit!
  32-to-1 mux = 31 2-to-1 muxes = 31 LEs
  32*31 = 992 2-to-1 muxes (= LEs)
Non-constant shifters are generally not supported (automatically) by synthesis tools
An FPGA-specific trick is to use the embedded multipliers to do the dynamic shifting
  Multiplying by 2^n shifts the result to the left by n
  Faster and more area-efficient than doing this with LEs
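The multiplier trick can be sketched as follows: the shift amount is decoded into a one-hot factor 2^n, and the data is multiplied by it. The entity, the 8-bit width and the signal names are assumptions, and whether the "*" actually lands in an embedded multiplier depends on the device and the tool settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: 8-bit left shifter built around a multiplication.
entity mult_shifter is
  port (
    data_in   : in  std_logic_vector(7 downto 0);
    amount_in : in  std_logic_vector(2 downto 0);   -- shift amount n = 0..7
    data_out  : out std_logic_vector(7 downto 0));
end mult_shifter;

architecture rtl of mult_shifter is
  signal factor  : unsigned(7 downto 0);    -- one-hot value 2**n
  signal product : unsigned(15 downto 0);   -- full-width multiplication result
begin  -- rtl

  -- Decode n into 2**n: all zeros by default, then set bit n.
  decode : process (amount_in)
  begin
    factor <= (others => '0');
    factor(to_integer(unsigned(amount_in))) <= '1';
  end process decode;

  -- Multiplying by 2**n equals shifting left by n.
  product  <= unsigned(data_in) * factor;

  -- Keep the low byte: bits shifted past bit 7 are dropped, as in a plain shift.
  data_out <= std_logic_vector(product(7 downto 0));

end rtl;
```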
Comparators
<, >, ==
Avoid implementing == in general logic cells.
Comparators are implementable using arithmetic operations and fast carry chains.
  Calculate a-b and check whether the result is negative, zero, or positive
  Synthesis tools should be aware of this automatically...
Recall that x = a[6:0] < b[6:0] is the same as x = signed(a[6:0]-b[6:0])[7]
  The last carry [overflow] of the subtraction
Note: in ASICs it may not be feasible to use arithmetic for comparison
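Written out in VHDL, the subtraction-based comparison could look like this sketch. It assumes unsigned 7-bit operands that are extended by one bit before subtracting; the entity and signal names are hypothetical. A plain "a < b" is normally enough, because the tool is expected to pick this structure by itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: explicit "less than" through subtraction.
entity less_than_sub is
  port (
    a_in   : in  std_logic_vector(6 downto 0);
    b_in   : in  std_logic_vector(6 downto 0);
    lt_out : out std_logic);               -- '1' when a_in < b_in (unsigned)
end less_than_sub;

architecture rtl of less_than_sub is
  signal diff : unsigned(7 downto 0);      -- one extra bit for the borrow
begin  -- rtl

  -- Extend both operands to 8 bits and subtract; the carry chain does the work.
  diff <= resize(unsigned(a_in), 8) - resize(unsigned(b_in), 8);

  -- Bit 7 is the borrow out of the 7-bit subtraction: set exactly when a < b.
  lt_out <= diff(7);

end rtl;
```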
An example 0.55 um standard-cell CMOS implementation
[Table: circuit costs. Subscripts: a = area-optimized, d = delay-optimized]
Asymptotic cost:
  Nand: area is O(n) and time O(1)
  ">": area is O(n) and time O(n)
Background: Big-O notation for algorithmic complexity
A way to approximate how the cost increases with the number of inputs n
Function f(n) belongs to class O(g(n)) if n0 and c can be found to satisfy:
    f(n) < c*g(n) for any n, n > n0
g(n) is a simple function: 1, log2(n), n, n*log2(n), n^2, n^3, 2^n
Following are O(n^2): [formulas missing from the transcript]
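A worked instance of the definition may help. The function below is an illustrative choice and not one of the slide's examples: take f(n) = 3n^2 + 5n and g(n) = n^2.

```latex
\begin{align*}
f(n) = 3n^2 + 5n &< 3n^2 + n^2 = 4n^2
  && \text{whenever } 5n < n^2,\ \text{i.e. } n > 5 \\
\Rightarrow\quad f(n) &< c\,g(n) \ \text{ for all } n > n_0
  && \text{with } c = 4,\ n_0 = 5
\end{align*}
```

Hence f(n) is O(n^2): the constant factor 3 and the lower-order term 5n do not affect the class.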
Interpretation of Big-O
Filter out the "interference": constants and less important terms
An algorithm with O(2^n) is intractable, but already O(n^3) is very bad
  Not realistic for a larger n
  Frequently, tractable algorithms for a sub-optimal solution exist
One may develop a heuristic algorithm
  They do not guarantee an optimal solution, but usually provide a rather good one with acceptable cost
  Often utilize pseudo-random choices
  For example, simulated annealing and genetic algorithms
[Figure: example; one case is marked intractable]
Specific to FPGA
A lot of registers: use them
  Aggressive pipelining
  Objective is to hide the routing delays as much as possible
  Simple logic stages between registers
Adders
  Generally, it's not beneficial to share adders
  FPGAs often contain (e.g. Altera) special structures for adders
  Sharing of an adder may cost as much as the adder itself
Hard macros
  Use whenever appropriate
  Higher performance than by building one with the FPGA native resources
  "They are there anyway"
  Embedded multipliers and small SRAMs are common
FPGA #2
Get to know the properties of the device
E.g. FPGA on-chip memories are typically multiples of 9 bits wide
  The ninth bit can be used for
    Control
    Parity bit
  Otherwise, it is wasted!
Memories are typically dual-ported; take advantage of this
Conclusions
Finite state machines can be coded in a variety of ways
  Prefer simplicity, according to in-house coding rules
Coding style has a profound effect on the quality of the hardware
  Area, max clock frequency
  Loops
  Complex assignment logic creates a sea of multiplexers
    E.g. variable-amount left-right shifter
Synthesis tools create different but functionally equivalent netlists even for small designs
Know your FPGA!
  You might save area and time by using some hard-coded macros
  However, these are tricks that you should only use in the final optimization phase
References
1. James Ball, Designing Soft-Core Processors for FPGAs. In the book "Processor Design. System-on-chip computing for ASICs and FPGAs", ed. Jari Nurmi.