Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical...
![Page 1: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/1.jpg)
Reconfigurable Computing Finite State Machines
School of Electrical and Information Engineering
http://www.ee.usyd.edu.au/~phwl
Philip Leong (philip.leong@sydney.edu.au)
Tortoise: You don't need to give an infinite number of monkeys an infinite amount of time to write Hamlet. A finite number of monkeys and a finite amount of time will do just fine. And like my father used to say, never use an infinite number of monkeys when a finite number will do.
- Mike Schiraldi, in a sci.math posting
![Page 2: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/2.jpg)
Lecture Recordings
http://www.ee.usyd.edu.au/people/philip.leong/hit-rc14.html Slides often change just prior to lecture!
Video recording of lecture
![Page 3: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/3.jpg)
Introduction
› Most practical digital designs are built from datapath + control
- control is implemented as a finite state machine (FSM)
› This lecture describes how to implement finite state machines
- how to construct an FSM
- how to optimize for time and area
![Page 4: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/4.jpg)
Simple Example: A memory controller
![Page 5: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/5.jpg)
State Transition Table
| Current State | Input Condition | Next State | Output |
|---|---|---|---|
| Idle | ready | Decision | OE=0, WE=0 |
| Decision | ~rw | Write | OE=0, WE=0 |
| Decision | rw | Read | OE=0, WE=0 |
| Write | ready | Idle | OE=0, WE=1 |
| Read | ready | Idle | OE=1, WE=0 |

• Note that the output is only a function of the state (Moore machine)
• If the input condition is not met, the state does not change

Block diagram: inputs → Next State Logic → next_state → State Registers → current_state → Output Logic → outputs
![Page 6: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/6.jpg)
State Assignment
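| Current State | (s1,s0) | Input Condition | Next State | (n1,n0) | Output |
|---|---|---|---|---|---|
| Idle | 00 | ready | Decision | 01 | OE=0, WE=0 |
| Decision | 01 | ~rw | Write | 10 | OE=0, WE=0 |
| Decision | 01 | rw | Read | 11 | OE=0, WE=0 |
| Write | 10 | ready | Idle | 00 | OE=0, WE=1 |
| Read | 11 | ready | Idle | 00 | OE=1, WE=0 |

• n1 = ~s1.s0 + s1.~ready
• n0 = ~s1.~s0.ready + ~s1.s0.rw + s1.s0.~ready
• OE = s1.s0
• WE = s1.~s0
• The ~ready terms hold the Write and Read states until ready is asserted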
![Page 7: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/7.jpg)
Implementation
• n1 = ~s1.s0 + s1.~ready
• n0 = ~s1.~s0.ready + ~s1.s0.rw + s1.s0.~ready
• OE = s1.s0
• WE = s1.~s0
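The memory controller described by the state transition table could be written behaviourally as in the following sketch. The module name `mem_ctrl` and the synchronous reset `rst` are assumptions for illustration; they do not appear on the slides. The state encoding follows the state assignment (Idle=00, Decision=01, Write=10, Read=11).

```verilog
// Moore FSM sketch of the memory controller (outputs decoded from state bits).
module mem_ctrl (
    input  wire clk,
    input  wire rst,    // synchronous reset: an assumption, not on the slides
    input  wire ready,
    input  wire rw,
    output wire oe,
    output wire we
);
    localparam [1:0] IDLE     = 2'b00,
                     DECISION = 2'b01,
                     WRITE    = 2'b10,
                     READ     = 2'b11;

    reg [1:0] state, next_state;

    // Next-state logic: the state is held when the input condition is not met.
    always @(*) begin
        next_state = state;
        case (state)
            IDLE:     if (ready) next_state = DECISION;
            DECISION: next_state = rw ? READ : WRITE;
            WRITE:    if (ready) next_state = IDLE;
            READ:     if (ready) next_state = IDLE;
            default:  next_state = IDLE;
        endcase
    end

    // State registers.
    always @(posedge clk) begin
        if (rst) state <= IDLE;
        else     state <= next_state;
    end

    // Output logic: a function of the current state only (Moore machine).
    assign oe = (state == READ);
    assign we = (state == WRITE);
endmodule
```

Writing the FSM as a `case` statement leaves the choice of gate-level equations to the synthesis tool, which is usually the easiest description to read and debug.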
![Page 8: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/8.jpg)
Speed
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state → Output Logic → outputs
› Clock period constraint due to setup time
- tclk ≥ tlogic + tsu + tcq
- where tlogic is the next state logic delay
- tsu is the register setup time (time before the clock edge that data must be stable)
- tcq is the clock-to-output delay (time after the clock edge during which q might be unstable)
› Constraint due to hold time
- tH ≤ tlogic + tcq ≤ tclk
- This is most extreme in the case of a shift register, where tlogic is close to zero
Timing diagram labels: tcq, tco, tSU
![Page 9: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/9.jpg)
Optimization for speed/area
› Often there is no need
- FPGA usually fast enough, so almost any description is acceptable
- in this case use the most straightforward implementation (easiest to write/read/debug)
- otherwise giving timing constraints often does the job
› Optimization
- FSM dependent: the most efficient FSM depends on the number of states, complexity of logic etc.
- architecture dependent: different for FPGA/CPLD/ASIC
- speed: maximum frequency? smallest clock-to-output delay?
- these are interrelated, but we can optimize one in favor of another
![Page 10: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/10.jpg)
Other FSM architectures
› We will look at 4 different implementations
- outputs decoded from state bits combinatorially
- outputs decoded in parallel output registers
- outputs encoded within state bits
- one hot encoding
![Page 11: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/11.jpg)
Outputs decoded from state bits combinatorially
› This is the technique described earlier
› Output logic is a combinatorial function of the state
- suffers from an additional delay from the output of the state bits to the output signals
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state → Output Logic → outputs
![Page 12: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/12.jpg)
Glitching
› Consider a state change from 1111 -> 0000
- The 4 bits could change at different speeds, e.g. 1111 -> 1011 -> 0011 -> 0000
- Output logic could produce a number of spurious values during the current_state transient
- Wastes power and could affect correct operation
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state → Output Logic → outputs
![Page 13: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/13.jpg)
Output decoded in parallel output registers
› Decode the outputs before the state bits are registered, i.e. at the output of the nextstate logic
› Outputs are available earlier in the clock cycle than in the previous version, because the combinatorial output logic delay after the state registers is removed
› Area: larger, because we have additional registers for the outputs
› Speed: there are two combinatorial delays in series (next state logic, then output logic) before the output registers, so maximum speed may be affected
› Avoids glitching of outputs
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state; next_state → Output Logic → Output Registers → outputs
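As a sketch of outputs decoded in parallel output registers, the memory controller from the earlier slides can register OE and WE from `next_state` rather than decode them from `state`. The module name `mem_ctrl_regout` and the synchronous reset `rst` are assumptions for illustration.

```verilog
// Memory controller with registered outputs decoded from next_state.
module mem_ctrl_regout (
    input  wire clk,
    input  wire rst,    // synchronous reset: an assumption, not on the slides
    input  wire ready,
    input  wire rw,
    output reg  oe,
    output reg  we
);
    localparam [1:0] IDLE     = 2'b00,
                     DECISION = 2'b01,
                     WRITE    = 2'b10,
                     READ     = 2'b11;

    reg [1:0] state, next_state;

    // Next-state logic, identical to the combinatorially decoded version.
    always @(*) begin
        next_state = state;
        case (state)
            IDLE:     if (ready) next_state = DECISION;
            DECISION: next_state = rw ? READ : WRITE;
            WRITE:    if (ready) next_state = IDLE;
            READ:     if (ready) next_state = IDLE;
            default:  next_state = IDLE;
        endcase
    end

    // State and output registers are loaded on the same clock edge,
    // so the outputs come straight from flip-flops and cannot glitch.
    always @(posedge clk) begin
        if (rst) begin
            state <= IDLE;
            oe    <= 1'b0;
            we    <= 1'b0;
        end else begin
            state <= next_state;
            oe    <= (next_state == READ);
            we    <= (next_state == WRITE);
        end
    end
endmodule
```

The outputs change in the same cycle as in the combinatorially decoded version; the cost is two extra flip-flops and output decoding placed in series with the next-state logic.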
![Page 14: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/14.jpg)
Outputs encoded within state bits
› An example is a counter
- the output is also the state
› Find a state encoding which directly generates the outputs
- design is more difficult, since we need to find such an encoding
- code is more complicated and difficult to read
- clashes can be resolved by adding extra bits to differentiate states with the same outputs
- e.g. for the memory controller, add ST0 to differentiate the IDLE and DECISION states
› Efficiency depends on the particular circuit
- may be more or less efficient in area/speed; the best way to find out is to try it for a critical design
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state = outputs
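One possible encoding for the memory controller, sketched under the assumption that the state register is ordered {OE, WE, ST0}: IDLE=000, DECISION=001, WRITE=010, READ=100. The module name `mem_ctrl_enc`, the reset `rst` and the particular code values are illustrative choices, not taken from the slides.

```verilog
// Memory controller with the outputs encoded directly in the state bits.
module mem_ctrl_enc (
    input  wire clk,
    input  wire rst,    // synchronous reset: an assumption, not on the slides
    input  wire ready,
    input  wire rw,
    output wire oe,
    output wire we
);
    // State bits are {OE, WE, ST0}; ST0 only separates IDLE from DECISION,
    // which would otherwise share the code OE=0, WE=0.
    localparam [2:0] IDLE     = 3'b000,
                     DECISION = 3'b001,
                     WRITE    = 3'b010,
                     READ     = 3'b100;

    reg [2:0] state;

    always @(posedge clk) begin
        if (rst) state <= IDLE;
        else case (state)
            IDLE:     if (ready) state <= DECISION;
            DECISION: state <= rw ? READ : WRITE;
            WRITE:    if (ready) state <= IDLE;
            READ:     if (ready) state <= IDLE;
            default:  state <= IDLE;
        endcase
    end

    // No output logic: the outputs are state register bits.
    assign oe = state[2];
    assign we = state[1];
endmodule
```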
![Page 15: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/15.jpg)
Gray code
› Gray code is arranged so that only one bit changes between successive codes (no glitching)
› Can code FSMs so that the state transitions follow Gray codes
› Conversion from binary is simple: g[i] = b[i] xor b[i+1] (with b[4] = 0 for a 4-bit code)
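The binary-to-Gray conversion above fits in one line of Verilog. This is a minimal sketch; the module name `bin2gray` and the width parameter `N` are assumptions for illustration.

```verilog
// Binary to Gray code conversion: g[i] = b[i] ^ b[i+1], with the bit above the MSB taken as 0.
module bin2gray #(parameter N = 4) (
    input  wire [N-1:0] b,
    output wire [N-1:0] g
);
    // Shifting right by one lines up b[i+1] with b[i] and shifts a 0 into the MSB.
    assign g = b ^ (b >> 1);
endmodule
```

For example, binary 0111 maps to Gray 0100 and binary 1000 maps to 1100, so the step from 7 to 8 changes a single bit instead of four.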
![Page 16: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/16.jpg)
One hot encoding
› Use n flip-flops to represent an n-state FSM
› Significantly reduces the logic for outputs and nextstate
- why?
- also reduces logic to generate outputs
- at the expense of more registers
› FPGAs
- rich in registers
- for max speed and small to medium FSMs often the best choice
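A one-hot version of the memory controller illustrates why the logic shrinks: each state flip-flop needs only the few terms that enter or hold that state, and each output is a single register bit. The module name `mem_ctrl_onehot`, the reset `rst` and the bit positions are assumptions for illustration.

```verilog
// One-hot memory controller: one flip-flop per state.
module mem_ctrl_onehot (
    input  wire clk,
    input  wire rst,    // synchronous reset: an assumption, not on the slides
    input  wire ready,
    input  wire rw,
    output wire oe,
    output wire we
);
    // Bit positions of each state in the one-hot register.
    localparam IDLE = 0, DECISION = 1, WRITE = 2, READ = 3;

    reg [3:0] state;

    always @(posedge clk) begin
        if (rst) begin
            state <= 4'b0001;   // only the IDLE bit set
        end else begin
            // Each bit is set by the transitions entering the state or holding it.
            state[IDLE]     <= (state[IDLE] & ~ready) | ((state[WRITE] | state[READ]) & ready);
            state[DECISION] <= state[IDLE] & ready;
            state[WRITE]    <= (state[DECISION] & ~rw) | (state[WRITE] & ~ready);
            state[READ]     <= (state[DECISION] &  rw) | (state[READ]  & ~ready);
        end
    end

    // Outputs need no decoding at all.
    assign oe = state[READ];
    assign we = state[WRITE];
endmodule
```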
![Page 17: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/17.jpg)
Mealy machines
› All FSMs we have discussed are Moore machines, since outputs are functions only of the current state
- for Mealy FSMs, outputs are functions of the current state and the inputs
- output can usually change 1 cycle earlier than Moore
- can have fewer states than Moore
Block diagram: inputs → Next State Logic → next_state → State Registers → current_state → Output Logic → outputs (inputs also feed the Output Logic)
![Page 18: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/18.jpg)
Fault tolerance
› What happens if, for whatever reason, the FSM gets into an undefined state?
- fault tolerant designs can recover
- alternatively could enter an error state that reports the fault
- or treat undefined states as explicit "don't care"
![Page 19: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/19.jpg)
Fault tolerance for one-hot designs
› There are many more possible illegal states for 1-hot designs
- could detect with logic
- this is quite expensive, affects performance and area
- could be pipelined
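Detecting an illegal one-hot state amounts to checking that exactly one bit is set. The sketch below uses the common "clear the lowest set bit" idiom; the module name `onehot_check` and the parameter `N` are assumptions for illustration, and the slides do not prescribe a particular detector.

```verilog
// Flags whether an N-bit state vector is a legal one-hot code.
module onehot_check #(parameter N = 4) (
    input  wire [N-1:0] state,
    output wire         valid
);
    // state & (state - 1) clears the lowest set bit; the result is zero
    // only if at most one bit was set. The all-zero vector is excluded separately.
    assign valid = (state != {N{1'b0}}) &&
                   ((state & (state - 1'b1)) == {N{1'b0}});
endmodule
```

For wide state vectors this check is itself a sizeable piece of logic, which is why the slide suggests it could be pipelined.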
![Page 20: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/20.jpg)
BlockRAM Applications, State Machine
Store the next address in the ROM
Conditional jump information is entered as additional address inputs
One RAM can be split in two: e.g. one 18K dual-port BlockRAM = two 9K BlockRAMs with independent everything (R/W, clock, address, data content, aspect ratio)
High speed, independent of complexity
Slide courtesy of Peter Alfke
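The same idea can be sketched at small scale with the memory controller from the earlier slides: a ROM holds the next state, and the inputs `ready` and `rw` act as the additional address bits. The module name `mem_ctrl_rom`, the reset `rst` and the 16 x 2 ROM size are assumptions for illustration; a table this small would map to LUTs rather than a BlockRAM, and whether a given tool infers a BlockRAM from such code is tool dependent.

```verilog
// ROM-based FSM: address = {state, ready, rw}, data = next state.
module mem_ctrl_rom (
    input  wire clk,
    input  wire rst,    // synchronous reset: an assumption, not on the slides
    input  wire ready,
    input  wire rw,
    output wire oe,
    output wire we
);
    reg [1:0] rom [0:15];
    reg [1:0] state;
    integer i;

    initial begin
        // Default entry: hold the current state (the upper two address bits).
        for (i = 0; i < 16; i = i + 1) rom[i] = i >> 2;
        rom[4'b00_1_0] = 2'b01; rom[4'b00_1_1] = 2'b01;  // Idle, ready   -> Decision
        rom[4'b01_0_0] = 2'b10; rom[4'b01_1_0] = 2'b10;  // Decision, ~rw -> Write
        rom[4'b01_0_1] = 2'b11; rom[4'b01_1_1] = 2'b11;  // Decision, rw  -> Read
        rom[4'b10_1_0] = 2'b00; rom[4'b10_1_1] = 2'b00;  // Write, ready  -> Idle
        rom[4'b11_1_0] = 2'b00; rom[4'b11_1_1] = 2'b00;  // Read, ready   -> Idle
    end

    // The only logic is the ROM lookup, so speed does not depend on FSM complexity.
    always @(posedge clk) begin
        if (rst) state <= 2'b00;
        else     state <= rom[{state, ready, rw}];
    end

    assign oe = (state == 2'b11);
    assign we = (state == 2'b10);
endmodule
```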
![Page 21: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/21.jpg)
Fast State Machine in one BlockRAM
256 states, 4-way branch (or 128 states, 8-way)
36 optional parallel outputs from the second port
High speed operation, independent of code
Figure: BlockROM with two ports. First port, 1K x 9: address formed from 8 bits plus 2 bits of branch control, 8 + 1 bits of data out. Second port, 256 x 36: 8-bit address, 36-bit output.
Slide courtesy of Peter Alfke
![Page 22: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/22.jpg)
Metastability
› Recall all registers have setup (tSU) and hold (tH) times. Violating these can lead to metastability in which the specified clock-to-output delay (tCO) is violated.
Source: Altera
![Page 23: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/23.jpg)
Mechanism
› The mechanism is analogous to a ball dropped on a hill. If setup and hold times are met, the ball is dropped near a stable state. If they are violated, it can land near the top of the hill, the metastable point.
› Due to noise, it is eventually resolved but this takes an indeterminate amount of time.
Source: Altera
![Page 24: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/24.jpg)
Modeling Metastability
› The mean time between failures (MTBF) can be modeled as
$$\mathrm{MTBF} = \frac{e^{t_{MET}/C_2}}{C_1 \, f_{CLK} \, f_{DATA}}$$
- where C1 and C2 are process constants
- fCLK is the clock frequency of the receiving register and fDATA is the toggle rate of the asynchronous input
- tMET is the timing slack available beyond the register's tCO for a potentially metastable signal to resolve to a known value
- the exponent tMET/C2 has the largest effect, so increases in transistor speed (which reduce C2) improve metastability MTBF
![Page 25: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/25.jpg)
Synchronization Chains
› Metastability can be virtually eliminated (with an increase in latency) via synchronization chains. tMET is the sum of the output timing slacks for each register in the chain
- e.g. if C2 is 50 ps, increasing tMET by 1 ns improves MTBF by a factor of e^20 ≈ 5×10^8
› Signals on different clock domains can be handled with a dual-clock FIFO, which contains synchronizers
› Never connect an asynchronous input to multiple flip-flops, as they can resolve at different times and to different values
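A synchronization chain is simply a short shift register clocked in the receiving domain, with the asynchronous input feeding a single flip-flop. The following is a minimal sketch; the module name `sync_chain` and the parameter `STAGES` (which must be at least 2) are assumptions for illustration.

```verilog
// Synchronization chain for one asynchronous input bit.
module sync_chain #(parameter STAGES = 2) (
    input  wire clk,
    input  wire async_in,   // asynchronous input, fed to exactly one flip-flop
    output wire sync_out
);
    reg [STAGES-1:0] chain;

    // Each extra stage gives a potentially metastable value one more
    // clock period to resolve, at the cost of one cycle of latency.
    always @(posedge clk)
        chain <= {chain[STAGES-2:0], async_in};

    assign sync_out = chain[STAGES-1];
endmodule
```

Downstream logic should use only `sync_out`; fanning `async_in` out to other registers would reintroduce the problem described above.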
![Page 26: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/26.jpg)
Clock skew
› The clock signal can arrive at different registers at different times; this is called clock skew
› The clock period is subject to variation; this is called clock jitter
› While quite well controlled, skew and jitter reduce maximum clock frequency
- tclk ≥ tlogic + tsu + tcq + tskew + tjitter
![Page 27: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/27.jpg)
Conclusions
› Studied many different ways to implement FSMs
› Mostly we use
- outputs decoded combinatorially from state registers
- one-hot (particularly good for high speed small-medium FSMs on FPGAs)
- RAM/ROM
› Consider metastability when dealing with asynchronous inputs
![Page 28: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/28.jpg)
References
› An excellent tutorial on HDL, particularly the “Technology Independent Coding Style” chapter http://www.actel.com/documents/hdlcode_ug.pdf
› Altera White Paper, Understanding Metastability in FPGAs http://www.altera.com/literature/wp/wp-01082-quartus-ii-metastability.pdf
![Page 29: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/29.jpg)
Reconfigurable Computing Shift Registers and Random Number Generators
School of Electrical and Information Engineering
http://www.ee.usyd.edu.au/~phwl
Philip Leong (philip.leong@sydney.edu.au)
![Page 30: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/30.jpg)
Introduction
› Efficient FPGA design is often nonobvious
- with practice, you can learn when to use the special features
› This lecture
- Shift registers
- Linear feedback shift register (LFSR)
- Using RAM for shift registers
- True random number generator
![Page 31: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/31.jpg)
Shift register
› Basic building block in digital circuit design
› Features
- high speed due to simple logic & interconnection
- low interconnection costs
![Page 32: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/32.jpg)
Shift registers
› Applications
- delay line
- serial-parallel conversion
- parallel-serial conversion
- boundary scan
- serial buses
- bit serial signal processing (more about this in future lectures)
![Page 33: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/33.jpg)
Straightforward Implementation
    module shift_1x64 (clk, shift, sr_in, sr_out);
      input clk, shift;
      input sr_in;
      output sr_out;
      reg [63:0] sr;

      // Shift one position towards the MSB whenever shift is asserted.
      always @(posedge clk)
      begin
        if (shift == 1'b1)
        begin
          sr[63:1] <= sr[62:0];
          sr[0] <= sr_in;
        end
      end

      assign sr_out = sr[63];
    endmodule

How many LEs for an N bit SR?
![Page 34: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/34.jpg)
Straightforward Implementation
VHDL

    if (clk'event and clk = '1') then
      sr <= srin & sr(sr'high downto (sr'low+1));
    end if;

How many LEs for an N bit SR?

VERILOG

    module shift_1x64 (clk, shift, sr_in, sr_out);
      input clk, shift;
      input sr_in;
      output sr_out;
      reg [63:0] sr;

      // Shift one position towards the MSB whenever shift is asserted.
      always @(posedge clk)
      begin
        if (shift == 1'b1)
        begin
          sr[63:1] <= sr[62:0];
          sr[0] <= sr_in;
        end
      end

      assign sr_out = sr[63];
    endmodule
![Page 35: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/35.jpg)
RAM
› Remember
- a Xilinx LUT used as a memory has better storage density than flip-flops
- Shift registers only need serial access to the memories
- i.e. do not need all of the outputs in parallel, only need the first input and the last output
![Page 36: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/36.jpg)
Shift register using RAM (4-LUT example)
![Page 37: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/37.jpg)
Address generation: counters
› Ripple carry counter
› Synchronous counter
› Gray code counter
› LFSR
› What are the advantages and disadvantages of each?
![Page 38: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/38.jpg)
Counters
![Page 39: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/39.jpg)
Gray code
![Page 40: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/40.jpg)
LFSR
› Linear feedback shift register
- the XNOR of certain outputs is fed back to the input
- it is maximum-length if it has a period of 2^N-1
- let's look at a length 4 maximum-length LFSR
![Page 41: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/41.jpg)
4 bit Maximal Length LFSR
![Page 42: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/42.jpg)
4 bit LFSR
› How does it count?
![Page 43: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/43.jpg)
4 bit LFSR
› 0, 1, 3, 7, E, D, B, 6, C, 9, 2, 5, A, 4, 8, 0
› period is 2^4-1 = 15
› can be used as pseudorandom number generators
› note it only has a 2 input XOR gate as the critical path! - Compare with an adder’s carry chain
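The count sequence above is reproduced by a 4-bit shift register whose new LSB is the XNOR of taps 4 and 3 (bits 3 and 2 when numbered from 0). This is a sketch; the module name `lfsr4` and the synchronous reset `rst` are assumptions for illustration.

```verilog
// 4-bit maximal-length LFSR with XNOR feedback from taps (4,3).
module lfsr4 (
    input  wire       clk,
    input  wire       rst,   // synchronous reset to 0: an assumption
    output reg  [3:0] q
);
    always @(posedge clk) begin
        if (rst) q <= 4'h0;
        else     q <= {q[2:0], ~(q[3] ^ q[2])};   // shift left, XNOR into the LSB
    end
endmodule
```

Starting from 0 it steps through 0, 1, 3, 7, E, D, B, 6, C, 9, 2, 5, A, 4, 8 and back to 0. With XNOR feedback the all-ones state F is the one excluded value: it maps to itself, so the register must never be initialised to F.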
![Page 44: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/44.jpg)
Larger maximal length LFSRs (see XAPP052 for more)
› N=4 (4,3)
› N=8 (8,6,5,4)
› N=16 (16,15,13,4)
› N=32 (32,22,2,1)
› N=64 (64,63,61,60)
› N=128 (128,126,101,99)
› N=168 (168,166,153,151)
![Page 45: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/45.jpg)
Primitive polynomials
› A polynomial P of degree m over a finite field F_p is primitive iff P is monic, P(0)≠0 and ord(P(x)) = p^m-1
- Monic means the coefficient of the term with the highest power is 1
- If P(0)≠0, ord(P) is the smallest integer e for which P(x) divides x^e-1
- E.g. P(x) = x^4+x^3+1 is primitive
› The feedback values for an XOR based maximum length LFSR correspond to the primitive polynomial
› Programs which compute primitive polynomials
- Web based but only supports small n
http://wims.unice.fr/wims/wims.cgi?session=VX756ADCFA.2&+lang=en&+module=tool%2Falgebra%2Fprimpoly.en
![Page 46: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/46.jpg)
![Page 47: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/47.jpg)
LFSR counters
› Maximal LFSR period is 2^N-1
- sometimes we want the period < 2^N-1
- A counter with an arbitrary shorter period can be made by adding a synchronous reset circuit
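As a sketch of the synchronous reset approach, the 4-bit XNOR LFSR with taps (4,3) can be cut to a period of 10 by resetting when it reaches the tenth state of its sequence (0, 1, 3, 7, E, D, B, 6, C, 9). The module name `lfsr4_mod10`, the `last` output and the choice of period are assumptions for illustration.

```verilog
// Period-10 counter built from a 4-bit LFSR plus a synchronous reset.
module lfsr4_mod10 (
    input  wire       clk,
    input  wire       rst,    // synchronous reset: an assumption
    output reg  [3:0] q,
    output wire       last    // high during the final state of the period
);
    // 9 is the tenth value in the LFSR sequence starting from 0.
    assign last = (q == 4'h9);

    always @(posedge clk) begin
        if (rst || last) q <= 4'h0;
        else             q <= {q[2:0], ~(q[3] ^ q[2])};
    end
endmodule
```

The comparator sits alongside the XNOR rather than in a carry chain, so the counter keeps most of the LFSR's speed advantage; the states are visited in LFSR order, not numerical order.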
![Page 48: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/48.jpg)
Virtex SRL
› Virtex has a macro called “Shift Register Lookup Table” (SRL) for implementing SRs
![Page 49: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/49.jpg)
LUT as a mux
![Page 50: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/50.jpg)
LUT as an addressable SR
• Length of SR can be controlled using the D output
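The behaviour of a LUT used as an addressable shift register can be modelled as a 16-bit shift register with a multiplexer on its taps. This is a behavioural sketch only; the module name `srl16_model` and its port names are assumptions, and whether a synthesis tool maps such code onto an SRL primitive depends on the tool and device.

```verilog
// Behavioural model of a 16-bit addressable shift register.
module srl16_model (
    input  wire       clk,
    input  wire       ce,    // shift enable
    input  wire       d,     // serial input
    input  wire [3:0] a,     // tap select: the delay is a + 1 clock cycles
    output wire       q
);
    reg [15:0] sr;

    // No reset: the storage models LUT memory cells, not flip-flops.
    always @(posedge clk)
        if (ce) sr <= {sr[14:0], d};

    // The address picks which tap is visible, which sets the effective length.
    assign q = sr[a];
endmodule
```

With `a` = 3, for example, a bit presented on `d` appears on `q` four clock edges later.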
![Page 51: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/51.jpg)
Virtex 5
› Virtex 5-7 series slice has 4x 6-input LUTs, each LUT can be a 64x1 or 32x2 RAM
› SRL16x2 and SRL32 primitives are available per LUT (in SLICEM)
![Page 52: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/52.jpg)
SRLC32E
![Page 53: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/53.jpg)
True Random Number Generator
› Introduction
› Ring Oscillator based noise source
› Alternating Sequence Generator
› Results
› Conclusion
![Page 54: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/54.jpg)
Introduction
› Random number generators (RNGs) are in widespread use; the main applications are in
- Simulation
- Cryptography
› Propose random number generators for these applications
- Physical RNG (PRNG) based on oscillator phase noise
![Page 55: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/55.jpg)
What is a random number generator?
› A cryptographically secure random bit generator (CSRBG) is one which produces sequences for which there is no polynomial time algorithm which, on input of the first l bits of an output sequence s, can predict the (l+1)st bit of s with a probability significantly greater than 0.5
![Page 56: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/56.jpg)
Applications
› Designed for embedded FPGA applications
- Small
- Fast
› For cryptographic applications
- Secure
![Page 57: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/57.jpg)
Physical random number generators
› Oscillator sampling
› Direct amplification of transistor/resistor noise
› Discrete time chaos
![Page 58: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/58.jpg)
Oscillator phase noise source
• One method to produce a real random number source in an FPGA is to sample a high frequency signal with a low quality low frequency clock
• If the phase noise of the low frequency oscillator is of the same order as the period of the high frequency clock, output is quite random
![Page 59: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/59.jpg)
Ring Oscillator based noise source
![Page 60: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/60.jpg)
Increased Randomness via Clock Doubling
![Page 61: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/61.jpg)
Poker Test (ONS only)
› Checks that each m-bit string occurs the correct number of times
› Passed if the result is between 1.03 and 57.4
From Menezes, “Handbook of Applied Cryptography”
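A minimal sketch of the poker test statistic as it is usually stated in the Handbook of Applied Cryptography: split the sequence into k non-overlapping m-bit blocks, count how often each of the 2^m values occurs, and compute X = (2^m / k) * sum(n_i^2) - k. The function names are hypothetical, and the default m = 4 is an assumption; the 1.03 to 57.4 window quoted above is commonly associated with m = 4 and 20000-bit samples.

```python
def poker_statistic(bits, m=4):
    """Return X = (2^m / k) * sum(n_i^2) - k over non-overlapping m-bit blocks."""
    k = len(bits) // m                    # number of complete m-bit blocks
    counts = [0] * (1 << m)               # n_i: occurrences of each m-bit value
    for j in range(k):
        value = 0
        for b in bits[j * m:(j + 1) * m]:
            value = (value << 1) | b      # pack the block into an integer
        counts[value] += 1
    return (1 << m) / k * sum(c * c for c in counts) - k

def poker_passes(bits, m=4, low=1.03, high=57.4):
    """Pass only if the statistic lies strictly inside the (low, high) window."""
    return low < poker_statistic(bits, m) < high
```

Note that the window is two-sided: a heavily biased sequence gives a very large statistic, while a sequence in which every block value occurs exactly equally often gives a statistic near zero, and both are rejected.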
![Page 62: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/62.jpg)
Poker Test (ONS only)
![Page 63: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/63.jpg)
PRNG
› At this point, we have a mediocre physical RNG that doesn’t pass the Poker test
› Can turn it into a good RNG by reducing the sampling clock rate and passing the output through a parity filter to deskew the non-uniform distribution (Tsoi, Leung and Leong, FCCM03), but this is slow
› How can we make a fast one?
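The parity-filter idea mentioned above can be sketched as follows. This is the generic technique (XOR each group of n raw bits into one output bit), not necessarily the exact filter of the cited paper; the function names are hypothetical. If the raw bits are independent with P(1) = 0.5 + e, the output bias shrinks to |2e|^n / 2, at the cost of an n-fold drop in output rate, which is why the approach is slow.

```python
from functools import reduce
from operator import xor

def parity_filter(bits, n):
    """XOR each non-overlapping group of n input bits into one output bit.

    Trailing bits that do not fill a complete group are dropped.
    """
    usable = len(bits) - len(bits) % n
    return [reduce(xor, bits[i:i + n]) for i in range(0, usable, n)]

def parity_bias(e, n):
    """Bias magnitude of the XOR of n independent bits that each have bias e."""
    return 0.5 * abs(2 * e) ** n

filtered = parity_filter([1, 1, 0, 1, 0, 0, 1], 3)   # groups (1,1,0) and (1,0,0)
```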
![Page 64: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/64.jpg)
Alternating Step Generator
• Selected among many stream ciphers because of its compact implementation
• Recent research has shown weaknesses in this cipher (but any other can be used)
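A minimal software sketch of the classic alternating step generator structure: a control LFSR decides which of two other LFSRs is clocked on each step, and the output is the XOR of their most recent outputs. The tiny 3-, 4- and 5-bit registers, their tap choices, and the seeds are assumptions chosen only to keep the example readable; they are not the registers used in the slides and offer no security.

```python
class LFSR:
    """Fibonacci LFSR. taps are 1-based stage numbers XORed to form the feedback."""

    def __init__(self, taps, state):
        self.taps = list(taps)
        self.state = list(state)          # state[0] is stage 1; must not be all zeros

    def step(self):
        out = self.state[-1]              # output is taken from the last stage
        fb = 0
        for t in self.taps:
            fb ^= self.state[t - 1]
        self.state = [fb] + self.state[:-1]
        return out

def asg_bits(control, r1, r2, n):
    """Generate n bits: the control bit selects which register is clocked."""
    out1 = out2 = 0                       # assumption: both outputs start at 0
    result = []
    for _ in range(n):
        if control.step():
            out1 = r1.step()              # r2 holds its previous output
        else:
            out2 = r2.step()              # r1 holds its previous output
        result.append(out1 ^ out2)
    return result

stream = asg_bits(LFSR([3, 2], [1, 0, 0]),
                  LFSR([4, 3], [1, 0, 0, 0]),
                  LFSR([5, 3], [1, 0, 0, 0, 0]),
                  64)
```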
![Page 65: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/65.jpg)
Modified Alternating Step Generator
![Page 66: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/66.jpg)
LFSRs
› 127 and 129 bit LFSRs based on primitive polynomials with approx equal 1 and 0 coefficients
- LFSR1(x) = x^127 + x^102 + x^61 + x^59 + x^58 + x^57 + x^56 + x^55 + x^54 + x^53 + x^52 + x^51 + x^50 + x^49 + x^48 + x^47 + x^46 + x^45 + x^44 + x^43 + x^42 + x^41 + x^40 + x^39 + x^38 + x^37 + x^36 + x^35 + x^34 + x^33 + x^32 + x^31 + x^30 + x^29 + x^28 + x^27 + x^26 + x^25 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^16 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
- LFSR2(x) = x^129 + x^119 + x^62 + x^61 + x^60 + x^59 + x^58 + x^57 + x^56 + x^55 + x^54 + x^53 + x^52 + x^51 + x^50 + x^49 + x^48 + x^47 + x^46 + x^45 + x^44 + x^43 + x^42 + x^41 + x^40 + x^39 + x^38 + x^37 + x^36 + x^35 + x^34 + x^33 + x^32 + x^31 + x^30 + x^29 + x^28 + x^27 + x^26 + x^25 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^16 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
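As a rough illustration of how a feedback polynomial maps onto an LFSR, the sketch below builds a Fibonacci-style step function from a list of exponents. The exponent lists are transcribed from the two polynomials above; the helper names `make_lfsr` and `period` and the integer state layout are assumptions made for this example. The code does not check primitivity, so the maximal-period property of the 127- and 129-bit polynomials is taken from the slide, and the period check is only practical for small degrees.

```python
# Exponents of the non-zero terms, transcribed from the polynomials above.
LFSR1_EXPONENTS = [127, 102, 61] + list(range(59, -1, -1))
LFSR2_EXPONENTS = [129, 119] + list(range(62, -1, -1))

def make_lfsr(exponents):
    """Return (degree, step) for the recurrence a[k+d] = XOR of a[k+e], e < d."""
    degree = max(exponents)
    taps = [e for e in exponents if e < degree]

    def step(state):
        # Bit i of the integer state holds a[k+i]; bit 0 is the output.
        fb = 0
        for e in taps:
            fb ^= (state >> e) & 1
        out = state & 1
        state = (state >> 1) | (fb << (degree - 1))
        return state, out

    return degree, step

def period(exponents, start=1):
    """Count steps until the state returns to start (small degrees only)."""
    _, step = make_lfsr(exponents)
    state, count = start, 0
    while True:
        state, _ = step(state)
        count += 1
        if state == start:
            return count

small_period = period([4, 1, 0])          # x^4 + x + 1, a small primitive polynomial
weights = (len(LFSR1_EXPONENTS), len(LFSR2_EXPONENTS))
```

The `weights` tuple counts the non-zero coefficients of each polynomial, which can be compared against the 128 and 130 coefficient positions available to see the "approx equal" split mentioned above.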
![Page 67: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/67.jpg)
Implementation
› Results for Virtex XCV300E-8 (including computer interface)
- Period 7.482 ns, 129 slices (4%), 4 BRAM (12%)
- High frequency clock can be > 400 MHz (133 MHz used)
- Ring oscillator freq > 800 MHz
› Reported results are without the clock doubler, but the doubled version passes the tests as well
› Passes Crush, NIST and Diehard tests
![Page 68: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/68.jpg)
NIST and Diehard Tests
![Page 69: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/69.jpg)
TRNG Summary
› Combines a weak physical RNG with a high speed stream cipher to produce a high performance physical random number generator
› Clock doubler increases oscillator phase noise
› Small area, high output rate, good statistical properties
› ASG makes a very low cost implementation, other stream ciphers can be used
![Page 70: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/70.jpg)
Generalized LFSRs
› A variation useful for cryptographic applications
- Feedback taps are programmable (can be updated after synthesis)
- Polynomial has high weight, so it has many feedback taps, requiring a high fan-in XNOR gate
› Question: how do we design such a circuit?
![Page 71: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/71.jpg)
Block diagram
[Figure: a chain of D flip-flops with outputs Q1, Q2, Q3, …, Q15, driven by CLK, CE and DIN, feeding a programmable LUT that implements the required XNOR and produces F(Q1,Q2,…,Q15)]
![Page 72: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/72.jpg)
Implementing the LUT
[Figure: Q1 to Q16 feed four 4-input LUTs (LUT 1 to LUT 4) in groups of four; their outputs feed a second-level LUT that produces F(Q1,Q2,…,Q16)]
• SR used to program the LUTs to implement the required feedback arrangement WITHOUT CHANGING THE BITSTREAM
• Can be reduced from 2 levels of logic to 1 by pipelining
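The two-level LUT arrangement can be modelled in software to show why reprogramming the LUT contents is enough to change the feedback taps. In this sketch each 4-input LUT is a 16-entry truth table. The helper names, the 16-bit `tap_mask16` encoding, and the choice to apply the XNOR inversion in the second-level LUT are all assumptions for illustration, not details taken from the slides.

```python
def parity_table(mask4, invert=False):
    """16-entry truth table: XOR (or XNOR) of the inputs selected by a 4-bit mask."""
    return [(bin(addr & mask4).count("1") & 1) ^ int(invert) for addr in range(16)]

def lut(table, inputs):
    """Look up a 4-input LUT; inputs[i] drives address bit i."""
    addr = 0
    for i, b in enumerate(inputs):
        addr |= b << i
    return table[addr]

def program_feedback(tap_mask16):
    """Build LUT contents for a tap selection; bit i of the mask selects q[i]."""
    first = [parity_table((tap_mask16 >> (4 * j)) & 0xF) for j in range(4)]
    second = parity_table(0xF, invert=True)   # XNOR of the four partial parities
    return first, second

def feedback(tables, q):
    """Evaluate the two-level LUT tree on a list of 16 flip-flop outputs."""
    first, second = tables
    partial = [lut(first[j], q[4 * j:4 * j + 4]) for j in range(4)]
    return lut(second, partial)

tables = program_feedback(0x8421)             # hypothetical tap selection
q = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
bit = feedback(tables, q)
```

Changing the taps only changes the contents of the first-level tables, which mirrors the point above that the feedback arrangement can be altered without changing the bitstream.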
![Page 73: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/73.jpg)
Generalized LFSR Floorplan
• Serpentine arrangement reduces wire lengths
[Figure: 4 × 4 grid of LFSR cells, with the stage 1 pipeline marked]
![Page 74: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/74.jpg)
LFSR cell
[Figure: LFSR cell, with panels labelled "All cells", "Row 1 non-leftmost cell only" and "Row 1 leftmost cell only". The cell contains D flip-flops with SR input and SR output, inputs x1, x2, x3 (and x4), u1, and a function block f(x1...xn). Annotations: "From south connections", "To left hand top cell", "Leftmost cell has one input from each of the flip-flops on the first row"]
• RAMs used to customize LFSR polynomial (can be loaded at any time)
• Leftmost stage has no pipelining
![Page 75: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/75.jpg)
Pipelining Logic
[Figure: pipelining logic built from four D flip-flops]
• Must compensate for the extra latency introduced by pipelining
![Page 76: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/76.jpg)
Conclusions
› If we can use Xilinx distributed RAM, it gives better density
- can be used for shift registers, counters & PRNG
- Virtex has an SRL macro for even better efficiency
› LFSR
- generate random numbers
- high speed nonbinary counters and dividers
- Can construct a true random number generator using oscillator phase noise and the ASG
![Page 77: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/77.jpg)
References
› XAPP052 “Efficient Shift Registers, LFSR Counters and Long PseudoRandom Sequence Generators” and XAPP210 “LFSRs in Virtex Devices” (http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf)
› K.H. Tsoi, Ka Ho Leung, and Philip H.W. Leong. A high performance physical random number generator. IEE Proc. Computers & Digital Techniques, 1(4):349–352, July 2007. (http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/prng_cdt07.pdf)
![Page 78: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published](https://reader036.fdocuments.in/reader036/viewer/2022062909/5b049dc57f8b9aba168d924c/html5/thumbnails/78.jpg)
Review Questions
› Make sure you understand how the SRL is implemented in an FPGA
› Make sure you can implement an LFSR in VHDL/Verilog
› When is an LFSR more efficient than a counter for counting applications?