Spring 2005 EECS150 - Lec19-review Page 1
EECS150 - Digital Design
Lecture 19 – Review
March 31, 2005
John Wawrzynek
Exam II
• Midterm Exam next week Tuesday (4/5)
– In class
– Closed book/notes
– Covers lectures 9 (FSMs) through lecture 17 (memory 1)
• Exam held in 125 Cory
• Today:
– Highlights from lectures 9 - 17
– I will mention most important points from each lecture
– Exam may cover subtopics not mentioned today
– Use homework as a guide to the type of questions on the exam
Lecture 9 - Finite State Machines 1
February 15, 2005
Finite State Machines (FSMs)
• FSM circuits are a type of sequential circuit:
– output depends on present and past inputs
• effect of past inputs is represented by the current state
• Behavior is represented by State Transition Diagram:
– traverse one edge per clock cycle.
Formal Design Process
Review of Design Steps:
1. Specify circuit function (English)
2. Draw state transition diagram
3. Write down symbolic state transition table
4. Write down encoded state transition table
5. Derive logic equations
6. Derive circuit diagram
– FFs for state
– CL for NS and OUT
State Encoding
• One-hot encoding of states.
• One FF per state.
• Why one-hot encoding?
– Simple design procedure.
• Circuit matches state transition diagram (example next page).
– Often can lead to simpler and faster “next state” and output logic.
• Why not do this?
– Can be costly in terms of FFs for FSMs with large number of states.
• FPGAs are “FF rich”, therefore one-hot state machine encoding is often a good approach.
One-hot encoded FSM
• Even Parity Checker Circuit:
• In General:
• FFs must be initialized for correct operation (only one 1)
Circuit generated through direct inspection of the STD.
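The circuit figure for the parity checker did not survive in this transcript, so here is a minimal Verilog sketch of a one-hot parity checker derived by inspecting a two-state transition diagram (EVEN, ODD). The module name, signal names, the synchronous reset, and the choice of `out = 1` in the ODD state are assumptions for illustration, not taken from the slide.

```verilog
// One-hot even parity checker: one flip-flop per state.
// Names and output convention are illustrative assumptions.
module parity_onehot (
  input  wire clk,
  input  wire rst,   // synchronous reset: loads the one-hot start state
  input  wire in,    // serial data input
  output wire out    // Moore output: 1 while an odd number of 1s has been seen
);
  reg even, odd;     // exactly one of these flip-flops holds a 1

  always @(posedge clk)
    if (rst) begin
      even <= 1'b1;  // initialize with a single 1 (start in EVEN)
      odd  <= 1'b0;
    end else begin
      // Each next-state equation ORs together the edges entering that state.
      even <= (even & ~in) | (odd  & in);
      odd  <= (odd  & ~in) | (even & in);
    end

  assign out = odd;
endmodule
```

Each flip-flop input is simply the OR of the transition-diagram edges that enter its state, which is why the circuit can be read directly off the STD.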
Lecture 10 - Finite State Machines 2
February 17, 2005
FSM Recap
• Moore Machine: each state is labeled STATE [output values]; each edge is labeled with the input value.
• Mealy Machine: each state is labeled STATE; each edge is labeled input value/output values.
Both machine types allow one-hot implementations.
FSM Comparison
Solution A: Moore Machine
• output function only of PS
• maybe more states (why?)
• synchronous outputs
– no glitches
– one cycle “delay”
– full cycle of stable output
Solution B: Mealy Machine
• output function of both PS & input
• maybe fewer states
• asynchronous outputs
– if input glitches, so does output
– output immediately available
– output may not be stable long enough to be useful (below):
If the output of a Mealy FSM goes through combinational logic before being registered, the CL might delay the signal and it could be missed by the clock edge.
General FSM Design Process with Verilog Implementation
Design Steps:
1. Specify circuit function (English)
2. Draw state transition diagram
3. Write down symbolic state transition table
4. Assign encodings (bit patterns) to symbolic states
5. Code as Verilog behavioral description
– Use parameters to represent encoded states.
– Use separate always blocks for register assignment and CL logic block.
– Use case for CL block. Within each case section assign all outputs and next state value based on inputs. Note: For a Moore style machine, make outputs dependent only on state, not on inputs.
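FSMs in Verilog
Mealy Machine (module name and state encodings are assumed; the slide shows only the always blocks):
module mealy_fsm (input clk, input rst, input in, output reg out);
  parameter ZERO = 1'b0, ONE = 1'b1;
  reg ps, ns;
  // state register
  always @(posedge clk)
    if (rst) ps <= ZERO;
    else ps <= ns;
  // next-state and output logic: out depends on ps and in
  always @(ps or in)
    case (ps)
      ZERO: if (in) begin out = 1'b1; ns = ONE; end
            else begin out = 1'b0; ns = ZERO; end
      ONE: if (in) begin out = 1'b0; ns = ONE; end
           else begin out = 1'b0; ns = ZERO; end
      default: begin out = 1'bx; ns = ZERO; end
    endcase
endmodule
Moore Machine (module name and state encodings are assumed):
module moore_fsm (input clk, input rst, input in, output reg out);
  parameter ZERO = 2'b00, CHANGE = 2'b01, ONE = 2'b10;
  reg [1:0] ps, ns;
  // state register
  always @(posedge clk)
    if (rst) ps <= ZERO;
    else ps <= ns;
  // next-state and output logic: out depends only on ps
  always @(ps or in)
    case (ps)
      ZERO: begin out = 1'b0; if (in) ns = CHANGE; else ns = ZERO; end
      CHANGE: begin out = 1'b1; if (in) ns = ONE; else ns = ZERO; end
      ONE: begin out = 1'b0; if (in) ns = ONE; else ns = ZERO; end
      default: begin out = 1'bx; ns = ZERO; end
    endcase
endmodule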
Lecture 11 - Shifters & Counters
February 24, 2005
Universal Shift-register
Shift Registers
• Plain shift register:
• Shifter with shift-enable input
• Verilog:
assign OUT = Q[0];
always @ (posedge clk)
  if (shiftEnable) Q <= {IN, Q[3:1]}; // shift right, IN enters at the MSB
  else Q <= Q;
• FPGA FFs have clock-enable (CE), therefore muxes are not needed.
Controller using Counters
• State Transition Diagram:
– Assume presence of two binary counters. An “i” counter for the outer loop and “j” counter for inner loop.
[Figure: counter block with CLK, RST, CE, and TC ports; state transition diagram with states IDLE, INNER <inner control>, and OUTER <outer control>, edges labeled START, TCi, and TCj, and outputs CEi, CEj, RSTi, RSTj]
TC is asserted when the counter reaches its maximum count value. CE is “count enable”. The counter increments its value on the rising edge of the clock if CE is asserted.
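As a rough sketch of the counter block described above, the following Verilog module implements CLK, RST, CE, and TC. The module name, the `WIDTH` parameter, and the synchronous reset are assumptions for illustration; the lecture's counter may differ in detail.

```verilog
// n-bit binary counter with count enable (CE) and terminal count (TC).
// WIDTH and the module name are illustrative assumptions.
module counter #(parameter WIDTH = 4) (
  input  wire             clk,
  input  wire             rst,  // synchronous reset to zero
  input  wire             ce,   // count enable
  output reg  [WIDTH-1:0] q,
  output wire             tc    // high while q holds the maximum value
);
  always @(posedge clk)
    if (rst)     q <= {WIDTH{1'b0}};
    else if (ce) q <= q + 1'b1;   // increment only when CE is asserted

  assign tc = &q;                 // all ones = maximum count
endmodule
```

A controller would instantiate this twice (the “i” and “j” counters) and use each instance's `tc` as an input to its state transition logic. For cascading counters, a common variant gates TC with CE (`tc = ce & (&q)`); which form the lecture intends is not recoverable from this transcript.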
Odd Counts
• Extra combinational logic can be added to terminate count before max value is reached:
• Example: count to 12
• Alternative:
[Figure: 4-bit binary counter with load input, 4-bit data input, and TC output]
Synchronous Counters
• How do we extend to n-bits?
• Extrapolate c+: d+ = d ⊕ abc, e+ = e ⊕ abcd
• Has difficulty scaling (AND gate inputs grow with n)
• CE is “count enable”, allows external control of counting,
• TC is “terminal count”, is asserted on highest value, allows cascading, external sensing of occurrence of max value.
[Figure: counter stages a, b, c, d with next-state values a+, b+, c+, d+; CE input and TC output]
Synchronous Counters
[Figure: counter stages a, b, c, d with next-state values a+, b+, c+, d+, chained from CE to TC]
• How does this one scale?
– Delay grows ∝ n
• Generation of TC signals very similar to generation of carry signals in adder.
• “Parallel Prefix” circuit reduces delay:
[Figure: parallel prefix tree with log2 n levels]
Ring Counters
• “one-hot” counters: 0001, 0010, 0100, 1000, 0001, …
• “Self-starting” version:
• What are these good for?
[Figure: two 4-stage chains of D flip-flops with outputs q3, q2, q1, q0; the second chain uses flip-flops with S and R inputs and a reset that loads 0 0 0 0]
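The self-starting circuit itself is not recoverable from the transcript, so the sketch below shows one common way to make a 4-bit ring counter self-starting: feed the first stage with the NOR of the other three stages. This is an assumed technique for illustration, not necessarily the circuit on the slide; the module and signal names are hypothetical.

```verilog
// Self-starting 4-bit ring counter (one common construction, assumed here).
// A 1 is injected into stage 0 only when stages 2..0 are all zero, so any
// starting state settles into the one-hot sequence within a few cycles.
module ring_counter (
  input  wire       clk,
  input  wire       rst,  // synchronous reset to 0000, as in the figure label
  output reg  [3:0] q
);
  always @(posedge clk)
    if (rst) q <= 4'b0000;
    else     q <= {q[2:0], ~|q[2:0]};  // shift left, NOR feedback into q[0]
endmodule
```

After reset the state goes 0000, 0001, 0010, 0100, 1000, 0001, and so on. A non-one-hot state such as 1111 shifts in zeros until only one 1 remains (1110, 1100, 1000) and then rejoins the normal sequence.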
Lecture 12 – Project Description
March 1, 2005
Digital Audio
– Music waveform
– A series of numbers is used to represent the waveform, rather than a voltage or current, as in analog systems.
• Discrete time: regular spacing of sample values in time. Most digital audio systems use a 44.1KHz (consumer) or 48KHz (professional) sample rate.
– A lower sample rate would limit the maximum representable frequency content. (Human hearing max is 20KHz)
• Digital: All inputs/outputs and internal values (signals) take on discrete values (not analog). Most digital audio systems use 16-bit values (64K possible values for any point in waveform). Using much fewer than 16 bits generates noticeable noise from distortion.
Analog / Digital Conversion
• Converters are used to move from/to the analog domain.
• ADC & DAC often combined in a single chip called CODEC (coder/decoder).
• Other types of CODECs perform other functions (ex: video conversion, audio compression/decompression).
[Figure: sound source (microphone) → Analog to Digital Converter (ADC), driven by a sample clock → sample stream (26, 46, 51, 55, 51, …) → Digital System (processing, recording, playback, synthesis, compression, decompression) → sample stream (26, 46, 51, 55, 51, …) → Digital to Analog Converter (DAC), driven by a sample clock → power amplifier]
Digital Audio Data-rates
• Relatively small storage devices have prompted the development and application of many compression algorithms for music and speech:
– Typical compression ratios of 10-100
– MP3: 32Kbits/sec - 320Kbits/sec (factor of 4x to 44x)
– These techniques are lossy; information is lost. However the better ones (MP3 & AAC for example) use techniques based on characteristics of human auditory perception to drop information of little importance.
• In our project, uncompressed audio will be used.
– Sufficient network bandwidth to support multiple streams of audio.
– Much simpler hardware design.
• Uncompressed audio is often referred to as PCM (pulse code modulation). (.wav files in Windows)
44.1K samples/sec x 2 (stereo) x 16 bits/sample = 1.4 Mbit/sec = 176,400 Bytes/sec
1 minute ≈ 10MByte total
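The data-rate figures above can be checked with a short worked calculation. The last line compares the uncompressed rate (in kbit/s) against the two MP3 rates quoted on the slide, which appears to be where the 4x to 44x factor comes from.

```latex
\begin{align*}
44{,}100 \times 2 \times 16 &= 1{,}411{,}200\ \text{bit/s} \approx 1.4\ \text{Mbit/s} \\
1{,}411{,}200 / 8 &= 176{,}400\ \text{Byte/s} \\
176{,}400 \times 60 &= 10{,}584{,}000\ \text{Byte} \approx 10\ \text{MByte per minute} \\
1{,}411.2 / 320 &\approx 4.4, \qquad 1{,}411.2 / 32 = 44.1
\end{align*}
```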
Local Area Network (LAN) Basics
• A LAN is made up physically of a set of switches, wires, and hosts. Routers and gateways provide connectivity out to other LANs and to the internet.
• Ethernet defines a set of standards for data-rate (10/100Mbps, 1/10Gbps), and signaling to allow switches and computers to communicate.
• Most Ethernet implementations these days are “switched” (point to point connections between switches and hosts, no contention or collisions).
• Information travels in variable sized blocks, called Ethernet Frames, each frame includes preamble, header (control) information, data, and error checking. We usually call these packets.
• Preamble is a fixed pattern used by receivers to synchronize their clocks to the data.
• Link level protocol on Ethernet is called the Medium Access Control (MAC) protocol. It defines the format of the packets.
[Figure: hosts connected to switches, with one switch linked to a router or gateway. Frame layout: Preamble (8 bytes) | MAC header | Payload | CRC]
Ethernet Medium Access Control (MAC)
• MAC protocol encapsulates a payload by adding a 14 byte header before the data and a 4-byte cyclic redundancy check (CRC) after the data.
• A 6-byte destination address, specifies either a single recipient node (unicast mode), a group of recipient nodes (multicast mode), or the set of all recipient nodes (broadcast mode).
• A 6-byte source address, is set to the sender’s globally unique node address. Its common function is to allow address learning which may be used to configure the filter tables in switches.
• A 2-byte type field, identifies the type of protocol being carried (e.g. 0x0800 for IP protocol).
• The CRC provides error detection in the case where line errors result in corruption of the MAC frame. In most applications a frame with an invalid CRC is discarded by the MAC receiver.
Ethertypes for EECS150 project:
– 0x0101: audio packets
– 0x0102: LCD packets
(picked from the range of “experimental” type codes to avoid potential conflict.)
– One way transmission only.
– All packets will be broadcast.
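To make the header layout concrete: with a 6-byte destination address followed by a 6-byte source address, the 2-byte type field occupies header bytes 12 and 13 (counting from 0, after the preamble). The sketch below classifies that field using the two project Ethertypes listed on this slide. The module name, port names, and the assumption that the field has already been assembled most significant byte first are illustrative, not part of the project specification.

```verilog
// Classify the 2-byte type field of a received MAC header.
// Names are hypothetical; only the two type codes come from the slide.
module ethertype_decode (
  input  wire [15:0] ethertype,  // header bytes 12-13, first byte in bits 15:8
  output wire        is_audio,
  output wire        is_lcd
);
  localparam [15:0] TYPE_AUDIO = 16'h0101;  // audio packets
  localparam [15:0] TYPE_LCD   = 16'h0102;  // LCD packets

  assign is_audio = (ethertype == TYPE_AUDIO);
  assign is_lcd   = (ethertype == TYPE_LCD);
endmodule
```

A receiver could use these two flags to steer the payload to the audio or LCD logic and ignore every other frame type.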
Protocol Stacks
• Usual case is that MAC protocol encapsulates IP (internet protocol) which in turn encapsulates TCP (transmission control protocol) which in turn encapsulates the application layer. Each layer adds its own headers.
• Other protocols exist for other network services (ex: printers).
• When the reliability features (retransmission) of TCP are not needed, UDP/IP is used. Examples are gaming and other applications where reliability is provided at the application layer.
[Figure: two protocol stacks side by side]
Layer 5: application layer (ex: http) | Streaming (ex: Mpeg4)
Layer 4: TCP | UDP
Layer 3: IP | IP
Layer 2: MAC | MAC
Standard Hardware-Network-Interface
• Usually divided into three hardware blocks. (Application level processing could be either hardware or software.)
– MAG. “Magnetics” chip is a transformer for providing electrical isolation.
– PHY. Provides serial/parallel and parallel/serial conversion and encodes bit-stream for Ethernet signaling convention. Drives/receives analog signals to/from MAG. Recovers clock signal from data input.
– MAC. Media access layer processing. Processes Ethernet frames: preambles, headers, computes CRC to detect errors on receiving and to complete packet for transmission. Buffers (stores) data for/from application level.
• Application level interface
– Could be a standard bus (ex: PCI)
– or designed specifically for application level hardware.
• MII is an industry standard for connecting PHY to MAC.
[Figure: Ethernet connection ↔ MAG (transformer) ↔ PHY (Ethernet signal) ↔ Media Independent Interface (MII) ↔ MAC (MAC layer processing) ↔ application level interface]
Calinx has no MAC chip, so MAC processing must be handled in the FPGA.
Lecture 14 - CMOS
March 8, 2005
Transistor-level Logic Circuits
• NAND gate
• NOR gate
• Note:
– out = 0 iff either a OR b = 1, therefore out = (a+b)’
– Again pFET network and nFET network are duals of one another.
Other more complex functions are possible. Ex: out = (a+bc)’
Transmission Gate
• Transmission gates are the way to build “switches” in CMOS.
• In general, both transistor types are needed:
– nFET to pass zeros.
– pFET to pass ones.
• The transmission gate is bi-directional (unlike logic gates).
• Does not directly connect to Vdd and GND, but can be combined with logic gates or buffers to simplify many logic structures.
Pass-Transistor Multiplexor
• 2-to-1 multiplexor: c = sa + s’b
• Switches simplify the implementation:
[Figure: two transmission gates controlled by s and s’, connecting inputs a and b to output c]
Tri-state Buffers
• Bidirectional connections:
• Busses:
Tri-state buffers are used when multiple circuits all connect to a common bus. Only one circuit at a time is allowed to drive the bus. All others “disconnect”.
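In Verilog, a driver that “disconnects” is usually modeled by driving the high-impedance value `z`. The following is a minimal sketch of two tri-state buffers sharing one bus wire; the module and signal names are assumptions for illustration.

```verilog
// Two tri-state drivers on a shared bus wire.
// At most one enable may be high at a time; with both low the bus floats (z).
module tristate_bus (
  input  wire a,      // data from the first circuit
  input  wire b,      // data from the second circuit
  input  wire en_a,   // first circuit drives the bus when high
  input  wire en_b,   // second circuit drives the bus when high
  output wire bus
);
  assign bus = en_a ? a : 1'bz;  // tri-state buffer for a
  assign bus = en_b ? b : 1'bz;  // tri-state buffer for b
endmodule
```

If both enables are high with different data values, simulation resolves the bus to `x`, which mirrors the electrical conflict the one-driver-at-a-time rule is meant to prevent.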
Transistor-level Logic Circuits
• Positive level-sensitive latch:
• Latch transistor level:
• Positive edge-triggered flip-flop built from two level-sensitive latches:
[Figure: latch and flip-flop schematics built from transmission gates clocked by clk and clk’]
Lecture 15 - Timing
March 10, 2005
Limitations on Clock Rate
1. Logic Gate Delay
• What are typical delay values?
2. Delays in flip-flops
• Both times contribute to limiting the clock period. Plus clock skew.
[Figure: gate input and output waveforms versus time t; flip-flop waveforms for D, clk, and Q showing setup time and clock to Q delay]
• What must happen in one clock cycle for correct operation?
• Assuming perfect clock distribution (all flip-flops see the clock at the same time):
– All signals connected to FF inputs must be ready and “setup” before rising edge of clock.
General Model of Synchronous Circuit
[Figure: two registers (reg), each followed by combinational logic (CL), with a shared clock input, circuit input and output, and optional feedback]
• In general, for correct operation:
T ≥ time(clk→Q) + time(CL) + time(setup)
T ≥ τclk→Q + τCL + τsetup
for all paths.
• How do we enumerate all paths?
– Any circuit input or register output to any register input or circuit output.
– “setup time” for circuit outputs depends on what it connects to
– “clk-Q time” for circuit inputs depends on from where it comes.
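As a worked example of the clock period constraint, take one register-to-register path with assumed delays of 1 ns clock-to-Q, 6 ns through the combinational logic, and 1 ns setup. These numbers are hypothetical and chosen only to make the arithmetic easy; they are not from the lecture.

```latex
% Assumed values, for illustration only:
% \tau_{clk \to Q} = 1 ns, \tau_{CL} = 6 ns, \tau_{setup} = 1 ns
\begin{align*}
T &\ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup} = 1 + 6 + 1 = 8\ \text{ns} \\
f_{max} &= \frac{1}{T_{min}} = \frac{1}{8\ \text{ns}} = 125\ \text{MHz}
\end{align*}
```

The same sum has to be evaluated for every path; the path with the largest total sets the minimum clock period for the whole circuit.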
Gate Switching Behavior
• Inverter:
• NAND gate:
Gate Delay
• Cascaded gates:
[Figure: “transfer curve” for inverter, Vout versus Vin]
Gate Delay
• Fan-out:
[Figure: gate 1 driving gates 2 and 3]
• The delay of a gate is proportional to its output capacitance, because gates 2 and 3 turn on/off at a later time. (It takes longer for the output of gate 1 to reach the switching threshold of gates 2 and 3 as we add more output capacitance.)
“Critical” Path
• Critical Path: the path with the maximum delay, from any input to any output.
– In general, we include register set-up and clk-to-Q times in critical path calculation.
• What is the critical path in this circuit?
• Why do we care about the critical path?
Delay in Flip-flops
• Setup time results from delay through first latch.
• Clock to Q delay results from delay through second latch.
[Figure: waveforms for D, clk, and Q showing setup time and clock to Q delay; flip-flop built from two latches clocked by clk and clk’]
Clock Skew (cont.)
• If clock period T = TCL+Tsetup+Tclk→Q, circuit will fail.
• Therefore:
1. Control clock skew
a) Careful clock distribution. Equalize path delay from clock source to all clock loads by controlling wire delay and buffer delay.
b) Don’t “gate” clocks.
2. T ≥ TCL+Tsetup+Tclk→Q + worst case skew.
• Most modern large high-performance chips (microprocessors) control end to end clock skew to a few tenths of a nanosecond.
[Figure: register, CL, register path in which the second register receives a delayed clock CLK’; clock skew is the delay in distribution between CLK and CLK’]
Lecture 16 - Power
March 15, 2005
Basics
• Power supply provides energy for charging and discharging wires and transistor gates. The energy supplied is stored & then dissipated as heat.
• If a differential amount of charge dq is given a differential increase in energy dw, the potential of the charge is increased by: $V = dw/dq$
• By definition of current: $I = dq/dt$
• Power: $P \equiv dw/dt$. Rate of work being done w.r.t. time. Rate of energy being used.
• A very practical formulation: $\frac{dw}{dt} = \frac{dw}{dq} \times \frac{dq}{dt} = P = V \times I$
• If we would like to know total energy: $w = \int_{-\infty}^{t} P\,dt$
• Units: $P = E/\Delta t$, Watts = Joules/second
Metrics
• How does MIPS/watt relate to energy?
• Average power consumption = energy / time
MIPS/watt = (instructions/sec) / (joules/sec) = instructions/joule
– therefore an equivalent metric (reciprocal) is energy per operation (E/op)
• E/op is more general - applies to more than processors
– also, usually more relevant, as battery life is limited by total energy draw.
– This metric gives us a measure to use to compare two alternative implementations of a particular function.
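Power in CMOS
[Figure: pullup network between Vdd and the output node, pulldown network between the output node and GND, load capacitance C on the output; plots of i(t) and v(t) between t0 and t1, with v(t) rising to Vdd; labels: energy supplied, energy stored, energy dissipated]
Switching Energy: energy used to switch a node
Calculate energy dissipated in pullup (with $v_0 = 0$ and $v_1 = V_{dd}$):
$$E_{sw} = \int_{t_0}^{t_1} P(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\, i(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\, c\,\frac{dv}{dt}\,dt$$
$$= c V_{dd} \int_{v_0}^{v_1} dv - c \int_{v_0}^{v_1} v\,dv = c V_{dd}^2 - \tfrac{1}{2} c V_{dd}^2 = \tfrac{1}{2} c V_{dd}^2$$
An equal amount of energy is dissipated on pulldown.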
Controlling Energy Consumption
• Largest contributing component to CMOS power consumption is switching power:
$P_{avg} = n \cdot \alpha_{avg} \cdot c_{avg} \cdot \tfrac{1}{2} V_{dd}^2 \cdot f$
• Factors influencing power consumption:
– n: total number of nodes in circuit
– α: activity factor (probability of each node switching)
– f: clock frequency (does this affect energy consumption?)
– Vdd: power supply voltage
• What control do you have over each factor?
• How does each affect the total energy?
What control do you have as a designer?
In EECS150 design projects, we will not optimize for power consumption.
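One way to approach the questions on this slide is to work directly from the switching power expression. The sketch below considers switching power only and holds every other factor fixed; it is an illustration of how the formula behaves, not a complete power model.

```latex
\begin{align*}
P_{avg} &= n \cdot \alpha_{avg} \cdot c_{avg} \cdot \tfrac{1}{2} V_{dd}^2 \cdot f \\
E_{cycle} &= \frac{P_{avg}}{f} = n \cdot \alpha_{avg} \cdot c_{avg} \cdot \tfrac{1}{2} V_{dd}^2 \\
\frac{P_{avg}(V_{dd}/2)}{P_{avg}(V_{dd})} &= \frac{(V_{dd}/2)^2}{V_{dd}^2} = \frac{1}{4}
\end{align*}
```

Under these assumptions, clock frequency scales power but drops out of the energy per cycle, while supply voltage enters quadratically in both.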
Lecture 17 – Memory 1
March 17, 2005
Standard Internal Memory Organization
• Special circuit tricks are used for the cell array to improve storage density. (We will look at these later)
• RAM/ROM naming convention:
– examples: 32 X 8, "32 by 8" => 32 8-bit words
– 1M X 1, "1 meg by 1" => 1M 1-bit words
2-D array of bit cells. Each cell stores one bit of data.
Read Only Memory (ROM)
• Simple form of memory. No write operation needed.
• Functional Equivalence:
[Figure: address decoder driving a bit-cell array] Connections to Vdd are used to store a logic 1, connections to GND to store a logic 0.
• Full tri-state buffers are not needed at each cell point.
• In practice, single transistors are used to implement zero cells. Logic ones are derived through precharging or a bit-line pullup transistor.
Column MUX in ROMs and RAMs:
• Controls physical aspect ratio
– Important for physical layout and to control delay on wires.
• In DRAM, allows time-multiplexing of chip address pins
Cascading Memory Modules (or chips)
• Example: assemblage of 256 x 8 ROM using 256 x 4 modules:
• example: 1K x * ROM using 256 x 4 modules:
• each module has tri-state outputs:
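The first example above (a 256 x 8 ROM from 256 x 4 modules) can be sketched in Verilog as two modules that share the address and each supply one nibble of the data word. The `rom256x4` module, its port names, and the way its contents are stored are hypothetical stand-ins for whatever memory component is actually used.

```verilog
// Hypothetical 256 x 4 ROM module with a tri-state output.
module rom256x4 (
  input  wire [7:0] addr,
  input  wire       oe,    // output enable
  output wire [3:0] data
);
  reg [3:0] cells [0:255];                 // contents loaded elsewhere (e.g. $readmemh)
  assign data = oe ? cells[addr] : 4'bz;   // disconnect from the bus when not enabled
endmodule

// 256 x 8 ROM: both modules see the same address; each drives 4 of the 8 data bits.
module rom256x8 (
  input  wire [7:0] addr,
  input  wire       oe,
  output wire [7:0] data
);
  rom256x4 hi (.addr(addr), .oe(oe), .data(data[7:4]));
  rom256x4 lo (.addr(addr), .oe(oe), .data(data[3:0]));
endmodule
```

Widening the word needs no extra logic. Extending the address range instead would tie several modules to the same data lines and use the upper address bits to enable exactly one of them, which is where the tri-state outputs matter.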
Memory Components Types:
• Volatile:
– Random Access Memory (RAM):
• DRAM "dynamic"
• SRAM "static"
• Non-volatile:
– Read Only Memory (ROM):
• Mask ROM "mask programmable"
• EPROM "erasable programmable"
• EEPROM "electrically erasable programmable"
• FLASH memory - similar to EEPROM with programmer integrated on chip
Volatile Memory Comparison
• SRAM Cell
– Larger cell ⇒ lower density, higher cost/bit
– No refresh required
– Simple read ⇒ faster access
– Standard IC process ⇒ natural for integration with logic
• DRAM Cell
– Smaller cell ⇒ higher density, lower cost/bit
– Needs periodic refresh, and refresh after read
– Complex read ⇒ longer access time
– Special IC process ⇒ difficult to integrate with logic circuits
[Figure: SRAM cell with a word line and two bit lines; DRAM cell with a word line and one bit line]
The primary difference between different memory types is the bit cell.
Dual-ported Memory Internals
• Add decoder, another set of read/write logic, bit lines, word lines:
[Figure: cell array with decoders deca and decb on the address ports and two r/w logic blocks on the data ports]
• Example cell: SRAM
[Figure: SRAM cell with word lines WL1 and WL2 and bit-line pairs b1 and b2]
• Repeat everything but cross-coupled inverters.
• This scheme extends up to a couple more ports, then need to add additional transistors.