
Spring 2005 EECS150 - Lec19-review

EECS150 - Digital Design
Lecture 19 – Review

March 31, 2005
John Wawrzynek


Exam II
• Midterm Exam next week Tuesday (4/5)
  – In class
  – Closed book/notes
  – Covers lectures 9 (FSMs) through lecture 17 (memory 1)
• Exam held in 125 Cory
• Today:
  – Highlights from lectures 9 - 17
  – I will mention most important points from each lecture
  – Exam may cover subtopics not mentioned today
  – Use homework as a guide to the type of questions on the exam


Lecture 9 - Finite State Machines 1

February 15, 2005


Finite State Machines (FSMs)
• FSM circuits are a type of sequential circuit:
  – output depends on present and past inputs
    • effect of past inputs is represented by the current state
• Behavior is represented by State Transition Diagram:
  – traverse one edge per clock cycle.



Formal Design Process

Review of Design Steps:

1. Specify circuit function (English)
2. Draw state transition diagram
3. Write down symbolic state transition table
4. Write down encoded state transition table
5. Derive logic equations
6. Derive circuit diagram
   – FFs for state
   – CL for NS and OUT


State Encoding
• One-hot encoding of states.
• One FF per state.
• Why one-hot encoding?
  – Simple design procedure.
    • Circuit matches state transition diagram (example next page).
  – Often can lead to simpler and faster “next state” and output logic.
• Why not do this?
  – Can be costly in terms of FFs for FSMs with large number of states.
• FPGAs are “FF rich”, therefore one-hot state machine encoding is often a good approach.


One-hot encoded FSM
• Even Parity Checker Circuit:
• In General:
• FFs must be initialized for correct operation (only one 1)

Circuit generated through direct inspection of the STD.
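
The direct-inspection idea can be sketched in Verilog. This is a minimal illustration, not the circuit from the slide: the module name parity_onehot, the signal names, the synchronous reset, and the convention that out is 1 while an odd number of 1s has been seen are all assumptions.

```verilog
// Hypothetical one-hot even parity checker (names are illustrative).
// State "even": an even number of 1s seen so far; state "odd": an odd number.
module parity_onehot (
    input  wire clk,
    input  wire rst,   // synchronous reset (assumed)
    input  wire in,    // serial data input
    output wire out    // assumed convention: 1 while the count of 1s is odd
);
    reg even, odd;     // one flip-flop per state

    always @(posedge clk) begin
        if (rst) begin
            even <= 1'b1;   // exactly one flip-flop is initialized to 1
            odd  <= 1'b0;
        end else begin
            // Each D input is the OR of the STD edges that enter that state.
            even <= (even & ~in) | (odd  & in);
            odd  <= (odd  & ~in) | (even & in);
        end
    end

    assign out = odd;       // Moore output, read directly from a state FF
endmodule
```

Each term in a next-state expression corresponds to one edge of the state transition diagram, which is why the circuit can be written down without a separate encoded state table.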


Lecture 10 - Finite State Machines 2

February 17, 2005



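FSM Recap
• Moore Machine: each state is labeled STATE [output values]; each edge is labeled with the input value.
• Mealy Machine: each state is labeled STATE; each edge is labeled input value/output values.

Both machine types allow one-hot implementations.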


FSM Comparison

Solution A: Moore Machine
• output function only of PS
• maybe more states (why?)
• synchronous outputs
  – no glitches
  – one cycle “delay”
  – full cycle of stable output

Solution B: Mealy Machine
• output function of both PS & input
• maybe fewer states
• asynchronous outputs
  – if input glitches, so does output
  – output immediately available
  – output may not be stable long enough to be useful (below):

If output of Mealy FSM goes through combinational logic before being registered, the CL might delay the signal and it could be missed by the clock edge.


General FSM Design Process with Verilog Implementation

Design Steps:
1. Specify circuit function (English)
2. Draw state transition diagram
3. Write down symbolic state transition table
4. Assign encodings (bit patterns) to symbolic states
5. Code as Verilog behavioral description
   – Use parameters to represent encoded states.
   – Use separate always blocks for register assignment and CL logic block.
   – Use case for CL block. Within each case section assign all outputs and next state value based on inputs. Note: For Moore style machine make outputs dependent only on state not dependent on inputs.


FSMs in Verilog

Mealy Machine:

    // State register
    always @(posedge clk)
      if (rst) ps <= ZERO;
      else ps <= ns;

    // Next-state and output logic: out depends on ps and in
    always @(ps or in)
      case (ps)
        ZERO: if (in) begin
                out = 1'b1;
                ns = ONE;
              end
              else begin
                out = 1'b0;
                ns = ZERO;
              end
        ONE:  if (in) begin
                out = 1'b0;
                ns = ONE;
              end
              else begin
                out = 1'b0;
                ns = ZERO;
              end
        default: begin
                out = 1'bx;
                ns = ZERO;   // fall back to a known state
              end
      endcase

Moore Machine:

    // State register
    always @(posedge clk)
      if (rst) ps <= ZERO;
      else ps <= ns;

    // Next-state and output logic: out depends on ps only
    always @(ps or in)
      case (ps)
        ZERO: begin
                out = 1'b0;
                if (in) ns = CHANGE;
                else ns = ZERO;
              end
        CHANGE: begin
                out = 1'b1;
                if (in) ns = ONE;
                else ns = ZERO;
              end
        ONE: begin
                out = 1'b0;
                if (in) ns = ONE;
                else ns = ZERO;
              end
        default: begin
                out = 1'bx;
                ns = ZERO;   // fall back to a known state
              end
      endcase



Lecture 11 - Shifters & Counters

February 24, 2003


Universal Shift-register


Shift Registers
• Plain shift register:
• Shifter with shift-enable input
• Verilog:

    assign OUT = Q[0];
    always @ (posedge clk)
      if (shiftEnable) Q <= {IN, Q[3:1]};
      else Q <= Q;

• FPGA FFs have clock-enable (CE), therefore muxes are not needed.



Controller using Counters
• State Transition Diagram:
  – Assume presence of two binary counters. An “i” counter for the outer loop and “j” counter for inner loop.

[Figure: counter block with CLK, RST, CE, and TC pins; state transition diagram with states IDLE, INNER <inner control>, and OUTER <outer control>, edges labeled with START, TCi, and TCj conditions, and outputs CEi, CEj, RSTi, RSTj]

TC is asserted when the counter reaches its maximum count value. CE is “count enable”. The counter increments its value on the rising edge of the clock if CE is asserted.
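
A counter with this CLK/RST/CE/TC interface might look like the following minimal sketch. The module name counter4, the 4-bit width, the synchronous reset, and the choice to gate TC with CE (convenient when cascading counters) are assumptions, not details taken from the slide.

```verilog
// Hypothetical 4-bit binary counter with count enable and terminal count.
module counter4 (
    input  wire       clk,
    input  wire       rst,  // synchronous reset to zero (assumed)
    input  wire       ce,   // count enable
    output reg  [3:0] q,
    output wire       tc    // terminal count
);
    always @(posedge clk) begin
        if (rst)     q <= 4'd0;
        else if (ce) q <= q + 4'd1;  // wraps from 15 back to 0
    end

    // Assumption: TC is gated with CE so it can drive the CE of a next stage.
    assign tc = ce & (q == 4'hF);
endmodule
```

A controller like the one described here would assert CEj in the inner-loop state and use TCj as the condition for leaving it.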



Odd Counts
• Extra combinational logic can be added to terminate count before max value is reached:
• Example: count to 12
• Alternative:

[Figure: 4-bit binary counter with load input, 4-bit load value, and TC output]


Synchronous Counters
• How do we extend to n-bits?
• Extrapolate c+: d+ = d ⊕ abc, e+ = e ⊕ abcd
• Has difficulty scaling (AND gate inputs grow with n)
• CE is “count enable”, allows external control of counting,
• TC is “terminal count”, is asserted on highest value, allows cascading, external sensing of occurrence of max value.

[Figure: counter circuits with state bits a, b, c, d, next-state values a+, b+, c+, d+, and CE and TC signals]


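Synchronous Counters

[Figure: counter with state bits a, b, c, d, next-state values a+, b+, c+, d+, and a chain from CE to TC]

• How does this one scale?
  – Delay grows ∝ n
• Generation of TC signals very similar to generation of carry signals in adder.
• “Parallel Prefix” circuit reduces delay:

[Figure: parallel prefix circuit with log2 n levels]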


Ring Counters
• “one-hot” counters: 0001, 0010, 0100, 1000, 0001, …
• “Self-starting” version:
• What are these good for?

[Figure: ring counters built from four D flip-flops with outputs q3, q2, q1, q0; one version uses S/R inputs driven by reset]
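
One common way to make a ring counter self-starting is to shift in a 1 only when the lower bits are all 0. The sketch below assumes that NOR feedback; the slide's own self-starting circuit is a figure that did not survive, so this is not necessarily the same circuit. The module name ring4 is illustrative.

```verilog
// Hypothetical 4-bit self-starting ring counter (NOR feedback is an assumption).
module ring4 (
    input  wire       clk,
    output reg  [3:0] q
);
    // Shift left every clock. A 1 enters bit 0 only when q[2:0] are all 0,
    // so the steady-state sequence is 0001, 0010, 0100, 1000, 0001, ...
    // and any other starting pattern drains into that sequence.
    always @(posedge clk)
        q <= {q[2:0], ~(q[2] | q[1] | q[0])};
endmodule
```

In hardware the flip-flops power up in some 0/1 pattern and the counter recovers by itself. In simulation q starts as X and stays X, so a testbench would need to force an initial value.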



Lecture 12 – Project Description

March 1, 2005


Digital Audio
– Music waveform
– A series of numbers is used to represent the waveform, rather than a voltage or current, as in analog systems.

• Discrete time: regular spacing of sample values in time. Most digital audio systems use 44.1KHz (consumer) sample rate or 48KHz (professional) sample rate.
  – A lower sample rate would limit the maximum representable frequency content. (Human hearing max is 20KHz)
• Digital: All inputs/outputs and internal values (signals) take on discrete values (not analog). Most digital audio systems use 16-bit values (64K possible values for any point in waveform). Using much fewer than 16 bits generates noticeable noise from distortion.


Analog / Digital Conversion

• Converters are used to move from/to the analog domain.
• ADC & DAC often combined in a single chip called CODEC (coder/decoder).
• Other types of CODECs perform other functions (ex: video conversion, audio compression/decompression).

[Figure: sound source (microphone) → Analog to Digital Converter (ADC), driven by sample clock → samples 26, 46, 51, 55, 51, … → Digital System (processing, recording, playback, synthesis, compression, decompression) → samples 26, 46, 51, 55, 51, … → Digital to Analog Converter (DAC), driven by sample clock → power amplifier]


Digital Audio Data-rates

• Relatively small storage devices have prompted the development and application of many compression algorithms for music and speech:
  – Typically compression ratios of 10-100
  – MP3: 32Kbits/sec - 320Kbits/sec (factor of 4x to 44x)
  – These techniques are lossy; information is lost. However the better ones (MP3 & AAC for example) use techniques based on characteristics of human auditory perception to drop information of little importance.

• In our project, uncompressed audio will be used.
  – Sufficient network bandwidth to support multiple streams of audio.
  – Much simpler hardware design.

• Uncompressed audio is often referred to as PCM (pulse code modulation). (.wav files in Windows)

44.1K samples/sec x 2 (stereo) x 16 bits/sample = 1.4 Mbit/sec = 176,400 Bytes/sec

1 minute ≈ 10MByte total
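
The data-rate figures can be checked step by step. The worked arithmetic below uses only the numbers given on this slide; the last line compares the uncompressed rate (1,411.2 Kbits/sec) against the two MP3 rates to recover the quoted 4x to 44x factors.

```latex
\begin{align*}
\text{bit rate} &= 44{,}100 \times 2 \times 16 = 1{,}411{,}200\ \text{bits/sec} \approx 1.4\ \text{Mbit/sec} \\
\text{byte rate} &= 1{,}411{,}200 / 8 = 176{,}400\ \text{Bytes/sec} \\
\text{one minute} &= 176{,}400 \times 60 = 10{,}584{,}000\ \text{Bytes} \approx 10\ \text{MByte} \\
\text{MP3 factor} &= 1{,}411.2 / 320 \approx 4.4 \quad \text{to} \quad 1{,}411.2 / 32 \approx 44
\end{align*}
```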



Local Area Network (LAN) Basics

• A LAN is made up physically of a set of switches, wires, and hosts. Routers and gateways provide connectivity out to other LANs and to the internet.

• Ethernet defines a set of standards for data-rate (10/100Mbps, 1/10Gbps), and signaling to allow switches and computers to communicate.

• Most Ethernet implementations these days are “switched” (point to point connections between switches and hosts, no contention or collisions).

• Information travels in variable sized blocks, called Ethernet Frames. Each frame includes preamble, header (control) information, data, and error checking. We usually call these packets.

• Preamble is a fixed pattern used by receivers to synchronize their clocks to the data.

• Link level protocol on Ethernet is called the Medium Access Control (MAC) protocol. It defines the format of the packets.

[Figure: hosts connected to switches; switches connected to each other and to a router or gateway]

Frame layout: Preamble (8 bytes) | MAC header | Payload | CRC


Ethernet Medium Access Control (MAC)

• MAC protocol encapsulates a payload by adding a 14 byte header before the data and a 4-byte cyclic redundancy check (CRC) after the data.

• The CRC provides error detection in the case where line errors result in corruption of the MAC frame. In most applications a frame with an invalid CRC is discarded by the MAC receiver.

Ethertypes for EECS150 project:
  – 0x0101: audio packets
  – 0x0102: LCD packets
  – Picked from the range of “experimental” type codes to avoid potential conflict.
  – One way transmission only.
  – All packets will be broadcast.

• A 6-byte destination address specifies either a single recipient node (unicast mode), a group of recipient nodes (multicast mode), or the set of all recipient nodes (broadcast mode).

• A 6-byte source address is set to the sender’s globally unique node address. Its common function is to allow address learning which may be used to configure the filter tables in switches.

• A 2-byte type field identifies the type of protocol being carried (e.g. 0x0800 for IP protocol).
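
The 14-byte header layout (6-byte destination, 6-byte source, 2-byte type) can be sketched as a field extractor. This is an illustration only: the module name mac_header_fields, the port names, and the assumption that the header has already been collected into a 112-bit vector with the first received byte in the most significant bits are all hypothetical.

```verilog
// Hypothetical field extractor for the 14-byte MAC header.
// Assumption: the first byte received sits in header[111:104].
module mac_header_fields (
    input  wire [111:0] header,
    output wire [47:0]  dst_addr,   // 6-byte destination address
    output wire [47:0]  src_addr,   // 6-byte source address
    output wire [15:0]  ethertype,  // 2-byte type field
    output wire         is_audio,   // project audio packet (0x0101)
    output wire         is_lcd      // project LCD packet (0x0102)
);
    assign dst_addr  = header[111:64];
    assign src_addr  = header[63:16];
    assign ethertype = header[15:0];

    assign is_audio = (ethertype == 16'h0101);
    assign is_lcd   = (ethertype == 16'h0102);
endmodule
```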


Protocol Stacks
• Usual case is that MAC protocol encapsulates IP (internet protocol) which in turn encapsulates TCP (transmission control protocol) which in turn encapsulates the application layer. Each layer adds its own headers.
• Other protocols exist for other network services (ex: printers).
• When the reliability features (retransmission) of TCP are not needed, UDP/IP is used. Examples are gaming and other applications where reliability is provided at the application layer.

            Usual case                      Streaming
Layer 5     application layer (ex: http)    Streaming (ex: MPEG-4)
Layer 4     TCP                             UDP
Layer 3     IP                              IP
Layer 2     MAC                             MAC


Standard Hardware-Network-Interface

• Usually divided into three hardware blocks. (Application level processing could be either hardware or software.)

– MAG. “Magnetics” chip is a transformer for providing electrical isolation.

– PHY. Provides serial/parallel and parallel/serial conversion and encodes bit-stream for Ethernet signaling convention. Drives/receives analog signals to/from MAG. Recovers clock signal from data input.

– MAC. Media access layer processing. Processes Ethernet frames: preambles, headers, computes CRC to detect errors on receiving and to complete packet for transmission. Buffers (stores) data for/from application level.

• Application level interface
  – Could be a standard bus (ex: PCI)
  – or designed specifically for application level hardware.
• MII is an industry standard for connecting PHY to MAC.

[Figure: Ethernet connection ↔ MAG (transformer) ↔ PHY (Ethernet signal) ↔ Media Independent Interface (MII) ↔ MAC (MAC layer processing) ↔ application level interface]

Calinx has no MAC chip, must be handled in FPGA.



Lecture 14 - CMOS

March 8, 2005


Transistor-level Logic Circuits
• NAND gate
• NOR gate
• Note:
  – out = 0 iff either a OR b = 1, therefore out = (a+b)’
  – Again pFET network and nFET network are duals of one another.

Other more complex functions are possible. Ex: out = (a+bc)’


Transmission Gate
• Transmission gates are the way to build “switches” in CMOS.
• In general, both transistor types are needed:
  – nFET to pass zeros.
  – pFET to pass ones.

• The transmission gate is bi-directional (unlike logic gates).

• Does not directly connect to Vdd and GND, but can be combined with logic gates or buffers to simplify many logic structures.


Pass-Transistor Multiplexor
• 2-to-1 multiplexor:

  c = sa + s’b

• Switches simplify the implementation:

[Figure: two switches controlled by s and s’ connecting inputs a and b to output c]



Tri-state Buffers

• Bidirectional connections:
• Busses:

Tri-state buffers are used when multiple circuits all connect to a common bus. Only one circuit at a time is allowed to drive the bus. All others “disconnect”.
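
In Verilog, a tri-state driver is usually written by assigning high impedance (z) when the driver is disabled. The following minimal sketch shows two drivers on one bus; the module name shared_bus, the port names, and the 8-bit width are assumptions for illustration.

```verilog
// Hypothetical shared bus with two tri-state drivers.
module shared_bus (
    input  wire [7:0] data_a,
    input  wire [7:0] data_b,
    input  wire       en_a,   // at most one enable may be 1 at a time
    input  wire       en_b,
    output wire [7:0] bus
);
    // A disabled driver outputs z, i.e. it "disconnects" from the bus.
    assign bus = en_a ? data_a : 8'bz;
    assign bus = en_b ? data_b : 8'bz;
endmodule
```

If both enables are 1 with different data, the two drivers fight and a simulator reports x on the conflicting bits, which is the situation the one-driver-at-a-time rule prevents.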


Transistor-level Logic Circuits

Positive Level-sensitive latch:

Latch Transistor Level:

Positive Edge-triggered flip-flop built from two level-sensitive latches:

[Figure: latch and flip-flop schematics with switches controlled by clk and clk’]


Lecture 15 - Timing

March 10, 2005


Limitations on Clock Rate

1 Logic Gate Delay
  • What are typical delay values?
2 Delays in flip-flops
  • Both times contribute to limiting the clock period. Plus clock skew.

[Figure: gate input and output waveforms showing delay t; flip-flop D, clk, and Q waveforms showing setup time and clock to Q delay]

• What must happen in one clock cycle for correct operation?
• Assuming perfect clock distribution (all flip-flops see the clock at the same time):
  – All signals connected to FF inputs must be ready and “setup” before rising edge of clock.



General Model of Synchronous Circuit

[Figure: clock driving two registers; input → reg → CL → reg → CL → output, with optional feedback]

• In general, for correct operation:

  T ≥ time(clk→Q) + time(CL) + time(setup)
  T ≥ τclk→Q + τCL + τsetup

  for all paths.
• How do we enumerate all paths?
  – Any circuit input or register output to any register input or circuit output.
  – “setup time” for circuit outputs depends on what it connects to
  – “clk-Q time” for circuit inputs depends on from where it comes.
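
A worked instance of the clock-period constraint may help. The delay values below are hypothetical, chosen only to make the arithmetic easy; they are not taken from the lecture.

```latex
% Assumed values: \tau_{clk \to Q} = 1 ns, \tau_{CL} = 5 ns, \tau_{setup} = 1 ns
\begin{align*}
T &\ge \tau_{clk \to Q} + \tau_{CL} + \tau_{setup} = 1\,\text{ns} + 5\,\text{ns} + 1\,\text{ns} = 7\,\text{ns} \\
f_{max} &= \frac{1}{T_{min}} = \frac{1}{7\,\text{ns}} \approx 143\,\text{MHz}
\end{align*}
```

The constraint must hold for every path, so the path with the largest sum sets the minimum clock period.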


Gate Switching Behavior
• Inverter:
• NAND gate:


Gate Delay
• Cascaded gates:

[Figure: “transfer curve” for inverter, Vout versus Vin]


Gate Delay
• Fan-out:

[Figure: gate 1 driving gates 2 and 3]

• The delay of a gate is proportional to its output capacitance, because gates 2 and 3 turn on/off at a later time. (It takes longer for the output of gate 1 to reach the switching threshold of gates 2 and 3 as we add more output capacitance.)



“Critical” Path

• Critical Path: the path with the maximum delay, from any input to any output.
  – In general, we include register set-up and clk-to-Q times in critical path calculation.

• What is the critical path in this circuit?

• Why do we care about the critical path?


Delay in Flip-flops
• Setup time results from delay through first latch.
• Clock to Q delay results from delay through second latch.

[Figure: D, clk, and Q waveforms showing setup time and clock to Q delay; flip-flop built from two latches with switches controlled by clk and clk’]


Clock Skew (cont.)

[Figure: registers on either side of CL clocked by CLK and CLK’, where CLK’ is CLK after clock skew (delay in distribution)]

• If clock period T = TCL+Tsetup+Tclk→Q, circuit will fail.
• Therefore:
  1. Control clock skew
     a) Careful clock distribution. Equalize path delay from clock source to all clock loads by controlling wire delay and buffer delay.
     b) Don’t “gate” clocks.
  2. T ≥ TCL+Tsetup+Tclk→Q + worst case skew.
• Most modern large high-performance chips (microprocessors) control end to end clock skew to a few tenths of a nanosecond.


Lecture 16 - Power

March 15, 2005



Basics
• Power supply provides energy for charging and discharging wires and transistor gates. The energy supplied is stored & then dissipated as heat.

  Power: $P \equiv dw/dt$. Rate of work being done w.r.t. time. Rate of energy being used.

  Units: $P = E/\Delta t$, Watts = Joules/second

• If a differential amount of charge dq is given a differential increase in energy dw, the potential of the charge is increased by: $V = dw/dq$

• By definition of current: $I = dq/dt$

  $dw/dt = \frac{dw}{dq} \times \frac{dq}{dt} = P = V \times I$   (A very practical formulation!)

  If we would like to know total energy: $w = \int_{-\infty}^{t} P\,dt$


Metrics
• How does MIPS/watt relate to energy?
• Average power consumption = energy / time

  MIPS/watt = (instructions/sec) / (joules/sec) = instructions/joule

  – therefore an equivalent metric (reciprocal) is energy per operation (E/op)

• E/op is more general - applies to more than processors
  – also, usually more relevant, as battery life is limited by total energy draw.
  – This metric gives us a measure to use to compare two alternative implementations of a particular function.


Power in CMOS

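[Figure: pullup network between Vdd and the output node, pulldown network between the output node and GND, load capacitance C; waveforms of i(t) and v(t) between t0 and t1 as the node switches from 0 to 1 and v(t) rises to Vdd; labels: energy supplied, energy dissipated, energy stored]

Switching Energy: energy used to switch a node

Calculate energy dissipated in pullup (with v0 = 0 and v1 = Vdd):

$$E_{sw} = \int_{t_0}^{t_1} P(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\cdot i(t)\,dt = \int_{t_0}^{t_1} (V_{dd} - v)\cdot c\,\frac{dv}{dt}\,dt$$

$$= cV_{dd}\int_{v_0}^{v_1} dv - c\int_{v_0}^{v_1} v\,dv = cV_{dd}^2 - \tfrac{1}{2}cV_{dd}^2 = \tfrac{1}{2}cV_{dd}^2$$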

An equal amount of energy is dissipated on pulldown.


Controlling Energy Consumption

• Largest contributing component to CMOS power consumption is switching power:

  $$P_{avg} = n \cdot \alpha_{avg} \cdot f \cdot c_{avg} \cdot \tfrac{1}{2} V_{dd}^2$$

• Factors influencing power consumption:
  – n: total number of nodes in circuit
  – α: activity factor (probability of each node switching)
  – f: clock frequency (does this affect energy consumption?)
  – Vdd: power supply voltage

• What control do you have over each factor?
• How does each affect the total Energy?

What control do you have as a designer?

In EECS150 design projects, we will not optimize for power consumption.
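
To get a feel for the factors, here is a worked example. It assumes the switching-power expression has the form $P_{avg} = n \cdot \alpha_{avg} \cdot f \cdot c_{avg} \cdot \tfrac{1}{2}V_{dd}^2$, and every numeric value is hypothetical, chosen only for easy arithmetic.

```latex
% Assumed values: n = 10^6 nodes, \alpha_{avg} = 0.1, f = 100 MHz,
% c_{avg} = 10 fF, V_{dd} = 2.5 V
\begin{align*}
P_{avg} &= 10^{6} \times 0.1 \times 10^{8}\,\text{Hz} \times 10^{-14}\,\text{F} \times \tfrac{1}{2} \times (2.5\,\text{V})^2 \\
        &= 0.1 \times 0.5 \times 6.25\ \text{W} \approx 0.31\ \text{W} \\[4pt]
\text{with } V_{dd} = 1.25\,\text{V:}\quad
P_{avg} &= 0.1 \times 0.5 \times 1.5625\ \text{W} \approx 0.078\ \text{W}
\end{align*}
```

Halving the supply voltage cuts switching power by a factor of four, while halving n, α, or f only halves it. Lowering f alone reduces power but not the energy needed for a fixed amount of work, since the same number of node transitions still occurs.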



Lecture 17 – Memory 1

March 17, 2005


Standard Internal Memory Organization

• Special circuit tricks are used for the cell array to improve storage density. (We will look at these later)

• RAM/ROM naming convention:
  – examples: 32 X 8, "32 by 8" => 32 8-bit words
  – 1M X 1, "1 meg by 1" => 1M 1-bit words

2-D array of bit cells. Each cell stores one bit of data.


Read Only Memory (ROM)
• Simple form of memory. No write operation needed.
• Functional Equivalence:

[Figure: address decoder driving a bit-cell array] Connections to Vdd used to store a logic 1, connections to GND for storing logic 0.

• Full tri-state buffers are not needed at each cell point.
• In practice, single transistors are used to implement zero cells. Logic ones are derived through precharging or bit-line pullup transistor.


Column MUX in ROMs and RAMs:
• Controls physical aspect ratio
  – Important for physical layout and to control delay on wires.

• In DRAM, allows time-multiplexing of chip address pins



Cascading Memory Modules (or chips)
• Example: assemblage of 256 x 8 ROM using 256 x 4 modules:
• example: 1K x * ROM using 256 x 4 modules:
• each module has tri-state outputs:
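
Both kinds of cascading can be sketched in Verilog. Everything here is hypothetical: the rom256x4 module, its oe (output enable) pin, and the module names are assumptions, and the ROM contents are left unspecified. The 512 x 4 case is not one of the slide's examples; it is included only to show how tri-state outputs let two modules share data lines.

```verilog
// Hypothetical 256 x 4 ROM module with a tri-state output.
module rom256x4 (
    input  wire [7:0] addr,
    input  wire       oe,    // output enable (assumed pin)
    output wire [3:0] data
);
    reg [3:0] mem [0:255];   // contents would be loaded, e.g. with $readmemh
    assign data = oe ? mem[addr] : 4'bz;
endmodule

// Wider: 256 x 8. Both modules see the same address;
// each one supplies 4 of the 8 data bits.
module rom256x8 (
    input  wire [7:0] addr,
    input  wire       oe,
    output wire [7:0] data
);
    rom256x4 hi (.addr(addr), .oe(oe), .data(data[7:4]));
    rom256x4 lo (.addr(addr), .oe(oe), .data(data[3:0]));
endmodule

// Deeper: 512 x 4. The extra address bit selects which module drives
// the shared data lines; the other module's output stays at z.
module rom512x4 (
    input  wire [8:0] addr,
    output wire [3:0] data
);
    rom256x4 bank0 (.addr(addr[7:0]), .oe(~addr[8]), .data(data));
    rom256x4 bank1 (.addr(addr[7:0]), .oe( addr[8]), .data(data));
endmodule
```

Going deeper still follows the same pattern, with a decoder on the upper address bits enabling exactly one module at a time.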


Memory Components Types:
• Volatile:
  – Random Access Memory (RAM):
    • DRAM "dynamic"
    • SRAM "static"
• Non-volatile:
  – Read Only Memory (ROM):
    • Mask ROM "mask programmable"
    • EPROM "electrically programmable"
    • EEPROM "electrically erasable programmable"
    • FLASH memory - similar to EEPROM with programmer integrated on chip


Volatile Memory Comparison

• SRAM Cell
  – Larger cell → lower density, higher cost/bit
  – No refresh required
  – Simple read → faster access
  – Standard IC process → natural for integration with logic

• DRAM Cell
  – Smaller cell → higher density, lower cost/bit
  – Needs periodic refresh, and refresh after read
  – Complex read → longer access time
  – Special IC process → difficult to integrate with logic circuits

[Figure: SRAM cell with one word line and two bit lines; DRAM cell with one word line and one bit line]

The primary difference between different memory types is the bit cell.


Dual-ported Memory Internals
• Add decoder, another set of read/write logic, bit lines, word lines:
• Example cell: SRAM
• Repeat everything but cross-coupled inverters.
• This scheme extends up to a couple more ports, then need to add additional transistors.

[Figure: cell array with two address decoders (deca, decb) on the address ports and two sets of r/w logic on the data ports; dual-ported SRAM cell with word lines WL1, WL2 and bit-line pairs b1, b2]
