Review: Basic Building Blocks


Transcript of Review: Basic Building Blocks

Page 1: Review:  Basic Building Blocks

Review: Basic Building Blocks

Datapath
- Execution units: adder, multiplier, divider, shifter, etc.
- Register file and pipeline registers
- Multiplexers, decoders

Control
- Finite state machines (PLA, ROM, random logic)

Interconnect
- Switches, arbiters, buses

Memory
- Caches (SRAMs), TLBs, DRAMs, buffers

Page 2: Review:  Basic Building Blocks

The 1-bit Binary Adder

1-bit Full Adder (FA): inputs A, B, and Cin; outputs S and Cout

S = A ⊕ B ⊕ Cin

Cout = A&B | A&Cin | B&Cin (majority function)

How can we use it to build a 64-bit adder?

How can we modify it easily to build an adder/subtractor?

How can we make it better (faster, lower power, smaller)?

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

G = A&B
P = A ⊕ B
K = !A & !B

S = P ⊕ Cin
Cout = G | P&Cin
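The full adder equations and the generate/propagate/kill classification can be checked by enumerating all eight input combinations. The following is a minimal sketch; the function names `full_adder` and `carry_status` are illustrative and do not come from the slides.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (S, Cout) for one-bit inputs, using the G/P form."""
    g = a & b            # generate
    p = a ^ b            # propagate
    s = p ^ cin          # S = P xor Cin
    cout = g | (p & cin)  # Cout = G | P&Cin
    return s, cout


def carry_status(a: int, b: int) -> str:
    """Classify an (A, B) pair as kill, propagate, or generate."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"


print("A B Cin Cout S status")
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    # Cross-check against integer addition and the majority form of Cout.
    assert 2 * cout + s == a + b + cin
    assert cout == (a & b) | (a & cin) | (b & cin)
    print(a, b, cin, cout, s, carry_status(a, b))
```

The carry status depends only on A and B, which is why the later carry-chain slides can precompute G and P before the carry-in arrives.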

Page 3: Review:  Basic Building Blocks

FA Gate Level Implementations

[Figure: two gate-level full adder schematics, each with inputs A, B, Cin, outputs S, Cout, and internal nodes t0, t1, t2]

Page 4: Review:  Basic Building Blocks

XOR FA

[Figure: XOR-based full adder schematic with inputs A, B, Cin and outputs S, Cout]

16 transistors

Page 5: Review:  Basic Building Blocks

CPL FA

[Figure: complementary pass-transistor logic (CPL) full adder with dual-rail inputs A/!A, B/!B, Cin/!Cin and dual-rail outputs S/!S, Cout/!Cout]

20+8 transistors, dual rail – beware of threshold drops

Page 6: Review:  Basic Building Blocks

Mirror Adder

[Figure: mirror adder transistor schematic with inputs A, B, Cin and outputs !Cout, !S; the carry branches are labeled kill, generate, 0-propagate, and 1-propagate]

24+4 transistors

Cout = A&B | B&Cin | A&Cin

SUM = A&B&Cin | !Cout&(A | B | Cin)
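The mirror adder forms the sum from the inverted carry, SUM = A&B&Cin | !Cout&(A | B | Cin). A short exhaustive check, sketched below with an illustrative function name `mirror_adder`, confirms that this expression equals the three-input XOR.

```python
from itertools import product


def mirror_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (SUM, Cout) using the mirror adder's logic equations."""
    cout = (a & b) | (b & cin) | (a & cin)
    not_cout = 1 - cout
    # The sum reuses the inverted carry instead of building XOR gates.
    s = (a & b & cin) | (not_cout & (a | b | cin))
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    s, cout = mirror_adder(a, b, cin)
    assert s == a ^ b ^ cin
    assert 2 * cout + s == a + b + cin
```

Reusing !Cout is what lets the sum stage avoid separate XOR gates, at the cost of loading the !Cout node.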

[Figure: mirror adder schematic annotated with transistor sizes]

Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal transistor gates and 2 inverter transistor gates (to form Cin for the next most significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.

Page 7: Review:  Basic Building Blocks

Mirror Adder Features

- The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.

- When laying out the cell, the most critical issue is the minimization of the capacitances at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.

- The transistors connected to Cin are placed closest to the output.

- Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.

Page 8: Review:  Basic Building Blocks

A 64-bit Adder/Subtractor

[Figure: chain of 64 1-bit FAs with inputs A0..A63 and B0..B63, outputs S0..S63, carries C1..C63 between stages, C0 = Cin, C64 = Cout, and an add/subt control line]

Ripple Carry Adder (RCA) built out of 64 FAs

Subtraction: complement all subtrahend bits (XOR gates) and set the low-order carry-in

RCA

- advantage: simple logic, so small (low cost)

- disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
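The adder/subtractor described here can be modeled bit by bit. The sketch below assumes unsigned 64-bit operands passed as Python integers; the names `ripple_add_sub`, `N`, and `MASK` are illustrative, and the add/subt control is modeled as a boolean.

```python
N = 64
MASK = (1 << N) - 1


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (S, Cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_add_sub(a: int, b: int, subtract: bool = False) -> tuple[int, int]:
    """Return (N-bit result, carry-out C64) of a + b or a - b."""
    ctrl = 1 if subtract else 0
    carry = ctrl                      # low-order carry-in is set for subtraction
    result = 0
    for i in range(N):                # the carry ripples through all N stages
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl    # XOR gate complements B when subtracting
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry


print(ripple_add_sub(5, 3))
print(ripple_add_sub(5, 3, subtract=True))
```

The loop makes the O(N) delay visible: stage i cannot produce its carry until stage i-1 has finished.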

Page 9: Review:  Basic Building Blocks

Ripple Carry Adder (RCA)

[Figure: four FAs chained through their carries, with inputs A0..A3 and B0..B3, outputs S0..S3, C0 = Cin, and Cout = C4]

T = O(N) worst case delay

Tadder ≈ TFA(A,B→Cout) + (N-2)TFA(Cin→Cout) + TFA(Cin→S)

Real Goal: Make the fastest possible carry path

Page 10: Review:  Basic Building Blocks

Inversion Property

[Figure: an FA with inputs A, B, Cin and outputs S, Cout, shown beside the same FA with all inputs and outputs inverted]

!S (A, B, Cin) = S (!A, !B, !Cin)

!Cout (A, B, Cin) = Cout (!A, !B, !Cin)

Inverting all inputs to a FA results in inverted values for all outputs
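The inversion property is easy to confirm exhaustively. This is a small sketch with illustrative helper names (`full_adder`, `inv`); it checks both identities for all eight input combinations.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (S, Cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def inv(x: int) -> int:
    """Logical inversion of a single bit."""
    return 1 - x


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    s_inv, cout_inv = full_adder(inv(a), inv(b), inv(cin))
    assert s_inv == inv(s)        # !S(A,B,Cin) = S(!A,!B,!Cin)
    assert cout_inv == inv(cout)  # !Cout(A,B,Cin) = Cout(!A,!B,!Cin)
```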

Page 11: Review:  Basic Building Blocks

Exploiting the Inversion Property

[Figure: 4-bit ripple chain of FA’ cells with inputs A0..A3 and B0..B3, outputs S0..S3, C0 = Cin, and Cout = C4; legend: regular cell, inverted cell]

Now need two “flavors” of FAs: a regular cell and an inverted cell

Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).

Page 12: Review:  Basic Building Blocks

Fast Carry Chain Design

The key to fast addition is a low latency carry network

What matters is whether, in a given position, a carry is

- generated: Gi = Ai & Bi = AiBi

- propagated: Pi = Ai ⊕ Bi (sometimes use Ai | Bi)

- annihilated (killed): Ki = !Ai & !Bi

Giving a carry recurrence of

Ci+1 = Gi | PiCi

C1 = G0 | P0C0

C2 = G1 | P1G0 | P1P0 C0

C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0

C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
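The expanded expression for C4 can be checked against the recurrence Ci+1 = Gi | PiCi and against ordinary integer addition. The sketch below uses illustrative names (`carries_by_recurrence`, `c4_expanded`) and enumerates every pair of 4-bit operands.

```python
from itertools import product


def carries_by_recurrence(g: list[int], p: list[int], c0: int) -> list[int]:
    """Return [C0, C1, ..., Cn] using Ci+1 = Gi | Pi & Ci."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def c4_expanded(g: list[int], p: list[int], c0: int) -> int:
    """C4 written out as a two-level expression in G, P, and C0."""
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c0))


for a, b in product(range(16), repeat=2):
    for c0 in (0, 1):
        abits = [(a >> i) & 1 for i in range(4)]
        bbits = [(b >> i) & 1 for i in range(4)]
        g = [x & y for x, y in zip(abits, bbits)]
        p = [x ^ y for x, y in zip(abits, bbits)]
        c = carries_by_recurrence(g, p, c0)
        assert c[4] == c4_expanded(g, p, c0)
        assert c[4] == (a + b + c0) >> 4
```

The expanded form needs no intermediate carries, but its gate fan-in grows with the bit position, which is why the expansion is usually stopped after a few bits.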

Page 13: Review:  Basic Building Blocks

Manchester Carry Chain

Switches controlled by Gi and Pi

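Total delay is the sum of

- time to form the switch control signals Gi and Pi

- setup time for the switches

- signal propagation delay through N switches in the worst case

[Figure: one Manchester carry chain stage between nodes !Ci and !Ci+1, with switches controlled by Gi, Pi, and clk]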

Page 14: Review:  Basic Building Blocks

4-bit Sliced MCC Adder

[Figure: four bit slices with inputs A0..A3 and B0..B3, each forming G and P and producing S0..S3; the clocked carry chain runs from !C0 through !C1, !C2, !C3 to !C4]

Page 15: Review:  Basic Building Blocks

Domino Manchester Carry Chain Circuit

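[Figure: domino Manchester carry chain with carry-in Ci,0, clock clk, control signals P0..P3 and G0..G3, carry-out Ci,4, and transistor sizes annotated; the four intermediate chain nodes carry the values below]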

!(G0 | P0 Ci,0)

!(G1 | P1G0 | P1P0 Ci,0)

!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)

!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)

Page 16: Review:  Basic Building Blocks

Binary Adder Landscape

synchronous word parallel adders

- ripple carry adders (RCA): T = O(N), A = O(N)
- carry prop min adders
  - signed-digit adders: T = O(1), A = O(N)
  - fast carry prop adders
    - Manchester carry chain: T = O(N), A = O(N)
    - carry select
    - parallel prefix: T = O(log N), A = O(N log N)
    - conditional sum
    - carry skip: T = O(√N), A = O(N)
  - residue adders

Page 17: Review:  Basic Building Blocks

Carry-Skip (Carry-Bypass) Adder

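If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the block itself kills or generates the carry internally

[Figure: 4-bit ripple block of FAs with inputs A0..A3 and B0..B3, carry-in Ci,0, and outputs S0..S3, plus a bypass path from Ci,0 to Co,3 controlled by BP]

BP = P0 & P1 & P2 & P3 (“Block Propagate”)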
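The bypass rule can be modeled for a single 4-bit block. In the sketch below the function name `skip_block` and the LSB-first bit lists are assumptions made for illustration; the final selection between Ci,0 and the rippled carry stands in for the bypass path.

```python
def skip_block(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """4-bit carry-skip block; bits are listed LSB first. Returns (sum_bits, Co,3)."""
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    bp = p[0] & p[1] & p[2] & p[3]       # block propagate
    carry = cin
    sums = []
    for x, y, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (x & y) | (pi & carry)   # ripple inside the block
    cout = cin if bp else carry          # bypass when every bit propagates
    return sums, cout


# Exhaustive check: the bypassed carry-out always equals the rippled one.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            a_bits = [(a >> i) & 1 for i in range(4)]
            b_bits = [(b >> i) & 1 for i in range(4)]
            sums, cout = skip_block(a_bits, b_bits, cin)
            value = sum(s << i for i, s in enumerate(sums)) + (cout << 4)
            assert value == a + b + cin
```

The bypass never changes the result; it only shortens the time the carry-out takes to become valid when the whole block propagates.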

Page 18: Review:  Basic Building Blocks

Carry-Skip Chain Implementation

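[Figure: carry-skip chain circuit; a Manchester-style chain with inputs Cin, G0..G3, and P0..P3 produces !Cout, and BP selects between the block carry-in and the block carry-out to form the carry-out]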

Page 19: Review:  Basic Building Blocks

4-bit Block Carry-Skip Adder

Worst-case delay: a carry from bit 0 to bit 15. The carry is generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups, and ripples through the last group from bit 12 to bit 15 (B is the group size in bits).

[Figure: 16-bit carry-skip adder built from four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, Carry Propagation, and Sum stages; Ci,0 enters the first block]

Tadd = tsetup + B tcarry + ((N/B) - 1) tskip + B tcarry + tsum

Page 20: Review:  Basic Building Blocks

Optimal Block Size and Time

Assuming one stage of ripple (tcarry) has the same delay as one skip logic stage (tskip) and both are 1

TCSkA = 1 + B + (N/B - 1) + B + 1
      = 2B + N/B + 1

where the five terms are, in order, tsetup, the ripple in block 0, the skips, the ripple in the last block, and tsum

So the optimal block size, B, is

dTCSkA/dB = 2 - N/B² = 0 ⇒ Bopt = √(N/2)

And the optimal time is

Optimal TCSkA = 2√(2N) + 1
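The closed-form optimum can be compared with a brute-force search over integer block sizes. This sketch uses the unit-delay model from this slide; the names `t_cska` and `optimum` are illustrative, and the search deliberately ignores whether B divides N evenly.

```python
import math


def t_cska(n: int, b: float) -> float:
    """Carry-skip delay with tsetup = tcarry = tskip = tsum = 1."""
    return 1 + b + (n / b - 1) + b + 1


def optimum(n: int) -> tuple[float, float]:
    """Closed-form (Bopt, optimal delay) from setting dT/dB = 0."""
    return math.sqrt(n / 2), 2 * math.sqrt(2 * n) + 1


print("N  Bopt  Topt  best integer B  T at that B")
for n in (8, 16, 32, 64):
    b_opt, t_opt = optimum(n)
    best_b = min(range(1, n + 1), key=lambda b: t_cska(n, b))
    print(n, round(b_opt, 2), round(t_opt, 2), best_b, round(t_cska(n, best_b), 2))
```

For widths where √(N/2) is not an integer, the best integer block size gives a delay slightly above the closed-form value.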

Page 21: Review:  Basic Building Blocks

Carry-Skip Adder Extensions

Variable block sizes

- A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be made bigger without increasing the overall delay

[Figure: carry-skip adder with variable block sizes between Cin and Cout]

Multiple levels of skip logic

- The second-level skip signal is the AND of the first-level skip signals (BPs)

[Figure: carry-skip adder between Cin and Cout with skip level 1 and skip level 2]

Page 22: Review:  Basic Building Blocks

Carry-Skip Adder Comparisons

[Figure: comparison chart of RCA, CSkA, and VSkA for 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70; points annotated with block sizes B=2 through B=6]

Page 23: Review:  Basic Building Blocks

Parallel Prefix Adders (PPAs)

Define carry operator € on (G,P) signal pairs

(G’’,P’’) € (G’,P’) = (G,P)

where G = G’’ | P’’G’ and P = P’’P’

€ is associative, i.e.,

[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]

[Figure: gate-level € cell with inputs G’, G’’, P’’ and output !G]
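The associativity of the carry operator can be verified by brute force over all (G,P) pairs. In this sketch the operator is written as a function `carry_op(hi, lo)`, a name chosen for illustration, where `hi` plays the role of (G’’,P’’) and `lo` the role of (G’,P’).

```python
from itertools import product


def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """(G'', P'') op (G', P') -> (G'' | P''&G', P''&P')."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


pairs = list(product((0, 1), repeat=2))

# Associative: the grouping of three operands does not matter.
for x, y, z in product(pairs, repeat=3):
    assert carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))

# Operand order does matter: a generate on the left survives, on the right
# it is blocked when the left operand does not propagate.
assert carry_op((1, 0), (0, 0)) == (1, 0)
assert carry_op((0, 0), (1, 0)) == (0, 0)
```

Associativity is what allows the carries to be computed by a tree instead of a chain; the sensitivity to operand order is why the bit positions must stay in sequence.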

Page 24: Review:  Basic Building Blocks

PPA General Structure

Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order, but note that it is not commutative

Measures to consider
- number of € cells
- tree cell depth (time)
- tree cell area
- cell fan-in and fan-out
- max wiring length
- wiring congestion
- delay path variation (glitching)

[Figure: PPA block structure, in three stages: Pi, Gi logic (1 unit delay); Ci parallel prefix logic tree (1 unit delay per level); Si logic (1 unit delay)]

Page 25: Review:  Basic Building Blocks

Brent-Kung PPA

[Figure: 16-bit Brent-Kung parallel prefix computation; inputs (G0,P0) through (G15,P15) and Cin feed a tree of € cells that produces C1 through C16. Figure annotations: T = log2N, T = log2N - 2, A = 2log2N, A = N/2]

Page 26: Review:  Basic Building Blocks

Kogge-Stone PPF Adder

[Figure: 16-bit Kogge-Stone parallel prefix computation; inputs (G0,P0) through (G15,P15) and Cin feed rows of € cells that produce C1 through C16. Figure annotations: T = log2N, A = log2N, A = N]

Tadd = tsetup + (log2N) t€ + tsum
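A behavioral model shows how the Kogge-Stone prefix tree produces every carry in log2N levels. The sketch below is an illustration under stated assumptions, not the circuit from the slide: the names `kogge_stone_add` and `carry_op` are hypothetical, and Cin is folded into the generate signal of bit 0, which is one common way to handle it.

```python
def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Carry operator on (G, P) pairs; hi is the more significant group."""
    g2, p2 = hi
    g1, p1 = lo
    return g2 | (p2 & g1), p2 & p1


def kogge_stone_add(a: int, b: int, cin: int = 0, n: int = 16) -> tuple[int, int, int]:
    """Return (n-bit sum, carry-out, number of prefix levels)."""
    # Setup stage: per-bit generate and propagate.
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    p_bit = [x ^ y for x, y in zip(abits, bbits)]
    g_bit = [x & y for x, y in zip(abits, bbits)]
    gp = list(zip(g_bit, p_bit))
    # Assumption: fold Cin into bit 0 so the tree covers exactly n positions.
    gp[0] = (g_bit[0] | (p_bit[0] & cin), p_bit[0])

    # Prefix tree: at each level, position i combines with position i - dist.
    dist, levels = 1, 0
    while dist < n:
        gp = [carry_op(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(n)]
        dist *= 2
        levels += 1

    # carries[i] is the carry into bit i; carries[n] is the carry-out.
    carries = [cin] + [g for g, _ in gp]
    total = 0
    for i in range(n):                    # sum stage: Si = Pi xor Ci
        total |= (p_bit[i] ^ carries[i]) << i
    return total, carries[n], levels


print(kogge_stone_add(1234, 4321))
```

For n = 16 the loop runs four times, matching the log2N term in Tadd; the price is that every level has a € cell at nearly every bit position.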

Page 27: Review:  Basic Building Blocks

More Adder Comparisons

[Figure: comparison chart of RCA, CSkA, VSkA, and KS PPA for 8, 16, 32, 48, and 64 bits; vertical axis from 0 to 70]