Posted on 17-Jan-2016
Content Addressable Memories
Cell Design and Peripheral Circuits
CAM: Introduction CAM vs. RAM
[Figure: RAM read vs. CAM search on a 6-word x 8-bit memory]
Address   RAM contents   CAM contents
5         00110111       11000111
4         10001101       00011101
3         10111101       10001101
2         11001011       11001011
1         00001101       00001101
0         01010101       01010101
RAM: Address In = 4 → Data Out = 10001101
CAM: Data In = 10001101 → Address Out = 3
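The two lookup directions can be sketched in Python: a RAM maps an address to a word, while a CAM compares an input word against every stored entry and returns the matching address(es). The contents are the example words from the figure; `ram_read` and `cam_search` are hypothetical names, and a real CAM performs all comparisons in parallel in hardware rather than in a loop.

```python
# Example contents from the figure, indexed by address 0..5.
RAM_CONTENTS = ["01010101", "00001101", "11001011", "10111101", "10001101", "00110111"]
CAM_CONTENTS = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]


def ram_read(address: int) -> str:
    """RAM: an address goes in, the stored word comes out."""
    return RAM_CONTENTS[address]


def cam_search(data: str) -> list[int]:
    """CAM: a data word goes in, the addresses of all matching entries come out.

    The hardware compares every entry at once; this loop only models the result.
    """
    return [addr for addr, word in enumerate(CAM_CONTENTS) if word == data]


print(ram_read(4))             # 10001101
print(cam_search("10001101"))  # [3]
```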
CAM: Introduction Binary CAM Cell
ML pre-charged to VDD
Match: ML remains at VDD
Mismatch: ML discharges
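[Figure: binary CAM cell schematic: word line WL, bit lines BL1/BL1c, search lines SL1/SL1c, match line ML; storage nodes BL1_cell/BL1c_cell; transistors P1, P2, N1 to N8]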
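The pre-charge/evaluate behaviour of the match line can be modelled at the logic level: every cell whose stored bit differs from the search bit (an XOR of the two) turns on a pull-down path, and the ML stays at VDD only if no path conducts. This is a behavioural sketch, not a circuit simulation; `evaluate_ml` is a hypothetical helper.

```python
def evaluate_ml(stored: str, search: str) -> tuple[bool, int]:
    """Return (ML stays high, number of conducting pull-down paths) for one CAM word.

    The ML is assumed pre-charged to VDD; each bit position where the stored bit
    differs from the search bit (XOR = 1) adds one discharge path.
    """
    if len(stored) != len(search):
        raise ValueError("stored word and search word must have the same width")
    pull_downs = sum(s != k for s, k in zip(stored, search))
    return pull_downs == 0, pull_downs


print(evaluate_ml("10001101", "10001101"))  # (True, 0): match, ML remains at VDD
print(evaluate_ml("10001101", "10001111"))  # (False, 1): mismatch, ML discharges
```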
CAM: Introduction Ternary CAM (TCAM)
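[Figure: two search examples on a 6-word x 8-bit TCAM (X = don't care)]
Example 1: Input Keyword = XXX01101
Address   Stored word   Result
5         00X00111
4         01001101      Match
3         00011101
2         110010X1
1         10101101      Match
0         010X0101
Example 2: Input Keyword = 10001101
Address   Stored word   Result
5         XXXXX111
4         XXXX1101      Match
3         XXX11101
2         XX001011
1         X0001101      Match
0         01010101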
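Ternary matching can be sketched in Python: a bit position matches when either the stored bit or the keyword bit is X, or when both bits are equal. The stored words are the ones from the two examples in the figure; `ternary_bit_match` and `tcam_search` are hypothetical names, and the loop again stands in for a parallel hardware search.

```python
def ternary_bit_match(stored_bit: str, key_bit: str) -> bool:
    """A bit position matches if either side is 'X' (don't care) or both bits are equal."""
    return stored_bit == "X" or key_bit == "X" or stored_bit == key_bit


def tcam_search(entries: dict[int, str], key: str) -> list[int]:
    """Addresses of all entries that match the key, listed from the highest address down."""
    return [
        addr
        for addr in sorted(entries, reverse=True)
        if len(entries[addr]) == len(key)
        and all(ternary_bit_match(s, k) for s, k in zip(entries[addr], key))
    ]


# Stored words from the figure, keyed by address.
TABLE_1 = {5: "00X00111", 4: "01001101", 3: "00011101",
           2: "110010X1", 1: "10101101", 0: "010X0101"}
TABLE_2 = {5: "XXXXX111", 4: "XXXX1101", 3: "XXX11101",
           2: "XX001011", 1: "X0001101", 0: "01010101"}

print(tcam_search(TABLE_1, "XXX01101"))  # [4, 1]
print(tcam_search(TABLE_2, "10001101"))  # [4, 1]
```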
CAM: Introduction TCAM Cell
Global masking: via the SLs; Local masking: via the BLs
BL1  BL2  Stored value
0    1    0
1    0    1
1    1    X (don't care)
0    0    N.A. (not used)
[Figure: TCAM cell: two RAM cells (word line WL; bit lines BL1/BL1c and BL2/BL2c) feeding comparison logic connected to search lines SL1, SL2 and match line ML]
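A logic-level sketch of the encoding table, under stated assumptions: the comparison logic is assumed to discharge the ML when SL1 = 1 while BL1 = 0, or when SL2 = 1 while BL2 = 0, and the search lines are assumed to use the same encoding as the bit lines, with SL1 = SL2 = 0 masking the bit globally. These choices are not given on this slide; they are picked to be consistent with the mismatch-type table in the dual-ML section (Type I: SL1 = 0, SL2 = 1, BL1 = 1, BL2 = 0; Type II: SL1 = 1, SL2 = 0, BL1 = 0, BL2 = 1). `cell_mismatch` is a hypothetical helper.

```python
# (BL1, BL2) encoding of a stored ternary value, from the table; (0, 0) is unused (N.A.).
ENCODE = {"0": (0, 1), "1": (1, 0), "X": (1, 1)}

# Assumption: search lines use the same encoding; SL1 = SL2 = 0 masks the bit (global masking).
SEARCH = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def cell_mismatch(stored: str, sl1: int, sl2: int) -> bool:
    """Assumed comparison logic: a pull-down path conducts if (SL1 = 1 and BL1 = 0)
    or (SL2 = 1 and BL2 = 0), i.e. each path is gated by a complementary storage node."""
    bl1, bl2 = ENCODE[stored]
    return bool((sl1 and not bl1) or (sl2 and not bl2))


for stored in "01X":
    for key in "01X":
        result = "mismatch" if cell_mismatch(stored, *SEARCH[key]) else "match"
        print(f"stored {stored}, search {key}: {result}")
```

With this model a stored X (local masking) and a search X (global masking) both leave the ML untouched, as the slide title describes.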
CAM: Introduction DRAM based TCAM Cell
Advantage: higher bit density
Drawbacks: slower table update, expensive process, refresh circuitry, scaling issues (leakage)
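[Figure: DRAM-based TCAM cell schematic: word line WL, bit lines BL1/BL2, search lines SL1/SL2, match line ML; storage nodes BL1_cell/BL2_cell; transistors N3 to N8]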
CAM: Introduction SRAM based TCAM Cell
Advantages: standard CMOS process, fast table update
Drawback: large area (16T)
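[Figure: SRAM-based TCAM cell schematic: word line WL, bit lines BL1/BL1c and BL2/BL2c, search lines SL1/SL2, match line ML; storage nodes BL1c_cell/BL2c_cell]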
CAM: Introduction Block diagram of a 256 x 144 TCAM
[Figure: block diagram of a 256 x 144 TCAM: 256 words with match lines ML0 to ML255, each word made of CAM cells 0 to 143 (bit lines BL1c, BL2c); SL drivers drive the search lines SL1(0)/SL2(0) to SL1(143)/SL2(143); each ML feeds an ML sense amplifier (MLSA) with output MLSO(0) to MLSO(255)]
CAM: Introduction Why low-power TCAMs?
Parallel search leads to very high power (e.g., 2 Mb Sibercore TCAM at 66 MHz, 66 Msps: 3.4 W)
IPv6 and OC-768 call for larger word sizes and more entries, hence even higher power
Embedded applications (SoC)
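The quoted Sibercore figures translate into energy per search with simple arithmetic: power divided by search rate gives the energy of one parallel search. The per-bit figure assumes that all 2 Mb are searched in every cycle and that 2 Mb means 2 × 2^20 bits; both are assumptions, not data from the slide.

```python
power_w = 3.4      # quoted power of the 2 Mb Sibercore TCAM (W)
rate_sps = 66e6    # quoted search rate: 66 Msps
bits = 2 * 2**20   # assumption: 2 Mb = 2 x 2^20 bits, all searched every cycle

energy_per_search = power_w / rate_sps       # joules per search
energy_per_bit = energy_per_search / bits    # joules per bit per search

print(f"{energy_per_search * 1e9:.1f} nJ/search")     # 51.5 nJ/search
print(f"{energy_per_bit * 1e15:.1f} fJ/bit/search")   # 24.6 fJ/bit/search
```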
CAM: Introduction
Why high-performance TCAMs?
OC-768: 135M packets/s (7.4 ns/packet)
Application complexity: multiple searches per packet
IPv6: larger word size, hence longer search time
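The 7.4 ns figure is the reciprocal of the OC-768 packet rate. Assuming that all searches for a packet run sequentially on one TCAM (an assumption), each search must fit in a fraction of that budget; the search counts below are hypothetical.

```python
packet_rate = 135e6               # OC-768: 135M packets/s
t_packet_ns = 1e9 / packet_rate   # time available per packet, about 7.41 ns

for searches_per_packet in (1, 2, 4):  # hypothetical numbers of searches per packet
    budget_ns = t_packet_ns / searches_per_packet
    print(f"{searches_per_packet} search(es)/packet -> {budget_ns:.2f} ns per search")
```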
CAM: Design Techniques
Cell Design: 12T Static TCAM cell*
A stored '0' is retained by leakage (VWL ≈ 200 mV)
Advantage: high density
Concerns: leakage (3 orders of magnitude), noise margin, soft errors (node S); unsuitable for READ
* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
CAM: Design Techniques
Cell Design: NAND vs. NOR Type CAM
NAND-type CAM: low power, but suffers from charge sharing and is slow
[Figure: NAND-type CAM (CAM cells 0 to N in series along ML_NAND, sensed by a sense amplifier SA) and NOR-type CAM (CAM cells 0 to N in parallel on ML_NOR, sensed by SA); cell schematics with BL1, BL1c, WL, SL1, SL1c and VDD]
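A behavioural sketch of why the NAND-type ML saves power: in a NOR-type ML, any mismatching cell discharges the pre-charged ML, while in a NAND-type ML the series chain conducts (and discharges the ML) only when every cell matches. Counting discharged MLs is used here as a rough proxy for ML switching energy, since every discharged ML must be pre-charged again; charge sharing and series-chain delay are ignored. Function names and the example words are hypothetical.

```python
def nor_ml(stored: str, key: str) -> tuple[bool, bool]:
    """NOR-type ML: cells pull the pre-charged ML down in parallel on any mismatch.
    Returns (match, ml_discharged)."""
    mismatch = any(s != k for s, k in zip(stored, key))
    return not mismatch, mismatch


def nand_ml(stored: str, key: str) -> tuple[bool, bool]:
    """NAND-type ML: the series chain conducts only if every cell matches, so the
    pre-charged ML is discharged only on a match. Returns (match, ml_discharged)."""
    chain_conducts = all(s == k for s, k in zip(stored, key))
    return chain_conducts, chain_conducts


words = ["10001101", "00011101", "11000111", "01010101"]  # hypothetical table
key = "10001101"
nor_discharges = sum(nor_ml(w, key)[1] for w in words)
nand_discharges = sum(nand_ml(w, key)[1] for w in words)
print(nor_discharges, nand_discharges)  # 3 1
```

Because most words mismatch in a typical search, the NOR-type array discharges almost every ML, while the NAND-type array discharges only the few matching ones.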
CAM: Design Techniques MLSA Design: Conventional
Pre-charge ML to VDD
Match VML = VDD
Mismatch VML = 0
[Figure: conventional MLSA: ML pre-charged to VDD under control of PRE; output MLSO]
CAM: Design Techniques MLSA Design: Current Race Sensing*
[Figure: current-race MLSA schematic and timing: dummy ML with delay, signals RST, RSTc, VDD, ML, MLOFF, MLSO, MATCH]
* I. Arsovski, T. Chandler, A. Sheikholeslami, IEEE JSSC, vol. 38, no. 1, pp. 155-158, Jan. 2003
CAM: Design Techniques
MLSA Design: Current Race Sensing
No need to reset SLs in every clock cycle
Lower ML voltage swing: Vth + ∆V ≈ ½VDD
Trade-off between speed, ML charging current and voltage margin
[Figure: ML[0], MLSO[0] and ML[1] waveforms showing the voltage margin]
CAM: Design Techniques
MLSA Design: Charge Redistribution*
Fast pre-charge of the ML through MREF
Mismatch: SP = '0', MLSO = '1'; sensing requires IML > IREF > leakage
ML swing limited to ∆VML ≈ VREF − Vth
Drawback: FAST_PRE draws high power
* P. Vlasenko, D. Perry, MOSAID Technologies Inc., US Patent 6717876, April 6, 2004
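[Figure: charge-redistribution MLSA schematic: signals FAST_PRE, RST, VREF, VDD, SP, MLSO; reference current IREF; ML capacitance CML; capacitance CSP; transistor MREF]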
CAM: Design Techniques
MLSA Design: Charge Injection* Reset ML and pre-charge CINJ
Charge share CINJ and CML
Match VML = CINJ x VDD/(CINJ +CML)
Mismatch VML = 0
Small ∆VML, hence poor noise margin
Area penalty (CINJ)
[Figure: charge-injection MLSA schematic: VDD, ML, MLSO, CML, CINJ; control signals OFFSET, SA, CHARGE_IN, PRE, RST]
* G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, SONY Corp., Proc. IEEE CICC, pp. 387-390, Sep. 2003
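The match-case ML voltage follows from charge conservation: CINJ, pre-charged to VDD, shares its charge with the reset ML. A short numerical sketch shows the trade-off listed above: a small CINJ gives a small ∆VML (poor noise margin), while a large CINJ costs area. The supply voltage and capacitance values are hypothetical.

```python
def ml_voltage_after_injection(c_inj: float, c_ml: float, vdd: float) -> float:
    """Charge sharing between C_INJ (pre-charged to VDD) and a reset ML (0 V):
    V_ML = C_INJ * VDD / (C_INJ + C_ML). Capacitances only need consistent units."""
    return c_inj * vdd / (c_inj + c_ml)


VDD = 1.8      # hypothetical supply voltage (V)
C_ML = 200.0   # hypothetical match-line capacitance (fF)
for c_inj in (50.0, 100.0, 200.0):  # hypothetical injection capacitances (fF)
    v_ml = ml_voltage_after_injection(c_inj, C_ML, VDD)
    print(f"C_INJ = {c_inj:.0f} fF -> V_ML on match = {v_ml:.2f} V")
```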
CAM: Design Techniques Low Power: Selective Pre-charge*
MLs are split into two segments; the main search is performed only if the pre-search reports a MATCH
Savings depend on the number of bits in the pre-search and on data statistics
[Figure: pre-search on ML1 (MLSA1, output MLSO1) followed by main search on ML2 (MLSA2, output MLSO2)]
* C. Zukowski and S. Wang, Proc. IEEE ISCAS, pp. 745-770, Jun. 9-12, 1997
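A rough model of the savings, assuming uniformly random, independent data bits: a word's k-bit pre-search segment then matches with probability 2^-k, and only then is its main segment searched. ML energy is approximated as proportional to the number of ML bits evaluated. Real lookup tables are not random, which is why data statistics matter; `relative_ml_energy` and the parameter values are hypothetical.

```python
def relative_ml_energy(n_bits: int, k_pre: int) -> float:
    """Expected fraction of ML bits evaluated per word, relative to a full-width search.

    Assumes random data, so the k-bit pre-search matches with probability 2**-k.
    """
    p_pre_match = 0.5 ** k_pre
    return (k_pre + p_pre_match * (n_bits - k_pre)) / n_bits


for k in (2, 4, 7, 8):
    print(k, round(relative_ml_energy(144, k), 4))
```

Under this model a longer pre-search lowers the chance of an unnecessary main search but costs more itself; for a 144-bit word the minimum falls around k = 7.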
CAM: Design Techniques Low Power: Dual-ML TCAM*
MLSA1 is enabled first; MLSA2 is enabled only if MLSO1 = '1'
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
[Figure: dual-ML TCAM word: CAM cells 0 to N (bit lines BL1c, BL2c; search lines SL1, SL2) on two match lines ML1 and ML2, sensed by MLSA1 (output MLSO1) and MLSA2 (output MLSO2)]
CAM: Design Techniques Low Power: Dual-ML TCAM
Cap(ML1) = Cap(ML2) = ½ C(ML) Same speed, 50% less energy (Ideally!)
Parasitic interconnects degrade both speed and energy
Additional ML increases coupling capacitance
CAM: Design Techniques
Low Power: Dual-ML TCAM Simulation results (144 bits)*
Interconnect capacitance = 27 fF; W/L = 0.6 µm/0.18 µm
                 Old (single ML)   New (dual ML)   Difference
TS (ns)          8.14              8.46            4%
E1 (fJ)          769               426             45%
E2 (fJ)          769               973             26%
(TS: search time)
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
CAM: Design Techniques Low Power: Dual-ML TCAM*
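$E_{AVG} = P_{ML1} \times E_1 + (1 - P_{ML1}) \times E_2$, where $P_{ML1}$ is the probability that ML1 detects the mismatch
MLSA1 cannot detect Type I mismatches; for M mismatches, each equally likely to be Type I or Type II, $P_{ML1} = 1 - (0.5)^M$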
Mismatch SL1 SL2 BL1 BL2
Type I 0 1 1 0
Type II 1 0 0 1
[Figure: part of the cell connected to ML1, with inputs SL1 and BL1c]
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
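The average-energy formula can be evaluated with the simulated energies from the table, taking E1 as the energy when MLSA1 alone resolves the search and E2 as the energy when both sense amplifiers are used (this reading follows from the formula, not from an explicit definition on the slides). Dividing by 144 bits gives fJ/bit/search; the helper names are hypothetical.

```python
E1_FJ = 426.0     # dual-ML energy when MLSA1 alone resolves the search (fJ)
E2_FJ = 973.0     # dual-ML energy when MLSA1 and MLSA2 are both used (fJ)
E_OLD_FJ = 769.0  # single-ML energy per search (fJ)
BITS = 144


def p_ml1(m: int) -> float:
    """Probability that ML1 sees at least one of m mismatches (each Type II w.p. 0.5)."""
    return 1.0 - 0.5 ** m


def e_avg(m: int) -> float:
    """Average dual-ML energy per search for a word with m mismatches (fJ)."""
    p = p_ml1(m)
    return p * E1_FJ + (1.0 - p) * E2_FJ


for m in range(1, 7):
    saving = 1.0 - e_avg(m) / E_OLD_FJ
    print(f"M={m}: {e_avg(m) / BITS:.2f} fJ/bit/search "
          f"(single ML: {E_OLD_FJ / BITS:.2f}), saving {saving:.0%}")
```

The saving grows with M and reaches about 43% at M = 6, in line with the 43% marked on the plot.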
CAM: Design Techniques Low Power: Dual-ML TCAM*
* N. Mohan, M. Sachdev, Proc. IEEE ISCAS, pp. 633-636, May 23-26, 2004
[Figure: average energy (fJ/bit/search) vs. number of mismatches M (1 to 6), traditional vs. dual-ML TCAM; annotated saving: 43%]
CAM: Design Techniques Low Power: Hierarchical SLs*
144-bit word in 5 segments (8, 34, 34, 34, 34 bits)
SLs: multiple blocks (64 words each); global SL swing ∆VGSL = 0.45 V (VDD = 1.8 V)
Drawbacks: logic complexity (64-bit OR gates), increased search time/latency
* K. Pagiamtzis, A. Sheikholeslami, Proc. IEEE CICC, pp. 383-386, Sep. 2003
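A sketch of how a segmented word can cut ML activity, assuming (this is not stated on the slide) that a segment is enabled only when all earlier segments of the same word matched. With random data most words are eliminated in the narrow 8-bit first segment. `segments_evaluated` is a hypothetical helper; it counts activated ML segments and does not model the hierarchical SLs themselves.

```python
SEGMENTS = (8, 34, 34, 34, 34)  # segment widths from the slide (sum = 144 bits)


def segments_evaluated(stored: str, key: str, segments=SEGMENTS) -> int:
    """Number of ML segments activated for one word, assuming early termination:
    a segment is only enabled if all previous segments of that word matched."""
    if not len(stored) == len(key) == sum(segments):
        raise ValueError("word width must equal the sum of the segment widths")
    start = 0
    for i, width in enumerate(segments):
        end = start + width
        if stored[start:end] != key[start:end]:
            return i + 1   # this segment mismatched; later segments stay idle
        start = end
    return len(segments)   # full match: every segment was evaluated


word = "0" * 144
print(segments_evaluated(word, "1" + "0" * 143))              # 1: mismatch in the first 8 bits
print(segments_evaluated(word, "0" * 20 + "1" + "0" * 123))   # 2: mismatch in bits 8..41
print(segments_evaluated(word, word))                          # 5: full match
```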
CAM: Design Techniques Static Power Reduction
16T TCAM: Leakage Paths*
[Figure: 16T TCAM cell (transistors P1 to P4, N1 to N12; WL, BL1/BL1c, BL2/BL2c, SL1, SL2, ML; storage nodes BL1c_cell, BL2c_cell) with example '0'/'1' node values marking the leakage paths]
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques Static Power Reduction
Technology scaling (per generation) [1]: dimensions shrink by 30%, dynamic power drops by 50%, leakage current rises about 5x
Architecture-level techniques [2, 3]: only a small portion of the array is enabled
1. S. Borkar, IEEE Micro, pp. 23-29, Jul.-Aug. 1999
2. K. Pagiamtzis, A. Sheikholeslami, Proc. IEEE CICC, pp. 383-386, Sep. 2003
3. G. Kasai, Y. Takarabe, K. Furumi, M. Yoneda, Proc. IEEE CICC, pp. 387-390, Sep. 2003
CAM: Design Techniques Static Power Reduction
Leakage current*: reducing VDD reduces the subthreshold leakage current ISUB
$I_{SUB} = I_0 \exp\left(\frac{k_{S1} + k_{S2} V_{DD}}{n V_T}\right)$
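A numerical sketch of the exponential VDD dependence, assuming a subthreshold model of the form I_SUB = I0 · exp((kS1 + kS2 · VDD) / (n · VT)) with VT the thermal voltage. The model form is an assumption, and every parameter value is hypothetical, chosen only to show the trend; the point is that each volt of VDD reduction divides the leakage by a constant factor exp(kS2 / (n · VT)).

```python
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant divided by electron charge (V/K)


def i_sub(vdd: float, i0: float, ks1: float, ks2: float, n: float,
          temp_k: float = 300.0) -> float:
    """Assumed model: I0 * exp((kS1 + kS2 * VDD) / (n * VT)), VT = kT/q."""
    vt = K_B_OVER_Q * temp_k  # thermal voltage, about 25.9 mV at 300 K
    return i0 * math.exp((ks1 + ks2 * vdd) / (n * vt))


# Hypothetical fitting parameters, for illustration only.
params = dict(i0=1e-9, ks1=0.0, ks2=0.1, n=1.5)
for vdd in (1.8, 1.2, 0.8):
    print(f"VDD = {vdd} V -> ISUB = {i_sub(vdd, **params) * 1e9:.2f} nA")
```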
* R. X. Gu, M. I. Elmasry, IEEE JSSC, vol. 31, no. 5, pp. 707-713, May 1996
CAM: Design Techniques
Static Power Reduction Side Effects of VDD Reduction in TCAM Cells
Speed: no change
Dynamic power: no change
Robustness: lowering VDD reduces the voltage margin (current-race sensing)
[Figure: ML[0], MLSO[0] and ML[1] waveforms showing the voltage margin]
CAM: Design Techniques Static Power Reduction
Voltage Margin of 144-bit TCAM word in 0.18 µm CMOS*
[Figure: voltage margin (mV, 200 to 500) vs. VDD (V, 1.8 down to 0.8)]
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004
CAM: Design Techniques Static Power Reduction
Effects of Technology Scaling* Berkeley predictive technology model (BPTM)
[Figure: leakage current (nA, log scale, 0.1 to 1000) vs. VDD (V, 0.3 to 1.2) for 130 nm, 100 nm, 70 nm and 45 nm technologies]
* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004