TLC: Transmission Line Caches
Brad Beckmann
David Wood
Multifacet Project
http://www.cs.wisc.edu/multifacet/
University of Wisconsin-Madison
12/3/03
Beckmann & Wood
MICRO ’03 - TLC: Transmission Line Caches 2
Overview
• Problem: Global interconnect
• Opportunity: On-chip transmission lines
  – What are they?
  – Why now?
• Application: Large on-chip caches
• Solution: TLC: Transmission Line Caches
  + Consistent high performance
  + Simple logical design
  + Less substrate area
  – Circuit verification
  – Wafer manufacturing cost
Outline
• Problem: Global interconnect
• Opportunity: On-chip transmission lines
• Application: Large on-chip caches
• Solution: TLC: Transmission Line Caches
• Evaluation
• Conclusions
Global Interconnect Problem
• Global interconnect latency → Bottleneck
  – RC delay dominant
  – Held constant using repeaters
  – Doesn’t scale with transistors
• Large structures particularly hurt
  – Partitioning mitigates intra-partition delay
  – Performance dominated by inter-partition delay
Conventional Solution
• ↑ wire size → ↓ RC delay
  + 3x size → 3x reduced delay
  + ↑ wire segment length
  – 3x channel area
  – Doesn’t scale
    • Intrinsic repeater delay
    • Inductive effects
A Better Solution?
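The scaling argument on the two preceding slides can be illustrated with a first-order distributed-RC model. The sketch below is not from the talk: it assumes the textbook 0.38·R·C approximation for the 50% delay of an unrepeated distributed RC wire, and the per-millimetre resistance, capacitance, segment length, and repeater delay are placeholder values chosen only for illustration.

```python
def unrepeated_delay_s(r_ohm_per_mm, c_f_per_mm, length_mm):
    """Approximate 50% delay of a distributed RC wire (0.38 * R_total * C_total)."""
    return 0.38 * (r_ohm_per_mm * length_mm) * (c_f_per_mm * length_mm)


def repeated_delay_s(r_ohm_per_mm, c_f_per_mm, length_mm, segment_mm, repeater_s):
    """Delay of the same wire cut into equal segments, each driven by a repeater."""
    segments = max(1, round(length_mm / segment_mm))
    segment_length = length_mm / segments
    per_segment = unrepeated_delay_s(r_ohm_per_mm, c_f_per_mm, segment_length)
    return segments * (per_segment + repeater_s)


# Placeholder electrical parameters (assumptions, not values from the talk).
R_OHM_PER_MM = 500.0
C_F_PER_MM = 0.2e-12
SEGMENT_MM = 0.5
REPEATER_S = 10e-12

for length_mm in (5, 10, 20):
    plain = unrepeated_delay_s(R_OHM_PER_MM, C_F_PER_MM, length_mm)
    repeated = repeated_delay_s(R_OHM_PER_MM, C_F_PER_MM, length_mm,
                                SEGMENT_MM, REPEATER_S)
    print(f"{length_mm:>2} mm: unrepeated {plain * 1e12:8.1f} ps, "
          f"repeated {repeated * 1e12:6.1f} ps")

# A 3x wider wire has 3x lower resistance per unit length; capacitance is held
# fixed here, which is itself a simplifying assumption.
wide = unrepeated_delay_s(R_OHM_PER_MM / 3, C_F_PER_MM, 10)
print(f"10 mm, 3x wider wire: {wide * 1e12:.1f} ps")
```

In this model the unrepeated delay grows with the square of the length, while repeaters make it grow linearly at the cost of a fixed delay per segment, which corresponds to the intrinsic repeater delay listed above.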
RC vs. TL Communication
[Figure: two panels, Conventional Global RC Wire and On-chip Transmission Line, each plotting voltage against distance between driver and receiver, with the threshold Vt marked]
RC Wire vs. TL Design
[Figure: driver and receiver connected by a Conventional Global RC Wire (RC delay dominated) and by an On-chip Transmission Line (LC delay dominated); dimension labels ~0.375 mm and ~10 mm]
On-chip Transmission Lines
• Why now? → 2010 technology
  – Relative RC delay ↑
  – Improve latency by 10x or more
• What are their limitations?
  – Require thick wires and dielectric spacing
  – Increase wafer cost
Presents a different Latency/Bandwidth Tradeoff
Latency Comparison
[Figure: Link Latency. Latency (cycles), 0 to 50, vs. Length (cm), 0.5 to 3, for Repeated RC and Single TL links]
Bandwidth Comparison
[Figure: 2 transmission line signals compared with 50 conventional signals]
Key observation
• Transmission lines – route over large structures
• Conventional wires – substrate area & vias for repeaters
Texas Non-uniform Cache Architectures (NUCA)
SNUCA – statically partitions addresses across the banks
[Figure: array of banks and switches connected to the cache controller, with example requests 0x….3 and 0x….C]
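Static partitioning can be pictured as selecting a bank from fixed address bits. The sketch below is one plausible mapping, not the scheme from the talk: the 64-byte block size and the use of the low-order block-address bits are assumptions, and `bank_for_address` is a hypothetical helper. The 32-bank count matches the SNUCA configuration in the evaluation.

```python
BLOCK_BYTES = 64   # assumed cache block size (not stated in the slides)
NUM_BANKS = 32     # SNUCA bank count from the evaluation


def bank_for_address(addr: int) -> int:
    """Map a physical address to a bank using its low-order block-address bits."""
    block_number = addr // BLOCK_BYTES
    return block_number % NUM_BANKS


for addr in (0x0000, 0x0040, 0x07C0, 0x0800):
    print(f"{addr:#06x} -> bank {bank_for_address(addr)}")
```

Because the bank is a pure function of the address, the controller never has to search for a block, but a frequently used block that maps to a distant bank always pays that bank's latency.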
Texas DNUCA Solution
[Figure: bank array with example blocks A and B]
Issues with DNUCA
• Locating cache blocks
• Power consumed accessing distant banks
• 15% of total area devoted to routing channels
Frequently requested blocks migrate towards the cache controller
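The migration idea can be sketched with a toy model of a single set of banks, ordered from the bank closest to the controller to the farthest. This is an illustrative simplification and not the DNUCA policy itself: the one-step promotion on a hit, the insertion at the farthest bank, and the `BankSet` class are all assumptions.

```python
class BankSet:
    """Toy model: index 0 is the bank closest to the cache controller."""

    def __init__(self, num_banks):
        self.banks = [None] * num_banks  # one block tag per bank

    def access(self, tag):
        """Return the bank index that served a hit, or None on a miss."""
        if tag in self.banks:
            # The block could be in any bank, so every bank must be checked.
            pos = self.banks.index(tag)
            if pos > 0:
                # Promote: swap with the block one bank closer to the controller.
                self.banks[pos - 1], self.banks[pos] = (
                    self.banks[pos], self.banks[pos - 1])
            return pos
        # Miss: place the block in the farthest bank, evicting its occupant.
        self.banks[-1] = tag
        return None


bank_set = BankSet(4)
hits = [bank_set.access("A") for _ in range(6)]
print(hits)
```

Each repeated access moves block A one bank closer until it sits next to the controller. The search over all banks in `access` is a reminder of the first issue listed above: a block's location is no longer implied by its address.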
TLC - Transmission Line Cache
[Figure: 512 KB banks connected to the TLC Cache Controller through TL drivers & receivers; each TL link is 2x8 bytes]
High bandwidth, low latency interface between the controller and banks
TLC Cache Controller
[Figure labels: Repeaters, Multi-cycle delay, Central Cache Controller Logic, Latches, Transmission Line Transceivers, Transmission Lines]
Methodology
• Assumptions
  – ITRS projection for 2010
    • 45 nm technology
    • Low-k (2.1) intermetal dielectric
  – 10 GHz operational frequency
• Physical Evaluation
  – Linpar RLC extractor
  – Hspice W element transmission line
• Performance Evaluation
  – Full system simulation
  – Simics extended with an Out-of-Order processor and memory system timing models
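As a rough sanity check on these assumptions, the ideal time of flight on a lossless line can be estimated from the dielectric constant alone. The sketch below is not part of the authors' methodology, which relies on Linpar and Hspice: it assumes a signal velocity of c/√εr and ignores driver, receiver, and loss effects, so it gives only a lower bound on link latency.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum
EPS_R = 2.1                       # low-k dielectric constant from the slide
CLOCK_HZ = 10e9                   # 10 GHz operational frequency from the slide


def flight_time_s(length_m, eps_r=EPS_R):
    """Ideal propagation time over a lossless line in a uniform dielectric."""
    velocity = C_VACUUM_M_PER_S / math.sqrt(eps_r)
    return length_m / velocity


def flight_cycles(length_m, clock_hz=CLOCK_HZ):
    """The same propagation time expressed in clock cycles."""
    return flight_time_s(length_m) * clock_hz


for length_cm in (0.5, 1.0, 2.0, 3.0):
    length_m = length_cm / 100.0
    print(f"{length_cm:.1f} cm: {flight_time_s(length_m) * 1e12:6.1f} ps, "
          f"{flight_cycles(length_m):.2f} cycles")
```

Under these idealized assumptions the flight time is roughly half a 10 GHz cycle per centimetre, so any additional link latency would come from the transceivers and surrounding logic.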
Cache Characteristics
Cache Design   Total Size   Banks   Bank Size   Bank Access Time   Uncontended Latency
SNUCA          16 MB        32      512 KB      8 cycles           9 – 32 cycles
DNUCA          16 MB        256     64 KB       3 cycles           3 – 47 cycles
TLC            16 MB        32      512 KB      8 cycles           10 – 16 cycles
• Exclusive write-back caches
• 4 wide, 30 stage pipeline, OoO processor
• 300 cycle memory latency
Performance
[Figure: Normalized Execution Time (0 to 1) of SNUCA, DNUCA, and TLC, grouped as SpecINT, SpecFP, and Commercial benchmarks: bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, oltp]
Substrate Area
Cache Design   Storage Area   Channel Area   Controller Area   Total Area
D-NUCA         92 mm²         17 mm²         1.1 mm²           110 mm²
TLC            77 mm²         3.1 mm²        10 mm²            91 mm²*
• On-chip transmission lines allow direct routing from the driver to receiver without repeaters
• Facilitates compact layout
• Devotes less substrate area to the routing channels
* 18% reduction
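The 18% figure can be reproduced from the per-component areas on this slide. The short check below only restates that arithmetic; the dictionary layout and function names are illustrative.

```python
# Per-component substrate areas in mm^2, as listed on the slide.
AREAS_MM2 = {
    "D-NUCA": {"storage": 92.0, "channel": 17.0, "controller": 1.1},
    "TLC": {"storage": 77.0, "channel": 3.1, "controller": 10.0},
}


def total_area(design):
    """Sum storage, channel, and controller area for one design."""
    return sum(AREAS_MM2[design].values())


def reduction_percent(baseline, candidate):
    """Area saved by the candidate relative to the baseline, in percent."""
    base = total_area(baseline)
    return 100.0 * (base - total_area(candidate)) / base


for design in AREAS_MM2:
    print(f"{design}: {total_area(design):.1f} mm^2")
print(f"reduction: {reduction_percent('D-NUCA', 'TLC'):.1f}%")
```

The component sums come to about 110.1 mm² and 90.1 mm², close to the listed totals, and give a reduction of about 18%. Most of the saving comes from storage and channel area, partly offset by the larger TLC controller.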
Link Utilization
[Figure: TLC Link Utilization (%), 0 to 2, for benchmarks bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, oltp]
Optimized TLC Designs
• Utilize fewer transmission lines
  – Base design: requires 2k transmission lines
  – Opt designs: require 1k, 500, & 350
• Reduce manufacturing cost
• Increase logic complexity
Link Utilization (TLC Family)
[Figure: Link Utilization (%), 0 to 14, per benchmark for TLC, TLCopt 1000, TLCopt 500, and TLCopt 350]
Performance (TLC Family)
[Figure: Normalized Execution Time (0 to 1) of TLC, TLCopt 1000, TLCopt 500, and TLCopt 350 for benchmarks bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, oltp]
Conclusions 1
• Transmission lines offer a different latency/bandwidth tradeoff
• Advantages
  – Lower latency for global links
  – Direct routing over large structures
• Limitations
  – Large, sparsely populated, metal layers
  – Greater circuit verification effort
Conclusions 2
• Possible application: TLC
• Advantages
  – Consistent high performance
  – Simpler logical design
  – 18% less substrate area
  – Less power in the communication network
• Disadvantages
  – Circuit verification
  – Wafer cost
Other Applications?
Optimized TLC Designs
[Figure: 1 MB banks connected to the TLCopt Cache Controller by TL links of 2x126 bits, 2x64 bits, and 2x44 bits]
TLCopt 1000
• Blocks are partitioned across 2 banks
• Each transmission line link is 126 bits wide
• 1008 total data TLs
TLCopt 500
• Blocks are partitioned across 4 banks
• Each transmission line link is 64 bits wide
• 512 total data TLs
TLCopt 350
• Blocks are partitioned across 8 banks
• Each transmission line link is 44 bits wide
• 352 total data TLs
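The totals on this slide are consistent with every design using the same multiple of its link width. The check below only divides each stated total by the stated link width; what the resulting factor corresponds to physically is not stated in the slides, so the code does not name it.

```python
# Link width and total data transmission lines, as listed on the slide.
DESIGNS = {
    "TLCopt 1000": {"link_bits": 126, "total_data_tls": 1008},
    "TLCopt 500": {"link_bits": 64, "total_data_tls": 512},
    "TLCopt 350": {"link_bits": 44, "total_data_tls": 352},
}

factors = {}
for name, design in DESIGNS.items():
    # divmod returns (quotient, remainder); a zero remainder means the total
    # is an exact multiple of the link width.
    factor, remainder = divmod(design["total_data_tls"], design["link_bits"])
    factors[name] = (factor, remainder)
    print(f"{name}: {design['total_data_tls']} = "
          f"{factor} x {design['link_bits']} (remainder {remainder})")
```

All three designs divide evenly with the same factor, so the reduction in transmission lines comes entirely from narrowing each link.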
Equake Performance
[Figure: Misses per 1K Instructions (0 to 8) by cache design: TLC LRU, TLC 4-way, TLC 8-way, TLC 16-way, TLC 32-way, DNUCA]
[Figure: Executed Cycles (0.E+00 to 7.E+08) by cache design: TLC LRU, TLC 4-way, TLC 8-way, TLC 16-way, TLC 32-way, DNUCA]
Additional Transceiver Delay