TLC: Transmission Line Caches Brad Beckmann David Wood Multifacet Project University of...

TLC: Transmission Line Caches Brad Beckmann David Wood Multifacet Project http://www.cs.wisc.edu/multifacet/ University of Wisconsin-Madison 12/3/03


TLC: Transmission Line Caches

Brad Beckmann

David Wood

Multifacet Project

http://www.cs.wisc.edu/multifacet/

University of Wisconsin-Madison

12/3/03


Beckmann & Wood

MICRO ’03 - TLC: Transmission Line Caches 2

Overview

• Problem: Global interconnect
• Opportunity: On-chip transmission lines
  – What are they?
  – Why now?
• Application: Large on-chip caches
• Solution: TLC: Transmission Line Caches
  + Consistent high performance
  + Simple logical design
  + Less substrate area
  – Circuit verification
  – Wafer manufacturing cost


Outline

• Problem: Global interconnect

• Opportunity: On-chip transmission lines

• Application: Large on-chip caches

• Solution: TLC: Transmission Line Caches

• Evaluation

• Conclusions


Global Interconnect Problem

• Global interconnect latency → Bottleneck
  – RC delay dominant
  – Held constant using repeaters
  – Doesn’t scale with transistors
• Large structures particularly hurt
  – Partitioning mitigates intra-partition delay
  – Performance dominated by inter-partition delay
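The role of repeaters can be illustrated with the standard first-order model, in which the 50% delay of an unrepeated distributed RC wire is roughly 0.38 times its total resistance times its total capacitance, so it grows with the square of the length. The sketch below is only an illustration: the per-millimetre resistance and capacitance, the repeater delay, and the 0.5 mm segment length are hypothetical values, not figures from the talk.

```python
# Illustrative only: the per-mm resistance, capacitance, and repeater delay
# below are hypothetical values, not figures from the slides.
R_PER_MM = 100.0      # ohms per mm (assumed)
C_PER_MM = 0.2e-12    # farads per mm (assumed)
T_REPEATER = 10e-12   # intrinsic delay of one repeater in seconds (assumed)


def unrepeated_delay(length_mm: float) -> float:
    """50% delay of a distributed RC wire: about 0.38 * R_total * C_total."""
    return 0.38 * (R_PER_MM * length_mm) * (C_PER_MM * length_mm)


def repeated_delay(length_mm: float, segment_mm: float) -> float:
    """Delay when the wire is split into equal segments, each driven by a repeater."""
    segments = length_mm / segment_mm
    return segments * (unrepeated_delay(segment_mm) + T_REPEATER)


for length in (5.0, 10.0, 20.0):
    print(f"{length:5.1f} mm  unrepeated {unrepeated_delay(length) * 1e12:8.1f} ps"
          f"  repeated {repeated_delay(length, 0.5) * 1e12:8.1f} ps")
```

With a fixed segment length, the repeated delay grows linearly with distance while the unrepeated delay grows quadratically. The repeaters add their own intrinsic delay to every segment.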


Conventional Solution

• ↑ wire size → ↓ RC delay
  + 3x size → 3x reduced delay
  + ↑ wire segment length
  – 3x channel area
  – Doesn’t scale
    • Intrinsic repeater delay
    • Inductive effects

A Better Solution?




RC vs. TL Communication

[Figure: voltage versus distance between driver and receiver, relative to the threshold voltage Vt, for a conventional global RC wire and for an on-chip transmission line.]


RC Wire vs. TL Design

[Figure: driver-to-receiver wiring for the two designs. Conventional global RC wire: RC delay dominated. On-chip transmission line: LC delay dominated. Dimension labels: ~0.375 mm and ~10 mm.]


On-chip Transmission Lines

• Why now? → 2010 technology
  – Relative RC delay ↑
  – Improve latency by 10x or more
• What are their limitations?
  – Require thick wires and dielectric spacing
  – Increase wafer cost

Presents a different Latency/Bandwidth Tradeoff
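A rough sense of why LC-limited links are attractive comes from the ideal time of flight, where the signal travels at the speed of light divided by the square root of the dielectric constant. The sketch below uses the dielectric constant (2.1) and clock frequency (10 GHz) listed on the Methodology slide. It assumes an ideal lossless line and ignores driver and receiver delay, so it gives a lower bound rather than the link latencies reported in the talk.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s in vacuum
DIELECTRIC_K = 2.1               # low-k intermetal dielectric (Methodology slide)
CLOCK_HZ = 10e9                  # 10 GHz operational frequency (Methodology slide)


def time_of_flight_cycles(length_cm: float) -> float:
    """Ideal LC-limited propagation time over length_cm, in clock cycles."""
    velocity = SPEED_OF_LIGHT / math.sqrt(DIELECTRIC_K)   # m/s in the dielectric
    seconds = (length_cm / 100.0) / velocity
    return seconds * CLOCK_HZ


for length_cm in (0.5, 1.0, 2.0, 3.0):
    print(f"{length_cm:.1f} cm -> {time_of_flight_cycles(length_cm):.2f} cycles")
```

Under these assumptions a 1 cm link has a time of flight of just under half a cycle, and the delay scales linearly with length.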


Latency Comparison

[Figure: Link Latency. Latency (cycles, axis 0 to 50) versus length (cm, 0.5 to 3) for Repeated RC and Single TL links.]


Bandwidth Comparison

[Figure: 2 transmission line signals compared with 50 conventional signals.]

Key observation
• Transmission lines – route over large structures
• Conventional wires – substrate area & vias for repeaters




Texas Non-uniform Cache Architectures (NUCA)

SNUCA – statically partitions addresses across the banks

[Figure: SNUCA cache layout. Labels: bank, switch, cache controller; example requests 0x….3 and 0x….C.]
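Static partitioning means the bank is a fixed function of the address, so the controller never has to search. The slides do not give the mapping function or the block size. The sketch below assumes a 64-byte block and selects the bank from the low-order bits of the block address, using the 32-bank count from the Cache Characteristics slide. Both `BLOCK_BYTES` and `snuca_bank` are hypothetical names for illustration.

```python
NUM_BANKS = 32        # SNUCA bank count from the Cache Characteristics slide
BLOCK_BYTES = 64      # assumed cache block size (not given in the slides)


def snuca_bank(address: int) -> int:
    """Pick a bank from the low-order bits of the block address (assumed mapping)."""
    block_address = address // BLOCK_BYTES
    return block_address % NUM_BANKS


# The same address always maps to the same bank, so no search is needed.
for addr in (0x0000, 0x0040, 0x0080, 0x0800):
    print(f"address {addr:#06x} -> bank {snuca_bank(addr)}")
```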


Texas DNUCA Solution

Frequently requested blocks migrate towards the cache controller

[Figure: blocks A and B in the DNUCA bank array.]

Issues with DNUCA
• Locating cache blocks
• Power consumed accessing distant banks
• 15% of total area devoted to routing channels
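The migration idea and the block-location issue can both be seen in a toy model. This is a deliberately simplified sketch, not the DNUCA policy itself: it assumes one block per bank, a nearest-to-farthest linear search, and a promotion of one bank per hit. The function name `access` is hypothetical.

```python
def access(banks: list, block: str) -> int:
    """Search banks from nearest to farthest; on a hit, move the block one bank closer.

    Returns the index of the bank where the block was found, or -1 on a miss.
    """
    for position, resident in enumerate(banks):
        if resident == block:
            if position > 0:
                # Swap with the block in the next-closer bank (toy promotion policy).
                banks[position - 1], banks[position] = banks[position], banks[position - 1]
            return position
    return -1


# Index 0 is the bank nearest the cache controller.
banks = ["A", "B", "C", "D"]
hits = [access(banks, "D") for _ in range(3)]
print(hits, banks)
```

Repeated requests for the same block find it in progressively closer banks. The price is that a block's location is no longer a fixed function of its address, so every access needs a search.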




TLC - Transmission Line Cache

[Figure: TLC floorplan. Labels: 512 KB bank, TLC cache controller, TL drivers & receivers, TL link (2x8 bytes).]

High bandwidth, low latency interface between the controller and banks


TLC Cache Controller

[Figure: TLC cache controller. Labels: central cache controller logic, repeaters, multi-cycle delay, latches, transmission line transceivers, transmission lines.]




Methodology

• Assumptions
  – ITRS projection for 2010
    • 45 nm technology
    • Low-k (2.1) intermetal dielectric
  – 10 GHz operational frequency
• Physical Evaluation
  – Linpar RLC extractor
  – Hspice W element transmission line
• Performance Evaluation
  – Full system simulation
  – Simics extended with an Out-of-Order processor and memory system timing models


Cache Characteristics

Cache Design  Total Size  Banks  Bank Size  Bank Access Time  Uncontended Latency
SNUCA         16 MB       32     512 KB     8 cycles          9 – 32 cycles
DNUCA         16 MB       256    64 KB      3 cycles          3 – 47 cycles
TLC           16 MB       32     512 KB     8 cycles          10 – 16 cycles

• Exclusive write-back caches

• 4 wide, 30 stage pipeline, OoO processor

• 300 cycle memory latency
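The Cache Characteristics table can be checked with a few lines of arithmetic: every design's bank count times its bank size should come to the same 16 MB, and the gap between best and worst uncontended latency shows how uniform each design is. The sketch below only restates the table values. The dictionary layout and the `summary` name are illustrative choices.

```python
# Values copied from the Cache Characteristics table.
designs = {
    #         banks, bank KB, min latency, max latency (cycles)
    "SNUCA": (32, 512, 9, 32),
    "DNUCA": (256, 64, 3, 47),
    "TLC": (32, 512, 10, 16),
}

summary = {}
for name, (banks, bank_kb, lat_min, lat_max) in designs.items():
    total_mb = banks * bank_kb / 1024
    spread = lat_max - lat_min
    summary[name] = (total_mb, spread)
    print(f"{name:5s} total {total_mb:.0f} MB, latency spread {spread} cycles")
```

TLC has the narrowest range of uncontended latencies of the three designs. DNUCA has the lowest best case but also the highest worst case.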


Performance

[Figure: normalized execution time (axis 0 to 1) of SNUCA, DNUCA, and TLC across the benchmarks bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, and oltp, grouped as SpecINT, SpecFP, and Commercial.]


Substrate Area

Cache Design  Storage Area  Channel Area  Controller Area  Total Area
D-NUCA        92 mm²        17 mm²        1.1 mm²          110 mm²
TLC           77 mm²        3.1 mm²       10 mm²           91 mm²*

* 18% reduction

• On-chip transmission lines allow direct routing from the driver to receiver without repeaters
• Facilitates compact layout
• Devotes less substrate area to the routing channels
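The area figures can be related to each other with simple ratios. The sketch below uses only the rounded values from the Substrate Area table. The variable names are illustrative. Because the inputs are already rounded, the computed reduction (about 17%) comes out slightly below the stated 18%, presumably a rounding effect.

```python
# Areas in mm^2, copied from the Substrate Area table (already rounded there).
dnuca = {"storage": 92.0, "channel": 17.0, "controller": 1.1, "total": 110.0}
tlc = {"storage": 77.0, "channel": 3.1, "controller": 10.0, "total": 91.0}

dnuca_channel_share = dnuca["channel"] / dnuca["total"]
tlc_channel_share = tlc["channel"] / tlc["total"]
total_reduction = 1.0 - tlc["total"] / dnuca["total"]

print(f"DNUCA channel share: {dnuca_channel_share:.1%}")
print(f"TLC channel share:   {tlc_channel_share:.1%}")
print(f"Total area reduction from rounded totals: {total_reduction:.1%}")
```

The DNUCA channel share of roughly 15% matches the routing-channel figure quoted on the DNUCA issues slide.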


Link Utilization

[Figure: TLC link utilization (%, axis 0 to 2) for the benchmarks bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, and oltp.]


Optimized TLC Designs

• Utilize fewer transmission lines
  – Base design: requires 2k transmission lines
  – Opt designs: require 1k, 500, & 350

• Reduce manufacturing cost

• Increase logic complexity


Link Utilization (TLC Family)

[Figure: link utilization (%, axis 0 to 14) per benchmark for TLC, TLCopt 1000, TLCopt 500, and TLCopt 350.]


Performance (TLC Family)

[Figure: normalized execution time (axis 0 to 1) of TLC, TLCopt 1000, TLCopt 500, and TLCopt 350 across the benchmarks bzip, gcc, mcf, perl, lucas, swim, applu, equake, apache, zeus, s_jbb, and oltp.]


Conclusions 1

• Transmission lines offer a different latency/bandwidth tradeoff

• Advantages
  – Lower latency for global links
  – Direct routing over large structures
• Limitations
  – Large, sparsely populated, metal layers
  – Greater circuit verification effort


Conclusions 2

• Possible application: TLC
• Advantages
  – Consistent high performance
  – Simpler logical design
  – 18% less substrate area
  – Less power in the communication network
• Disadvantages
  – Circuit verification
  – Wafer cost


Other Applications?


Optimized TLC Designs

[Figure: TLCopt floorplans. Labels: 1 MB bank, TLCopt cache controller, TL link 2x126 bits, TL link 2x64 bits, TL link 2x44 bits.]

TLCopt 1000
• Blocks are partitioned across 2 banks
• Each transmission line link is 126 bits wide
• 1008 total data TLs

TLCopt 500
• Blocks are partitioned across 4 banks
• Each transmission line link is 64 bits wide
• 512 total data TLs

TLCopt 350
• Blocks are partitioned across 8 banks
• Each transmission line link is 44 bits wide
• 352 total data TLs
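The three line counts follow a common pattern: dividing each stated total by its link width gives the same factor. The sketch below performs that division on the slide's numbers. How the resulting factor of 8 splits into links and directions is not stated in the slide text, so it is left uninterpreted. The `tlcopt` and `links` names are illustrative.

```python
# (banks per block, link width in bits, stated total data TLs) per design.
tlcopt = {
    "TLCopt 1000": (2, 126, 1008),
    "TLCopt 500": (4, 64, 512),
    "TLCopt 350": (8, 44, 352),
}

links = {}
for name, (banks_per_block, width_bits, total_tls) in tlcopt.items():
    links[name] = total_tls // width_bits
    assert links[name] * width_bits == total_tls  # totals divide evenly
    print(f"{name}: {total_tls} TLs / {width_bits} bits = {links[name]} link-widths")
```

Spreading each block across more banks allows narrower links, which is how the optimized designs reduce the total transmission line count.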


Equake Performance

[Figure: two bar charts over the cache designs TLC LRU, TLC 4-way, TLC 8-way, TLC 16-way, TLC 32-way, and DNUCA. One shows misses per 1K instructions (axis 0 to 8); the other shows executed cycles (axis 0.E+00 to 7.E+08).]


Additional Transceiver Delay