December 5, 2001 MICRO-34, Austin, Texas Cool-Cache for Hot Multimedia Osman S. Unsal, Raksit Ashok,...
December 5, 2001 MICRO-34, Austin, Texas
Cool-Cache for Hot Multimedia
Osman S. Unsal, Raksit Ashok,
Israel Koren, C. Mani Krishna,
Csaba Andras Moritz
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst
Power Density
[Figure: power density in W/cm2 (log scale, 1 to 1000) for the i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium IV, with a nuclear reactor marked for comparison]
Source: Fred Pollack, Intel, Micro32
Cool-* Project
A compiler-enabled power-aware architecture.
CPU Power Dissipation by Block
Alpha 21464: Issue 46%, Mem 26%, Exec 22%, Fetch 6%
PowerPC: Clock 19%, Cache 23%, Control L. 16%, Data Flow 11%, I/O 7%, PLA 5%, ROM 2%, TLB 17%
StrongARM: Icache 26%, Ibox 18%, Clock 10%, IMMU 9%, EBOX 8%, DMMU 8%, Others 5%, Dcache 16%
Concentrate on L1 data cache
Sources: IEEE Journal of SSC, Nov. 96; Proceedings of ISSCC 94; Cool Chips, Micro-32, 99
Cool-Cache Philosophy
Speculatively employ static information to simplify memory accesses
Leverage multimedia-sensitive compile-time partitioning of memory accesses
Conventional vs. Cool-Cache
Conventional
•Non-adaptive
•Tags
•Single access mechanism
Cool-Cache
•Statically speculative
•No tags
•Multiple access mechanisms
[Diagram labels: Data, Static and dynamic, Tag, Dynamic, SRAM Buffer]
Cool-Cache Framework
Minibuffer Scratchpad
– Scalars in media applications have low memory footprint, high access frequency
– Partition scalars from non-scalars
Hotlines
– Non-scalar locations in cache can be speculatively predicted
– Simplify memory accesses
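One way to picture the minibuffer side of this framework is as a small tagless scratchpad whose offsets are fixed at compile time. The C sketch below is only an illustration under stated assumptions: the names (minibuffer_t, mb_init, mb_alloc, mb_store32, mb_load32), the 4-byte alignment, and the fallback policy are hypothetical and are not taken from the slides. The 1K capacity is the minibuffer size listed in the experimental setup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MINIBUFFER_BYTES 1024u /* 1K minibuffer, as in the experimental setup */

/* Hypothetical scratchpad model: plain storage, no tags. */
typedef struct {
    uint8_t  data[MINIBUFFER_BYTES];
    uint32_t next_free; /* first unassigned byte, tracked at "compile time" */
} minibuffer_t;

static void mb_init(minibuffer_t *mb)
{
    memset(mb, 0, sizeof *mb);
}

/* Assign a scalar of `size` bytes to the minibuffer.
 * Returns its byte offset, or -1 if it does not fit, in which case the
 * scalar would stay on the regular cache path (assumed policy). */
static long mb_alloc(minibuffer_t *mb, uint32_t size)
{
    uint32_t start = (mb->next_free + 3u) & ~3u; /* assumed 4-byte alignment */
    if (size == 0 || size > MINIBUFFER_BYTES - start)
        return -1;
    mb->next_free = start + size;
    return (long)start;
}

/* Store a 32-bit scalar at a compiler-assigned offset: no tag compare. */
static void mb_store32(minibuffer_t *mb, long offset, int32_t value)
{
    assert(offset >= 0 && offset % 4 == 0);
    assert((uint32_t)offset + sizeof value <= MINIBUFFER_BYTES);
    memcpy(&mb->data[offset], &value, sizeof value);
}

/* Load a 32-bit scalar from a compiler-assigned offset: no tag compare. */
static int32_t mb_load32(const minibuffer_t *mb, long offset)
{
    int32_t value;
    assert(offset >= 0 && offset % 4 == 0);
    assert((uint32_t)offset + sizeof value <= MINIBUFFER_BYTES);
    memcpy(&value, &mb->data[offset], sizeof value);
    return value;
}
```

In this model a scalar that receives an offset from mb_alloc is always read and written through mb_load32 and mb_store32, while anything that returns -1 is left to the non-scalar path.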
Cool-Cache Architecture
Hotline Approach
for (i = 0; i < 100; i++) {
    a[i] = a[i+1];  /* both can be mapped to the same hotline */
    *p++ = b[i];    /* to separate hotlines without alias analysis */
}
•Based on:
– Type analysis
– Control-flow and loop-structure analysis
– Alias analysis
•A fully predictable compile-time approach would require loop transformations to align accesses to cache-line boundaries and would be limited to simple loops.
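The loop on this slide can be replayed against a toy model of hotline prediction to see why the speculation does not have to be statically correct. The C sketch below assumes details the slides do not give: a hotline register that remembers the cache line it last pointed to, a 32-byte line, 4-byte array elements, four registers, and a fallback that retrains the register after a misprediction. All identifiers (hotline_file_t, hotline_reset, hotline_access, replay_slide_loop) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES   32u /* assumed cache line size, not from the slides */
#define NUM_HOTLINES 4u  /* assumed number of hotline registers */
#define ELEM_BYTES   4u  /* assumed element size of a[], b[], and *p */

typedef struct {
    bool      valid;
    uintptr_t line; /* address divided by LINE_BYTES */
} hotline_t;

typedef struct {
    hotline_t     regs[NUM_HOTLINES];
    unsigned long hits;   /* speculation was right: tag check could be skipped */
    unsigned long misses; /* speculation was wrong: regular access, then retrain */
} hotline_file_t;

static void hotline_reset(hotline_file_t *hf)
{
    for (unsigned i = 0; i < NUM_HOTLINES; i++) {
        hf->regs[i].valid = false;
        hf->regs[i].line = 0;
    }
    hf->hits = 0;
    hf->misses = 0;
}

/* Access `addr` through the hotline register `id` chosen by the compiler.
 * Returns true when the register already pointed at the right line. */
static bool hotline_access(hotline_file_t *hf, unsigned id, uintptr_t addr)
{
    assert(id < NUM_HOTLINES);
    hotline_t *h = &hf->regs[id];
    uintptr_t line = addr / LINE_BYTES;

    if (h->valid && h->line == line) {
        hf->hits++;
        return true;
    }
    /* Misprediction is safe: fall back, then point the register here. */
    h->valid = true;
    h->line = line;
    hf->misses++;
    return false;
}

/* Replay the slide's loop: a[i] and a[i+1] share hotline 0, while b[i]
 * and *p++ get separate hotlines (no alias analysis assumed). */
static void replay_slide_loop(hotline_file_t *hf, uintptr_t a_base,
                              uintptr_t b_base, uintptr_t p_base)
{
    for (unsigned i = 0; i < 100; i++) {
        hotline_access(hf, 0, a_base + (i + 1u) * ELEM_BYTES); /* load a[i+1] */
        hotline_access(hf, 0, a_base + i * ELEM_BYTES);        /* store a[i]  */
        hotline_access(hf, 1, b_base + i * ELEM_BYTES);        /* load b[i]   */
        hotline_access(hf, 2, p_base + i * ELEM_BYTES);        /* store *p++  */
    }
}
```

Calling hotline_reset and then replay_slide_loop with three line-aligned base addresses yields 337 correct predictions out of 400 accesses in this model. Most of the mispredictions come from a[i] and a[i+1] sharing one register while they straddle a line boundary.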
Hotlines Advantages
• Speculative prediction does not require static correctness
• Granularity of speculation is compiler controllable
• Hotlines do not increase code size
Cool-Cache Compiler
High-Level Analysis
Alias Analysis
Hotlines Analysis
Cool-Cache Specific Code Generation
Footprint Analysis
Annotations
High-Level Optimizations
Benchmarks
Benchmark  Description
ADPCM      Adaptive differential pulse code modulation audio coding
EPIC       Image compression coder based on wavelet decomposition
G721       Voice compression coder based on G.711, G.721, G.723 standards
GSM        Rate speech transcoding coder based on the GSM standard
JPEG       A lossy image compression coder
MESA       OpenGL clone: using Mipmap quadrilateral texture mapping
MPEG       Lossy motion video compression decoder
PEGWIT     Public key encryption coder generates a public key
RASTA      Speech recognition front-end processing
Experimental Setup
General Parameters  1 GHz, 0.35 μm, 2.5 V
Issue               In-order, single
L1 D-Cache          64K, 2-way
Minibuffer          1K
L1 I-Cache          32K, 2-way
L2 Cache            None
Main memory         100 cycles
Minibuffer Footprint
Application  Size
Adpcm        0
Epic         203
G721 Enc.    32
Gsm Enc.     146
Jpeg Enc.    83
Mpeg Enc.    604
Pegwit       16
Rasta        152
PGP          358
Mesa         770
Application  32 reg.  16 reg.
Epic         32.0     62.4
G721         4.5      38.8
Gsm          2.3      37.2
Jpeg         1.1      46.5
Rasta        16.0     36.0
•Scalar memory requirements are low!
•The percentage of scalars in total memory accesses is high!
Impact of Minibuffer
[Figure: energy consumption (mJ, axis 0 to 900) for Epic, Pegwit, Rasta, Mipmap, Mpeg, G721, Gsm, and Jpeg; series: 16 R. W/Minibuffer, 16 R. No Minibuffer, 32 R. No Minibuffer]
Minibuffer Energy Savings
[Figure: energy savings in percent (axis 0 to 60) for Epic, Pegwit, Rasta, Mipmap, Mpeg, G721, Gsm, and Jpeg; series: 32-Register, 16-Register]
Hotlines Hit Rate
[Figure: hit rate in percent (axis 0 to 100) for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit, and Adpcm; legend entries: SW handler, Cache TLB, Static]
Cool-Cache Relative Runtime
[Figure: relative runtime (axis 0 to 1.4) for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, and Pegwit; series: 1024, 256, 64]
Cool-Cache Energy Savings (32 Registers)
[Figure: energy savings in percent (axis 0 to 60) for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit, and Adpcm; series: 4-Way, Direct]
Cool-Cache Energy Savings (16 Registers)
[Figure: energy savings in percent (axis 0 to 80) for Gsm, Jpeg, Mpeg, Rasta, Epic, G721, Mesa, Pegwit, and Adpcm; series: 4-Way]
Conclusion
Cool-Cache: a compiler-enabled, power-aware data cache
Static speculative approach is powerful