ADAPTIVE CACHE-LINE SIZE MANAGEMENT ON 3D INTEGRATED MICROPROCESSORS
Takatsugu Ono, Koji Inoue and Kazuaki Murakami
Kyushu University, Japan
ISOCC 2009
Outline
  Introduction
  Software-Controllable Variable Line-Size (SC-VLS) Cache
  Evaluation
  Summary
3D Integration
  Stacking the main memory on processors
  Connecting them by wide on-chip buses
  The memory bandwidth can be improved
[Figure: a processor core with L1 caches ($DL1, $IL1) connected to the stacked main memory by a wide on-chip bus]
Motivation
  3D stacking makes it possible to reduce the cache miss penalty
  We can employ a larger cache line size to benefit from the prefetching effect
  But if programs do not have high spatial locality of memory references:
    It might worsen performance
    A large amount of energy is required!
Software-Controllable Variable Line-Size Cache (1/3)
  We propose the SC-VLS cache
  It attempts to optimize the amount of data transferred between the cache memory and the main memory
  When a program does not require high memory bandwidth ⇒ the SC-VLS cache reduces the cache line size
Software-Controllable Variable Line-Size Cache (2/3)
  Features
    The SC-VLS cache doesn't require any hardware monitor to decide the line size
  Advantages
    The SC-VLS cache reduces energy consumption with trivial hardware overhead
Software-Controllable Variable Line-Size Cache (3/3)
  Adequate line size analysis
    Before an application program is executed, we analyze an adequate line size for each function
  Code generation
    Line-size change instructions are inserted at the start of functions in the original program code
    The instruction sets a status register to indicate an adequate line size
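The code generation step can be pictured with a small sketch. This is only an illustration under stated assumptions: the slides do not name the line-size change instruction or its operand format, so the mnemonic `setls`, the label syntax `name:` and the helper `insert_line_size_instructions` are all hypothetical.

```python
import re

# Hypothetical mnemonic: the slides do not name the line-size change instruction.
SET_LINE_SIZE = "setls"

# Assumed label syntax for a function entry point, e.g. "foo1:".
LABEL = re.compile(r"^([A-Za-z_]\w*):\s*$")


def insert_line_size_instructions(asm_lines, adequate):
    """Return a copy of asm_lines with a line-size change instruction
    inserted right after the label of every function listed in `adequate`.

    adequate maps a function name to its adequate line size in bytes.
    """
    out = []
    for line in asm_lines:
        out.append(line)
        match = LABEL.match(line)
        if match and match.group(1) in adequate:
            # The inserted instruction would set the status register.
            out.append(f"    {SET_LINE_SIZE} {adequate[match.group(1)]}")
    return out


if __name__ == "__main__":
    program = ["foo1:", "    add r1, r2, r3", "    ret", "foo2:", "    ret"]
    for line in insert_line_size_instructions(program, {"foo1": 32, "foo2": 64}):
        print(line)
```

Because the line size is chosen before execution, the sketch needs nothing at run time beyond the inserted instruction, which matches the claim that no hardware monitor is required.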
Adequate Line Size Analysis - Example
Each entry is (# of misses | # of accesses) for one function invocation.

line size = 64B
  foo1()   foo2()   foo3()   foo1()   foo2()
  5|100    6|200    2|100    5|100    2|200
  ave. MR64B foo1() = 10/200 = 5.0%
  ave. MR64B foo2() = 8/400 = 2.0%
  ave. MR64B foo3() = 2/100 = 2.0%
  MR64B ≈ 2.9%

line size = 32B
  foo1()   foo2()   foo3()   foo1()   foo2()
  2|100    2|200    1|100    2|100    14|200
  ave. MR32B foo1() = 4/200 = 2.0%
  ave. MR32B foo2() = 16/400 = 4.0%
  ave. MR32B foo3() = 1/100 = 1.0%
  MR32B = 3.0%

line size = adequate
  adequate line size foo1() = 32B
  adequate line size foo2() = 64B
  adequate line size foo3() = 32B
  foo1()   foo2()   foo3()   foo1()   foo2()
  32B      64B      32B      32B      64B
  2|100    6|200    1|100    2|100    2|200
  MRadequate ≈ 1.9%
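The arithmetic of this example can be reproduced with a short sketch. The miss and access counts are copied from the slide; the function names `per_function_miss_rate`, `adequate_line_sizes` and `overall_miss_rate` are illustrative, not the authors' tool. Like the slide, the sketch assumes that the per-invocation counts measured with a fixed line size still hold when the line size is switched at function boundaries.

```python
from collections import defaultdict

# (function, misses, accesses) per invocation, copied from the example slide.
TRACES = {
    64: [("foo1", 5, 100), ("foo2", 6, 200), ("foo3", 2, 100),
         ("foo1", 5, 100), ("foo2", 2, 200)],
    32: [("foo1", 2, 100), ("foo2", 2, 200), ("foo3", 1, 100),
         ("foo1", 2, 100), ("foo2", 14, 200)],
}


def per_function_miss_rate(trace):
    """Average miss rate of each function over all of its invocations."""
    misses, accesses = defaultdict(int), defaultdict(int)
    for func, m, a in trace:
        misses[func] += m
        accesses[func] += a
    return {f: misses[f] / accesses[f] for f in misses}


def adequate_line_sizes(traces):
    """For each function, pick the line size with the smallest miss rate."""
    rates = {size: per_function_miss_rate(t) for size, t in traces.items()}
    funcs = next(iter(rates.values())).keys()
    return {f: min(rates, key=lambda size: rates[size][f]) for f in funcs}


def overall_miss_rate(traces, size_for):
    """Miss rate of the whole run when function f uses line size size_for[f]."""
    reference = next(iter(traces.values()))
    misses = accesses = 0
    for i, (func, _, _) in enumerate(reference):
        _, m, a = traces[size_for[func]][i]
        misses += m
        accesses += a
    return misses / accesses


if __name__ == "__main__":
    choice = adequate_line_sizes(TRACES)
    print(choice)
    for size in TRACES:
        fixed = {f: size for f in choice}
        print(f"fixed {size}B: {overall_miss_rate(TRACES, fixed):.1%}")
    print(f"adequate: {overall_miss_rate(TRACES, choice):.1%}")
```

The totals are 20/700 for 64B, 21/700 for 32B and 13/700 for the per-function choice, which round to the 2.9%, 3.0% and 1.9% shown on the slide.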
Evaluation
  Simulator
    SimpleScalar and CACTI
  Benchmark programs
    10 programs (MiBench)
  Input data sets
    Analysis phase: small
    Execution phase: large
  The SC-VLS cache can dynamically choose among four line sizes: 32B, 64B, 128B and 256B
Energy
[Figure: normalized energy for each benchmark program (bitcount, mad, tiff2bw, dijkstra, rijndael_enc, rijndael_dec, sha, adpcm_enc, adpcm_dec, lame) with FIX32B, FIX64B, FIX128B, FIX256B and SC-VLS. Y-axis: normalized energy, 0 to 3. Value labels on bars that exceed the axis range: 11.4, 3.7, 3.7, 9.0, 4.5, 11.4, 11.3, 7.1, 5.2, 19.3]
Performance
[Figure: normalized execution time for each benchmark program (bitcount, mad, tiff2bw, dijkstra, rijndael_enc, rijndael_dec, sha, adpcm_enc, adpcm_dec, lame) with FIX32B, FIX64B, FIX128B, FIX256B and SC-VLS. Y-axis: normalized execution time, 0.94 to 1.08]
Summary
  3D integration
    can improve memory bandwidth
    makes it possible to reduce the cache miss penalty
  SC-VLS cache
    can dynamically change the line size
    reduces the energy consumption by up to 75%
THANK YOU
ACKNOWLEDGEMENT
This research was supported in part by the New Energy and Industrial Technology Development Organization
Architecture
[Figure: SC-VLS cache architecture. Labels: Processor; Status Reg. (set an adequate line size); Address (Tag, Index, Offset); SRAM cell array (Tag, Valid bit, Minimum line size) with comparators (=) and MUX; Hit / Miss; Data; DRAM cell array; TSV; eight 32B blocks]
Adequate Line Size Analysis
We run a cache simulation with each line size independently to determine an adequate line size
1. The average cache miss rate of each function is calculated
2. We compare the average cache miss rates across all line size candidates
3. The line size with the smallest cache miss rate is chosen as the adequate line size
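Energy Model
\[
E = AC_{L1} \times E_{L1access} + \sum_{i=1}^{AC_{mem}} E_{DRAMaccess} \times N_{SA_i}
\]
  First term: total energy of $L1
    AC_{L1}: # L1 memory accesses
    E_{L1access}: average energy for a cache access
  Second term: total energy of stacked DRAM
    AC_{mem}: # main memory accesses
    E_{DRAMaccess}: average energy for an access to one DRAM sub-array
    N_{SA_i}: # activated DRAM sub-arrays for the i-th main memory access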
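A minimal sketch of how such an energy model could be evaluated, assuming the total is the L1 energy (number of L1 accesses times the average energy per cache access) plus the stacked DRAM energy (for every main memory access, the per-sub-array access energy times the number of activated sub-arrays). The function names, the arbitrary energy units and the rule that a line of N bytes activates N/32 sub-arrays are assumptions made for illustration; the 32B figure is taken from the 32B blocks shown on the architecture slide.

```python
def subarrays_for_line(line_size_bytes, subarray_bytes=32):
    """Assumed rule: one DRAM sub-array is activated per 32B of line size."""
    return line_size_bytes // subarray_bytes


def total_energy(l1_accesses, e_l1_access, activated_subarrays, e_subarray_access):
    """Total energy in arbitrary units.

    l1_accesses:          number of L1 cache accesses
    e_l1_access:          average energy per L1 cache access
    activated_subarrays:  one entry per main memory access, giving the
                          number of DRAM sub-arrays that access activates
    e_subarray_access:    average energy per DRAM sub-array access
    """
    e_l1 = l1_accesses * e_l1_access
    e_dram = sum(n * e_subarray_access for n in activated_subarrays)
    return e_l1 + e_dram


if __name__ == "__main__":
    # Ten main memory accesses: five with 64B lines, five with 256B lines.
    accesses = [subarrays_for_line(64)] * 5 + [subarrays_for_line(256)] * 5
    print(total_energy(1000, 1.0, accesses, 2.0))
```

Under these assumptions a smaller line size lowers the DRAM term for each miss, while a higher miss count raises the number of terms in the sum, which is the trade-off the per-function line size choice tries to balance.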
Average SC-VLS Cache Line Size
Benchmarks      Average SC-VLS cache line size (B)
bitcount        81.94
mad             233.60
tiff2bw         255.99
dijkstra        223.04
rijndael_enc    64.82
rijndael_dec    33.01
sha             141.90
adpcm_enc       233.40
adpcm_dec       255.67
lame            254.78