Tony Givargis University of California, Riverside & NEC USA
Fast Cache and Bus Power Estimation for Parameterized
System-on-a-Chip Design
Tony D. Givargis & Frank Vahid
Department of Computer Science
University of California
Riverside, CA 92521
{givargis,vahid}@cs.ucr.edu
Jörg Henkel
C&C Research Laboratories, NEC USA
4 Independence Way, Princeton, NJ
[email protected]
This research was supported in part by a DAC scholarship and an NSF grant.
Introduction
• Systems-on-a-chip (SOC) era
  – increased chip capacity
  – parameterizable core-based system design
• Large power/performance tradeoffs possible just by varying bus/cache parameter values [givargis99]
• But simulation-based cache/bus power evaluation is slow
Introduction
• We present a two-step approach for fast cache power evaluation
  – collect intermediate data using simulation
  – use equations to rapidly predict power
  – couple with a fast bus estimation approach
• Our approach
  – is orders of magnitude faster than simulation
  – yields good accuracy
Target Architecture
[Figure: block diagram with CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, Bridge, Peripheral Bus, and Peripherals 1 to n]
Focus on Cache/Bus Parameters
[Figure: target architecture block diagram (CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, Bridge, Peripheral Bus, Peripherals 1 to n)]
Power dissipation breakdown in a digital camera example:
• Cache: 40%
• CPU: 30%
• Bus: 20%
• Others: 10%
Cache Parameters
[Figure: target architecture block diagram (CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, Bridge, Peripheral Bus, Peripherals 1 to n)]
Cache Parameters
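[Figure: cache lookup. The address is split into Tag, Index, and Offset fields; each way holds V, T, and D fields; the tag comparators (==) drive a Mux that selects the Data.]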
• Associativity
• Cache Size
• Line Size
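The three parameters above determine how an address is divided into tag, index, and offset. The following is a minimal sketch of that relationship, assuming power-of-two sizes and a 32-bit address; the function names `cache_geometry` and `split_address` are hypothetical and do not come from the slides.

```python
def cache_geometry(size_bytes, line_bytes, assoc, addr_bits=32):
    """Return (num_sets, offset_bits, index_bits, tag_bits).

    Assumes size, line size, and associativity are powers of two
    (an assumption of this sketch, not stated on the slide).
    """
    for value in (size_bytes, line_bytes, assoc):
        if value < 1 or value & (value - 1):
            raise ValueError("parameters must be powers of two")
    num_sets = size_bytes // (line_bytes * assoc)
    if num_sets < 1:
        raise ValueError("cache too small for this line size and associativity")
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return num_sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # 1 KB cache, 16-byte lines, 2-way set associative.
    num_sets, off_b, idx_b, tag_b = cache_geometry(1024, 16, 2)
    print(num_sets, off_b, idx_b, tag_b)
    print(split_address(0x12345678, off_b, idx_b))
```

Doubling the associativity at a fixed size halves the number of sets, which moves one bit from the index to the tag.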
Bus Parameters
[Figure: target architecture block diagram (CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, Bridge, Peripheral Bus, Peripherals 1 to n)]
Bus Parameters
• Change bus width [givargis98]
[Figure: Bus A/B with a Mux/Demux at each end, shown in two configurations labeled C1 and C2, with C1 < C2]
Bus Parameters
• Change data representation (bus-invert) [Stan95]
• Reduce bus switching
[Figure: Bus A/B with an Encoder/Decoder at each end; the bus-invert version adds an invert_ctr control line]
Bus Parameters
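• Binary encoding
  – 01001011 followed by 10010110
  – Hamming distance = 6
• Bus-invert encoding
  – 01001011 (invert_ctr = 0) followed by 01101001 (invert_ctr = 1)
  – Hamming distance = 3 (2 data lines plus the invert_ctr line)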
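The example on this slide can be checked with a few lines of code. This is a minimal sketch of one bus-invert step as described in [Stan95]: the next word is sent inverted when more than half of the lines would otherwise toggle. The function names `hamming` and `bus_invert_step` are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_step(prev_bus, prev_inv, word, width=8):
    """Return (bus_value, invert_ctr) for the next word.

    The distance counts the invert line as well: sending the word
    uninverted after an inverted one toggles invert_ctr back to 0.
    """
    mask = (1 << width) - 1
    distance = hamming(prev_bus, word) + prev_inv
    if distance > width / 2:
        return (~word) & mask, 1
    return word, 0


if __name__ == "__main__":
    prev_bus, prev_inv = 0b01001011, 0
    word = 0b10010110

    # Binary encoding: the word goes on the bus unchanged.
    print("binary toggles:", hamming(prev_bus, word))  # 6

    # Bus-invert encoding: the word is inverted, invert_ctr goes 0 -> 1.
    bus, inv = bus_invert_step(prev_bus, prev_inv, word)
    toggles = hamming(prev_bus, bus) + (inv ^ prev_inv)
    print("bus-invert:", format(bus, "08b"), inv, "toggles:", toggles)  # 3
```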
Related Work
• Important to explore various cache and bus parameters for best performance and power [Wilton96][Li98][givargis99]
  – large number of cache/bus configurations
  – need to estimate power/performance in constant time
• Trace stripping [Wolf99], configuration ordering, single-pass simulation [Kirovski]
Approach Overview
• Given a trace of memory references
• Cache parameters
  – Size (S)
  – Line/block size (L)
  – Associativity (A)
• Compute # of misses (N)
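\begin{align*}
N_1 &= f(S_{Min}, L_{Min}, A_{Min}) \\
N_2 &= f(S_{Max}, L_{Min}, A_{Min}) \\
N_3 &= f(S_{Min}, L_{Max}, A_{Min}) \\
N_4 &= f(S_{Min}, L_{Min}, A_{Max}) \\
N_5 &= f(S_{Max}, L_{Max}, A_{Min}) \\
N_6 &= f(S_{Max}, L_{Min}, A_{Max})
\end{align*}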
[Figure: # of misses (N), 0 to 120, versus cache size (S), 0.5 to 64]
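Each miss count N comes from replaying the memory trace through a cache model for one (S, L, A) configuration. The sketch below is a minimal illustration of that step, not the simulator used in the experiments; it assumes LRU replacement, which the slides do not specify, and the name `count_misses` is hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, size_bytes, line_bytes, assoc):
    """Count misses for a set-associative cache with LRU replacement.

    trace is an iterable of byte addresses. LRU is an assumption of
    this sketch; the slides do not name a replacement policy.
    """
    num_sets = size_bytes // (line_bytes * assoc)
    # One ordered dict per set: keys are tags, oldest entry first.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)  # evict least recently used
            ways[tag] = True
    return misses


if __name__ == "__main__":
    trace = [0, 4, 8, 64, 0, 128, 64]
    print(count_misses(trace, size_bytes=64, line_bytes=8, assoc=1))
    print(count_misses(trace, size_bytes=64, line_bytes=8, assoc=2))
```

Running a model like this at only a handful of corner configurations, rather than at every configuration, is what makes the equation-based step cheap.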
Approach Overview
• Capture improvements obtainable by:
  – changing line size at small/large values of cache size
  – changing associativity at small/large values of cache size
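\begin{align*}
s &= (S_i - S_{Min}) / S_{Max} \\
l &= (L_j - L_{Min}) / L_{Max} \\
a &= (A_k - A_{Min}) / A_{Max} \\
t_1 &= s \cdot (N_2 - N_1) + N_1 \\
t_2 &= l \cdot (R_3 - R_1) + R_1 \\
t_3 &= a \cdot (R_4 - R_2) + R_2 \\
f(S_i, L_j, A_k) &= t_1 \cdot t_2 \cdot t_3 \tag{1}
\end{align*}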
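Equation (1) can be evaluated in constant time once the simulated data points are available. The following sketch assumes one particular reading of the slide: s, l, and a are normalized parameter values of the form (x - x_Min) / x_Max, each t term is a linear interpolation, and the estimate is the product of the three terms. The quantities R1 to R4 are not defined on the slides, so they are taken here as plain inputs; the function name `estimate_misses` is hypothetical.

```python
def estimate_misses(S, L, A, bounds, N1, N2, R1, R2, R3, R4):
    """Evaluate f(S, L, A) = t1 * t2 * t3 under the assumed reading.

    bounds is ((S_min, S_max), (L_min, L_max), (A_min, A_max)).
    N1 and N2 are simulated miss counts at S_min and S_max.
    R1..R4 are treated as given inputs because the slides do not
    define them.
    """
    (S_min, S_max), (L_min, L_max), (A_min, A_max) = bounds
    # Normalized positions of the requested configuration (assumed form).
    s = (S - S_min) / S_max
    l = (L - L_min) / L_max
    a = (A - A_min) / A_max
    # Linear interpolation terms.
    t1 = s * (N2 - N1) + N1
    t2 = l * (R3 - R1) + R1
    t3 = a * (R4 - R2) + R2
    return t1 * t2 * t3


if __name__ == "__main__":
    bounds = ((128, 32768), (8, 32), (2, 8))
    # Illustrative numbers only; not taken from the experiments.
    print(estimate_misses(128, 8, 2, bounds, 1000, 100, 1.0, 1.0, 1.0, 1.0))
    print(estimate_misses(32768, 8, 2, bounds, 1000, 100, 1.0, 1.0, 1.0, 1.0))
```

The cost of one evaluation is a few multiplications, independent of trace length, which is where the speedup over per-configuration simulation comes from.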
Approach Overview
• Bus equation:
\[ P_{bus} = \frac{1}{2} \cdot C \cdot \frac{n}{k} \cdot k \cdot m \]
  – m items/second (denotes the traffic N on the bus)
  – n bits/item
  – k-bit-wide bus
  – binary encoding
  – random data assumption
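Under the random data assumption, each of the k wires toggles with probability 1/2 per transfer, and an n-bit item needs n/k transfers. The sketch below evaluates the binary-encoding bus equation on that basis. Rounding n/k up to a whole number of transfers is an assumption added here for the case where n is not a multiple of k, and the function name `bus_power_binary` is hypothetical.

```python
import math


def bus_power_binary(C, n, k, m):
    """Bus power with binary encoding and random data.

    C: switching cost per wire toggle (treated here as a single
       constant that already folds in voltage; an assumption).
    n: bits per item, k: bus width in bits, m: items per second.
    """
    transfers_per_item = math.ceil(n / k)  # assumed rounding
    toggles_per_transfer = k / 2           # half the wires toggle on average
    return C * transfers_per_item * toggles_per_transfer * m


if __name__ == "__main__":
    # Illustrative numbers only: 32-bit items on an 8-bit bus.
    print(bus_power_binary(C=1e-9, n=32, k=8, m=1e6))
```

When n is a multiple of k, the width cancels out of the toggle count, so any width dependence enters through C.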
Approach Overview
• Bus equation:
[Equation: P_bus for bus-invert encoding as a function of C, n, m, and k]
  – m items/second (denotes the traffic N on the bus)
  – n bits/item
  – k-bit-wide bus
  – bus-invert encoding
  – random data assumption
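For bus-invert encoding, the expected number of toggles per transfer under random data can be computed by direct enumeration. The sketch below is not the closed-form expression from this slide; it is an independent calculation assuming the encoder of [Stan95], which picks whichever of the word or its complement toggles fewer of the k + 1 lines (k data lines plus the invert line). All function names are hypothetical.

```python
from math import ceil, comb


def expected_toggles_bus_invert(k):
    """Expected toggles per transfer over k data lines plus invert line.

    With random data, the distance d to the previous bus state follows
    a binomial distribution; the encoder achieves min(d, k + 1 - d).
    """
    total = sum(comb(k, d) * min(d, k + 1 - d) for d in range(k + 1))
    return total / 2 ** k


def bus_power_bus_invert(C, n, k, m):
    """Bus power with bus-invert encoding and random data.

    Assumes the invert line has the same per-toggle cost C as a data
    line, and that n/k is rounded up to whole transfers.
    """
    return C * ceil(n / k) * expected_toggles_bus_invert(k) * m


if __name__ == "__main__":
    for k in (4, 8, 16, 32):
        print(k, k / 2, expected_toggles_bus_invert(k))
```

For k = 8 this gives about 3.27 toggles per transfer, compared with 4 for binary encoding.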
Experiments
[Figure: target architecture block diagram (CPU, I-Cache, D-Cache, Bus A, Bus B, Memory, Bridge, Peripheral Bus, Peripherals 1 to n)]
• Cache parameters
  – size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
  – assoc: 2, 4, 8
  – line: 8, 16, 32
• Bus parameters
  – width: 4, 8, 16, 32
  – code: binary/bus-invert
• Analyzed 45K sets exhaustively
  – 3d-Image
  – MPEG
  – CKey
  – Diesel
• 5kB to 230kB of C code
Experiment Setup
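[Figure: experiment flow with blocks C Program, ISS, Trace Generator, Cache Simulator, and Bus Simulator; outputs are Performance and Power, where Power is the sum (+) of CPU Power, I/D Cache Power, and Memory Power]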
• Cache simulator: Dinero [Edler, Hill]
• CPU power [Tiwari96]
Experiment Results
• Diesel application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 4% error, 320x faster
[Figure: bar chart of execution time (sec), 0 to 0.3, for Conf 0 through Conf 9]
Experiment Results
• Diesel application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 2% error, 420x faster
[Figure: bar chart of energy (micro-Joules), 0 to 3000, for Conf 0 through Conf 9]
Experiment Results
• CKey application’s performance
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 8% error, 125x faster
[Figure: bar chart of execution time (sec), 0 to 25, for Conf 0 through Conf 9]
Experiment Results
• CKey application’s energy consumption
• Blue (light-gray) is obtained using full simulation
• Red (dark-gray) is obtained using our equations
• 3% error, 125x faster
[Figure: bar chart of energy (milli-Joules), 0 to 300, for Conf 0 through Conf 9]
Experiment Results
• 125 - 400x speedup
• 1-18% absolute error (power & performance)
• 2% average power error
[Figure: evaluation time (hours), 0 to 200, simulation (Sim) versus equations (Eq.), for 3d-image, mpeg, ckey, and diesel]
[Figure: power error (%), 0 to 3, for 3d-image, mpeg, ckey, and diesel]
Conclusion
• Presented a technique for rapidly estimating the power and performance of cache and bus sub-systems
  – orders of magnitude faster than exhaustive simulation
  – yields good accuracy
• Enables exploration of parameters in parameterized system-on-a-chip architectures