AN ENVIRONMENT FOR ENERGY CONSUMPTION ANALYSIS OF
CACHE MEMORIES IN SOC PLATFORMS
Cordeiro, F.R.; Silva-Filho, A.G.; Araujo, C.C.; Gomes, M.; Barros, E.N.S. and Lima, M.E.
Informatics Center (CIn)
Federal University of Pernambuco (UFPE)
Av. Prof. Luiz Freire s/n Cidade Universitária Recife/PE - Brasil
email: {frc, agsf, cca2, maag, ensb, mel}@cin.ufpe.br
ABSTRACT
The tuning of cache architectures in platforms for
embedded systems applications can dramatically reduce
energy consumption. Existing cache exploration
environments constrain the designer to analyzing cache
energy consumption on single-processor systems and,
worse, on systems based on a single processor type. This
paper presents the PCacheEnergyAnalyzer environment for
energy consumption analysis of cache memories on SoC
platforms. It is a powerful energy analysis environment
that combines efficient tools for static and dynamic energy
consumption analysis, the flexibility to support the
architecture exploration of cache memories on platforms
that are not bound to a specific processor, and fast
simulation techniques. The proposed environment has been
integrated into the SoC modeling framework PDesigner,
providing a user-friendly graphical interface that allows
the integrated modeling and cache energy analysis of SoCs.
PCacheEnergyAnalyzer has been validated with four
applications of the Mediabench benchmark suite.
1. INTRODUCTION
Currently, the energy consumed by the memory hierarchy
can account for up to 50% of the total energy spent by
microprocessor-based architectures [1][2]. This becomes
more critical with the emergence of SoCs, since a large
share of integrated circuits now contain heterogeneous
processors and, often, cache memories. Moreover, current
semiconductor technologies have raised static memory
consumption from negligible to up to 30%. Many approaches
do not take this fact into consideration.
Many efforts have been made to reduce energy
consumption by adjusting cache parameters to the needs of
a particular application [3][4][5][6]. However, since the
fundamental purpose of the cache subsystem is to provide
high-performance memory access, cache optimization
techniques should be driven not only by energy savings but
also by the need to avoid degrading the application's
performance.
No single combination of cache parameters (total size,
line size and associativity), also known as a cache
configuration, is ideal for all applications. Therefore,
cache subsystems have been customized to deal with
specific characteristics and to optimize their energy
consumption when running a particular application. By
adjusting the parameters of a cache memory to a specific
application, it is possible to save on average 60% of
energy consumption [3].
Nevertheless, finding a suitable cache configuration for a
specific application can be a complex task and may require
long simulation and analysis times. Most tools use
exhaustive or heuristic-based exploration [4][5][6] of the
cache configuration space. These are costly approaches
that can lead to unacceptable exploration times.
Existing tools for cache analysis lack the resources
designers need to perform energy consumption analysis
effectively and efficiently. Some do not take static energy
consumption into consideration. These tools are also
inflexible: they are normally bound to a specific processor.
Designers face several difficulties when they need to
analyze caches on a platform whose processors differ from
the ones supported by the tools.
Some environments have been developed for cache
parameter exploration [4][5][7]. However, none of them
takes into account cache memory energy models that
consider both energy components: static and dynamic.
Silva-Filho [6] considers static and dynamic energy
components; however, that work focuses on a single
platform and is not integrated into a graphical environment
for platform analysis.
Although cache memory exploration considering energy
consumption is not a new issue in Design Space
Exploration (DSE), this work contributes a new approach
for exploring platforms with cache memory architectures,
considering energy consumption. Unlike other approaches,
whose analysis is intended for only one processor, this
paper presents an environment for energy consumption
analysis called PCacheEnergyAnalyzer. This environment
supports the exploration of cache memory configurations in
terms of static and dynamic energy consumption. Moreover,
it uses a fast exploration strategy based on single-step
simulation, which simulates multiple cache sizes
simultaneously. It also supports cache exploration on
platforms not bound to a specific processor. All these
features are integrated in an easy-to-use graphical
environment in the PDesigner framework.
978-1-4244-6311-4/10/$26.00 ©2010 IEEE
The rest of this paper is structured as follows. The next
section discusses recent related work. Section 3 presents
the proposed approach for a cache energy analysis
environment. Section 4 presents results comparing two
different processors and several applications using the
PCacheEnergyAnalyzer environment. Finally, Section 5
discusses the conclusions and future directions.
2. RELATED WORK
Some existing methods still apply exhaustive search to
find the optimal cache configuration in the design space.
However, the time required for such a search is often
prohibitive. Platune [8] is an example of a framework for
tuning configurable System-on-Chip (SoC) platforms that
uses exhaustive search for one-level caches and supports
just one type of processor (a MIPS core). It is suitable
only when the number of possible configurations is small
[8]; for a large design space, a long exploration time
would be required. Even heuristics may be unsuitable when
they require several long simulations.
Palesi et al. [9] reduce the configuration space to be
explored by using a genetic algorithm and produce results
faster than the Platune approach. Zhang et al. [3] have
developed a heuristic based on the influence of each cache
parameter (cache size, line size and associativity) on the
overall energy consumption. However, the simulation
mechanism used by these approaches is based on the
SimpleScalar [10] and CACTI [11] tools. SimpleScalar is a
command-line microprocessor simulation tool that reports
the application's performance results. The CACTI tool
estimates the energy consumption per access for a given
cache configuration. In these cases, the simulation of
different configurations for the same application may take
a long time.
Prete et al. [12] proposed a simulation tool called
ChARM for tuning ARM-based embedded systems that
include cache memories. This tool provides a parametric,
trace-driven simulation for tuning the system
configuration. Unlike the previous approaches, it provides
a graphical interface that allows designers to configure
component parameters, evaluate execution time, conduct a
set of simulations, and analyze the results. However,
energy results are not supported by this approach.
On the other hand, Silva-Filho [6] takes into account
static and dynamic energy consumption estimates in his
analysis with the TECH-CYCLES heuristic. This heuristic
uses the eCACTI [13] cache memory model to determine
the energy consumption of the hierarchy. Unlike other
approaches, eCACTI considers the two energy components:
static and dynamic. The static energy component, which
was negligible in previous technologies, represents up to
30% of the energy in CMOS circuits for recent
technologies [14].
eCACTI is an up-to-date cache memory model that was
extended from the original CACTI model [11]. The original
CACTI tool does not consider the static component of
energy. It also assumes that the transistor widths of the
various devices are constant (except for wordlines) when
analyzing power and delay. This assumption no longer
holds [11], because the transistor widths in actual cache
designs change according to their capacitive load. These
limitations lead to significant inaccuracies in the CACTI
power estimates.
PDesigner is an Eclipse-based framework [15] that
supports the modeling and simulation of SoC and MPSoC
platforms. Using this framework, the platform designer
can build the platform graphically and generate an
executable simulator. Currently, PDesigner is a free
solution and supports modeling platforms with different
components such as processors, cache memories, memories,
buses and connections. Performance results are obtained
with this approach; however, energy results are not
supported.
As Table 1 shows, no existing environment combines the
flexibility to model multiple platforms with caches, an
approach based on a single simulation, the capability to
estimate both dynamic and static energy consumption of
cache memories, and the possibility of exploring the
platform configuration design space graphically.
Table 1. Comparison of related studies.
Multi-Platform Modeling | Single Simul. | Dynamic Consump. | Static Consump. | Graphical Exploration
Zhang - - - -
Palesi - - - -
Silva-Filho - - -
Platune - - -
SimpleScalar - - - - -
ChARM - - - -
PDesigner - - -
[Figure 1 blocks: Analysis Flow; Library Extension; Interactive Graphical Environment; Dynamic & Static Energy Estimation (PCacheEnergyAnalyzer Plugin); Visual Environment Interaction; Integration in PDesigner]
3. PROPOSED APPROACH
In this paper, we propose a cache energy consumption
estimation tool that implements an energy consumption
analysis flow and is integrated as a plugin in the
PDesigner framework. The plugin, called
PCacheEnergyAnalyzer, provides dynamic and static
energy consumption statistics for the cache memory
components of a SoC. The plugin is also an interactive
environment that provides a user-friendly graphical
interface for cache analysis and for its interaction with
the platform model already provided by PDesigner.
The proposed approach is depicted in Figure 1. The first
step was the definition of a cache energy analysis flow. To
implement the flow, a new SystemC component that
generates traces of memory accesses was created and added
to the PDesigner library. Moreover, two additional tools
were created: an interactive graphical environment that
allows the designer to control the analysis and view its
results; and a tool for dynamic and static energy
consumption estimation based on the eCACTI model. These
two tools comprise the PCacheEnergyAnalyzer plugin. The
plugin allows the designer to select a cache on the
platform, define the design space to be explored, visualize
the results in charts, select the desired cache
configuration from the chart and reflect the decision on
the platform.
Finally, the updated library and the
PCacheEnergyAnalyzer plugin were integrated into the
PDesigner framework. The result is a powerful tool that
supports both platform modeling and cache architecture
exploration.
In the rest of this section the analysis flow, its
implementation by the PCacheEnergyAnalyzer and the
integration in the PDesigner are explained.
Fig. 1. Proposed approach.
3.1. Cache Energy Consumption Analysis Flow
Figure 2 shows the flow used to analyze energy
consumption in cache memories. All the necessary steps
are detailed in this section.
Fig. 2. Energy consumption analysis flow.
Initially, the desired platform is graphically constructed
from a list of components available in the PDesigner
component library. System designers model the
architecture by dragging and dropping the components
from the component palette. The component palette has the
following component types: processor, bus, device,
memory and cache memory. Figure 3 shows an example of
a platform composed of a MIPS processor, cache memory,
bus and main memory. The master and slave protocol ports
of the components are linked through connections. The
designer can also change the component parameters by
selecting a component and using the properties view
(lower part of Figure 3).
The application is a binary code compiled for the target
processor. The designer selects the processor and
associates the binary file with the triple {processor,
memory, load address}.
To perform energy analysis on a cache memory, the
designer right-clicks on the cache component and selects
the PCacheEnergyAnalyzer option. This option enables the
exploration of energy consumption for that cache memory
component.
Once the cache component has been selected, the
designer can change the cache memory properties. In the
Properties window shown in Figure 3, the designer can
change the exploration space of the cache memory
component. This is done by defining minimum and
maximum values for each cache memory parameter. The
parameters are the following: cache size, cache line size
and associativity. For the associativity, only the maximum
value is defined.
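As a minimal sketch of how such an exploration space could be enumerated, the Python fragment below builds every (cache size, line size, associativity) candidate from minimum and maximum bounds. The restriction to powers of two, the validity rule (a cache must hold at least one set) and all function names are assumptions made for illustration; they are not taken from PDesigner.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Return the powers of two in the closed range [lo, hi]."""
    values, v = [], 1
    while v <= hi:
        if v >= lo:
            values.append(v)
        v *= 2
    return values


def exploration_space(size_min, size_max, line_min, line_max, assoc_max):
    """Enumerate (cache size, line size, associativity) candidates.

    Associativity has only an upper bound, so it starts at 1 (direct mapped).
    Assumption: a candidate is valid only if it holds at least one set,
    i.e. size >= line * associativity.
    """
    sizes = powers_of_two(size_min, size_max)
    lines = powers_of_two(line_min, line_max)
    assocs = powers_of_two(1, assoc_max)
    return [(s, l, a) for s, l, a in product(sizes, lines, assocs) if s >= l * a]


# Hypothetical bounds, chosen small so the result is easy to inspect.
space = exploration_space(64, 256, 16, 32, 4)
```

With these illustrative bounds, the candidate (64, 32, 4) is dropped because four 32-byte lines do not fit in a 64-byte cache.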
[Figure 2 labels: Define Platform; Mapping Application; Select Cache; Define Exploration Space; Simulate; Define Configuration Space; Define Transistor Technology; Calculate Energy; View Results; Select Configuration; Update Platform; Energy Analysis]
Next, an executable simulator of the platform is
generated. The simulator performs a single simulation and
generates hit and miss statistics for the entire
configuration space defined by the designer. The result of
the simulator execution is an XML file that contains, for
each configuration, the cache configuration ID, the cache
parameters (size, line size and associativity), the number
of accesses and the miss rate.
A single-pass simulation technique, based on the work in
[16], has been adopted. Usually, simulations of this kind
are trace-based and require more than a single simulation
run [16][17]. For instance, the single-pass cache
evaluation mechanism proposed in [16] is 70 times faster
than a simulation-based mechanism for the ADPCM
application from Mediabench.
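The mechanism of [16] is not described here, so the following sketch illustrates the general idea of single-pass evaluation with a different, well-known technique: LRU stack distances for fully associative caches. One walk over the address trace yields miss counts for several cache capacities at once. The trace, the function names and the restriction to fully associative LRU caches are assumptions for illustration only.

```python
def lru_stack_distances(trace, line_size):
    """Yield the LRU stack distance of each access (None on first touch)."""
    stack = []  # most recently used line first
    for address in trace:
        line = address // line_size
        if line in stack:
            depth = stack.index(line)
            stack.pop(depth)
            yield depth
        else:
            yield None
        stack.insert(0, line)


def misses_for_all_sizes(trace, line_size, capacities_in_lines):
    """Count misses for several fully associative LRU caches in one pass."""
    misses = {c: 0 for c in capacities_in_lines}
    for depth in lru_stack_distances(trace, line_size):
        for c in capacities_in_lines:
            # An access hits a cache of c lines only if its distance is below c.
            if depth is None or depth >= c:
                misses[c] += 1
    return misses


# Hypothetical byte-address trace with 16-byte lines.
trace = [0, 16, 32, 0, 16, 48, 0]
result = misses_for_all_sizes(trace, 16, [1, 2, 4])
```

For this trace, the one-line and two-line caches miss on every access, while the four-line cache misses only on the four first touches.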
Fig. 3. PDesigner, Architecture Modeling, Component
Palette and Configuration Space.
The exploration space may contain cache configurations
that are invalid or of no interest to the designer. After
simulation, the designer can select some or all
configurations for energy analysis and thus define the
configuration space, which contains all the desired cache
configurations, through a Configuration Selection Window.
This window allows the designer to select the transistor
technology size as well as the cache configurations in the
configuration space. After the configuration space has been
defined, the energy module calculates the energy
consumption and the number of cycles for each selected
configuration.
The cache memory energy consumption calculation
flow is depicted in Figure 4.
A parser receives as input the selected configuration
space saved in the XML file and separates it into two sets
of information. The first contains the cache parameters
and technology information, which are provided to the
eCACTI tool to calculate the dynamic and static energy per
access. The second contains the number of misses, the
number of accesses and the cache parameters of the chosen
configuration. This information, together with the dynamic
and static energy provided by eCACTI, is used to calculate
the total static and dynamic energy consumed by the cache
memory for the application. In this step, the total number
of cycles needed to run the application is also calculated.
Once these values have been calculated, another parser
generates the energy estimation results for each
configuration, also as an XML file.
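The exact equations used by the energy module are not given in the text. The sketch below shows one plausible way to combine per-access energies with access and miss counts. The per-miss energy term, the hit latency, the miss penalty and every name in the code are assumptions, not values or formulas from the paper or from eCACTI.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    accesses: int
    misses: int


def total_cycles(stats, hit_cycles=1, miss_penalty_cycles=20):
    """Assumed timing model: every access pays the hit latency,
    and every miss additionally pays a fixed penalty."""
    return stats.accesses * hit_cycles + stats.misses * miss_penalty_cycles


def total_energy(stats, dyn_per_access, static_per_access, miss_energy=0.0):
    """Return (dynamic, static, total) energy in the unit of the inputs.

    Assumed model: per-access energies scale with the access count, and
    each miss adds an extra dynamic cost for the refill from main memory.
    """
    dynamic = stats.accesses * dyn_per_access + stats.misses * miss_energy
    static = stats.accesses * static_per_access
    return dynamic, static, dynamic + static


# Illustrative numbers only (energies in nJ); not results from the paper.
stats = CacheStats(accesses=1000, misses=100)
cycles = total_cycles(stats)
dynamic, static, total = total_energy(stats, 0.5, 0.25, miss_energy=2.0)
```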
Fig. 4. Energy Calculation Flow
A cost function, F = Energy x Cycles, is also calculated.
Minimizing this cost function makes it possible to obtain
cache configurations close to Pareto-optimal [8]. These
cache configurations present a tradeoff between
performance and energy consumption. The configuration
with the lowest Energy x Cycles cost is also identified.
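To make the role of the cost function concrete, the sketch below computes F = Energy x Cycles for a few made-up configurations, picks the one with the lowest cost, and lists the configurations that no other configuration dominates in both energy and cycles. The data and names are hypothetical.

```python
def cost(energy, cycles):
    """Cost function F = Energy x Cycles."""
    return energy * cycles


def pareto_front(points):
    """Return the names of configurations not dominated by any other.

    `points` maps a name to (energy, cycles). A point is dominated when
    another point is no worse on both axes and strictly better on one.
    """
    front = []
    for name, (e, c) in points.items():
        dominated = any(
            e2 <= e and c2 <= c and (e2 < e or c2 < c)
            for other, (e2, c2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (energy, cycles) results for four configurations.
results = {"A": (10.0, 500), "B": (6.0, 700), "C": (12.0, 650), "D": (5.0, 1200)}
best = min(results, key=lambda name: cost(*results[name]))
front = pareto_front(results)
```

In this example the minimum-cost configuration B lies on the Pareto front, which is the behavior the cost function is meant to approximate.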
Once the energy calculation flow has concluded, the user
can visualize the results of the cache energy analysis
graphically. The energy consumption estimate for each
configuration in the configuration space is displayed in an
interactive chart, as depicted in Figure 5. The chart
displays the energy consumed on the y-axis and the
performance, in number of clock cycles, on the x-axis.
Each point on the chart corresponds to one of the
configurations in the configuration space.
The chart is interactive, meaning the user can select one
of the points and display information about it. There are
two types of information: the first, in the form of a tool tip,
is depicted by the rectangle in Figure 5 and contains the
number of cycles and energy consumed by the selected
configuration; the second form of presenting information is
by viewing properties, also shown in Figure 5.
[Figure 4 flow: Selected Configurations Space (.XML) -> parser -> (a) cache parameters and technology -> eCACTI -> dynamic and static energy per access; (b) cache parameters, # misses, # accesses -> Energy, Cycles Calculation -> Energy, Cycles Results -> parser -> Energy Consumption Estimation Results (.XML)]
[Figure 3 labels: Processor; Cache Memory; Bus; Main Memory; Component; Exploration Space; Processor Load Address]
Here the following information is displayed: the cache
configuration parameter values, miss rate, number of
accesses, the cost value based on the cost function
calculation, dynamic and static energy consumption, the
total cycles required to run the application and the total
energy consumption.
Fig. 5. Energy estimation interactive chart.
The configuration with the lowest calculated cost is
shown in the interactive chart in a different color. The
user can use this configuration as a reference, but is not
obliged to choose it as the optimal configuration.
The user can also interact with the chart in order to view
the properties of a particular cache configuration. In this
step, the designer selects one of the configurations that
meets his/her performance/energy consumption
requirements. The user selects the configuration by simply
clicking on the point in the chart. The designer then
updates the platform by updating the current cache
component with the selected configuration parameter
values. The PCacheEnergyAnalyzer plugin makes the
substitution automatically by interacting with the
PDesigner framework.
4. RESULTS
The PCacheEnergyAnalyzer tool has been used to explore
the cache memory design space for four different
applications of the Mediabench benchmark suite [18]: fft,
timing, rawcaudio and rawdaudio.
The architecture is composed of one interconnection
structure, SimpleBus; one cache memory; and a RAM
memory. The parameters of the cache memory are varied
and the exploration is performed for two different
processors and four different applications from the
Mediabench suite [18].
The configuration space used considers 50 different
configurations for each application. The selected
technology was 0.18 um. The cache size varies from 256 to
8192 bytes; the cache line size ranges from 16 to 64 bytes;
and the associativity ranges from 1 to 4.
The energy consumption estimation and performance
have been calculated based on the flow depicted in Figure
4. The results are then displayed in the energy estimation
interactive chart of Figure 5.
[Figure 6 chart: y-axis Energy (Joules), 0 to 0.12; x-axis Timing, Rawcaudio, Rawdaudio, FFT; series MIPS (Cost Function), MIPS (Lowest Energy), SPARC (Cost Function), SPARC (Lowest Energy)]
Fig. 6. Energy estimation for different applications.
Figure 6 summarizes the energy consumption estimates
of the cache configurations with the best cost function
value and of the configurations with the lowest energy
consumption in the configuration space for each
application, running on the MIPS and SPARCV8 processors.
Although these two processors have similar
architectures, their compilers and compilation
optimizations introduce differences in some cases. The
chart shows that the MIPS processor has much lower energy
consumption than the SPARCV8 for the timing application,
and slightly higher energy consumption for the other
applications.
Additionally, the proposed approach was compared with
existing work using the basicmath_small application from
the MiBench suite [19]. SimpleScalar and
PCacheEnergyAnalyzer (PCEA) were compared in terms of
fidelity by analyzing the energy consumption for several
different cache configurations. Each point in Figure 7
represents the energy consumption for a given cache
configuration (cache size, cache line size, associativity).
Fig. 7. Normalized Energy comparison for SimpleScalar
and PCEA approaches.
Although the SimpleScalar tool does not support energy
consumption analysis, its energy consumption was
calculated with an approach based on the work of Zhang
[3], using a one-level cache and the eCACTI cache memory
energy model. For simplicity of analysis, the data and
instruction cache configurations are assumed to be the
same.
The results shown in Figure 7 indicate that both
approaches exhibit fidelity. We believe that the difference
in precision depicted in Figure 7 is due to the compilers
and compilation optimizations used.
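The text does not define how fidelity was measured. One common reading, assumed here, is that two estimators show fidelity when they rank configurations in the same order even if their absolute values differ. The sketch below normalizes each series to its maximum and reports the fraction of configuration pairs that both series order the same way. The numbers are invented for illustration and are not results from Figure 7.

```python
from itertools import combinations


def normalize(values):
    """Scale a series so that its largest value becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def fidelity(reference, estimate):
    """Fraction of configuration pairs ranked the same way by both series."""
    pairs = list(combinations(range(len(reference)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (reference[i] - reference[j]) * (estimate[i] - estimate[j]) > 0
    )
    return agree / len(pairs)


# Invented energy values for four configurations (arbitrary units).
simplescalar_based = [4.0, 3.0, 2.5, 2.0]
pcea = [8.2, 6.5, 6.6, 4.0]

normalized_reference = normalize(simplescalar_based)
score = fidelity(simplescalar_based, pcea)
```

Here five of the six configuration pairs are ordered identically, so the two series agree on most rankings despite their different absolute scales.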
5. CONCLUSION
This work has presented the PCacheEnergyAnalyzer
environment for energy consumption analysis. The tool
supports cache memory energy consumption estimation on
SoC platforms. Initial studies focused on one-level caches;
however, the approach can be easily extended to more
levels. Results have shown that it is a powerful tool for
helping users find interesting cache configurations for a
particular application, considering not only performance
but also the best relation between performance and energy
consumption.
PCacheEnergyAnalyzer fills the gaps left by existing
tools by simultaneously providing multiplatform support,
extensibility, dynamic and static energy consumption
estimation and a graphical environment.
6. REFERENCES
[1] H. Chang, L. Cooke, M. Hunt, G. Martin, A.J. McNelly and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1st ed., 1999.
[2] A. Malik, B. Moyer and D. Cermak, A Low Power Unified Cache Architecture Providing Power and Performance Flexibility, Int. Symp. on Low Power Electronics and Design, June 2000, pp. 241-243.
[3] C. Zhang, F. Vahid, Cache configuration exploration on prototyping platforms, 14th IEEE International Workshop on Rapid System Prototyping, June 2003, vol. 00, p. 164.
[4] A. Gordon-Ross, F. Vahid, N. Dutt, Automatic Tuning of Two-Level Caches to Embedded Applications, DATE, Feb. 2004, pp. 208-213.
[5] A. Gordon-Ross et al., Fast Configurable-Cache Tuning with a Unified Second-Level Cache, ISLPED'05, 2005.
[6] A.G. Silva-Filho, F.R. Cordeiro, R.E. Sant'Anna and M.E. Lima, Heuristic for Two-Level Cache Hierarchy Exploration Considering Energy Consumption and Performance, PATMOS 2006, Montpellier, France, September 13-15, 2006, pp. 75-83.
[7] A. Halambi et al., EXPRESSION: A language for architecture exploration through compiler/simulator retargetability, DATE, March 1999, pp. 485-491.
[8] T. Givargis, F. Vahid, Platune: A Tuning framework for system-on-a-chip platforms, IEEE Trans. Computer-Aided Design, vol. 21, Nov. 2002, pp. 1-11.
[9] M. Palesi, T. Givargis, Multi-objective design space exploration using genetic algorithms, International Workshop on Hardware/Software Codesign, May 2002.
[10] D. Burger, T.M. Austin, The SimpleScalar Tool Set, Version 2.0, Computer Architecture News, vol. 25(3), June 1997, pp. 13-25.
[11] P. Shivakumar, N.P. Jouppi, Cacti 3.0: An Integrated Cache Timing, Power and Area Model, WRL Research Report 2001/2.
[12] C.A. Prete, M. Graziano, F. Lazzarini, The ChARM Tool for Tuning Embedded Systems, IEEE Micro, vol. 17, 1997, pp. 67-76.
[13] N. Dutt, M. Mamidipaka, eCACTI: An Enhanced Power Estimation Model for On-chip Caches, TR 04-28, Sept. 2004.
[14] E. Macii et al., Energy-Aware Design of Embedded Memories: A Survey of Technologies, Architectures and Optimization Techniques, ACM Transactions on Embedded Computing Systems, vol. 2, no. 1, Feb. 2003, pp. 5-32.
[15] Eclipse, available at http://www.eclipse.org.
[16] P. Viana et al., Cache-Analyzer: Design Space Evaluation of Configurable-Caches in a Single-Pass, International Workshop on Rapid System Prototyping, May 2007, pp. 3-9.
[17] R.A. Sugumar and S.G. Abraham, Efficient simulation of multiple cache configurations using binomial trees, CSE-TR-111-91, CSE Div., Univ. of Michigan, 1991. Available in: .
[18] Mediabench: http://cares.icsl.ucla.edu/MediaBench/, 2006.
[19] M.R. Guthaus et al., MiBench: A free, commercially representative embedded benchmark suite, IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001, pp. 1-12.