  • 7/27/2019 05483007

    1/6

    AN ENVIRONMENT FOR ENERGY CONSUMPTION ANALYSIS OF

    CACHE MEMORIES IN SOC PLATFORMS

    Cordeiro, F.R.; Silva-Filho, A.G.; Araujo, C.C.; Gomes, M.; Barros, E.N.S. and Lima, M.E.

    Informatics Center (CIn)

    Federal University of Pernambuco (UFPE)

    Av. Prof. Luiz Freire s/n Cidade Universitária Recife/PE - Brasil

    email: {frc, agsf, cca2, maag, ensb, mel}@cin.ufpe.br

    ABSTRACT

    The tuning of cache architectures in platforms for

    embedded systems applications can dramatically reduce

    energy consumption. The existing cache exploration

    environments constrain the designer to analyze cache

    energy consumption on single processor systems and

    worse, systems that are based on a single processor type. This

    paper presents the PCacheEnergyAnalyzer

    environment for energy consumption analysis of cache

    memory on SoC platforms. This is a powerful energy

    analysis environment that combines the use of efficient

    tools to provide static and dynamic energy consumption

    analysis, the flexibility to support the architecture

    exploration of cache memories on platforms that are not

    bound to a specific processor, and fast simulation

    techniques. The proposed environment has been integrated

    into the SoC modeling framework PDesigner, providing a

    user-friendly graphical interface allowing the integrated

    modeling and cache energy analysis of SoCs. The

    PCacheEnergyAnalyzer has been validated with four

    applications of the Mediabench benchmark suite.

    1. INTRODUCTION

    Currently, the energy consumed by the memory hierarchy

    can account for up to 50% of the total energy spent by

    microprocessor-based architectures [1][2]. This fact

    becomes more critical due to the emergence of SoCs,

    meaning a large part of the integrated circuits contains

    heterogeneous processors and often cache memories.

    Moreover, current semiconductor technologies have raised

    static energy consumption from negligible to up to 30% of

    the energy in CMOS circuits [14]. Many approaches do not

    take this fact into consideration.

    Many efforts have been made to reduce the

    consumption of energy by adjusting cache parameters to the

    needs of a particular application [3][4][5][6]. However,

    since the fundamental purpose of the cache subsystem is to

    provide high performance for memory accessing, cache

    optimization techniques are supposed to be driven not only

    by energy savings but also by preventing degradation of the

    application's performance.

    No single combination of cache parameters (total size,

    line size and associativity), also known as cache

    configuration, would be perfect for all applications.

    Therefore cache subsystems have been customized in order

    to deal with specific characteristics and to optimize their

    energy consumption when running a particular application.

    By adjusting the parameters of a cache memory to a

    specific application it is possible to save on average 60% of

    energy consumption [3].

    Nevertheless, finding a suitable cache configuration

    (combination of total size, line size and associativity) for a

    specific application can be a complex task and may take a

    long time for simulation and analysis. Most of the tools use

    exhaustive or heuristic-based exploration [4][5][6] of all

    possible cache configurations. These are cost-intensive

    approaches, leading to unacceptable exploration times.

    Existing cache analysis tools lack the resources designers

    need to analyze energy consumption effectively and

    efficiently. Some do not take static energy consumption

    into consideration. These tools are also inflexible;

    they are normally bound to a specific processor. Designers

    experience several difficulties when they need to analyze

    caches on a platform with processors that are different from

    the ones supported by the tools.

    Some environments have been developed for cache

    parameter exploration [4][5][7]. However, none of

    them has taken into account cache memory energy models

    that consider the two energy components: static and

    dynamic. Silva-Filho [6] considers static and dynamic

    energy components; however, the work focuses on a single

    platform and is not integrated into a graphical

    environment for platform analysis.

    Although cache memory exploration considering energy

    consumption is not a new issue in Design Space

    Exploration (DSE), this work contributes a new

    approach for exploring platforms with cache memory

    architectures, considering energy consumption. Unlike

    other approaches, whose analysis is intended for only one

    processor, this paper presents an environment for energy

    consumption analysis called PCacheEnergyAnalyzer. This

    environment provides support for the

    exploration of cache memory configurations in terms of

    static and dynamic energy consumption. Moreover, it uses a

    978-1-4244-6311-4/10/$26.00 ©2010 IEEE

    fast exploration strategy based on single-pass simulation,

    which evaluates multiple cache sizes simultaneously. It also

    supports cache exploration in platforms not bound to a

    specific processor. All these features are integrated in an

    easy-to-use graphical environment in the PDesigner

    framework.

    The rest of this paper is structured as follows. In the

    next section, we discuss some recent related work. In

    Section 3 the proposed approach for a cache energy

    environment is presented. In Section 4 some results

    obtained with the PCacheEnergyAnalyzer environment are

    presented for two different processors and several

    applications. Finally, in Section 5

    the conclusions and future directions are discussed.

    2. RELATED WORK

    Some existing methods still apply the exhaustive search to

    find the optimal cache configuration in the design space.

    However, the time required for such an exhaustive search

    is often prohibitive. Platune [8] is an example of a

    framework for adjusting configurable System-on-Chip

    (SoC) platforms that utilizes the exhaustive search method

    for one-level caches and just one type of processor (MIPS

    core processor). It is suitable only when there is a

    small number of possible configurations [8]; for a large

    design space, a long exploration time would be required.

    Even the use of heuristics may be unsuitable when several

    long simulations are needed.

    Palesi et al. [9] reduce the possible configuration space

    by using a genetic algorithm and produce faster results

    than the Platune approach. Zhang et al. [3] have developed

    a heuristic based on the influence of each cache parameter

    (cache size, line size and associativity) on the overall

    energy consumption. However, the simulation mechanism

    used by the previous approaches is based on the

    SimpleScalar [10] and CACTI [11] tools. SimpleScalar is a

    command-line microprocessor simulation tool that

    reports the results of the application's

    performance. The CACTI tool estimates the

    energy consumption per access for a given cache

    configuration. In these cases, the simulation of different

    configurations for the same application may take a long

    time.

    Prete et al. [12] proposed the simulation tool called

    ChARM for tuning ARM-based embedded systems that

    also include cache memories. This tool provides a

    parametric, trace-driven simulation for tuning system

    configuration. Unlike previous approaches, it provides a

    graphical interface that allows designers to configure the

    component parameters, evaluate

    execution time, conduct a set of simulations, and analyze

    the results. However, energy results are not supported by

    this approach.

    On the other hand, Silva-Filho [6] takes into account

    static and dynamic energy consumption estimates in his

    analysis with the TECH-CYCLES heuristic. This heuristic

    uses the eCACTI [13] cache memory model to determine

    the energy consumption of the hierarchy. Unlike other

    approaches, eCACTI considers the two

    energy components: static and dynamic. The static energy

    component that was negligible in previous technologies

    represents, for recent technologies, up to 30% of the

    energy in CMOS circuits [14].

    The eCACTI is an up-to-date cache memory model that

    was extended from the original CACTI model [11]. The

    original CACTI tool does not consider the static

    component of energy. Also, the transistor width of various

    devices is assumed to be constant (except for wordlines)

    when analyzing power and delay. Nowadays this

    assumption would be incorrect [11], because the transistor

    widths in actual cache designs change according to their

    capacitive load. These lead to significant inaccuracies in

    the CACTI power estimates.

    The PDesigner framework is an Eclipse-based

    framework [15] that provides support for the modeling and

    simulation of SoC and MPSoC platforms. By using this

    framework the platform designer can build the platform

    graphically and generate an executable simulator.

    Currently, PDesigner is a free solution and supports

    modeling platforms with different components such as

    processors, cache memory, memory, bus and connections.

    Performance results are obtained from this approach;

    however, energy results are not supported.

    Looking at the situation depicted in Table 1, it becomes

    evident that there is no environment that combines the

    flexibility to model multiple platforms with caches, the use

    of an approach based on a single simulation, the capability

    to estimate both dynamic and static energy consumption of

    cache memories, and the possibility to explore the platform

    configuration design space graphically.

    Table 1. Comparison of related studies.

    Approach | Multi-Platform Modeling | Single Simul. | Dynamic Consump. | Static Consump. | Graphical Exploration

    Zhang - - - -

    Palesi - - - -

    Silva-Filho - - -

    Platune - - -

    SimpleScalar - - - - -

    ChARM - - - -

    PDesigner - - -

    [Figure 1 diagram labels: Analysis Flow; Library Extension; Interactive Graphical Environment; Dynamic & Static Energy estimation; PCacheEnergyAnalyzer Plugin; Visual Environment Interaction; Integration in PDesigner]

    3. PROPOSED APPROACH

    In this paper, we propose the development of a cache

    energy consumption estimation tool that implements an

    energy consumption analysis flow and its integration as a

    plugin in the PDesigner framework. The plugin, called

    PCacheEnergyAnalyzer, provides dynamic and static

    energy consumption statistics for cache memory

    components of a SoC. The plugin is also an interactive

    environment that provides a user-friendly graphical

    interface for cache analysis and its interaction with the

    platform model already provided by PDesigner.

    The proposed approach is depicted in Figure 1. The first

    step in the approach has been the definition of a cache

    energy analysis flow. For the implementation of the flow, a

    new SystemC component that generates traces of memory

    accesses has been created and added to the

    PDesigner library. Moreover, two additional tools have

    been created: an interactive graphical environment that

    allows the control and view of the results of the analysis;

    and a tool for dynamic and static energy consumption

    estimation based on the eCACTI model. These two tools

    comprise the PCacheEnergyAnalyzer plugin. The plugin

    allows the designer to select a cache on the platform,

    define the design space to be explored, visualize the results

    in charts, select the desired cache configuration from the

    chart and reflect the decision on the platform.

    Finally, the updated library and the

    PCacheEnergyAnalyzer plugin have been integrated into

    the PDesigner framework. The result is a powerful tool

    that supports the modeling of platforms and the cache

    architecture exploration.

    In the rest of this section the analysis flow, its

    implementation by the PCacheEnergyAnalyzer and the

    integration in the PDesigner are explained.

    Fig. 1. Proposed approach.

    3.1. Cache Energy Consumption Analysis Flow

    Figure 2 shows the flow used to analyze energy

    consumption in cache memories. All necessary steps are

    detailed carefully in this section.

    Fig. 2. Energy consumption analysis flow.

    Initially, the desired platform is graphically constructed

    from a list of components available in the PDesigner

    component library. System designers model the

    architecture by dragging and dropping the components

    from the component palette. The component palette has the

    following component types: processor, bus, device,

    memory and cache memory. Figure 3 shows an example of

    a platform composed of a MIPS processor, cache memory,

    bus and main memory. The master and slave protocol

    ports of the components are linked through connections. The

    designer can also change the component parameters by

    selecting them and using the properties view (lower part of

    Figure 3).

    The application is a binary code compiled for the target

    processor. The designer selects the processor and

    associates the binary file with the triple {processor,

    memory, load address}.

    To perform energy analysis on a cache memory, the

    designer right-clicks on the cache component and

    selects the PCacheEnergyAnalyzer option.

    This option enables the exploration of energy

    consumption in that cache memory component.

    Once the cache component has been selected, the

    designer can change the cache memory properties. In the

    Properties window shown in Figure 3, the designer can

    change the exploration space of the cache memory

    component. This is done by defining minimum and

    maximum values for each cache memory parameter. The

    parameters are the following: cache size, cache line size

    and associativity. For associativity, only the

    maximum value is defined.
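
    As a minimal sketch of how such an exploration space could be
    enumerated from the minimum and maximum values, the Python example
    below assumes that every parameter moves in power-of-two steps and
    that associativity starts at 1. The step rule, the validity check
    and all function names are illustrative assumptions, not part of
    PDesigner.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Return lo, 2*lo, 4*lo, ... up to and including hi."""
    values = []
    v = lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


def exploration_space(size_min, size_max, line_min, line_max, assoc_max):
    """Enumerate (size, line_size, associativity) tuples (hypothetical helper).

    Associativity only has a maximum, so it starts at 1 (direct mapped).
    Combinations that cannot hold at least one complete set are skipped.
    """
    space = []
    for size, line, assoc in product(
        powers_of_two(size_min, size_max),
        powers_of_two(line_min, line_max),
        powers_of_two(1, assoc_max),
    ):
        if size >= line * assoc:  # at least one set must fit in the cache
            space.append((size, line, assoc))
    return space


if __name__ == "__main__":
    # Example ranges chosen only for illustration.
    for config in exploration_space(1024, 4096, 16, 32, 2):
        print(config)
```

    Filtering invalid combinations at this stage keeps the later
    simulation and energy steps limited to configurations that can
    actually be built.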

    [Figure 2 labels: Platform; Application; Mapping; Select Cache; Define Exploration Space; Simulate; Energy Analysis; Define Configuration Space; Define Transistor Technology; Calculate Energy; View Results; Select Configuration; Update Platform]

    Afterwards, an executable simulator of the platform is

    generated. The simulator performs a single simulation and

    generates miss and hit statistics for the entire

    configuration space defined by the designer. The result

    of the simulator execution is an XML file that contains, for

    each configuration, the cache configuration ID, the cache

    parameters (size, line size and associativity), the number

    of accesses and the miss rate.
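
    The paper does not publish the schema of this XML file. Purely as
    an illustration, the sketch below assumes a hypothetical layout with
    one <config> element per configuration and reads it with Python's
    standard xml.etree.ElementTree module. The element and attribute
    names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real file produced by the simulator may differ.
SAMPLE = """
<simulation>
  <config id="0" size="1024" line="16" assoc="1" accesses="200000" missrate="0.042"/>
  <config id="1" size="2048" line="32" assoc="2" accesses="200000" missrate="0.017"/>
</simulation>
"""


def load_results(xml_text):
    """Parse per-configuration statistics into a list of dictionaries."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.findall("config"):
        accesses = int(node.get("accesses"))
        miss_rate = float(node.get("missrate"))
        results.append({
            "id": int(node.get("id")),
            "size": int(node.get("size")),
            "line": int(node.get("line")),
            "assoc": int(node.get("assoc")),
            "accesses": accesses,
            "miss_rate": miss_rate,
            # Derived miss count, convenient for a later energy calculation.
            "misses": round(accesses * miss_rate),
        })
    return results


if __name__ == "__main__":
    for entry in load_results(SAMPLE):
        print(entry)
```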

    A single-pass simulation technique, based on the work in

    [16], has been adopted. Usually, simulations of this kind

    are trace-based and require more than one simulation

    run [16] [17]. For instance, the single-pass cache evaluation

    mechanism proposed in [16] is 70 times faster than a

    simulation-based mechanism for the ADPCM application

    from Mediabench.
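
    The idea of evaluating several cache configurations in one pass
    over the access trace can be illustrated with the classic LRU
    stack-distance technique. The sketch below is not the algorithm of
    [16]; it assumes fully associative LRU caches that share one line
    size, and the function name and parameters are hypothetical.

```python
def lru_misses_single_pass(addresses, line_size, capacities_in_lines):
    """Count misses for several fully associative LRU caches in one pass.

    capacities_in_lines: cache capacities expressed in number of lines.
    Returns a dictionary {capacity: miss_count}.
    """
    stack = []  # LRU stack of line numbers, most recently used first
    misses = {capacity: 0 for capacity in capacities_in_lines}
    for address in addresses:
        line = address // line_size
        if line in stack:
            depth = stack.index(line) + 1  # 1-based LRU stack distance
            stack.remove(line)
        else:
            depth = None  # first touch: a miss in every cache
        stack.insert(0, line)
        for capacity in misses:
            # An LRU cache with `capacity` lines hits only when the
            # stack distance does not exceed its capacity.
            if depth is None or depth > capacity:
                misses[capacity] += 1
    return misses


if __name__ == "__main__":
    trace = [0, 16, 32, 0, 16, 32]  # byte addresses, three distinct lines
    print(lru_misses_single_pass(trace, line_size=16,
                                 capacities_in_lines=[2, 4]))
```

    Because LRU caches obey the inclusion property, one stack walk per
    access is enough to decide hit or miss for every capacity, which is
    why a single run can replace one simulation per cache size.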

    Fig. 3. PDesigner, Architecture Modeling, Component

    Palette and Configuration Space.

    The exploration space may contain cache

    configurations that are invalid or that are not interesting for

    the designer. After simulation, the designer is able to select

    some or all configurations for energy analysis and define

    the configuration space that contains all the desired cache

    configurations through a Configuration Selection Window.

    This window allows the designer to select the transistor

    technology size and also all the cache configurations in the

    configuration space. After the configuration space has been

    defined, the energy module calculates the energy

    consumption and number of cycles for each selected

    configuration.

    The cache memory energy consumption calculation

    flow is depicted in Figure 4.

    A parser receives as input the selected

    configuration space saved in the XML file and separates

    it into two sets of information. The first contains the cache

    parameters and technology information, which are provided to

    the eCACTI tool for the dynamic and static energy

    calculation per access. The second contains the

    number of misses, the number of accesses and the cache

    parameters of the chosen configuration. This information,

    together with the dynamic and static energy provided by

    eCACTI, is used to calculate the total static and

    dynamic energies consumed by the cache memory for the

    application. In addition, in this step the total number of

    cycles needed to run the application is also calculated.

    Once these values have been calculated, another parser writes

    the energy estimation results for each configuration to an

    XML file.
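
    The paper does not give the equations used in this step. The
    sketch below shows one simple way such totals could be derived from
    the simulation statistics and the per-access values. The model
    (dynamic energy proportional to accesses, static energy proportional
    to cycles, a fixed miss penalty) and every name and default value in
    it are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConfigStats:
    accesses: int  # total cache accesses reported by the simulator
    misses: int    # total cache misses reported by the simulator


def estimate(stats, e_dyn_access_j, e_static_cycle_j,
             hit_cycles=1, miss_penalty_cycles=20):
    """Return (dynamic_J, static_J, total_cycles) under an assumed model.

    cycles  = accesses * hit_cycles + misses * miss_penalty_cycles
    dynamic = accesses * dynamic energy per access
    static  = cycles * static (leakage) energy per cycle
    """
    cycles = stats.accesses * hit_cycles + stats.misses * miss_penalty_cycles
    dynamic_j = stats.accesses * e_dyn_access_j
    static_j = cycles * e_static_cycle_j
    return dynamic_j, static_j, cycles


if __name__ == "__main__":
    # Illustrative numbers only, not measured values.
    dyn, sta, cyc = estimate(ConfigStats(accesses=1000, misses=50),
                             e_dyn_access_j=1e-9, e_static_cycle_j=1e-10)
    print(dyn, sta, cyc)
```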

    Fig. 4. Energy Calculation Flow

    A cost function, F = Energy × Cycles,

    is also calculated. Minimizing this cost

    function makes it possible to obtain cache

    configurations close to Pareto-optimal [8]. These cache

    configurations present a tradeoff between performance and

    energy consumption. The configuration that has the lowest

    Energy × Cycles cost is also identified.
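
    As a small worked example of the cost function, the sketch below
    computes F = Energy × Cycles for a handful of made-up configurations,
    picks the one with the lowest cost, and lists the points that are
    not dominated in both energy and cycles. The data and the function
    names are hypothetical.

```python
def cost(point):
    """Cost function F = Energy x Cycles for one configuration."""
    return point["energy"] * point["cycles"]


def best_by_cost(points):
    """Return the configuration with the lowest Energy x Cycles cost."""
    return min(points, key=cost)


def pareto_front(points):
    """Keep the points that no other point beats in energy and cycles."""
    front = []
    for p in points:
        dominated = any(
            q["energy"] <= p["energy"] and q["cycles"] <= p["cycles"]
            and (q["energy"] < p["energy"] or q["cycles"] < p["cycles"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    # Made-up values used only to exercise the functions.
    points = [
        {"id": "A", "energy": 1.0, "cycles": 100},
        {"id": "B", "energy": 2.0, "cycles": 40},
        {"id": "C", "energy": 3.0, "cycles": 90},
        {"id": "D", "energy": 0.5, "cycles": 300},
    ]
    print("lowest cost:", best_by_cost(points)["id"])
    print("Pareto front:", [p["id"] for p in pareto_front(points)])
```

    In this made-up data set the lowest-cost point lies on the Pareto
    front, while the point with the lowest energy is a different one,
    which mirrors the distinction the tool makes between the two.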

    Once the energy calculation flow is concluded, the user

    graphically visualizes the results of the cache energy

    analysis. The energy consumption estimation for each of

    the configurations in the configuration space is displayed

    in a visual interactive chart as depicted in Figure 5. The

    chart displays on the y-axis the energy consumed and, on

    the x-axis, the performance in number of clock cycles.

    Each point on the chart corresponds to one of the

    configurations in the configuration space.

    The chart is interactive, meaning the user can select one

    of the points and display information about it. There are

    two types of information: the first, in the form of a tool tip,

    is depicted by the rectangle in Figure 5 and contains the

    number of cycles and energy consumed by the selected

    configuration; the second form of presenting information is

    by viewing properties, also shown in Figure 5.

    [Figure 4 flow labels: Selected Configurations Space (.XML); parser; Cache parameters and technology; eCACTI; Dynamic and Static Energy per access; Cache parameters, # Miss, # Accesses; Energy, Cycles Calculation; Energy, Cycles Results; parser; Energy Consumption Estimation Results (.XML)]

    [Figure 3 callout labels: Processor; Cache Memory; Bus; Main Memory; Component; Exploration Space; Processor Load Address]

    Fig. 5. Energy estimation interactive chart.

    Here the following information is displayed: the cache

    configuration parameter values, miss rate, number of

    accesses, the cost value based on the cost function

    calculation, dynamic and static energy consumption, the

    total cycles required to run the application and the total

    energy consumption.

    The configuration with the lowest calculated cost is

    represented in the interactive chart in a different color. The

    user can use this configuration as a reference. Therefore

    he/she is not obliged to choose it as the optimal

    configuration.

    The user can also interact with the chart in order to view the

    properties of a particular cache configuration. In this step,

    the designer selects one of the configurations that meets

    his/her performance/energy consumption requirements. The

    user selects the configuration by simply clicking on the

    point in the chart. In this step, the designer updates the

    platform by replacing the actual cache component with the

    selected configuration parameter values. The

    PCacheEnergyAnalyzer plugin makes the substitution

    automatically by interacting with the PDesigner

    Framework.

    4. RESULTS

    The PCacheEnergyAnalyzer tool has been used to explore

    the cache memory design space for four different

    applications of the Mediabench benchmark suite [18]: fft,

    timing, rawcaudio and rawdaudio.

    The architecture is composed of one interconnection

    structure SimpleBus; one cache memory; and a RAM

    memory. The parameters of the cache memory are varied

    and the exploration is performed for the two different

    processors and four different applications from the

    Mediabench suite [18].

    The configuration space used considers 50 different

    configurations for each application. The selected

    technology was 0.18 µm. The cache size varies from 256 to

    8192 bytes; the cache line size ranges from 16 to 64 bytes;

    and the associativity ranges from 1 to 4.

    The energy consumption estimation and performance

    have been calculated based on the flow depicted in Figure

    4. The results are then displayed in the energy estimation

    interactive chart of Figure 5.

    [Figure 6 chart: y-axis Energy (Joules), 0.00 to 0.12; x-axis applications Timing, Rawcaudio, Rawdaudio, FFT; series MIPS (Cost Function), MIPS (Lowest Energy), SPARC (Cost Function), SPARC (Lowest Energy)]

    Fig. 6. Energy estimation for different applications.

    Figure 6 summarizes the energy consumption estimation

    values of the cache configurations with the best cost function

    and the configurations with the lowest energy consumption in

    the configuration space for each application, running on the

    MIPS and SPARCV8 processors.

    Although these two processors have similar architectures,

    their compilers and compilation optimizations differ

    in some cases. It can be seen in the chart that

    the MIPS processor presents a much lower energy

    consumption than the SPARCV8 for the timing

    application, and slightly higher energy consumption for the

    other applications.

    Additionally, the proposed approach was also compared

    with existing work by using the basicmath_small application

    from the Mibench suite [19]. SimpleScalar and

    PCacheEnergyAnalyzer (PCEA) were compared in terms of

    fidelity by analyzing the energy consumption for several

    different cache configurations. Each point in Figure 7

    represents the energy consumption for a given cache

    configuration (cache size, cache line size, associativity).

    Fig. 7. Normalized Energy comparison for SimpleScalar

    and PCEA approaches.

    Although the SimpleScalar tool does not support energy

    consumption analysis, the energy was calculated with an

    approach based on Zhang's work [3], using a one-level cache

    and the eCACTI cache memory energy model. For simplicity of

    the analysis, the data and instruction cache configurations

    are assumed to be the same.

    The results shown in Figure 7 indicate that both approaches

    exhibit fidelity. We believe that the precision difference

    depicted in Figure 7 is due to the compilers used and

    their compilation optimizations.
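
    The paper does not state how fidelity was quantified. One common
    reading is that two tools agree in fidelity when they rank the
    configurations in the same order, even if absolute values differ.
    Under that assumption, the sketch below computes a Spearman rank
    correlation between two lists of energy estimates. The function
    names are hypothetical and the formula used assumes no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data.

    Returns 1.0 when both tools order the configurations identically
    and -1.0 when the orders are exactly reversed.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with 2+ items")
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Made-up normalized energies for the same three configurations.
    tool_a = [0.20, 0.55, 0.90]
    tool_b = [0.25, 0.95, 0.60]
    print(spearman(tool_a, tool_b))
```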

    5. CONCLUSION

    This work has presented the PCacheEnergyAnalyzer

    environment for energy consumption analysis. The tool

    provides support for cache memory energy consumption

    estimation on SoC platforms. Initial studies focused

    on one-level caches; however, the approach can be easily

    extended to more levels. Results have shown that it is a

    powerful tool for helping users to find interesting cache

    configurations for a particular application, considering not

    only performance, but also the best relation between

    performance and energy consumption.

    PCacheEnergyAnalyzer fills the gaps of the existing

    tools by simultaneously providing multiplatform support,

    extensibility, dynamic and static energy consumption

    estimation and a graphical environment.

    6. REFERENCES

    [1] H. Chang, L. Cooke, M. Hunt, G. Martin, A.J. McNelly and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1 ed., 1999.

    [2] A. Malik, B. Moyer and D. Cermak, A Low Power Unified Cache Architecture Providing Power and Performance Flexibility, Int. Symp. on Low Power Electronics and Design, June 2000, pp. 241-243.

    [3] C. Zhang, F. Vahid, Cache configuration exploration on prototyping platforms. 14th IEEE International Workshop on Rapid System Prototyping (June 2003), vol 00, p. 164.

    [4] A. Gordon-Ross, F. Vahid, N. Dutt, Automatic Tuning of Two-Level Caches to Embedded Applications, DATE, pp. 208-213 (Feb 2004).

    [5] A. Gordon-Ross, et al., Fast Configurable-Cache Tuning with a Unified Second-Level Cache, ISLPED'05, 2005.

    [6] A.G. Silva-Filho, F.R. Cordeiro, R.E. Sant'Anna and M.E. Lima, Heuristic for Two-Level Cache Hierarchy Exploration Considering Energy Consumption and Performance, PATMOS 2006, Montpellier, France, September 13-15, 2006, pp. 75-83.

    [7] A. Halambi, et al. EXPRESSION: A language for architecture exploration through compiler/simulator retargetability. DATE, March 1999. pp. 485-491.

    [8] T. Givargis, F. Vahid, Platune: A Tuning framework for system-on-a-chip platforms, IEEE Trans. Computer-Aided Design, vol 21, Nov. 2002. pp. 1-11.

    [9] M. Palesi, T. Givargis, Multi-objective design space exploration using genetic algorithms. International Workshop on Hardware/Software Codesign (May 2002).

    [10] D. Burger, T.M. Austin, The SimpleScalar Tool Set, Version 2.0; Computer Architecture News; Vol 25(3). June 1997. pp. 13-25.

    [11] P. Shivakumar, N.P. Jouppi, Cacti 3.0: An Integrated Cache Timing, Power and Area model, WRL Research Report 2001/2.

    [12] C.A. Prete, M. Graziano, F. Lazzarini, The ChARM Tool for Tuning Embedded Systems. In IEEE Micro 1997. Vol 17, pp. 67-76.

    [13] N. Dutt, M. Mamidipaka, eCACTI: An Enhanced Power Estimation Model for On-chip Caches, TR 04-28; Sept. 2004.

    [14] E. Macii, et al., Energy-Aware Design of Embedded Memories: A Survey of Technologies, Architectures and Optimization Techniques, ACM Transactions on Embedded Computing Systems; Vol. 2, No. 1, Feb. 2003, pp. 5-32.

    [15] Eclipse, available at http://www.eclipse.org.

    [16] P. Viana, et al. Cache-Analyzer: Design Space Evaluation of Configurable-Caches in a Single-Pass. International Workshop on Rapid System Prototyping. pp. 3-9, May 2007.

    [17] R.A. Sugumar and S.G. Abraham, Efficient simulation of multiple cache configurations using binomial trees, CSE-TR-111-91, CSE Div, Univ. of Michigan, 1991. Available in: .

    [18] Mediabench: http://cares.icsl.ucla.edu/MediaBench/, 2006.

    [19] M.R. Guthaus, et al. Mibench: A free, commercially representative embedded benchmark suite. In IEEE 4th Annual Workshop on Workload Characterization, pp. 1-12, Dec. 2001.
