Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Based on the paper by Jie Tang, Shaoshan Liu, Zhimin Gu, Chen Liu, and Jean-Luc Gaudiot (Fellow, IEEE), IEEE Computer Architecture Letters, Volume 10, Issue 1.


Page 1: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Based on the paper by Jie Tang, Shaoshan Liu, Zhimin Gu, Chen Liu, and Jean-Luc Gaudiot (Fellow, IEEE), IEEE Computer Architecture Letters, Volume 10, Issue 1.

Page 2: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Overview
Introduction
Motivation and Background
Previous Work
Methodology
Prefetcher Performance
Energy Efficiency
Energy Consumption Analysis
Energy Efficiency Model
Conclusion

Page 3: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Data prefetching is the process of fetching data that a program will need in advance, before the instruction that requires it is executed.

It reduces the apparent memory latency.

Data prefetching has been a successful technique in modern high-performance computing platforms.

It was found, however, that prefetching significantly increases power consumption.

Page 4: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Embedded mobile systems typically have constraints on space, cost, and power.

This means that they cannot afford power-hungry techniques.

Hence, prefetching was considered unsuitable for Embedded Systems.

Page 5: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Embedded mobile systems are now powered by powerful processors, such as dual-core processors like Nvidia's Tegra 2.

Smartphone applications include web browsing, multimedia, gaming, and Webtop control, all of which require very high performance from the computing system.

Page 6: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

To meet this requirement, methods such as prefetching, which were previously shunned, can now be used.

With better, more power-efficient technology, the energy consumption behavior may also have changed.

For this reason, we study and model the energy efficiency of different types of prefetchers.

Page 7: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Over the years, the main bottleneck limiting system speed has been slow memory rather than processor speed.

Data prefetching can be implemented in hardware by observing fetch patterns, for example by prefetching the most recently used data first.

Sequential prefetching takes advantage of spatial locality in memory.

Tagged prefetching associates a tag bit with every memory block and issues prefetches based on the value of that bit.
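As a rough illustration of the tagged scheme described above, the following Python sketch models one common textbook variant: the tag bit marks a block that was brought in by a prefetch and has not been referenced yet, and the next sequential block is prefetched on a demand miss or on the first reference to a tagged block. The class name `TaggedPrefetcher`, the unbounded dictionary used as a cache, and the one-block prefetch distance are assumptions made for this example; they are not taken from the paper.

```python
class TaggedPrefetcher:
    """Toy tagged sequential prefetcher over block addresses (illustrative only)."""

    def __init__(self):
        # block address -> tag bit; True means "prefetched and not yet referenced".
        # Assumption: the cache is unbounded, so eviction is not modeled.
        self.cache = {}

    def access(self, block):
        """Reference `block` and return the list of blocks prefetched as a result."""
        if block not in self.cache:
            # Demand miss: fetch the block itself, then prefetch its successor.
            self.cache[block] = False
            return self._prefetch(block + 1)
        if self.cache[block]:
            # First reference to a prefetched block: clear the tag, prefetch the next block.
            self.cache[block] = False
            return self._prefetch(block + 1)
        # Ordinary hit on an already-referenced block: no prefetch.
        return []

    def _prefetch(self, block):
        if block in self.cache:
            return []
        self.cache[block] = True
        return [block]


if __name__ == "__main__":
    p = TaggedPrefetcher()
    for b in (10, 11, 12, 10):
        print(b, "->", p.access(b))
```

In a sequential scan, each first touch of a prefetched block keeps the prefetcher one block ahead, while repeated hits on old blocks generate no extra traffic.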

Page 8: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Stride-based prefetching detects stride patterns in the address stream, such as fetches from different iterations of the same loop.
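A minimal Python sketch of stride detection follows. It assumes a table indexed by the program counter of the load instruction and a fixed prefetch degree, and it only issues prefetches once the same non-zero stride has been seen twice in a row. The names `StridePrefetcher`, `degree`, and `access` are hypothetical and chosen for this example only.

```python
class StridePrefetcher:
    """Toy stride prefetcher keyed by the load instruction's PC (illustrative only)."""

    def __init__(self, degree=1):
        self.degree = degree   # how many strides ahead to prefetch (assumed static)
        self.table = {}        # pc -> (last address, last observed stride)

    def access(self, pc, addr):
        """Record an access and return the addresses to prefetch, if any."""
        prefetches = []
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            # Prefetch only when the same non-zero stride repeats.
            if stride != 0 and stride == last_stride:
                prefetches = [addr + stride * i for i in range(1, self.degree + 1)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetches


if __name__ == "__main__":
    sp = StridePrefetcher(degree=2)
    for a in (100, 164, 228, 292):
        print(a, "->", sp.access(0x400, a))
```

A loop that walks an array with a constant element size produces exactly this kind of repeating stride, which is why the table is keyed by the instruction rather than by the address.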

Stream prefetchers try to capture sequential nearby misses and prefetch an entire block at a time.

Correlated prefetchers issue prefetches based on the previously recorded correlations between addresses of cache misses.
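The idea of recording correlations between miss addresses can be sketched as a small table that maps each miss address to the misses that most recently followed it. This is a simplified, Markov-style illustration; the class name `CorrelationPrefetcher`, the `width` parameter, and the most-recent-first ordering are assumptions for the example rather than details from the paper.

```python
from collections import defaultdict


class CorrelationPrefetcher:
    """Toy correlation prefetcher driven by the cache-miss stream (illustrative only)."""

    def __init__(self, width=2):
        self.width = width                    # successors remembered per miss address
        self.successors = defaultdict(list)   # miss address -> following misses, newest first
        self.last_miss = None

    def on_miss(self, addr):
        """Record the miss and return the addresses predicted to miss next."""
        if self.last_miss is not None:
            succ = self.successors[self.last_miss]
            if addr in succ:
                succ.remove(addr)
            succ.insert(0, addr)      # newest correlation goes first
            del succ[self.width:]     # keep at most `width` successors
        self.last_miss = addr
        # .get() avoids creating an empty entry for an address never seen before.
        return list(self.successors.get(addr, []))


if __name__ == "__main__":
    cp = CorrelationPrefetcher()
    for a in (1, 2, 3, 1, 5, 1):
        print(a, "->", cp.on_miss(a))
```

Unlike the stride scheme, this approach can follow irregular but repeating miss sequences, at the cost of a larger history table.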

Page 9: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Several studies have focused on improving the energy efficiency of hardware prefetching. PARE is one such technique: it constructs a power-aware hardware prefetching engine by categorizing memory accesses into different groups.

It uses an indexed hardware history table that is continuously updated as memory fetches are categorized, and prefetching decisions are based on the information in this table.

Page 10: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Modern embedded mobile systems execute a wide variety of workloads.

The first set includes two XML data processing benchmarks taken from Xerces-C++.

They implement event-based parsing, which is data centric (SAX), and tree-based parsing, which is document centric (DOM).

Table 1 Benchmark Set

Benchmark suite | Benchmarks
Xerces-C++      | SAX, DOM
MediaBench II   | JPEG2000 Encode, JPEG2000 Decode, H.264 Encode, H.264 Decode
PARSEC          | Fluidanimate, Freqmine

Page 11: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

The second set is taken from MediaBench II, which provides application-level benchmarks representing multimedia and entertainment workloads, based on the ISO JPEG-2000 standard.

It also includes the H.264 video compression standard.

The third set is taken from PARSEC (Princeton Application Repository for Shared-Memory Computers), a benchmark suite for multithreaded processors that covers workloads used in many gaming applications.

Page 12: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Cache hierarchy indicates the level of cache that the prefetcher covers.

Prefetching degree shows whether the prefetching degree of the prefetcher is static or dynamically adjusted.

Trigger L1 and Trigger L2 respectively show what triggers the prefetch.

Table 2 Summary of Prefetchers

Prefetcher | Cache hierarchy | Prefetching degree | Trigger L1 | Trigger L2
P1         | L1 & L2         | Dynamic            | miss       | access
P2         | L1              | Static             | miss       | N/A
P3         | L1 & L2         | Dynamic            | miss       | miss
P4         | L2              | Static             | N/A        | miss
P5         | L1 & L2         | Dynamic            | miss       | miss
P6         | L2              | Static             | N/A        | access

Page 13: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

To study the performance of the selected prefetchers, we use CMP$IM, a cache simulator, to model high-performance embedded systems.

It is a Pin-based multi-core cache simulator.

The simulation parameters are shown in Table 3; they resemble those of modern smartphone and e-book systems.

Table 3 Simulation Parameters

Parameter          | Value
Frequency          | 1 GHz
Issue Width        | 4
Instruction Window | 128 entries
L1 Data Cache      | 32 KB, 8-way, 1 cycle
L1 Inst. Cache     | 32 KB, 8-way, 1 cycle
L2 Uniform Cache   | 512 KB, 16-way, 20 cycles
Memory Latency     | 256 MB, 200 cycles

Page 14: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

To study the impact of prefetching on the energy consumption of the memory subsystem, we use CACTI to model the energy parameters of different technology implementations.

In a simulator, a hardware prefetcher can be defined by a set of hardware tables and its output takes the form of tables of data; hence, its energy consumption can be modeled.

Page 15: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Prefetching techniques are effective at improving performance, by more than 5% on average. In detail, the effectiveness of a prefetcher depends on both the prefetching technique itself and the nature of the application.

P3 results in the best average performance because it is the most aggressive prefetcher.

The JPEG2000 decoding and encoding programs gain up to 22% in performance because of their streaming behavior.

Page 16: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Fig 1 Performance Improvement

Page 17: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

We study the energy efficiency of both 90 nm and 32 nm technologies. The results are summarized in Figures 2 and 3 respectively.

The baseline for comparison is the energy consumption without any prefetcher; thus, a positive number shows that the system dissipates more energy with the prefetcher.

For instance, 0.1 means that with the prefetcher, the system dissipates 10% more energy compared to baseline.

Page 18: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

In 90 nm technology, most prefetchers significantly increase overall energy consumption, which confirms the findings of previous studies.

Thus, in 90 nm technology, only very conservative prefetchers can be energy efficient.

Page 19: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Fig 2 90nm

Page 20: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Fig 3 32nm

Page 21: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

In 32 nm technology, P4 is still the most energy efficient prefetcher, reducing overall energy by almost 4% on average; when running JPEG 2000 Decode, it achieves close to 10% energy saving.

P2 and P3 are still the most energy-inefficient prefetchers due to their aggressiveness. However, in the worst case they only consume 25% extra energy, a four-fold reduction compared to the 90 nm implementations.

Thus, most prefetchers are able to provide a performance gain with less than 5% energy overhead, and P1 and P4 even result in 2% to 5% energy reductions.

Page 22: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

In Equation 1, the total energy consumption has two contributors: static energy (Estatic) and dynamic energy (Edynamic).

Nm is the number of read/write memory accesses.

Edynamic is the product of the number of read/write accesses (Nm) and the energy dissipated on the bus and memory subsystem by each access (E'm).

Page 23: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Estatic is the product of the overall execution time (t) and the system static power consumption (Pstatic).

When prefetchers accelerate the process, the reduced execution time reduces the static energy consumption.

However, prefetchers generate a significant number of extra memory subsystem accesses, which are pure dynamic overhead.

Equation 1: $E = E_{static} + E_{dynamic} = (P_{static} \times t) + (N_m \times E'_m)$
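To make the trade-off in Equation 1 concrete, here is a small Python sketch that evaluates the formula for a baseline run and for a run with a prefetcher. The function name `total_energy` and all numeric values are hypothetical, chosen only so that the prefetcher shortens the run while adding memory accesses; they are not measurements from the paper.

```python
def total_energy(p_static, t, n_m, e_m):
    """Equation 1: static power times run time, plus accesses times energy per access."""
    return p_static * t + n_m * e_m


# Hypothetical values: 0.5 W static power, 1 microjoule per memory access.
baseline = total_energy(p_static=0.5, t=2.0, n_m=1_000_000, e_m=1e-6)
# With a prefetcher: the run is 10% shorter but issues 5% more memory accesses.
with_prefetch = total_energy(p_static=0.5, t=1.8, n_m=1_050_000, e_m=1e-6)

print(f"baseline:      {baseline:.3f} J")
print(f"with prefetch: {with_prefetch:.3f} J")
```

With these made-up numbers the static saving from the shorter run outweighs the extra dynamic energy; a larger increase in accesses would tip the balance the other way.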

Page 24: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Table 4 Energy Category

Category         | Description
Dynamic memory   | dynamic activities of the memory subsystem
Static memory    | memory subsystem static power consumption
Dynamic prefetch | dynamic activities of the prefetcher
Static prefetch  | prefetcher hardware static power consumption

Page 25: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

Fig 4

Page 26: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

In 90 nm technology, dynamic energy contributes up to 66% of the total energy consumption: 14% from the prefetcher and 52% from the memory subsystem. Static energy accounts for only 34% of the total energy consumption.

Hence, although the prefetchers are able to reduce execution time, this leaves little room for total energy saving, which makes most prefetchers energy inefficient in 90 nm implementations.

Page 27: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

In 32 nm technology, static energy contributes over 66% of the total energy consumption: 65% from the memory subsystem and 1% from the prefetcher hardware.

Dynamic energy is far smaller than static energy.

Hence, in 32 nm technology, prefetchers become energy efficient in many different cases.
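The effect of the static share can be illustrated with a short calculation. The sketch below assumes that a shorter run time shrinks only the static part of the energy, and it uses the approximate static shares quoted above (34% at 90 nm, 66% at 32 nm) together with an assumed 5% reduction in execution time. The function name `energy_saved_by_speedup` is hypothetical, and the extra dynamic energy that a prefetcher adds is deliberately left out.

```python
def energy_saved_by_speedup(static_share, time_reduction):
    """Fraction of total baseline energy saved if only static energy scales with run time."""
    return static_share * time_reduction


# Static shares taken from the text; the 5% time reduction is an illustrative assumption.
saving_90nm = energy_saved_by_speedup(0.34, 0.05)
saving_32nm = energy_saved_by_speedup(0.66, 0.05)

print(f"90 nm: {saving_90nm:.1%} of total energy saved")
print(f"32 nm: {saving_32nm:.1%} of total energy saved")
```

Under these assumptions the same speedup frees roughly twice as much energy at 32 nm as at 90 nm, which leaves more budget to absorb the prefetcher's dynamic overhead.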

Page 28: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

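We propose an analytical model to evaluate energy efficiency. A prefetcher is energy efficient when the condition in Equation 2 holds.

Equation 2: $E_{no\text{-}pref} > E_{pref}$

To simplify the model, we assume there is only one level in the memory subsystem. Compared to Eno-pref, Epref has two more contributors: the static and dynamic energy consumed by the prefetcher hardware. In the equations below, t1 and t2 are the execution times without and with prefetching, Nm1 and Nm2 are the corresponding numbers of memory accesses, Np is the number of prefetcher accesses, and E'p is the energy of each prefetcher access.

Equation 3: $P_{m\text{-}static} \times t_1 + N_{m1} \times E'_m > P_{m\text{-}static} \times t_2 + N_{m2} \times E'_m + P_{p\text{-}static} \times t_2 + N_p \times E'_p$

Equation 4: $\frac{t_1 - t_2}{t_1} > \frac{(N_{m2} - N_{m1}) \times E'_m + N_p \times E'_p + P_{p\text{-}static} \times t_2}{P_{m\text{-}static} \times t_1}$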
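Equation 4 is Equation 3 rearranged: the memory static terms are moved to the left, and both sides are divided by the baseline memory static energy. The Python sketch below checks that the two forms give the same verdict on two hypothetical parameter sets. The function names and every numeric value are assumptions for illustration only.

```python
def equation3_holds(pm_static, pp_static, t1, t2, nm1, nm2, e_m, n_p, e_p):
    """Equation 3: energy without prefetching exceeds energy with prefetching."""
    e_no_pref = pm_static * t1 + nm1 * e_m
    e_pref = pm_static * t2 + nm2 * e_m + pp_static * t2 + n_p * e_p
    return e_no_pref > e_pref


def equation4_holds(pm_static, pp_static, t1, t2, nm1, nm2, e_m, n_p, e_p):
    """Equation 4: performance gain exceeds overhead over baseline memory static energy."""
    gain = (t1 - t2) / t1
    overhead = (nm2 - nm1) * e_m + n_p * e_p + pp_static * t2
    return gain > overhead / (pm_static * t1)


# Hypothetical parameters: a 10% shorter run with 5% more memory accesses.
efficient = dict(pm_static=1.0, pp_static=0.01, t1=1.0, t2=0.9,
                 nm1=1_000_000, nm2=1_050_000, e_m=1e-6,
                 n_p=2_000_000, e_p=1e-8)
# Same run, but the prefetcher now causes 20% more memory accesses.
inefficient = dict(efficient, nm2=1_200_000)

for name, params in (("efficient", efficient), ("inefficient", inefficient)):
    print(name, equation3_holds(**params), equation4_holds(**params))
```

The first case satisfies both inequalities and the second satisfies neither, as expected from the algebraic equivalence.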

Page 29: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

The left-hand side shows the performance gain as a result of prefetching.

The numerator of the right-hand side contains three terms: the energy overhead incurred by the extra memory accesses, the dynamic energy of the prefetcher, and the static energy consumption of the prefetcher hardware.

The denominator of the right-hand side represents the static energy of the original design without prefetching.

Page 30: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

As summarized in Equation 5, if a prefetcher needs to be energy efficient, the performance gain (G) it brings must be greater than the ratio of the energy overhead (Eoverhead) it incurs over the original static energy (Eno-pref-static).

Equation 5: $G > \frac{E_{overhead}}{E_{no\text{-}pref\text{-}static}}$

Equation 6: $EEI = G - \frac{E_{overhead}}{E_{no\text{-}pref\text{-}static}}$

Page 31: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

We define a metric Energy Efficiency Indicator (EEI) in Equation 6. A positive EEI indicates the prefetcher is energy-efficient and vice versa.
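As a quick illustration of how the metric might be used, the Python sketch below computes EEI from a performance gain and the two energy terms and reports the verdict. The function names `eei` and `is_energy_efficient` and the sample inputs are hypothetical; only the formula itself comes from Equation 6.

```python
def eei(gain, e_overhead, e_no_pref_static):
    """Equation 6: performance gain minus energy overhead over baseline static energy."""
    return gain - e_overhead / e_no_pref_static


def is_energy_efficient(gain, e_overhead, e_no_pref_static):
    """A positive EEI means the prefetcher saves energy overall."""
    return eei(gain, e_overhead, e_no_pref_static) > 0


# Hypothetical inputs: a 10% gain against 0.03 J and 0.25 J of overhead,
# with 1.0 J of static energy in the design without prefetching.
print(is_energy_efficient(gain=0.10, e_overhead=0.03, e_no_pref_static=1.0))
print(is_energy_efficient(gain=0.10, e_overhead=0.25, e_no_pref_static=1.0))
```

A designer could feed simulator outputs into such a function to compare candidate prefetchers before committing to a hardware design.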

We have validated the analytical results against the empirical results shown in the table below, which indicates the simplicity and effectiveness of our analytical model.

Technology | P1   | P2    | P3    | P4   | P5    | P6
90 nm      | -0.1 | -0.5  | -0.69 | 0.03 | -0.27 | -0.31
32 nm      | 0.03 | -0.05 | -0.07 | 0.05 | 0.00  | -0.14

Page 32: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

With the new trend toward highly capable embedded mobile applications, it makes sense to implement high-performance techniques such as prefetching.

Such techniques do not seem to put a burden on energy consumption and should thus be implemented.

Page 33: Prefetching in Embedded Mobile Systems Can Be Energy Efficient

A simple analytical model has been demonstrated that effectively estimates the energy efficiency of prefetching.

System designers can estimate the energy efficiency of their hardware prefetcher designs and make changes accordingly.