Quality of Service-Aware, Scalable Cache Tuning Algorithm...

1
Quality of Service-Aware, Scalable Cache Tuning Algorithm in Consumer- based Embedded Devices M. Hammam Alsafrjalani and Ann Gordon-Ross Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA q Consumer-based Embedded Devices (CEDs) are ubiquitous q High consumer-de:ined quality of service (QoS) expectations (consumer goals) and stringent energy constraints, often contending q Goal: innovate a CEDs design approach that enables CEDs to adhere to consumer goals and energy requirements, independently of applications/deployment q Challenges q Consumer goals must be known during design time q Implausible given ubiquity nature of systems (diverse consumers) q Software applications must be known during design time [1] q Implausible, given rapid growth of unknown third party applications (~1.5 million+ apps for Android) Using con0igurable hardware and associated tuning algorithms q Con0igurable hardware q Contains modi:iable/tunable parameters q Clock, voltage, memory, etc. q Parameter values change to application’s hardware requirements q Parameter values dictated by a tuning algorithm q Tuning algorithm q Monitors application execution, evaluate energy consumption and quality of service q Explores the design space (available parameter values) q Adjusts parameter values such that CED q Meets QoS expectation and consume lowest energy q Target effective component q Cache memory q Has high impact on energy and performance Application-scalable hardware and runtime tuning algorithm q Scalable hardware q Compressed tuning information into auxiliary tables q Employed LRU policy to enable scalability of application- tuning q Only 4% area (hardware) overhead q Dynamic, general purpose CED cache-tuning algorithm q Requires no a priori knowledge of applications q Flexible: Conservative and Moderate modes q For disparate QoS expectations q Trades off energy savings for higher QoS Designing approach for CED’s Saving energy does not require a priori knowledge of applications ü Enables scalability to an arbitrary number of unknown (future), third-party applications ü Meets consumer goals Tuning modes provide :lexible adherence to QoS expectations ü Can incorporate tuning modes for disparate conditions/situations, such as Comparison to prior work Prior work [1][2] saved more energy, however q (-) Incurred as much as 109.2% and 132.5% more tuning overhead while tuning q (-) Requires long applications execution to amortize the tuning overhead Prior work degraded QoS up to 7X more, compared to our approach q (-) Can be completely avoided only with a priori knowledge of applications [1] q Not possible for CEDs Interdisciplinary collaboration Ø Psychology behind consumer expectations and its impact on design constraints Ø QuantiNication of consumer feelings, moods, and thoughts Ø Other factors such as time of use (day, night, etc.), place of operation (ofNice, construction site, on-route, etc.), ... Architecture innovations Ø Performance & energy expectations vs. privacy & security Ø Adaptability to Internet of Things (IoT) devices Ø Tuning through IoT [1] Wang, W., Mishra, P., Gordon-Ross, A. “SACR: Scheduling-Aware Cache Recon:iguration for Real-Time Embedded Systems,” Int. Con. on VLSI Design, 2009. [2] Zhang, C., Vahid, F. “Cache con:iguration exploration on prototyping platforms,” IEEE International Workshop on Rapid Systems Prototyping, 2003 V.S. v Quad-core system v Con0igurable, private level 1 cache memory Ø Possible con:igurations: 2KB 1-way; 4KB 1- or 2- way; and 8KB 1-, 2-, or 4-way v Hardware cache tuner Ø Global, monitors all cores Ø Tunes the cores based on the tuning algorithm Ø Stores tuning information in lookup and auxiliary tables v Information Input: Reads tuning information from lookup and auxiliary tables; tunes to best con:iguration, or resumes exploration v Exploration: Determines the con:iguration to tune the hardware to, based on the tuning information and tuning mode; updates the tables v Evaluation: Evaluates if the con:iguration saves energy and/or degrades QoS; decides if tuning is done 39.76% 20.68% 19.00% 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Tuning Mode Percent of Energy savings Data cache Aggressive Moderate Conserva>ve 34.98% 25.14% 23.49% 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Tuning Mode Percent of Energy savings Instruc5on cache Aggressive Moderate Conserva>ve ü Aggressive(prior work) achieved highest savings Ø However required priori knowledge of application ü Moderate and conservative modes results in comparable energy savings E(total) = E(sta) + E(dyn) E(dyn) = cache_hits * E(hit) + cache_misses * E(miss) E(miss) = E(off_chip_access) + miss_cycles * E(CPU_stall) + E(cache_fill) Miss Cycles = cache_misses * miss_latency + (cache_misses * (line_size/16)) * memory_band_width) E(sta) = total_cycles * E(staCc_per_cycle) E(sta3c_per_cycle) = E(per_Kbyte) * cache_size_in_Kbytes E(per_Kbyte) = (E(dyn_of_base_cache) * 10%) / (base_cache_size_in_Kbytes) QoS Expecta3on = performance > threshold Threshold = minimum acceptable performance If performance < threshold, QoS_degradaCon = true Else QoS_degradaCon = false Energy savings and QoS : our approach vs. base system and prior work v Energy savings: Measured application’s energy consumption of best con:iguration determined by the tuning algorithm, calculated energy percent decrease with respect to the base system, and averaged the energy savings for all applications (34 total) v Quality of Service: Calculated the number of QoS degradation occurrences while tuning each application, compared the result to a perfect (no QoS degradation) system, and averaged the QoS degradation for all applications (34 total) v Comparison to prior work: Incorporated a tuning mode which represents prior work approach; an aggressive mode which determined the lowest-energy con:iguration without considering QoS 7.70 1.00 0.41 0 1 2 3 4 5 6 7 8 9 Tuning Mode Average QoS degrada5on Data cache Aggressive Moderate Conserva>ve 7.40 1.00 0.70 0 1 2 3 4 5 6 7 8 Tuning Mode Average QoS degrada5on Instruc5on cache Aggressive Moderate Conserva>ve Aggressive(prior work) imposed QoS degradation as high as 7X , on average ü Moderate mode imposed QoS degradation, at most 1 ü Conservative mode average QoS degradation < 1.00 Otherwise Environmental Experience/Personal

Transcript of Quality of Service-Aware, Scalable Cache Tuning Algorithm...

Page 1: Quality of Service-Aware, Scalable Cache Tuning Algorithm ...esaule/NSF-PI-CSR-2017-website/poster… · platforms,” IEEE International Workshop on Rapid Systems Prototyping, 2003

Quality of Service-Aware, Scalable Cache Tuning Algorithm in Consumer-based Embedded Devices

M. Hammam Alsafrjalani and Ann Gordon-Ross Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA

q  Consumer-basedEmbeddedDevices(CEDs)areubiquitous

q  Highconsumer-de:inedqualityofservice(QoS)expectations(consumergoals)andstringentenergyconstraints,oftencontending

q  Goal:innovateaCEDsdesignapproachthatenablesCEDsto

adheretoconsumergoalsandenergyrequirements,independentlyofapplications/deployment

q  Challenges

q  Consumergoalsmustbeknownduringdesigntimeq  Implausiblegivenubiquitynatureofsystems(diverse

consumers)q  Softwareapplicationsmustbeknownduringdesigntime[1]

q  Implausible,givenrapidgrowthofunknownthirdpartyapplications(~1.5million+appsforAndroid)

Usingcon0igurablehardwareandassociatedtuningalgorithmsq  Con0igurablehardware

q  Containsmodi:iable/tunableparametersq  Clock,voltage,memory,etc.

q  Parametervalueschangetoapplication’shardwarerequirements

q  Parametervaluesdictatedbyatuningalgorithmq  Tuningalgorithm

q  Monitorsapplicationexecution,evaluateenergyconsumptionandqualityofservice

q  Exploresthedesignspace(availableparametervalues)q  AdjustsparametervaluessuchthatCED

q  MeetsQoSexpectationandconsumelowestenergyq  Targeteffectivecomponent

q  Cachememoryq  Hashighimpactonenergyandperformance

Application-scalablehardwareandruntimetuningalgorithmq  Scalablehardware

q  Compressedtuninginformationintoauxiliarytablesq  EmployedLRUpolicytoenablescalabilityofapplication-

tuningq  Only4%area(hardware)overhead

q  Dynamic,generalpurposeCEDcache-tuningalgorithmq  Requiresnoaprioriknowledgeofapplicationsq  Flexible:ConservativeandModeratemodes

q  FordisparateQoSexpectationsq  TradesoffenergysavingsforhigherQoS

DesigningapproachforCED’sSavingenergydoesnotrequireaprioriknowledgeofapplicationsü  Enablesscalabilitytoanarbitrarynumberofunknown(future),third-partyapplications

ü  MeetsconsumergoalsTuningmodesprovide:lexibleadherencetoQoSexpectationsü  Canincorporatetuningmodesfordisparateconditions/situations,suchas

ComparisontopriorworkPriorwork[1][2]savedmoreenergy,however

q  (-)Incurredasmuchas109.2%and132.5%moretuningoverheadwhiletuning

q  (-)Requireslongapplicationsexecutiontoamortizethetuningoverhead

PriorworkdegradedQoSupto7Xmore,comparedtoourapproachq  (-)Canbecompletelyavoidedonlywithapriori

knowledgeofapplications[1]

q  NotpossibleforCEDs

InterdisciplinarycollaborationØ  Psychologybehindconsumerexpectationsanditsimpactondesignconstraints

Ø  QuantiNicationofconsumerfeelings,moods,andthoughts

Ø  Otherfactorssuchastimeofuse(day,night,etc.),placeofoperation(ofNice,constructionsite,on-route,etc.),...

ArchitectureinnovationsØ  Performance&energyexpectationsvs.privacy&security

Ø  AdaptabilitytoInternetofThings(IoT)devices

Ø  TuningthroughIoT

[1]Wang,W.,Mishra,P.,Gordon-Ross,A.“SACR:Scheduling-AwareCacheRecon:igurationforReal-TimeEmbeddedSystems,”Int.Con.onVLSIDesign,2009.[2]Zhang,C.,Vahid,F.“Cachecon:igurationexplorationonprototypingplatforms,”IEEEInternationalWorkshoponRapidSystemsPrototyping,2003

V.S.

v  Quad-coresystemv  Con0igurable,privatelevel1cachememory

Ø  Possiblecon:igurations:2KB1-way;4KB1-or2-way;and8KB1-,2-,or4-way

v  HardwarecachetunerØ  Global,monitorsallcoresØ  TunesthecoresbasedonthetuningalgorithmØ  Storestuninginformationinlookupandauxiliary

tables

v  Information Input: Reads tuning information from lookup andauxiliarytables;tunestobestcon:iguration,orresumesexploration

v  Exploration:Determinesthecon:igurationtotunethehardwareto,basedonthetuninginformationandtuningmode;updatesthetables

v  Evaluation: Evaluates if the con:iguration saves energy and/ordegradesQoS;decidesiftuningisdone

39.76%

20.68% 19.00%

00.050.10.150.20.250.30.350.40.45

TuningMode

Percen

tofE

nergysavings

Datacache

Aggressive Moderate Conserva>ve

34.98%

25.14% 23.49%

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

TuningMode

Percen

tofE

nergysavings

Instruc5oncache

Aggressive Moderate Conserva>ve

ü  Aggressive(priorwork)achievedhighestsavingsØ  Howeverrequiredprioriknowledgeof

applicationü  Moderateandconservativemodesresultsincomparable

energysavings

E(total)=E(sta)+E(dyn)E(dyn)=cache_hits*E(hit)+cache_misses*E(miss)E(miss)=E(off_chip_access)+miss_cycles*E(CPU_stall)+E(cache_fill)MissCycles=cache_misses*miss_latency+(cache_misses*(line_size/16))

*memory_band_width)E(sta)=total_cycles*E(staCc_per_cycle)E(sta3c_per_cycle)=E(per_Kbyte)*cache_size_in_KbytesE(per_Kbyte)=(E(dyn_of_base_cache)*10%)/(base_cache_size_in_Kbytes)

QoSExpecta3on=performance>thresholdThreshold=minimumacceptableperformanceIfperformance<threshold,QoS_degradaCon=trueElse QoS_degradaCon=false

EnergysavingsandQoS:ourapproachvs.basesystemandpriorworkv  Energy savings: Measured application’s energy consumption of

best con:iguration determined by the tuning algorithm, calculatedenergypercentdecreasewithrespecttothebasesystem,andaveragedtheenergysavingsforallapplications(34total)

v  Quality of Service: Calculated the number of QoS degradationoccurrences while tuning each application, compared the result to aperfect (no QoS degradation) system, and averaged the QoSdegradationforallapplications(34total)

v  Comparison to priorwork: Incorporated a tuningmodewhichrepresents prior work approach; an aggressive mode whichdeterminedthelowest-energycon:igurationwithoutconsideringQoS

7.70

1.000.41

0123456789

TuningMode

AverageQoSdegrada

5on

Datacache

Aggressive Moderate Conserva>ve7.40

1.00 0.70

0

1

2

3

4

5

6

7

8

TuningMode

AverageQoSdegrada

5on

Instruc5oncache

Aggressive Moderate Conserva>ve

Aggressive(priorwork)imposedQoSdegradationashighas7X,onaverage

ü  ModeratemodeimposedQoSdegradation,atmost1ü  ConservativemodeaverageQoSdegradation<1.00

Otherwise

Environmental Experience/Personal