Yi Xiang · 2015. 9. 9. · energy harvesting-aware multicore embedded system platforms • Solar...
Yi Xiang
Committee members:
Dr. Sudeep Pasricha (Academic Advisor)
Dr. Anura Jayasumana
Dr. H. J. Siegel
Dr. Michelle Strout
OUTLINE
Introduction and Preview of Contributions
Contribution I: Semi-Dynamic Scheduling for Independent Tasks: Hybrid Energy Storage, Process Variation, and Thermal Management
Contribution II: Template-Based Scheduling for Task Graphs: Slack Reclamation, Soft Errors, and Hard Failures
Contribution III: Mixed-Criticality Scheduling on Heterogeneous Cores: Soft Deadline, Near-Threshold/Super-Threshold Computing
Conclusion
• We have been doing this for a very long time:
What is Energy Harvesting?
wind + mill → windmill
water + wheel → waterwheel
solar + core → solarcore?
Chao Li et al., “SolarCore: Solar Energy Driven Multi-Core Architecture Power
Management”, IEEE International Symposium on High Performance Computer
Architecture (HPCA), pp. 205-216, 2011.
• Collect energy from ambient sources
• Including solar, radio frequency,
magnetic, vibration, thermoelectric, etc.
Energy Harvesting for Electronic Systems
• To support energy autonomy for electronic devices
• Wearable electronics, wireless sensor networks, etc.
• Limited energy availability for electronic devices:
• Electricity grid is not available everywhere
Application of Energy Harvesting: Project Loon
Project Loon
o Global network-on-balloon
o Provide internet access for rural
and remote areas
o Float in atmosphere
o Powered by solar energy
harvesting
For many applications energy is not readily available,
motivating the use of energy harvesting
• Energy constraints for embedded devices:
• Limited energy capacity of batteries
• Replacing battery can be inconvenient, costly, or even impractical
Pervasive computing:
o Billions of sensors
o Scattered everywhere
o Batteries:
o costly or impossible
maintenance
o toxic to environment
o Need energy harvesting to
achieve energy autonomy
User Demands:
High performance
Large screen size
High resolution
GPS
Camera
Biometric sensors
24×7 battery life?
External Battery Pack
Energy Harvesting and Batteries
• Solar energy harvesting as power supply
• Photovoltaic (PV) panel to scavenge energy from solar radiation
• Why choose solar energy harvesting?
Solar Energy Harvesting
• Advantages of solar energy harvesting as power supply
• High power density
• Varied scales of PV panels and systems
Why Solar Energy Harvesting?
Energy Source       Typical Power Density
thermal gradient    60 μW/cm²
vibration           4 μW/cm³
radio frequency     1 μW/cm²
solar radiation     100 mW/cm²
• Solar radiation can vary dramatically with environment change
• Energy shortage at times
• Hard to predict
As a result, hard to find a performance- and energy-optimal schedule
for workload running on energy harvesting-aware embedded systems
[Figure: Harvesting power trace of a PV array over one day. Provided by National Renewable Energy Laboratory (NREL), Golden, Colorado]
Challenges with Solar Energy Harvesting for
Embedded Systems
Design a holistic framework for performance- and energy-optimal
scheduling and allocation of workload (task, communication) on
energy harvesting-aware multicore embedded system platforms
• Solar energy harvesting as the only power source
• Batteries/supercapacitors used for temporal energy storage
• Multi-core processors with frequency scaling capability
• Real-time periodic workloads with deadline constraints
Focus of this Dissertation
Real-Time Workload with Different Timing Constraints
• Hard deadline constraint
• Any task miss → total system failure
• Firm deadline constraint:
• Every task miss → inevitable performance penalty
• Soft deadline constraint:
• Each task miss → possible performance penalty
Main objective:
Minimizing miss rate/penalty for task set with firm/soft deadlines
Overview of Proposed Framework
[Figure: Semi-Dynamic Workload and Platform Management Framework]
• Inputs
  • Real-Time Workloads: independent tasks, task graphs, multithreaded tasks
  • Multicore Platforms: homogeneous, heterogeneous, core variation
  • Energy Harvesting Systems: photovoltaic panels, batteries, supercapacitors, hybrid
  • Constraints: timing (firm deadline, soft deadline), temperature, energy, soft error, hard error
• Management decisions
  • task-to-core mapping
  • intra-core scheduling
  • communication mapping
  • voltage-frequency selection
  • dynamic power management
• Objective: minimize miss rate/penalty
Contribution I:
Semi-Dynamic Scheduling for Independent Tasks
• Main Objective
• To reduce miss rate of independent tasks under varying and
stringent energy harvesting conditions
• Contributions
• A semi-dynamic algorithm (SDA) that results in lower task miss rates
compared to best known prior work
• Efficient utilization of multicore systems
• Management of battery/supercapacitor hybrid storage system
• Awareness of discrete frequency levels, process variations, thermal
issues
Workload: Periodic Independent Task Set
• Multiple independent periodic tasks with firm deadlines
• A task miss: missing deadline of a task instance
• Minimize miss rate → utilize energy as efficiently as possible
[Figure: An example of a periodic task set]
• Task utilization = (execution time at fmax) / (task period)
• fopt = fmax × U
• The lowest frequency that is sufficient to meet all task deadlines,
with Earliest Deadline First (EDF) scheduling
• For energy efficiency: minimize frequency fluctuations
• Main drawback: dynamic task dropping and slowing down on
energy shortage
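The utilization rule above can be sketched in a few lines. This is a minimal single-core illustration, not the UTB implementation: the `PeriodicTask` type, the function names, and the example numbers are hypothetical, U is assumed to be the sum of per-task utilizations, and execution time is assumed to scale inversely with frequency.

```python
from dataclasses import dataclass


@dataclass
class PeriodicTask:
    wcet_at_fmax: float  # execution time at f_max (seconds)
    period: float        # task period (seconds)


def total_utilization(tasks):
    """Sum of per-task utilizations (execution time at f_max / period)."""
    return sum(t.wcet_at_fmax / t.period for t in tasks)


def optimal_frequency(tasks, f_max):
    """Lowest frequency that keeps an EDF-scheduled core feasible: f_max * U."""
    u = total_utilization(tasks)
    if u > 1.0:
        raise ValueError("task set exceeds one core's capacity even at f_max")
    return f_max * u


# Hypothetical task set: utilizations 0.2, 0.15, and 0.25 (U = 0.6).
tasks = [PeriodicTask(2.0, 10.0), PeriodicTask(3.0, 20.0), PeriodicTask(1.0, 4.0)]
f_opt = optimal_frequency(tasks, f_max=1000.0)  # in MHz
```

With U = 0.6 and fmax = 1000 MHz, the sketch yields an fopt of about 600 MHz, the slowest uniform speed at which EDF still meets every deadline under these assumptions.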
Related Work: Utilization-Based Algorithm (UTB)
J. Lu et al., “Scheduling and Mapping of Periodic Tasks on Multi-Core Embedded
Systems with Energy Harvesting”, IEEE International Green Computing
Conference (IGCC), pp. 1-6, 2011.
• Proposed solution: a semi-dynamic window-based scheduling
• Estimate/predict energy budget
• Preset execution strategy with uniform speed
• Divide execution process into schedule windows
Motivation: Address Limitation of UTB
6 tasks finished
9 tasks finished
HOW? A spike or dip in harvesting power can make the energy
prediction inaccurate.
With schedule windows, any misprediction affects only one
schedule window
Proposed Semi-Dynamic Framework
• During system execution for a long duration of time
• Slice execution time into time windows of k minutes
• At reschedule point, predict/obtain energy budget for next time window
• Reject tasks based on energy budget. Then allocate the rest
• Execute accepted tasks with uniform optimal frequency
• Semi-Dynamic: reschedule→execute→reschedule→execute …
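The reschedule→execute cycle above might be sketched as follows. This is an illustrative toy, not the SDA itself: the function names, the per-task energy figures, and the "cheapest task first" acceptance rule are all assumptions made for the example, and the predicted budgets are simply given as a list.

```python
def plan_window(energy_budget, task_energy):
    """At a reschedule point, accept tasks that fit within the window's budget.

    task_energy maps a task name to the energy it needs in one window.
    Acceptance order (cheapest first) is an assumed, simplistic policy.
    """
    accepted, remaining = [], energy_budget
    for name, energy in sorted(task_energy.items(), key=lambda kv: kv[1]):
        if energy <= remaining:
            accepted.append(name)
            remaining -= energy
    return accepted


def run_semi_dynamic(predicted_budgets, task_energy):
    """One planning decision per window; each window is planned independently."""
    return [plan_window(budget, task_energy) for budget in predicted_budgets]


# Hypothetical numbers: three windows with shrinking energy budgets (joules).
task_energy = {"A": 3.0, "B": 5.0, "C": 2.0}
plan = run_semi_dynamic([10.0, 4.0, 0.0], task_energy)
```

Because each window is planned from its own budget, a bad prediction in one window does not propagate into the plan for the next one.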
Experiment Setup
• Design a simulation environment to capture workload executing on
multicore embedded system platform
• Historical weather data (solar radiation intensity, temperature)
provided by National Renewable Energy Laboratory (NREL)
• System only operates from 6:00 AM to 6:30 PM
• 50 periodic task sets are randomly generated for each comparison
• Implementation of our proposed Semi-Dynamic Algorithm (SDA),
together with Utilization-Based Algorithm (UTB)
UTB: Utilization-Based Algorithm
J. Lu et al., “Scheduling and mapping of periodic tasks on multi-
core embedded systems with energy harvesting”, IGCC 2011.
• SDA outperforms UTB
• Advantage expands with increasing number of cores
• Up to 70% miss rate reduction compared to UTB
100% utilization: very intensive workload
Simulation Results with Heavy Workload
Advantage of SDA: miss rate of UTB − miss rate of SDA
• More miss rate reduction when power budget is low or fluctuating
Related Topic I: Hybrid Energy Storage
• Pros and cons of different storage medium types

Battery-only             Supercapacitor-only
high energy density      low energy density
low power density        high power density
fewer recharge cycles    more recharge cycles
Solution:
Battery-supercapacitor hybrid storage system
Simulation Results
• CA-SDA outperforms BA-SDA
• Proposed HY-SDA outperforms all other techniques
Compared techniques:
• UTB: best prior work
• BA-SDA: battery-only
• CA-SDA: supercap-only
• HY-SDA: hybrid storage
[Plot panel: harvesting power × 2]
Related Topic II: Task Miss Penalty
• Assume tasks have different miss penalties
• Utilize flexibility of reschedule points provided by SDA
• During task rejection: give priority to tasks with higher miss penalty
density (miss penalty / required execution time)
• HY-SDA with penalty awareness outperforms MISS-SDA (HY-SDA
without priority support)
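Ranking by miss penalty density can be illustrated with a short sketch. The tuple layout, the function name, and the use of an execution-time budget as the acceptance limit are assumptions for this example; the slides state only that tasks with higher density get priority during rejection.

```python
def select_by_penalty_density(tasks, time_budget):
    """Keep tasks with the highest miss penalty per unit of execution time.

    tasks: list of (name, miss_penalty, exec_time) tuples (hypothetical layout).
    time_budget: execution time affordable in the coming window (assumed limit).
    """
    ranked = sorted(tasks, key=lambda t: t[1] / t[2], reverse=True)
    accepted, rejected, used = [], [], 0.0
    for name, _penalty, exec_time in ranked:
        if used + exec_time <= time_budget:
            accepted.append(name)
            used += exec_time
        else:
            rejected.append(name)
    return accepted, rejected


# Densities: A = 10/5 = 2, B = 6/2 = 3, C = 4/4 = 1.
accepted, rejected = select_by_penalty_density(
    [("A", 10.0, 5.0), ("B", 6.0, 2.0), ("C", 4.0, 4.0)], time_budget=7.0
)
```

In this example B and A are kept and C, the task with the lowest density, is rejected.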
Related Topic III: Process Variation-Aware Workload Allocation
• Process variation:
• Variations in the attributes of transistors formed during fabrication
• Effect: performance metrics deviate from their nominal values
• Variations in gate delays
• Solution: at reschedule point, distribute workload with awareness of
each core’s peak frequency
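One way to picture variation-aware allocation is a greedy assignment that measures each core's load relative to its own peak frequency. This is a sketch under assumptions: the greedy heuristic, the function name, and the sample numbers are illustrative and are not taken from the dissertation.

```python
def variation_aware_allocate(task_utils, peak_freqs):
    """Assign tasks to cores so that load relative to peak frequency stays balanced.

    task_utils: task utilizations measured at a normalized frequency of 1.0.
    peak_freqs: normalized peak frequency of each core (for example 1.0, 0.8, 0.6).
    """
    loads = [0.0] * len(peak_freqs)
    mapping = {}
    # Place the largest tasks first; pick the core whose relative load stays lowest.
    for idx, util in sorted(enumerate(task_utils), key=lambda p: p[1], reverse=True):
        core = min(
            range(len(peak_freqs)),
            key=lambda c: (loads[c] + util) / peak_freqs[c],
        )
        loads[core] += util
        mapping[idx] = core
    return mapping, loads


mapping, loads = variation_aware_allocate([0.5, 0.4, 0.3, 0.2], [1.0, 0.8, 0.6])
```

A variation-unaware allocator would treat the three cores as equal and could hand a slow core more work than it can finish in time; dividing by peak frequency avoids that in this toy setting.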
[Figure: normalized per-core peak frequencies of 1.0, 0.8, and 0.6]
Variations in gate delays → different peak frequencies of cores
Simulation Results
• Variation-Unaware: faulty schedule with high task miss rate
• Some assigned tasks consumed energy without finishing in time
• Variation-aware: significantly lower (up to 49%) task miss rate
• Peak frequencies of cores:
normal distribution with average of 1000MHz and variation of 33%
Related Topic IV: Discrete Frequency Levels
• In general, fexec selected by SDA comes from a continuous range and
may not correspond to an available discrete frequency level
• Single: execute at the lowest discrete frequency level that is higher than fexec

Level              0      1     2     3     4     5
Frequency (MHz)    idle   150   400   600   800   1000
Power (mW)         40     80    170   400   900   1600

• Proposed dual-speed method
  • Intra: combine two neighboring discrete frequencies to approximate fexec
    • Switch frequency multiple times within each task instance
  • Inter: avoid frequent speed switches for less switching overhead
    • Only switch frequency once between every two task instances
• The dual-speed method ends up with a near-ideal result
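The dual-speed idea can be made concrete with the frequency and power levels from the table. The sketch below only computes the time split between two neighboring levels so that the average speed equals fexec; the function names are hypothetical, and switching overhead (the difference between the Intra and Inter variants) is not modeled.

```python
import bisect

# Discrete levels from the table above (level 0, idle, is omitted).
LEVELS_MHZ = [150, 400, 600, 800, 1000]
POWER_MW = {150: 80, 400: 170, 600: 400, 800: 900, 1000: 1600}


def dual_speed_split(f_exec):
    """Return (f_lo, f_hi, share_hi): the two neighboring levels and the
    fraction of execution time spent at f_hi so the average speed is f_exec."""
    if not LEVELS_MHZ[0] <= f_exec <= LEVELS_MHZ[-1]:
        raise ValueError("f_exec outside the supported frequency range")
    i = bisect.bisect_left(LEVELS_MHZ, f_exec)
    f_hi = LEVELS_MHZ[i]
    if f_hi == f_exec:          # f_exec is itself a discrete level
        return f_hi, f_hi, 1.0
    f_lo = LEVELS_MHZ[i - 1]
    share_hi = (f_exec - f_lo) / (f_hi - f_lo)
    return f_lo, f_hi, share_hi


def average_power(f_lo, f_hi, share_hi):
    """Time-weighted average power (mW) of the two-level combination."""
    return share_hi * POWER_MW[f_hi] + (1.0 - share_hi) * POWER_MW[f_lo]


f_lo, f_hi, share_hi = dual_speed_split(700)
p_dual = average_power(f_lo, f_hi, share_hi)
```

For fexec = 700 MHz the split is half the time at 600 MHz and half at 800 MHz, giving an average of 650 mW while executing, compared with 900 mW when running at 800 MHz only.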
Related Topic V: Run-Time Thermal Issues
• Motivation
• Only passive cooling applicable
• Stronger solar radiation → higher Tenvironment and higher fexecution
• High risk of core overheating during the middle of the day
• Solution:
• Passive: enforce thermal throttling for system stability
• Proactive: at reschedule point, allocate less workload to hotspots
• Advantages of proactive thermal management over passive:
• 22% fewer thermal throttling events (94 → 74)
• Fewer throttling events → more stable speed → 2.7% miss rate reduction
Summary of Contribution I:
Semi-Dynamic Scheduling for Independent Tasks
• Proposed a novel Semi-Dynamic Algorithm (SDA) that achieved
up to 70% miss rate reduction compared to best known prior work
• Unlike any other work, SDA provides flexibility to simultaneously
achieve multiple goals:
• Intelligent management of hybrid energy storage systems
• Priority scheduling for tasks with different miss penalties
• Process variation-aware workload allocation
• More energy efficient utilization of discrete frequency levels
• Proactive reaction to run-time thermal issues
• Publications
Y. Xiang, S. Pasricha, "Run-Time Management for Multi-Core Embedded Systems with Energy
Harvesting", IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2014.
Y. Xiang, S. Pasricha, "Harvesting-Aware Energy Management for Multicore Platforms with
Hybrid Energy Storage", ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 25-30, 2013.
Y. Xiang, S. Pasricha, "Thermal-Aware Semi-Dynamic Power Management for Multicore
Systems with Energy Harvesting", IEEE International Symposium on Quality Electronic Design
(ISQED), pp. 619-626, 2013.
Contribution II:
Template-Based Scheduling for Task Graphs
• Objective
• To reduce total task graph miss rate/penalty under varying and
stringent energy harvesting conditions at run-time
• To offload scheduling complexity of task graphs to design time
• Contribution
• A hybrid workload management framework (HyWM) using schedule
templates to integrate design-time and run-time scheduling efforts
• Two approaches for offline schedule template generation
• Novel run-time slack reclamation and soft error handling heuristics
• Aging-aware workload allocation on multicore processors
Workload: Periodic Task Graphs
• Multiple applications modeled as periodic task graphs
• Each periodic task graph has instances that arrive once per period
• To avoid a miss: finish all task nodes before the deadline
[Figure: timeline over the 1st, 2nd, and 3rd periods showing arrival times and deadlines; at each arrival a new task graph (TG) instance is ready, and at each deadline it is either finished or missed]
Related Works
• Task scheduling on systems powered by energy harvesting
  • IGCC 2011, J. Lu and Q. Qiu: "Scheduling and Mapping of Periodic Tasks on Multi-Core Embedded Systems with Energy Harvesting"
  • Limitation: no awareness of task dependency
• Energy-aware scheduling for task graphs
  • DATE 2007, R. Watanabe et al.: "Task scheduling under performance constraints for reducing the energy consumption of the GALS multi-processor SoC"
  • Limitation: does not target systems with energy harvesting
• This is the first work to consider scheduling of multiple task graphs
for systems powered by energy harvesting
Hybrid Workload Management: Motivation and Overview
Template-based:
• Design-time (offline) schedule template generation to offload complexity
• Run-time (online) schedule template selection based on energy budget
With variations in energy harvesting, how to provide fixed energy
budget so that we know which template to select?
Motivation for hybrid framework:
• Task graph scheduling problem is complex
• Limited energy and computing resources prohibit dynamic scheduling
at run-time
Semi-Dynamic Framework
• Semi-dynamic: divide time of execution into schedule windows
• Each window has an energy budget that is independent from
other windows
• Shifting: use energy harvested during the previous window
• Each window has an energy budget that is decoupled from
run-time variations in energy harvesting
Design Time Template Generation:
• Optimized schedule template
• Time consuming
• Design-time
MILP-Based Template Generation
Mixed Integer Linear Programming
MILP for Design-Time Template Generation
• Represent scheduling problem as mixed-integer linear programming
(MILP) formulation
• Objective:
• Formulate MILP constraints based on workload, platform, and energy
budget
• Represent scheduling decisions as integer variables in MILP
• Generated templates saved in system for run-time selection
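Run-time selection among stored templates can be as simple as a sorted lookup. The sketch below assumes one template per energy budget level and picks the template built for the largest level that does not exceed the available energy; the function name, the level values, and the string placeholders for templates are hypothetical.

```python
import bisect


def select_template(budget_levels, templates, available_energy):
    """Pick the template designed for the highest budget level that is affordable.

    budget_levels: ascending energy budget levels used at design time.
    templates: templates[i] was generated for budget_levels[i].
    Returns None if even the lowest level is unaffordable.
    """
    i = bisect.bisect_right(budget_levels, available_energy) - 1
    return templates[i] if i >= 0 else None


levels = [10.0, 20.0, 30.0]          # hypothetical budget levels (joules)
stored = ["T10", "T20", "T30"]       # placeholders for schedule templates
chosen = select_template(levels, stored, available_energy=25.0)
```

The lookup costs O(log n) at run time, which fits the goal of moving scheduling complexity to design time.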
[Figure: energy budget levels → MILP optimization → optimized schedule templates]
Quality of Schedule Templates Generated by MILP
• With low energy budget, schedule less workload for higher efficiency
• With sufficient energy budget, schedule full workload even with
stringent timing requirement (95% per core utilization)
• 4-core embedded system
• 9 task graph instances per schedule window (E3S benchmark)
• With per-core average utilization of 95%
Frequency Selection in Schedule Templates
Dealing with stringent energy constraint when energy budget is low
• Lower energy budget → lower average execution frequency
• Slow and inefficient frequency level (150MHz) is ignored
automatically
• Breakdown of frequencies selected for all task nodes
• For schedule templates with energy budget levels from low to high
Frequency Selection in Schedule Templates
Dealing with stringent timing constraint when energy budget is high
• Higher workload intensity → higher average execution frequency
• Breakdown of frequencies selected for all task nodes
• For schedule templates with energy budget levels from low to high
Comparison with Prior Work
• Task scheduling on systems powered by energy harvesting
  • UTA: IGCC 2011, J. Lu and Q. Qiu, "Scheduling and Mapping of Periodic Tasks on Multi-Core Embedded Systems with Energy Harvesting"
  • SDA: GLSVLSI 2013, Y. Xiang and S. Pasricha, "Harvesting-Aware Energy Management for Multicore Platforms with Hybrid Energy Storage"
• Energy-aware scheduling for task graphs
  • LP+SA: DATE 2007, R. Watanabe et al., "Task scheduling under performance constraints for reducing the energy consumption of the GALS multi-processor SoC"
• Embedded System Synthesis Benchmarks Suite (E3S)
LP+SA: linear programming + heuristics
UTA and SDA: assume that the scheduler can identify ready task nodes for them
Comparison over Miss Rate
• Proposed HyWM-LP: miss rate reduction of 9.7% compared to
LP+SA, 15.2% compared to SDA, and 29.5% compared to UTA
• Sources of improvement: semi-dynamic framework (window shifting), dependency awareness, and schedule template optimality
[Figure: pairwise miss rate comparisons among UTA, SDA, LP+SA, and HyWM-LP]
Design Time Template Generation: Alternative Approach
MILP-Based Template Generation (MILP)
• Optimized schedule template
• Time consuming
• Not scalable
• For problem size of 10 tasks, 100 nodes: about 100 hours and 5 GB memory
Analysis-Based Template Generation (ATG)
• Near-optimal schedule template
• Much less time consuming
• Better scalability
• Fast schedule template generation
Analysis-Based Template Generation (ATG)
Basic ideas:
• With known workload and platform, we can simulate system execution
• During simulation, we can monitor execution process to find energy inefficient events
• We can get hindsight on how to avoid these events
• Make informed update on execution schedule and rewind simulation
• Save the best schedule we have
Dynamic Critical Path Identification in ATG
• Start from the ending task node, whose deadline equals the deadline of the task graph
• Recursively calculate implicit deadlines of predecessor task nodes:
  • To compare priorities of task nodes
  • To identify critical paths of task graphs
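The backward calculation can be sketched as a small recursive function: a node's implicit deadline is the tightest value of (successor's implicit deadline − successor's WCET − communication delay) over its successors. The dictionary layout and node names below are assumptions for illustration, and the sketch assumes a single ending node that every other node can reach. The numbers reproduce the four-node example from the slide figure; which of the two middle nodes is τ2 and which is τ3 is not recoverable from the slide, so they are labeled `mid_a` and `mid_b` here.

```python
def implicit_deadlines(wcet, succ_comm, end_node, graph_deadline):
    """Propagate implicit deadlines backward from the ending node.

    wcet: worst-case execution time per node.
    succ_comm: succ_comm[node] maps each successor to the communication delay.
    """
    memo = {end_node: graph_deadline}

    def deadline(node):
        if node not in memo:
            memo[node] = min(
                deadline(succ) - wcet[succ] - comm
                for succ, comm in succ_comm[node].items()
            )
        return memo[node]

    for node in wcet:
        deadline(node)
    return memo


wcet = {"tau1": 300, "mid_a": 1100, "mid_b": 800, "tau4": 100}
succ_comm = {
    "tau1": {"mid_a": 200, "mid_b": 50},
    "mid_a": {"tau4": 150},
    "mid_b": {"tau4": 50},
    "tau4": {},
}
deadlines = implicit_deadlines(wcet, succ_comm, "tau4", graph_deadline=2000)
```

The path through `mid_a` gives τ1 the tighter bound (1750 − 1100 − 200 = 450 versus 1850 − 800 − 50 = 1000), so that path is the critical one in this example.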
[Figure: worked example of implicit deadline calculation for a task graph with deadline 2000; the calculation proceeds backward from the ending node, and other predecessor nodes in the task graph are omitted]
• Initially there is only one deadline, for the entire task graph
• Ending node τ4 (WCET = 100): implicit deadline = 2000
• First predecessor of τ4 (WCET = 1100): implicit deadline = 2000 – 100 – 150 = 1750
• Second predecessor of τ4 (WCET = 800): implicit deadline = 2000 – 100 – 50 = 1850
• Node τ1 (WCET = 300): since 1850 – 800 – 50 > 1750 – 1100 – 200, implicit deadline = 1750 – 1100 – 200 = 450
• The subtracted 150, 50, and 200 are communication delays on the edges (for example, COMM = 50)
Design Time Template Generation: Motivation for Alternative Approach
MILP-Based Template Generation (MILP)
• Optimized schedule template
• Time consuming
• Not scalable
• For problem size of 10 tasks, 100 nodes: about 100 hours and 5 GB memory
Analysis-Based Template Generation (ATG)
• Near-optimal schedule templates (up to 1.3% higher miss rate)
• Much less time consuming
• Better scalability
• Only uses about 1 hour and 50 MB memory
Related Topic I: Soft Errors
• Random transient errors (bits flipping) in circuits, caused by
• Alpha particles from package decay
• Neutron strikes from cosmic rays
• Not a permanent failure in circuitry
• May lead to faulty/invalid execution result on a task node
(also considered as task graph miss → waste of energy)
[Figure: particle strike on a transistor device (source, drain) at the Earth’s surface]
Run-Time Soft Error Handling and Slack Reclamation
• Goal: avoid invalidation of schedule templates
• Never change allocation and execution order decisions
• Never delay start times of task graph instances
• Can drop task instances, adjust execution frequencies when necessary
• Soft error handling heuristic
• Triggered when a soft error is detected on a task node
• Attempts to re-execute the node with the error to save previous execution effort
• By boosting frequency using surplus energy
• By dropping upcoming task graph instances that have not started yet
• By reclaiming slack time
• Slack reclamation heuristic
• To utilize slack time created by task dropping and by tasks finishing earlier than their worst-case execution time
• Triggered when a task node can start earlier than scheduled
• Reclaim: lower the execution frequency without delaying the scheduled finish time
• Pass-on: if slack cannot be reclaimed immediately, start and finish the task earlier
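The "reclaim" step can be illustrated as choosing the slowest discrete frequency that still meets the scheduled finish time when a node starts early. This is a simplified sketch: the function name and units (cycles in millions, frequency in MHz, time in seconds) are assumptions, and the pass-on case and soft error handling are not modeled.

```python
def reclaim_slack(remaining_mcycles, start_time, scheduled_finish, levels_mhz):
    """Return the lowest frequency (MHz) that finishes by scheduled_finish.

    remaining_mcycles: work left in millions of cycles.
    Returns None when no available level can meet the scheduled finish time.
    """
    available = scheduled_finish - start_time
    if available <= 0:
        return None
    needed_mhz = remaining_mcycles / available
    for f in sorted(levels_mhz):
        if f >= needed_mhz:
            return f
    return None


LEVELS = [150, 400, 600, 800, 1000]
# Template: 600 Mcycles scheduled from t = 1.0 s to t = 2.0 s (600 MHz).
on_time = reclaim_slack(600, 1.0, 2.0, LEVELS)
# The node can start at t = 0.4 s, so 1.6 s is available and 375 MHz suffices.
early = reclaim_slack(600, 0.4, 2.0, LEVELS)
```

Starting 0.6 s early lets the node drop from 600 MHz to 400 MHz in this example while keeping its scheduled finish time, so the template's later start times remain valid.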
Benefit of Slack Reclamation and Error Handling
Setup:
• Based on 8-core system
• Execution time variation: uniform distribution from 50% to 100% of worst case
• Error injection rate: 10^-5/s per core at maximum frequency
Experiment Results:
1. Base case: no slack time awareness + no soft error awareness:
High miss rate of 40.11%
2. Slack reclamation + no soft error awareness:
Significantly lower miss rate of 29.19% (27.2% reduction)
3. Slack reclamation + soft error handling:
Miss rate of 22.01% (45.2% reduction)
In all, a 45.2% miss rate reduction is achieved compared to the base case
Related Topic II: Aging Effect
• A major concern for electronic devices with technology scaling
[Figure: failure rate over device lifetime, with an infant mortality phase followed by a useful life of about 10 years]
Per-Core Aging Effect Modeling
[Figure: device lifetime curve (infant mortality, ~10 years)]
Reliability model based on Weibull distribution
• Core survival rate at time t:
• β, constant parameter related
to core architecture
• Scale parameter, α
• Electromigration model:
• Workload-related factors: supply voltage (vdd), execution frequency (f),
core temperature (T)
• Can be controlled
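For reference, the textbook two-parameter Weibull survival function, written with the α (scale) and β (shape) symbols named above, has the form below, together with its mean time to failure. This is the standard form and is offered only as an assumption about the model the slide refers to; the electromigration expression that ties α to vdd, f, and T is not reproduced here.

```latex
R(t) = \exp\!\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right],
\qquad
\mathrm{MTTF} = \alpha\,\Gamma\!\left(1 + \frac{1}{\beta}\right)
```

Under this form, a smaller α (for example, from sustained high voltage, frequency, or temperature) shortens the expected lifetime, which is why balancing workload across cores can slow the reliability drop.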
Aging-Aware Allocation Scheme
• Assumes 8-core system with per-core aging progress detection circuitry
• Aging-Aware: balance workload/aging effect among cores
• Compare against:
  • Biased: always assign heavier workload to certain cores
  • Random: randomize workload allocation
• Aging-Aware: slower reliability drop and improved MTTF (+19.7%)
MTTF: mean time to failure
[Figure: reliability over time (~10 years) for the compared allocation schemes]
System Aging Model with Core Failure Tolerance
• Experiment:
  • 8-core homogeneous processor
  • Processing capability: achievable task graph finish rate with remaining cores

Failure Threshold*                                          0      1      2      3      4      5      6      7
MTTF (years)                                                10.06  15.58  20.22  24.66  29.27  34.44  40.94  51.35
Processing Capability Before System Failure (%)             100    92.3   80.1   68.8   52.4   34.9   18.4   7.0
Average Processing Capability during System Lifetime (%)    100    96.5   92.5   87.7   81.7   75.0   67.3   60.3

• Higher tolerance → longer MTTF → lower performance guarantee
• Multicore system reliability model:
With different levels of tolerance on number of failed cores (h)
Probability of exactly h cores surviving after tw schedule windows
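As a rough illustration of such a system-level model, the sketch below assumes that all cores fail independently and share the same survival probability r after a given number of schedule windows, which reduces the calculation to a binomial distribution. These are simplifying assumptions for the example (the dissertation models aging per core), and the function names are hypothetical.

```python
from math import comb


def prob_exactly_survive(n_cores, survivors, r):
    """Probability that exactly `survivors` of n_cores are still working,
    assuming independent cores with identical survival probability r."""
    return comb(n_cores, survivors) * r**survivors * (1.0 - r) ** (n_cores - survivors)


def prob_system_alive(n_cores, max_failed, r):
    """Probability that at most `max_failed` cores have failed."""
    return sum(
        prob_exactly_survive(n_cores, n_cores - failed, r)
        for failed in range(max_failed + 1)
    )


# 8 cores, each surviving with probability 0.9 (hypothetical value).
no_tolerance = prob_system_alive(8, 0, 0.9)   # all 8 cores must survive
tolerate_two = prob_system_alive(8, 2, 0.9)   # up to 2 failed cores allowed
```

Raising the failure threshold increases the probability that the system is still considered alive, which matches the trend of longer MTTF at higher tolerance.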
Summary of Contribution II:
Template-Based Scheduling for Task Graphs
• The first work to consider task graph set scheduling for multiprocessors powered by energy harvesting
• Hybrid Workload Management framework (HyWM) achieved up to 29.5% lower miss rate compared to prior works
• Two offline schedule template generation approaches to offload scheduling complexity of task graphs to design-time
• Comprehensive heuristic for slack reclamation and soft error handling with up to 45.2% miss rate reduction
• Aging-aware workload allocation with 19.7% longer MTTF
• Publications
Y. Xiang, S. Pasricha, "Fault-Aware Application Scheduling in Low Power Embedded
Systems with Energy Harvesting", ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), article 32, 2014.
Y. Xiang, S. Pasricha, "A Hybrid Framework for Application Allocation and Scheduling in Multicore Systems with Energy Harvesting", ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 163-168, 2014.
Y. Xiang, S. Pasricha, "Soft and Hard Reliability-Aware Scheduling for Multicore Embedded Systems with Energy Harvesting", IEEE Transactions on Multi-Scale Computing Systems (TMSCS), under review.
Contribution III: Mixed-Criticality Scheduling on Heterogeneous Systems
• Main Objective
• To reduce miss penalty of mixed-criticality workload under varying
and stringent energy harvesting conditions
• Contributions
• Modeled mixed-criticality workload with soft/firm deadline constraints
• A single-ISA heterogeneous multicore platform for mixed-criticality
scheduling
• A novel timing intensity metric to estimate task instance importance
in a mixed-criticality workload
• Considered near-threshold computing for high energy efficiency
Mixed-Criticality Workload
(m, k)-soft deadline: only need to finish m instances out of every k instances
• Compare throughput-centric workload to timing-centric workload:
• Has less critical timing requirements
• Emphasizes even more on energy efficiency
• Our mixed-criticality scheduling problem:
Simultaneous scheduling of timing- and throughput-centric tasks
with energy harvesting
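The (m, k)-soft deadline rule can be illustrated with a small sliding-window monitor. The sketch interprets the rule as "at least m of any k consecutive instances must finish", which is one common reading and an assumption here; the class and method names are hypothetical, and instances before the window fills are treated as finished.

```python
from collections import deque


class MKMonitor:
    """Track whether a task still satisfies an (m, k)-soft deadline."""

    def __init__(self, m, k):
        if not 0 < m <= k:
            raise ValueError("require 0 < m <= k")
        self.m, self.k = m, k
        self.window = deque(maxlen=k)  # outcomes of the last k instances

    def record(self, finished):
        self.window.append(bool(finished))

    def satisfied(self):
        """True if the misses among the last k instances do not exceed k - m."""
        return self.window.count(False) <= self.k - self.m

    def can_skip_next(self):
        """True if dropping the next instance keeps the constraint satisfied."""
        recent = list(self.window)[-(self.k - 1):] if self.k > 1 else []
        return recent.count(False) + 1 <= self.k - self.m


monitor = MKMonitor(m=2, k=3)
monitor.record(True)
monitor.record(True)
skip_ok_before = monitor.can_skip_next()   # one miss in three is allowed
monitor.record(False)
skip_ok_after = monitor.can_skip_next()    # a second miss would violate (2, 3)
```

A scheduler short on energy could consult such a monitor to decide which throughput-centric instances can be dropped without incurring a penalty.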
Criticality Type    Timing-Centric       Throughput-Centric
Structure Model     task graphs          multithreaded applications
Parallelism         highly customized    barrier-synchronized
Execution Time      few seconds          few minutes
Period              tens of seconds      tens of minutes
Deadline Model      firm                 (m, k)-soft
Schedule Method     template-based       dynamic scheduling
Benchmark Suite     E3S                  PARSEC
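Notes on the timing-centric column: same as the task graph model used in the previous section
Notes on the throughput-centric column:
• Structure model: special case of task graph
• Parallelism: simpler parallelism
• Execution time: longer execution time
• Period: period > schedule window
• Deadline model: flexible timing
• Schedule method: dynamic scheduling is possible and necessary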
Mixed-Criticality Scheduling on Heterogeneous Platform
• Big cores (high performance) for timing-centric task graphs
• Small cores (high efficiency) for throughput-centric multithreaded
applications: near-threshold computing
• Exception: sequential phases of multithreaded applications executed on big cores
• Interaction: split energy budget by comparing miss penalty density
• Lower weight factor (timing intensity) for tasks with soft deadlines
Near-Threshold Computing
Vth: CMOS gate-to-source threshold voltage
Vdd: supply voltage, operating in one of two regions:
• Super-threshold region
• High performance
• Low efficiency
• Near-threshold region
• Lower performance
• Highest efficiency
Heterogeneous Platform: Big Core and Small Core
• Simulated using Sniper (performance) and McPAT (power)
• Small core (near-threshold supply voltage) to focus on energy efficiency
• Big core to focus on timing performance
Architectural Parameters
Core Types             Big Cores       Small Cores
Execution              Out-of-Order    In-Order
Issue Width            4               2
Reorder Buffer Size    128             N/A
Cache                  64 KB, 4-way    16 KB, direct
Core Area              15.7 mm²        4 mm²

Cluster Parameters
Cluster Type           Big-Core-Cluster         Small-Core-Cluster
Core Count             8                        32
Frequency Control      Per-Core DVFS            Uniform Frequency
f, Vdd Range           0.5~1.2 GHz, 0.4~1 V     fnth, Vddnth

Technology Parameters
Technology Node        22 nm
Vth                    0.289 V
Vddnth, fnth           0.4 V, 500 MHz
Experiment Results
• Compared to (m, k)-unaware,
9.5% performance benefit from soft deadline-awareness
• Compared to PIE (Performance impact estimation, ISCA 2012),
13.6% performance improvement from emphasis on energy efficiency
• Compared to B8-B8,
23.2% performance benefit from heterogeneous computing
B8-S32: proposed mixed-criticality scheduling on heterogeneous platform with 8 big
cores and 32 small cores
Van Craeynest et al., “Scheduling heterogeneous multi-cores
through Performance Impact Estimation”, ISCA 2012.
Summary of Contribution III: Mixed-Criticality Scheduling on Heterogeneous Systems
• First work to consider mixed-criticality scheduling and near-threshold
computing for energy harvesting embedded systems
• A single-ISA heterogeneous multicore platform for mixed-criticality scheduling
• Estimate importance of tasks based on soft deadline constraints
• Up to 23.2% performance benefit
• Publication
• Y. Xiang, S. Pasricha, "Mixed-Criticality Scheduling on Heterogeneous Multicore Systems
Powered by Energy Harvesting", ACM Transactions on Embedded Computing Systems (TECS), under
review.
My Dissertation Summary
• The proposed Semi-Dynamic Framework is a unified solution for
energy harvesting-aware resource management with significant
advantages in energy efficiency and flexibility over prior work
• Issues addressed by our Semi-Dynamic Framework:
1. Minimizing miss rate/miss penalty
2. Run-time thermal control
3. Mitigating impact of process variation
4. Management of hybrid energy storage
5. Scheduling of task graphs with inter-node dependencies
6. Soft error handling and slack reclamation during execution
7. Mitigating aging effects across the chip over time
8. Mixed-criticality scheduling on heterogeneous processors
List of Publications
Journal papers:
• Y. Xiang, S. Pasricha, "Run-Time Management for Multi-Core Embedded Systems with Energy
Harvesting", IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2014.
• B. Donohoo, C. Ohlsen, S. Pasricha, C. Anderson, Y. Xiang, "Context-Aware Energy
Enhancements for Smart Mobile Devices", IEEE Transactions on Mobile Computing (TMC), vol.
13, no. 8, 2013.
• Y. Zou, Y. Xiang, S. Pasricha, "Characterizing Vulnerability of Network Interfaces in Embedded
Chip Multiprocessors", IEEE Embedded System Letters (ESL), vol. 4, no. 2, 2012.
• Y. Xiang, S. Pasricha, "Mixed-Criticality Scheduling on Heterogeneous Multicore Systems
Powered by Energy Harvesting", ACM Transactions on Embedded Computing Systems (TECS), under
review.
• Y. Xiang, S. Pasricha, "Soft and Hard Reliability-Aware Scheduling for Multicore Embedded
Systems with Energy Harvesting", IEEE Transactions on Multi-Scale Computing Systems
(TMSCS), under review.
Conference papers:
• Y. Xiang, S. Pasricha, "Fault-Aware Application Scheduling in Low Power Embedded Systems
with Energy Harvesting", ACM/IEEE International Conference on Hardware/Software Codesign
and System Synthesis (CODES+ISSS), article 32, 2014.
• Y. Xiang, S. Pasricha, "A Hybrid Framework for Application Allocation and Scheduling in Multicore
Systems with Energy Harvesting", ACM Great Lakes Symposium on VLSI (GLSVLSI), pp.163-
168, 2014.
• Y. Xiang, S. Pasricha, "Harvesting-Aware Energy Management for Multicore Platforms with
Hybrid Energy Storage", ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 25-30, 2013.
• Y. Xiang, S. Pasricha, "Thermal-Aware Semi-Dynamic Power Management for Multicore Systems
with Energy Harvesting", IEEE International Symposium on Quality Electronic Design (ISQED),
pp. 619-626, 2013.
• Y. Zou, Y. Xiang, S. Pasricha, “Analysis of On-chip Interconnection Network Interface Reliability in
Multicore Systems”, IEEE International Conference on Computer Design (ICCD), pp.427-428,
2011.
Thank You!