Adapting MSHN Scheduling Technology for HiPer-D
H.J. Siegel, Tony Maciejewski (Purdue University, Colorado State University)
Viktor Prasanna (USC)
Richard Freund (Noemix)
MSHN: Management System for Heterogeneous Networks
apply the techniques, knowledge, and expertise we developed in MSHN for mapping applications to heterogeneous platforms to the HiPer-D environment
work in collaboration with the DeSiDeRaTa and HiPer-D teams
DeSiDeRaTa dynamically modifies existing HiPer-D mappings to avoid QoS violations using real-time feedback
our focus: derive the initial HiPer-D mapping
  done off-line, with more time available than the dynamic mapper has
  uses estimated predictions about HiPer-D (no feedback)
One Example HiPer-D Subsystem
[Figure: block diagram of an example HiPer-D subsystem. Components: CFF Broker, Tacfire, ALT Data Server, OTH Data Server, Gun Sim, Land Picture, Air Track Picture, Land Attack Engagement Server, Deconflict Server, Fire Sim 21, Display Components. Platforms: SGI, SUN, NT. Connections: TCP, Ensemble, NDDS, CORBA (TAO).]
HiPer-D System-Wide Attributes
continuously executing applications
QoS constraints for a system-wide mapping, for each subsystem:
  throughput
  end-to-end latency
target platform:
  collection of heterogeneous COTS machines and networks
  standard operating systems and network protocols
Our Proposed Goal for the Initial Mapping
find an initial mapping of all HiPer-D applications onto the machines in the hardware platform
to be used when HiPer-D is first booted up
based on estimated predicted computation and communication times
derived off-line (static mapping) – more time for derivation than a dynamic (on-line) mapping
maximize the allowable increase in workload (e.g., targets) until a dynamic reallocation of resources is required to avoid a QoS violation in any subsystem
finding an optimal solution to the problem is intractable
need heuristic approaches
Pluggable Algorithms (Heuristics)
select a pluggable algorithm (heuristic)
depends on
heterogeneous suite of machines
quality of mapping vs. off-line heuristic execution time
heuristics being investigated
local optimization: “2-phase greedy”
mixed integer programming based
global optimization: genetic algorithm (GA), simulated annealing (SA)
Research Tasks to Create Heuristics
characterized and modeled the continuously executing
HiPer-D applications and the hardware platform
(static information, no real-time feedback)
computation and communication time
particular machine selection
machine multi-tasking and network sharing
application workload
non-periodic (asynchronous) nature
context switching and background loads
limited profiling information available
Research Tasks to Create Heuristics (cont’d)
quantified the performance goal using the model
designing and developing heuristics for allocating the
applications so as to optimize the performance goal
leverage heterogeneity
evaluating performance of heuristics using simulations
will integrate into HiPer-D
Hardware Platform Model: High-Level Overview
collection of heterogeneous machines
  each with a full-duplex connection to a non-blocking switch
collection of sensors and actuators
  each with a half-duplex connection to the switch
sensor features
  data output rate - the number of data sets produced per time unit
  tactical load - the number of “tracks” per data set
Application Model Concepts
a directed acyclic graph: nodes are applications, arcs are data transfers
continuously executing, pipelined applications
[Figure: example graph connecting sensors 1, 2, and 3 through applications to actuators 1 and 2; legend: sensor, actuator, application]
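The graph model above can be sketched in a few lines of Python. This is a minimal illustration, not the MSHN implementation: the node names and the `pipeline_order` helper are hypothetical, and the arcs are stored as a plain dictionary from each node to the nodes it sends data to.

```python
from graphlib import TopologicalSorter

# Hypothetical example graph: each key sends data to the nodes in its list.
arcs = {
    "sensor1": ["app_a"],
    "sensor2": ["app_a", "app_b"],
    "app_a": ["app_c"],
    "app_b": ["app_c"],
    "app_c": ["actuator1"],
    "actuator1": [],
}

def pipeline_order(arcs):
    """Return the nodes so that every producer precedes its consumers."""
    # TopologicalSorter expects node -> predecessors, so invert the arcs.
    preds = {node: set() for node in arcs}
    for src, dsts in arcs.items():
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(preds).static_order())
```

Calling `pipeline_order(arcs)` doubles as a check that the application graph really is acyclic, since a cycle raises an exception instead of returning an order.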
Modeling Multiple Input Applications
discriminating applications - do not produce an output unless a particular input is present
  Z is a discriminating application: it generates output only if certain sensor data is received from radar
  Y sends picture of battlefield to Z
combining applications - produce an output when all inputs arrive
  S is a combining application: P, Q, and R solve pieces of a large computation, and S combines the pieces to form the final answer
[Figure: Y and a radar input feed Z; P, Q, and R feed S]
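The two firing rules can be expressed as simple predicates. The sketch below assumes inputs are identified by name; the function names and the `"radar"` label are illustrative only.

```python
def discriminating_fires(arrived, trigger):
    """A discriminating application outputs only if its trigger input is present."""
    return trigger in arrived

def combining_fires(arrived, required):
    """A combining application outputs only when every required input has arrived."""
    return set(required) <= set(arrived)
```

For example, Z would be modeled with `discriminating_fires(arrived, "radar")`, and S with `combining_fires(arrived, ["P", "Q", "R"])`.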
Paths Overview
a “path” is a group of applications that together perform one job
a path begins at a sensor and ends at an actuator or a discriminating application
paths are important because QoS constraints are defined on paths
[Figure: example graph divided into paths 1 through 4, with discriminating applications marked]
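A rough way to enumerate paths under this definition is a depth-first walk from each sensor that stops at the first actuator or discriminating application. The sketch assumes each path is a simple chain of nodes, which is a simplification; `enumerate_paths` and its arguments are hypothetical names.

```python
def enumerate_paths(arcs, sensors, terminals):
    """List every route from a sensor to the first terminal node it reaches.

    arcs: dict mapping a node to the list of nodes it sends data to (a DAG).
    terminals: the actuators plus the discriminating applications.
    """
    paths = []

    def walk(node, route):
        route = route + [node]
        if node in terminals:
            paths.append(route)  # a path ends here and is not extended further
            return
        for nxt in arcs.get(node, []):
            walk(nxt, route)

    for sensor in sensors:
        walk(sensor, [])
    return paths
```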
Real-Time QoS Constraints for a Path
end-to-end latency - the time elapsed between the instant the sensor produces an input and the instant the path delivers its own output
  must be no larger than a specified value
throughput - the number of data sets that the path can process in unit time
  must be no smaller than a specified value
  assumed equal to the sensor data rate
[Figure label: end-to-end latency]
Verifying Constraints
computation stage: a set of applications
communication stage: a set of data transfers
path: a sequence of computation and communication stages
throughput constraint
  execution time for any comp. stage ≤ 1/(sensor data rate)
  execution time for any comm. stage ≤ 1/(sensor data rate)
end-to-end latency constraint
  the sum of execution times for ALL stages in a given path ≤ that path’s maximum allowed end-to-end latency
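Both checks reduce to a few comparisons once the per-stage execution times of a path are known. The following sketch assumes those times are already available as a flat list; `satisfies_qos` is a hypothetical helper name.

```python
def satisfies_qos(stage_times, sensor_rate, max_latency):
    """Check one path against its throughput and latency constraints.

    stage_times: execution time of each comp. and comm. stage along the path.
    Returns (throughput_ok, latency_ok).
    """
    period = 1.0 / sensor_rate  # time available per data set
    throughput_ok = all(t <= period for t in stage_times)
    latency_ok = sum(stage_times) <= max_latency
    return throughput_ok, latency_ok
```

With a sensor rate of 2 data sets per time unit, every stage must finish within 0.5 time units for the throughput check to pass.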
Quantifying the Allowable Increase in Load
let the maximum allowable increase in the tactical load be abbreviated as “MAIT”
MAIT for a system = min(system wide MAIT for throughput constraint, system wide MAIT for latency constraint )
different procedures required to find the MAIT values for throughput and latency constraints
assume that a mapping is given
MAIT for the Latency Constraint
solve on a path-by-path basis
let $\lambda^{init}$ be the initial tactical load for the sensor for this path (this load was the basis for the mapping)
solve for $\lambda^{max}$ in the next equation to find the MAIT for the latency constraint for a path:
  maximum allowed path latency = $\sum$ (execution times for all comp. stages in the path given sensor tactical load $\lambda^{max}$) + $\sum$ (execution times for all comm. stages in the path given sensor tactical load $\lambda^{max}$)
repeat the calculation above for all paths
system wide MAIT for latency = min of $(\lambda^{max} - \lambda^{init}) / \lambda^{init}$ over all paths
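When the path latency is a non-decreasing function of the tactical load, the largest load that still meets the latency limit can be found numerically. The sketch below uses bisection; the latency model is passed in as a hypothetical callable, and the search range (`hi_factor`) and tolerance are arbitrary assumptions rather than values from the slides.

```python
def latency_mait(path_latency, init_load, max_latency, hi_factor=1024.0, tol=1e-9):
    """Largest relative load increase before one path reaches its latency limit.

    path_latency: hypothetical callable, load -> sum of all stage times on the path;
                  assumed non-decreasing in load.
    """
    if path_latency(init_load) > max_latency:
        raise ValueError("initial mapping already violates the latency constraint")
    lo, hi = init_load, init_load * hi_factor
    if path_latency(hi) <= max_latency:
        return float("inf")  # limit not reached anywhere in the search range
    while hi - lo > tol * init_load:
        mid = (lo + hi) / 2
        if path_latency(mid) <= max_latency:
            lo = mid  # still feasible, move the lower bound up
        else:
            hi = mid
    return (lo - init_load) / init_load

def system_latency_mait(paths):
    """paths: iterable of (path_latency, init_load, max_latency) tuples."""
    return min(latency_mait(f, load, limit) for f, load, limit in paths)
```

For a path whose latency is `0.1 + 0.01 * load`, an initial load of 10, and a limit of 0.4, the maximum load is 30, so the relative allowable increase is about 2.0.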
MAIT for the Throughput Constraint
solve on a path-by-path basis
recall $\lambda^{init}$ is the initial tactical load for the sensor for this path (this load was the basis for the mapping)
solve for $\lambda^{max}$ in the next equation to find the MAIT for the throughput constraint for a path:
  1/(max. output rate from the sensor) = max( max(computation time for any application in the path given sensor tactical load $\lambda^{max}$), max(communication time for any data transfer in the path given sensor tactical load $\lambda^{max}$) )
repeat the calculation above for all paths
system wide MAIT for throughput = min of $(\lambda^{max} - \lambda^{init}) / \lambda^{init}$ over all paths
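If each computation and communication time is assumed to grow linearly with the tactical load, the throughput equation has a closed-form solution. The linear model `a + b * load` is an assumption made only for this sketch, and `throughput_mait_linear` is a hypothetical name; the slides do not state the form of the time functions.

```python
def throughput_mait_linear(stages, init_load, sensor_rate):
    """Relative load increase until the slowest stage takes exactly one sensor period.

    stages: list of (a, b) pairs, with stage time = a + b * load and b > 0 (assumed model).
    A negative result means the initial load already violates the constraint.
    """
    period = 1.0 / sensor_rate
    # Each stage alone would allow a load of (period - a) / b; the tightest stage wins.
    max_load = min((period - a) / b for a, b in stages)
    return (max_load - init_load) / init_load
```

The system MAIT is then the smaller of the throughput value and the latency value, taken over all paths.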
Two-Phase Greedy Heuristic
origins in the Min-min heuristic
first discussed in Ibarra and Kim, J. of ACM, ’77
from prior MSHN work
performed comparably to GA and SA
in some different environments
faster heuristic than GA and SA
experiment to determine its performance
in the HiPer-D environment
Two-Phase Greedy: Simplified Overview
1st phase: for each unmapped application individually, find
machine that maximizes allowable increase in workload for
throughput constraint (without allowing any QoS violations)
based on applications mapped so far
2nd phase: select the single application/machine pair
from 1st phase that maximizes the allowable increase in
workload for the latency constraint and map the application
based on applications mapped so far and
average values for unmapped applications
repeat both phases for all unmapped applications
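The two phases can be sketched as a loop over the unmapped applications. This is a simplified illustration, not the authors' code: `throughput_score` and `latency_score` are hypothetical callables that take a partial mapping and return the allowable workload increase for the respective constraint, with a negative value taken to mean a QoS violation. Any use of average values for unmapped applications is left to `latency_score`.

```python
def two_phase_greedy(apps, machines, throughput_score, latency_score):
    """Map every application to a machine using the two-phase greedy idea."""
    mapping = {}
    unmapped = list(apps)
    while unmapped:
        # Phase 1: for each unmapped application, pick the feasible machine
        # that gives the best throughput score given the mapping so far.
        candidates = []
        for app in unmapped:
            scored = [(throughput_score({**mapping, app: m}), m) for m in machines]
            feasible = [(s, m) for s, m in scored if s >= 0]
            if feasible:
                candidates.append((app, max(feasible, key=lambda sm: sm[0])[1]))
        if not candidates:
            raise ValueError("no feasible machine for any remaining application")
        # Phase 2: among the phase-1 pairs, commit the one with the best latency score.
        app, machine = max(
            candidates, key=lambda am: latency_score({**mapping, am[0]: am[1]})
        )
        mapping[app] = machine
        unmapped.remove(app)
    return mapping
```

Swapping the two scoring callables, or passing the same one twice, gives a two-phase variant that uses the constraints in a different order.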
Variations on Two-Phase Greedy Approach
base: 1st phase: throughput constraint, 2nd phase: latency constraint
variations
  1st phase: latency, 2nd phase: throughput
  1st phase: latency, 2nd phase: latency
  1st phase: throughput, 2nd phase: throughput
simulations: which variation is best depends on the relative tightness of the constraints
“lower” bounds - a single-phase greedy
  map applications in random order
  use either just throughput or just latency
upper bounds (unattainable) - use best machine, unlimited machines, no communications
Mixed Integer Programming (MIP) Approach
based on well-researched mathematical techniques for optimization
mathematical programming formulation based on models of the latency and throughput of applications
applicable to several objective functions
uses QoS requirements as constraints
directly solvable if the formulation is linear
use heuristics to convert non-linear problems into linear problems
solution
  globally optimized
  satisfies all constraints
MIP Approach Overview
[Figure: flow diagram with elements: subsystem application model, target platform, paths, linearization heuristics, MIP formulation (optimize the objective function subject to latency requirements, throughput requirements, and mapping constraints), MIP solver, solution]
MIP Formulation
issue
  direct formulation of the MIP problem is non-linear: product of two variables in a single term
solution
  use heuristics to estimate one of the variables
  linearize the formulation
for example
  latency of an application is impacted by the number of applications mapped on each machine (N)
  estimating N
    Capability Based Heuristic (CBH)
    Uniformly Allocated Heuristic (UAH)
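To see where the product of two variables can come from, consider an illustrative model that is an assumption of this note, not the slides' formulation. Let $x_{a,m} \in \{0,1\}$ indicate that application $a$ is mapped to machine $m$, let $t_{a,m}$ be its unshared execution time there, and let $N_m = \sum_{a'} x_{a',m}$ be the number of applications sharing machine $m$. If sharing slows every application on $m$ by a factor of $N_m$, then

```latex
% Assumed sharing model: the execution time of a scales with the number of co-located applications.
T_a \;=\; \sum_{m} x_{a,m}\, N_m\, t_{a,m}
     \;=\; \sum_{m} x_{a,m} \Bigl(\sum_{a'} x_{a',m}\Bigr) t_{a,m}
     \qquad \text{(non-linear: contains } x_{a,m}\,x_{a',m}\text{)}

% Replacing N_m by a fixed estimate \hat{N}_m makes the term linear in the decision variables.
\hat{T}_a \;=\; \sum_{m} x_{a,m}\, \hat{N}_m\, t_{a,m}
```

Here $\hat{N}_m$ stands for whatever estimate a heuristic such as CBH or UAH supplies. One natural reading of "uniformly allocated" would be $\hat{N}_m = A/M$ for $A$ applications and $M$ machines, but the slides do not give the formula, so that reading is only a guess.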
Preliminary Simulation Results
objective: maximize the minimum, over all paths, of (maximum allowable increase in workload) / (initial workload)
[Figure: average % of optimal value (12 applications, 10 cases) versus number of machines (3, 4, 5), for CBH and UAH; vertical axis from 20 to 100]
Status and Impact
status
  the heuristics are being implemented and evaluated
  integration as pluggable HiPer-D algorithms to occur
impact
  provide effective initial allocation of resources to HiPer-D applications
  simulation tool for “what if” studies of resources available versus manageable load
  possible to compare off-line versus on-line resource allocation heuristics for the same HiPer-D system state
Proposed Post Quorum Work
applying our heuristics to the entire HiPer-D system and evaluating their performance
developing application task-profiling procedures
  predict computation and communication times
  predict impact of workload changes
refining our current models to have more accurate approximations of the impact of
  multitasking
  bandwidth sharing
  workload varying in different ways at different sensors
Proposed Post Quorum Work (cont’d)
understanding the performance of heuristics when the estimated computation and communication times differ from the actual times
  evaluate robustness of current heuristics
  design alternate heuristics for robustness
using our heuristics for determining the relative strengths and weaknesses of existing or proposed hardware environments
Summary of Notes for Quorum Assessment
Our Specific Program Goal
  original MSHN: resource management system that exploits heterogeneity
  MSHN for HiPer-D: initial resource allocation generated off-line
Technical Approach (How?)
  original MSHN: Scheduling Advisor; Client Library; Resource Status Server; Resource Requirements Database
  MSHN for HiPer-D: model computation and communication; heuristics for mapping continuous, communicating, aperiodic applications
Summary of Notes for Quorum Assessment (cont’d)
Scope of Solution (Metrics)
  original MSHN: makespan; FISC (Flexible Integrated System Capability: priorities, deadlines, versions, security, …)
  MSHN for HiPer-D: % allowable increase in workload w/o dynamic remapping
Open Problems/Extensions
  original MSHN: build prototype into full working system
  MSHN for HiPer-D: application profiling; impact of multitasking; heuristic robustness; use as system evaluator
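FISC Measure: Collective Value of Applications
(the slide's symbols did not survive extraction; the names below are descriptive placeholders)
$$\mathrm{FISC} = \sum_{j=0}^{t-1} \max_{0 \le i < I_j} \left[ w(p_j) \cdot R^{desc}_{ij} \cdot R^{dl}_{ij} \cdot R^{sec}_{ij} \cdot R^{qos}_{ij} \cdot \frac{V^{ver}_{ij} + V^{dl}_{ij} + V^{sec}_{ij} + V^{qos}_{ij}}{4} \right]$$
priorities: $w(p_j)$ - weight
required descendant: $R^{desc}_{ij}$ - output generated
completed by firm deadline: if yes $R^{dl}_{ij} = 1$, else $R^{dl}_{ij} = 0$
required security: $R^{sec}_{ij} = 1$ if minimum met, else $R^{sec}_{ij} = 0$
required application specific QoS: $R^{qos}_{ij} = 1$ if minimum met, else $R^{qos}_{ij} = 0$
versions: $V^{ver}_{ij}$ - normalized worth (%)
deadlines: $V^{dl}_{ij}$ - % (based on $e^{d}_{ij}$, $s^{d}_{ij}$, $f^{d}_{ij}$, $m$)
variable security: $V^{sec}_{ij}$ - % satisfied
variable application specific QoS: $V^{qos}_{ij}$ - % satisfied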
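FISC with Weighted Coefficients
relative importance among attributes for task j: $c^{ver}_j$ (versions), $c^{dl}_j$ (deadline), $c^{sec}_j$ (security), and $c^{qos}_j$ (application specific QoS)
set by user, policy maker, or application developer
all coefficients $\ge 0$, and $c^{ver}_j + c^{dl}_j + c^{sec}_j + c^{qos}_j > 0$
$$\mathrm{FISC} = \sum_{j=0}^{t-1} \max_{0 \le i < I_j} \left[ w(p_j) \cdot R^{desc}_{ij} \cdot R^{dl}_{ij} \cdot R^{sec}_{ij} \cdot R^{qos}_{ij} \cdot \frac{c^{ver}_j V^{ver}_{ij} + c^{dl}_j V^{dl}_{ij} + c^{sec}_j V^{sec}_{ij} + c^{qos}_j V^{qos}_{ij}}{c^{ver}_j + c^{dl}_j + c^{sec}_j + c^{qos}_j} \right]$$
where $w(p_j)$ is the priority weight, the $R_{ij}$ are the four required (0 or 1) attributes, and the $V_{ij}$ are the four variable (%) attributes (symbol names are descriptive placeholders)
fraction $\le 1$; if $V^{ver}_{ij} = V^{dl}_{ij} = V^{sec}_{ij} = V^{qos}_{ij} = 100\%$, fraction = 1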
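The weighted measure can be evaluated directly once each task's attributes are tabulated. The sketch below follows the structure described on the slide: a priority weight, times the product of four required 0/1 attributes, times a coefficient-weighted average of four variable attributes, maximized over the alternatives i of each task and summed over tasks. The dictionary layout and field names are hypothetical, and treating i as indexing alternative ways of running task j is an assumption.

```python
def fisc(tasks):
    """Weighted FISC value for a list of tasks.

    Each task is a dict with:
      'weight'  - priority weight (assumed non-negative)
      'coeffs'  - four non-negative coefficients with a positive sum
      'options' - list of dicts with 'required' (four 0/1 flags) and
                  'variable' (four fractions in [0, 1])
    """
    total = 0.0
    for task in tasks:
        coeffs = task["coeffs"]

        def value(option):
            gate = 1
            for flag in option["required"]:
                gate *= flag  # any unmet requirement zeroes the option
            weighted = sum(c * v for c, v in zip(coeffs, option["variable"]))
            return task["weight"] * gate * weighted / sum(coeffs)

        total += max(value(option) for option in task["options"])
    return total
```

Setting all four coefficients to 1 recovers the unweighted form, where the variable attributes are simply averaged.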