H.J. Siegel Tony Maciejewski Purdue University

H.J. Siegel, Tony Maciejewski (Purdue University); Viktor Prasanna (USC); Richard Freund (Noemix): Adapting MSHN Scheduling Technology for HiPer-D


Page 1

H.J. Siegel, Tony Maciejewski, Purdue University

Viktor Prasanna, USC

Richard Freund, Noemix

Adapting MSHN Scheduling Technology for HiPer-D

Page 2

H.J. Siegel, Tony Maciejewski, Purdue University / Colorado State University

Viktor Prasanna, USC

Richard Freund, Noemix

Adapting MSHN Scheduling Technology for HiPer-D

Page 3

MSHN: Management System for Heterogeneous Networks

apply the techniques, knowledge, and expertise we developed in MSHN for mapping applications onto heterogeneous platforms to the HiPer-D environment

work in collaboration with the DeSiDeRaTa and HiPer-D teams: DeSiDeRaTa dynamically modifies existing HiPer-D mappings, using real-time feedback, to avoid QoS violations

our focus: derive the initial HiPer-D mapping

done off-line, with more time available than a dynamic mapper has

use estimated predictions about HiPer-D (no feedback)


Page 4

One Example HiPer-D Subsystem

[Figure: components CFF Broker, Tacfire, ALT Data Server, OTH Data Server, Gun Sim, Land Attack Engagement Server, Deconflict Server, Fire Sim 21, Land Picture, Air Track Picture, and Display Components, running on SGI, SUN, and NT machines and communicating over TCP, Ensemble, NDDS, and CORBA (TAO)]

Page 5

HiPer-D System-Wide Attributes

continuously executing applications

QoS constraints for a system-wide mapping: for each subsystem, throughput and end-to-end latency

target platform: a collection of heterogeneous COTS machines and networks running standard operating systems and network protocols

Page 6

Our Proposed Goal for the Initial Mapping

find an initial mapping of all HiPer-D applications onto the machines in the hardware platform, to be used when HiPer-D is first booted up

based on estimated (predicted) computation and communication times

derived off-line (static mapping), so more time is available for derivation than for a dynamic (on-line) mapping

maximize the allowable increase in workload (e.g., the number of targets) before a dynamic reallocation of resources is required to avoid a QoS violation in any subsystem

finding an optimal solution to the problem is intractable, so heuristic approaches are needed

Page 7

Pluggable Algorithms (Heuristics)

the choice of pluggable algorithm (heuristic) depends on:

the heterogeneous suite of machines

the trade-off between quality of mapping and off-line heuristic execution time

heuristics being investigated:

local optimization: “2-phase greedy”

mixed integer programming based

global optimization: genetic algorithm (GA), simulated annealing (SA)

Page 8

Research Tasks to Create Heuristics

characterized and modeled the continuously executing HiPer-D applications and the hardware platform (static information, no real-time feedback)

computation and communication time

particular machine selection

machine multi-tasking and network sharing

application workload

non-periodic (asynchronous) nature

context switching and background loads

limited profiling information available

Page 9

Research Tasks to Create Heuristics (cont’d)

quantified the performance goal using the model

designing and developing heuristics for allocating the applications so as to optimize the performance goal

leverage heterogeneity

evaluating the performance of the heuristics using simulations

will integrate the heuristics into HiPer-D

Page 10

Hardware Platform Model: High-Level Overview

a collection of heterogeneous machines, each with a full-duplex connection to a non-blocking switch

a collection of sensors and actuators, each with a half-duplex connection to the switch

sensor features:

data output rate: the number of data sets produced per time unit

tactical load: the number of “tracks” per data set

Page 11

Application Model Concepts

the application model is a directed acyclic graph (DAG): nodes are applications, arcs are data transfers

continuously executing, pipelined applications

[Figure: example application graph with sensors 1 to 3 feeding applications that drive actuators 1 and 2; legend: sensor, actuator, application]
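The DAG model can be sketched as an adjacency map from each node to the nodes it sends data to. This is a minimal sketch assuming Python 3.9+ (for `graphlib`); the class name `AppGraph` and the node names are illustrative, not part of HiPer-D. A topological order exists exactly when the graph is acyclic, so computing one also checks the DAG assumption.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class AppGraph:
    """Hypothetical application graph: nodes are sensors, applications, and
    actuators; arcs are data transfers (producer -> consumer)."""
    arcs: dict[str, set[str]] = field(default_factory=dict)

    def add_transfer(self, src: str, dst: str) -> None:
        # Record a data transfer from src to dst, registering both nodes.
        self.arcs.setdefault(src, set()).add(dst)
        self.arcs.setdefault(dst, set())

    def predecessors(self) -> dict[str, set[str]]:
        # Invert the arcs: node -> set of nodes that send data to it.
        preds: dict[str, set[str]] = {node: set() for node in self.arcs}
        for src, dsts in self.arcs.items():
            for dst in dsts:
                preds[dst].add(src)
        return preds

    def topological_order(self) -> list[str]:
        # TopologicalSorter takes node -> predecessors; raises CycleError if not a DAG.
        return list(TopologicalSorter(self.predecessors()).static_order())


g = AppGraph()
g.add_transfer("sensor 1", "app A")
g.add_transfer("sensor 2", "app A")
g.add_transfer("app A", "actuator 1")
order = g.topological_order()
```

Because the applications run continuously and pipelined, the order is only a dependency order for data flow, not an execution schedule.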

Page 12

Modeling Multiple Input Applications

discriminating applications: do not produce an output unless a particular input is present

example: Z is a discriminating application; it generates output only if certain sensor data are received from the radar (Y sends a picture of the battlefield to Z)

combining applications: produce an output only when all inputs arrive

example: S is a combining application; P, Q, and R solve pieces of a large computation, and S combines the pieces to form the final answer

[Figure: discriminating application Z with input from Y; combining application S with inputs from P, Q, and R]
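The two firing rules can be sketched as predicates over the inputs that have arrived so far. This assumes, for illustration only, that arrived inputs are held in a dictionary keyed by producer name; the function names and the optional `condition` predicate (standing in for "certain sensor data") are hypothetical.

```python
from typing import Any, Callable, Mapping


def combining_ready(expected: set[str], arrived: Mapping[str, Any]) -> bool:
    """Combining application (like S): output only when every expected input has arrived."""
    return expected.issubset(arrived)  # iterating a mapping yields its keys


def discriminating_ready(
    trigger: str,
    arrived: Mapping[str, Any],
    condition: Callable[[Any], bool] = lambda data: True,
) -> bool:
    """Discriminating application (like Z): output only if the particular triggering
    input is present and its data satisfy the (optional) condition."""
    return trigger in arrived and condition(arrived[trigger])
```

For example, `combining_ready({"P", "Q", "R"}, {"P": 1, "Q": 2})` is `False` until R's piece arrives, while Z stays idle as long as only Y's picture is present.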

Page 13

Paths Overview

a path is a group of applications that together perform one job

a “path” begins at a sensor and ends at an actuator or a discriminating application

paths are important because QoS constraints are defined on paths

[Figure: example application graph with four paths (path 1 to path 4); a marker denotes a discriminating application]
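Under a literal reading of the path definition, paths can be enumerated by a depth-first walk from each sensor that stops at the first actuator or discriminating application it reaches. This is a minimal sketch; `enumerate_paths` and its arguments are hypothetical names, and the graph is assumed acyclic so the recursion terminates.

```python
from typing import Iterable


def enumerate_paths(arcs: dict[str, set[str]], sensors: Iterable[str],
                    actuators: Iterable[str], discriminating: Iterable[str]) -> list[list[str]]:
    """Every node sequence that starts at a sensor and ends at an actuator or a
    discriminating application; arcs maps each node to the nodes it sends data to."""
    terminals = set(actuators) | set(discriminating)
    paths: list[list[str]] = []

    def extend(node: str, path: list[str]) -> None:
        if node in terminals:
            paths.append(path)  # a path ends here
            return
        for nxt in sorted(arcs.get(node, ())):  # sorted for a deterministic order
            extend(nxt, path + [nxt])

    for sensor in sorted(sensors):
        extend(sensor, [sensor])
    return paths


arcs = {"sensor 1": {"A"}, "A": {"Z", "B"}, "Z": {"actuator 1"}, "B": {"actuator 2"}}
paths = enumerate_paths(arcs, {"sensor 1"}, {"actuator 1", "actuator 2"}, {"Z"})
```

Here the walk from sensor 1 yields one path ending at actuator 2 and one ending at the discriminating application Z; since QoS constraints are defined per path, each such list is a unit to check.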

Page 14

Real-Time QoS Constraints for a Path

end-to-end latency: the time elapsed between the instant the sensor produces an input and the instant the path delivers its output must be no larger than a specified value

throughput: the number of data sets that the path can process per unit time must be no smaller than a specified value, assumed equal to the sensor data rate

[Figure: end-to-end latency of a path]

Page 15

Verifying Constraints

computation stage: a set of applications

communication stage: a set of data transfers

path: a sequence of computation and communication stages

throughput constraint: the execution time of any computation stage ≤ 1/(sensor data rate), and the execution time of any communication stage ≤ 1/(sensor data rate)

end-to-end latency constraint: the sum of the execution times of ALL stages in a given path ≤ that path’s maximum allowed end-to-end latency
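Both checks can be sketched directly from the stage execution times of one path under a given mapping. The record `PathTiming` and its field names are hypothetical; the times are assumed to be the estimated values the off-line mapper works with.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Estimated stage execution times for one path under a given mapping (hypothetical record)."""
    sensor_rate: float             # data sets produced per time unit by the path's sensor
    max_latency: float             # maximum allowed end-to-end latency
    comp_stage_times: list[float]  # execution time of each computation stage
    comm_stage_times: list[float]  # execution time of each communication stage


def meets_throughput(p: PathTiming) -> bool:
    # Every stage must finish within one sensor period, 1 / (sensor data rate).
    period = 1.0 / p.sensor_rate
    return all(t <= period for t in p.comp_stage_times + p.comm_stage_times)


def meets_latency(p: PathTiming) -> bool:
    # End-to-end latency is the sum of the execution times of all stages on the path.
    return sum(p.comp_stage_times) + sum(p.comm_stage_times) <= p.max_latency
```

With a sensor rate of 10 data sets per time unit, every stage must finish within 0.1 time units, while the latency check only bounds the total across stages; a path can pass one check and fail the other.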

Page 16

Quantifying the Allowable Increase in Load

let the maximum allowable increase in the tactical load be abbreviated as “MAIT”

MAIT for the system = min(system-wide MAIT for the throughput constraint, system-wide MAIT for the latency constraint)

different procedures are required to find the MAIT values for the throughput and latency constraints

assume that a mapping is given

Page 17

MAIT for the Latency Constraint

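solve on a path-by-path basis: let $\lambda_k^{\mathrm{init}}$ be the initial tactical load for the sensor of path $k$ (this load was the basis for the mapping)

to find the MAIT for the latency constraint for path $k$, set the maximum allowed path latency $L_k^{\max}$ equal to the total execution time of all computation stages and all communication stages in the path, and solve for the sensor tactical load $\lambda_k^{\mathrm{lat}}$:

$$L_k^{\max} = \sum_{c \in C_k} T_c\big(\lambda_k^{\mathrm{lat}}\big) + \sum_{d \in D_k} T_d\big(\lambda_k^{\mathrm{lat}}\big)$$

where $C_k$ and $D_k$ are the computation and communication stages of path $k$, and $T(\lambda)$ is a stage's execution time given sensor tactical load $\lambda$

repeat the calculation above for all paths; the system-wide MAIT for latency is

$$\mathrm{MAIT}_{\mathrm{lat}} = \min_{k} \frac{\lambda_k^{\mathrm{lat}} - \lambda_k^{\mathrm{init}}}{\lambda_k^{\mathrm{init}}}$$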
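Because the slides do not fix how execution time grows with tactical load, a minimal sketch can treat each stage time as an arbitrary nondecreasing function of the load (an assumption) and solve the latency equation numerically by bisection. The helper names are hypothetical, and the initial load is assumed to be positive.

```python
from typing import Callable, Sequence

StageTime = Callable[[float], float]  # stage execution time as a function of sensor tactical load


def largest_feasible_load(total_time: StageTime, limit: float, start: float,
                          tol: float = 1e-9) -> float:
    """Largest load x >= start with total_time(x) <= limit, found by bisection.
    Assumes total_time is nondecreasing in the load and start > 0."""
    if total_time(start) > limit:
        raise ValueError("latency constraint already violated at the initial load")
    lo, hi = start, 2.0 * start
    # Double the upper end until the constraint is violated (bounded number of tries).
    for _ in range(200):
        if total_time(hi) > limit:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return float("inf")  # the limit is never reached within the search range
    while hi - lo > tol * hi:  # invariant: total_time(lo) <= limit < total_time(hi)
        mid = 0.5 * (lo + hi)
        if total_time(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


def path_latency_mait(stage_times: Sequence[StageTime], max_latency: float,
                      init_load: float) -> float:
    # End-to-end latency = sum of the execution times of all stages on the path.
    def total(load: float) -> float:
        return sum(t(load) for t in stage_times)

    best = largest_feasible_load(total, max_latency, init_load)
    return (best - init_load) / init_load


def system_latency_mait(paths) -> float:
    # paths: iterable of (stage_times, max_latency, init_load) tuples, one per path.
    return min(path_latency_mait(*p) for p in paths)


p1 = ([lambda x: 0.01 + 0.002 * x, lambda x: 0.005 + 0.001 * x], 0.5, 50.0)
p2 = ([lambda x: 0.02 + 0.004 * x], 0.3, 40.0)
```

For `p1` the total latency is 0.015 + 0.003λ, which reaches 0.5 at λ ≈ 161.7, a relative increase of about 2.23 over the initial load of 50; `p2` tolerates only 0.75, so it determines the system-wide value. Bisection is used so the same code works for any monotone time model, not just linear ones.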

Page 18

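MAIT for the Throughput Constraint

solve on a path-by-path basis; recall that $\lambda_k^{\mathrm{init}}$ is the initial tactical load for the sensor of path $k$ (this load was the basis for the mapping)

to find the MAIT for the throughput constraint for path $k$, solve for the sensor tactical load $\lambda_k^{\mathrm{thr}}$ at which the longest computation or communication time in the path equals the sensor period, where $R_k$ is the maximum output rate from the sensor, $A_k$ and $E_k$ are the applications and data transfers in path $k$, and $T(\lambda)$ is an execution time given sensor tactical load $\lambda$:

$$\frac{1}{R_k} = \max\Big( \max_{a \in A_k} T_a\big(\lambda_k^{\mathrm{thr}}\big),\ \max_{e \in E_k} T_e\big(\lambda_k^{\mathrm{thr}}\big) \Big)$$

repeat the calculation above for all paths; the system-wide MAIT for throughput is

$$\mathrm{MAIT}_{\mathrm{thr}} = \min_{k} \frac{\lambda_k^{\mathrm{thr}} - \lambda_k^{\mathrm{init}}}{\lambda_k^{\mathrm{init}}}$$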
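A second sketch makes a different, hypothetical assumption, namely that every application and data-transfer time is linear in the load (time = base + slope × load). The throughput equation then has a closed form: each time is nondecreasing, so the maximum first reaches the sensor period at the smallest per-term limit. The sketch also combines the two system-wide values with the min from the MAIT definition; all names are illustrative.

```python
from typing import Iterable, Sequence, Tuple

# Assumed (hypothetical) linear model: time(load) = base + slope * load, with slope >= 0.
LinearTime = Tuple[float, float]


def path_throughput_mait(times: Sequence[LinearTime], sensor_rate: float,
                         init_load: float) -> float:
    """Relative load increase a path tolerates before some application or data
    transfer takes longer than one sensor period, 1 / sensor_rate."""
    period = 1.0 / sensor_rate
    if any(base > period for base, _ in times):
        raise ValueError("throughput constraint violated at every load")
    # max_i(base_i + slope_i * x) <= period  <=>  x <= (period - base_i) / slope_i for all i
    limits = [(period - base) / slope for base, slope in times if slope > 0]
    best = min(limits, default=float("inf"))
    if best < init_load:
        raise ValueError("throughput constraint already violated at the initial load")
    return (best - init_load) / init_load


def system_mait(throughput_maits: Iterable[float], latency_maits: Iterable[float]) -> float:
    # System MAIT = min(system-wide MAIT for throughput, system-wide MAIT for latency),
    # where each system-wide value is already a minimum over all paths.
    return min(min(throughput_maits), min(latency_maits))


thr = path_throughput_mait([(0.01, 0.001), (0.02, 0.0005)], sensor_rate=10.0, init_load=50.0)
```

With a period of 0.1, the first term allows a load of 90 and the second 160, so the path tolerates a relative increase of (90 − 50)/50 = 0.8.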

Page 19

Two-Phase Greedy Heuristic

origins in the Min-min heuristic

first discussed in Ibarra and Kim, J. of ACM, ’77

from prior MSHN work:

performed comparably to GA and SA in some different environments

faster than GA and SA

experiment to determine its performance in the HiPer-D environment

Page 20

Two-Phase Greedy: Simplified Overview

1st phase: for each unmapped application individually, find the machine that maximizes the allowable increase in workload for the throughput constraint (without allowing any QoS violations), based on the applications mapped so far

2nd phase: from the pairs found in the 1st phase, select the single application/machine pair that maximizes the allowable increase in workload for the latency constraint, and map that application, based on the applications mapped so far and average values for the unmapped applications

repeat both phases until all applications are mapped
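The two phases can be sketched generically, with each phase driven by a scoring function that returns the allowable increase in workload for a (partial) mapping, or negative infinity for a QoS violation. This is a simplified sketch, not the MSHN implementation: the toy model below (ETC table, `PERIOD`, `MAX_LATENCY`, time-sharing where an application's effective time is its machine's total load, and all applications on one path) is hypothetical, and the slides' use of average values for unmapped applications in phase 2 would belong inside the phase-2 scoring function, which the toy does not model.

```python
import math
from typing import Callable, Dict, Sequence

Assignment = Dict[str, str]             # application -> machine
Score = Callable[[Assignment], float]   # allowable increase; -inf means a QoS violation


def two_phase_greedy(apps: Sequence[str], machines: Sequence[str],
                     phase1: Score, phase2: Score) -> Assignment:
    mapping: Assignment = {}
    unmapped = list(apps)
    while unmapped:
        candidates = []
        for app in unmapped:
            # Phase 1: best machine for this application alone, given the mapping so far.
            machine = max(machines, key=lambda m: phase1({**mapping, app: m}))
            if phase1({**mapping, app: machine}) == -math.inf:
                raise ValueError(f"no machine can host {app} without a QoS violation")
            candidates.append((app, machine))
        # Phase 2: commit the single application/machine pair that scores best.
        app, machine = max(candidates, key=lambda pair: phase2({**mapping, pair[0]: pair[1]}))
        if phase2({**mapping, app: machine}) == -math.inf:
            raise ValueError("every candidate pair causes a QoS violation in phase 2")
        mapping[app] = machine
        unmapped.remove(app)
    return mapping


# Toy model (hypothetical): estimated time of each application on each machine.
ETC = {"A": {"m1": 2.0, "m2": 4.0},
       "B": {"m1": 3.0, "m2": 1.0},
       "C": {"m1": 2.0, "m2": 2.0}}
PERIOD, MAX_LATENCY = 6.0, 12.0


def machine_loads(mapping: Assignment) -> Dict[str, float]:
    loads: Dict[str, float] = {}
    for app, m in mapping.items():
        loads[m] = loads.get(m, 0.0) + ETC[app][m]
    return loads


def throughput_score(mapping: Assignment) -> float:
    # Each machine's total load must fit in one period; score = smallest relative slack.
    loads = machine_loads(mapping)
    if any(load > PERIOD for load in loads.values()):
        return -math.inf
    return min(PERIOD / load - 1.0 for load in loads.values())


def latency_score(mapping: Assignment) -> float:
    # Single toy path through all mapped apps; each app's effective time is its machine's load.
    loads = machine_loads(mapping)
    total = sum(loads[m] for m in mapping.values())
    return -math.inf if total > MAX_LATENCY else MAX_LATENCY / total - 1.0


result = two_phase_greedy(["A", "B", "C"], ["m1", "m2"], throughput_score, latency_score)
```

Passing the scoring functions in the other order, or the same function to both phases, gives the phase-order variations (latency/throughput, latency/latency, throughput/throughput) that the slides compare.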

Page 21

Variations on the Two-Phase Greedy Approach

base: 1st phase: throughput constraint; 2nd phase: latency constraint

variations

1st phase: latency, 2nd phase: throughput

1st phase: latency, 2nd phase: latency

1st phase: throughput, 2nd phase: throughput

simulations: the best variation is a function of the relative tightness of the constraints

“lower” bounds: a single-phase greedy that maps applications in random order, using either just throughput or just latency

upper bounds: use the best machine, unlimited machines, and no communications (unattainable)

Page 22

Mixed Integer Programming (MIP) Approach

based on well-researched mathematical techniques for optimization

mathematical programming formulation based on models of the latency and throughput of applications

applicable to several objective functions

uses QoS requirements as constraints

directly solvable if the formulation is linear

use heuristics to convert non-linear problems into linear problems

the solution is globally optimal for the (linearized) formulation and satisfies all constraints

Page 23

MIP Approach Overview

[Figure: the subsystem application model, its paths, and the target platform feed the MIP formulation (optimize the objective function subject to latency requirements, throughput requirements, and mapping constraints); linearization heuristics make the formulation linear; the MIP solver produces the solution]

Page 24

MIP Formulation

issue: the direct formulation of the MIP problem is non-linear (a product of two variables appears in a single term)

solution: use heuristics to estimate one of the variables, which linearizes the formulation

for example, the latency of an application is affected by the number of applications mapped onto each machine (N)

heuristics for estimating N: Capability Based Heuristic (CBH) and Uniformly Allocated Heuristic (UAH)
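One way to see the non-linearity, under a hypothetical latency model that the slides do not spell out: let $x_{a,m} \in \{0,1\}$ indicate that application $a$ is mapped onto machine $m$, let $t_{a,m}$ be its estimated execution time when running alone on $m$, and assume time-sharing stretches that time by the number of applications $N_m = \sum_b x_{b,m}$ on $m$. The latency term then multiplies two decision variables; replacing $N_m$ by an estimate $\hat{N}_m$ fixed before solving (the role of CBH or UAH) leaves a term that is linear in $x$:

```latex
T_a = \sum_{m} t_{a,m}\, x_{a,m}\, N_m
    = \sum_{m} \sum_{b} t_{a,m}\, x_{a,m}\, x_{b,m}
\quad\longrightarrow\quad
T_a \approx \sum_{m} t_{a,m}\, \hat{N}_m\, x_{a,m}
```

Under this model, how closely the linearized optimum tracks the true one depends on how accurate the estimate $\hat{N}_m$ is.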

Page 25

Preliminary Simulation Results

objective: maximize the minimum, over all paths, of (maximum allowable increase in workload) / (initial workload)

[Figure: average % of optimal value (12 applications, 10 cases) achieved by CBH and UAH versus number of machines (3, 4, 5)]

Page 26

Status and Impact

status: the heuristics are being implemented and evaluated; integration as pluggable HiPer-D algorithms is to follow

impact: provide an effective initial allocation of resources to HiPer-D applications

a simulation tool for “what if” studies of available resources versus manageable load

makes it possible to compare off-line and on-line resource allocation heuristics for the same HiPer-D system state

Page 27

Proposed Post Quorum Work

applying our heuristics to the entire HiPer-D system and evaluating their performance

developing application task-profiling procedures to predict computation and communication times and to predict the impact of workload changes

refining our current models to approximate more accurately the impact of multitasking, bandwidth sharing, and workload varying in different ways at different sensors

Page 28

Proposed Post Quorum Work (cont’d)

understanding the performance of the heuristics when the estimated computation and communication times differ from the actual times: evaluate the robustness of the current heuristics, and design alternate heuristics for robustness

using our heuristics to determine the relative strengths and weaknesses of existing or proposed hardware environments

Page 29

Summary of Notes for Quorum Assessment

Our Specific Program Goal

original MSHN: a resource management system that exploits heterogeneity

MSHN for HiPer-D: an initial resource allocation generated off-line

Technical Approach (How?)

original MSHN: Scheduling Advisor; Client Library; Resource Status Server; Resource Requirements Database

MSHN for HiPer-D: model computation and communication; heuristics for mapping continuous, communicating, aperiodic applications

Page 30

Summary of Notes for Quorum Assessment (cont’d)

Scope of Solution (Metrics)

original MSHN: makespan; FISC, the Flexible Integrated System Capability (priorities, deadlines, versions, security,…)

MSHN for HiPer-D: % allowable increase in workload without dynamic remapping

Open Problems/Extensions

original MSHN: build the prototype into a full working system

MSHN for HiPer-D: application profiling; impact of multitasking; heuristic robustness; use as a system evaluator

Page 31

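FISC Measure – Collective Value of Applications

$$\mathrm{FISC} = \sum_{j=0}^{t-1} \omega(p_j) \max_{0 \le i < I_j} \left[ \rho_{ij}\,\phi_{ij}\,\sigma_{ij}\,\kappa_{ij} \cdot \frac{\nu_{ij} + \delta_{ij} + \zeta_{ij} + \theta_{ij}}{4} \right]$$

priorities: $\omega(p_j)$, the weight for task $j$ with priority $p_j$

required descendant: $\rho_{ij}$, output generated

completed by firm deadline: if yes $\phi_{ij} = 1$, else $\phi_{ij} = 0$

required security: $\sigma_{ij} = 1$ if the minimum is met, else $\sigma_{ij} = 0$

required application-specific QoS: $\kappa_{ij} = 1$ if the minimum is met, else $\kappa_{ij} = 0$

versions: $\nu_{ij}$, the normalized worth (%)

deadlines: $\delta_{ij}$, a % based on $e_{ij}^d$, $s_{ij}^d$, $f_{ij}^d$, and $m$

variable security: $\zeta_{ij}$, the % satisfied

variable application-specific QoS: $\theta_{ij}$, the % satisfied

Page 32

FISC with Weighted Coefficients

relative importance among attributes for task $j$: $c_j^{\nu}$ (versions), $c_j^{\delta}$ (deadline), $c_j^{\zeta}$ (security), and $c_j^{\theta}$ (application-specific QoS), set by the user, policy maker, or application developer

all coefficients $\ge 0$, and $c_j^{\nu} + c_j^{\delta} + c_j^{\zeta} + c_j^{\theta} > 0$

$$\mathrm{FISC} = \sum_{j=0}^{t-1} \omega(p_j) \max_{0 \le i < I_j} \left[ \rho_{ij}\,\phi_{ij}\,\sigma_{ij}\,\kappa_{ij} \cdot \frac{c_j^{\nu}\nu_{ij} + c_j^{\delta}\delta_{ij} + c_j^{\zeta}\zeta_{ij} + c_j^{\theta}\theta_{ij}}{c_j^{\nu} + c_j^{\delta} + c_j^{\zeta} + c_j^{\theta}} \right]$$

the weighted fraction is at most 1; if $\nu_{ij} = \delta_{ij} = \zeta_{ij} = \theta_{ij} = 100\%$, the fraction equals 1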
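A minimal sketch of evaluating the weighted FISC measure, assuming each task lists one outcome record per version, the four required attributes are booleans, and the four variable attributes are fractions in [0, 1] (100% = 1.0). The class and field names are hypothetical. Setting all four coefficients equal reproduces the unweighted form, whose fraction divides by 4.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VersionOutcome:
    """Outcome of one version i of a task (hypothetical record)."""
    # Required attributes: if any is False, this version contributes 0.
    output_generated: bool      # required descendant
    firm_deadline_met: bool
    security_min_met: bool
    app_qos_min_met: bool
    # Variable attributes as fractions in [0, 1] (100% = 1.0).
    worth: float                # normalized worth of this version
    deadline: float             # deadline credit
    security: float             # variable security satisfied
    app_qos: float              # variable application-specific QoS satisfied


@dataclass
class Task:
    priority_weight: float                     # weight for the task's priority
    coeffs: tuple[float, float, float, float]  # (versions, deadline, security, QoS); all >= 0, sum > 0
    versions: Sequence[VersionOutcome]


def fisc(tasks: Sequence[Task]) -> float:
    total = 0.0
    for task in tasks:
        c_ver, c_dl, c_sec, c_qos = task.coeffs
        denom = c_ver + c_dl + c_sec + c_qos
        best = 0.0  # max over versions; a version failing a required attribute scores 0
        for v in task.versions:
            if not (v.output_generated and v.firm_deadline_met
                    and v.security_min_met and v.app_qos_min_met):
                continue
            weighted = (c_ver * v.worth + c_dl * v.deadline
                        + c_sec * v.security + c_qos * v.app_qos) / denom
            best = max(best, weighted)
        total += task.priority_weight * best
    return total


t1 = Task(2.0, (1.0, 1.0, 1.0, 1.0),
          [VersionOutcome(True, True, True, True, 1.0, 1.0, 1.0, 1.0),
           VersionOutcome(False, True, True, True, 1.0, 1.0, 1.0, 1.0)])
t2 = Task(1.0, (2.0, 1.0, 1.0, 0.0),
          [VersionOutcome(True, True, True, True, 0.5, 1.0, 1.0, 0.0)])
```

Here `t1` earns its full weight of 2 from its first version, and `t2` earns 0.75 of its weight because its most heavily weighted attribute (worth) is only half satisfied, so `fisc([t1, t2])` is 2.75.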
