July, 2000.Simulation of distributed computing systems I.C. Legrand1 MONARC Models Of Networked...

July, 2000. Simulation of distributed computing systems I.C. Legrand 1

MONARC Models Of Networked Analysis at Regional Centers

Iosif C. Legrand (CALTECH)

Simulation of distributed computing systemsSimulation of distributed computing systems


The Design and the Development of a Simulation program for large scale distributed computing systems.

Validation tests

Queuing Theory

Performance measurements based on an Object Oriented data model for specific HEP applications.

A large system simulation: the CMS HLT Farm.

Proposal for a Dynamic, Self Organising scheduling system.

Summary

Contents Contents


The GOALS of the Simulation The GOALS of the Simulation ProgramProgram

To perform realistic simulation and modelling of large scale distributed computing systems, customised for specific HEP applications.

To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.

To narrow down a region in this parameter space in which viable models can be chosen by any of the LHC-era experiments.

To offer a dynamic and flexible simulation environment.


The Monarc Simulation ProgramThe Monarc Simulation Program

This Simulation program is not intended to be a detailed simulator for basic components such as operating systems, data base servers or routers. Instead, based on realistic mathematical models and measured parameters on test bed systems for all the basic components, it aims to correctly describe the performance and limitations of large distributed systems with complex interactions.

At the same time it provides a flexible framework for evaluating different strategies for middleware software design, providing dynamic load balancing and optimising resource utilisation as well as turnaround time for high priority tasks.


Design Considerations of the Design Considerations of the Simulation ProgramSimulation Program

The simulation and modelling task for the MONARC project

requires to describe many complex programs running

concurrently in a distributed architecture.

A process oriented approach for discrete event simulation is well suited to describe concurrent running programs.

“Active objects” (having an execution thread, a program counter, stack...) provide an easy way to map the structure of a set of distributed running programs into the simulation environment.


Design Considerations of the Design Considerations of the Simulation Program (2)Simulation Program (2)

This simulation project is based on Java(TM) technology which provides adequate tools for developing a flexible and distributed process oriented simulation. Java has built-in multi-thread support for concurrent processing, which can be used for simulation purposes by providing a dedicated scheduling mechanism.

The distributed objects support (through RMI or CORBA) can be used on distributed simulations, or for an environment in which parts of the system are simulated and interfaced through such a mechanism with other parts which actually are running the real application. The distributed object model can also provide the environment to be used for autonomous mobile agents.


Data Model Data Model

Database server

Disk

Tape Unit

Data Base Index

...

Data Container

Data Container

Array

...

CLIENT

Register

AMS SERVER

D a ta C on ta iner

Da ta C o nta iner

Data Base

D a ta C on ta iner

Da ta C o ntainer

Data Base

...

Database server

It provides:

Realistic mapping for an object data base

Specific HEP data structure Transparent access to any

data Automatic storage

management An efficient way to handle

very large number of objects. Emulation of clustering

factors for different types of access patterns.

Handling related objects in different data bases.


Multitasking Processing ModelMultitasking Processing Model

Concurrent running tasks share resources (CPU, memory, I/O)

“Interrupt” driven scheme: For each new task or when one task is finished, an interrupt is

generated and all “processing times” are recomputed.

It provides:

Handling of concurrent jobs with different priorities.

An efficient mechanism to simulate multitask processing.

An easy way to apply different load balancingschemes.


“Interrupt” driven simulation for each new message an interrupt is created and for all the active transfers the speed and the

estimated time to complete the transfer are recalculated.

An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.

LAN/WAN Simulation ModelLAN/WAN Simulation Model


Arrival Patterns Arrival Patterns

A flexible mechanism to define the Stochastic process of how users perform data processing tasks

Dynamic loading of “Activity” tasks, which are threaded objects and are controlled by the simulation scheduling mechanism

Physics ActivitiesInjecting “Jobs”

Each “Activity” thread generates data processing jobs

for( int k =0; k< jobs_per_group; k++) { Job job = new Job( this, Job.ANALYSIS, "TAG”, 1, events_to_process); farm.addJob(job ); // submit the job sim_hold ( 1000 ); // wait 1000 s }

Regional Centre Farm

Job

Activity

Job

Job

Activity

These dynamic objects are used to model the users behavior


Regional Centre ModelRegional Centre Model

Complex Composite Object


Input Parameters for the Simulation Input Parameters for the Simulation Program Program

Response functions are based on “the previous state” of the component, a set of system related parameters (SysP) and parameters for a specific request (ReqP).

Such a time response function allows to describe correctly Highly Nonlinear Processes or “Chaotic” Systems behavior (typical for caching, swapping…)

It is important to correctly identify and describe the time response functions for all active components in the system. This should be done using realistic measurements.

The simulation frame allows one to introduce any time dependent response function for the interacting components.

Ti F Ti 1– SysP ReqP =


Simulation GUISimulation GUI

One may dynamically add and (re)configure Regional Centers parametersOne may dynamically add and (re)configure Regional Centers parameters


Simulation GUI (2)Simulation GUI (2)

On-line monitoring for major parameters in the simulation. On-line monitoring for major parameters in the simulation. Tools to analyze the data. Tools to analyze the data.


Results repository and the Results repository and the “publishing” procedure“publishing” procedure

Web Server

afs/nfs file system

RMI ServerWrite Objects


Queueing theory (1)Queueing theory (1)

M | M | 1 Model M | M | 1 Model

E[S]

arrivals waiting in service

E[N] Mean number of jobs E[R] Mean response time


Queueing theory (2)Queueing theory (2) M | M | 1 network queue modelM | M | 1 network queue model

arrivals

waiting in service

1 2 r

E N E

i 1=

r

Ni

i

1 i–

-------------

i 1=

r

= = E R E Ri

i 1=

r

E Si 1 i– -------------------

i 1=

r

= =and


Validation Measurements I Validation Measurements I The AMS Data Access Case The AMS Data Access Case

0

20

40

60

80

100

120

140

160

180

0 5 10 15 20 25 30 35

No. of concurrent jobs

Me

an

Tim

e p

er

job

[m

s]

Simulation Measurements

Raw Data DBLAN

4 CPUs Client


Validation Measurements IValidation Measurements I

Local DB access 32 jobs

The Distribution of the jobs processing time

monarc01

0

5

10

15

20

25

30

35

100 105 110 115 120

Simulationmean 109.5

Measurementmean 114.3


0

50

100

150

200

250

300

350

0 5 10 15 20 25 30 35


Me

an

Tim

e p

er j

ob

[m

s]

Local DB (atlobj) Local Db (monarc) AMS

Sim Local DB Sim Local DB Sim AMS

Validation Measurements IValidation Measurements IMeasurements & Simulation Measurements & Simulation

SimulationMeasurements


Validation Measurements II Validation Measurements II

0

2000

4000

6000

8000

10000

12000

14000

0 5 10 15 20 25


Tim

e

[s]

Measurements Simulation

0

500

1000

1500

2000

2500

3000

3500

4000

0 2 4 6 8

2Mbps WAN Client - Server CPU usage

Data Traffic

12 3 4 5 6

12 3 4 5 6


Simple Example: Resource Utilisation Simple Example: Resource Utilisation vs. Job’s Response Time vs. Job’s Response Time

Mean 0.55 Mean 0.72 Mean 0.93

Physics Analysis ExamplePhysics Analysis Example

180 CPUs 200 CPUs 250 CPUs


The CMS HLT Farm SimulationThe CMS HLT Farm Simulation

PileupDB

PileupDB

PileupDB

PileupDB

PileupDB

HPSS

PileupDB

PileupDB

SignalDB

SignalDB

SignalDB

...

6 Servers for Signal

Output Server

Output Server

Lock Server

Lock Server

SU

N

...

FARM 140 Processing Nodes

17 Servers

9 S

erve

rs

Total 24 Pile Up Servers

2 Objectivity Federations


The HLT Farm Configuration The HLT Farm Configuration

Distribute Pileup over 24 Linux servers

Other data over 6 Linux servers (70GB disk each)

2 Linux stations used for federations (Metadata, catalog etc)

2 Linux stations used for Journal files (used in locking)

shift20 (SUN) Used for lockserving and Output (2 x ~250GB disks)

The strategy is to use many commodity PCs as data Base Servers


Network Traffic & Job efficiency Network Traffic & Job efficiency

Mean measured Value ~48MB/s

Measurement

SimulationJet

<0.52>

Muon<0.90>


Total Time for Jet & Muon Total Time for Jet & Muon Production JobsProduction Jobs


CPU LOADCPU LOAD

Jet Production Job

Muon Production Job

ReadWrite


A Self-Organising Job Scheduling A Self-Organising Job Scheduling SystemSystem

The aim of this proposal is to describe a possible approach for the scheduling task, as a system able to dynamically learn and cluster information in a large dimensional parameter space. This dynamic scheduling system should be seen as an adaptive middle layer software, aware of current available resources and based on the “past experience” to optimise the job performance and resource utilisation.


A self organising modelA self organising model

This approach is based on using the “past experience” from jobs that have been executed to create a dynamic decision making scheme. A competitive learning algorithm can be used to “cluster” correlated information in the multi-dimensional space.

In our case, a feature mapping architecture able to map a high-dimensional input space into a much lower-dimensional structure in such a way that most of similarly correlated patters in the original data remain in the mapped space.

Such a clustering scheme seems possible for this decision making scheme as we expert a strong correlation between the parameters involved. Compared with an “intuitive” model we expect that such an approach will offer a better way to analyse possible options and can evolve and improve itself dynamically.


A simple toy example A simple toy example

We assume that the time to execute a job in the local farm having a certain load () is:

Where t0 is the theoretical time to perform the job and f() describes the effect of the farm load in the job execution time.

If the job is executed on a remote site, an extra factor () is introduced reducing the response time:

tr t0 1 f r + =

tl t0 1 f + =


Clustering of the Self Organising MapClustering of the Self Organising Map


Evaluating the self organising Evaluating the self organising scheduling with the simulation toolscheduling with the simulation tool

DECISION

Intuitive scheduling

scheme

RCsSIMULATION

Activities Simulation

Job

“Self OrganizingMap”

Job

Job

Job Job

JobJob

Job

Warming up

The decision for future jobs is based on identifying the clusters in the total parameter space, which are close to the hyper-plane defined in this space by the {Job} {System} subset.

In this way the decision can be done evaluating the typical performances of this list of close clusters and chose a decision set which meets the expected / available performance / resources and cost.


SummarySummary

A CPU- and code-efficient approach for the simulation of distributed systems has been developed for the MONARC project.

provides an easy way to map the distributed data processing, transport and analysis tasks onto the simulation.

can handle dynamically any model configuration, including very elaborate ones with hundreds of interacting complex objects.

can run on real distributed computer systems, and may interact with real components.

These results are encouraging and motivate us to continue developing this simulation tool.

Modelling and understanding current systems, their performance and limitations, is essential for the design of the future large scale distributed processing systems.

A frame to evaluate different strategies for the middle layer software to optimize resource utilization in such distributed systems is under development.

July, 2000.Simulation of distributed computing systems I.C. Legrand1 MONARC Models Of Networked...

Documents

Transcript of July, 2000.Simulation of distributed computing systems I.C. Legrand1 MONARC Models Of Networked...