Final Presentation, Summer Internship Program, CERN

Posted on 29-Nov-2014

description

This was my final presentation of the summer. I gave it together with Martin Barisits, the other student I worked with, to the Distributed Data Management group of ATLAS at CERN.

Transcript of Final Presentation, Summer Internship Program, CERN

MARTINWILLSIM GRID Simulator

Martin Barisits, Will Boyd

Supervised by Mario Lassnig and Vincent Garonne

August 13, 2009

Barisits, Boyd (Vienna UT, Georgia Tech) MARTINWILLSIM August 13, 2009 1 / 41

Content

1 Introduction

2 Approach

3 Topology Generator

4 Load Generator

5 Simulator

6 Simulation Results

7 Conclusion

8 Acknowledgements

9 References

Introduction

The Authors

Martin Barisits, Vienna UT, Austria
• BSc: Medical Computer Science
• MSc: Computational Intelligence

Will Boyd, Georgia Tech, USA
• BSc: Physics & Computer Science

The Problem

• Goal: Test data distribution strategies
• Need: Simulator
• Need: Ability to load the Simulator with the current GRID Topology
• Need: Inject the Simulator with realistic workloads
• Process Results

Approach

Design

• Two basic challenges
  • Write a tool to get a snapshot of the whole GRID environment (Topology, Usage) and to generate Loads
  • Write a Simulator which can execute this input
• Different Simulators for GRID analysis are available in the research community
• For time reasons we decided to use a Simulator package to base our Simulator on

Package Evaluation

• Evaluation of GRID/cloud computing simulation packages
• SimGrid [2]
  • Based on pure C
  • Pros: Fast execution time; low memory consumption; scalable
  • Cons: Lacking in some functionality; high level of abstraction
• GridSim [3]
  • Java-based
  • Pros: Highly developed; internal logging of network traffic; easier to use; packet-based
  • Cons: Slow execution time; high memory consumption; not scalable

Package Performance

• Attempted to simulate one day on GRID (1.5 million file transfers)
• GridSim: exponential in CPU time with increasing transfers
• SimGrid: linear in CPU time with increasing transfers

Flow

Topology Generator

GRID Sites

GRID sites across the world

ATLAS Computing Model

• Hierarchical computing network
  • Tier-0
  • Tier-1
  • Tier-2
• Tier-0 (CERN) generates data
• Tier-1s store data
• Tier-2s process data

Figure: The Tier-0 and Tier-2 network configuration [1]

The Topology Generator

• TopologyGen.py
  • Script to construct GRID topology
  • Parses TiersOfATLASCache.py
    • Finds and associates Tier-1s and Tier-2s
  • Queries the DQ2 database
    • Total disk space capacity
    • Used disk space
  • Topology is written to two XML files

Platform and Deployment Files

• Platform file
  • Node declarations
  • Link declarations
  • Route declarations
• Deployment file
  • Logfiles for each node
  • Total and used disk space
  • Used disk space by datatype
  • Tier-0 loadfiles
  • Associated Tier-1s and Tier-2s

Route declaration from a Tier-0 to Tier-1 in the platform file

    <route src='CERN_52' dst='RAL-LCG2_MCDISK'>
        <link:ctn id='RAL-LCG2_MCDisk_InternalLink'/>
        <link:ctn id='RAL_OPNLinkInternal'/>
        <link:ctn id='CERN_52_InternalLink'/>
    </route>

Host definition in the deployment file

    <process function='Tier1Storage' host='INFN-T1_DATADISK'>
        <argument value='1'/>          <!-- logfile -->
        <argument value='214576722'/>  <!-- total disk space -->
        <argument value='75266283'/>   <!-- used disk space -->
        <argument value='2631309'/>    <!-- RAW -->
        <argument value='0'/>          <!-- SIM -->
        <argument value='0'/>          <!-- DRD -->
        <argument value='28882683'/>   <!-- ESD -->
        <argument value='21172405'/>   <!-- AOD -->
        <argument value='0'/>          <!-- DPD -->
        <argument value='244615'/>     <!-- TAG -->
        <argument value='INFN-MILANO-ATLASC_DATADISK;INFN-NAPOLI-ATLAS_DATADISK;'/>
    </process>
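As a rough illustration of how a script such as TopologyGen.py might emit these two fragments, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper names make_route and make_process are hypothetical and not taken from the original tool; only the element names, attribute names, and example values come from the slide above.

```python
import xml.etree.ElementTree as ET


def make_route(src, dst, link_ids):
    """Build a <route> element with one <link:ctn> child per link (hypothetical helper)."""
    route = ET.Element("route", {"src": src, "dst": dst})
    for link_id in link_ids:
        # The tag is written verbatim; no XML namespace is declared for "link".
        ET.SubElement(route, "link:ctn", {"id": link_id})
    return route


def make_process(function, host, arguments):
    """Build a <process> element with one <argument> child per value (hypothetical helper)."""
    process = ET.Element("process", {"function": function, "host": host})
    for value in arguments:
        ET.SubElement(process, "argument", {"value": str(value)})
    return process


route = make_route(
    "CERN_52",
    "RAL-LCG2_MCDISK",
    ["RAL-LCG2_MCDisk_InternalLink", "RAL_OPNLinkInternal", "CERN_52_InternalLink"],
)
process = make_process(
    "Tier1Storage",
    "INFN-T1_DATADISK",
    [1, 214576722, 75266283, 2631309, 0, 0, 28882683, 21172405, 0, 244615,
     "INFN-MILANO-ATLASC_DATADISK;INFN-NAPOLI-ATLAS_DATADISK;"],
)

# Serialize both fragments to text, as they would be written into the two XML files.
route_xml = ET.tostring(route, encoding="unicode")
process_xml = ET.tostring(process, encoding="unicode")
print(route_xml)
print(process_xml)
```

The sketch only writes the fragments as text. A complete file would also need the surrounding root element and document type that the simulation package expects, which are not shown on the slides.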

MARTINWILLSIM GRID Topology

The topology that is generated for simulation

Load Generator

LoadGen.py

Generating a Load

• Loadfile given to each Tier-0
• Loadfiles define dataset transfers
  • Unique dataset ID
  • Random (uniform) target Tier-1 storage node
  • Random (uniform) filesize (0.5-6 GB)
  • Random (weekly distribution) inter-submission time
  • Dataset datatype (e.g., RAW)
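The loadfile fields listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original LoadGen.py: the Transfer record and generate_load function are hypothetical names, and exponential gaps are used as a simple stand-in for the weekly inter-submission distribution. The datatype labels and the node names are the ones that appear elsewhere in these slides.

```python
import random
from dataclasses import dataclass

# Datatype labels taken from the deployment-file comments in the slides.
DATATYPES = ["RAW", "SIM", "DRD", "ESD", "AOD", "DPD", "TAG"]


@dataclass
class Transfer:
    dataset_id: int      # unique dataset ID
    target: str          # target Tier-1 storage node
    size_gb: float       # dataset size in GB
    submit_time: float   # seconds since the start of the simulation
    datatype: str


def generate_load(n, tier1_nodes, mean_gap_s=10.0, seed=0):
    """Return n dataset transfers with uniform targets and sizes (hypothetical helper)."""
    rng = random.Random(seed)
    now = 0.0
    transfers = []
    for dataset_id in range(n):
        # Assumption: exponential gaps stand in for the weekly distribution.
        now += rng.expovariate(1.0 / mean_gap_s)
        transfers.append(
            Transfer(
                dataset_id=dataset_id,
                target=rng.choice(tier1_nodes),
                size_gb=rng.uniform(0.5, 6.0),
                submit_time=now,
                datatype=rng.choice(DATATYPES),
            )
        )
    return transfers


load = generate_load(1000, ["RAL-LCG2_MCDISK", "INFN-T1_DATADISK"])
print(load[0])
```

Seeding the generator keeps a loadfile reproducible, which makes it easier to compare two distribution strategies on the same input.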

Load Distribution

• Dataset distribution
  • Uniform background traffic
  • Wednesday/Friday peak traffic
  • Random "spikes" of traffic
• Each component is weighted
• Distribution can easily be adjusted

Figure: An example weekly dataset transfer distribution
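One way to read the weighted components above is as a mixture over the 168 hours of a week. The Python sketch below is an assumption-laden illustration rather than the authors' implementation: the function names, the Gaussian shape of the peaks, their placement at noon, the six-hour width, and the default weights are all hypothetical choices.

```python
import math
import random

HOURS_PER_WEEK = 7 * 24


def weekly_weights(w_uniform=1.0, w_peak=2.0, w_spike=1.0, n_spikes=3, seed=0):
    """Return normalized per-hour submission weights for one week (hour 0 = Monday 00:00)."""
    rng = random.Random(seed)
    spike_hours = set(rng.sample(range(HOURS_PER_WEEK), n_spikes))
    weights = []
    for hour in range(HOURS_PER_WEEK):
        w = w_uniform                              # uniform background traffic
        for peak_day in (2, 4):                    # Wednesday and Friday
            center = peak_day * 24 + 12            # assumption: peak at noon
            w += w_peak * math.exp(-((hour - center) ** 2) / (2 * 6.0 ** 2))
        if hour in spike_hours:                    # random spikes of traffic
            w += w_spike
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def sample_submission_hours(n, weights, seed=0):
    """Draw n submission hours according to the weekly weights."""
    rng = random.Random(seed)
    return rng.choices(range(HOURS_PER_WEEK), weights=weights, k=n)


weights = weekly_weights()
hours = sample_submission_hours(1000, weights)
print(max(range(HOURS_PER_WEEK), key=lambda h: weights[h]))
```

Adjusting the distribution then amounts to changing the three weights or the number of spikes.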

Simulator

Facts

• Based on SimGrid [2]
• Implemented in C
• Intent to implement an extensible Simulation-Framework rather than a strict Simulator
• Goals:
  • Fast
  • Scalable
  • Representative

Features

• Build network topology according to the injected Topology File
• Simulate the shipment and processing of DataSets
• Give the user a framework to implement/change own behavior
• Provide functions to write simulation output
• Background Noise Generation (Traffic from other Experiments, ...)

Major Entities

• Nodes (Tier0, Tier1, Tier2)
• Tasks (Data transfer)
• DataSets
• Links

Task

Task: Taskname, Size, (DataSet)

• Taskname
  • Command
• Size
  • Communication Size
  • Execution Size
• DataSet
  • DataSet ID
  • DataSet Size
  • DataSet Type
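The Task layout above could be mirrored by two small record types. The following Python sketch is only an illustration: the simulator itself is written in C, and the field names, units, and the push_task helper are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSet:
    dataset_id: int    # DataSet ID
    size: float        # DataSet Size, assumed to be in bytes
    datatype: str      # DataSet Type, for example "RAW"


@dataclass
class Task:
    name: str                           # Taskname, carries the command (for example "PUSH")
    communication_size: float           # amount of data moved over the links
    execution_size: float               # amount of computation; the unit is an assumption
    dataset: Optional[DataSet] = None   # optional payload, shown in parentheses on the slide


def push_task(dataset):
    """Build a PUSH task that ships a whole dataset (hypothetical helper)."""
    return Task("PUSH", communication_size=dataset.size, execution_size=0.0, dataset=dataset)


raw = DataSet(dataset_id=1, size=3.2e9, datatype="RAW")
task = push_task(raw)
noise = Task("NOISE", communication_size=1.0e9, execution_size=0.0)
print(task)
```

Keeping the DataSet optional lets tasks that carry no payload, such as background noise, share the same structure.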

Command Language

• (PUSH, DataSet)
• (PULL, DataSetTemplate)
• (DELETE, DataSetTemplate)
• (PROCESS, DataSet)
• (NOISE)
• (INITSHUTDOWN)
• (FINALIZE)

Nodes

• Different types of nodes with different functions
  • Producer (Tier 0)
  • Storages (Tier 1, Tier 2)
  • Hosts (Tier 1, Tier 2)
  • FinalizeNode
• All nodes understand the Command Language
  • Different nodes execute commands differently
• It's up to the user to define the semantics of a node
• Node Features
  • DataSet Store (Hashmap)
  • Ability to write simulation output
  • Queues
  • Simulation Functions (Execute, Sleep, ...)

Pseudo Code of a Tier-1 Storage

    while (1) {
        task = receiveTask();
        switch (task) {
            case <PUSH, dataSet>:
                store(dataSet);                // Store in the FileSystem
                tier2 = getNextTier2();
                send(tier2, PUSH, dataSet);    // Send the task
                writeStatistics();
                break;
            case <DELETE, dataSet>:
                ...
            case <FINALIZE, dataSet>:
                writeStatistics();
                break;
            ...
        }
    }
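For readers who want to trace the loop by hand, here is a small runnable Python model of the same control flow. It is a sketch under assumptions, not the C implementation: the class and method names are hypothetical, getNextTier2 is modeled as round-robin, the DELETE case (elided in the pseudo code) is assumed to simply drop the dataset, and FINALIZE is assumed to end the loop.

```python
from collections import deque
from enum import Enum, auto


class Command(Enum):
    PUSH = auto()
    DELETE = auto()
    FINALIZE = auto()


class Tier1Storage:
    def __init__(self, tier2_names):
        self.store = {}                      # DataSet store: dataset ID -> size
        self.tier2_names = list(tier2_names)
        self.next_tier2 = 0
        self.outbox = []                     # tasks "sent" to Tier-2 nodes
        self.stats = []                      # used space after each recorded event

    def get_next_tier2(self):
        # Assumption: associated Tier-2s are served round-robin.
        name = self.tier2_names[self.next_tier2 % len(self.tier2_names)]
        self.next_tier2 += 1
        return name

    def write_statistics(self):
        self.stats.append(sum(self.store.values()))

    def run(self, inbox):
        while inbox:
            command, dataset_id, size = inbox.popleft()
            if command is Command.PUSH:
                self.store[dataset_id] = size                      # store the dataset
                self.outbox.append((self.get_next_tier2(), Command.PUSH, dataset_id))
                self.write_statistics()
            elif command is Command.DELETE:
                self.store.pop(dataset_id, None)                   # assumed semantics
            elif command is Command.FINALIZE:
                self.write_statistics()
                break                                              # assumed: stop the node


node = Tier1Storage(["INFN-MILANO-ATLASC_DATADISK", "INFN-NAPOLI-ATLAS_DATADISK"])
inbox = deque([
    (Command.PUSH, 1, 10),
    (Command.PUSH, 2, 5),
    (Command.DELETE, 1, 0),
    (Command.FINALIZE, None, 0),
])
node.run(inbox)
print(node.store, node.stats)
```

Because each node type only differs in how it reacts to a command, a different node could reuse the same loop with other branches.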

The way of a DataSet (sequence of four figures, 1/4 to 4/4)

Simulation Results

Overloading the GRID

Figure: Tier-0 dataset submission distribution

Figure: Disk space evolution with increasing daily dataset transfers

Disk Space Evolution

Figure panels: Tier-1 storage node; an associated Tier-2 storage node

Data Storage by Datatype

Figure panels: Uniform dataset transfer distribution; simulated dataset transfer distribution

Scalability of MARTINWILLSIM

• MartinWillSim run to simulate increasing number of days
• 250,000 tasks/day (800 TB/day)
• Simulated one month of dataset transfers in 40 mins.
• CPU time linear with number of simulated tasks
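The figures above imply a few derived numbers, worked out in the short Python snippet below. It assumes a 30-day month and decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), neither of which is stated on the slide.

```python
TASKS_PER_DAY = 250_000
BYTES_PER_DAY = 800e12        # 800 TB/day, decimal units assumed
DAYS = 30                     # assumption: one month = 30 days
RUN_SECONDS = 40 * 60         # 40 minutes of simulation run time

total_tasks = TASKS_PER_DAY * DAYS                     # tasks in the simulated month
tasks_per_second = total_tasks / RUN_SECONDS           # simulated tasks per second of run time
mean_task_gb = BYTES_PER_DAY / TASKS_PER_DAY / 1e9     # average dataset size in GB
speedup = DAYS * 86_400 / RUN_SECONDS                  # simulated time per unit of run time

print(total_tasks, tasks_per_second, mean_task_gb, speedup)
```

Under these assumptions the run covers 7.5 million tasks at roughly 3,100 tasks per second, about 1,080 times faster than real time. The implied average of 3.2 GB per task is close to the 3.25 GB mean of a uniform 0.5-6 GB size range.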

Conclusion

Recap

• Evaluation of different Simulator Packages [2 Weeks]
• Design and Implementation of:
  • Topology & Load Generator [6 Weeks]
  • MartinWillSim Simulator [6 Weeks]
• Result Analysis
• Documentation

Future Work

• Add LISP hooks for Decision Making
• Add more detail of the ATLAS Computing model to the simulator
• Add functions for random errors (node failures, link failures)
• Add recording of other statistics for result analysis
  • Link Throughput
  • Usage of Processing Capacities
  • Replication Factors
  • User behavior
• For higher detail: Add Files as smallest Entity to the Simulator
• Further Validation

Acknowledgements

Thank you!

• Mario Lassnig

• Vincent Garonne

• Angelos Molfetas

• Ingrid Schmid

• All those who helped make the 2009 Summer Student Programme possible!

References

[1] https://twiki.cern.ch/twiki/pub/LHCOPN/ImplementationDetails/map-lhcopn.png

[2] Casanova, Legrand, Quinson: SimGrid: a Generic Framework for Large-Scale Distributed Experiments, 2008

[3] Buyya, Murshed: GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing, 2002
