Experience of using Data Grid simulation packages (grid2008.jinr.ru/pdf/nechaevsky.pdf)
Nechaevskiy A.V. (SINP MSU), Korenkov V.V. (LIT JINR)
Experience of using Data Grid simulation packages
Dubna, 2008
Contents
• Operation of the LCG DataGrid
• Errors of the FTS services of the Grid
• Primary goals of Grid simulation systems
• The OptorSim and GridSim simulators
• Results of the LCG DataGrid simulation with OptorSim
Tier-2s and Tier-1s are interconnected by the general-purpose research networks
Any Tier-2 may access data at any Tier-1
[Figure: Tier-1 centres IN2P3, TRIUMF, ASCC, FNAL, BNL, Nordic, CNAF, SARA, PIC, RAL and GridKa, surrounded by Tier-2 sites]
Grid – solution for LHC experiments
LHC experiments support
The main faults identified during the monitoring period were timeouts, program errors, application-specific errors and user errors.
• SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
• TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out
• The server sent an error response: 425 425 Can't open data connection. timed out() failed
• DESTINATION during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [srm]. Givin' up after 3 tries
Detailed error descriptions: https://twiki.cern.ch/twiki/bin/view/LCG/TransferOperationsPopularErrors
Error descriptions used in FTS monitoring:
• Scope: where the error originated (SOURCE: source site, DESTINATION: destination site, TRANSFER: during the transfer).
• Category: the error class (FILE-EXIST, NO-SPACE-LEFT, TRANSFER-TIMEOUT, etc.).
• Phase: the stage of the transfer life cycle at which the error occurred (ALLOCATION, TRANSFER-PREPARATION, TRANSFER, etc.).
• Message: the detailed description of the error. We have a list of more than 400 different patterns, which changes over time.
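The scope, phase, category and message fields can be pulled out of an error string mechanically. The following is a minimal sketch, assuming the layout "<SCOPE> during <PHASE> phase: [<CATEGORY>] <message>" seen in the sample errors; the pattern, the `FtsError` type and the `parse_fts_error` function are illustrative names, not part of FTS.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the sample messages (an assumption, not an FTS specification):
#   "<SCOPE> during <PHASE> phase: [<CATEGORY>] <message>"
FTS_ERROR = re.compile(
    r"^(?P<scope>[A-Z]+) during (?P<phase>[A-Z_-]+) phase: "
    r"\[(?P<category>[A-Z_-]+)\] (?P<message>.+)$"
)


class FtsError(NamedTuple):
    scope: str
    phase: str
    category: str
    message: str


def parse_fts_error(line: str) -> Optional[FtsError]:
    """Split one error string into its four fields, or return None if it does not match."""
    match = FTS_ERROR.match(line.strip())
    if match is None:
        return None
    return FtsError(**match.groupdict())


if __name__ == "__main__":
    sample = ("SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] "
              "failed to prepare source file in 180 seconds")
    print(parse_fts_error(sample))
```

Messages that do not follow this layout, such as the "425 Can't open data connection" response, return None and would need patterns of their own.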
The primary goals solved by DataGrid simulation tools
• Simulation allows various experiments to be carried out on the system under study;
• Simulation makes it possible to predict and prevent a number of unexpected situations;
• Simulation makes it possible to determine the minimum data-transfer and data-storage equipment that meets the project's requirements;
• Simulation also makes it possible to check how the system works and find its "bottlenecks", among other things.
Grid simulators: SimGrid, OptorSim, GridSim
Requirements for grid simulator
A simulator must provide:
• simulation of the operation of the DataGrid's basic elements (data storage elements (SE), resource brokers (RB), replica catalogs (RC), network, users, sites);
• a simulation time much shorter than the real running time of the DataGrid;
• different kinds of statistics (for example, volume of data transfers, throughput, etc.);
• simulation of equipment failures, with results comparable to the real situation.
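The requirements on simulation time, statistics and failures can be illustrated with a toy model. This is a minimal sketch and not code from OptorSim or GridSim: the function name, the single fixed-rate channel, and the rule that a failed transfer is abandoned halfway through are all assumptions made for illustration.

```python
import random


def simulate_transfers(n_files, file_size_mb, throughput_mb_per_s,
                       failure_prob, seed=0):
    """Advance a simulated clock over sequential transfers with random failures."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    clock_s = 0.0
    stats = {"ok": 0, "failed": 0, "transferred_mb": 0.0}
    for _ in range(n_files):
        duration_s = file_size_mb / throughput_mb_per_s
        if rng.random() < failure_prob:
            # Assumption: a failed transfer is abandoned halfway through.
            clock_s += duration_s / 2
            stats["failed"] += 1
        else:
            clock_s += duration_s
            stats["ok"] += 1
            stats["transferred_mb"] += file_size_mb
    stats["simulated_hours"] = clock_s / 3600
    return stats


if __name__ == "__main__":
    # Hypothetical workload: 500 files of 1000 MB at 10 MB/s, 5% failure rate.
    print(simulate_transfers(500, 1000, 10, 0.05))
```

The loop only advances a counter, so many hours of simulated transfers are computed in a fraction of a second, and the returned dictionary holds the statistics.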
OptorSim
• OptorSim allows various optimisation algorithms and replication strategies to be evaluated
• Implemented in Java
• Configuration files are used to set the simulation parameters
• The source code is available
• edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html
Implementation of the Replica Catalog in the LCG and in the OptorSim
LCG:
The LFC file catalogue stores information about all files and their replicas in the LCG. It is one of the critical services.
• Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1”
• Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) / Physical FN (PFN) / Site FN (SFN): the location of an actual piece of data on a storage system, e.g. “srm://srm.cern.ch/castor/cern.ch/grid/cms/output10_1”
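The relationship between the three kinds of names can be sketched as a two-level mapping: an LFN resolves to one GUID, and a GUID resolves to one or more SURLs. This is a toy in-memory model, assuming nothing about the real LFC interface; the class and method names are hypothetical, and the second SURL in the usage example is invented.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: LFN -> GUID -> set of SURLs (not the LFC API)."""

    def __init__(self):
        self._guid_by_lfn = {}
        self._surls_by_guid = defaultdict(set)

    def register(self, lfn, guid, surl):
        """Register a new file under its alias, identifier and first location."""
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid].add(surl)

    def add_replica(self, lfn, surl):
        """Record another physical copy of an already registered file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].add(surl)

    def replicas(self, lfn):
        """Return every known location of the file, or an empty list."""
        guid = self._guid_by_lfn.get(lfn)
        return sorted(self._surls_by_guid.get(guid, ()))


if __name__ == "__main__":
    rc = ReplicaCatalogue()
    rc.register("lfn:cms/20030203/run2/track1",
                "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
                "srm://srm.cern.ch/castor/cern.ch/grid/cms/output10_1")
    # Hypothetical second replica, for illustration only.
    rc.add_replica("lfn:cms/20030203/run2/track1",
                   "srm://se.example.org/data/cms/output10_1")
    print(rc.replicas("lfn:cms/20030203/run2/track1"))
```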
OptorSim:
• File information in OptorSim is stored in the Replica Catalogue (as in the LCG)
• The Replica Catalogue is a list of mappings from LFNs to their physical file names (LFN and PFN in the LCG)
• The Replica Manager manages data replication and registers files in the Replica Catalogue (in the LCG, file cataloguing is implemented in the LFC)
• The "best" replica placement is determined before the transfer. This allows sites to copy files from different sources and so avoid overloading resources.
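Choosing a source before the transfer starts can be sketched as picking the candidate with the lowest estimated completion time. This is not OptorSim's optimiser: the cost model (queued data plus file size, divided by bandwidth), the function name and the numbers in the usage example are assumptions for illustration.

```python
def best_source(file_size_mb, candidates):
    """Return the site with the smallest estimated transfer time.

    candidates maps a site name to (bandwidth in MB/s, data already queued in MB).
    """
    def estimated_seconds(site):
        bandwidth_mb_per_s, queued_mb = candidates[site]
        return (queued_mb + file_size_mb) / bandwidth_mb_per_s

    return min(candidates, key=estimated_seconds)


if __name__ == "__main__":
    # Hypothetical link speeds and queue lengths.
    sites = {"CERN": (10.0, 5000.0), "SINP": (6.0, 0.0)}
    print(best_source(1000, sites))
```

In this example the slower but idle site wins, which shows how spreading requests over several sources avoids overloading one resource.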
OptorSim's graphical interface
Statistics are available as tables, graphs and diagrams
GridSim
• GridSim allows various classes of heterogeneous resources, users, applications and brokers to be simulated
• Implemented in Java
• Configuration files are used to set the simulation parameters
• The source code is available
• Many examples of GridSim usage are available
• http://www.gridbus.org/gridsim/
The simulation details
• The CERN-RDIG segment is a part of the global LCG structure
• The GEANT2 network carries the heavy data traffic between CERN, the RDIG sites and other participants
• Routers also carry external traffic, which is represented as background traffic in the simulation
• Four RDIG sites were considered: JINR, SINP (Moscow State University), IHEP and ITEP
Simulation results
• Transferring 500-700 GB of data at a throughput of 6-12 Mb/s takes 12-14 hours, which is close to reality
• Data transfer volumes can vary from several gigabytes to hundreds of gigabytes per hour, but channel throughputs in OptorSim are fixed
• OptorSim cannot simulate equipment failures or other errors
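The transfer time for a fixed-throughput channel follows from dividing volume by rate. The sketch below assumes that "Mb/s" means megabytes per second and that 1 GB = 1000 MB; neither is stated on the slide, and the function name is illustrative.

```python
def transfer_hours(volume_gb, throughput_mb_per_s):
    """Hours needed to move volume_gb at a constant rate (1 GB taken as 1000 MB)."""
    seconds = volume_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    print(f"{transfer_hours(600, 12):.1f} h")  # prints "13.9 h"
```

Under these assumptions, 600 GB at 12 MB/s falls inside the 12-14 hour range; lower rates give proportionally longer times.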
Throughput of the CERN-JINR channel and volume of data transferred on 02.02.2008
Conclusion
• The main errors of the LCG, including the FTS errors, were considered
• The simulation toolkits do not make it possible to simulate the various kinds of errors in the Grid
• Simulation of the various kinds of errors in Grid networks is necessary
Questions?