Www.efda-taskforce-itm.org The european ITM Task Force data structure F. Imbeaux.
-
Upload
darcy-kennedy -
Category
Documents
-
view
214 -
download
2
Transcript of Www.efda-taskforce-itm.org The european ITM Task Force data structure F. Imbeaux.
www.efda-taskforce-itm.org
The european ITM Task Force data structure
F. Imbeaux
www.efda-taskforce-itm.org
• Data structure : logic, tools
• Universal Access Layer (UAL)
• Data base organisation, storage, and catalogue system
Outline
www.efda-taskforce-itm.org
• Key elements for the communication between different codes :
– Definition of a common format
– This common format should be generic enough :
• to be shared by all codes
• to apply to all machines / circumstances for a given physical problem
• Practical point of view :
– Code i/o interface uses directly the data structure
– Many modern languages (Fortran 90, C++, …) can use structured objects the data structure design is language independent
– The content of the structured object can be changed (e.g. add a new variable) without having to modify the i/o interface of the codes
• Conceptual point of view :
– Object-oriented approach to Consistent Physical Objects (CPO)
Why a data structure ?
www.efda-taskforce-itm.org
• Full description of a tokamak : physics quantities + subsystems characteristics + diagnostics measurements
Object oriented data structure :• High degree of organisation : several subtrees corresponding to « Consistent
Physical Objects » (avoid flat structures with long list of parameter names).
• Substructures correspond to Consistent Physical Object :
– Subsystem : (e.g. a heating system, or a diagnostic) : will contain structured information on the hardware setup and the measured data by / related to this object.
– Code results (e.g. a given plasma equilibrium, or the various source terms and fast particle distribution function from an RF code) : will contain structured information on the code parameters and the physics results.
• Programming Language flexibility : use of recent software technologies : database structure is defined using XML schemas
Object oriented data structure
www.efda-taskforce-itm.org
• XML is a generic and standardised object-oriented language, quite convenient to describe structures
• XML files can also contain the actual data, but we do not use this possibility (ASCII format not convenient for large size numerical data)
• XML schemas are used to define the data structure (arborescence, type of the objects, …). User-friendly tools (XML editors) allow fast and easy design of the data structure.
• Small translations scripts allow an automated translation of the structure for various purposes :
– Generate type definitions in various languages
– Generate access routines to CPO in various languages
– Generate documentation
– Extract specific parts of the data structure (e.g. machine description)
Use of XML schemas
www.efda-taskforce-itm.org
• Just below the top : subtrees representing Consistent Physical Objects : subsystems of the tokamak, or topical code results (equilibrium, MHD, RF, …)
• One general bookkeeping node (contains in particular the reference GPN)
• Set of reduced data summarising the main simulation parameters ("0D") for the data base catalogue
Detailed data structure : TOP
www.efda-taskforce-itm.org
• Typical diagnostic structure
• Each subtree (CPO) has its own time array.
• Each subtree (CPO) has one bookkeeping structure.
Detailed data structure : CPO diagnostic
www.efda-taskforce-itm.org
• Typical code structure
• Each subtree (CPO) has its own time array.
• Each subtree (CPO) has one bookkeeping structure.
• Stores code results and code parameters
Detailed data structure : CPO code
www.efda-taskforce-itm.org
• A database entry is an instance of the data structure
• During simulations, the « plasma state component » is an instance of the data structure
• All data exchanged between codes are part of the data structure (CPOs)
• Access to the data structure is managed by the Universal Access Layer
The data structure is a key central object
www.efda-taskforce-itm.org
• Library allowing to access to the data structure
– From the database
– From the plasma state component during the run
• Low level routines : Get/Put a single variable. Developed in C.
• User level routines : Get/Put a whole CPO, with time interpolation / resampling options. Developed for F90 and C++.
• Transport layer : Access to the data (knows about the storage method)
• Since the « user level » routines deal with CPO, they must adhere to the data structure they are generated by XSL (XML) scripts from the schemas
The Universal Access Layer
www.efda-taskforce-itm.org
• The data is presently stored on an MDS+ server
– Widely used data access system in the fusion community
– Interfaces already exist with many languages
– Convenient for storing multi-dimensional arrays, no problem with large data size
– Not really object oriented (arrays of objects not possible), slow for large number of data calls
• The XML schemas defining the data structure are used to build the MDS+ model tree (automated script)
• The data storage system may evolve in the future
• The data structure is in principle storage-independent : CPOs can be stored with different methods. They contain nodes describing the storage method, to be used by the Transport Layer to access data.
Database storage
www.efda-taskforce-itm.org
• A database entry is defined by :
– The tokamak name (MDS tree name)
– The shot number
– The version of the data / reference number of the simulation
• Use of the MDS+ shot number as a Generalised Pulse Number (GPN)
• A full tokamak simulator should be able to compute all possible experimental quantities Unique data structure for experimental data and all kinds of simulations
• Guarantee the consistency of a given dataset with minimum bookkeeping effort Each entry of the database corresponds to a unique consistent physics dataset
• Each new simulation or version of the experimental data creates a new entry
Definition of database entries
www.efda-taskforce-itm.org
• Guarantee data consistency within one entry each new simulation or version of the experimental data creates a new entry. Copying all data present in the structure would cost a lot of storage space.
• Only data that are modified are explicitly written in the « output » GPN
• The unmodified data can be tracked down using a signal referencing the « input » GPN.
– This signal would be located at the top of the tree
– Valid for all subtrees (subtrees of different origin not allowed, since it may violate data consistency)
simple and efficient bookkeeping
• A catalogue system, based on a relational database allows to keep track of all existing entries in the ITM DB, and their relation (input/output)
Referencing system
www.efda-taskforce-itm.org
Referencing system
Exp. Data
0535220004
Ref : none
Ref : 0535220001
Simulation #1
0535220002
Ref : 0535220004
Simulation #2
0535220005
Ref : 0535220004
Simulation #3
0535220006
Ref : 0535220002
Simulation #4
0535220003
Exp. Data
0535220001
Ref : none
• Guarantees data consistency
• Referencing system recursive search, hidden from the user if he does not want to know about it
www.efda-taskforce-itm.org
• Data structure completed for IMP #1 (equilibrium). Design ongoing for IMP3 (integration – core and edge transport) and IMP5 (H&CD sources)
• MDS+ server set up, model tree adheres to the DS
• UAL : first prototype has been produced, being tested. Hope to deliver first user version in June (C++ and F90)
• Extend DS to other physics areas
• Data base catalogue and referencing system designed conceptually, tools must be developed.
• First test of the whole set DS + UAL : benchmarking equilibrium codes (IMP1 task)
Present status and perspectives