Www.efda-taskforce-itm.org The european ITM Task Force data structure F. Imbeaux.

www.efda-taskforce-itm.org

The european ITM Task Force data structure

F. Imbeaux


• Data structure : logic, tools

• Universal Access Layer (UAL)

• Data base organisation, storage, and catalogue system

Outline


• Key elements for the communication between different codes :

– Definition of a common format

– This common format should be generic enough :

• to be shared by all codes

• to apply to all machines / circumstances for a given physical problem

• Practical point of view :

– Code i/o interface uses directly the data structure

– Many modern languages (Fortran 90, C++, …) can use structured objects the data structure design is language independent

– The content of the structured object can be changed (e.g. add a new variable) without having to modify the i/o interface of the codes

• Conceptual point of view :

– Object-oriented approach to Consistent Physical Objects (CPO)

Why a data structure ?


• Full description of a tokamak : physics quantities + subsystems characteristics + diagnostics measurements

Object oriented data structure :• High degree of organisation : several subtrees corresponding to « Consistent

Physical Objects » (avoid flat structures with long list of parameter names).

• Substructures correspond to Consistent Physical Object :

– Subsystem : (e.g. a heating system, or a diagnostic) : will contain structured information on the hardware setup and the measured data by / related to this object.

– Code results (e.g. a given plasma equilibrium, or the various source terms and fast particle distribution function from an RF code) : will contain structured information on the code parameters and the physics results.

• Programming Language flexibility : use of recent software technologies : database structure is defined using XML schemas

Object oriented data structure


• XML is a generic and standardised object-oriented language, quite convenient to describe structures

• XML files can also contain the actual data, but we do not use this possibility (ASCII format not convenient for large size numerical data)

• XML schemas are used to define the data structure (arborescence, type of the objects, …). User-friendly tools (XML editors) allow fast and easy design of the data structure.

• Small translations scripts allow an automated translation of the structure for various purposes :

– Generate type definitions in various languages

– Generate access routines to CPO in various languages

– Generate documentation

– Extract specific parts of the data structure (e.g. machine description)

Use of XML schemas


• Just below the top : subtrees representing Consistent Physical Objects : subsystems of the tokamak, or topical code results (equilibrium, MHD, RF, …)

• One general bookkeeping node (contains in particular the reference GPN)

• Set of reduced data summarising the main simulation parameters ("0D") for the data base catalogue

Detailed data structure : TOP


• Typical diagnostic structure

• Each subtree (CPO) has its own time array.

• Each subtree (CPO) has one bookkeeping structure.

Detailed data structure : CPO diagnostic


• Typical code structure

• Each subtree (CPO) has its own time array.

• Each subtree (CPO) has one bookkeeping structure.

• Stores code results and code parameters

Detailed data structure : CPO code


• A database entry is an instance of the data structure

• During simulations, the « plasma state component » is an instance of the data structure

• All data exchanged between codes are part of the data structure (CPOs)

• Access to the data structure is managed by the Universal Access Layer

The data structure is a key central object


• Library allowing to access to the data structure

– From the database

– From the plasma state component during the run

• Low level routines : Get/Put a single variable. Developed in C.

• User level routines : Get/Put a whole CPO, with time interpolation / resampling options. Developed for F90 and C++.

• Transport layer : Access to the data (knows about the storage method)

• Since the « user level » routines deal with CPO, they must adhere to the data structure they are generated by XSL (XML) scripts from the schemas

The Universal Access Layer


• The data is presently stored on an MDS+ server

– Widely used data access system in the fusion community

– Interfaces already exist with many languages

– Convenient for storing multi-dimensional arrays, no problem with large data size

– Not really object oriented (arrays of objects not possible), slow for large number of data calls

• The XML schemas defining the data structure are used to build the MDS+ model tree (automated script)

• The data storage system may evolve in the future

• The data structure is in principle storage-independent : CPOs can be stored with different methods. They contain nodes describing the storage method, to be used by the Transport Layer to access data.

Database storage


• A database entry is defined by :

– The tokamak name (MDS tree name)

– The shot number

– The version of the data / reference number of the simulation

• Use of the MDS+ shot number as a Generalised Pulse Number (GPN)

• A full tokamak simulator should be able to compute all possible experimental quantities Unique data structure for experimental data and all kinds of simulations

• Guarantee the consistency of a given dataset with minimum bookkeeping effort Each entry of the database corresponds to a unique consistent physics dataset

• Each new simulation or version of the experimental data creates a new entry

Definition of database entries


• Guarantee data consistency within one entry each new simulation or version of the experimental data creates a new entry. Copying all data present in the structure would cost a lot of storage space.

• Only data that are modified are explicitly written in the « output » GPN

• The unmodified data can be tracked down using a signal referencing the « input » GPN.

– This signal would be located at the top of the tree

– Valid for all subtrees (subtrees of different origin not allowed, since it may violate data consistency)

simple and efficient bookkeeping

• A catalogue system, based on a relational database allows to keep track of all existing entries in the ITM DB, and their relation (input/output)

Referencing system


Referencing system

Exp. Data

0535220004

Ref : none

Ref : 0535220001

Simulation #1

0535220002

Ref : 0535220004

Simulation #2

0535220005

Ref : 0535220004

Simulation #3

0535220006

Ref : 0535220002

Simulation #4

0535220003

Exp. Data

0535220001

Ref : none

• Guarantees data consistency

• Referencing system recursive search, hidden from the user if he does not want to know about it


• Data structure completed for IMP #1 (equilibrium). Design ongoing for IMP3 (integration – core and edge transport) and IMP5 (H&CD sources)

• MDS+ server set up, model tree adheres to the DS

• UAL : first prototype has been produced, being tested. Hope to deliver first user version in June (C++ and F90)

• Extend DS to other physics areas

• Data base catalogue and referencing system designed conceptually, tools must be developed.

• First test of the whole set DS + UAL : benchmarking equilibrium codes (IMP1 task)

Present status and perspectives

Www.efda-taskforce-itm.org The european ITM Task Force data structure F. Imbeaux.

Documents

Transcript of Www.efda-taskforce-itm.org The european ITM Task Force data structure F. Imbeaux.