A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability
Hyun Cho, University of Alabama at Birmingham
Jeff Gray, University of Alabama
File Formats: Image Files
Organize and store digital images that are composed of either pixel or vector (geometric) data
Bitmap-based: created by scanners and digital cameras; examples: TIF, JPG, BMP
Vector-based: geometric description plus bitmap; resolution independent and infinitely scalable; examples: fonts, DRW, CGM
File Formats: Music and Audio Files
Store audio data produced by analog-to-digital converters
Key parameters: sample rate, resolution, number of channels
Uncompressed formats: WAV, AIFF, and AU
Lossless compression formats: FLAC, lossless Windows Media Audio (WMA)
Lossy compression formats: MP3, lossy Windows Media Audio (WMA)
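The three key parameters fix the size of uncompressed audio directly. A minimal sketch of that arithmetic, assuming CD-quality values (44.1 kHz, 16-bit, stereo) chosen here only for illustration; the function name is hypothetical:

```python
def uncompressed_audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of raw PCM audio data, excluding any container header."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# One minute of 44.1 kHz, 16-bit, two-channel audio (illustrative values).
one_minute = uncompressed_audio_bytes(44_100, 16, 2, 60)
print(one_minute)  # 10584000 bytes, roughly 10 MB per minute
```

Lossless and lossy formats store the same samples in fewer bytes; the formula above applies only to the uncompressed case.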
File Formats: Text Files
File formats that are structured as plain text, representing a sequence of lines
ASCII, TXT
File Formats: Compound File Formats
Used to structure the contents of a document within a single file
Contain a number of independent data streams that are organized in a hierarchy
Stream: corresponds to a file in a file system
Storage: corresponds to a sub-directory in a file system
Examples: MS Office, OpenOffice
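The stream and storage analogy can be sketched as a small tree. This is an illustrative model only, with hypothetical class and node names; it does not reproduce the on-disk layout of any real compound format:

```python
class Stream:
    """Leaf node: plays the role of a file and holds raw bytes."""

    def __init__(self, name, data=b""):
        self.name = name
        self.data = data


class Storage:
    """Interior node: plays the role of a sub-directory."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, node):
        self.children[node.name] = node
        return node

    def find(self, path):
        """Resolve a slash-separated path relative to this storage."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node


# Hypothetical document: one nested storage and two streams.
root = Storage("root")
body = root.add(Storage("body"))
body.add(Stream("text", b"Hello"))
root.add(Stream("summary", b"title=Demo"))
print(root.find("body/text").data)  # b'Hello'
```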
Characteristics of Generic File Formats
Handle only one or two data types: numeric data or alphanumeric data
May limit the file size: most are limited to a maximum file size of 2 GB
File I/O time may increase linearly as the file size grows
[Figure: elapsed time (s), 0 to 800, versus file size (M), 0 to 100, for C and Java]
Source: An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies, http://pages.cs.wisc.edu/~remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html
These generic file formats are not appropriate for storing and retrieving scientific data because they were not designed to hold high volumes of complex scientific data, such as high-resolution images, massive numerical data sets, and graphs.
Scientific Data Format: NetCDF3
Network Common Data Form: a machine-independent file format
Supports a wide variety of platforms, including Linux, MacOS, and Windows
Represents multi-dimensional arrays with ancillary data
[Figure: a three-dimensional array with axes X, Y, and Time, drawn as one 4 x 4 X-Y grid of cells per time step, from Time = 1 to Time = n; each cell is labeled with its X, Y, and Time indices, e.g. 141 for X = 1, Y = 4, Time = 1]
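The idea of a multi-dimensional array that carries ancillary data can be sketched in memory as follows. This is not the netCDF library API; the variable name, the units attribute, and the dictionary layout are assumptions made for illustration. The cell labels follow the X, Y, Time digit pattern used in the figure (for example 141):

```python
# Hypothetical in-memory stand-in for a NetCDF-style variable.
# This is not the netCDF library API.
DIMENSIONS = {"time": 3, "y": 4, "x": 4}


def cell_label(x, y, t):
    """Encode 1-based indices as a three-digit label such as 141."""
    return 100 * x + 10 * y + t


def build_variable(name, units):
    """Build a time x y x x array together with its ancillary data."""
    data = [
        [[cell_label(x, y, t) for x in range(1, DIMENSIONS["x"] + 1)]
         for y in range(1, DIMENSIONS["y"] + 1)]
        for t in range(1, DIMENSIONS["time"] + 1)
    ]
    return {
        "name": name,
        "dimensions": ("time", "y", "x"),
        "attributes": {"units": units},  # ancillary data travels with the array
        "data": data,
    }


def read_cell(variable, t, y, x):
    """Look up one cell using 1-based indices."""
    return variable["data"][t - 1][y - 1][x - 1]


temperature = build_variable("temperature", "K")
print(read_cell(temperature, t=1, y=4, x=1))  # 141
print(read_cell(temperature, t=3, y=2, x=4))  # 423
```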
Scientific Data Format: HDF5
Hierarchical Data Format: a file format for managing any kind of data
Supports high-volume and/or complex data
Platform-independent
Flexible, efficient storage and I/O
Characteristics of the Scientific Data File Formats
Self-Descriptive: contain metadata that describes the types of the contained data and their organization
Directly Accessible: arbitrary data can be accessed through APIs
Concurrently Accessible: multiple threads or processes can access data simultaneously, which enables high-performance computing and faster access
Archivable: have their own archiving mechanism to back up and restore a high volume of data
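Self-description and direct access can be illustrated together with a toy container. The byte layout below (a length-prefixed JSON header followed by raw values) is an assumption invented for this sketch; it is not the NetCDF or HDF5 layout. The reader uses only the embedded metadata to seek straight to one element:

```python
import io
import json
import struct


def write_container(variables):
    """Serialize {name: list of floats} as: 4-byte header length, JSON header, raw data."""
    header, payload = {}, b""
    for name, values in variables.items():
        header[name] = {"type": "float64", "count": len(values), "offset": len(payload)}
        payload += struct.pack(f"<{len(values)}d", *values)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<I", len(header_bytes)) + header_bytes + payload


def read_element(buffer, name, index):
    """Read one value by seeking to it, using only the metadata in the header."""
    stream = io.BytesIO(buffer)
    (header_len,) = struct.unpack("<I", stream.read(4))
    header = json.loads(stream.read(header_len))
    entry = header[name]
    if not 0 <= index < entry["count"]:
        raise IndexError(index)
    # Skip the length prefix, the header, earlier variables, and earlier elements.
    stream.seek(4 + header_len + entry["offset"] + 8 * index)
    (value,) = struct.unpack("<d", stream.read(8))
    return value


# Hypothetical variable names and values.
blob = write_container({"pressure": [1.0, 2.5, 4.0], "depth": [10.0, 20.0]})
print(read_element(blob, "depth", 1))  # 20.0
```

Because the header records each variable's type, count, and offset, a reader needs no external schema and never has to scan the data that precedes the requested element.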
Challenges in Using the Scientific Data File Formats
Different representations of the file structure: each file format needs its own data visualization and composition tools, and it is difficult to exchange data between two or more scientific data formats
Managing the evolution of APIs: it is challenging to verify that APIs evolve in accordance with the evolution of the file specification
Maintaining the stability of existing applications under API evolution: user applications are subject to changes in the APIs
Limited support for data integration among heterogeneous scientific data formats
Framework for Scientific Data File Management
[Figure: three-layer framework architecture]
DSML Layer: Metamodels (CDF, HDF, NetCDF); File Content Manager (Content Composer, Content Verifier, Content Mapper); Communication model; ...
API Layer: API Abstraction Layer; CDF API and CDF Lib; HDF API and HDF Lib; NetCDF API and NetCDF Lib; HW API and Device Driver; ...
Physical Layer: CDF Data, HDF Data, NetCDF Data, Devices
Model-Driven Engineering (MDE) and Domain-Specific Modeling (DSM)
MDE: specifies and generates software systems based on high-level models
Domain-Specific Modeling (DSM): a paradigm of MDE that uses notations and rules from an application domain
Metamodel: defines a Domain-Specific Modeling Language (DSML) by specifying the entities and their relationships in an application domain
Model: an instance of the metamodel
Model Transformation: a process that converts one or more models to various levels of software artifacts (e.g., other models, source code)
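The relationship between metamodel, model, and model transformation can be sketched in a few lines. The entity types (File, Group, Dataset) and the sample model are hypothetical and are not taken from the authors' metamodels:

```python
# Metamodel: entity types and the child types each may contain (hypothetical names).
METAMODEL = {
    "File": {"Group", "Dataset"},
    "Group": {"Group", "Dataset"},
    "Dataset": set(),
}

# Model: an instance of the metamodel, written as (type, name, children) tuples.
model = ("File", "ocean.dat", [
    ("Group", "surface", [
        ("Dataset", "temperature", []),
    ]),
    ("Dataset", "depth", []),
])


def conforms(node, metamodel):
    """Check that every element uses a known type and only allowed child types."""
    kind, _name, children = node
    if kind not in metamodel:
        return False
    return all(child[0] in metamodel[kind] and conforms(child, metamodel)
               for child in children)


def to_text(node, indent=0):
    """Model transformation: render the model as an indented text artifact."""
    kind, name, children = node
    lines = ["  " * indent + f"{kind} {name}"]
    for child in children:
        lines.extend(to_text(child, indent + 1))
    return lines


print(conforms(model, METAMODEL))  # True
print("\n".join(to_text(model)))
```

A model that nests a Group inside a Dataset would fail the conformance check, which is the kind of rule a DSML tool can enforce while the user is composing a file.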
Unifying the representation of file structure organization
Adapt a DSML to build a tool for visualizing & composing the scientific file format in a unified way
Step 1: Analyze the data model of each scientific file format; output: Feature Model
Step 2: Define the DSML from the Feature Model; outputs: Common Data Model, Variable Data Model, Grammar & Syntax
Step 3: Implement the DSML; output: DSML Tool
Unifying the representation of file structure organization
Feature Model for Scientific File Format
Unifying the representation of file structure organization
Content Composer: a DSML modeling tool for scientific data files, implemented using GEMS
API Abstraction Layer
Help to protect user applications from the evolution of APIs
NetCDF: int nc_create(const char *path, int cmode, int *ncidp)
HDF5: H5File(const char *name, unsigned int flags)
Abstraction: createFile(const char *path, FileCreationProperty fileCreationProperty)
[Figure: the API Abstraction Layer sits above the HDF API (HDF Lib) and the NetCDF API (NetCDF Lib)]
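How a single createFile entry point could shield applications from format-specific APIs can be sketched as a dispatch table. The backends here are stand-ins that only describe the native call they would make; no NetCDF or HDF5 library is invoked, and the fields of FileCreationProperty, the backend classes, and the format keys are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class FileCreationProperty:
    """Format-neutral creation options (hypothetical fields)."""
    file_format: str          # "netcdf" or "hdf5"
    overwrite: bool = False


class NetCdfBackend:
    """Stand-in for code that would call nc_create(path, cmode, &ncid)."""

    def create(self, path, prop):
        cmode = "NC_CLOBBER" if prop.overwrite else "NC_NOCLOBBER"
        return f"nc_create({path!r}, {cmode})"


class Hdf5Backend:
    """Stand-in for code that would construct H5File(name, flags)."""

    def create(self, path, prop):
        flags = "H5F_ACC_TRUNC" if prop.overwrite else "H5F_ACC_EXCL"
        return f"H5File({path!r}, {flags})"


BACKENDS = {"netcdf": NetCdfBackend(), "hdf5": Hdf5Backend()}


def create_file(path, prop):
    """Single entry point; the application never names a format-specific API."""
    try:
        backend = BACKENDS[prop.file_format]
    except KeyError:
        raise ValueError(f"unsupported format: {prop.file_format}") from None
    return backend.create(path, prop)


print(create_file("a.nc", FileCreationProperty("netcdf", overwrite=True)))
print(create_file("a.h5", FileCreationProperty("hdf5")))
```

If a native API changes, only the affected backend class needs to change; callers of create_file stay the same.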
Integrating data among heterogeneous data formats
Content Mapper: defines rules for mapping data from one scientific data format to another
Content Verifier: verifies the correctness of the file composition and of the mapping rules
[Figure: DSML Layer, containing the Metamodels (CDF, HDF, NetCDF), the File Content Manager (Content Composer, Content Verifier, Content Mapper), and the Communication model]
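The roles of the Content Mapper and the Content Verifier can be sketched with a rule table. The rule set, the element types, and the function names below are hypothetical and do not describe the authors' implementation:

```python
# Hypothetical mapping rules: source element type -> target element type.
NETCDF_TO_HDF = {
    "variable": "dataset",
    "attribute": "attribute",
    "dimension": "dimension scale",
}


def map_content(elements, rules):
    """Content Mapper: translate (type, name) pairs using the rule table."""
    return [(rules[kind], name) for kind, name in elements]


def verify_rules(rules, source_types, target_types):
    """Content Verifier: report source types without a rule and rules with unknown targets."""
    problems = []
    for kind in sorted(source_types):
        if kind not in rules:
            problems.append(f"no rule for source type '{kind}'")
    for kind, target in sorted(rules.items()):
        if target not in target_types:
            problems.append(f"rule '{kind}' maps to unknown target type '{target}'")
    return problems


source_types = {"variable", "attribute", "dimension"}
target_types = {"dataset", "attribute", "dimension scale", "group"}

print(verify_rules(NETCDF_TO_HDF, source_types, target_types))  # []
print(map_content([("dimension", "time"), ("variable", "temperature")], NETCDF_TO_HDF))
```

Verifying the rule table before mapping catches an incomplete or inconsistent rule set once, rather than failing partway through a data conversion.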
Summary
Lessons from the prototype of the framework:
A DSML can help to build a graphical tool to compose and support interoperability across scientific file structures
Adopting the layered architecture in the framework can help to maintain the independence of each layer
Both the API abstraction layer and the layered architecture are essential to develop and maintain user applications
Future work:
Create metamodels that include the full specification of each scientific file format
Categorize APIs according to their intended use for the API abstraction layer
Develop metamodels for managing API evolution
Thank you!
Example of Scientific Data Format: OPeNDAP
Client-server protocol for scientific data access
Targeted at oceanographic data management