A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability

A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability Hyun Cho University of Alabama at Birmingham Jeff Gray University of Alabama



Transcript of A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability

Page 1

A Domain-Specific Modeling Language for Scientific Data Composition and Interoperability

Hyun Cho University of Alabama at Birmingham

Jeff Gray University of Alabama

Page 2

File Formats: Image Files

Organize and store digital images that are composed of either pixel or vector (geometric) data

Bitmap-based: created by scanners and digital cameras (e.g., TIF, JPG, BMP)

Vector-based: geometric description plus bitmap data; resolution-independent and infinitely scalable (e.g., fonts, DRW, CGM)
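To make the bitmap/vector distinction concrete, here is a minimal Python sketch. The functions and the tiny 2x2 image are hypothetical and not taken from any image library. Enlarging a bitmap can only repeat existing pixels, whereas a vector shape is enlarged by scaling its geometric parameters, so no detail is lost.

```python
# Illustrative sketch only (hypothetical helpers, not an image library).

def scale_bitmap(pixels, factor):
    """Nearest-neighbor upscaling: each pixel is repeated factor x factor times."""
    scaled = []
    for row in pixels:
        wide_row = [p for p in row for _ in range(factor)]
        scaled.extend([list(wide_row) for _ in range(factor)])
    return scaled


def scale_circle(circle, factor):
    """A vector circle (cx, cy, r) is scaled by multiplying its geometry."""
    cx, cy, r = circle
    return (cx * factor, cy * factor, r * factor)


bitmap = [[0, 1],
          [1, 0]]
print(scale_bitmap(bitmap, 2))
# [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
print(scale_circle((1.0, 1.0, 0.5), 10))
# (10.0, 10.0, 5.0)
```

The bitmap result is larger but contains no new information, which is why bitmap formats are tied to the resolution at which they were captured.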

Page 3

File Formats: Music and Audio Files

Store audio data produced by analog-to-digital converters

Key parameters: sample rate, resolution (bits per sample), and number of channels

Uncompressed formats: WAV, AIFF, and AU

Lossless compression formats: FLAC, Windows Media Audio (WMA) Lossless

Lossy compression formats: MP3, Windows Media Audio (WMA)
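The three key parameters fully determine the data rate of uncompressed (PCM) audio, which is why compression matters. The following Python sketch computes it, assuming plain PCM samples and ignoring container headers; the function names are illustrative.

```python
# Uncompressed PCM data rate from the key parameters above
# (assumes plain PCM samples; file headers are ignored).

def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Total audio payload size for a clip of the given length."""
    return pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * seconds


# CD-quality audio: 44.1 kHz, 16-bit, stereo.
print(pcm_bytes_per_second(44_100, 16, 2))    # 176400
print(pcm_size_bytes(44_100, 16, 2, 60))      # 10584000 (about 10.6 MB per minute)
```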

Page 4

File Formats: Text Files

File formats that are structured as plain text, representing a sequence of lines

ASCII, TXT

Page 5

File Formats: Compound File Formats

Structure the contents of a document within a single file

Contain a number of independent data streams organized in a hierarchy, like a file system inside the file: a stream corresponds to a file and a storage corresponds to a sub-directory

e.g., MS Office, OpenOffice
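The storage/stream hierarchy can be modeled as a small tree. This is a hypothetical Python sketch, not the API of any real compound-file library; the class names and the stream names "Document", "Text", and "Summary" are invented for illustration.

```python
# Hypothetical model of a compound file: storages behave like directories
# and streams behave like files. Not a real compound-file library API.
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Stream:
    data: bytes


@dataclass
class Storage:
    children: Dict[str, Union["Storage", Stream]] = field(default_factory=dict)

    def add(self, name, node):
        self.children[name] = node
        return node

    def resolve(self, path):
        """Follow a '/'-separated path from this storage to a stream or storage."""
        node = self
        for part in filter(None, path.split("/")):
            if not isinstance(node, Storage) or part not in node.children:
                raise KeyError(path)
            node = node.children[part]
        return node


root = Storage()
doc = root.add("Document", Storage())
doc.add("Text", Stream(b"Hello"))
root.add("Summary", Stream(b"title=Report"))

print(root.resolve("/Document/Text").data)  # b'Hello'
```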

Page 6

Characteristics of Generic File Formats

Can handle only one or two data types: numeric or alphanumeric data

May limit file size: most are limited to a maximum file size of 2 GB

File I/O time may increase linearly as the file size grows

[Figure: elapsed time (s, 0 to 800) versus file size (MB, 0 to 100) for file I/O in C and Java]

Source: "An In-Depth Examination of Java I/O Performance and Possible Tuning Strategies", http://pages.cs.wisc.edu/~remzi/Classes/736/Fall2000/Project-Writeups/KaiHongfei.html
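The slide does not say where the 2 GB figure comes from. One common cause, assumed here, is that the format or its library stores file offsets as signed 32-bit integers. This Python sketch shows the arithmetic: the largest such offset is 2^31 - 1 bytes, and an offset beyond it wraps to a negative value.

```python
# Assumption: file offsets are stored as signed 32-bit integers.

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


print(INT32_MAX)                  # 2147483647 bytes, i.e., 2 GiB - 1
three_gib = 3 * 2**30
print(as_int32(three_gib))        # -1073741824: the offset wraps around
```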

Page 7

Characteristics of Generic File Formats


These generic file formats are not appropriate for storing and retrieving scientific data because they were not designed to hold high volumes of complex scientific data, such as high-resolution images, massive numerical data sets, and graphs.

Page 8

Scientific Data Format: NetCDF3

Network Common Data Format: a machine-independent file format

Supports a wide variety of platforms, including Linux, macOS, and Windows

Represents multi-dimensional arrays with ancillary data

[Figure: a three-dimensional array with dimensions X, Y, and Time, drawn as a stack of X-Y grids from Time = 1 to Time = n]
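A multi-dimensional array such as the (Time, Y, X) array in the figure is stored as a flat sequence of values. NetCDF presents variables in row-major (C) order, with the last dimension varying fastest. The sketch below computes the position of an element under that ordering for a single fixed-size variable; this is a simplification, since record variables along an unlimited dimension are interleaved record by record. The dimension sizes are illustrative.

```python
# Row-major (C-order) position of element [t][y][x] in a (time, y, x) array:
# the last dimension (x) varies fastest. Sizes below are illustrative.

def flat_index(t, y, x, ny, nx):
    """Offset of element [t][y][x], in elements, from the start of the array."""
    return (t * ny + y) * nx + x


nt, ny, nx = 2, 4, 4
flat = [(t, y, x) for t in range(nt) for y in range(ny) for x in range(nx)]

# Every (t, y, x) lands exactly where flat_index says it does.
assert all(flat[flat_index(t, y, x, ny, nx)] == (t, y, x)
           for t in range(nt) for y in range(ny) for x in range(nx))
print(flat_index(1, 0, 0, ny, nx))  # 16: the first value of the second time step
```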

Page 9

Scientific Data Format: HDF5

Hierarchical Data Format: a file format for managing any kind of data

Supports high-volume and/or complex data

Platform-independent

Flexible, efficient storage and I/O
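HDF5 organizes a file as a hierarchy of groups and datasets, addressed by '/'-separated paths, where objects can carry attributes. The following is a conceptual Python sketch of that hierarchy, not the HDF5 library or h5py API; the class names and the example paths are hypothetical.

```python
# Conceptual model of the HDF5 hierarchy (not the HDF5 or h5py API):
# groups contain groups and datasets; any object can carry attributes.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Dataset:
    shape: tuple
    data: List[float]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    members: Dict[str, Any] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def create(self, path, obj):
        """Place obj at a '/'-separated path, creating intermediate groups."""
        *parents, name = [p for p in path.split("/") if p]
        group = self
        for part in parents:
            group = group.members.setdefault(part, Group())
        group.members[name] = obj
        return obj

    def walk(self, prefix=""):
        """Yield the absolute path of every object below this group."""
        for name, obj in self.members.items():
            path = f"{prefix}/{name}"
            yield path
            if isinstance(obj, Group):
                yield from obj.walk(path)


root = Group(attrs={"title": "example"})
temp = root.create("/run1/temperature", Dataset((2, 2), [20.1, 20.4, 19.8, 20.0]))
temp.attrs["units"] = "degC"  # attributes make the data self-describing
root.create("/run2/pressure", Dataset((2,), [1013.2, 1012.8]))
print(list(root.walk()))
# ['/run1', '/run1/temperature', '/run2', '/run2/pressure']
```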

Page 10

Characteristics of the Scientific Data File Formats

Self-descriptive: contain metadata describing the types of the contained data and their organization

Directly accessible: arbitrary data can be accessed through APIs

Concurrently accessible: multiple threads or processes can access data simultaneously, enabling high-performance computing and faster access

Archivable: have their own archiving mechanisms to back up and restore high volumes of data
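Self-description and direct access work together: because the file records its own shape and element type, a reader can compute where any element lives and seek straight to it. The sketch below uses a toy binary layout invented for illustration (the "TOY1" magic and header fields are hypothetical, not NetCDF or HDF5).

```python
# Toy self-describing layout (hypothetical, not NetCDF or HDF5):
# header = magic, element type code, rows, cols; then row-major elements.
import io
import struct

HEADER = struct.Struct("<4scII")
MAGIC = b"TOY1"


def write_array(buf, rows, typecode="d"):
    item = struct.Struct("<" + typecode)
    buf.write(HEADER.pack(MAGIC, typecode.encode("ascii"), len(rows), len(rows[0])))
    for row in rows:
        for value in row:
            buf.write(item.pack(value))


def read_element(buf, i, j):
    """Use the header metadata to seek directly to element [i][j]."""
    buf.seek(0)
    magic, typecode, nrows, ncols = HEADER.unpack(buf.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a TOY1 file")
    if not (0 <= i < nrows and 0 <= j < ncols):
        raise IndexError((i, j))
    item = struct.Struct("<" + typecode.decode("ascii"))
    buf.seek(HEADER.size + (i * ncols + j) * item.size)  # direct access
    return item.unpack(buf.read(item.size))[0]


f = io.BytesIO()
write_array(f, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(read_element(f, 1, 2))  # 6.0
```

The reader never scans the data section; it needs only the header, which is what makes random access cheap in self-describing formats.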

Page 11

Challenges in Using the Scientific Data File Formats

Each format uses a different representation to organize its file structure: each file format needs its own data visualization and composition tools, and it is difficult to exchange data between two or more scientific data formats

Managing the evolution of APIs: it is challenging to verify that APIs evolve in accordance with the evolution of the file specification

Maintaining the stability of existing applications as APIs evolve: user applications are subject to API changes

Limited support for data integration among heterogeneous scientific data formats

Page 12

Framework for Scientific Data File Management

[Figure: three-layer framework.
DSML Layer: metamodels (CDF, HDF, NetCDF), the File Content Manager (Content Composer, Content Verifier, Content Mapper), and the communication model.
API Layer: the API Abstraction Layer over the CDF API, HDF API, NetCDF API, HW API, and so on.
Physical Layer: CDF Lib, HDF Lib, NetCDF Lib, and the device driver, over CDF data, HDF data, NetCDF data, and devices.]


Page 14

Model-Driven Engineering (MDE) and Domain-Specific Modeling (DSM)

MDE: specifies and generates software systems based on high-level models

Domain-Specific Modeling (DSM): a paradigm of MDE that uses notations and rules from an application domain

Metamodel: defines a domain-specific modeling language (DSML) by specifying the entities and their relationships in an application domain

Model: an instance of the metamodel

Model Transformation: a process that converts one or more models to various levels of software artifacts (e.g., other models, source code)
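The four definitions above can be illustrated in a few lines. In this hypothetical Python sketch, a metamodel lists entity types and allowed relationships, a model is checked for conformance against it, and a model-to-text transformation emits CDL-like declarations. The entity names and output syntax are invented for illustration and are not the authors' metamodel.

```python
# Hypothetical illustration of metamodel, model, and model transformation.

METAMODEL = {
    "entities": {"File", "Dimension", "Variable"},
    "relations": {("File", "contains", "Dimension"),
                  ("File", "contains", "Variable"),
                  ("Variable", "uses", "Dimension")},
}

# A model: an instance of the metamodel.
model = {
    "elements": {"f": "File", "time": "Dimension", "temp": "Variable"},
    "links": [("f", "contains", "time"), ("f", "contains", "temp"),
              ("temp", "uses", "time")],
}


def conforms(model, metamodel):
    """True if every element type and link is allowed by the metamodel."""
    types = model["elements"]
    if not set(types.values()) <= metamodel["entities"]:
        return False
    return all((types[s], rel, types[t]) in metamodel["relations"]
               for s, rel, t in model["links"])


def to_declarations(model):
    """Model-to-text transformation: one declaration line per variable."""
    lines = []
    for name, kind in model["elements"].items():
        if kind == "Variable":
            dims = [t for s, rel, t in model["links"] if s == name and rel == "uses"]
            lines.append(f"double {name}({', '.join(dims)});")
    return lines


print(conforms(model, METAMODEL))  # True
print(to_declarations(model))      # ['double temp(time);']
```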

Page 15

Unifying the representation of file structure organization

Adopt a DSML to build a tool for visualizing and composing scientific file formats in a unified way

Analyze the data model of each scientific file format, producing a feature model

Define the DSML from the feature model: common data model, variable data model, and grammar and syntax

Implement the DSML, producing the DSML tool
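One way to read the split into a common and a variable data model is as set operations over each format's features. This Python sketch assumes that reading. The feature sets are simplified and illustrative, not full specifications of NetCDF3 or HDF5.

```python
# Assumption: common data model = features shared by all formats,
# variable data model = features found in only some of them.
# Feature sets are simplified and illustrative.

FEATURES = {
    "NetCDF3": {"dimensions", "variables", "attributes", "unlimited dimension"},
    "HDF5": {"groups", "datasets", "attributes", "dataspaces", "datatypes"},
}


def split_features(features):
    """Return (common, variable) features across all formats."""
    sets = list(features.values())
    common = set.intersection(*sets)
    variable = set.union(*sets) - common
    return common, variable


common, variable = split_features(FEATURES)
print(sorted(common))  # ['attributes']
print(sorted(variable))
```

Matching by name is deliberately naive: NetCDF "variables" and HDF5 "datasets" play similar roles but land in the variable set. A real feature analysis would match concepts by meaning, which is the job of the mapping rules in the framework.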

Page 16

Unifying the representation of file structure organization

Feature Model for Scientific File Format

Page 17

Unifying the representation of file structure organization

Content Composer: a DSML-based modeling tool for scientific data files, implemented using GEMS

Page 18

API Abstraction Layer

Helps protect user applications from the evolution of APIs

NetCDF: int nc_create(const char *path, int cmode, int *ncidp)

HDF5: H5File(const char *name, unsigned int flags)

Abstraction: createFile(const char *path, FileCreationProperty fileCreationProperty)

[Figure: the API Abstraction Layer on top of the HDF API (backed by HDF Lib) and the NetCDF API (backed by NetCDF Lib)]
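The abstraction can be sketched as one entry point that dispatches to per-format adapters, so applications depend only on createFile and not on nc_create or H5File. In this hypothetical Python sketch, the backends do not call the real libraries. They only return the native call they would make, using the real mode flags NC_CLOBBER/NC_NOCLOBBER (netCDF) and H5F_ACC_TRUNC/H5F_ACC_EXCL (HDF5). The fields of FileCreationProperty are assumptions.

```python
# Hypothetical API abstraction layer; backends describe, but do not perform,
# the native library calls.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FileCreationProperty:
    file_format: str = "netcdf"  # hypothetical field: which backend to use
    overwrite: bool = False      # hypothetical field: replace an existing file?


class FileBackend(ABC):
    @abstractmethod
    def create_file(self, path: str, prop: FileCreationProperty) -> str:
        """Return a description of the native call this backend would make."""


class NetCDFBackend(FileBackend):
    def create_file(self, path, prop):
        # nc_create(path, cmode, &ncid): cmode selects clobber behavior.
        mode = "NC_CLOBBER" if prop.overwrite else "NC_NOCLOBBER"
        return f"nc_create({path!r}, {mode}, &ncid)"


class HDF5Backend(FileBackend):
    def create_file(self, path, prop):
        # H5File(name, flags): truncate an existing file or fail if it exists.
        flags = "H5F_ACC_TRUNC" if prop.overwrite else "H5F_ACC_EXCL"
        return f"H5File({path!r}, {flags})"


_BACKENDS = {"netcdf": NetCDFBackend(), "hdf5": HDF5Backend()}


def create_file(path: str, prop: FileCreationProperty) -> str:
    """Single entry point for applications, mirroring createFile(path, property)."""
    return _BACKENDS[prop.file_format].create_file(path, prop)


print(create_file("out.nc", FileCreationProperty(overwrite=True)))
# nc_create('out.nc', NC_CLOBBER, &ncid)
print(create_file("out.h5", FileCreationProperty(file_format="hdf5")))
# H5File('out.h5', H5F_ACC_EXCL)
```

If a library's API changes, only its adapter class needs updating; callers of create_file are unaffected.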

Page 19

Integrating data among heterogeneous data formats

Content Mapper: defines rules for mapping data from one scientific data format to another

Content Verifier: verifies the correctness of file compositions and of mapping rules

[Figure: the DSML Layer, containing the metamodels (CDF, HDF, NetCDF), the File Content Manager (Content Composer, Content Verifier, Content Mapper), and the communication model]
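The division of labor between the Content Mapper and the Content Verifier can be sketched as a rule table plus a check over it. In this hypothetical Python sketch, the concept names and rules are illustrative assumptions. Mapping NetCDF dimensions to HDF5 dimension scales mirrors how netCDF-4 stores dimensions in HDF5 files. The verifier reports source concepts with no rule and rules that target unknown concepts.

```python
# Hypothetical mapping rules (Content Mapper) and rule checks (Content Verifier).

NETCDF_CONCEPTS = {"dimension", "variable", "global attribute", "variable attribute"}
HDF5_CONCEPTS = {"group", "dataset", "attribute", "dimension scale"}

MAPPING_RULES = {
    "variable": "dataset",
    "dimension": "dimension scale",
    "global attribute": "attribute",
    "variable attribute": "attribute",
}


def verify_rules(rules, source, target):
    """Return a list of problems; an empty list means the rules pass."""
    problems = [f"no rule for source concept {c!r}" for c in sorted(source - set(rules))]
    problems += [f"unknown target concept {t!r}"
                 for t in sorted(set(rules.values()) - target)]
    return problems


def map_model(elements, rules):
    """Apply the rules to (name, concept) pairs taken from a source model."""
    return [(name, rules[concept]) for name, concept in elements]


print(verify_rules(MAPPING_RULES, NETCDF_CONCEPTS, HDF5_CONCEPTS))  # []
print(map_model([("time", "dimension"), ("temp", "variable")], MAPPING_RULES))
# [('time', 'dimension scale'), ('temp', 'dataset')]
```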

Page 20

Summary

From the prototype of the framework:

A DSML can help build a graphical tool to compose scientific file structures and support interoperability across them

Adopting a layered architecture in the framework helps maintain the independence of each layer

Both the API abstraction layer and the layered architecture are essential for developing and maintaining user applications

Future work:

Create metamodels that include the full specification of each scientific file format

Categorize APIs according to their intended use in the API abstraction layer

Develop metamodels for managing API evolution

Page 21

Thank you!

Page 22

Example of Scientific Data Format: OPeNDAP

Client-server protocol for scientific data access, targeted at oceanographic data management
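In OPeNDAP's DAP2 protocol, a client requests a subset of a remote dataset by appending a response suffix (such as .ascii or .dods) and a constraint expression to the dataset URL. Each dimension is written as a [start:stride:stop] hyperslab, with stop inclusive. This Python sketch builds such a URL; the server address and variable name are hypothetical.

```python
# Build a DAP2 request URL with a hyperslab constraint expression.
# The server URL and variable name below are hypothetical.

def dap2_url(dataset_url, variable, slices, response="ascii"):
    """slices: one (start, stride, stop) tuple per dimension; stop is inclusive."""
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


print(dap2_url("http://example.org/opendap/sst.nc", "sst", [(0, 1, 0), (10, 2, 20)]))
# http://example.org/opendap/sst.nc.ascii?sst[0:1:0][10:2:20]
```

Subsetting on the server means only the requested slice crosses the network, which suits large oceanographic archives.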