A Metadata Binding Store for Distributed Scientific Data
![Page 1: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/1.jpg)
A Metadata Binding Store for Distributed Scientific Data
Yin Chen, Malcolm Atkinson, Stuart Aitken Dec. 2009
UK e-Science All Hands Meeting 2009, Oxford, 08 Dec. 2009
![Page 2: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/2.jpg)
MOTIVATION
Scientific data and metadata are generated at great speed and in high volume
Data and metadata are often created independently
Metadata are the key to data access, discovery, preservation, provenance, and interpretation
We view the relationship between data and metadata as a binding
Hypothesis: a binding service is useful for serving distributed scientific data at various scales
![Page 3: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/3.jpg)
IS BINDING A PROBLEM?
EurExpress Project, EU-funded under FP6, 2005-2009.
Aims to capture >20,000 genes via RNA in situ hybridization (ISH).
Generates a digital ‘transcriptome atlas’.
Nov. 2009: 19,411 assays, 15,715 annotations, ~5 TB data.
[Figure: EurExpress pipeline diagram. Labels: 14.5 days mouse embryo; section slides; automatic ISHs (8 EU bio labs); Genepaint robotics; high-resolution gene expression images; ISH management (LIME system); annotation (FIATAS), Alicante; template images; metadata; Gene Expression Data Repository.]
![Page 4: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/4.jpg)
REAL WORLD OBSERVATIONS
Information inconsistency
1) The number of probe genes mismatched with the template design
2) The number of gene expression images without metadata
Significant human operating errors
Consistency checking became more difficult as the data grew
The bindings have to be efficiently managed!
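The second inconsistency above (images without metadata) amounts to a set difference between the known data items and the data items that appear in at least one binding. The following is a minimal sketch of such a check, not the project's own tooling; the function name `unbound_items` and the `img:`/`meta:` identifiers are hypothetical.

```python
def unbound_items(data_uris, bindings):
    """Return the data URIs that appear in no binding.

    `bindings` is an iterable of (data_uri, metadata_uri) pairs.
    Both the pair layout and the function name are assumptions
    made for this illustration.
    """
    bound = {data_uri for data_uri, _metadata_uri in bindings}
    return sorted(set(data_uris) - bound)


# Hypothetical example: three images, two of which have metadata.
images = ["img:1", "img:2", "img:3"]
pairs = [("img:1", "meta:a"), ("img:3", "meta:c")]
missing = unbound_items(images, pairs)
```

With the bindings held in one place, a check like this is a single pass over the store rather than a manual cross-comparison of separate repositories.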
![Page 5: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/5.jpg)
DESIGN PRINCIPLES
A binding system manages bindings
Generic approach, independent of data resources
Federate references to data and metadata
  A data warehousing approach is no longer feasible
  Data have become too large, too dynamic, and too unwieldy to copy
  No permission to copy
  Freshness of copies
Allow bindings to be shared among user communities; scalable
Can be combined with other services
Design principle: simple
  Minimize internal complexity: no conflict
  Maximize external integrity: less overlap
![Page 6: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/6.jpg)
A SIMPLE BINDING STORE
Binding Data Model
  Binding ID: UUID, needs no central registration authority, unlimited
  Binding subject/object: URIs, used by most web-accessible data resources
  Binding description: tags, efficient, flexible
Binding APIs
  Manipulation operations
  Discovery operations
  Delivery operations
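The data model above can be illustrated with a small in-memory sketch. This is not the OGSA-DAI-based implementation described in the talk; the class names, method names, and the choice of treating the subject as the data URI and the object as the metadata URI are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Binding:
    subject: str                      # URI (assumed here: the data item)
    obj: str                          # URI (assumed here: the metadata item)
    tags: frozenset = frozenset()     # free-form descriptive tags
    # A random UUID needs no central registration authority.
    binding_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class BindingStore:
    """Hypothetical in-memory store with the three operation groups."""

    def __init__(self):
        self._bindings = {}

    # Manipulation operations
    def create(self, subject, obj, tags=()):
        binding = Binding(subject, obj, frozenset(tags))
        self._bindings[binding.binding_id] = binding
        return binding.binding_id

    def delete(self, binding_id):
        return self._bindings.pop(binding_id, None) is not None

    # Discovery operations
    def find_by_tag(self, tag):
        return [b for b in self._bindings.values() if tag in b.tags]

    def find_by_subject(self, uri):
        return [b for b in self._bindings.values() if b.subject == uri]

    # Delivery operations
    def get(self, binding_id):
        return self._bindings.get(binding_id)


store = BindingStore()
bid = store.create(
    "http://example.org/data/image-1",       # hypothetical URIs
    "http://example.org/metadata/annot-1",
    tags=["ISH", "mouse-embryo"],
)
```

The store holds only references (URIs), so the data and metadata themselves stay in their original repositories.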
![Page 7: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/7.jpg)
IMPLEMENTATION
Grid technology: OGSA-DAI
OGSA-DAI server activities
OGSA-DAI client activities
OGSA-DAI client toolkits
Service Proxy APIs, programmable interface for users
Command-line UI
Not included in current work
![Page 8: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/8.jpg)
EVALUATION
Use a workload modelling and simulation method
No binding data available
Observations from wwPDB, BADC, EurExpress, NanoCMOS, Flickr
Creation patterns, access patterns, and content patterns are observed
Simulation of the real-world observations
![Page 9: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/9.jpg)
WORKLOAD MODELLING
Creation Workloads
[Figures: number of annotations per day; new PDB structures per month; number of data files per day]
Access Workloads
[Figure: number of accesses per day]
Tag Behaviours
![Page 10: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/10.jpg)
WORKLOAD SIMULATION
Zipf’s Dist. α=0.9
Zipf’s Dist. α=0.4
Hidden Markov Model
Two Poisson Processes, Two Uniform Dist.
Uniform Dist.: Trend:
Zipf’s Dist. α=0.2
Weibull Dist.
Poisson Process:
[Plot axis label: probability of interval occurrence]
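Two of the distributions named on this slide can be sketched with the Python standard library. This is an illustration only: the talk's simulator used SSJ and Colt, and the mapping of each distribution to a specific workload is not recoverable from the slide. The sketch assumes a bounded Zipf distribution over a finite set of ranks (which is what allows exponents below 1, such as 0.9, 0.4, and 0.2) and a homogeneous Poisson process generated from exponential inter-arrival times. All function names and parameter values are hypothetical.

```python
import random


def zipf_weights(n, alpha):
    """Unnormalized bounded-Zipf weights: rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]


def sample_zipf(n, alpha, k, rng):
    """Draw k ranks from 1..n with probability proportional to 1 / rank**alpha."""
    return rng.choices(range(1, n + 1), weights=zipf_weights(n, alpha), k=k)


def poisson_arrivals(rate, horizon, rng):
    """Event times of a Poisson process with the given rate on (0, horizon]."""
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)   # exponential gap between events
        if t > horizon:
            return times
        times.append(t)


rng = random.Random(0)                       # fixed seed for repeatability
ranks = sample_zipf(n=100, alpha=0.9, k=1000, rng=rng)
arrivals = poisson_arrivals(rate=5.0, horizon=10.0, rng=rng)
```

A smaller α flattens the rank distribution, so α=0.2 is close to uniform while α=0.9 concentrates most draws on the first few ranks.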
![Page 11: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/11.jpg)
EXPERIMENT SETUP
Intel(R) Core2 2.66 GHz, 7 GB RAM, 144 GB HD, 100 Mbps network connection, Red Hat 4.1, Tomcat 5.5, OGSA-DAI 3.1, MySQL 6.0, R 2.9.
SSJ, Colt, benchmark script
10 runs per configuration; collected means, SEs, 95% CIs
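The summary statistics named above (mean, standard error, 95% confidence interval over 10 runs) can be computed as in the sketch below. It assumes a Student's t interval; the slide does not say which interval was used, and the original analysis was done in R. The function name and the sample values are hypothetical, not measured results.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# i.e. the right value for exactly 10 runs. Pass a different t_crit for
# other sample sizes.
T_CRIT_95_DF9 = 2.262


def summarize(samples, t_crit=T_CRIT_95_DF9):
    """Return (mean, standard error, (ci_low, ci_high)) for the samples."""
    n = len(samples)
    mean = statistics.mean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)   # sample std dev / sqrt(n)
    half_width = t_crit * se
    return mean, se, (mean - half_width, mean + half_width)


# Hypothetical response times (ms) from 10 runs of one configuration.
runs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean, se, (ci_low, ci_high) = summarize(runs)
```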
![Page 12: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/12.jpg)
EXPERIMENT RESULTS
Robust to different types of workloads
Robust to small- to large-scale workloads
Robust to both independent and combined workloads
Stressed by the ultra-scale workloads
![Page 13: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/13.jpg)
FUTURE WORK A Scalable Binding Store
Cloud Computing promises to be scalable
Our Evaluation of the Hadoop
![Page 14: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/14.jpg)
BINDING APPLICATIONS
Web moving to Web 3.0
Binding index
Combine with metadata management tools
Mashup applications
![Page 15: A Metadata Binding Store for Distributed Scientific Data](https://reader033.fdocuments.in/reader033/viewer/2022051821/568164f8550346895dd765df/html5/thumbnails/15.jpg)
ACKNOWLEDGEMENT
National e-Science Center, research group, support team, middleware team
MRC HGU Biomedical Statistical Analyse Section: Prof Richard Baldock, Dr Duncan Davidson
Newcastle HDBR: Prof Susan Lindsay, Steven N. Lisgo
EDINA Geo Research & Data Library: Chris Higgins, Dr David Medyckyj-Scott
Data resources: DGEMap, EurExpress Prof Richard Baldock, Lalit Kumar, NanoCMOS Dr Clive Davenhall, Prof Richard Sinnott
Technique support: OGSA-DAI team
Research materials: COBrA-CT, OntoGrid Prof Carole Goble, Dr Oscar Corcho, MyGrid Dr Phillip Lord