Semantic Mediation and Scientific Workflows
Bertram Ludäscher
Data and Knowledge Systems
San Diego Supercomputer Center
University of California, San Diego
SEEK Kansas 11/02
Some BIRNing Data Integration Questions
(BIRN: Biomedical Informatics Research Network)
• Data Integration Approaches:
  – Let’s just share data, e.g., link everything from a web page!
  – ... or better, put everything into a relational or XML database
  – ... and do remote access using the Grid
  – ... or just use Web services!
• Nice try. But:
  – “Find the files where the amygdala was segmented.”
  – “Which other structures were segmented in the same files?”
  – “Did the volume of any of those structures differ much from normal?”
  – “What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?”
XML-Based (or Relational) vs. Semantic Mediation
[Figure: two mediation stacks built over Raw Data, side by side]
• XML-based (or relational) mediation:
  – Raw Data, XML Elements, XML Models
  – Structural Constraints (DTDs), Parent, Child, Sibling, ... (e.g., A = (B*|C),D  B = ...)
  – No Domain Constraints
  – Integrated-DTD XML-QL(Src1-DTD,...)
• Semantic mediation:
  – Raw Data, (XML) Objects, Conceptual Models (CM)
  – Classes, Relations, is-a, has-a, ... (figure labels: C1, C2, C3, R)
  – Logical Domain Constraints (IF ... THEN ...)
  – “Glue Maps” = Domain & Process Maps (ontologies)
  – Integrated-CM CM-QL(Src1-CM,...)
• CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
• CM-QL ~ {F-Logic, DAML+OIL, …}
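The contrast between structural and semantic mediation can be illustrated with a small sketch. It assumes a toy domain map holding only is-a edges; the concept names, source names, and record layout below are hypothetical and do not come from the slides. A purely structural query matches a tag literally, while a semantic query also follows the is-a edges of the domain map.

```python
# Hypothetical domain map: concept -> set of direct superconcepts (is-a edges).
IS_A = {
    "purkinje_cell": {"neuron"},
    "granule_cell": {"neuron"},
    "neuron": {"cell"},
}

# Hypothetical source records, each tagged with one concept.
RECORDS = [
    {"source": "Src1", "concept": "purkinje_cell", "id": 1},
    {"source": "Src2", "concept": "granule_cell", "id": 2},
    {"source": "Src2", "concept": "cell", "id": 3},
]


def ancestors(concept, is_a=IS_A):
    """Return every concept reachable from `concept` via is-a edges."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def structural_query(concept):
    """Match only records whose tag equals the query term (no domain knowledge)."""
    return [r["id"] for r in RECORDS if r["concept"] == concept]


def semantic_query(concept):
    """Match records tagged with the concept or with any of its subconcepts."""
    return [
        r["id"]
        for r in RECORDS
        if r["concept"] == concept or concept in ancestors(r["concept"])
    ]
```

With this toy data, `structural_query("neuron")` finds nothing because no record carries that exact tag, whereas `semantic_query("neuron")` returns the two records tagged with subconcepts of `neuron`.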
Making the SM System “Understand” Your Data: Source Contextualization via Ontology Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map...
sources can register new concepts at the mediator ...
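As a rough illustration of the two registration modes described above, the following sketch models a mediator that lets a source either hang data off an existing concept or first refine the domain map with a new subconcept. The `Mediator` class, its method names, and the sample concepts are hypothetical; the slides do not show the actual mediator interface.

```python
class Mediator:
    """Toy mediator holding a domain map (is-a edges) and registered data."""

    def __init__(self):
        self.is_a = {}        # concept -> set of direct superconcepts
        self.registered = {}  # concept -> list of (source, item) pairs

    def add_concept(self, concept, parents=()):
        """Refine the domain map with a concept placed below known parents."""
        for parent in parents:
            if parent not in self.is_a:
                raise KeyError(f"unknown parent concept: {parent}")
        self.is_a.setdefault(concept, set()).update(parents)

    def register(self, source, concept, item):
        """Hang a data item off a concept already in the domain map."""
        if concept not in self.is_a:
            raise KeyError(f"unknown concept: {concept}")
        self.registered.setdefault(concept, []).append((source, item))

    def subsumed_by(self, concept):
        """Return the concept together with all concepts below it."""
        result = {concept}
        changed = True
        while changed:
            changed = False
            for child, parents in self.is_a.items():
                if child not in result and parents & result:
                    result.add(child)
                    changed = True
        return result

    def items_under(self, concept):
        """Collect data registered at the concept or at any subconcept."""
        return [
            pair
            for c in sorted(self.subsumed_by(concept))
            for pair in self.registered.get(c, [])
        ]


mediator = Mediator()
mediator.add_concept("neuron")
mediator.register("SrcA", "neuron", "img_01")                # existing concept
mediator.add_concept("purkinje_cell", parents=["neuron"])    # refinement
mediator.register("SrcB", "purkinje_cell", "img_02")
```

After the refinement, a query for `neuron` also reaches the data that `SrcB` registered under the newly added concept.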
Query Processing Demo
Query results in context
Contextualization CON(Result) wrt. ANATOM.
Mediator View Definition
DERIVE
  protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
WHERE
  I:protein_label_image[proteins ->> {Protein};
      organism -> Organism;
      anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
  NAE:neuro_anatomic_entity[name -> Anatom;                                   % from ANATOM
      located_in ->> {Brain_region}],
  AS..segments..features[name -> Feature_name; value -> Value].
• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
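One way to read the view definition is as a join between PROLAB image objects and ANATOM entities on the shared structure name. The sketch below spells out that reading in Python. The nested-dictionary layout and all sample values are assumptions made for illustration, and it treats the path expression `AS..segments..features` as iteration over every feature of every segment of a structure.

```python
def protein_distribution(images, entities):
    """Join PROLAB images with ANATOM entities on the structure name.

    Yields tuples (Protein, Organism, Brain_region, Feature_name, Anatom, Value),
    mirroring the head of the view definition.
    """
    rows = []
    for image in images:
        for structure in image["anatomical_structures"]:
            for entity in entities:
                if entity["name"] != structure["name"]:
                    continue  # join condition: both bind the same Anatom
                for region in entity["located_in"]:
                    for segment in structure["segments"]:
                        for feature in segment["features"]:
                            for protein in image["proteins"]:
                                rows.append((
                                    protein,
                                    image["organism"],
                                    region,
                                    feature["name"],
                                    structure["name"],
                                    feature["value"],
                                ))
    return rows


# Hypothetical sample data standing in for the PROLAB and ANATOM sources.
prolab_images = [{
    "proteins": ["protein_A"],
    "organism": "rat",
    "anatomical_structures": [{
        "name": "purkinje_cell_layer",
        "segments": [{"features": [{"name": "protein_amount", "value": 0.5}]}],
    }],
}]
anatom_entities = [
    {"name": "purkinje_cell_layer", "located_in": ["cerebellum"]},
    {"name": "dentate_gyrus", "located_in": ["hippocampus"]},
]

rows = protein_distribution(prolab_images, anatom_entities)
```

Only the structure name present in both sources contributes a row; the second ANATOM entity has no matching image data and is dropped by the join.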
A Scientific Workflow: Promoter Identification
[Figure: promoter identification workflow diagram. Matthew Coleman, LLNL, 2002]
• Tools and steps shown: blast, blast human, blast other species, TRANSFAC, GRAIL, CLUSTAL, Data Consolidation
• Data labels shown:
  – gi#’s from clusfavor
  – cDNA gi#, Gene name
  – Genomic gi#, Chr #, Gene location
  – TAF’s, Location on Genomic gi#’s
  – Probabilities of match, Probabilities of random match
  – GC Island location, Exon/intron location, Repeats location, Promoter location
  – Validates polII promoter location
  – Promoter location, Shared TAF’s across cluster, Common consensus sequence
  – Consensus sequences
• Questions:
  – Are chr#’s in common?
  – Are chr#’s locations in common?
  – Are there conserved upstream sequences?
  – Are gene locations conserved across species?
  – RNA POLII promoter?
  – GpC Island present?
  – Are there common TAF’s across genomic gi#?
  – Are there other common genes?
SDM Demo & Architecture
Translation Approach: Abstract Workflow (AWF) => Executable Workflow (EWF)
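The slide gives no detail on how the translation is carried out, so the following is only a minimal sketch of one possible reading: an abstract workflow names steps and their dependencies, and translation binds each step to something executable and fixes an execution order. The step names echo tools from the promoter identification slide, but the dependency edges, the `translate` and `run` functions, and the placeholder bindings are all assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: step -> set of steps it depends on.
ABSTRACT_WORKFLOW = {
    "blast": set(),
    "transfac": {"blast"},
    "grail": {"blast"},
    "clustal": {"transfac", "grail"},
}


def translate(abstract_workflow, bindings):
    """Turn an abstract workflow into an ordered list of (step, callable)."""
    steps = set(abstract_workflow).union(*abstract_workflow.values())
    missing = sorted(step for step in steps if step not in bindings)
    if missing:
        raise KeyError(f"no executable binding for steps: {missing}")
    # static_order() yields each step after all of its dependencies.
    order = TopologicalSorter(abstract_workflow).static_order()
    return [(step, bindings[step]) for step in order]


def run(executable_workflow, initial):
    """Run the steps in order; each step sees all results produced so far."""
    state = dict(initial)
    for step, func in executable_workflow:
        state[step] = func(state)
    return state


# Placeholder bindings: real ones would invoke the actual tools or services.
BINDINGS = {
    "blast": lambda s: f"blast({s['input']})",
    "transfac": lambda s: f"transfac({s['blast']})",
    "grail": lambda s: f"grail({s['blast']})",
    "clustal": lambda s: f"clustal({s['transfac']}, {s['grail']})",
}

executable = translate(ABSTRACT_WORKFLOW, BINDINGS)
result = run(executable, {"input": "gi_list"})
```

Keeping the bindings separate from the dependency structure means the same abstract workflow could be retargeted to different executables by swapping the binding table.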