A Domain-Specific Conceptual Data Modeling and Querying Methodology
description
Transcript of A Domain-Specific Conceptual Data Modeling and Querying Methodology
A Domain-Specific Conceptual Data Modeling and Querying Methodology
Hao Tian, Rajshekhar Sunderraman, and Hong Yang
Department of Computer Science
Georgia State University
Atlanta, Georgia USA
OUTLINE 2
Outline
• Introduction
• DSC-DM and DSC-QL
• Implementation
• Conclusion
INTRODUCTION 3
Motivations• New trends in database applications
– Data structures: simple → complex – Query capability: backend (DB) → front-end (UI)
• Traditional data management technologies– Originated and grew within the business domain– Facing many challenges from other domains (especially life
sciences)– Cannot fully meet the needs today.
• Motivated by issues in life science applications– Data representation & query capability for life scientists
“High-throughput ‘omic’ technology is increasingly used in clinical and epidemiological studies; however, our ability to analyze and interpret high-content and complex databases has not kept pace.” -- Vernon & Reeves, 2006
– Domain knowledge vs. database knowledge– Life scientists need more capabilities and less dependence on
computer scientists.
INTRODUCTION 4
Limitations of SQL and OQL• DBMS-dependent.• Users have to work explicitly at the database level.• No support for user-defined data types and functions in pure
SQL.• Methods in OQL are not trivial.• Designed for database administrators and developers.
SQL:
SELECT n.neuronid
FROM neuron n, neuron m, electrical_synapse es, chemical_synapse cs
WHERE n.neournid=es.precellid AND m.neuronid=es.postcellid AND
n.neournid=cs.precellid AND m.neuronid=cs.postcellid AND
cs.type = ‘inhibitory’
OQL:
SELECT n.neuronid
FROM neuron n, neuron m, electrical_synapse es, n.output o, o.connection c
WHERE c.type = ‘inhibitory’ AND n = es.preCell AND m = es.postCell
OR c.type = ‘inhibitory’ AND m = es.preCell AND n = es.postCell
INTRODUCTION 5
Limitations of Graphical Query Builders• Powerful but require a good understanding of SQL/OQL.• Build a query through graphical interfaces.• Developed by DBMS vendors, third-party companies or individuals. • For DB administrators, not suitable for naïve users.
INTRODUCTION 6
Limitations of Form-Based Query Interface• Designed for end-users.• Limited expressive power (only supports predefined types of queries).• Application-dependent.• Can become obsolete as the database evolves.
INTRODUCTION 7
Conceptual Query Languages (CQL)
• Based on a conceptual data model, rather than database structure.
• Can be either translated into DBMS-supported query languages or evaluated by a particular program.
• No extensive studies so far.
Employee +-- has Salary > 90000 +-- either speaks count(Language) > 1 | or – drives Car +-- has Color ‘red’
ConQuer Query Example
INTRODUCTION 8
Pros and Cons of CQLs
• Pros – Formulate queries at the conceptual level.– Hide database details from end-users.– Improve query capability for end-users.– Have less cognitive burden and great usability.
• Cons – No built-in support for domain-specific or user-defined functions.– No implementation details for most of them.[Gogolla et al., 1991;
Hohenstein et al., 1992; Grant et al., 1993; Lawley et al., 1994]– Integrated into a software and cannot be applied to existing
applications. [Auddino et al., 1991, Rosengren, 1994; Bloesch et al., 1996]
INTRODUCTION 9
Goal
• Design a methodology for domain-specific conceptual data modeling and querying that can
– be applied to any particular domain.
– empowers end-users to represent, manipulate, and query the data effectively and efficiently.
INTRODUCTION 10
Overview of Methodology• Provide a conceptual data model (DSC-DM)
– capture more domain semantics than EER.
• Provide a conceptual query language (DSC-QL) – uses only the abstract concepts and functions in DSC-DM.
• Can be applied to any domain and major types of DBMSs.
• Require domain knowledge, rather than database details.
• Dynamic support for user-defined functions.
Domain-Specific Conceptual Data Model Data Model
Domain-Specific Conceptual Query Language
Database Schema
Traditional Query Language
Domain Knowledge
OUTLINE 11
Outline
• Introduction
• DSC-DM and DSC-QL
• Implementation
• Conclusion
DSC-QM AND DSC-QL 12
DSC-DM
• The basis of DSC-QL and created by domain experts
• Automatically converted into physical data models.
• Three components– Data structure diagram
• Similar to EER– Annotation table and meta-attribute table
• For annotation, controlled values, complex data structures, and meta-attributes like data type, unit, standard error, and so on
– User-defined function (UDF) • Domain-specific functions & Application-specific functions• Can be dynamically defined by end-users
DSC-QM AND DSC-QL 13output
Data Structure Diagram
DSC-DM Example
EntityType Attribute AnnoEntityType Remark
Neuron neuronType Literature Link to paper
Neuron actionPotential Picture Image of plot
Domain-specific functions:
Signature:
E_connect (Neuron n, Neuron m)
Semantics:
Both n and m are of type Neuron. n is
electrically coupled with m.
Implementation:
EXISTS (n.neuronid, m.neuronid)
[ElectricalSynapse c,
c.input.neuron.neuronid=n.neuronid,
c.output.neuron.neuronid=m.neuronid] OR
[ElectricalSynapse c,
c.output.neuron.neuronid = n.neuronid,
c.input.neuron.neuronid = m.neuronid]
…
Application-specific functions:
…
Class Attribute Default Data Type Data Structure Unit Standard Error Controlled Values Constraint
Electrophysiology actionPotential no int atomic mV 2 0
Neuron dorsalOrVentral no string atomic {dorsal, ventral} not null
Meta-attribute Table
Annotation Table
User-Defined Functions
input
Locate
ProjectTo
Electrophysiology
HasG
HasFPLink
Biliteral
MemberOf
hasComponents
Neuron
Connection
ChemicalSynapse
ElectricalSynapse
Neuromodulation
Component
Molecule
Function
SomaElectrophysiology
Axon
Conductance
FiringPatternNerve
Ganglion
Hom
olog hasSoma
hasAxon
HasFunction hasMolecule
Reference Book
Literature
Picture
DSC-QM AND DSC-QL 14
DSC-QL
• High level conceptual query language
• Uses only abstract concepts in DSC-DM.
• Flexible, extensible, and readily usable.– User-defined functions– Composite & set attributes– Super classes– Dot-path expressions (mono-valued & multi-valued)
• DBMS-independent.
DSC-QM AND DSC-QL 15
DSC-QL Syntax
• General structure: (result-list) [query-criteria]
• Can be combined by UNION, INTERSECT, and MINUS.
• Basic terms– Declaration: E.g. Neuron n, n.hasMolecule.molecule m– Attribute reference: E.g. n.name, n.hasMolecule.molecule.name– Comparison: E.g. n.hasMolecule.molecule.name = ‘5HT’– UDF: E.g. e_connect(n, m) – Sub-query E.g. EXISTS (n) [n.hasMolecule.molecule.name =‘5HT’]
DSC-QM AND DSC-QL 16
hasAxon
DSC-QL Examples
• All serotonergic neurons: (Neuron n)[n.hasMolecule.Molecule.name = ‘5HT’]
• Which inhibitory synapses exhibit facilitation?(Connection c)[c.type = ‘inhibitory’,
c.plasticity = ‘facilitation’]
name
output
input
Locate
ProjectTo
.Electrophysiology
HasG
HasFP
Link
Biliteral
MemberOf
hasComponents
Neuron
ChemicalSynapse
ElectricalSynapse
Neuromodulation
Component
Molecule
Function
Soma
Electrophysiology
Axon
Conductance
FiringPattern
Nerve
Ganglion
Hom
olog
hasSoma
HasFunction
hasMolecule
name
plasticitytype
Connection
DSC-QM AND DSC-QL 17
DSC-QL Examples Cont.
• Which neurons are electrically coupled with neuron “R3-13”– (Neuron n)[e_connect(n, m), m.name = “R3-13”] Uses DSF
– (Neuron n)[n.input.electricalSynapse.output.neuron.name = ‘R3-13’] OR Substitute [n.output.electricalSynapse.input.neuron.name = ‘R3-13’]
– (Neuron n)[Neuron m, m.name = “R3-13”, Normal Form EXISTS(c)[ElectricalSynapse c, c.input.neuron.neuronid = m.neuronid, c.output.neuron.neuronid = n.neuronid] OR [ElectricalSynapse c, c.input.neuron.neuronid = n.neuronid, c.output.neuron.neuronid = m.neuronid]]
hasAxon
name
output
input
Locate
ProjectTo
.Electrophysiology
HasG
HasFP
Link
Biliteral
MemberOf
hasComponents
Neuron
ChemicalSynapse
ElectricalSynapse
Neuromodulation
Component
Molecule
Function
Soma
Electrophysiology
Axon
Conductance
FiringPattern
Nerve
Ganglion
Hom
olog
hasSoma
HasFunction
hasMolecule
name
plasticitytype
Connection
OUTLINE 18
Outline
• Introduction
• DSC-DM and DSC-QL
• Implementation
• Conclusion
IMPLEMENTATION 19
System Architecture
Relational DB Object-Oriented DB …
DSC-DMGrammar
DSC-QL Translator
DSC-QLQuery Q
Domain Knowledge
SQL
DB Generator
OQL …
Mapping.XML
XML Converter
Things that end-users need to know
UDF
DSC-QL Normalizer
DSC-QL Normal Form
ODL/SQL_DDL/etc.
DSC.XML
UDF.XML
IMPLEMENTATION 20
Translation Example 1
• DSC-QL:(Neuron n)[n.hasMolecule.Molecule.name = ‘5HT’]
• Normal Form:(n.name) [Neuron n, n.hasMolecule.Molecule m,
m.name=‘5HT’]
• SQL Translation:SELECT n.name FROM Neuron n, (SELECT tid1.neuronid AS pk, tid3.* FROM Neuron tid1, hasMolecule tid2, Molecule
tid3,WHERE tid1.neuronid = tid2.neuronid AND
tid2.moleculeid = tid3.moleculeid) mWHERE n.neuronid = m.pk AND m.name = ‘5HT’
IMPLEMENTATION 21
Translation Example 2• DSC-QL (Connection c)[c.type = ‘inhibitory’, c.plasticity = ‘facilitation’]
• Normal Form(c.connectionID)[ElectricalSynapse c, c.type=‘inhibitory’, c.plasticity= ‘facilitation’]
UNION (c.connectionID)[ChemicalSynapse c, c.type=‘inhibitory’, c.plasticity= ‘facilitation’] UNION (c.connectionID)[NeuroModulation c, c.type = ‘inhibitory’, c.plasticity =
‘facilitation’]
• SQL TranslationSELECT c.connectionID FROM ElectricalSynapse cWHERE c.type = ‘inhibitory’ AND c.plasticity = ‘facilitation’
UNION … UNION …
IMPLEMENTATION 22
Translation Example 3• DSC-QL (Neurons that have a chemical synapse to neuron R3-13) (Neuron n)[c_connect(n, m), m.name=“R3-13”]
• Normal Form (n.name)[Neuron n, Neuron m, m.name = ‘R3-13’, EXISTS (n.neuronid, m.neuronid)[ChemicalSynapse c, c.output.neuron sysid1, c.input.neuron sysid2, sysid1.neuronid=n.neuronid, sysid2.neuronid=m.neuronid]]
• SQL Translation SELECT n.name FROM Neuron n, Neuron m WHERE m.name = ‘R3-13’ AND EXISTS( SELECT n.nueornid, m.neuronid FROM ChemicalSynapse c, (SELECT sid1.connectionID AS pk FROM ElectricalSynapse sid1, input sid2, neuron sid3 WHERE sid1.connectionid = sid2.connectionid AND sid2.neuronid = sid3.neuronid) sysid1 (SELECT sid1.connectionID AS pk FROM ElectricalSynapse sid1, output sid2, neuron sid3 WHERE sid1.connectionid = sid2.connectionid AND sid2.neuronid = sid3.neuronid) sysid2 WHERE c.connectionID = sysid1.pk AND c.connectionID = sysid2.pk AND sysid1.neuronid = n.neuronid AND sysid2.neuronid = m.neuronid)
IMPLEMENTATION 23
Implementation
• Two translation algorithms (DSC-QL→SQL and DSC-QL→OQL) have been introduced.
• Two system prototypes have been built.– Traditional business domain – Relational DBMS: Oracle 10g– Neuroscience – OODBMS: EyeDB
• Current version is a standalone Java application with a command line query interface. A web-based graphical query interface is under development and will be released very soon.
OUTLINE 24
Outline
• Introduction
• DSC-DM and DSC-QL
• Implementation
• Conclusion
PROPOSED METHODOLOGY 25
Related work• Many domain-specific query languages:
– Target at a particular domain such as Web Search Engine, Intrusion Analysis, genomic information, and high energy physics analysis.
– Cannot be applied to other domains.
• Some conceptual query languages (CQL):– ERQL: Built on EER model; No support for UDFs and meta-attributes; No
implementation report.– ConQuer: Based on Object Role Modelling (ORM); Must be used in a pre-
developed graphical user interface.– CBQL: Based on ConceptBase; Work on deductive database, Hard to grasp for
naive users.
• A few query methodologies– LabBase:
Constructs a domain-specific DBMS, LabBase, on top of a persistent object system, ObjectStore; Specifically tailored to the domain of laboratory information system.
– CQL in SUPER project: A graphical query language bound to a pre-developed editor; Not support UDFs; Hard
to understand for naïve users.
CONCLUSION 26
Contributions• No domain-specific conceptual data modeling and querying methodology
has been proposed so far.
– It combines the virtues of conceptual and domain-specific query languages, can be applied to any domain and all major types of DBMSs, can be integrated into most existing applications.
– End-users can directly interact with the data based on their domain knowledge, instead of the database details.
– DSC-DM can capture more domain semantics than EER such as user-defined functions, meta-data, and annotation information.
– DSC-QL has the same expressive power as SQL, but is much simpler and more suitable for end-users. It can be translated into SQL and OQL and dynamically supports new user-defined functions without reprogramming the application system.
CONCLUSION 27
Future Work
• Implement the support to native XML database
• Add data manipulation function into
DSC-QL