A Domain-Specific Conceptual Data Modeling and Querying Methodology


Hao Tian, Rajshekhar Sunderraman, and Hong Yang

Department of Computer Science

Georgia State University

Atlanta, Georgia USA

OUTLINE 2

Outline

• Introduction

• DSC-DM and DSC-QL

• Implementation

• Conclusion

INTRODUCTION 3

Motivations

• New trends in database applications
– Data structures: simple → complex
– Query capability: back end (DB) → front end (UI)

• Traditional data management technologies
– Originated and grew within the business domain
– Face many challenges from other domains (especially the life sciences)
– Cannot fully meet today's needs

• Motivated by issues in life science applications
– Data representation and query capability for life scientists
“High-throughput ‘omic’ technology is increasingly used in clinical and epidemiological studies; however, our ability to analyze and interpret high-content and complex databases has not kept pace.” -- Vernon & Reeves, 2006
– Domain knowledge vs. database knowledge
– Life scientists need more capabilities and less dependence on computer scientists.

INTRODUCTION 4

Limitations of SQL and OQL

• DBMS-dependent.
• Users have to work explicitly at the database level.
• No support for user-defined data types and functions in pure SQL.
• Methods in OQL are not trivial.
• Designed for database administrators and developers.

SQL:

SELECT n.neuronid
FROM neuron n, neuron m, electrical_synapse es, chemical_synapse cs
WHERE n.neuronid = es.precellid AND m.neuronid = es.postcellid AND
      n.neuronid = cs.precellid AND m.neuronid = cs.postcellid AND
      cs.type = 'inhibitory'

OQL:

SELECT n.neuronid
FROM neuron n, neuron m, electrical_synapse es, n.output o, o.connection c
WHERE c.type = 'inhibitory' AND n = es.preCell AND m = es.postCell
   OR c.type = 'inhibitory' AND m = es.preCell AND n = es.postCell
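To make the SQL version above concrete, here is a minimal Python sketch that runs it against an in-memory SQLite database. The three-table schema is only inferred from the column names used in the query, and the sample rows and neuron names are made up for illustration; this is not the authors' actual database.

```python
import sqlite3

# Hypothetical miniature schema, inferred from the columns the slide's SQL uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE neuron (neuronid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE electrical_synapse (precellid INTEGER, postcellid INTEGER);
    CREATE TABLE chemical_synapse (precellid INTEGER, postcellid INTEGER, type TEXT);

    -- Made-up sample data: neuron 1 reaches neuron 2 through both an
    -- electrical synapse and an inhibitory chemical synapse.
    INSERT INTO neuron VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO electrical_synapse VALUES (1, 2), (1, 3);
    INSERT INTO chemical_synapse VALUES (1, 2, 'inhibitory'), (1, 3, 'excitatory');
""")

# The query from the slide, with the column-name typos corrected.
QUERY = """
    SELECT n.neuronid
    FROM neuron n, neuron m, electrical_synapse es, chemical_synapse cs
    WHERE n.neuronid = es.precellid AND m.neuronid = es.postcellid
      AND n.neuronid = cs.precellid AND m.neuronid = cs.postcellid
      AND cs.type = 'inhibitory'
"""

result_ids = [row[0] for row in conn.execute(QUERY)]
print(result_ids)
```

Even for this small question the user has to know four table aliases and four join conditions, which is the burden the slide is pointing at.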

INTRODUCTION 5

Limitations of Graphical Query Builders

• Powerful, but require a good understanding of SQL/OQL.
• Build a query through graphical interfaces.
• Developed by DBMS vendors, third-party companies, or individuals.
• Intended for DB administrators; not suitable for naïve users.

INTRODUCTION 6

Limitations of Form-Based Query Interfaces

• Designed for end-users.
• Limited expressive power (only support predefined types of queries).
• Application-dependent.
• Can become obsolete as the database evolves.

INTRODUCTION 7

Conceptual Query Languages (CQL)

• Based on a conceptual data model, rather than database structure.

• Can be either translated into DBMS-supported query languages or evaluated by a particular program.

• No extensive studies so far.

ConQuer query example:

Employee
  +-- has Salary > 90000
  +-- either speaks count(Language) > 1
  |   or drives Car
  |       +-- has Color 'red'

INTRODUCTION 8

Pros and Cons of CQLs

• Pros
– Formulate queries at the conceptual level.
– Hide database details from end-users.
– Improve query capability for end-users.
– Lower cognitive burden and better usability.

• Cons
– No built-in support for domain-specific or user-defined functions.
– No implementation details for most of them. [Gogolla et al., 1991; Hohenstein et al., 1992; Grant et al., 1993; Lawley et al., 1994]
– Integrated into specific software and cannot be applied to existing applications. [Auddino et al., 1991; Rosengren, 1994; Bloesch et al., 1996]

INTRODUCTION 9

Goal

• Design a methodology for domain-specific conceptual data modeling and querying that can

– be applied to any particular domain.

– empower end-users to represent, manipulate, and query the data effectively and efficiently.

INTRODUCTION 10

Overview of Methodology

• Provide a conceptual data model (DSC-DM)
– Captures more domain semantics than EER.

• Provide a conceptual query language (DSC-QL)
– Uses only the abstract concepts and functions in DSC-DM.

• Can be applied to any domain and major types of DBMSs.

• Require domain knowledge, rather than database details.

• Dynamic support for user-defined functions.

[Figure: diagram with the labels Domain Knowledge; Domain-Specific Conceptual Data Model; Domain-Specific Conceptual Query Language; Database Schema; Traditional Query Language]

DSC-DM AND DSC-QL 12

DSC-DM

• The basis of DSC-QL; created by domain experts.

• Automatically converted into physical data models.

• Three components
– Data structure diagram
  • Similar to EER
– Annotation table and meta-attribute table
  • For annotations, controlled values, complex data structures, and meta-attributes such as data type, unit, and standard error
– User-defined functions (UDFs)
  • Domain-specific functions and application-specific functions
  • Can be dynamically defined by end-users
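As a rough illustration of what a meta-attribute entry could carry, the Python sketch below models one entry and checks values against it. The `MetaAttribute` class, its field names, and the `validate` method are hypothetical; only the `dorsalOrVentral` attribute, its controlled values {dorsal, ventral}, and its not-null constraint come from the meta-attribute table in the DSC-DM example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MetaAttribute:
    """Hypothetical in-memory form of one meta-attribute table row."""
    entity: str
    attribute: str
    data_type: type
    unit: Optional[str] = None
    controlled_values: Optional[FrozenSet[str]] = None
    not_null: bool = False

    def validate(self, value) -> bool:
        # A missing value is acceptable only if the attribute is nullable.
        if value is None:
            return not self.not_null
        if not isinstance(value, self.data_type):
            return False
        # Controlled values restrict the attribute to a fixed vocabulary.
        if self.controlled_values is not None and value not in self.controlled_values:
            return False
        return True


# Row taken from the DSC-DM example's meta-attribute table.
dorsal_or_ventral = MetaAttribute(
    entity="Neuron",
    attribute="dorsalOrVentral",
    data_type=str,
    controlled_values=frozenset({"dorsal", "ventral"}),
    not_null=True,
)

print(dorsal_or_ventral.validate("dorsal"))   # accepted
print(dorsal_or_ventral.validate("lateral"))  # rejected: not a controlled value
```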

DSC-DM AND DSC-QL 13

DSC-DM Example

Data Structure Diagram

[Figure: data structure diagram with the labels Neuron, Connection, ChemicalSynapse, ElectricalSynapse, Neuromodulation, Component, Molecule, Function, Soma, Electrophysiology, Axon, Conductance, FiringPattern, Nerve, Ganglion, Reference Book, Literature, Picture, input, output, Locate, ProjectTo, HasG, HasFP, Link, Biliteral, MemberOf, hasComponents, Homolog, hasSoma, hasAxon, HasFunction, hasMolecule]

User-Defined Functions

Domain-specific functions:

Signature:
E_connect (Neuron n, Neuron m)

Semantics:
Both n and m are of type Neuron. n is electrically coupled with m.

Implementation:
EXISTS (n.neuronid, m.neuronid)
  [ElectricalSynapse c,
   c.input.neuron.neuronid = n.neuronid,
   c.output.neuron.neuronid = m.neuronid] OR
  [ElectricalSynapse c,
   c.output.neuron.neuronid = n.neuronid,
   c.input.neuron.neuronid = m.neuronid]

Application-specific functions:

Annotation Table

EntityType   Attribute        AnnoEntityType   Remark
Neuron       neuronType       Literature       Link to paper
Neuron       actionPotential  Picture          Image of plot

Meta-attribute Table

Class Attribute Default Data Type Data Structure Unit Standard Error Controlled Values Constraint
Electrophysiology actionPotential no int atomic mV 2 0
Neuron dorsalOrVentral no string atomic {dorsal, ventral} not null

DSC-DM AND DSC-QL 14

DSC-QL

• High-level conceptual query language.

• Uses only abstract concepts in DSC-DM.

• Flexible, extensible, and readily usable.
– User-defined functions
– Composite and set attributes
– Super classes
– Dot-path expressions (mono-valued and multi-valued)

• DBMS-independent.

DSC-DM AND DSC-QL 15

DSC-QL Syntax

• General structure: (result-list) [query-criteria]

• Queries can be combined with UNION, INTERSECT, and MINUS.

• Basic terms
– Declaration, e.g. Neuron n, n.hasMolecule.molecule m
– Attribute reference, e.g. n.name, n.hasMolecule.molecule.name
– Comparison, e.g. n.hasMolecule.molecule.name = '5HT'
– UDF, e.g. e_connect(n, m)
– Sub-query, e.g. EXISTS (n) [n.hasMolecule.molecule.name = '5HT']
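The general structure (result-list) [query-criteria] can be illustrated with a small Python sketch that splits a single DSC-QL query into its result list and its top-level criteria. The function name `split_query` is hypothetical, and the sketch assumes a single query with one bracketed criteria group (no UNION/INTERSECT/MINUS and no OR between groups); the slides do not show the actual DSC-QL grammar or parser.

```python
from typing import List, Tuple


def split_query(query: str) -> Tuple[str, List[str]]:
    """Split '(result-list) [criteria]' into the result list and top-level criteria."""
    query = query.strip()
    if not query.startswith("("):
        raise ValueError("query must start with '(result-list)'")
    close = query.index(")")
    result_list = query[1:close].strip()

    rest = query[close + 1:].strip()
    if not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError("query criteria must be enclosed in [...]")
    body = rest[1:-1]

    # Split on commas that are not nested inside (...) or [...], so that
    # UDF calls and sub-queries stay in one piece.
    criteria: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in body:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            criteria.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        criteria.append(tail)
    return result_list, criteria


result_list, criteria = split_query("(Neuron n)[e_connect(n, m), m.name = 'R3-13']")
print(result_list)
print(criteria)
```

Each returned criterion is then one of the basic terms listed above: a declaration, a comparison, a UDF call, or a sub-query.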

DSC-DM AND DSC-QL 16

DSC-QL Examples

• All serotonergic neurons:
(Neuron n)[n.hasMolecule.Molecule.name = '5HT']

• Which inhibitory synapses exhibit facilitation?
(Connection c)[c.type = 'inhibitory', c.plasticity = 'facilitation']

[Figure: DSC-DM data structure diagram, repeated from the DSC-DM example, with the attribute labels name, plasticity, and type]

DSC-DM AND DSC-QL 17

DSC-QL Examples (cont.)

• Which neurons are electrically coupled with neuron "R3-13"?

– Using a domain-specific function (DSF):
(Neuron n)[e_connect(n, m), m.name = 'R3-13']

– Substitute:
(Neuron n)[n.input.electricalSynapse.output.neuron.name = 'R3-13'] OR
          [n.output.electricalSynapse.input.neuron.name = 'R3-13']

– Normal form:
(Neuron n)[Neuron m, m.name = 'R3-13',
  EXISTS(c)[ElectricalSynapse c, c.input.neuron.neuronid = m.neuronid, c.output.neuron.neuronid = n.neuronid] OR
           [ElectricalSynapse c, c.input.neuron.neuronid = n.neuronid, c.output.neuron.neuronid = m.neuronid]]
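One plausible way to get from the UDF call to an expanded form is plain textual substitution of the function's stored implementation, with formal parameters replaced by the actual arguments. The Python sketch below shows that idea only; the `UDFS` registry and `expand_udf` function are hypothetical, the body of e_connect is copied from the DSC-DM example, and the slides do not describe how the real normalizer works.

```python
import re

# Hypothetical registry: UDF name -> (formal parameters, implementation body).
UDFS = {
    "e_connect": (
        ("n", "m"),
        "EXISTS (n.neuronid, m.neuronid)"
        "[ElectricalSynapse c, c.input.neuron.neuronid = n.neuronid, "
        "c.output.neuron.neuronid = m.neuronid] OR "
        "[ElectricalSynapse c, c.output.neuron.neuronid = n.neuronid, "
        "c.input.neuron.neuronid = m.neuronid]",
    ),
}


def expand_udf(call: str) -> str:
    """Replace a UDF call such as 'e_connect(x, y)' by its implementation body."""
    match = re.fullmatch(r"\s*(\w+)\s*\(([^()]*)\)\s*", call)
    if match is None:
        raise ValueError(f"not a UDF call: {call!r}")
    name, raw_args = match.group(1).lower(), match.group(2)
    formals, body = UDFS[name]
    actuals = [arg.strip() for arg in raw_args.split(",")]
    if len(actuals) != len(formals):
        raise ValueError("wrong number of arguments")
    mapping = dict(zip(formals, actuals))
    # Substitute whole-word occurrences of all formals in a single pass, so
    # that swapped arguments (n -> m, m -> n) do not overwrite each other.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return pattern.sub(lambda hit: mapping[hit.group(1)], body)


expanded = expand_udf("e_connect(x, y)")
print(expanded)
```

Because the body is stored as data rather than compiled into the application, a new function can be registered at run time, which matches the slides' claim that UDFs can be defined dynamically.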

[Figure: DSC-DM data structure diagram, repeated from the DSC-DM example]

IMPLEMENTATION 19

System Architecture

[Figure: system architecture diagram with the labels Domain Knowledge; DSC-DM; UDF; Grammar; XML Converter; DSC.XML; UDF.XML; DB Generator; ODL/SQL_DDL/etc.; Mapping.XML; DSC-QL Query Q; DSC-QL Normalizer; DSC-QL Normal Form; DSC-QL Translator; SQL; OQL; …; Relational DB; Object-Oriented DB; …; "Things that end-users need to know"]

IMPLEMENTATION 20

Translation Example 1

• DSC-QL:
(Neuron n)[n.hasMolecule.Molecule.name = '5HT']

• Normal Form:
(n.name)[Neuron n, n.hasMolecule.Molecule m, m.name = '5HT']

• SQL Translation:
SELECT n.name
FROM Neuron n,
     (SELECT tid1.neuronid AS pk, tid3.*
      FROM Neuron tid1, hasMolecule tid2, Molecule tid3
      WHERE tid1.neuronid = tid2.neuronid AND
            tid2.moleculeid = tid3.moleculeid) m
WHERE n.neuronid = m.pk AND m.name = '5HT'
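To check that the translated SQL behaves as the DSC-QL query intends, the following Python sketch runs it on an in-memory SQLite database. The table layouts are inferred from the columns the translation references, and the sample neurons and molecules (N1, N2, ACh) are made up; the authors' prototype targeted Oracle 10g, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, inferred from the columns used in the translation.
    CREATE TABLE Neuron (neuronid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Molecule (moleculeid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hasMolecule (neuronid INTEGER, moleculeid INTEGER);

    -- Made-up sample data: only neuron N1 is linked to the molecule 5HT.
    INSERT INTO Neuron VALUES (1, 'N1'), (2, 'N2');
    INSERT INTO Molecule VALUES (10, '5HT'), (11, 'ACh');
    INSERT INTO hasMolecule VALUES (1, 10), (2, 11);
""")

# The dot-path n.hasMolecule.Molecule becomes a derived table m whose pk
# column points back to the neuron the path started from.
TRANSLATED = """
    SELECT n.name
    FROM Neuron n,
         (SELECT tid1.neuronid AS pk, tid3.*
          FROM Neuron tid1, hasMolecule tid2, Molecule tid3
          WHERE tid1.neuronid = tid2.neuronid
            AND tid2.moleculeid = tid3.moleculeid) m
    WHERE n.neuronid = m.pk AND m.name = '5HT'
"""

serotonergic = [row[0] for row in conn.execute(TRANSLATED)]
print(serotonergic)
```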

IMPLEMENTATION 21

Translation Example 2

• DSC-QL:
(Connection c)[c.type = 'inhibitory', c.plasticity = 'facilitation']

• Normal Form:
(c.connectionID)[ElectricalSynapse c, c.type = 'inhibitory', c.plasticity = 'facilitation']
UNION
(c.connectionID)[ChemicalSynapse c, c.type = 'inhibitory', c.plasticity = 'facilitation']
UNION
(c.connectionID)[NeuroModulation c, c.type = 'inhibitory', c.plasticity = 'facilitation']

• SQL Translation:
SELECT c.connectionID
FROM ElectricalSynapse c
WHERE c.type = 'inhibitory' AND c.plasticity = 'facilitation'
UNION …
UNION …
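The superclass handling in this example can be sketched as a small Python function that emits one SELECT per concrete subclass and joins the branches with UNION. The `SUBCLASSES` mapping and `expand_superclass` function are hypothetical names; the three subclasses of Connection are the ones listed in the normal form above, and the real translator's algorithm is not shown in the slides.

```python
from typing import Dict, List

# Hypothetical class hierarchy: superclass -> concrete subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "Connection": ["ElectricalSynapse", "ChemicalSynapse", "NeuroModulation"],
}


def expand_superclass(entity: str, alias: str, key: str, conditions: List[str]) -> str:
    """Build one SELECT per concrete subclass and combine them with UNION."""
    # An entity without registered subclasses is queried directly.
    targets = SUBCLASSES.get(entity, [entity])
    where = " AND ".join(conditions)
    branches = [
        f"SELECT {alias}.{key} FROM {table} {alias} WHERE {where}"
        for table in targets
    ]
    return "\nUNION\n".join(branches)


sql = expand_superclass(
    "Connection", "c", "connectionID",
    ["c.type = 'inhibitory'", "c.plasticity = 'facilitation'"],
)
print(sql)
```

The user writes a single query against Connection; the fan-out over the subclass tables happens during translation.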

IMPLEMENTATION 22

Translation Example 3

• DSC-QL (neurons that have a chemical synapse to neuron R3-13):
(Neuron n)[c_connect(n, m), m.name = 'R3-13']

• Normal Form:
(n.name)[Neuron n, Neuron m, m.name = 'R3-13',
  EXISTS (n.neuronid, m.neuronid)[ChemicalSynapse c,
    c.output.neuron sysid1, c.input.neuron sysid2,
    sysid1.neuronid = n.neuronid, sysid2.neuronid = m.neuronid]]

• SQL Translation:
SELECT n.name
FROM Neuron n, Neuron m
WHERE m.name = 'R3-13' AND EXISTS (
  SELECT n.neuronid, m.neuronid
  FROM ChemicalSynapse c,
       (SELECT sid1.connectionID AS pk, sid3.*
        FROM ChemicalSynapse sid1, input sid2, neuron sid3
        WHERE sid1.connectionid = sid2.connectionid AND
              sid2.neuronid = sid3.neuronid) sysid1,
       (SELECT sid1.connectionID AS pk, sid3.*
        FROM ChemicalSynapse sid1, output sid2, neuron sid3
        WHERE sid1.connectionid = sid2.connectionid AND
              sid2.neuronid = sid3.neuronid) sysid2
  WHERE c.connectionID = sysid1.pk AND c.connectionID = sysid2.pk AND
        sysid1.neuronid = n.neuronid AND sysid2.neuronid = m.neuronid)

IMPLEMENTATION 23

Implementation

• Two translation algorithms (DSC-QL→SQL and DSC-QL→OQL) have been introduced.

• Two system prototypes have been built.
– Traditional business domain, on a relational DBMS: Oracle 10g
– Neuroscience, on an OODBMS: EyeDB

• The current version is a standalone Java application with a command-line query interface. A web-based graphical query interface is under development and will be released very soon.

PROPOSED METHODOLOGY 25

Related Work

• Many domain-specific query languages
– Target a particular domain, such as web search engines, intrusion analysis, genomic information, or high-energy physics analysis.
– Cannot be applied to other domains.

• Some conceptual query languages (CQLs)
– ERQL: built on the EER model; no support for UDFs or meta-attributes; no implementation report.
– ConQuer: based on Object Role Modelling (ORM); must be used in a pre-developed graphical user interface.
– CBQL: based on ConceptBase; works on deductive databases; hard to grasp for naïve users.

• A few query methodologies
– LabBase: constructs a domain-specific DBMS, LabBase, on top of a persistent object system, ObjectStore; specifically tailored to the domain of laboratory information systems.
– CQL in the SUPER project: a graphical query language bound to a pre-developed editor; no support for UDFs; hard to understand for naïve users.

CONCLUSION 26

Contributions

• No other domain-specific conceptual data modeling and querying methodology has been proposed so far.

– The proposed methodology combines the virtues of conceptual and domain-specific query languages, can be applied to any domain and all major types of DBMSs, and can be integrated into most existing applications.

– End-users can interact directly with the data based on their domain knowledge, rather than database details.

– DSC-DM captures more domain semantics than EER, such as user-defined functions, metadata, and annotation information.

– DSC-QL has the same expressive power as SQL, but is much simpler and more suitable for end-users. It can be translated into SQL and OQL, and it dynamically supports new user-defined functions without reprogramming the application system.

CONCLUSION 27

Future Work

• Implement support for native XML databases.

• Add data manipulation functions to DSC-QL.