Mapping and Integration of Multiple Forms into Relational Databases
CVDI is a collaboration between the University of Louisiana at Lafayette & Drexel University
MAPPING & INTEGRATING MULTIPLE FORMS INTO A DATABASE
Yuan An, Ritu Khare, Il-Yeol Song, Xiaohua Hu
[Fig. 1 content: "Background Patient Information" form with fields Date; Patient (Name, Gender: M/F, DOB); HPI; Vital Sign (Height, Weight, BP). Back-end tables: PatientInformation (piId, Date, Patient, HPI, VitalSign); Patient (pId, Name, Gender, DOB); Gender (gId, options; rows 001 Male, 002 Female); Vital Signs (vId, Height, Weight, BP).]
[Fig. 2 content: "Healthy Living Program" form with fields Date; Patient (Name, DOB); Hours Exercise; Smokes; Hours Watching TV; Social Activities; Alcohol.]
The FormMapper System
Empirical Study in Healthcare
[Fig. 3 content: the FORM enters the Tree Extraction Component (Layered Hidden Markov Models (HMMs); Parent Child Association Rules), which produces a form tree; the Form Mapping and Integration Component (Initial Correspondence Generation and Validation; Database Birthing Algorithm, producing a NEW DB; Merging Algorithm) relates that tree to the existing DATABASE.]
Key Techniques
Hierarchical Representation of Forms as Form Trees
Hidden Markov Models for Form Information Extraction
Sophisticated Matching Techniques for Deriving Mapping Correspondences between the Form Tree and the Database
Form Tree Patterns and Database Design Principles to translate a form tree into an equivalent database (see Fig. 4)
A Quantitative Metric (quality tuning factor) to guide the decision of whether or not to merge two mapped tables
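To make the form-tree-to-database patterns concrete, here is a minimal sketch that rebuilds tables shaped like the Fig. 1 example from a form tree. The `FormNode` class, its `kind` values, and the `to_schema` function are hypothetical; the three rules are simplified versions of the textbox, radiobutton, and category-subcategory patterns as they appear in the Fig. 1 example. The checkbox pattern, foreign-key constraints, and the actual FormMapper design rules are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class FormNode:
    """Hypothetical form-tree node; kind is 'category', 'textbox' or 'radiobutton'."""
    label: str
    kind: str
    options: list = field(default_factory=list)   # radiobutton choices
    children: list = field(default_factory=list)  # nested elements of a category


def to_schema(node, tables=None, lookups=None):
    """Translate a category node and its descendants into table definitions.

    Returns (tables, lookups): table name -> column list, and
    lookup-table name -> option values. Simplified illustration only.
    """
    if tables is None:
        tables, lookups = {}, {}
    columns = ["id"]  # surrogate key for every generated table
    for child in node.children:
        if child.kind == "textbox":
            # Textbox pattern: the element becomes a column of the parent table.
            columns.append(child.label)
        elif child.kind == "radiobutton":
            # Radiobutton pattern: the options live in a lookup table and the
            # parent table keeps a column that refers to it.
            tables[child.label] = ["id", "option"]
            lookups[child.label] = list(child.options)
            columns.append(child.label)
        elif child.kind == "category":
            # Category-subcategory pattern: the subcategory gets its own table,
            # referenced from the parent table through a column.
            to_schema(child, tables, lookups)
            columns.append(child.label)
    tables[node.label] = columns
    return tables, lookups


# The "Background Patient Information" form from Fig. 1 as a form tree.
form = FormNode("PatientInformation", "category", children=[
    FormNode("Date", "textbox"),
    FormNode("Patient", "category", children=[
        FormNode("Name", "textbox"),
        FormNode("Gender", "radiobutton", options=["Male", "Female"]),
        FormNode("DOB", "textbox"),
    ]),
    FormNode("HPI", "textbox"),
    FormNode("VitalSign", "category", children=[
        FormNode("Height", "textbox"),
        FormNode("Weight", "textbox"),
        FormNode("BP", "textbox"),
    ]),
])

tables, lookups = to_schema(form)
for name, cols in tables.items():
    print(name, cols)
```

The result mirrors the four Fig. 1 tables (PatientInformation, Patient, Gender, Vital Signs), with a generic `id` in place of piId, pId, gId, and vId.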
Desirable Characteristics of Database (w.r.t. the input form)
Completeness
Correctness
Compactness
Normalization (3NF)
Optimization (minimize potential NULL values and the number of database elements)
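The Optimization characteristic pulls in two directions when deciding whether to merge two mapped tables: merging lowers the number of database elements but can introduce potential NULL values. The following sketch counts both quantities for a pair of tables. It is a hypothetical illustration of that trade-off, not the poster's quality tuning factor, whose formula is not given here. Counting "elements" as tables plus columns, and assuming that rows from the two tables are never combined into one row, are both assumptions.

```python
def merge_cost(cols_a, cols_b, rows_a=1, rows_b=1):
    """Compare keeping two tables apart with merging them into one table.

    Returns (elements_separate, elements_merged, potential_nulls_if_merged).
    'Elements' = tables + columns (an assumption); rows of A and B are
    assumed to stay distinct after the merge.
    """
    a, b = set(cols_a), set(cols_b)
    elements_separate = 2 + len(a) + len(b)
    elements_merged = 1 + len(a | b)
    # Each row of A gets NULL in the columns only B has, and vice versa.
    potential_nulls = rows_a * len(b - a) + rows_b * len(a - b)
    return elements_separate, elements_merged, potential_nulls


# Hypothetical example: the Fig. 1 Patient table and a table mapped from
# the Healthy Living Program form that shares Name and DOB.
patient = ["Name", "Gender", "DOB"]
lifestyle = ["Name", "DOB", "Smokes"]
print(merge_cost(patient, lifestyle, rows_a=100, rows_b=40))
```

Merging saves three elements here but risks 140 NULL cells, which is the kind of tension a tuning factor has to weigh.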
Fig. 3 The FormMapper System has two components: (1) Tree Extraction and (2) Form Integration.
Fig. 4 Some Form Tree to Database Mapping Patterns: a) Textbox Pattern, b) Radiobutton Pattern, c) Checkbox Pattern, d) Category–Subcategory Pattern.
Datasets
16 highly complex data-entry forms from 3 healthcare institutions.
Average 57 form elements per form
Benchmarks
16 Gold Standard Trees
Prepared using a DIY form design tool.
Two sets of 3 Gold Standard Databases, prepared by 2 database experts, each with at least 10 years of experience.
Tree Extraction Component
Expectation-Maximization (EM) training on 52 clinical forms
Viterbi Algorithm for decoding
5 parent-child association rules
Accuracy: 96.93%
Duration: 0.07 sec per form
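For readers unfamiliar with Viterbi decoding, here is a minimal log-space sketch for a single-layer HMM. It is not FormMapper's layered model: the two states ("Caption" for a text label, "Field" for an input element), the observation symbols, and all probabilities are made-up toy values, and EM training is not shown.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for obs (log-space Viterbi).

    All probabilities must be > 0 because they are passed to math.log.
    """
    # best[t][s] = (log-prob of the best path ending in state s at step t, predecessor)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
             for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            layer[s] = (score + math.log(emit_p[s][obs[t]]), prev)
        best.append(layer)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Toy model (hypothetical numbers): captions tend to be followed by input fields.
states = ("Caption", "Field")
start_p = {"Caption": 0.8, "Field": 0.2}
trans_p = {"Caption": {"Caption": 0.2, "Field": 0.8},
           "Field": {"Caption": 0.7, "Field": 0.3}}
emit_p = {"Caption": {"text": 0.9, "textbox": 0.05, "radio": 0.05},
          "Field": {"text": 0.1, "textbox": 0.5, "radio": 0.4}}

tokens = ["text", "textbox", "text", "radio"]  # e.g. "Name:" [ ] "Gender:" (o)
print(viterbi(tokens, states, start_p, trans_p, emit_p))
```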
Form Integration Component
Indexing using Lucene
Quality tuning factor = 0.5
Duration: 3 sec per form
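Indexing lets the integration component retrieve candidate database elements for each form label quickly. The following sketch does not use Lucene. It is a hypothetical in-memory stand-in: an inverted index from lower-cased tokens to `table.column` names, with candidates ranked by the number of shared tokens. That scoring is an assumption; Lucene applies its own relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; splits CamelCase names such as 'VitalSigns'."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())


def build_index(schema):
    """Map each token to the set of 'table.column' names whose names contain it."""
    index = defaultdict(set)
    for table, columns in schema.items():
        for column in columns:
            for token in tokenize(table) + tokenize(column):
                index[token].add(f"{table}.{column}")
    return index


def candidates(index, form_label, limit=3):
    """Rank database elements by how many tokens they share with a form label."""
    scores = defaultdict(int)
    for token in tokenize(form_label):
        for element in index.get(token, ()):
            scores[element] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda e: (-scores[e], e))[:limit]


# Two tables from the Fig. 1 database; labels come from the Fig. 2 form.
schema = {"Patient": ["Name", "Gender", "DOB"],
          "VitalSigns": ["Height", "Weight", "BP"]}
index = build_index(schema)
print(candidates(index, "Patient Name"))
print(candidates(index, "Hours Exercise"))  # no shared tokens, so no candidates
```

An empty candidate list is the case where a form element has no existing counterpart and new database elements would have to be created.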
[Fig. 5 chart: for each evolved database, the number of Tables, Columns, Values, and Foreign Keys (axis 0 to 200) for FormMapper, Gold 1, and Gold 2.]
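FormMapper vs. Gold DB (share of tables):
FormMapper vs. Gold 1: 52% perfect match, 28% positive mismatch, 20% negative mismatch
FormMapper vs. Gold 2: 54% perfect match, 40% positive mismatch, 6% negative mismatch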
On average, 87% of the database tables are either identical or superior (positive mismatch) to the gold database tables with respect to the defined database characteristics.
Inferior cases (negative mismatches) are mostly due to missing correspondences (caused by extraction inaccuracies) and imprecisely derived cardinalities among categories and subcategories in forms.
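The 87% average can be checked from the two pie charts. The check assumes the flattened chart values read as 52% / 28% / 20% (perfect match / positive mismatch / negative mismatch) against one gold database and 54% / 40% / 6% against the other; under that reading the averages come out exactly as stated.

```python
# Shares (%) of FormMapper tables per gold database, as read from the pie charts.
shares = {
    "Gold 1": {"perfect": 52, "positive": 28, "negative": 20},
    "Gold 2": {"perfect": 54, "positive": 40, "negative": 6},
}

# Identical (perfect match) or superior (positive mismatch) tables per gold DB.
acceptable = [s["perfect"] + s["positive"] for s in shares.values()]
average_acceptable = sum(acceptable) / len(acceptable)

# Inferior (negative mismatch) tables, averaged the same way.
average_inferior = sum(s["negative"] for s in shares.values()) / len(shares)

print(acceptable, average_acceptable, average_inferior)
```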
Implications
High potential to replace the human experts
As more forms are mapped, the database grows automatically in a principled manner.
It is challenging to automate the aspects of mapping that rely on human understanding of domain semantics.
Work in Progress
Leverage ontologies and controlled vocabularies to handle semantic heterogeneity.
More sophisticated Correspondence Generation and Validation Techniques
Consider more complicated merging situations (e.g., a table that corresponds to a column)
In the quest for database usability, several DIY and WYSIWYG approaches enable non-technical users to design forms. Such approaches (e.g., FormAssembly) automatically translate forms into databases while shielding users from technical details. However, they support neither database evolution driven by changing user requirements nor multiple users managing a common database.
Fig. 1 Using forms as the front-end interface mapped to a back-end database is a standard way to collect data. The figure shows a scenario in the healthcare domain.
Fig. 2 A New Form representing a new (or evolved) user requirement
Challenges in Mapping Forms to Databases
How to automatically understand a user-created form and extract semantic relationships among form elements?
How to automatically map the semantic model extracted from a form to the existing database?
How to automatically evolve the existing database with desired properties and what are these properties?
While many techniques exist to forward-engineer a single form into an individual back-end database, mapping multiple forms to an existing structured database remains unexplored. This work addresses the problem of automatically mapping multiple (possibly overlapping) forms to an existing structured database.
Fig. 5. Scale of the evolved Databases
Fig. 6. Comparison of Tables.
Input Form
Database 1
Database 2
Database 3
FormMapper vs. Gold 1
FormMapper vs. Gold 2
Motivation and Focus