Mapping and Integration of Multiple Forms into Relational Databases

Posted on 11-Jun-2015


CVDI is a collaboration between the University of Louisiana at Lafayette & Drexel University

MAPPING & INTEGRATING MULTIPLE FORMS INTO A DATABASE

Yuan An, Ritu Khare, Il-Yeol Song, Xiaohua Hu

[Figure: a "Background Patient Information" form with fields Date, Patient (Name, Gender M/F, DOB), HPI, and Vital Sign (Height, Weight, BP), stored in a database with tables PatientInformation (piId, Date, Patient, HPI, VitalSign), Patient (pId, Name, Gender, DOB), Gender (gId, options; rows 001 Male, 002 Female), and Vital Signs (vId, Height, Weight, BP).]

[Figure: a new "Healthy Living Program" form with fields Date, Patient (Name, DOB), Hours Exercise, Hours Watching TV, Smokes, Alcohol, and a Social Activities section.]

The FormMapper System

Empirical Study in Healthcare

[Fig. 3 diagram: the Tree Extraction Component (Layered Hidden Markov Models (HMMs), Parent Child Association Rules) turns a FORM into a form tree (root, x1, x2, ...); the Form Mapping and Integration Component (Initial Correspondence Generation and Validation, Database Birthing Algorithm for a NEW DB, Merging Algorithm) maps the tree nodes onto DATABASE elements.]

Key Techniques

Hierarchical Representation of Forms as Form Trees

Hidden Markov Models for Form Information Extraction

Sophisticated matching techniques for deriving mapping correspondences between the form tree and the database

Form tree patterns and DB design principles to translate a form tree into an equivalent database (see Fig. 4)

Quantitative metric (quality tuning factor) to facilitate the decision of merging (or not merging) two mapped tables
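The correspondence-matching step can be illustrated with a minimal sketch. The poster does not spell out its matching techniques (it mentions Lucene indexing), so this sketch swaps in plain string similarity from Python's standard `difflib`. Each form-tree leaf and each database column is assumed to be a hypothetical (parent, label) pair such as ("Patient", "Name"), and the 0.8 threshold is an illustrative assumption.

```python
from difflib import SequenceMatcher

# Hypothetical representation: a form-tree leaf or a database column,
# as a (parent or table, label or column) pair.
Element = tuple[str, str]


def normalize(text: str) -> str:
    """Lower-case and drop spaces/punctuation so 'Vital Sign' and 'VitalSign' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def score(leaf: Element, column: Element) -> float:
    # Average of parent-vs-table and label-vs-column string similarity.
    return (similarity(leaf[0], column[0]) + similarity(leaf[1], column[1])) / 2


def correspondences(leaves: list[Element], columns: list[Element],
                    threshold: float = 0.8) -> dict[Element, Element]:
    """Map each leaf to its best-scoring column, keeping only matches at or above threshold."""
    result = {}
    for leaf in leaves:
        best = max(columns, key=lambda col: score(leaf, col))
        if score(leaf, best) >= threshold:
            result[leaf] = best
    return result


# Existing database columns (modeled on the patient example) and leaves of a new form.
db_columns = [("Patient", "Name"), ("Patient", "Gender"), ("Patient", "DOB"),
              ("VitalSign", "Height"), ("VitalSign", "Weight"), ("VitalSign", "BP"),
              ("PatientInformation", "Date"), ("PatientInformation", "HPI")]
new_leaves = [("Patient", "Name"), ("Patient", "DOB"), ("Social Activities", "Hours Exercise")]
matches = correspondences(new_leaves, db_columns)
```

Here the Name and DOB leaves of the new form find their existing columns, while Hours Exercise stays unmatched and would need new database elements. A purely lexical score like this cannot recognize synonyms, which is the semantic heterogeneity listed under Work in Progress.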

Desirable Characteristics of Database (w.r.t. the input form)

Completeness

Correctness

Compactness

Normalization (3NF)

Optimization (minimize potential NULL values and the number of database elements)

Fig. 3 The FormMapper System has two components: (1) Tree Extraction and (2) Form Integration.

Fig. 4 Some Form Tree to Database Mapping Patterns: a) Textbox Pattern, b) Radiobutton Pattern, c) Checkbox Pattern, d) Category – Subcategory Pattern.

[Diagram: each pattern pairs a fragment of the semantic form tree (a textbox, radiobutton, or checkbox element under a parent node Tj) with the ID-keyed tables it maps to; the radiobutton pattern includes an ID/Options table holding the option values.]
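To make the patterns concrete, here is a minimal Python sketch of translating a form tree into tables, covering only the textbox, radiobutton, and category/subcategory patterns. It is not FormMapper's Database Birthing Algorithm: the `FormNode` type, the surrogate `id` columns, and the rule that a group becomes its own table referenced by a foreign-key column are assumptions modeled on the example patient database earlier in the poster. The checkbox pattern is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FormNode:
    """A node of a hypothetical semantic form tree."""
    label: str
    kind: str  # "group", "textbox", or "radiobutton"
    options: list[str] = field(default_factory=list)
    children: list["FormNode"] = field(default_factory=list)


def birth_schema(root: FormNode):
    """Translate a form tree into (tables, lookup_rows).

    tables maps a table name to its column list; lookup_rows maps each
    radiobutton lookup table to its (id, option) rows.
    """
    tables: dict[str, list[str]] = {}
    lookup_rows: dict[str, list[tuple[int, str]]] = {}

    def visit(group: FormNode) -> None:
        columns = ["id"]  # assumed surrogate key for the group's table
        tables[group.label] = columns
        for child in group.children:
            if child.kind == "textbox":
                # Textbox pattern: one column in the parent's table.
                columns.append(child.label)
            elif child.kind == "radiobutton":
                # Radiobutton pattern: a lookup table of the options,
                # referenced by a foreign-key column in the parent's table.
                tables[child.label] = ["id", "options"]
                lookup_rows[child.label] = list(enumerate(child.options, start=1))
                columns.append(child.label)
            elif child.kind == "group":
                # Category/subcategory pattern: a separate table for the
                # subcategory, referenced by a foreign-key column.
                visit(child)
                columns.append(child.label)
            else:
                raise ValueError(f"unsupported element kind: {child.kind}")

    visit(root)
    return tables, lookup_rows


form = FormNode("PatientInformation", "group", children=[
    FormNode("Date", "textbox"),
    FormNode("Patient", "group", children=[
        FormNode("Name", "textbox"),
        FormNode("Gender", "radiobutton", options=["Male", "Female"]),
        FormNode("DOB", "textbox"),
    ]),
    FormNode("HPI", "textbox"),
    FormNode("VitalSign", "group", children=[
        FormNode("Height", "textbox"),
        FormNode("Weight", "textbox"),
        FormNode("BP", "textbox"),
    ]),
])

tables, lookup_rows = birth_schema(form)
```

For the Background Patient Information form this produces PatientInformation(id, Date, Patient, HPI, VitalSign), Patient(id, Name, Gender, DOB), Gender(id, options) with rows (1, Male) and (2, Female), and VitalSign(id, Height, Weight, BP), which mirrors the example database apart from the key names.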

Datasets

16 highly complex data-entry forms from 3 healthcare institutions.

Average 57 form elements per form

Benchmarks

16 Gold Standard Trees

Prepared using a DIY form design tool.

Two sets of 3 Gold Standard Databases, prepared by 2 database experts, each with at least 10 years of experience.

Tree Extraction Component

Expectation Maximization Algorithm on 52 clinical forms

Viterbi Algorithm for decoding

5 parent-child association rules

Accuracy: 96.93%

Duration: 0.07 sec per form
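Viterbi decoding, used here to label form text, can be sketched for a single HMM layer. The states (category, element_label, widget), the observation symbols, and all probabilities below are hypothetical hand-set values; the poster's layered HMMs learn their parameters with Expectation Maximization, which this sketch does not show.

```python
import math

# Hypothetical single-layer HMM for labeling tokens of a flattened form.
STATES = ("category", "element_label", "widget")
START_P = {"category": 0.6, "element_label": 0.3, "widget": 0.1}
TRANS_P = {
    "category": {"category": 0.1, "element_label": 0.8, "widget": 0.1},
    "element_label": {"category": 0.05, "element_label": 0.05, "widget": 0.9},
    "widget": {"category": 0.3, "element_label": 0.6, "widget": 0.1},
}
EMIT_P = {
    "category": {"plain_text": 0.8, "text_with_colon": 0.15, "input": 0.05},
    "element_label": {"plain_text": 0.1, "text_with_colon": 0.85, "input": 0.05},
    "widget": {"plain_text": 0.05, "text_with_colon": 0.05, "input": 0.9},
}


def viterbi(observations, states=STATES, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable state sequence (log space avoids underflow)."""
    # best[t][s] = (log prob of the best path ending in state s at step t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for t in range(1, len(observations)):
        layer = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            layer[s] = (score + math.log(emit_p[s][observations[t]]), prev)
        best.append(layer)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Token types for: 'Patient', 'Name:', [textbox], 'DOB:', [textbox]
tokens = ["plain_text", "text_with_colon", "input", "text_with_colon", "input"]
labels = viterbi(tokens)
```

For this fragment the decoder returns category, element_label, widget, element_label, widget; labels like these could then feed parent-child rules that assemble the form tree.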

Form Integration Component

Indexing using Lucene

Quality tuning factor = 0.5

Duration: 3 sec per form
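The poster does not define how the quality tuning factor is computed. As a hypothetical stand-in, the sketch below merges two mapped tables when the Jaccard overlap of their non-key columns reaches the factor (0.5, as in the experiments). Low overlap means many cells would be NULL after a merge, so such a threshold trades compactness against the NULL-minimization goal listed among the desirable characteristics; the helper that counts those NULL cells assumes the two tables hold different records.

```python
def column_overlap(cols_a: set[str], cols_b: set[str]) -> float:
    """Jaccard overlap of two tables' non-key column sets."""
    union = cols_a | cols_b
    return len(cols_a & cols_b) / len(union) if union else 1.0


def should_merge(cols_a: set[str], cols_b: set[str], tuning_factor: float = 0.5) -> bool:
    """Hypothetical merge rule: merge when the overlap reaches the tuning factor."""
    return column_overlap(cols_a, cols_b) >= tuning_factor


def null_cells_after_merge(cols_a: set[str], rows_a: int,
                           cols_b: set[str], rows_b: int) -> int:
    """NULL cells created by stacking both tables' rows under the union of columns.

    Assumes no row of A matches a row of B, so each row is NULL in the
    columns that only the other table has.
    """
    return rows_a * len(cols_b - cols_a) + rows_b * len(cols_a - cols_b)


# Hypothetical mapped tables from the two example forms.
existing_patient = {"Name", "Gender", "DOB"}
new_patient = {"Name", "DOB"}
vital_signs = {"Height", "Weight", "BP"}
lifestyle = {"Hours Exercise", "Hours Watching TV", "Smokes", "Alcohol"}
```

With these inputs, the two patient tables (overlap 2/3) would be merged, while the vital-sign and lifestyle tables (overlap 0) would stay separate.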

[Bar charts: number of Tables, Columns, Values, and Foreign Keys (axis 0 to 200) in the databases built by FormMapper, Gold 1, and Gold 2.]

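[Pie charts, share of tables: FormMapper vs Gold 1: Perfect Match 52%, Positive Mismatch 28%, Negative Mismatch 20%. FormMapper vs Gold 2: Perfect Match 54%, Positive Mismatch 40%, Negative Mismatch 6%.]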

FormMapper Vs Gold DB

On average, 87% of FormMapper's database tables are either identical or superior (positive mismatch) to the gold database tables, judged by the database characteristics defined above.
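The 87% average can be checked against the pie-chart values, assuming the Negative Mismatch slices are the 20% and 6% ones, so that the identical-or-superior shares are 52% + 28% and 54% + 40%:

```latex
\frac{(52\% + 28\%) + (54\% + 40\%)}{2} = \frac{80\% + 94\%}{2} = 87\%
```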

Inferior cases (negative mismatch) are mostly due to missing correspondences (caused by extraction inaccuracies) and imprecisely derived cardinalities among categories and subcategories in forms.

Implications

High potential to replace human experts in mapping forms to databases

As more forms are mapped, the database grows automatically in a principled manner.

It is challenging to automate the aspects of mapping that rely on human understanding of domain semantics.

Work in Progress

Leverage Ontology and Controlled Vocabularies to handle semantic heterogeneity.

More sophisticated Correspondence Generation and Validation Techniques

Consider more complicated merging situations (e.g. a table corresponds to a column)

In the quest for database usability, several DIY and WYSIWYG approaches enable non-technical users to design forms. Such approaches (e.g., FormAssembly) automatically translate forms into databases while shielding users from technical details. However, they support neither database evolution driven by changing user requirements nor multiple users managing a common database.

Fig. 1 Using forms as a front-end interface mapped to a back-end database is a standard way to collect data. The figure shows a scenario in the healthcare domain.

Fig. 2 A New Form representing a new (or evolved) user requirement

Challenges in Mapping Forms to Databases

How to automatically understand a user-created form and extract semantic relationships among form elements?

How to automatically map the semantic model extracted from a form to the existing database?

How to automatically evolve the existing database with desired properties and what are these properties?

While many techniques exist to forward-engineer a single form into an individual back-end database, mapping multiple forms to an existing structured database remains unexplored. This work addresses the problem of automatically mapping multiple (possibly overlapping) forms to an existing structured database.

Fig. 5. Scale of the evolved databases (panels: Database 1, Database 2, Database 3).

Fig. 6. Comparison of tables (panels: FormMapper vs Gold 1, FormMapper vs Gold 2).

Motivation and Focus