Dissertation Defense Presentation



  • 1. A Framework for Mapping User-designed Forms to Relational Databases. Dissertation Presentation, November 15, 2011. Ritu Khare. Committee: Dr. Yuan An (Chair), Dr. Jiexun Jason Li, Dr. Il-Yeol Song, Dr. Min Song, Dr. Christopher C. Yang

  • 2. Presentation Order: 1. Motivation; 2. Problems; 3. Solutions; 4. Evaluation; 5. Final Remarks

  • 3. 1. Motivation

  • 4. General Motivation: Database Usability (Sawyer, 1995). Enable users to SEARCH and QUERY databases: information retrieval techniques (Liu et al., 2006; Hristidis et al., 2003; Catarci, 2000; Jayapandian and Jagadish, 2006). Enable users to DESIGN databases (Jagadish et al., 2007): form-based DIY and WYSIWYG paradigms (FormAssembly, ZohoCreator, GoogleForms). Databases still remain unusable from the integration point of view (Gurses et al., 2009).

  • 5. Precise Motivation: Integration of New Needs. New needs related to patients' social habits: 1) building of new forms; 2) integration of the new form into the back-end.

  • 6. Research Objective: To develop a mechanism to automatically map and integrate a user-designed form into an existing structured database. Assume that a user-designed form is already acquired. Seek a framework that merges the semantically matching elements between forms and databases, and creates new database elements corresponding to the unmatched form elements.

  • 7. 2. Research Problems

  • 8. Problem #1: Form Understanding. A form template represents the semantic intentions of the designer. Task: automatic extraction of the form semantics. A machine can only read the syntactic patterns of form elements; a certain layout pattern cannot be associated with a semantic intention. Existing work: focus on search forms (Benslimane et al., 2007; Kaljuviee et al., 2001), which are shorter and simpler than data-entry forms (empirical finding); rules and heuristics (Zhang et al., 2004; He et al., 2007), which are not likely to circumvent the ever-broadening varieties in form topologies.

  • 9. Problem #2: Correspondence Discovery. Task: detect semantically matching elements between a form and an existing database. Challenges: variety of terms to denote the same concepts; variety of concepts denoted by similar terms; identify and eliminate the invalid correspondences. Existing work: schema and ontology mapping (Madhavan et al., 2001; Euzenat and Shvaiko, 2005; Rahm and Bernstein, 2001; An et al., 2005; An et al., 2006), mostly semi-automatic and not applicable to form-to-database correspondence discovery because of the heterogeneity between forms and databases. Correspondences are to be used for evolving the database; the discovery process has to take this requirement into consideration.

  • 10. Problem #3: Form Integration. Problem #3a: Merging. Task: merging into an existing database so that the same concept is not duplicated and the database remains compact. Merging increases the potential of having NULL values, i.e., a less optimized database, so judicious decisions are needed. Existing work: form integration (Yang et al., 2008) is largely manual and exposes the users to the technical details of the underlying data model; database integration (Yang et al., 2003) provides guidelines.

  • 11. Problem #3: Form Integration. Problem #3b: Birthing. Task: extend the database for the unmatched form elements. How to automatically derive the functional dependencies among the form elements? How to translate the complex form patterns? How to evaluate multiple design alternatives and pick one? Existing work: form-based database design. Several methods (Choobineh et al., 1988; Pavicevic et al., 2006; Choobineh and Venkatraman, 1992; Deklarit, 2008) and commercial tools (FormAssembly, Google Forms, ZohoCreator, Wufoo). No empirical evaluation of the resultant databases. Few focus on designing a database with certain desirable properties, e.g., expressiveness (Yang et al., 2008; Choobineh et al., 1988; Lukovic et al., 2007). These properties do not reflect any compliance with the form semantics and are inadequate for evaluating the mapping process.

  • 12. Research Questions and System Goals. Research questions: 1. Form Understanding: a model to capture the form semantics; extract this model from a given form. 2. Correspondence Discovery: determine semantically equivalent elements between form and database; incorporate the DB evolution requirement during the discovery process. 3. Form Integration: resolve merging conflicts while maintaining the original form semantics; given a form pattern, derive a relational database with desirable properties. System goals: 1. To evolve a DB that is high-quality and optimized as per the form semantics, i.e., compliant with the principles (Wang and Strong, 1996; Ramakrishnan and Gehrke, 2002; Silberschatz et al., 2001; Batini and Scannapieco, 2006). Completeness: all form elements represented in the database. Correctness: form semantics retained. Compactness: equivalent elements merged. Normalization: 3NF w.r.t. the form's functional dependencies. Minimize NULL values in FKs and descriptive attributes. 2. To ensure minimalism in the required user intervention.

  • 13. 3. Solutions

  • 14. Form Representation: Form Tree. The form tree accurately captures the designer's intentions, and hence the semantic associations among the form elements. Inspired by hierarchical modeling of forms in existing works (Dragut et al., 2009; Wu et al., 2009).

  • 15. Framework Outline: 1. Form Understanding and Semantics Extraction (output: Form Tree); 2. Correspondence Discovery and Validation (output: Form Tree with Discovered Correspondences); 3. Database Design and Evolution (output: Database).

  • 16. Method 1a: Form Tree Generation

  • 17. Method 1a: Form Tree Generation. Two phases: 1. Tag and Segment Phase; 2. Derive Tree Phase (5 rules). The approach leverages the probabilistic nature of form design and develops a 2-layered Hidden Markov Model (HMM) based artificial designer that has the ability to understand the semantics of any arbitrarily designed form. T-HMM: Tagging HMM. S-HMM: Segmentation HMM.
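Slide 17 describes a 2-layered HMM whose first layer (T-HMM) tags form elements. As a rough illustration of what HMM-based tagging involves, the sketch below runs Viterbi decoding over a toy sequence of form elements. The state names, observation symbols, and every probability are hypothetical placeholders, not values from the dissertation, and only a single tagging layer is shown; the segmentation layer (S-HMM) and the 5 tree-derivation rules are not modeled.

```python
import math

# Hypothetical hidden states (semantic roles) and observable element types.
STATES = ("category", "attribute", "value")

START_P = {"category": 0.6, "attribute": 0.3, "value": 0.1}

TRANS_P = {
    "category": {"category": 0.1, "attribute": 0.8, "value": 0.1},
    "attribute": {"category": 0.1, "attribute": 0.2, "value": 0.7},
    "value": {"category": 0.2, "attribute": 0.5, "value": 0.3},
}

EMIT_P = {
    "category": {"bold_label": 0.7, "plain_label": 0.2, "textbox": 0.05, "radio": 0.05},
    "attribute": {"bold_label": 0.1, "plain_label": 0.8, "textbox": 0.05, "radio": 0.05},
    "value": {"bold_label": 0.05, "plain_label": 0.05, "textbox": 0.5, "radio": 0.4},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # Each column maps state -> (best log-probability so far, previous state).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        previous = columns[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev_state, score = max(
                ((p, previous[p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev_state)
        columns.append(column)

    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# A toy form: a section heading, a labeled textbox, and a labeled radio group.
elements = ["bold_label", "plain_label", "textbox", "plain_label", "radio", "radio"]
tags = viterbi(elements, STATES, START_P, TRANS_P, EMIT_P)
print(list(zip(elements, tags)))
```

With these toy parameters the decoder labels the heading as a category, each plain label as an attribute, and the textbox and radio buttons as values. In a real tagger the probabilities would be estimated from annotated forms rather than set by hand.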
  • 18. Method 1b: Form Term Annotation. Refine semantics by annotating terms. Resource: Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), comprising 360,000 concepts belonging to various semantic categories. Challenge: the same form term can be specified in multiple contexts, i.e., semantic categories. The key is to identify the semantic category for a given term. We hypothesize that the term context can be derived from the structure of the form tree. Example concepts:

        ConceptID    Description          Semantic Category
        0231832      Respiratory Rate     Observable Entity
        362508001    Both eyes, entire    Body Structure

  • 19. Method 1b: Form Term Annotation. [Diagram: the form term and the form tree pass through a Structure Analyzer and a Classification Model to obtain a semantic category; the best-matching concept from this category is then chosen through the SNOMED CT search service, yielding a SNOMED CT concept.]

  • 20. Method 2: Correspondence Discovery and Validation. Two steps: 1. Linguistic Matching; 2. Exact Concept Matching.

  • 21. Method 2: Validation Algorithm. Total heuristics = 4. [Figure: example form terms and database elements (Past Medical History, History, HPI, History of present Illness, Medications, Meds, SocialHistory, FamilyHx) with some candidate correspondences marked X; a radio-button field (Oral Hygiene, Appetite; options good, poor) shown against a look-up table with columns Id and Options (1 Good, 2 Fair, 3 Poor).]

  • 22. Method 3: Database Design and Evolution. [Framework diagram, steps 1 to 3.]

  • 23. Method 3a: Birthing Algorithm. Total patterns = 12. Principles: high quality (complete, correct, compact, normalized) and optimization (minimize NULLs). Traverses the form tree in depth-first order. Patterns shown: Radiobutton Pattern, Textbox Pattern, Category/Subcategory Pattern, Extended RB Pattern. Figure annotations: M:1; Tj.ID -> Tj.c.

  • 24. Method 3a: Birthing Algorithm. Patterns shown: sibling categories pattern, textbox pattern, category-subcategory pattern, radiobutton pattern, checkbox pattern.

  • 25. Method 3: Database Design and Evolution. [Framework diagram, steps 1 to 3.]

  • 26. Method 3b: Merging Algorithm. Total merging scenarios = 8. Each merger involves a trade-off between the compactness and optimization (min. NULL values) principles. Compactness Factor (CF): a configurable value in (0, 1) that indicates the weightage given to compactness. Null Value Ratio (NVR): a calculated value that indicates the potential of having NULL values in a given table. Example (figure: New DB, Existing DB, Final DB): NVR = 2/5 = 0.4. Case a: CF = 0.5, Final DB (CF > NVR). Case b: CF = 0.3, (CF
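The CF versus NVR comparison in the merging example can be written out as a small Python sketch. Two points here are assumptions rather than statements from the slides: that NVR = 2/5 counts the columns that could hold NULL after the merge over all columns of the merged table, and that a merge is accepted exactly when CF > NVR. The function names are hypothetical.

```python
def null_value_ratio(nullable_columns: int, total_columns: int) -> float:
    """Assumed reading of NVR: share of columns that could end up holding NULLs."""
    if total_columns <= 0 or not 0 <= nullable_columns <= total_columns:
        raise ValueError("expected 0 <= nullable_columns <= total_columns and total_columns > 0")
    return nullable_columns / total_columns


def should_merge(cf: float, nvr: float) -> bool:
    """Assumed decision rule: merge when the weight on compactness exceeds the NULL risk."""
    if not 0.0 < cf < 1.0:
        raise ValueError("CF is a configurable value in the open interval (0, 1)")
    return cf > nvr


# Numbers taken from the slide: NVR = 2/5 = 0.4, CF = 0.5 (case a), CF = 0.3 (case b).
nvr = null_value_ratio(2, 5)
case_a = should_merge(0.5, nvr)
case_b = should_merge(0.3, nvr)
print(f"NVR={nvr:.1f} case a merge={case_a} case b merge={case_b}")
```

Under these assumptions, case a (CF = 0.5) favors the compact, merged design, while case b (CF = 0.3) favors keeping the elements separate to avoid NULL values.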