A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational...

A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database

By

Daniela Florescu

Donald Kossmann

Presented by: Intakhab Mehboob Khan

Table of Contents

• Introduction
• Approaches to Store Semi-Structured Data
• Data Model for Semi-Structured Data
• Query Language and XML-QL
• Storing XML Data in Relational Database
  – Mapping Attributes
  – Mapping Values
• Evaluating the Mapping Schemes
• Conclusion

Introduction

• August 3, 1999

• How XML data can be stored and queried

• Presents alternative mapping schemes to store XML data

• Performance experiments analyze the tradeoffs of the schemes

Approaches to Store Semi-Structured Data

• Special-Purpose Database System
  – Examples are Lore, Rufus and Strudel
  – Store and retrieve XML data using specially designed structures and indices

• Object-Oriented Database
  – Examples are O2 and ObjectStore
  – Rich data modeling capabilities of the OODBMS are exploited

• Standard Relational Database System
  – Data is mapped into tables of a relational schema

Data Model for Semi-Structured Data

• Characteristics of Semi-Structured Data
  – Schema is not given in advance and may be implicit
  – Schema is relatively large and may change frequently
  – Schema is descriptive rather than prescriptive
  – Data is not strongly typed

• Simple graph data model, similar to the OEM model
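To make the labeled-graph view concrete, the following is a minimal sketch that turns a small XML document into a list of labeled edges. The function name `xml_to_graph`, the integer object ids, and the sample document are assumptions made for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_graph(xml_text):
    """Return (root_oid, edges); each edge is (source_oid, label, target).

    The target is an integer oid for nested objects, or a string for values.
    Simplification: text inside elements that also have children or XML
    attributes is ignored.
    """
    next_oid = count(1)
    edges = []

    def visit(elem, oid):
        # XML attributes become labeled edges that point at values.
        for name, value in elem.attrib.items():
            edges.append((oid, name, value))
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # Leaf element: the edge points at a plain value.
                edges.append((oid, child.tag, (child.text or "").strip()))
            else:
                # Inner element: the edge points at another object.
                child_oid = next(next_oid)
                edges.append((oid, child.tag, child_oid))
                visit(child, child_oid)

    root = ET.fromstring(xml_text)
    root_oid = next(next_oid)
    visit(root, root_oid)
    return root_oid, edges


doc = ("<person><name>Mary</name><child><name>Tom</name></child>"
       "<hobby>chess</hobby><hobby>golf</hobby></person>")
root_oid, edges = xml_to_graph(doc)
for edge in edges:
    print(edge)
```

Each printed tuple is one edge of the graph: object 1 (the person) has a `name`, a `child` object, and two `hobby` values.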


Query Language and XML-QL

• All query languages for semi-structured data are based on a labeled graph model

• Features of semi-structured query languages
  – Regular path expressions
  – Ability to query the schema

• In addition, XML-QL provides a restructuring mechanism

Storing XML Data in Relational Database [Mapping Attributes]

• Edge Approach
  – Store all attributes in a single table
  – Edge(source, ordinal, name, flag, target)
  – Indexing for forward and backward traversals
  – Variant of the Edge approach: store attribute names in a separate table
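As an illustration of the Edge approach, here is a minimal sketch using Python's built-in `sqlite3` module. The flag values (`'ref'`, `'string'`), the choice of indexes, and storing values directly in `target` are simplifying assumptions for this example, not the exact design evaluated in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Edge (
        source  INTEGER,  -- oid of the object the edge starts from
        ordinal INTEGER,  -- position of the edge among the source's edges
        name    TEXT,     -- attribute name (edge label)
        flag    TEXT,     -- assumed: 'ref' = target is an oid, 'string' = value
        target  TEXT      -- oid or, in this simplified sketch, the value itself
    );
    CREATE INDEX edge_source ON Edge (source);             -- forward traversal
    CREATE INDEX edge_name_target ON Edge (name, target);  -- backward traversal
""")

edges = [
    (1, 1, "name", "string", "Mary"),
    (1, 2, "child", "ref", "2"),
    (2, 1, "name", "string", "Tom"),
    (1, 3, "hobby", "string", "chess"),
    (1, 4, "hobby", "string", "golf"),
]
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?, ?, ?)", edges)

# Path query "names of the children of Mary": one self-join per path step.
rows = conn.execute("""
    SELECT cn.target
    FROM Edge pn, Edge c, Edge cn
    WHERE pn.name = 'name' AND pn.target = 'Mary'
      AND c.source = pn.source AND c.name = 'child' AND c.flag = 'ref'
      AND cn.source = CAST(c.target AS INTEGER) AND cn.name = 'name'
""").fetchall()
print(rows)  # [('Tom',)]
```

The query shows the main cost of this scheme: every step of a path expression becomes another self-join on the single Edge table.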


• Attribute Approach
  – All attributes with the same name are stored in one table
  – Resembles the binary storage scheme proposed to store semi-structured data
  – A_name(source, ordinal, flag, target)
  – Indexing
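The Attribute approach can be sketched as a horizontal partitioning of the edges by attribute name. In the example below, the table prefix `A_`, the index names, and the flag values are assumptions for illustration, and attribute names are assumed to be safe to use as SQL identifiers.

```python
import sqlite3

edges = [
    (1, 1, "name", "string", "Mary"),
    (1, 2, "child", "ref", "2"),
    (2, 1, "name", "string", "Tom"),
    (1, 3, "hobby", "string", "chess"),
    (1, 4, "hobby", "string", "golf"),
]

conn = sqlite3.connect(":memory:")

# One table per attribute name; the name column is no longer needed.
for name in sorted({edge[2] for edge in edges}):
    table = f"A_{name}"
    conn.execute(
        f'CREATE TABLE "{table}" '
        "(source INTEGER, ordinal INTEGER, flag TEXT, target TEXT)"
    )
    conn.execute(f'CREATE INDEX "{table}_src" ON "{table}" (source)')
    conn.execute(f'CREATE INDEX "{table}_tgt" ON "{table}" (target)')

for source, ordinal, name, flag, target in edges:
    conn.execute(
        f'INSERT INTO "A_{name}" VALUES (?, ?, ?, ?)',
        (source, ordinal, flag, target),
    )

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['A_child', 'A_hobby', 'A_name']

# "Names of the children of Mary": joins touch only the relevant tables.
rows = conn.execute("""
    SELECT cn.target
    FROM A_name pn
    JOIN A_child c ON c.source = pn.source
    JOIN A_name cn ON cn.source = CAST(c.target AS INTEGER)
    WHERE pn.target = 'Mary'
""").fetchall()
print(rows)  # [('Tom',)]
```

Compared with a single table holding every edge, each join here reads only the rows that carry the attribute name in question.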


• Universal Table
  – Single Universal table stores all attributes of the XML document
  – Universal(source, ordinal_n1, flag_n1, target_n1, …)


• Normalized Universal Table
  – Multi-valued attributes are stored in separate Overflow tables
  – UnivNorm(source, ordinal_n1, flag_n1, target_n1, …)
  – Overflow(source, ordinal, flag, target), …
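The split between the UnivNorm table and the Overflow tables can be sketched in plain Python. The function name `normalize`, the use of dictionaries in place of relational tables, and the marker flag `'m'` for multi-valued attributes are assumptions for this illustration; in a real table, attributes an object does not have would be NULL columns.

```python
from collections import defaultdict


def normalize(edges):
    """Split edges into one UnivNorm row per source plus Overflow rows.

    edges: iterable of (source, ordinal, name, flag, target).
    Returns (univ_norm, overflow) where
      univ_norm maps source -> {name: (ordinal, flag, target)}
      overflow  maps name   -> [(source, ordinal, flag, target), ...]
    """
    by_source = defaultdict(lambda: defaultdict(list))
    for source, ordinal, name, flag, target in edges:
        by_source[source][name].append((ordinal, flag, target))

    univ_norm = {}
    overflow = defaultdict(list)
    for source, attrs in by_source.items():
        row = {}
        for name, values in attrs.items():
            if len(values) == 1:
                # Single-valued attribute: kept in the UnivNorm row itself.
                row[name] = values[0]
            else:
                # Assumed convention: flag 'm' marks a multi-valued attribute
                # whose values live in the Overflow table for that name.
                row[name] = (None, "m", None)
                for ordinal, flag, target in values:
                    overflow[name].append((source, ordinal, flag, target))
        univ_norm[source] = row
    return univ_norm, dict(overflow)


edges = [
    (1, 1, "name", "string", "Mary"),
    (1, 2, "child", "ref", 2),
    (2, 1, "name", "string", "Tom"),
    (1, 3, "hobby", "string", "chess"),
    (1, 4, "hobby", "string", "golf"),
]
univ_norm, overflow = normalize(edges)
print(univ_norm[1]["hobby"])  # (None, 'm', None)
print(overflow["hobby"])
```

With this split, each object occupies exactly one UnivNorm row, so an object with several multi-valued attributes no longer multiplies into many rows as it would in the unnormalized Universal table.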

Storing XML Data in Relational Database [Mapping Values]

• Storing values in separate Value tables
  – One Value table per data type, e.g. storing all integers, all dates, or all strings
    • V_type(vid, value)

Storing XML Data in Relational Database [Mapping Values]

• Storing values together with attributes
  – A column for each data type: inlining
  – No flag is needed
  – Indexes are built on every value column separately, in addition to source and target
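The two ways of mapping values can be compared side by side in a small `sqlite3` sketch. The table names `EdgeSep` and `EdgeInl`, the column names `val_int` and `val_string`, and the sample data are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: values in separate tables, reached through flag + target (vid)
    CREATE TABLE EdgeSep (source INTEGER, ordinal INTEGER, name TEXT,
                          flag TEXT, target INTEGER);
    CREATE TABLE Vint    (vid INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE Vstring (vid INTEGER PRIMARY KEY, value TEXT);

    -- Variant 2: values inlined, one column per data type, no flag column
    CREATE TABLE EdgeInl (source INTEGER, ordinal INTEGER, name TEXT,
                          val_int INTEGER, val_string TEXT, target INTEGER);
    CREATE INDEX edgeinl_source ON EdgeInl (source);
    CREATE INDEX edgeinl_target ON EdgeInl (target);
    CREATE INDEX edgeinl_int    ON EdgeInl (val_int);
    CREATE INDEX edgeinl_string ON EdgeInl (val_string);
""")

# Object 1 has name 'Mary', age 36, and a child reference to object 2.
conn.executemany("INSERT INTO EdgeSep VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "name", "string", 1),
    (1, 2, "age", "int", 1),
    (1, 3, "child", "ref", 2),
])
conn.execute("INSERT INTO Vstring VALUES (1, 'Mary')")
conn.execute("INSERT INTO Vint VALUES (1, 36)")

conn.executemany("INSERT INTO EdgeInl VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "name", None, "Mary", None),
    (1, 2, "age", 36, None, None),
    (1, 3, "child", None, None, 2),
])

# Separate value tables: a selection on a value needs an extra join.
separate = conn.execute("""
    SELECT e.source
    FROM EdgeSep e JOIN Vint v ON e.flag = 'int' AND e.target = v.vid
    WHERE e.name = 'age' AND v.value > 30
""").fetchall()

# Inlining: the same selection is a single-table predicate.
inlined = conn.execute("""
    SELECT source FROM EdgeInl WHERE name = 'age' AND val_int > 30
""").fetchall()

print(separate, inlined)  # [(1,)] [(1,)]
```

Inlining trades wider rows with many NULLs for the join that the separate Value tables require on every value predicate.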

Evaluating the Mapping Schemes

• Plan of Attack
  – Size of the relational database for each mapping scheme
  – The time to bulkload the relational database given an XML document
  – The time to reconstruct the XML document from the relational data
  – The time to execute different classes of XML queries
  – The time to execute different kinds of update functions
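To show what reconstructing the XML document involves, the following is a rough sketch that rebuilds a document from Edge-style rows. The function name `rebuild`, the flag value `'ref'`, and the assumption that the data is tree-shaped (no shared or cyclic references) are made for this illustration only.

```python
import xml.etree.ElementTree as ET


def rebuild(edges, root_oid, root_tag):
    """Rebuild an XML string from (source, ordinal, name, flag, target) rows.

    Assumes tree-shaped data: a cycle of 'ref' edges would recurse forever.
    """
    by_source = {}
    for source, ordinal, name, flag, target in edges:
        by_source.setdefault(source, []).append((ordinal, name, flag, target))

    def build(oid, tag):
        elem = ET.Element(tag)
        # The ordinal restores the original order of an object's edges.
        for _, name, flag, target in sorted(by_source.get(oid, []),
                                            key=lambda row: row[0]):
            if flag == "ref":
                elem.append(build(target, name))
            else:
                ET.SubElement(elem, name).text = str(target)
        return elem

    return ET.tostring(build(root_oid, root_tag), encoding="unicode")


edges = [
    (1, 4, "hobby", "string", "golf"),
    (1, 1, "name", "string", "Mary"),
    (2, 1, "name", "string", "Tom"),
    (1, 3, "hobby", "string", "chess"),
    (1, 2, "child", "ref", 2),
]
xml_text = rebuild(edges, root_oid=1, root_tag="person")
print(xml_text)
```

Every object must be looked up by its oid and its edges sorted by ordinal, which hints at why reconstruction requires many lookups or joins when the rows live in relational tables.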


• Experimental Platform
  – Commercial relational database system installed on a Sun SPARCstation 20 with
    • Two 75 MHz processors
    • 128 MB of main memory and a disk that stores the database and intermediate results of query processing
  – Machine runs Solaris 2.6; the size of the main-memory buffer is limited to 6.4 MB
  – Calls to the relational database from the Java programs are implemented with JDBC


• Benchmark Specification
  – Benchmark Database
  – Benchmark Queries
  – Update Functions
  – Database Size
  – Bulkloading Times
  – Reconstructing the XML Document
  – Running Times of the Queries
  – Running Times of the Update Functions

Conclusion

• A relational database has the following advantages
  – Mature and scales very well
  – Traditional and semi-structured data can co-exist in a relational database
  – RDBMSs are capable of performing more complex XML queries on large databases

• Disadvantages
  – Very expensive to reconstruct the original XML data from the relational database
  – Components such as authorization and concurrency control need to be implemented outside the RDBMS

Conclusion (Cont’d)

• Results for the alternative mapping schemes show:
  – Using a separate Attribute table for every attribute name that occurs in an XML document, with values inlined into these Attribute tables, is the best approach
