XML To Relational Model. Key Index – Forward Traversal Backward Traversal.

Post on 15-Jan-2016


XML To Relational Model

Key Index – Forward Traversal

Backward Traversal

Binary Approach

Bname(source, ordinal, flag, target)

Create as many tables as there are different subelement and attribute names in the XML document

Partition the Edge table by name

Universal table – Take the outer join of all binary tables
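
The partitioning step can be sketched in a few lines. This is a minimal illustration, assuming the Edge table carries a name column in addition to the Bname columns (source, ordinal, name, flag, target); the node ids and flag values in the sample rows are hypothetical and only show the shape of the data.

```python
from collections import defaultdict

def partition_by_name(edges):
    """Split Edge rows (source, ordinal, name, flag, target) into one
    binary table per name, each holding (source, ordinal, flag, target)."""
    tables = defaultdict(list)
    for source, ordinal, name, flag, target in edges:
        tables[name].append((source, ordinal, flag, target))
    return dict(tables)

# Hypothetical Edge rows for part of the Skynet Hitech document.
edges = [
    (1, 1, "Name", "string", "Skynet Hitech"),
    (1, 2, "Department", "ref", 2),
    (1, 3, "Department", "ref", 3),
    (2, 1, "Name", "string", "Research"),
    (2, 2, "Manager", "string", "John Smith"),
]

tables = partition_by_name(edges)
for name in sorted(tables):
    print("B" + name, tables[name])
```

Each key of `tables` plays the role of one Bname table; the universal table would then be the outer join of these tables on source.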

Universal Table with Overflow

Converting Ordered XML to Relations

Skynet Hitech. Company

<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>

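Ordered XML model for Skynet Hitech. Company

[Figure: ordered tree. The root Company has the children Name (Skynet Hitech), Department and Department. Each Department has the children Name, Manager and Employee, in that order (Research, John Smith, Tom Jackson; Sales, Linda White, Kevin Lee). The edges carry ordinal labels 1, 2, 3.]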

Schema of the storing table

Attributes

ID: the unique index for each tuple

DID: the document ID

Path: the path from the root to the leaf node; this is used to find a particular node

Surrogate Pattern: number representation of nodes

Value: text value associated with each node

Numbering nodes

[Figure: the same tree with the nodes on the path to “Linda White” numbered: Company 1[1], the second Department 2[2], Manager 2[1].]

Tuple that stores “Linda White”

ID: 00334

DID: 501

Path: Company/Department/Manager

Surrogate Pattern: 1[1]2[2]2[1]

Value: Linda White

Old Skynet file stored in the RDBMS

OLD  

Path                          Surrogate Pattern   Value
Company/Name                  1[1]1[1]            Skynet Hitech
Company/Department/Name       1[1]2[1]1[1]        Research
Company/Department/Manager    1[1]2[1]2[1]        John Smith
Company/Department/Employee   1[1]2[1]3[1]        Tom Jackson
Company/Department/Name       1[1]2[2]1[1]        Sales
Company/Department/Manager    1[1]2[2]2[1]        Linda White
Company/Department/Employee   1[1]2[2]3[1]        Kevin Lee
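
The rows above can be reproduced with a short sketch. The numbering rule is an assumption inferred from the example, not stated on the slides: within a parent, each distinct child name gets a number by order of first appearance, and the bracketed index counts occurrences of that name among its siblings. The function name `leaf_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

SKYNET_XML = """
<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>
"""

def leaf_rows(xml_text):
    """Return (path, surrogate pattern, value) for every leaf element."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path, pattern):
        children = list(elem)
        if not children:
            rows.append((path, pattern, (elem.text or "").strip()))
            return
        name_no = {}  # tag -> number by first appearance among siblings
        seen = {}     # tag -> occurrences of that tag so far
        for child in children:
            name_no.setdefault(child.tag, len(name_no) + 1)
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = f"{name_no[child.tag]}[{seen[child.tag]}]"
            walk(child, f"{path}/{child.tag}", pattern + step)

    walk(root, root.tag, "1[1]")  # the root is assumed to be numbered 1[1]
    return rows

rows = leaf_rows(SKYNET_XML)
for path, pattern, value in rows:
    print(path, pattern, value)
```

Under this rule the second Department is 2[2] and its Manager is 2[1], which gives the pattern 1[1]2[2]2[1] stored with “Linda White”.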

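[Figure: DTD graph. Element nodes: book, booktitle, article, monograph, title, author, contactauthor, editor, name, address, firstname, lastname. Attribute nodes: authorID, authorid, name. Operator nodes “*” and “?” mark repeated and optional children.]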

<!ELEMENT book (booktitle, author)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>
<!ELEMENT article (title, author*, contactauthor)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT contactauthor EMPTY>
<!ATTLIST contactauthor authorID IDREF #IMPLIED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (monograph*)>
<!ATTLIST editor name CDATA #REQUIRED>

Basic Inline Algorithm

A relation is created for the root element of the element graph

All of the element’s descendants are inlined into that relation, with two exceptions:

Children below a “*” node are made into separate relations; this corresponds to creating a new relation for a set-valued child

Each node having a backpointer edge pointing to it is made into a separate relation

Drawbacks

Grossly inefficient for many queries: “List all authors having first name Jack” will have to be executed as the union of 5 separate queries

Creates a large number of relations

To determine the set of relations to be created for an element, we construct an element graph by…

Do a DFS traversal of the DTD graph, starting at the element node for which we are constructing relations

Each node is marked as “visited” the first time it is reached and is unmarked once all its children have been traversed

If an unmarked node in the DTD graph is reached during the DFS, a new node bearing the same name is created in the element graph

A regular edge is created from the most recently created node in the element graph with the same name as the DFS parent of the current DTD node to the newly created node

If an attempt is made to traverse an already marked DTD node, a backpointer edge is added from the most recently created node in the element graph to the most recently created node in the element graph with the same name as the marked DTD node
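
The traversal can be sketched as follows. This is a simplified illustration, not the original implementation: the DTD graph is assumed to be a plain dictionary from element name to child names, operator nodes (“*”, “?”) and attributes are left out, and the function name `build_element_graph` is hypothetical.

```python
def build_element_graph(dtd, start):
    """DFS over a DTD graph; returns (nodes, regular_edges, backpointers).

    nodes[i] is the name of element-graph node i; edges are (from, to)
    pairs of node indexes.
    """
    nodes, regular, back = [], [], []
    latest = {}     # name -> most recently created element-graph node
    marked = set()  # DTD nodes currently on the DFS path

    def visit(name, parent):
        node = len(nodes)
        nodes.append(name)            # new node bearing the same name
        if parent is not None:
            regular.append((latest[parent], node))
        latest[name] = node
        marked.add(name)              # marked when first reached
        for child in dtd.get(name, []):
            if child in marked:       # already marked: add a backpointer
                back.append((latest[name], latest[child]))
            else:
                visit(child, name)
        marked.discard(name)          # unmarked once children are done

    visit(start, None)
    return nodes, regular, back

# Simplified DTD graph for the monograph part of the example DTD.
dtd = {
    "monograph": ["title", "author", "editor"],
    "author": ["name", "address"],
    "name": ["firstname", "lastname"],
    "editor": ["monograph"],
}

nodes, regular, back = build_element_graph(dtd, "monograph")
print(nodes)
print(back)
```

Starting at monograph, the only backpointer runs from the editor node back to the monograph node, because monograph is still marked when editor tries to traverse it.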

Fragmentation: Example

Results in 5 relations

Just retrieving first and last names of an author requires three joins!

<!ELEMENT author (name, address)>

<!ATTLIST author id ID #REQUIRED>

<!ELEMENT name (firstname?, lastname)>

<!ELEMENT firstname (#PCDATA)>

<!ELEMENT lastname (#PCDATA)>

<!ELEMENT address ANY>

author (authorID: integer, id: string)

name (nameID: integer, authorID: integer)

firstname (firstnameID: integer, nameID: integer, value: string)

lastname (lastnameID: integer, nameID: integer, value: string)

address (addressID: integer, authorID: integer, value: string)
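
To make the three joins concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The table and column names follow the five relations above; the inserted row values (author id 'a1', the name Jack Smith) are invented sample data for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author    (authorID INTEGER PRIMARY KEY, id TEXT);
CREATE TABLE name      (nameID INTEGER PRIMARY KEY, authorID INTEGER);
CREATE TABLE firstname (firstnameID INTEGER PRIMARY KEY, nameID INTEGER, value TEXT);
CREATE TABLE lastname  (lastnameID INTEGER PRIMARY KEY, nameID INTEGER, value TEXT);
CREATE TABLE address   (addressID INTEGER PRIMARY KEY, authorID INTEGER, value TEXT);

-- Invented sample data: one author with a first and a last name.
INSERT INTO author    VALUES (1, 'a1');
INSERT INTO name      VALUES (10, 1);
INSERT INTO firstname VALUES (100, 10, 'Jack');
INSERT INTO lastname  VALUES (200, 10, 'Smith');
""")

# author -> name -> firstname / lastname: three joins for two strings.
# firstname is optional in the DTD, hence the LEFT JOIN.
name_rows = conn.execute("""
    SELECT f.value, l.value
    FROM author a
    JOIN name n           ON n.authorID = a.authorID
    LEFT JOIN firstname f ON f.nameID = n.nameID
    JOIN lastname l       ON l.nameID = n.nameID
    WHERE a.id = 'a1'
""").fetchall()
print(name_rows)
```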

Shared Inlining Method

Relations are created for…

All elements in the DTD graph whose nodes have an in-degree greater than one (nodes with an in-degree of one are inlined)

Elements that have an in-degree of zero

Elements below a “*” node

Of mutually recursive elements all having in-degree one, one of them is made a separate relation

Each element node X that is a separate relation inlines all nodes Y that are reachable from it such that the path from X to Y does not contain a node that is to be made a separate relation
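
The selection rules can be sketched as a small function. This is a simplified reading under stated assumptions: the DTD graph is given as (parent, child, starred) edges between elements, so the “*” operator is a flag on the edge rather than a node of its own, attributes are ignored, and the name `shared_relations` is hypothetical.

```python
def shared_relations(elements, edges):
    """Return the set of elements that get their own relation."""
    indeg = {e: 0 for e in elements}
    children = {e: [] for e in elements}
    starred = set()
    for parent, child, star in edges:
        indeg[child] += 1
        children[parent].append(child)
        if star:
            starred.add(child)  # element sits below a "*" node

    # In-degree zero, in-degree greater than one, or below a "*".
    rel = {e for e in elements if indeg[e] == 0 or indeg[e] > 1} | starred

    def on_inlined_cycle(start):
        """True if start reaches itself without crossing a relation."""
        stack, seen = list(children[start]), set()
        while stack:
            n = stack.pop()
            if n == start:
                return True
            if n in rel or n in seen:
                continue
            seen.add(n)
            stack.extend(children[n])
        return False

    # Mutually recursive elements: break each remaining cycle once.
    for e in elements:
        if e not in rel and on_inlined_cycle(e):
            rel.add(e)
    return rel

ELEMENTS = ["book", "booktitle", "author", "name", "firstname", "lastname",
            "address", "article", "title", "contactauthor", "monograph", "editor"]
EDGES = [
    ("book", "booktitle", False), ("book", "author", False),
    ("author", "name", False), ("author", "address", False),
    ("name", "firstname", False), ("name", "lastname", False),
    ("article", "title", False), ("article", "author", True),
    ("article", "contactauthor", False),
    ("monograph", "title", False), ("monograph", "author", False),
    ("monograph", "editor", False), ("editor", "monograph", True),
]

relations = shared_relations(ELEMENTS, EDGES)
print(sorted(relations))
```

For the example DTD this selects book, article, monograph, title and author; every other element is inlined into the nearest of these relations above it.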

Issues with Sharing Elements

Parent of elements not fixed at schema level

Need to store the type and ids of parents

parentCODE field (type of parent)

parentID field (id of parent)

No foreign key relationship

Hybrid

Same as Shared, except that it also inlines some elements that are not inlined in Shared

Inlines elements with an in-degree greater than one that are not recursive or reached through a “*” node

Set sub-elements and recursive elements are treated as in Shared

book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle : string)

article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string)

monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer, monograph.editor.isroot: boolean, monograph.editor.name: string)

title (titleID: integer, title.parentID: integer, title.parentCODE: integer, title: string)

author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean, author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean, author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)

Shared Inline

Hybrid