Querying and storing XML Week 5 Publishing relational data as XML February 12-15, 2013.
Storing and Querying Ordered XML Using a Relational Database System
-
Upload
damian-snow -
Category
Documents
-
view
38 -
download
0
description
Transcript of Storing and Querying Ordered XML Using a Relational Database System
Storing and Querying Ordered XML Using a Relational Database System
By Khang NguyenBased on the paper of Igor Tatarinov and Statis Viglas
Introduction Researchers proposed using relational
database system to decompose XML documents into relations and translate XML queries into SQL queries over these relation.
The paper will answer the question of how (and whether) the ordered XML data model can be effectively supported by the unordered relational data model.
Introduction (Cont.)
The paper proposes three order encoding methods (Global Order, Local Order, and Dewey Order) to represent XML order in the relational data model.
The paper will also answer the question of when and why to use which encoding method.
1. Ordered XML: Data Model, Query Language and Query Dimensions
1.1 The XML Data Model An XML can be viewed as a tree. Leaf Nodes = Data Values (text). Internal Nodes = XML elements. Document Order = the orders of the
elements in the XML document.
1. Ordered XML: Data Model, Query Language and Query Dimensions
1.2 Order in XML Query Language 1.2.1 XPath
is a language for specifying navigation within an XML document.
An XPath expression’s syntax: Path ::= /Step1/Step2/…/Stepn Path ::= /films/film/writers/writer/Murray
Burnett An XPath expression is evaluated
sequentially, “step” by “step”.
1. Ordered XML: Data Model, Query Language and Query Dimensions (Cont.)
1.2.2 XQuery Is a more complex language based on
XPath. Has all the functionalities of XPath. Includes Before and After operators that
take two node sequences (XPath expressions) and
Return the nodes from the first sequence that are before or after some node in the second sequence.
1. Ordered XML: Data Model, Query Language and Query Dimensions (Cont.)
1.3 Evaluation Modes for XML Queries 1.3.1 Select Mode:
The nodes in an input XML document are assumed to have unique identifiers (IDs).
The results of executing an XPath expression is an ordered set of node IDs.
1.3.2 Reconstruct Mode: Combines selection and extraction. The result of evaluating an XPath expression in
reconstruct mode is an ordered set of XML elements.
2. XML Order Encoding Methods
To capture the document order in the relational data model is accomplished by encoding each node’s position in an XML document as a data value.
Unfortunately, there is no single encoding method, which is optimal for both queries and updates.
2. XML Order Encoding Methods (Cont.) 2.1 Global Order Encoding:
Each node is assigned a number that represents the node’s absolute position in the document.
Poor insertion performance is the primary weakness.
To improve updates’ performance, use sparse numbering which does not require the remaining nodes be numbered when deleting XML fragments.
Global Order Encoding
play
title act act
1
2 4
#text
title scene
#text
3 5
6
7
Global Order
2. XML Order Encoding Methods (Cont.) 2.3 Local (Sibling) Order Encoding:
Each node is assigned a number that represents its relative position among its siblings.
Combining a node’s position with that of its ancestors yields a path vector that uniquely identifies the absolute position of the node within the document.
The advantage is the low overhead incurred by updates.
Local Order Encoding
play
title act act
1
1 2
#text
title scene
#text
1 1
1
2
Local Order
3
2. XML Order Encoding Methods (Cont.) 2.3 Dewey Order Encoding:
Each node is assigned a vector that represents the path from the document’s root to the node.
Is “lossless” because each path uniquely identifies the absolute position of the node within the document.
One potential disadvantage is the extra space required to store paths from the root to each node.
Dewey Order Encoding
play
title act act
1
1.1 1.2
#text
title scene
#text
1.1.1 1.2.1
1.2.1.1
1.2.2
Dewey Order
1.3
3. Shredding Ordered XML into Relations 3.1 The Schema-less Case:
The schema of input document is unknown. The Edge shredding approach is proposed. A single relation, the Edge table, is used to
store an entire document. The Edge table is defined
Edge(id, parent, name, value) To reduce storage, a separate relation (the
Path table) can be used to store paths and their identifiers.
3. Shredding Ordered XML into Relations (Cont.)
3.1.1 Storing Order Information Global Order
Edge(id, parent_id, end_des_id, path_id, value)
Local Order Edge(id, parent_id, sIndex, path_id,
value) Dewey Order
Edge(dewey, path_id, value)
3. Shredding Ordered XML into Relations (Cont.) 3.2 The Schema-aware Case:
Inlining shredding technique can be used when an XML Schema (DTD) is known.
One advantage is the possibility of more efficient navigation from an element to its sub-elements.
Another advantage is Inlining shreds and stores XML documents into a set of tables. Queries tend to access less data and perform better.
Summary Relational database systems can support
most ordered XML queries efficiently. Global Order is best for query-mostly
workloads. Dewey Order is best for a mix of queries and updates. Local Order is best for update-intensive environments.
Edge and Inlining methods are well known for translating ordered XML queries into SQL queries.