Introduction To XML Algebra
![Page 1: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/1.jpg)
1
Introduction To XML Algebra
Wan Liu, Bintou Kane
Advanced Database
Instructor: Elka
2/11/2002
![Page 2: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/2.jpg)
2
Outline
Reasons for XML algebra
Niagara algebra
AT&T algebra
![Page 3: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/3.jpg)
3
Data Model and Design
We need a clear framework to design a database.
A data model plays the role that choosing appropriate data structures plays in programming: it is an abstract type system.
A relational database is implemented with tables; XML is a newer format used for information integration.
![Page 4: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/4.jpg)
4
Why XML Algebra?
It is common to translate a query language into an algebra.
First, the algebra gives a semantics for the query language.
Second, the algebra supports query optimization.
![Page 5: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/5.jpg)
5
XML Algebra History
Lore Algebra (August 1999) -- Stanford University
IBM Algebra (September 1999) -- Oracle; IBM; Microsoft Corp
YAT Algebra (May 2000)
AT&T Algebra (June 2000) -- AT&T; Bell Labs
Niagara Algebra (2001) -- University of Wisconsin-Madison
![Page 6: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/6.jpg)
6
NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.
![Page 7: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/7.jpg)
7
Outline
Concepts of Niagara Algebra
Operations
Optimization
![Page 8: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/8.jpg)
8
Goals of Niagara Algebra
Be independent of schema information
Query on both structure and content
Generate simple, flexible, yet powerful algebraic expressions
Allow re-use of traditional optimization techniques
![Page 9: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/9.jpg)
9
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
<invoice No="1">
<account_number>2 </account_number>
<carrier>AT&amp;T</carrier>
<total>$0.25</total>
</invoice>
<invoice>
<account_number>1 </account_number>
<carrier>Sprint</carrier>
<total>$1.20</total>
</invoice>
<invoice>
<account_number>1 </account_number>
<carrier>AT&amp;T</carrier>
<total>$0.75</total>
</invoice>
</Invoice_Document>
Customer.xml
<Customer_Document>
<customer>
<account>1 </account>
<name>Tom </name>
</customer>
<customer>
<account>2 </account>
<name>George </name>
</customer>
</Customer_Document>
![Page 10: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/10.jpg)
10
XML Data Model and Tree Graph
Example: Invoice_Document as an ordered tree
Invoice_Document
  invoice: number = 2, carrier = AT&T, total = $0.25
  invoice: number = 1, carrier = Sprint, total = $1.20
  …
<Invoice_Document>
<invoice> <number>2</number> <carrier>AT&amp;T</carrier> <total>$0.25</total> </invoice>
<invoice> <number>1</number> <carrier>Sprint</carrier> <total>$1.20</total> </invoice>
</Invoice_Document>
Ordered tree graph, semi-structured data
![Page 11: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/11.jpg)
11
XML Data Model [GVDNM01]
A collection of bags of vertices. Vertices in a bag have no order.
Example: a bag with three vertices
Root “invoice.xml”
invoice: <invoice> invoice-element-content </invoice>
invoice.account_number: <account_number> element-content </account_number>
[Root “invoice.xml”, invoice, invoice.account_number]
![Page 12: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/12.jpg)
12
Data Model
Bag elements are reachable by path expressions.
A path expression consists of two parts:
An entry point
A relative forward part
Example: account_number:invoice (entry point invoice, relative forward part account_number)
![Page 13: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/13.jpg)
13
Operators
Source (S), Follow, Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product
![Page 14: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/14.jpg)
14
Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
S (*): all known XML documents
S (invoice*.xml): all XML documents whose filename matches “invoice*.xml”
S (*,schema.dtd): all known XML documents that conform to schema.dtd
![Page 15: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/15.jpg)
15
Follow operator
Input: a path expression in entry point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of an unnesting follow)
![Page 16: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/16.jpg)
16
Follow operator (Example*)
Input: {[Root invoice.xml, invoice]}
invoice: <invoice> invoice-element-content </invoice>
Operation: Φμ(carrier:invoice)
*Unnesting follow
Output: {[Root invoice.xml, invoice, invoice.carrier]}
invoice: <invoice> invoice-element-content </invoice>
invoice.carrier: <carrier> carrier-element-content </carrier>
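The unnesting follow in this example can be sketched in Haskell on a small in-memory model. This is a minimal sketch under stated assumptions, not the Niagara implementation: the names Vertex, Bag, unnestFollow and the sample data are hypothetical, the Root vertex is left out of the bags, and a bag whose entry-point vertex has no matching child is assumed to produce no output bag.

```haskell
-- Hypothetical in-memory model of bags; not Niagara's actual data structures.
data Vertex = Vertex
  { tag     :: String    -- element name
  , kids    :: [Vertex]  -- child elements
  , content :: String    -- text content ("" for inner elements)
  } deriving (Eq, Show)

-- A bag pairs each vertex with the label it is reachable under.
type Bag = [(String, Vertex)]

-- unnestFollow child entry: for every bag, look up the vertex labelled
-- `entry` and emit one extended bag per child element named `child`.
-- Assumption: bags without a matching child produce no output bag.
unnestFollow :: String -> String -> [Bag] -> [Bag]
unnestFollow child entry bags =
  [ bag ++ [(entry ++ "." ++ child, c)]
  | bag <- bags
  , Just v <- [lookup entry bag]
  , c <- kids v
  , tag c == child
  ]

leaf :: String -> String -> Vertex
leaf t s = Vertex t [] s

-- Two sample invoices; the second one has no carrier element.
invoices :: [Bag]
invoices =
  [ [("invoice", Vertex "invoice" [leaf "account_number" "2", leaf "carrier" "AT&T"] "")]
  , [("invoice", Vertex "invoice" [leaf "account_number" "1"] "")]
  ]

-- Labels of the bags produced by following carrier from invoice.
followedLabels :: [[String]]
followedLabels = map (map fst) (unnestFollow "carrier" "invoice" invoices)
```

With this sample data only the first invoice has a carrier, so followedLabels contains a single bag labelled invoice and invoice.carrier.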
![Page 17: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/17.jpg)
17
Select operator
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: a set of bags that conform to the predicate
Predicate: built from logical operators and simple qualifications (comparisons such as invoice.carrier = Sprint)
![Page 18: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/18.jpg)
18
Select operator (Example)
Predicate: invoice.carrier = Sprint
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
Each invoice vertex: <invoice> invoice-element-content </invoice>
Output: {[Root invoice.xml, invoice], …}
![Page 19: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/19.jpg)
19
Join operator
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate
![Page 20: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/20.jpg)
20
Join operator (Example)
Predicate: account_number:invoice = number:customer
Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
invoice: <invoice> invoice-element-content </invoice>
customer: <customer> customer-element-content </customer>
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
![Page 21: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/21.jpg)
21
Expose operator
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order
![Page 22: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/22.jpg)
22
Expose operator (Example)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Parameter list: (bill_period, carrier)
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
![Page 23: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/23.jpg)
23
Vertex operator
Creates the actual XML vertex that will encompass everything created by an expose operator
Example :
(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
![Page 24: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/24.jpg)
24
Other operators
Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the group operator.
Rename: changes the entry point annotation of the elements of a bag.
Example: (invoice.bill_period,date)
![Page 25: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/25.jpg)
25
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
<invoice>
<account_number>2 </account_number>
<carrier>AT&amp;T</carrier>
<total>$0.25</total>
</invoice>
<invoice>
<account_number>1 </account_number>
<carrier>Sprint</carrier>
<total>$1.20</total>
</invoice>
<invoice>
<account_number>1 </account_number>
<total>$0.75</total>
</invoice>
<auditor> maria </auditor>
</Invoice_Document>
Customer.xml
<Customer_Document>
<customer>
<account>1 </account>
<name>Tom </name>
</customer>
<customer>
<account>2 </account>
<name>George </name>
</customer>
</Customer_Document>
![Page 26: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/26.jpg)
26
XQuery Example
List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and
      $i/account_number = $c/account
return
<Sprint_invoices>
{ $i/account_number,
  $c/name,
  $i/total }
</Sprint_invoices>
![Page 27: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/27.jpg)
27
Example: XQuery output
<Sprint_invoices>
<account_number>1 </account_number>
<name>Tom </name>
<total>$1.20</total>
</Sprint_invoices>
![Page 28: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/28.jpg)
28
Algebra Tree Execution
Execution order, from the leaves of the tree to the root:
Source (invoices.xml), then Follow (*.invoice): invoice(1), invoice(2), invoice(3)
Select (carrier = "Sprint"): invoice(2)
Source (customers.xml), then Follow (*.customer): customer(1), customer(2)
Join (*.invoice.account_number = *.customer.account): invoice(2), customer(1)
Expose (*.account_number, *.name, *.total): account_number, name, total
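The plan on this slide can be replayed on in-memory data. The following is a rough Haskell sketch, not Niagara code: the Elem type, the helper names (field, get, mkElem, sameAccount) and the simplified operator signatures are assumptions made for illustration, Source and Follow are folded into one function, and the sample values are trimmed of the trailing spaces found in the documents.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical element model and operator names, for illustration only.
data Elem = Elem String [Elem] | Txt String
  deriving (Eq, Show)

type Bag = [(String, Elem)]  -- (entry-point label, vertex)

children :: Elem -> [Elem]
children (Elem _ cs) = cs
children (Txt _)     = []

-- Text of the first child element with the given tag, if any.
field :: String -> Elem -> Maybe String
field t e = listToMaybe [ s | Elem t' [Txt s] <- children e, t' == t ]

-- Value reached from an entry point of a bag through one child tag.
get :: String -> String -> Bag -> Maybe String
get entry t bag = lookup entry bag >>= field t

-- Source + Follow: one singleton bag per child element named t.
follow :: String -> Elem -> [Bag]
follow t doc = [ [(t, e)] | e@(Elem t' _) <- children doc, t' == t ]

select :: (Bag -> Bool) -> [Bag] -> [Bag]
select = filter

join :: (Bag -> Bag -> Bool) -> [Bag] -> [Bag] -> [Bag]
join p ls rs = [ l ++ r | l <- ls, r <- rs, p l r ]

expose :: [(String, String)] -> Bag -> [Maybe String]
expose paths bag = [ get entry t bag | (entry, t) <- paths ]

mkElem :: String -> [(String, String)] -> Elem
mkElem name fields = Elem name [ Elem k [Txt v] | (k, v) <- fields ]

invoiceDoc, customerDoc :: Elem
invoiceDoc = Elem "Invoice_Document"
  [ mkElem "invoice" [("account_number", "2"), ("carrier", "AT&T"),   ("total", "$0.25")]
  , mkElem "invoice" [("account_number", "1"), ("carrier", "Sprint"), ("total", "$1.20")]
  , mkElem "invoice" [("account_number", "1"), ("carrier", "AT&T"),   ("total", "$0.75")]
  ]
customerDoc = Elem "Customer_Document"
  [ mkElem "customer" [("account", "1"), ("name", "Tom")]
  , mkElem "customer" [("account", "2"), ("name", "George")]
  ]

sameAccount :: Bag -> Bag -> Bool
sameAccount i c =
  case (get "invoice" "account_number" i, get "customer" "account" c) of
    (Just a, Just b) -> a == b
    _                -> False

-- Select before Join, then Expose, as in the algebra tree.
plan :: [[Maybe String]]
plan =
  map (expose [("invoice", "account_number"), ("customer", "name"), ("invoice", "total")])
    (join sameAccount
      (select (\b -> get "invoice" "carrier" b == Just "Sprint") (follow "invoice" invoiceDoc))
      (follow "customer" customerDoc))
```

On this data plan evaluates to a single row holding account number 1, the name Tom and the total $1.20, which agrees with the query output shown earlier.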
![Page 29: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/29.jpg)
29
Optimization with Niagara
Optimizer based on the Niagara algebra
Use the operations more efficiently
Produce simpler expressions by combining operations
![Page 30: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/30.jpg)
30
Language Convention
A and B are path expressions
A < B --- path expression A is a prefix of B
AnB --- common prefix of paths A and B
AńB --- greatest common prefix of paths A and B
┴ --- null path expression
![Page 31: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/31.jpg)
31
Use of Rule 8.5
Taking advantage of Rule 8.5:
It allows optimization based on path selectivity
when applying the unnesting follow operation Φμ
![Page 32: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/32.jpg)
32
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
True when there exists C such that C < A and C < B, with C = AńB,
or when AnB = ┴
Interchangeability of the follow operation
![Page 33: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/33.jpg)
33
Application of 8.5 With Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *
?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **
Both Share the common prefix invoice
Case AńB = invoice
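The interchange of the two follow operations can be checked on sample data. The sketch below is an illustration under assumptions rather than the rule's formal statement: Vertex, Bag, unnestFollow and normalize are hypothetical names, the third invoice lacks a carrier as in the second sample document, a follow is assumed to drop bags that have no matching child, and bags are compared after sorting because their vertices are unordered.

```haskell
import Data.List (sort)

-- Hypothetical in-memory model; not Niagara's actual data structures.
data Vertex = Vertex
  { tag  :: String
  , kids :: [Vertex]
  } deriving (Eq, Ord, Show)

type Bag = [(String, Vertex)]

-- Unnesting follow: one extended bag per child of `entry` named `child`.
-- Assumption: bags without such a child produce no output bag.
unnestFollow :: String -> String -> [Bag] -> [Bag]
unnestFollow child entry bags =
  [ bag ++ [(entry ++ "." ++ child, c)]
  | bag <- bags
  , Just v <- [lookup entry bag]
  , c <- kids v
  , tag c == child
  ]

invoiceWith :: [String] -> Bag
invoiceWith ts = [("invoice", Vertex "invoice" [ Vertex t [] | t <- ts ])]

-- Three invoices; the third has no carrier element.
invoices :: [Bag]
invoices =
  [ invoiceWith ["account_number", "carrier", "total"]
  , invoiceWith ["account_number", "carrier", "total"]
  , invoiceWith ["account_number", "total"]
  ]

-- Expression *: the inner follow on carrier runs first.
carrierFirst :: [Bag]
carrierFirst =
  unnestFollow "account_number" "invoice" (unnestFollow "carrier" "invoice" invoices)

-- Expression **: the inner follow on account_number runs first.
accountFirst :: [Bag]
accountFirst =
  unnestFollow "carrier" "invoice" (unnestFollow "account_number" "invoice" invoices)

-- Bags are unordered, so compare collections after sorting.
normalize :: [Bag] -> [Bag]
normalize = sort . map sort

sameResult :: Bool
sameResult = normalize carrierFirst == normalize accountFirst

-- Sizes of the intermediate collections after the inner follow of * and **.
intermediateSizes :: (Int, Int)
intermediateSizes =
  ( length (unnestFollow "carrier" "invoice" invoices)
  , length (unnestFollow "account_number" "invoice" invoices) )
```

Here sameResult is True, and intermediateSizes is (2, 3): both orders give the same final bags, but following the optional carrier first hands fewer bags to the second follow.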
![Page 34: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/34.jpg)
34
Benefit of Rule Application
Note: if acc_Num is required for each invoice element
and carrier is not required for an invoice element,
then using *
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]
makes more sense than **. Why?
![Page 35: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/35.jpg)
35
Reduction of input size on the first sub-operation
Φμ(carrier:invoice)
Should we, or can we, apply Rule 8.5 below?
Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]
Why?
![Page 36: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/36.jpg)
36
acc_Num:invoice and acc_Num:Customer are totally different paths
The case is AnB = ┴
Then yes
![Page 37: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/37.jpg)
37
Rules 8.7, 8.9, 8.11
Interesting: they help identify when and where to use selection to decrease the size of the input to a subsequent operation.
Example: in the algebra tree on slide 28, the selection is applied before the join.
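The effect of selecting before joining can be illustrated with plain Haskell lists. This is a simplified sketch, not the rules themselves: invoices and customers are flattened into tuples, all names are hypothetical, and the join is a naive nested loop so that the number of pairs it tests is easy to count.

```haskell
-- (account number, carrier, total) and (account, name); sample values only.
type Invoice  = (Int, String, Double)
type Customer = (Int, String)

invoices :: [Invoice]
invoices = [(2, "AT&T", 0.25), (1, "Sprint", 1.20), (1, "AT&T", 0.75)]

customers :: [Customer]
customers = [(1, "Tom"), (2, "George")]

isSprint :: Invoice -> Bool
isSprint (_, carrier, _) = carrier == "Sprint"

sameAccount :: Invoice -> Customer -> Bool
sameAccount (a, _, _) (a', _) = a == a'

-- Join first, select afterwards: the join examines every pair.
joinThenSelect :: [(Invoice, Customer)]
joinThenSelect =
  filter (isSprint . fst) [ (i, c) | i <- invoices, c <- customers, sameAccount i c ]

-- Select first: only the Sprint invoices reach the join.
selectThenJoin :: [(Invoice, Customer)]
selectThenJoin =
  [ (i, c) | i <- filter isSprint invoices, c <- customers, sameAccount i c ]

-- Number of (invoice, customer) pairs the join of each plan has to test.
pairsTested :: (Int, Int)
pairsTested =
  ( length invoices * length customers
  , length (filter isSprint invoices) * length customers )
```

Both plans return the same single pair (the Sprint invoice with Tom), while pairsTested is (6, 2), which is the reduction in join input that the selection rules aim for.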
![Page 38: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/38.jpg)
38
A useful addition would be
a procedure that determines automatically when a rule can be applied in a given case and then applies it.
![Page 39: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/39.jpg)
39
AT&T Algebra
![Page 40: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/40.jpg)
40
![Page 41: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/41.jpg)
41
AT&T Algebra Introduction
The algebra is derived from the nested relational algebra.
AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community.
AT&T algebra uses the functional programming language Haskell as a notation for presenting the algebra.
![Page 42: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/42.jpg)
42
AT&T data model
The data model merges attribute and element nodes, and eliminates comments.
Declare basic type: Node.
text :: String -> Node
elem :: Tag -> [Node] -> Node
ref :: Node -> Node
<bib>
<book year="1999">
<title>Data on the Web</title>
<year>1999</year>
</book>
</bib>
elem "bib" [
elem "book" [
elem "@year" [ text "1999" ],
elem "title" [ text "Data on the Web" ] ]]
![Page 43: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/43.jpg)
43
Basic Type Declarations
To find the type of a node:
isText :: Node -> Bool
isElem :: Node -> Bool
isRef :: Node -> Bool
For a text node:
string :: Node -> String
For an element node:
tag :: Node -> Tag
children :: Node -> [Node]
For a reference node:
dereference :: Node -> Node
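The declarations above give only type signatures. One possible way to make them concrete in Haskell is sketched here; the constructor names TextNode, ElemNode and RefNode, the choice of String for Tag, and the error behaviour on the wrong kind of node are assumptions that the slides do not specify. Prelude's own elem is hidden so that the slide's name can be reused.

```haskell
import Prelude hiding (elem)

type Tag = String  -- assumption: tags are plain strings

-- Hypothetical concrete representation of the abstract Node type.
data Node
  = TextNode String
  | ElemNode Tag [Node]
  | RefNode Node
  deriving (Eq, Show)

-- Constructors with the signatures declared on the slide.
text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

ref :: Node -> Node
ref = RefNode

-- Tests for the kind of a node.
isText, isElem, isRef :: Node -> Bool
isText (TextNode _) = True
isText _            = False
isElem (ElemNode _ _) = True
isElem _              = False
isRef (RefNode _) = True
isRef _           = False

-- Accessors; each is partial and fails on the wrong kind of node.
string :: Node -> String
string (TextNode s) = s
string _            = error "string: not a text node"

tag :: Node -> Tag
tag (ElemNode t _) = t
tag _              = error "tag: not an element node"

children :: Node -> [Node]
children (ElemNode _ ns) = ns
children _               = error "children: not an element node"

dereference :: Node -> Node
dereference (RefNode n) = n
dereference _           = error "dereference: not a reference node"

-- The bibliography example, with the year attribute merged in as "@year".
bib0 :: Node
bib0 =
  elem "bib"
    [ elem "book"
        [ elem "@year" [text "1999"]
        , elem "title" [text "Data on the Web"]
        ]
    ]
```

For example, tag bib0 is "bib", and the tags of the book's children are "@year" and "title", which shows the attribute sitting alongside ordinary child elements.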
![Page 44: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/44.jpg)
44
Nested relational algebra…
In the nested relational approach, data is composed of tuples and lists.
Tuple values and tuple types are written in round brackets.
(1999,"Data on the Web",["Abiteboul"]) :: (Int,String,[String])
Decompose values:
year :: (Int,String,[String]) -> Int
year (x,y,l) = x
![Page 45: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/45.jpg)
45
Nested relational algebra…
Comprehensions: list comprehensions can be used to express fundamental query operations: navigation, Cartesian product, nesting, and joins.
Example:
[ value x | x <- children book0, is "author" x ]
==> [ "Abiteboul" ]
General form: [ exp | qual1,...,qualn ], where each qualifier is either a boolean expression (bool-exp) or a generator (pat <- list-exp)
![Page 46: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/46.jpg)
46
Nested relational algebra…
Using comprehensions to write queries.
Navigate
follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]
Cartesian product
[ (value y, value z) | x <- follow "book" bib0, y <- follow "title" x, z <- follow "author" x ]
==> [ ("Data on the Web", "Abiteboul") ]
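The navigation and Cartesian product queries can be run as ordinary Haskell once the helper functions have definitions. The sketch below supplies assumed ones: the slides do not define is and value, so here is tests an element's tag and value concatenates an element's text children; the constructor names and the single author in bib0 are also illustrative choices.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Hypothetical concrete nodes (reference nodes omitted for brevity).
data Node = TextNode String | ElemNode Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

children :: Node -> [Node]
children (ElemNode _ ns) = ns
children (TextNode _)    = []

-- is t n: n is an element with tag t (assumed meaning).
is :: Tag -> Node -> Bool
is t (ElemNode t' _) = t == t'
is _ _               = False

-- value n: concatenated text children of n (assumed meaning).
value :: Node -> String
value n = concat [ s | TextNode s <- children n ]

-- Navigation, exactly as written on the slide.
follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

bib0 :: Node
bib0 =
  elem "bib"
    [ elem "book"
        [ elem "@year"  [text "1999"]
        , elem "title"  [text "Data on the Web"]
        , elem "author" [text "Abiteboul"]
        ]
    ]

-- Cartesian product of the titles and authors of each book.
titleAuthorPairs :: [(String, String)]
titleAuthorPairs =
  [ (value y, value z)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "author" x
  ]
```

With one title and one author, titleAuthorPairs holds the single pair ("Data on the Web", "Abiteboul"); a book with several authors would contribute one pair per author.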
![Page 47: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/47.jpg)
47
Nested relational algebra…
Joins.
elem "reviews" [
elem "book" [
elem "title" [ text "Data on the Web" ],
elem "review" [ text "This is great!" ]]]
elem "bib" [
elem "book" [
elem "@year" [ text "1999" ],
elem "title" [ text "Data on the Web" ] ]]
[ (value y, int (value z), value w) | x <- follow "book" bib0,
y <- follow "title" x,
z <- follow "@year" x,
u <- follow "book" reviews0,
v <- follow "title" u,
w <- follow "review" u,
y == v ]
==> [("Data on the Web", 1999, "This is great!")]
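The join between the bibliography and the reviews can be executed as a list comprehension. This sketch again relies on assumed definitions of is, value and int (here int is read on the text value), uses hypothetical constructor names, and draws w from the review element, which is what the stated result requires.

```haskell
import Prelude hiding (elem)

type Tag = String

-- Hypothetical concrete nodes (reference nodes omitted for brevity).
data Node = TextNode String | ElemNode Tag [Node]
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

children :: Node -> [Node]
children (ElemNode _ ns) = ns
children (TextNode _)    = []

is :: Tag -> Node -> Bool
is t (ElemNode t' _) = t == t'
is _ _               = False

-- Assumed helper: concatenated text children of a node.
value :: Node -> String
value n = concat [ s | TextNode s <- children n ]

-- Assumed helper: parse a text value as an integer.
int :: String -> Int
int = read

follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

bib0, reviews0 :: Node
bib0 =
  elem "bib"
    [ elem "book" [ elem "@year" [text "1999"], elem "title" [text "Data on the Web"] ] ]
reviews0 =
  elem "reviews"
    [ elem "book" [ elem "title" [text "Data on the Web"], elem "review" [text "This is great!"] ] ]

-- Join books and reviews on structurally equal title elements.
joined :: [(String, Int, String)]
joined =
  [ (value y, int (value z), value w)
  | x <- follow "book" bib0
  , y <- follow "title" x
  , z <- follow "@year" x
  , u <- follow "book" reviews0
  , v <- follow "title" u
  , w <- follow "review" u
  , y == v
  ]
```

The comparison y == v uses the derived structural equality on nodes, so the two title elements match only when their tags and text agree exactly.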
![Page 48: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/48.jpg)
48
Nested relational algebra…
Regular expression matching
( [ (x,y,u) | x <- item "@year", y <- item "title", u <- rep (item "author") ] ) :: Reg (Node,Node,[Node])
match :: Reg a -> Node -> [a]
match reg0 book0
Result:
==> [(elem "@year" [text "1999"],
elem "title" [text "Data on the Web"],
[elem "author" [text "Abiteboul"],
elem "author" [text "Buneman"],
elem "author" [text "Suciu"] ] ) ]
![Page 49: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/49.jpg)
49
Nested relational algebra…
Sorting
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy (<=) [3,1,2,1] ==> [1,1,2,3]
Grouping
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
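These two signatures differ from the functions of the same name in Haskell's Data.List: there sortBy takes an Ordering-valued comparison, and groupBy only groups adjacent elements. A possible implementation matching the slide's signatures is sketched below. It is an assumption about the intended behaviour, using an insertion sort and a grouping that collects all equal elements; the order of the groups it returns (by first occurrence) differs from the order shown on the slide.

```haskell
-- Sort with a "less than or equal" predicate (stable insertion sort).
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy le = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | le x y    = x : y : ys
      | otherwise = y : insert x ys

-- Group all elements related by the given equivalence, not just adjacent ones.
-- Groups appear in order of the first occurrence of their members.
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []     = []
groupBy eq (x:xs) = (x : filter (eq x) xs) : groupBy eq (filter (not . eq x) xs)

sorted :: [Int]
sorted = sortBy (<=) [3, 1, 2, 1]

grouped :: [[Int]]
grouped = groupBy (==) [3, 1, 2, 1]
```

Here sorted is [1,1,2,3] and grouped is [[3],[1,1],[2]]: the same groups as on the slide, listed in a different order.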
![Page 50: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/50.jpg)
50
Cross Comparisons of Algebra
Niagara and AT&T are standalone XML algebras
Niagara
-- proposed after W3C had selected its proposed standard
-- has operators which operate on sets of bags
AT&T algebra
-- chosen as the proposed standard by W3C
-- expressions resemble a high-level query language
-- latest version of the document is referred to as “Semantics of XML Query Language XQuery”
![Page 51: Introduction To XML Algebra](https://reader036.fdocuments.in/reader036/viewer/2022062315/56815350550346895dc1626f/html5/thumbnails/51.jpg)
51
Future Work
More evaluation strategies are needed to allow flexible query plans
Develop physical operators that take advantage of physical storage structures, and generate a mapping from the query tree to a physical query plan