1 Rainbow XML-Query Processing Revisited: The Complete Story (Part I) Xin Zhang.
Date posted: 22-Dec-2015
2
Motivation
Experience from Past Imp.
Solid Foundation for Research:
  Order-sensitive Query Processing. Brian
  Update Computation Pushdown. Mukesh
  Query Optimization. Brian & Brad
  Cost-based XML Storage. Xin
4
<!ELEMENT prices (book*)>
<!ELEMENT book (title, source, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
  <book>
    <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price>
  </book>
  <book>
    <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price>
  </book>
  <book>
    <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price>
  </book>
</prices>
Example* of XML Use Cases.
5
Example XQuery
<results> {
  for $t in distinct (document("prices.xml") //book/title)
  let $p := document("prices.xml") //book[title = $t]/price
  return
    <minprice title= $t/text()>
      <price> min($p/text()) </price>
    </minprice>
} </results>
In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.
Result:
<results>
  <minprice title="TCP/IP Illustrated">
    <price>65.95</price>
  </minprice>
  <minprice title="Data on the Web">
    <price>34.95</price>
  </minprice>
</results>
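The expected result of the example query can be cross-checked outside any XQuery engine. The following is a minimal Python sketch, not part of Rainbow: it assumes the sample document is available as a string (the name PRICES_XML and the function min_prices are illustrative) and uses only the standard library.

```python
import xml.etree.ElementTree as ET

# Sample data from the use case; the constant name is an assumption of this sketch.
PRICES_XML = """
<prices>
  <book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>69.95</price></book>
  <book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>
"""


def min_prices(xml_text):
    """Return (title, minimum price) pairs, titles in first-occurrence order."""
    root = ET.fromstring(xml_text)
    best = {}  # dicts keep insertion order, so document order of titles is preserved
    for book in root.iter("book"):
        title = book.findtext("title").strip()
        price = float(book.findtext("price"))
        best[title] = min(price, best.get(title, price))
    return list(best.items())


result = min_prices(PRICES_XML)
```

The dictionary plays the role of the distinct titles bound to $t, and the running minimum corresponds to min($p/text()).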
6
Four Kinds of Data Models
Object-Relational storage. Special data type: sequence.
Two kinds of Tables:
  Flat Table: Requires ID assignment.
  Nested Table: Complicated operators.
Two kinds of Cells:
  References to DOM trees: Requires de-referencing.
  Values: Waste space.
They are all interchangeable.
7
Data Model
An Order-Sensitive Table.
Every cell has its own domain, e.g.: SQL domains. XML node. A Collection.
Every column denotes one variable ($v) or an internal variable (coln, Rm).
Comparisons are done by deep equality.
8
Do we need Schema Order?
So far, none of the operators requires any schema order.
Hence, we will not consider the schema order in the data model.
9
Definition of Collection
A collection has to have at least 2 objects.
If we try to generate a collection of one object, the collection is reduced to that object, and no collection is generated.
Collections cannot be nested!
A collection is an unnamed XML node.
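The collection rules above can be sketched in a few lines of Python. This is an illustration only: the class Collection and the helper make_collection are hypothetical names, and returning None for an empty input is an assumption, since the slide does not cover that case.

```python
class Collection(tuple):
    """An ordered, flat group of two or more objects (illustrative type)."""


def make_collection(items):
    """Build a collection following the rules on the slide."""
    flat = []
    for item in items:
        if isinstance(item, Collection):
            flat.extend(item)  # collections cannot be nested, so splice members in
        else:
            flat.append(item)
    if not flat:
        return None  # assumption: the slide does not define the empty case
    if len(flat) == 1:
        return flat[0]  # a collection of one object is reduced to the object
    return Collection(flat)


single = make_collection(["<price>65.95</price>"])
merged = make_collection([Collection(("a", "b")), "c"])
```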
10
Data Model Examples
Table of XML Fragments.
Table Types:
  Regular Relations: columns title, price.
  Table with XML nodes: column price, cell <price> 65.95</price>.
  Table with a collection of XML nodes: column prices, cell {<price> 34.95</price>,<price> 39.95</price>,...}.
11
Column Names
A relation column name: “price”.
A generated column name: “col1”, “r1”.
A Variable Binding: “$var”.
13
XML Operators (5+2)
Operator | Sym. | Prms. | Output | Data | Description
Tagger | T | p | col | s | Tag s according to list pattern p.
Navigate | | col1, path | col2 | s | Navigate from column col1 of s through an XPath.
Aggregate | Agg | N/A | N/A | s | Make a collection for each column.
Composer | C | p | col | s | Construct an XML document from one s according to DOM pattern p.
XML Union | X | col+ | col | s | Union multiple columns into one.
XML Intersect | X | col+ | col | s | Intersect multiple columns into one.
XML Difference | -X | col+ | col | s | Difference multiple columns into one.
14
Special Operators (7)
Operator | Sym. | Prms. | Output | Data | Description
SQL | SQL | stmt | col+ | N/A | One SQL query statement stmt over multiple s.
Function | {F} | param+ | col | s? | XML or user-defined function over zero or one data source with a list of parameters.
Source | S | desc | col+ | N/A | Identify a data source by description desc. It could be an XML fragment, an XML document, or a relational table.
Name | | col1, col2 | | s | Rename column col1 of source s into name col2.
Name | | ns | | s | Name s into ns.
FOR | FOR | col+ | | s, sq | The FOR operator iterates over s and executes subquery sq with variable binding columns col1..n.
IF_THEN_ELSE | IF | c | | sq1, sq2 | If condition c is true, then execute subquery sq1, else execute subquery sq2.
Merge | M | | | s+ | Merge multiple tables into one table.
15
SQL Operators (11)
Operator | Sym. | Prms. | Output | Data | Description
Project | | col+ | N/A | s | Project out multiple columns from subquery s.
Select | | c | N/A | s | Filter subquery s by condition c.
Cartesian Product | | N/A | N/A | s1, s2 | Cartesian product of the results of two sources, s1 and s2.
Theta Join | | c | N/A | ls, rs | Join two sources ls and rs under condition c.
Outer Join | | c | N/A | ls, rs | Left (right) outer join of two sources ls and rs by condition c.
Groupby | | col+ | N/A | s, sqg | Make temporary groups by multiple columns from source s, evaluate subquery sqg for each group, then merge the evaluated results back.
Orderby | | col+ | N/A | s | Sort source s by multiple columns.
Union | | N/A | N/A | s+ | Union multiple sources together.
Outer Union | O | N/A | N/A | s+ | Outer union multiple sources together.
Difference | | N/A | N/A | ls, rs | Difference between two sources.
Intersect | | N/A | N/A | s+ | Intersect multiple sources.
COp | COp | col+ | N/A | s, sq | Correlated Operator on columns col+. It executes sq for each tuple in source s.
16
Functions (Examples)
Ref: http://www.w3.org/TR/xquery-operators/
Type | Examples
String | concat, contains, lowercase, name, starts-with, subst, trim, uppercase ...
Aggregation | avg, count, max, min, sum, ...
Sequence | exists ...
Date and Time | date ...
Context | last, position ...
Node | shallow ...
... | ...
User Defined | New functions defined in the XQuery.
17
Expression
Used in Select and Join operators.
Arithmetic: negative, +, -, *, /, %.
Boolean: NOT, OR, AND, >, =, <, >=, <=, <>.
Terminals: String and Double, Column Name.
[Class diagram: Expression interfaces]
interface Expression (Visitable): +eval:Object
interface BinExpression: left:Expression, right:Expression
interface BinArithExpression: +PLUS:int, +MINUS:int, +MULTIPLE:int, +DIVIDE:int, +MOD:int
interface BinBoolExpression
interface BinANDExpression
interface BinORExpression
interface BinCOMPExpression: +LT:int, +GT:int, +LEQ:int, +GEQ:int, +EQ:int, +NEQ:int
interface TerminalExpression: +STRING:int, +DOUBLE:int, +NAME:int; type:int
interface UniExpression: expression:Expression
interface UniMinusExpression
interface UniNOTExpression
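How such expressions might be evaluated against one tuple can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the Rainbow implementation (which uses the Java interfaces listed above): expressions are encoded as nested tuples, and the names evaluate, ARITHMETIC and COMPARISON are illustrative.

```python
import operator

# Binary operators from the slide; the encoding as strings is an assumption.
ARITHMETIC = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "/": operator.truediv, "%": operator.mod}
COMPARISON = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
              ">=": operator.ge, "=": operator.eq, "<>": operator.ne}


def evaluate(expr, row):
    """Evaluate a nested-tuple expression against one tuple (a dict of columns)."""
    kind = expr[0]
    if kind == "name":    # terminal: column name
        return row[expr[1]]
    if kind == "const":   # terminal: string or double
        return expr[1]
    if kind == "neg":     # unary minus
        return -evaluate(expr[1], row)
    if kind == "not":     # unary NOT
        return not evaluate(expr[1], row)
    _, left, right = expr
    if kind == "and":
        return bool(evaluate(left, row)) and bool(evaluate(right, row))
    if kind == "or":
        return bool(evaluate(left, row)) or bool(evaluate(right, row))
    if kind in ARITHMETIC:
        return ARITHMETIC[kind](evaluate(left, row), evaluate(right, row))
    if kind in COMPARISON:
        return COMPARISON[kind](evaluate(left, row), evaluate(right, row))
    raise ValueError(f"unknown expression kind: {kind!r}")


row = {"col3": "Data on the Web", "col4": "Data on the Web"}
same_title = evaluate(("=", ("name", "col3"), ("name", "col4")), row)
```

A Select or Join operator would call such an evaluator once per tuple (or per tuple pair) and keep the tuple when the result is true.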
18
Pattern for Tagger
List pattern only contains Strings and Column Names.
DOM pattern is a tree.
[Class diagram: DOM pattern node interfaces]
interface DOMPatternNode (Visitable): +addChild:void, +deleteChild:void, +getChild:DOMPatternNode, +setChild:void, +getTagValue:Object, +setTagValue:void; children:DOMPatternNode[], parent:DOMPatternNode, tagName:String, canceledOut:boolean
interface AttributeNode: tagValue:NavigationStep[]
interface ColumnNameNode: tagValue:edu.wpi.cs.dsrg.xmldb.xat.common.operator.xmloperator.NavigationStep[]
interface RootNode
interface TagNode
interface TextNode
19
Where are we?
XAT Data Model
XAT Operators
  XML (5): Tagger, Composer, Navigate, Aggregate, XML Union.
XAT Generation
20
Tagger T p col (s)
Consume: columns used in the pattern p.
Produce: the new column col.
Logic:
  One additional column is added with the tagged information.
  Needs to work with another operator to create nested structure.
Order Handling:
  The tagged column is added at the end.
  The tuple order of the output table is the same as in table s.
Requirement: The columns used in pattern p should be in table s.
21
Example: T <price>[col1]</price> col2
Input table:
  Col1
  65.95
  34.95
Output table (tagged column):
  <price>65.95</price>
  <price>34.95</price>
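The Tagger semantics above can be sketched in Python. This is an illustration under assumptions, not the Rainbow code: a table is a list of dicts, a list pattern is a list whose items are either literal strings or ("col", name) pairs, and the function name tagger is hypothetical.

```python
def tagger(pattern, new_col, table):
    """Append new_col to every tuple, built from the list pattern."""
    out = []
    for row in table:
        parts = []
        for item in pattern:
            if isinstance(item, tuple):   # ("col", name): substitute the cell value
                parts.append(str(row[item[1]]))
            else:                         # literal string from the pattern
                parts.append(item)
        new_row = dict(row)               # existing columns keep their order
        new_row[new_col] = "".join(parts) # tagged column is added at the end
        out.append(new_row)
    return out                            # tuple order is the same as the input


table = [{"col1": "65.95"}, {"col1": "34.95"}]
tagged = tagger(["<price>", ("col", "col1"), "</price>"], "col2", table)
```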
22
Composer C p col (s)
Consume: columns used in the pattern p.
Produce: the new column col with nested structure.
Logic:
  Doesn't require another operator to create the nested structure.
Order Handling:
  Tuple order is the same as the input.
Requirement: Requires a special schema for the input subquery s: (id[1..n], type, att[1..m], value)
23
Navigate col, path col' (s)
Consume: column col.
Produce: new column col'.
Logic:
  One additional column is added with the navigation information.
  Tuples are multiplied if the navigation returns more than one result.
  If the navigation result is empty, the tuple is dropped.
Order Handling:
  The navigation column is added at the end.
  The tuple order of the output table follows table s and the navigation order.
Requirement: N/A
24
Two types of Navigate
Navigate Unnesting:
  Unnests the parent-children relationship and duplicates the parent value for each child.
Navigate Collection:
  Nests the parent-children relationship: creates a collection of the children but keeps the single parent.
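The two Navigate variants can be contrasted with a small Python sketch. It is an illustration only: tables are lists of dicts holding ElementTree elements, the path is limited to what ElementTree's findall supports, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def navigate_unnest(table, col, path, new_col):
    """One output tuple per navigation result; the parent value is duplicated."""
    out = []
    for row in table:
        for child in row[col].findall(path):  # empty result: tuple is dropped
            out.append({**row, new_col: child})
    return out


def navigate_collect(table, col, path, new_col):
    """One output tuple per input tuple, holding a collection of the results."""
    out = []
    for row in table:
        children = row[col].findall(path)
        if not children:
            continue                          # empty result: tuple is dropped
        # A collection of one object is reduced to the object itself.
        value = children[0] if len(children) == 1 else children
        out.append({**row, new_col: value})
    return out


doc = ET.fromstring(
    "<prices><book><title>A</title></book><book><title>B</title></book></prices>"
)
source = [{"R1": doc}]
unnested = navigate_unnest(source, "R1", "book", "col1")
collected = navigate_collect(source, "R1", "book", "col1")
```

With two book children, unnesting yields two tuples that share the same R1 value, while the collection variant yields a single tuple whose col1 holds both books.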
26
Collection Issues in Navigate
1) What happens if there is already a collection in the input table?
  It depends on the input table. If we navigate from the collection, see issue 2. If not, the collection stays as it is.
2) What happens if we navigate from a collection in the input table?
  Then another collection is generated, but no nested collections.
27
Navigation Steps in the Navigate operator.
Attribute: @
Children: //, /child
Text: text()
Column Name: col1
28
Navigation Use Cases
a(<a>...</a>) -> NULL
b(<a><b>...</b></a>) -> <b>...</b>
a(<a><a>...</a></a>) -> <a>...</a>
text()(<a>text()</a>) -> text()
a({<a/>,<b/>}) -> <a/>
29
Example of Navigate R1, book col1 (unnesting)
Input table, one tuple:
  R1 = <prices> <book> ... </book> <book> ... </book> <book> ... </book> <book> ... </book> </prices>
Output table, one tuple per book:
  R1 = <prices>...</prices>; Col1 = <book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>
  R1 = <prices>...</prices>; Col1 = <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>
  R1 = <prices>...</prices>; Col1 = <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>
  R1 = <prices>...</prices>; Col1 = <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>
30
Example of Navigate R1, book col1 (collection)
Input table, one tuple:
  R1 = <prices> <book> ... </book> <book> ... </book> <book> ... </book> <book> ... </book> </prices>
Output table, one tuple:
  R1 = <prices> <book> ... </book> <book> ... </book> <book> ... </book> <book> ... </book> </prices>
  Col1 = {<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
          <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}
31
Example of Navigate col1, book col2
Input table, one tuple:
  Col1 = {<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
          <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}
Output table, one tuple per book:
  Col1 = {...}; col2 = <book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>
  Col1 = {...}; col2 = <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>
  Col1 = {...}; col2 = <book> <title> Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>
  Col1 = {...}; col2 = <book> <title> Data on the Web </title> <source>www.bn.com</source> <price>39.95</price> </book>
32
Example of Navigate col1, title col2
Input table, one tuple:
  Col1 = {<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
          <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}
Output table, one tuple:
  Col1 = {<book> ... </book>, <book> ... </book>, <book> ... </book>, <book> ... </book>}
  col2 = {<title> TCP/IP Illustrated </title>, <title> TCP/IP Illustrated </title>, <title> Data on the Web </title>, <title> Data on the Web </title>}
33
Aggregate Agg(s)
Consume: nothing.
Produce: nothing.
Logic:
  Create a collection for each column.
Order Handling:
  There is only one tuple.
Requirement: N/A
34
Example of Agg(s)
Input table, four tuples:
  Col1 = <book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>
  Col1 = <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>
  Col1 = <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>
  Col1 = <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>
Output table, one tuple:
  Col1 = {<book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>,
          <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>,
          <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>}
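The Aggregate operator can be sketched in Python as a column-wise fold of the table into a single tuple. This is an illustration, assuming tables are lists of dicts and collections are plain lists; the function name aggregate is hypothetical, and the reduction of a one-element collection to the object itself is left out for brevity.

```python
def aggregate(table):
    """Collapse a table into one tuple with a collection per column."""
    if not table:
        return []                       # assumption: empty input gives empty output
    columns = list(table[0])            # column names, in schema order
    # Each collection keeps the tuple order of the input table.
    return [{c: [row[c] for row in table] for c in columns}]


books = [{"Col1": "<book>1</book>"}, {"Col1": "<book>2</book>"},
         {"Col1": "<book>3</book>"}, {"Col1": "<book>4</book>"}]
agg = aggregate(books)
```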
35
XML Union X col[1..n] col (s)
Consume: columns col[1..n].
Produce: new column col.
Logic:
  For every tuple, merge the values of col[1..n] into one collection and put it into the new column col.
Order Handling: N/A
Requirement: N/A
36
Example: X title, price result (s)
title | price | Result
<title> TCP/IP Illustrated </title> | <price>65.95</price> | {<title> TCP/IP Illustrated </title>, <price>65.95</price>}
<title> TCP/IP Illustrated </title> | <price>69.95</price> | {<title> TCP/IP Illustrated </title>, <price>69.95</price>}
<title>Data on the Web</title> | <price>34.95</price> | {<title> Data on the Web </title>, <price>34.95</price>}
<title>Data on the Web</title> | <price>39.95</price> | {<title> Data on the Web </title>, <price>39.95</price>}
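A per-tuple sketch of XML Union in Python follows. It is illustrative only: tables are lists of dicts, collections are plain lists, and the function name xml_union is an assumption. Existing collections are spliced in so that no nested collection is produced.

```python
def xml_union(cols, new_col, table):
    """For every tuple, merge the values of cols into one collection in new_col."""
    out = []
    for row in table:
        merged = []
        for c in cols:
            value = row[c]
            if isinstance(value, list):
                merged.extend(value)    # flatten: collections cannot be nested
            else:
                merged.append(value)
        out.append({**row, new_col: merged})
    return out


rows = [{"title": "<title>Data on the Web</title>", "price": "<price>34.95</price>"}]
unioned = xml_union(["title", "price"], "result", rows)
```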
37
Where are we?
XAT Data Model
XAT Operators
  XML (5)
  Special (7): SQL, Function, Source, Name, FOR, IF, Merge.
XAT Generation
38
SQL SQL stmt col[1..m]
Consume: depends on the stmt.
Produce: depends on the stmt.
Logic:
  Execute stmt over multiple tables and output the result.
  It is assumed to be executed by an RDB engine.
  Usually, it is the operator right above the source (e.g., table) operator.
Order Handling:
  The tuple order is undetermined. It can be re-established by an additional orderby node.
Requirement: N/A.
39
Function F param[1..m] col (s?)
Consume: columns used in the param[1..m].
Produce: new column col.
Logic:
  Execute an XML or user-defined function on the data sources.
  Or used to represent a recursive query.
Order Handling:
  The tuple order can be re-established by orderby nodes.
Requirement: N/A.
40
Source S desc col[1..n]
Consume: nothing.
Produce: one new column col for XML sources; multiple columns for a table source.
Logic:
  Identifies one of the following sources: view, XML document, XML fragment, or a table.
  Col[1..n] depends on the source description. It is one new column if the input is an XML source; otherwise, it is the list of columns from the table source.
Order Handling:
  Depends on the implementation. Keep the original tuple order as much as possible.
Requirement: N/A.
41
Example of S "prices.xml" R1
Output table, one tuple:
  R1 = <prices>
         <book> <title> TCP/IP Illustrated </title> <source>www.amazon.com</source> <price>65.95</price> </book>
         <book> <title> TCP/IP Illustrated </title> <source>www.bn.com</source> <price>69.95</price> </book>
         <book> <title>Data on the Web</title> <source>www.amazon.com</source> <price>34.95</price> </book>
         <book> <title>Data on the Web</title> <source>www.bn.com</source> <price>39.95</price> </book>
       </prices>
42
Name Column col1, col2 (s)
Consume: column col1.
Produce: column col2.
Logic:
  Rename col1 in table s into col2.
Order Handling:
  Keep all the schema and tuple orders.
Requirement: col1 is in table s.
43
Name ns (s)
Consume: nothing.
Produce: nothing.
Logic:
  Name s to ns.
Order Handling:
  Keep all the schema and tuple orders.
Requirement: N/A.
44
FOR FOR col[1..n] (s, sq)
Consume: nothing.
Produce: nothing.
Logic:
  It is a FOR iteration operator. For each value in the columns col[1..n] of table s, evaluate the subquery sq.
  Very important for query decorrelation.
Order Handling:
  Schema order is decided by sq.
  Tuple order is similar to the join operator without the left part.
Requirement: N/A.
45
Merge M (s[1..n])
Consume: nothing.
Produce: nothing.
Logic:
  Merge multiple tables into one table.
  Tuple order is very important in this operator.
Order Handling:
  Tuple order is the same as the input.
Requirement: s[1..n] have the same number of tuples.
46
Example: M (title, price)
Input table title:
  <title> TCP/IP Illustrated </title>
  <title> TCP/IP Illustrated </title>
  <title>Data on the Web</title>
  <title>Data on the Web</title>
Input table price:
  <price>65.95</price>
  <price>69.95</price>
  <price>34.95</price>
  <price>39.95</price>
Output table:
  title | price
  <title> TCP/IP Illustrated </title> | <price>65.95</price>
  <title> TCP/IP Illustrated </title> | <price>69.95</price>
  <title>Data on the Web</title> | <price>34.95</price>
  <title>Data on the Web</title> | <price>39.95</price>
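Merge pairs tuples by position, which is why tuple order and equal tuple counts matter. A minimal Python sketch, assuming tables are lists of dicts with distinct column names (the function name merge is illustrative):

```python
def merge(*tables):
    """Positionally merge tables: the i-th tuples of all inputs form the i-th output tuple."""
    sizes = {len(t) for t in tables}
    if len(sizes) > 1:
        raise ValueError("Merge requires all inputs to have the same number of tuples")
    merged = []
    for rows in zip(*tables):           # rows at the same position, in input order
        combined = {}
        for row in rows:
            combined.update(row)        # columns of later inputs are appended
        merged.append(combined)
    return merged


titles = [{"title": "TCP/IP Illustrated"}, {"title": "Data on the Web"}]
prices = [{"price": "65.95"}, {"price": "34.95"}]
merged_table = merge(titles, prices)
```

Because no join condition is involved, swapping two tuples in either input silently pairs the wrong title with the wrong price.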
47
Where are we?
XAT Data Model
XAT Operators
  XML Operators (5).
  Special Operators (7).
  SQL Operators (11): Project, Select, Cartesian Product, Join (Theta, Outer), Groupby, Orderby, Union (Node, Outer), COp, Intersect, Difference.
XAT Generation
48
Project col[1..n] (s)
Consume: all columns in DM(s).
Produce: nothing.
Logic: Keep only columns col[1..n] in DM(s).
Order Handling:
  Keep the original tuple order; the schema order follows col[1..n] as given in the project operator.
Requirement: The col[1..n] should be in source s.
49
Select c (s)
Consume: columns used in condition expression c.
Produce: nothing.
Logic: Keep the tuples in s for which c is true.
Order Handling:
  Keep the original tuple order and the original schema order.
Requirement: Condition c should only reference the source s.
50
Theta Join c (ls, rs)
Consume: columns in the condition c.
Produce: nothing.
Logic: Join ls and rs together under condition c.
Order Handling:
  The tuple order of the output table is the iteration of tuples in rs within the iteration of tuples in ls, e.g., {<l1, r1>, <l1, r2>, <l2, r1>, <l2, r2>}
Requirement: Condition c should relate to both tables ls and rs.
51
Left Outer Join c (ls, rs)
Consume: columns in the condition c.
Produce: nothing.
Logic: Join but keep all the tuples in ls.
Order Handling:
  The tuple order of the output table is the iteration of tuples in rs within the iteration of tuples in ls, e.g., {<l1, r1>, <l1, r2>, <l2, null>, <l3, r1>, <l3, r3>}
Requirement: Condition c should relate to both tables ls and rs.
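The tuple orders described for the Theta Join and the Left Outer Join are what a plain nested loop produces, with ls as the outer loop and rs as the inner loop. The Python sketch below illustrates this; it is not the Rainbow implementation, and the function names and the use of None for null are assumptions.

```python
def theta_join(ls, rs, cond):
    """Nested-loop join: rs is iterated inside the iteration over ls."""
    return [{**l, **r} for l in ls for r in rs if cond(l, r)]


def left_outer_join(ls, rs, cond):
    """Like theta_join, but unmatched ls tuples are kept, padded with None."""
    rs_columns = list(rs[0]) if rs else []
    out = []
    for l in ls:
        matches = [{**l, **r} for r in rs if cond(l, r)]
        if not matches:
            matches = [{**l, **dict.fromkeys(rs_columns)}]  # null-padded tuple
        out.extend(matches)
    return out


ls = [{"l": 1}, {"l": 2}, {"l": 3}]
rs = [{"r": 1}, {"r": 3}]
# Illustrative condition: join when l + r is even.
same_parity = lambda l, r: (l["l"] + r["r"]) % 2 == 0
inner = theta_join(ls, rs, same_parity)
outer = left_outer_join(ls, rs, same_parity)
```

In the outer-join result, the null-padded tuple for l = 2 sits between the matches of l = 1 and l = 3, mirroring the position of <l2, null> in the slide's example.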
52
Right Outer Join c (ls, rs)
Consume: columns in the condition c.
Produce: nothing.
Logic: Join but keep all the tuples in rs.
Order Handling:
  The tuple order of the output table is the iteration of tuples in ls within the iteration of tuples in rs, e.g., {<null, r1>, <null, r2>, <l1, r1>, <l1, r2>, <l2, r1>, <l2, r3>}; "null" is at the beginning of the output.
Requirement: Condition c should relate to both tables ls and rs.
53
Left Semi Join c (ls, rs)
Consume: columns in condition c.
Produce: nothing.
Logic: Join but only keep the columns in ls.
Order Handling:
  The tuple order of the output table is the same as in table ls.
Requirement: Condition c should relate to both tables ls and rs.
54
Right Semi Join c (ls, rs)
Consume: columns used in condition c.
Produce: nothing.
Logic: Join but only keep the columns in rs.
Order Handling:
  The tuple order of the output table is the same as in table rs.
Requirement: Condition c should relate to both tables ls and rs.
55
Groupby col[1..n] (s, sq)
Consume: col[1..n].
Produce: nothing.
Logic:
  Group the DM(s) by col[1..n], then apply sq on each group.
  If sq generates a table instead of one single value, the generated table is treated as a collection.
Order Handling:
  The tuple order of the output table is the same as in table s.
Requirement: Col[1..n] should be in table s.
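A sketch of the Groupby logic in Python, assuming tables are lists of dicts and the subquery is an ordinary function applied to the tuples of one group. The function name groupby and the output column name "result" are illustrative choices, not names from Rainbow.

```python
def groupby(cols, table, subquery):
    """Group table by cols, apply subquery to each group, merge the results back."""
    groups = {}  # insertion-ordered: groups appear in order of first occurrence in s
    for row in table:
        key = tuple(row[c] for c in cols)
        groups.setdefault(key, []).append(row)
    out = []
    for key, rows in groups.items():
        result = subquery(rows)   # a single value, or a list treated as a collection
        out.append({**dict(zip(cols, key)), "result": result})
    return out


books = [
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "TCP/IP Illustrated", "price": 69.95},
    {"title": "Data on the Web", "price": 34.95},
    {"title": "Data on the Web", "price": 39.95},
]
cheapest = groupby(["title"], books, lambda rows: min(r["price"] for r in rows))
```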
56
Orderby col[1..n] (s)
Consume: col[1..n].
Produce: nothing.
Logic: Order s by col[1..n].
Order Handling:
  The tuple order of the output table is as specified.
Requirement: Col[1..n] should be in table s.
57
Union (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
  The tuple order of the output table follows the order of tables s[1..n].
Requirement: All tables s[1..n] have the same schema.
58
Outer Union O (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
  The tuple order of the output table follows the order of tables s[1..n].
Requirement: N/A.
59
Intersect (s[1..n])
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
  The tuple order of the output table follows the order of tables s[1..n].
Requirement: All tables s[1..n] have the same schema.
60
Difference (ls, rs)
Consume: nothing.
Produce: nothing.
Logic: Same as SQL.
Order Handling:
  The tuple order of the output table follows the order of table ls.
Requirement: Tables ls and rs have the same schema.
61
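Full set of Operators
XML (5): T, C, Navigate, Agg(), X
Special (7): SQL, F, S, Name, FOR, IF, M
SQL (11): Project, Select, Cartesian Product, Theta Join, Outer Join, Groupby, Orderby, Union, Outer Union (O), Difference, Intersect, COp
Syntax:
  Op<params><column_name>(<sub_queries>)
  <column_name>:=Op(<params>) [<sub_queries>]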
64
How to translate FOR binding?
XQuery:
  FOR $x IN for-binding
  Inner-query use $x
XAT:
  FOR($x)
    s: $x IN for-binding
    sq: inner-query use $x
65
How to translate LET binding?
XQuery:
  LET $x := let-binding
  Rest-of-query use $x
XAT:
  $x := let-binding
  Rest-of-query use $x
66
What's the difference between FOR and LET bindings?
XQuery:
  FOR $x IN document("x.xml")/x
  LET $x := document("x.xml")/x
XAT:
  For-binding: Navigate R1, x $x (S "x.xml" R1)
  Let-binding: C R1, x col1 (S "x.xml" R1)
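The practical difference between the two bindings is the shape of the resulting table: a FOR binding yields one tuple per matching node, while a LET binding yields a single tuple holding the whole collection. A minimal Python sketch, assuming a small illustrative document and hypothetical function names:

```python
import xml.etree.ElementTree as ET


def for_binding(doc, path, var):
    """FOR: iterate, one tuple per node reached by the path."""
    return [{var: node} for node in doc.findall(path)]


def let_binding(doc, path, var):
    """LET: bind the whole result at once, one tuple holding a collection."""
    return [{var: doc.findall(path)}]


# Illustrative stand-in for document("x.xml"); the content is made up for the sketch.
doc = ET.fromstring("<root><x>1</x><x>2</x><x>3</x></root>")
for_table = for_binding(doc, "x", "$x")
let_table = let_binding(doc, "x", "$x")
```

Anything evaluated above a FOR binding therefore runs once per tuple, whereas anything above a LET binding sees all nodes together, which is what an aggregate such as min needs.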
67
XML Parser Tree
QuiltQuery(
  ElementConstruct(<Results>,
    FLWRExpression(
      Binding(
        ForBinding($t, distinct,
          Nav(FunDocument("prices.xml"),
            Steps(LocationStep(//), LocationStep(book), LocationStep(title)))),
        LetBinding($p,
          Nav(FunDocument("prices.xml"),
            Steps(LocationStep(//),
              LocationStep(book,
                BinOpComp(=,
                  Nav(CurrentNode, Steps(LocationStep(title))),
                  Nav(Var($t), Steps(Text())))),
              LocationStep(price))))),
      ElementConstruct(<minprice>,
        AttributeExpression(@title, Nav(Var($t), Steps(Text()))),
        ElementConstruct(<price>,
          FunMin(Nav(Var($p), Steps(Text()))))))))
Query:
<results> {
  for $t in distinct (document("prices.xml") //book/title)
  let $p := document("prices.xml") //book[title = $t]/price
  return
    <minprice title= $t/text()>
      <price> min($p/text()) </price>
    </minprice>
} </results>
68
Parsed Tree (1)
QuiltQuery(
  ElementConstruct(<Results>,
    FLWRExpression(
      Binding(
        ForBinding($t, distinct,
          Nav(FunDocument("prices.xml"),
            Steps(LocationStep(//), LocationStep(book), LocationStep(title)))),
        LetBinding($p,
          Nav(FunDocument("prices.xml"),
            Steps(LocationStep(//),
              LocationStep(book,
                BinOpComp(=,
                  Nav(CurrentNode, Steps(LocationStep(title))),
                  Nav(Var($t), Steps(Text())))),
              LocationStep(price))))),
      ElementConstruct(<minprice>,
        AttributeExpression(@title, Nav(Var($t), Steps(Text()))),
        ElementConstruct(<price>,
          FunMin(Nav(Var($p), Steps(Text()))))))))
69
Parsed Tree (2)
XAT so far:
S "prices.xml" R1
70
Parsed Tree (3)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
71
Parsed Tree (4)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
S "prices.xml" R2
72
Parsed Tree (5)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Navigate R2, //book col2 (
  S "prices.xml" R2)
73
Parsed Tree (6)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Navigate col2, title col3 (
  Navigate R2, //book col2 (
    S "prices.xml" R2))
74
Parsed Tree (7)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Navigate $t, text() col4 (
  Navigate col2, title col3 (
    Navigate R2, //book col2 (
      S "prices.xml" R2)))
75
Parsed Tree (8)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Select col3=col4 (
  Navigate $t, text() col4 (
    Navigate col2, title col3 (
      Navigate R2, //book col2 (
        S "prices.xml" R2))))
76
Parsed Tree (9)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Navigate col2, price $p (
  Select col3=col4 (
    Navigate $t, text() col4 (
      Navigate col2, title col3 (
        Navigate R2, //book col2 (
          S "prices.xml" R2)))))
77
Parsed Tree (10)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Navigate $t, text() col5 (
  Navigate col2, price $p (
    Select col3=col4 (
      Navigate $t, text() col4 (
        Navigate col2, title col3 (
          Navigate R2, //book col2 (
            S "prices.xml" R2))))))
78
Parsed Tree (11)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
Min col6 col7 (
  Navigate $p, text() col6 (
    Navigate $t, text() col5 (
      Navigate col2, price $p (
        Select col3=col4 (
          Navigate $t, text() col4 (
            Navigate col2, title col3 (
              Navigate R2, //book col2 (
                S "prices.xml" R2))))))))
79
Parsed Tree (12)
XAT so far:
Distinct col1 $t (
  Navigate R1, //book/title col1 (
    S "prices.xml" R1))
T <minprice title=[col5]><price>[col7]</price></minprice> col8 (
  Min col6 col7 (
    Navigate $p, text() col6 (
      Navigate $t, text() col5 (
        Navigate col2, price $p (
          Select col3=col4 (
            Navigate $t, text() col4 (
              Navigate col2, title col3 (
                Navigate R2, //book col2 (
                  S "prices.xml" R2)))))))))
80
Parsed Tree (13)
XAT so far:
Agg (
  FOR $t (
    Distinct col1 $t (
      Navigate R1, //book/title col1 (
        S "prices.xml" R1)),
    T <minprice title=[col5]><price>[col7]</price></minprice> col8 (
      Min col6 col7 (
        Navigate $p, text() col6 (
          Navigate $t, text() col5 (
            Navigate col2, price $p (
              Select col3=col4 (
                Navigate $t, text() col4 (
                  Navigate col2, title col3 (
                    Navigate R2, //book col2 (
                      S "prices.xml" R2)))))))))))
81
Parsed Tree (14)
XAT so far:
T <results>[col8]</results> col9 (
  Agg (
    FOR $t (
      Distinct col1 $t (
        Navigate R1, //book/title col1 (
          S "prices.xml" R1)),
      T <minprice title=[col5]><price>[col7]</price></minprice> col8 (
        Min col6 col7 (
          Navigate $p, text() col6 (
            Navigate $t, text() col5 (
              Navigate col2, price $p (
                Select col3=col4 (
                  Navigate $t, text() col4 (
                    Navigate col2, title col3 (
                      Navigate R2, //book col2 (
                        S "prices.xml" R2))))))))))))
82
XAT ExampleT<results>col8</result>
col9(Agg(
FOR$t(
Distinctcol1$t(R1, //book/title
col1(S“prices.xml” R1))),
T<minprice title=[col5]><price>[col7]</price></miniprice>col8(
Mincol6col7(
$p, text()col6 (
$t, text()col5(
col2,price$p(
col3=col4(
$t, text()col4(
col2,titlecol3(
R2,//bookcol2(s”prices.xml”
R2))))))))))))
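Read bottom-up, the expression above loads prices.xml, navigates to the books, keeps the distinct titles, and for each title selects the matching books, takes the minimum price, and tags the result. The following is a minimal Python sketch of that reading, not the Rainbow implementation: it assumes the sample document from the earlier use-case slide, uses the standard xml.etree.ElementTree module in place of the XAT operators, and the names PRICES_XML and min_prices are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, copied from the use-case example earlier in the deck.
PRICES_XML = """
<prices>
  <book><title>TCP/IP Illustrated</title><source>www.amazon.com</source><price>65.95</price></book>
  <book><title>TCP/IP Illustrated</title><source>www.bn.com</source><price>69.95</price></book>
  <book><title>Data on the Web</title><source>www.amazon.com</source><price>34.95</price></book>
  <book><title>Data on the Web</title><source>www.bn.com</source><price>39.95</price></book>
</prices>
"""


def min_prices(xml_text):
    """Hypothetical stand-in for the plan: returns a <results> element."""
    root = ET.fromstring(xml_text)            # source operator: load the document
    books = root.findall(".//book")           # navigate //book
    # navigate //book/title, then Distinct (first-seen order is kept)
    titles = list(dict.fromkeys(b.findtext("title").strip() for b in books))

    results = ET.Element("results")           # outer tagger: <results>
    for t in titles:                          # FOR $t
        # select on title = $t, then navigate price/text()
        prices = [
            b.findtext("price").strip()
            for b in books
            if b.findtext("title").strip() == t
        ]
        cheapest = min(prices, key=float)     # Min, keeping the original text
        minprice = ET.SubElement(results, "minprice", title=t)  # inner tagger
        ET.SubElement(minprice, "price").text = cheapest
    return results
```

Calling min_prices(PRICES_XML) on the sample data gives one minprice element per distinct title, with prices 65.95 and 34.95.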
83
XAT Example (Graph)
[Figure: the XAT expression of the previous slide drawn as an operator graph. Node labels, in the order they appear in that expression:]
T<results>col8</results>col9
Agg
FOR$t
Distinctcol1$t
R1, //book/titlecol1
S“prices.xml” R1
T<minprice title=[col5]><price>[col7]</price></minprice>col8
Mincol6col7
$p, text()col6
$t, text()col5
col2,price$p
col3=col4
$t, text()col4
col2,titlecol3
R2,//bookcol2
s”prices.xml”R2
85
Different Set of Operators
After parsing but before decorrelation:
With FOR, no /, no .
After decorrelation:
With /, , , Distinct(), no FOR.
...
86
Equivalent Rewriting Rules
Navigation Pushdown: swap the navigation operator down.
Computation Pushdown: swap the SQL operator down.
Groupby Operator Simplification: pull functions (subqueries) out of the Groupby function.
87
Issues (1)
Use the subquery or the subquery result? Both.
Do we really need cutting? Yes.
Do we need Binding and Expose? Binding yes. Expose no: we use navigate instead.
Why do we need to distinguish bindings from column names? Because a binding is used in multiple places and is immutable, whereas a column name is used in one place.
Which data model is better? OR is better than R.
Bag semantics or set semantics? Bag.
Should we identify a different set of operators at each stage? TBD.
Do we need the collection in the ORDBMS? Yes.
What is the type tree? Regular expression types.
We need a better notation for the algebra syntax; the current one is too complex. Do we really need to define the new column name? Yes.
Also, an XC (XML Calculus) is required. It can be derived from Datalog.
88
Issues (2)
How do we handle Union in XQuery? Union is translated into XML Union.
How do we decorrelate an XQuery with Union? As usual, because the union does not generate branches, only a linear tree.
How do we translate XML Union (Intersect and Difference) into SQL Union (Intersect and Difference)? TBD.
Can we allow collections of collections? It looks like we don’t need that.
89
Entry Point Notation
Format:
<relative forward part> : <entry point>
Examples:
author.lastname:book
lastname:book.author
lastname:author:book (multi-level entry point)
Rules:
author.lastname = /:author.lastname
lastname:author.lastname = author.lastname
text():lastname.text() = lastname.text()
90
Discussion of Entry Point/Column Name
An entry point is used to show the dependencies between different navigations.
XPERANTO uses different column names to distinguish between different navigations, because its sources are relations.
Niagara uses entry points to get rid of tedious column names and make the algebra look better; it is also XML-oriented.
We use column names with a typing system, because our sources include both relations and XML fragments, and because some operators in the middle of the XAT might generate new columns.
91
Column Name and Nested Operators
In most cases, we can get rid of the column names by using nested operators.
However, the data model lets us separate operators that would otherwise be directly nested, so that optimization can be done easily.
Hence, we still need column names rather than nested operators to represent our algebra.
92
XML Calculus (XC)
The idea of XC comes from extending Datalog.
It can be used to prove the correctness of the rewriting rules.
It can also be used to help with semantic analysis.
93
Type Tree
It explains the type of each column name, in other words, the semantics of each column name.
It will be used by navigation pushdown to decide cancellation, by order pushdown, and by other rewriting rules that require semantic checking.
A type could be an XML type, a relational table, a column, or a function’s return type.
Each type carries a list of the column names of that type.
95
How to translate multiple LET bindings?
If the two LET bindings come from different sources:
For each LET binding, a collection is generated.
Until there is a FOR binding to iterate through the collections, we just keep the two collections.
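A small Python sketch may make the LET rule concrete. It assumes a hypothetical table layout (a list of dicts, one dict per tuple, keyed by column name) and hypothetical helpers bind_let and bind_for; none of these names come from the original slides.

```python
def bind_let(table, column, collection):
    """LET: store the whole collection as ONE cell in every existing tuple."""
    return [{**row, column: list(collection)} for row in table]


def bind_for(table, column, source_column):
    """FOR: iterate a collection-valued cell, one output tuple per item."""
    return [
        {**row, column: item}
        for row in table
        for item in row[source_column]
    ]


# Start from a single empty tuple as the initial context (an assumption).
table = [{}]
table = bind_let(table, "$a", ["a1", "a2"])        # collection from source 1
table = bind_let(table, "$b", ["b1", "b2", "b3"])  # collection from source 2

# Two LET bindings later there is still one tuple holding two collections.
# Only a FOR binding over "$a" multiplies the tuples.
unnested = bind_for(table, "$x", "$a")
```

In this sketch the table stays at one tuple after both LET bindings, and the later FOR over "$a" yields two tuples that each still carry the full "$b" collection.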
96
How to Handle Multiple FOR?
That is handled in decorrelation.
Keep this in mind:
FOR means for each; it uses . Hence, multiple FORs result in a Cartesian product.
Otherwise, navigate means creating a collection!
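The Cartesian-product reading of multiple FOR bindings can be checked with a short Python sketch. The helper multiple_for and the sample collections are hypothetical; itertools.product from the standard library serves as the reference.

```python
from itertools import product


def multiple_for(*collections):
    """Nested for-each loops written by recursion: one tuple per combination."""
    if not collections:
        return [()]
    first, rest = collections[0], collections[1:]
    return [
        (item,) + tail
        for item in first                   # outer FOR
        for tail in multiple_for(*rest)     # inner FORs
    ]


titles = ["TCP/IP Illustrated", "Data on the Web"]
sources = ["www.amazon.com", "www.bn.com"]

pairs = multiple_for(titles, sources)
```

Two FOR bindings over collections of sizes 2 and 2 produce 4 tuples, the same set and order that product(titles, sources) yields.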
97
Trick in the XAT
In the XML algebra, we use an evaluation context that is a sequence of XML nodes in an XML data model, which is a forest.
In the relational model, we use an evaluation context that is a list of tuples.
Hence, in XAT generation, we try to convert the data model used by XML into the data model used by OR.
That’s the tricky part!
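One way to picture the conversion is the Python sketch below. It is an illustration under assumptions, not the Rainbow design: the forest is a list of xml.etree.ElementTree elements, a tuple is a dict keyed by column name, and the helpers nodes_to_tuples and navigate are hypothetical.

```python
import xml.etree.ElementTree as ET

# An XML evaluation context: an ordered sequence of nodes (a forest).
FOREST = [
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>",
    "<book><title>Data on the Web</title><price>34.95</price></book>",
]


def nodes_to_tuples(nodes, column):
    """Wrap each node in a one-column tuple, preserving sequence order."""
    return [{column: node} for node in nodes]


def navigate(table, in_col, path, out_col):
    """Per tuple, follow `path` from the node in `in_col` and store the
    resulting collection of nodes in the new column `out_col`."""
    return [{**row, out_col: row[in_col].findall(path)} for row in table]


context = [ET.fromstring(text) for text in FOREST]
table = nodes_to_tuples(context, "col2")           # forest -> list of tuples
table = navigate(table, "col2", "title", "col3")   # new column of collections

titles = [[node.text for node in row["col3"]] for row in table]
```

Each tuple keeps a reference to its XML node in one cell and a collection in another, which mirrors the reference cells and collection domain mentioned in the data model slides.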