XML Algebra Comparison between XPERANTO and NIAGARA

Posted on 15-Jan-2016


1

XML Algebra

Comparison between: XPERANTO and NIAGARA

2

Part I: NIAGARA, XML Query Optimization
- XML Algebra: Data Model, Operators, Query Plan, Equivalence Rules

XPERANTO: XML Query to SQL
- XML Algebra: Data Model, Operators, Query Plan, Composition Rules
- Translation Example

3

<?xml version="1.0" encoding="US-ASCII" ?>
<!DOCTYPE invoice [
<!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
<!ELEMENT account_number (#PCDATA)>
<!ELEMENT bill_period (#PCDATA)>
<!ELEMENT carrier (#PCDATA)>
<!ELEMENT itemized_call EMPTY>
<!ATTLIST itemized_call
  no ID #REQUIRED
  date CDATA #REQUIRED
  number_called CDATA #REQUIRED
  time CDATA #REQUIRED
  rate (NIGHT|DAY) #REQUIRED
  min CDATA #REQUIRED
  amount CDATA #REQUIRED>
<!ELEMENT total (#PCDATA)>
]>

<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="DAY" min="1" amount="0.15" />
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
  <total>$0.35</total>
</invoice>

Example of Telephone Bill

4

Example XQuery

User XQuery:
<summary>{
  FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN <rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>

Count the number of itemized calls in calling area 973, grouped by calling rate.

5

NIAGARA

Title: Following the paths of XML Data: An algebraic framework for XML query evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.

6

Goals
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques

7

Data Model: a collection of bags of vertices. The vertices in a bag have no order.

Example: a bag with three vertices
  Root "invoice.xml"
  invoice: <invoice>invoice-element-content</invoice>
  invoice.account_number: <account_number>account_number-element-content</account_number>

Written as: [Root "invoice.xml", invoice, invoice.account_number]

8

Data Model

Bag elements are reachable by path expressions. A path expression consists of two parts:
- an entry point
- a relative forward part

Example: account_number:invoice (relative forward part account_number, entry point invoice)
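To make the data model concrete, here is a minimal Python sketch. It assumes a bag is encoded as a dict from entry-point annotation to vertex, and that the relative forward part is a simple child path that ElementTree's `findall` understands. The names `parse_path`, `resolve`, and `BAG` are hypothetical, not part of NIAGARA.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: a bag is a dict from entry-point annotation
# ("Root", "invoice", ...) to a vertex. The sketch never relies on key
# order, in keeping with the unordered bags of the model.
INVOICE = ET.fromstring(
    "<invoice><account_number>555 777-3158 573 234 3</account_number>"
    "<carrier>Sprint</carrier></invoice>"
)
BAG = {"Root": "invoice.xml", "invoice": INVOICE}


def parse_path(expr):
    """Split 'relative:entry' into (entry_point, relative_part)."""
    relative, entry = expr.split(":", 1)
    return entry, relative


def resolve(bag, expr):
    """Return the vertices reachable from the bag's entry point by the relative part."""
    entry, relative = parse_path(expr)
    return bag[entry].findall(relative)


print([v.text for v in resolve(BAG, "account_number:invoice")])
# ['555 777-3158 573 234 3']
```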

9

Operators: Source (S), Follow (Φ), Select (σ), Join (⋈), Rename (ρ), Expose (ε), Vertex (υ), Group (γ), Union (∪), Intersection (∩), Difference (−), Cartesian Product (×).

10

Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
- S(*): all known XML documents
- S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
- S(*, schema.dtd): all known XML documents that conform to schema.dtd
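A sketch of the Source operator under stated assumptions. The set of "known" documents is a hypothetical in-memory catalogue that maps each filename to the DTD it conforms to. Filename patterns use shell-style wildcards through Python's `fnmatch`. Results are sorted only to make the output deterministic.

```python
from fnmatch import fnmatch

# Hypothetical catalogue of known documents: filename -> DTD it conforms to.
CATALOGUE = {
    "invoice1.xml": "invoice.dtd",
    "invoice2.xml": "invoice.dtd",
    "customer.xml": "schema.dtd",
}


def source(pattern="*", dtd=None):
    """S(pattern[, dtd]): one singleton bag per matching known document."""
    return [
        {"Root": name}
        for name, doc_dtd in sorted(CATALOGUE.items())
        if fnmatch(name, pattern) and (dtd is None or doc_dtd == dtd)
    ]


print(source("invoice*.xml"))     # [{'Root': 'invoice1.xml'}, {'Root': 'invoice2.xml'}]
print(source("*", "schema.dtd"))  # [{'Root': 'customer.xml'}]
```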

11

Follow Operator Φ
Input: a path expression in entry-point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of an unnesting follow)

12

Follow Operator (Example)

Φ(carrier:invoice), an unnesting follow

Input: {[Root invoice.xml, invoice]}
  invoice: <invoice>invoice-element-content</invoice>

Output: {[Root invoice.xml, invoice, invoice.carrier]}
  invoice: <invoice>invoice-element-content</invoice>
  invoice.carrier: <carrier>carrier-element-content</carrier>
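The unnesting follow can be sketched as follows. The sketch assumes the dict-based bag encoding (entry-point annotation to ElementTree vertex) and names the new vertex `entry.relative`, as in `invoice.carrier`. The second `<carrier>` (allowed by `carrier+` in the DTD) is hypothetical. It shows that each reachable vertex yields its own output bag.

```python
import xml.etree.ElementTree as ET


def follow_unnest(bags, expr):
    """Unnesting Follow for a path 'relative:entry': emit one new bag per
    reachable vertex, each keeping every vertex of its input bag."""
    relative, entry = expr.split(":", 1)
    out = []
    for bag in bags:
        for vertex in bag[entry].findall(relative):
            new_bag = dict(bag)                      # original contents
            new_bag[f"{entry}.{relative}"] = vertex  # plus the extracted vertex
            out.append(new_bag)
    return out


doc = ET.fromstring(
    "<invoice><carrier>Sprint</carrier><carrier>AT&amp;T</carrier></invoice>"
)
result = follow_unnest([{"Root": "invoice.xml", "invoice": doc}], "carrier:invoice")
print(sorted(result[0]))                            # ['Root', 'invoice', 'invoice.carrier']
print([b["invoice.carrier"].text for b in result])  # ['Sprint', 'AT&T']
```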

13

Select Operator σ
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: the set of bags that satisfy the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (=, ≠, <, ≤, >, ≥)

14

Select Operator (Example)

σ(invoice.carrier = "Sprint")

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
  each invoice: <invoice>invoice-element-content</invoice>

Output: {[Root invoice.xml, invoice], …}
  only the bags whose invoice.carrier is "Sprint"
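Here is a minimal sketch of Select as a filter over a list of bags. The bags are hypothetical and store each vertex's text content as a plain value. Python's `and`, `or`, and `not` stand in for the algebra's logical operators.

```python
# Hypothetical bags; each vertex is reduced to its text (or numeric) content.
BAGS = [
    {"Root": "invoice1.xml", "invoice.carrier": "Sprint", "invoice.total": 0.35},
    {"Root": "invoice2.xml", "invoice.carrier": "AT&T", "invoice.total": 1.20},
    {"Root": "invoice3.xml", "invoice.carrier": "Sprint", "invoice.total": 2.10},
]


def select(bags, predicate):
    """σ[predicate]: keep the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]


sprint = select(BAGS, lambda b: b["invoice.carrier"] == "Sprint")
# A conjunction of two simple qualifications: carrier = Sprint ∧ total > 1.00
big_sprint = select(
    BAGS, lambda b: b["invoice.carrier"] == "Sprint" and b["invoice.total"] > 1.00
)
print([b["Root"] for b in sprint])      # ['invoice1.xml', 'invoice3.xml']
print([b["Root"] for b in big_sprint])  # ['invoice3.xml']
```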

15

Join Operator ⋈
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate

16

Join Operator (Example)

⋈(account_number:invoice = number:customer)

Inputs:
  {[Root invoice.xml, invoice]} with invoice: <invoice>invoice-element-content</invoice>
  {[Root customer.xml, customer]} with customer: <customer>customer-element-content</customer>

Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
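Here is a nested-loop sketch of Join. Bags are encoded as tuples of (annotation, vertex) pairs rather than dicts, because a concatenated bag may hold two vertices with the same annotation (both inputs contribute a `Root`). The customer data, the `customer.number` annotation, and the helpers `lookup` and `same_account` are hypothetical.

```python
def lookup(bag, annotation):
    """Return the first vertex in the bag carrying the given annotation."""
    return next(vertex for name, vertex in bag if name == annotation)


def join(left, right, predicate):
    """⋈[predicate]: concatenate every pair of bags that satisfies the predicate."""
    return [
        left_bag + right_bag
        for left_bag in left
        for right_bag in right
        if predicate(left_bag, right_bag)
    ]


INVOICES = [(("Root", "invoice.xml"), ("invoice.account_number", "555 777-3158 573 234 3"))]
CUSTOMERS = [
    (("Root", "customer.xml"), ("customer.number", "555 777-3158 573 234 3")),
    (("Root", "customer2.xml"), ("customer.number", "973 555-8888")),
]


def same_account(inv, cust):
    # account_number:invoice = number:customer
    return lookup(inv, "invoice.account_number") == lookup(cust, "customer.number")


joined = join(INVOICES, CUSTOMERS, same_account)
print([vertex for name, vertex in joined[0] if name == "Root"])
# ['invoice.xml', 'customer.xml']
```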

17

Expose Operator ε
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the same order

18

Expose Operator (Example)

ε(bill_period, carrier)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
  invoice: <invoice>invoice-element-content</invoice>
  invoice.carrier: <carrier>carrier-element-content</carrier>
  invoice.bill_period: <bill_period>bill_period-element-content</bill_period>

Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
  the exposed vertices appear in the order of the parameter list
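Expose can be sketched as a projection that fixes the output order. Each result is a tuple of (annotation, vertex) pairs, which makes the parameter-list order explicit. The slide's example keeps `Root`, and this sketch does the same. That is an assumption about the general operator. The sample bag `BAG` is illustrative.

```python
def expose(bags, annotations):
    """ε(annotations): keep only the listed vertices, in parameter-list order.
    As in the slide's example, the Root vertex is carried along."""
    return [
        (("Root", bag["Root"]),) + tuple((a, bag[a]) for a in annotations)
        for bag in bags
    ]


BAG = {
    "Root": "invoice.xml",
    "invoice": "<invoice>...</invoice>",
    "invoice.carrier": "Sprint",
    "invoice.bill_period": "Jun 9 - Jul 8, 2000",
}
print(expose([BAG], ["invoice.bill_period", "invoice.carrier"]))
```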

19

Vertex Operator υ

Creates the actual XML vertex that will encompass everything created by an Expose operator.

Example:

υ(Customer_invoice)[(υ(account)[invoice.account_number], υ(inv_total)[invoice.total])]
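The Vertex example can be approximated with ElementTree. The sketch assumes each child is either an already-built element or a text value taken from an exposed bag. The `vertex` helper is hypothetical and does not model NIAGARA's internal vertex representation.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """υ(tag)[children]: build a new element around child elements and/or text."""
    elem = ET.Element(tag)
    for child in children:
        if isinstance(child, ET.Element):
            elem.append(child)
        elif len(elem):                      # text after a child goes in its tail
            elem[-1].tail = (elem[-1].tail or "") + child
        else:                                # leading text goes in elem.text
            elem.text = (elem.text or "") + child
    return elem


bag = {"invoice.account_number": "555 777-3158 573 234 3", "invoice.total": "$0.35"}
result = vertex("Customer_invoice", [
    vertex("account", [bag["invoice.account_number"]]),
    vertex("inv_total", [bag["invoice.total"]]),
])
print(ET.tostring(result, encoding="unicode"))
# <Customer_invoice><account>555 777-3158 573 234 3</account><inv_total>$0.35</inv_total></Customer_invoice>
```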

20

Other Operators

Group γ: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g., average) can be used with the group operator.

Rename ρ: changes the entry-point annotation of the elements of a bag. Example: ρ(invoice.bill_period, date)
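Here is a sketch of Group and Rename over dict-encoded bags. It assumes the grouping key is a single annotation and that the aggregate is any Python function over the bags of one group (`len` for count, a mean for average). The call bags copy the rate and amount values from the telephone bill, and the function names are hypothetical.

```python
from collections import defaultdict

# Bags for the three itemized calls (rate and amount from the telephone bill)
CALLS = [
    {"rate": "NIGHT", "amount": 0.05},
    {"rate": "DAY", "amount": 0.15},
    {"rate": "NIGHT", "amount": 0.15},
]


def group(bags, key, aggregate):
    """γ(key, aggregate): partition bags by the value at `key` and reduce
    each partition (a list of bags) to one aggregate value."""
    partitions = defaultdict(list)
    for bag in bags:
        partitions[bag[key]].append(bag)
    return [{key: value, "aggregate": aggregate(members)}
            for value, members in sorted(partitions.items())]


def rename(bags, old, new):
    """ρ(old, new): change the entry-point annotation `old` to `new`."""
    return [{(new if name == old else name): v for name, v in bag.items()}
            for bag in bags]


counts = group(CALLS, "rate", len)
averages = group(CALLS, "rate", lambda ms: sum(m["amount"] for m in ms) / len(ms))
print(counts)  # [{'rate': 'DAY', 'aggregate': 1}, {'rate': 'NIGHT', 'aggregate': 2}]
print(rename([{"invoice.bill_period": "Jun 9 - Jul 8, 2000"}], "invoice.bill_period", "date"))
# [{'date': 'Jun 9 - Jul 8, 2000'}]
```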

21

Example XQuery

User XQuery:
<summary>{
  FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN <rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>

Count the number of itemized calls in calling area 973, grouped by calling rate.

22

Query Plan: Algebra

υ(summary)[
  ε(υ(rate)[rate], υ(number_of_calls)[number])
  [ρ(rate:invoice.itemized_call, rate), ρ(count(invoice.itemized_call), number)
    [γ(rate:invoice.itemized_call, count(invoice.itemized_call))
      [σ(number_called:invoice.itemized_call LIKE "973%")
        [Φμ(invoice.itemized_call)
          [S(invoice.xml)]]]]]]
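To see how this plan computes the example query's answer, the pipeline can be hand-evaluated in Python over a trimmed copy of the telephone bill. This is only a sketch: the operators are inlined rather than implemented in general form, and LIKE "973%" is treated as a prefix test. Groups are sorted only to make the output deterministic, because bags are unordered. Like the plan, it filters calls before grouping.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the telephone-bill instance (only the attributes the plan uses).
INVOICE_XML = """<invoice>
  <itemized_call no="1" number_called="973 555-8888" rate="NIGHT"/>
  <itemized_call no="2" number_called="973 650-2222" rate="DAY"/>
  <itemized_call no="3" number_called="206 365-9999" rate="NIGHT"/>
</invoice>"""


def evaluate(xml_text):
    # S(invoice.xml): a single bag holding the document root
    bags = [{"Root": "invoice.xml", "invoice": ET.fromstring(xml_text)}]
    # Φμ(invoice.itemized_call): one bag per itemized_call vertex
    bags = [
        {**bag, "invoice.itemized_call": call}
        for bag in bags
        for call in bag["invoice"].findall("itemized_call")
    ]
    # σ(number_called:invoice.itemized_call LIKE "973%"), as a prefix test
    bags = [
        b for b in bags
        if b["invoice.itemized_call"].get("number_called", "").startswith("973")
    ]
    # γ(rate, count(...)), then ρ to the annotations "rate" and "number"
    groups = Counter(b["invoice.itemized_call"].get("rate") for b in bags)
    grouped = [{"rate": r, "number": n} for r, n in sorted(groups.items())]
    # ε and υ: expose rate and number, wrap each, and build <summary>
    summary = ET.Element("summary")
    for bag in grouped:
        ET.SubElement(summary, "rate").text = bag["rate"]
        ET.SubElement(summary, "number_of_calls").text = str(bag["number"])
    return ET.tostring(summary, encoding="unicode")


print(evaluate(INVOICE_XML))
```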

23

Equivalence Rules

14 equivalence rules so far.

Auxiliary operators used in the equivalences:
- A < B: path expression A is a prefix of B
- ┴: the null path expression
- A∏B: the greatest common prefix of path expressions A and B
- A∏B: the common prefix of path expressions A and B

24

Equivalence Rules: Examples (Rule Applications)

Follow ordering: Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
iff C < A and C < B, where C = A∏B, or A∏B = ┴.

(Figure: paths A and B branching from their common prefix C.)
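The auxiliary path operators and the follow-ordering condition can be checked mechanically. In this sketch a path expression is a dotted string, and the empty string plays the role of ┴. "C < A" is read as "C is a proper prefix of A". That reading is an interpretation, because the slide does not say whether the prefix must be proper. All function names are hypothetical.

```python
def steps(path):
    """'invoice.itemized_call' -> ('invoice', 'itemized_call'); '' is the null path ┴."""
    return tuple(path.split(".")) if path else ()


def is_prefix(a, b):
    """A < B: path A is a prefix of path B."""
    return steps(b)[: len(steps(a))] == steps(a)


def greatest_common_prefix(a, b):
    """A ∏ B, returning '' (┴) when the paths share no prefix."""
    common = []
    for x, y in zip(steps(a), steps(b)):
        if x != y:
            break
        common.append(x)
    return ".".join(common)


def follows_commute(a, b):
    """Follow ordering: Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)] when C = A ∏ B is a proper
    prefix of both A and B (this also covers C = ┴ for non-empty paths)."""
    c = greatest_common_prefix(a, b)
    return c != a and c != b


print(follows_commute("invoice.carrier", "invoice.itemized_call"))  # True
print(follows_commute("invoice", "invoice.carrier"))                # False
```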

25

Equivalence Rules: Examples (Rule Applications)

Join commutativity and associativity: (A ⋈ B) ⋈ C = (C ⋈ B) ⋈ A

26

Equivalence Rules: Examples (Rule Applications)

Selection distribution and interchangeability:

σc[A ⋈ B] = σc1[A] ⋈ σc2[B], where c is the conjunction of the conditions c1 and c2, each of which refers to only one of the join inputs
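The selection-distribution rule can be sanity-checked on small, hypothetical inputs by evaluating both sides and comparing the results. This checks one instance, not a proof of the rule. The annotations of the two inputs are disjoint, so merging dicts is a safe way to concatenate bags here.

```python
def select(bags, predicate):
    return [bag for bag in bags if predicate(bag)]


def join(left, right, predicate):
    return [{**lb, **rb} for lb in left for rb in right if predicate(lb, rb)]


# Hypothetical inputs A (invoices) and B (customers)
INVOICES = [
    {"invoice.account_number": "555-1", "invoice.carrier": "Sprint"},
    {"invoice.account_number": "555-2", "invoice.carrier": "AT&T"},
]
CUSTOMERS = [
    {"customer.number": "555-1", "customer.city": "Boston"},
    {"customer.number": "555-2", "customer.city": "Boston"},
]


def on_account(inv, cust):
    return inv["invoice.account_number"] == cust["customer.number"]


def c1(bag):  # refers only to input A
    return bag["invoice.carrier"] == "Sprint"


def c2(bag):  # refers only to input B
    return bag["customer.city"] == "Boston"


lhs = select(join(INVOICES, CUSTOMERS, on_account), lambda b: c1(b) and c2(b))
rhs = join(select(INVOICES, c1), select(CUSTOMERS, c2), on_account)
print(lhs == rhs, len(rhs))  # True 1
```

Pushing c1 and c2 below the join shrinks the join inputs. This is the same reason relational optimizers push selections down.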

27

Equivalence Rules: Examples (Rule Applications)

Elimination of unused bag elements: ε(P)[J[A]] = J[ε(P)[A]]
iff J uses only elements exposed by P

28

XPERANTO

Goal: XQuery → SQL

References:
- J. Shanmugasundaram, et al. Querying XML Views of Relational Data, VLDB 2001.
- J. Shanmugasundaram, et al. Efficiently Publishing Relational Data as XML Documents, VLDB 2000.
- J. Shanmugasundaram, Ph.D. Dissertation, July 2001.

29

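Query Processing Architecture

XPERANTO Query Engine (data flow):
- User XQuery → XQuery Parser → XQGM
- XQGM → Query Rewrite & View Composition (with the user's XML views, defined in XQuery) → XQGM
- XQGM → Computation Pushdown → SQL query (sent to the RDBMS) + Tagger graph
- Tuples from the RDBMS → Tagger Runtime (driven by the Tagger graph) → XML query results returned to the user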

30

Data Model: Tables of a List of XML Fragments

Example (read bottom-up):
Table: Carrier → $invoice_id, $carrier
Select: $invoice_id = $id → $invoice_id, $carrier
Project: $carrier_entry = <carrier>$carrier</carrier> → $carrier_entry
Groupby: $carriers = aggXMLFrags($carrier_entry) → $carriers

Result: a one-column table whose $carriers value is the fragment list <carrier>$carrier</carrier><carrier>$carrier</carrier>…
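The carriers example can be mirrored in Python. The sketch assumes a table is a list of tuples and an XML fragment is a string. `agg_xml_frags` is a hypothetical stand-in for aggXMLFrags, and the carrier rows are invented for illustration. No XML escaping is done.

```python
# Hypothetical Carrier table rows: (invoice_id, carrier)
CARRIER = [(1, "Sprint"), (1, "Verizon"), (2, "MCI")]


def agg_xml_frags(fragments):
    """Stand-in for aggXMLFrags: aggregate a column of fragments into one list."""
    return list(fragments)


def carriers_for(invoice_id):
    # Select: $invoice_id = $id
    rows = [row for row in CARRIER if row[0] == invoice_id]
    # Project: $carrier_entry = <carrier>$carrier</carrier>
    entries = [f"<carrier>{carrier}</carrier>" for _, carrier in rows]
    # Groupby: $carriers = aggXMLFrags($carrier_entry) -> one row, one column
    return [(agg_xml_frags(entries),)]


print(carriers_for(1))
# [(['<carrier>Sprint</carrier>', '<carrier>Verizon</carrier>'],)]
```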

31

Operators: Table, Project, Select, Join, Groupby, Orderby, Union, Unnest, View, Function

- Select, Project, Join, Groupby, Orderby, and Union have the same semantics as their relational counterparts.
- Project: also invokes the XML functions (e.g., cr8Elem, getContents)
- Table/View: refers to a relational table or an XML view
- Unnest: unnests an XML fragment list
- Function: invokes XQuery-valued functions
- Groupby: also creates XML fragment lists (via aggXMLFrags)

32

XML Functions & Operators

No. | XML Function | Description | Operators
1 | cr8Elem(Tag, Atts, Clist) | Creates an element with tag name Tag, attribute list Atts, and contents Clist | Project
2 | cr8AttList(A1, …, An) | Creates a list of attributes from the attributes passed as parameters | Project
3 | cr8Att(Name, Val) | Creates an attribute with name Name and value Val | Project
4 | cr8XMLFragList(C1, …, Cn) | Creates an XML fragment list from the content parameters | Project
5 | aggXMLFrags(C) | Aggregate XML function that creates an XML fragment list | Groupby
6 | getTagName(Elem) | Returns the element name of Elem | Project, Select
7 | getAttributes(Elem) | Returns the list of attributes of Elem | Project, Select
8 | getContents(Elem) | Returns the XML fragment list of the contents of Elem | Project, Select
9 | getAttName(Att) | Returns the name of attribute Att | Project, Select
10 | getAttValue(Att) | Returns the value of attribute Att | Project, Select
11 | isElement(E) | Returns true if E is an element, false otherwise | Select
12 | isText(T) | Returns true if T is text, false otherwise | Select
13 | Unnest(List) | Superscalar function that unnests a list | Unnest
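Here is a Python rendering of the XML functions, offered as a sketch. Elements and attributes are frozen dataclasses, PCDATA is a plain `str`, and an XML fragment list is a tuple. The snake_case names map one-to-one onto the function table but are hypothetical. XPERANTO evaluates these functions inside its query engine and does not expose them as a Python API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Att:
    name: str
    val: str


@dataclass(frozen=True)
class Elem:
    tag: str
    atts: tuple = ()      # attribute list
    contents: tuple = ()  # XML fragment list: Elem values and str (PCDATA)


def cr8_elem(tag, atts, clist):
    return Elem(tag, tuple(atts), tuple(clist))


def cr8_att_list(*atts):
    return tuple(atts)


def cr8_att(name, val):
    return Att(name, val)


def cr8_xml_frag_list(*contents):
    # Contents that are already fragment lists (tuples) are spliced in
    out = []
    for c in contents:
        out.extend(c if isinstance(c, tuple) else (c,))
    return tuple(out)


def agg_xml_frags(column):
    # Groupby aggregate: the values of one column become one fragment list
    return tuple(column)


def get_tag_name(elem):
    return elem.tag


def get_attributes(elem):
    return elem.atts


def get_contents(elem):
    return elem.contents


def get_att_name(att):
    return att.name


def get_att_value(att):
    return att.val


def is_element(x):
    return isinstance(x, Elem)


def is_text(x):
    return isinstance(x, str)


def unnest(fragments):
    # One output row per fragment
    return list(fragments)


invoice = cr8_elem("invoice", cr8_att_list(), cr8_xml_frag_list(
    cr8_elem("account_number", cr8_att_list(), cr8_xml_frag_list("555 777-3158 573 234 3")),
    agg_xml_frags([cr8_elem("carrier", (), ("Sprint",))]),
    cr8_elem("total", cr8_att_list(), cr8_xml_frag_list("$0.35")),
))
call = cr8_elem("itemized_call", cr8_att_list(cr8_att("no", "1"), cr8_att("rate", "NIGHT")), ())
print([get_tag_name(e) for e in unnest(get_contents(invoice)) if is_element(e)])
# ['account_number', 'carrier', 'total']
```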

33

Operators - Examples

Project: $elems = getContents($invoice)
  Input $invoice: <invoice><account_number>508-753-2352</account_number><bill_period>24 July – 23 August, 2001</bill_period>…</invoice>
  Output $elems: <account_number>508-753-2352</account_number><bill_period>24 July – 23 August, 2001</bill_period>…

Groupby: $count = count($itemized_call)
  Input $itemized_call (three rows): <itemized_call></itemized_call> | <itemized_call></itemized_call> | <itemized_call></itemized_call>
  Output $count: 3

34

Operators - Examples

Groupby: $entries = aggXMLFrags($entry)
  Input $entry (two rows):
    <rate>DAY</rate><number_of_calls>20</number_of_calls>
    <rate>NIGHT</rate><number_of_calls>23</number_of_calls>
  Output $entries (one row): <rate>DAY</rate><number_of_calls>20</number_of_calls><rate>NIGHT</rate><number_of_calls>23</number_of_calls>

Project: $result = cr8Elem(summary, Att, $entries)
  Output $result: <summary><rate>DAY</rate><number_of_calls>20</number_of_calls><rate>NIGHT</rate><number_of_calls>23</number_of_calls></summary>

35

Operators - Examples

Unnest: $elem = unnest($elems)
  Input $elems (one row): <rate>DAY</rate><number_of_calls>20</number_of_calls><rate>NIGHT</rate><number_of_calls>23</number_of_calls>
  Output $elem (one row per fragment):
    <rate>DAY</rate>
    <number_of_calls>20</number_of_calls>
    <rate>NIGHT</rate>
    <number_of_calls>23</number_of_calls>

36

XML Query

XQGM (read bottom-up):

Outer branch:
  View: $doc = document("invoice.xml")
  Navigate: $rate = $doc/invoice/itemized_call/@rate
  Select: distinct($rate)

Correlated inner branch (evaluated for each $rate):
  Navigate: $itemized_call, with $irate = $doc/invoice/itemized_call/@rate and $number = $doc/invoice/itemized_call/@number_called
  Select: $number LIKE '973%'
  Select: $rate = $irate
  Groupby: $count = count($itemized_call)

Join (Correlated): $rate, $count
Project: $entry = <rate>$rate</rate><number_of_calls>$count</number_of_calls>
Groupby: $entries = aggXMLFrags($entry)
Project: $result = <summary>$entries</summary>

User XQuery:
<summary>{
  FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN <rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>

37

Navigation in XQGM

Navigate: $invoice/account_number

is expanded into the following XQGM operators (read bottom-up, starting from $invoice):
Project: $elems = getContents($invoice)
Unnest: $elem = unnest($elems)
Select: getTagName($elem) = "account_number" (output: $account_number)
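The expansion of a Navigate step into Project, Unnest, and Select can be sketched with ElementTree. The sketch makes one simplifying assumption: getContents returns only child elements, because ElementTree keeps text separately in `.text` and `.tail`. The helper `navigate_child` is hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICE = ET.fromstring(
    "<invoice><account_number>555 777-3158 573 234 3</account_number>"
    "<bill_period>Jun 9 - Jul 8, 2000</bill_period><carrier>Sprint</carrier></invoice>"
)


def navigate_child(invoice_rows, name):
    """Navigate: $invoice/name, expanded into Project, Unnest and Select."""
    # Project: $elems = getContents($invoice) (child elements only in this sketch)
    elems_rows = [list(invoice) for invoice in invoice_rows]
    # Unnest: $elem = unnest($elems) -> one row per child element
    elem_rows = [elem for elems in elems_rows for elem in elems]
    # Select: getTagName($elem) = name
    return [elem for elem in elem_rows if elem.tag == name]


print([e.text for e in navigate_child([INVOICE], "account_number")])
# ['555 777-3158 573 234 3']
```

The three steps return the same vertices as a direct child-path lookup such as `INVOICE.findall("account_number")`, which is the point of the expansion.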

38

Default XML View

Relational tables:

invoice
id | account_number | bill_period | total
1 | 555 777-3158 573 234 3 | Jun 9 – Jul 8, 2000 | $0.35

carrier
invoice_id | carrier
1 | Sprint

itemized_call
invoice_id | no | date | number_called | time | rate | min | amount
1 | 1 | JUN 10 | 973 555-8888 | 10:17pm | NIGHT | 1 | 0.05
1 | 2 | JUN 13 | 973 650-2222 | 10:19am | DAY | 1 | 0.15
1 | 3 | JUN 15 | 206 365-9999 | 10:25pm | NIGHT | 3 | 0.15

Default XML view (one element per table, one <row> per tuple):
<invoice>
  <row><id>1</id><account_number>555 777-3158 573 234 3</account_number><bill_period>Jun 9 – Jul 8, 2000</bill_period><total>$0.35</total></row>
</invoice>
<carrier>
  <row><invoice_id>1</invoice_id><carrier>Sprint</carrier></row>
</carrier>
...

39

User Defined XML View

Relational tables: the same Invoice, Carrier, and Itemized_call tables as in the default XML view.

User-defined XML view:
<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="DAY" min="1" amount="0.15" />
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
  <total>$0.35</total>
</invoice>

40

User Defined XML View Cont.

Create view invoice as (
  FOR $invoice IN view("default")/invoice/row
  RETURN
    <invoice>
      <account_number>$invoice/account_number</account_number>
      <bill_period>$invoice/bill_period</bill_period>
      FOR $carrier IN view("default")/carrier/row
      WHERE $carrier/invoice_id = $invoice/id
      RETURN <carrier>$carrier/carrier</carrier>
      FOR $itemized_call IN view("default")/itemized_call/row
      WHERE $itemized_call/invoice_id = $invoice/id
      RETURN <itemized_call no=$itemized_call/no date=$itemized_call/date number_called=$itemized_call/number_called time=$itemized_call/time rate=$itemized_call/rate min=$itemized_call/min amount=$itemized_call/amount />
      SORTBY (@no)
      <total>$invoice/total</total>
    </invoice>
)

41

XML View XQGM

XQGM for the invoice view defined on slide 40 (read bottom-up):

Table: Invoice → $id, $account_number, $bill_period, $total

Carriers subquery (correlated on $id):
  Table: Carrier → $invoice_id, $carrier
  Select: $invoice_id = $id
  Project: $carrier_entry = <carrier>$carrier</carrier>
  Groupby: $carriers = aggXMLFrags($carrier_entry)

Items subquery (correlated on $id): built the same way over the itemized_call table, producing $items

Join (Correlated): $account_number, $bill_period, $total, $carriers, $items
Project: $doc = <invoice><account_number>$account_number</account_number><bill_period>$bill_period</bill_period>$carriers $items<total>$total</total></invoice>

42

View Composition

User query XQGM + user view XQGM: the composition rules are used to cancel out the Navigation operators.

View expression ($invoice), written with XML functions:
cr8Elem(invoice, cr8AttList(), cr8XMLFragList(
  cr8Elem(account_number, cr8AttList(), cr8XMLFragList($account_number)),
  cr8Elem(bill_period, cr8AttList(), cr8XMLFragList($bill_period)),
  $carriers, $items,
  cr8Elem(total, cr8AttList(), cr8XMLFragList($total))))

Query navigation applied to $invoice (read bottom-up):
Project: $elems = getContents($invoice)
Unnest: $elem = unnest($elems)
Select: getTagName($elem) = "account_number" (output: $account_number)

43

12 Composition Rules

No. | Function | Composes With | Reduction
1 | getTagName | cr8Elem(Tag, Atts, Clist) | Tag
2 | getAttributes | cr8Elem(Tag, Atts, Clist) | Atts
3 | getContents | cr8Elem(Tag, Atts, Clist) | Clist
4 | getAttName | cr8Att(Name, Val) | Name
5 | getAttValue | cr8Att(Name, Val) | Val
6 | isElement | cr8Elem(Tag, Atts, Clist) | True
7 | isElement | anything other than cr8Elem | False
8 | isText | PCDATA | True
9 | isText | anything other than PCDATA | False
10 | unnest | aggXMLFrags(C) | C
11 | unnest | cr8XMLFragList(C1, ..., Cn) | C1 ∪ ... ∪ Cn
12 | unnest | cr8AttList(A1, ..., An) | A1 ∪ ... ∪ An
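The composition rules behave like a small term-rewriting system. The sketch below encodes the view's constructor expression as nested tuples, which is a hypothetical encoding. It then applies the rules for `getContents`, `unnest`, and `getTagName`, the same steps that the navigation $invoice/account_number needs. Some members have a tag that cannot be decided at compile time (the variables `$carriers` and `$items`). The sketch returns these separately, and how a real engine treats them is an assumption.

```python
# Hypothetical term encoding (tuples), mirroring the XML functions:
#   ("cr8Elem", tag, atts, clist)   ("cr8Att", name, val)
#   ("cr8AttList", a1, ..., an)     ("cr8XMLFragList", c1, ..., cn)
#   ("aggXMLFrags", c)              ("PCDATA", text)
# A plain string such as "$total" is a column variable with an unknown value.


def reduce_call(func, arg):
    """Apply one composition rule to func(arg); return None if no rule applies.
    For unnest, the result is the list of union members C1 ∪ ... ∪ Cn."""
    kind = arg[0] if isinstance(arg, tuple) else None
    if kind is None:
        return None                                   # a variable: nothing to compose
    if kind == "cr8Elem":
        _, tag, atts, clist = arg
        rules = {"getTagName": tag, "getAttributes": atts,
                 "getContents": clist, "isElement": True}   # rules 1, 2, 3, 6
        if func in rules:
            return rules[func]
    if kind == "cr8Att":
        rules = {"getAttName": arg[1], "getAttValue": arg[2]}  # rules 4, 5
        if func in rules:
            return rules[func]
    if func == "isElement":
        return False                                  # rule 7
    if func == "isText":
        return kind == "PCDATA"                       # rules 8, 9
    if func == "unnest":
        if kind == "aggXMLFrags":
            return [arg[1]]                           # rule 10
        if kind in ("cr8XMLFragList", "cr8AttList"):
            return list(arg[1:])                      # rules 11, 12
    return None


def compose_child_navigation(view_term, name):
    """Compose Navigate: $invoice/name with the view's cr8Elem term.
    Returns (matches, residual): members proven to be <name> elements, and
    members whose tag cannot be decided at compile time."""
    contents = reduce_call("getContents", view_term)  # rule 3
    members = reduce_call("unnest", contents)         # rule 11
    matches, residual = [], []
    for member in members:
        tag = reduce_call("getTagName", member)       # rule 1
        if tag is None:
            residual.append(member)
        elif tag == name:
            matches.append(member)
    return matches, residual


VIEW = ("cr8Elem", "invoice", ("cr8AttList",), ("cr8XMLFragList",
        ("cr8Elem", "account_number", ("cr8AttList",), ("cr8XMLFragList", "$account_number")),
        ("cr8Elem", "bill_period", ("cr8AttList",), ("cr8XMLFragList", "$bill_period")),
        "$carriers", "$items",
        ("cr8Elem", "total", ("cr8AttList",), ("cr8XMLFragList", "$total"))))

matches, residual = compose_child_navigation(VIEW, "account_number")
print(matches)   # the cr8Elem(account_number, ...) term over $account_number
print(residual)  # ['$carriers', '$items']
```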

44

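View Composition Example

Before composition (read bottom-up):
Join (Correlated): $account_number, $bill_period, $total, $carriers, $items
Project: $invoice = <invoice><account_number>$account_number</account_number><bill_period>$bill_period</bill_period>$carriers $items<total>$total</total></invoice>
Project: $elems = getContents($invoice)
Unnest: $elem = unnest($elems)
Select: getTagName($elem) = "account_number" (output: $account_number)

After composition: getContents ∘ cr8Elem reduces to the content list (rule 3), unnest ∘ cr8XMLFragList reduces to the union of its members (rule 11), and getTagName ∘ cr8Elem reduces to the tag (rule 1). The Navigation operators cancel out, and $account_number is obtained directly from the Join (Correlated).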

45

Computation Pushdown

Goal: XQGM → SQL queries + Tagger graph

Step 1: Query decorrelation
- Correlated joins are turned into outer unions.
- Reference: P. Seshadri, et al. "Complex Query Decorrelation", ICDE 1996.

Step 2: Tagger pull-up
- XQGM → Tagger run-time graph, using the "Sorted Outer Union" technique.
- Reference: J. Shanmugasundaram, et al. "Efficiently Publishing Relational Data as XML Documents", VLDB 2000.
- Separation of SQL and tagger operations: semantically equivalent fragments are identified by pattern.
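The idea behind the sorted outer union can be sketched as follows. The SQL side returns one wide table: the outer union of the parent and child branches, padded with NULLs (shown as `None`) and sorted so that each invoice tuple precedes its children. The tagger then emits XML in a single pass and keeps only one `<invoice>` open at a time. The rows, the column layout, and the `branch`/`tag` helpers are hypothetical. Python's `sorted` stands in for the SQL ORDER BY.

```python
# Hypothetical outer union of three branches; columns are
# (invoice_id, account_number, carrier, number_called), NULLs shown as None.
OUTER_UNION = [
    (1, None, "Sprint", None),
    (2, None, "Verizon", None),
    (1, "555 777-3158 573 234 3", None, None),
    (1, None, None, "973 555-8888"),
    (2, "508-753-2352", None, None),
]


def branch(row):
    """0 = invoice tuple, 1 = carrier tuple, 2 = itemized_call tuple."""
    if row[1] is not None:
        return 0
    return 1 if row[2] is not None else 2


def tag(rows):
    """Tag rows sorted by (invoice_id, branch): each parent tuple arrives
    before its children, so only one <invoice> is open at a time."""
    out, open_id = [], None
    for inv_id, account, carrier, number in sorted(rows, key=lambda r: (r[0], branch(r))):
        if account is not None:              # parent tuple: start a new <invoice>
            if open_id is not None:
                out.append("</invoice>")
            out.append(f"<invoice><account_number>{account}</account_number>")
            open_id = inv_id
        elif carrier is not None:            # child tuple of the open invoice
            out.append(f"<carrier>{carrier}</carrier>")
        else:
            out.append(f'<itemized_call number_called="{number}"/>')
    if open_id is not None:
        out.append("</invoice>")
    return "".join(out)


print(tag(OUTER_UNION))
```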

46

Comparison: XPERANTO vs. NIAGARA

Feature | XPERANTO | NIAGARA
Goal | XQuery → SQL | XQuery → Algebra
Algebra | XQGM and Tagger Graph | XML Algebra
Data Model | Tables of a list of XML fragments | A collection of bags of vertices
Operators* | 10 operators with 13 functions | 12 operators
Variable Binding | Many temporary variables | No variables
Order | Sensitive | Semi-sensitive (missing orderby)
Regular Expressions | No support at operator level | Support at operator level
Text-in-context | No support | Support
Level of Abstraction | Function level (lower) | Logical level (higher)
Transition Rules | Composition rules & (ad-hoc) 1 semantically equivalent pattern | (ad-hoc) Equivalence rules
Operation History | Not maintained | Maintained

47

Conclusions and Future Work

WE NEED OUR OWN ALGEBRA.

More Reading
- David Beech, et al. A Formal Data Model and Algebra for XML.
- Mary Fernandez, et al. An Algebra for XML Query.