Management of XML and Semistructured Data
-
Upload
josiah-evans -
Category
Documents
-
view
19 -
download
1
description
Transcript of Management of XML and Semistructured Data
Management of XML and Semistructured Data
Lecture 7: XML-QL, Structural Recursion
Monday, April 23, 2001
XML-QL
• First declarative language for XML• How to obtain a query language for XML fast ?
– Assume OEM as data model
– Use features from UnQL and StruQL• Patterns
• Templates
• Skolem functions
– Design XML-like syntax
Patterns in XML-QL
WHERE <book language=“french”> <publisher> <name> Morgan Kaufmann </> </> <author> $A </> </book> in “www.a.b.c/bib.xml”CONSTRUCT <author> $A </>
WHERE <book language=“french”> <publisher> <name> Morgan Kaufmann </> </> <author> $A </> </book> in “www.a.b.c/bib.xml”CONSTRUCT <author> $A </>
Find all authors who published in Morgan Kaufmann:
Abbreviation: </> closes any tag.
Patterns in XML-QL
where <book language=$X> <author> $A </author>
</book> in “www.a.b.c/bib.xml” <book>
<author> $A </author> <author> Jones </author>
</book> in “www.a.b.c/bib.xml”construct <result> $X </>
where <book language=$X> <author> $A </author>
</book> in “www.a.b.c/bib.xml” <book>
<author> $A </author> <author> Jones </author>
</book> in “www.a.b.c/bib.xml”construct <result> $X </>
Find all languages in which Jones’ coauthors have published:
There is a join here…
Constructors in XML-QL
where <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”construct <result> <author> $A </> <lang> $L </> </>
where <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”construct <result> <author> $A </> <lang> $L </> </>
Result is:<result> <author>Smith</author> <lang>English </lang> </result><result> <author>Smith</author> <lang>Mandarin</lang> </result><result> <author>Doe </author> <lang>English </lang> </result>. . . .
Find all authors and the languages in which they published:
Nested Queries in XML-QL
WHERE <book.author> $A </> in “www.a.b.c/bib.xml”CONSTRUCT <result> <author> $A </> WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <lang> $L </> </>
WHERE <book.author> $A </> in “www.a.b.c/bib.xml”CONSTRUCT <result> <author> $A </> WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <lang> $L </> </>
Find all authors and the languages in which they published; group by authors:
Note: book.author is a (regular) path expression
<result> <author>Smith</author> <lang>English</lang> <lang>Mandarin</lang> <lang>…</lang> …</result><result> <author>Doe</author> <lang>English</lang> …</result>
Result is:
Skolem Functions in XML-QL
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author> $A</> <lang> $L </> </>
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author> $A</> <lang> $L </> </>
Same query, with Skolem functions
Assumptions:• the ID attribute is always id• default Skolem function for author is G($A), for lang is H($A, $L) (why ?)
Skolem Functions in XML-QL
{ WHERE <book> <author> $A </> <title> $T </> </> in “www.a.b.c/bib.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <booktitle> $T</> /* implicit Skolem function H($A, $T) */ </>}{ WHERE <paper> <author> $A </> <title> $T </> <journal> $J </> </> in “www.d.e.f/papers.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <papertitle> $T</> /* implicit Skolem function J($A, $T) */ <journaltitle> $J</> /* implicit Skolem function K($A, $T) */ </>}
{ WHERE <book> <author> $A </> <title> $T </> </> in “www.a.b.c/bib.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <booktitle> $T</> /* implicit Skolem function H($A, $T) */ </>}{ WHERE <paper> <author> $A </> <title> $T </> <journal> $J </> </> in “www.d.e.f/papers.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <papertitle> $T</> /* implicit Skolem function J($A, $T) */ <journaltitle> $J</> /* implicit Skolem function K($A, $T) */ </>}
Object fusion with Skolem functions and block structure- Compile a complete list of authors, from two sources
Result:
(some have only books,Others only papers,Others have both)
<person> <name>Smith</name> <booktitle>Book1</booktitle > <booktitle>Book2</booktitle > </result><person> <name>Jones</name> <booktitle>Book3</booktitle > <papertitle>paper1</papertitle > <journaltitle>journal1</journaltitle > </result><person> <name>Mark</name> <papertitle>paper2</papertitle > <journaltitle>journal3</journaltitle > </result>…
Skolem Functions in XML-QL
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author id=G($A)> $A</> <lang id=H($A)> $L </> </>
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author id=G($A)> $A</> <lang id=H($A)> $L </> </>
“Wrong” query number 1:
What is “wrong” here ?
Skolem Functions in XML-QL
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A,$L)> <author id=G($A)> $A</> <lang id=H($A,$L)> $L </> </>
WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A,$L)> <author id=G($A)> $A</> <lang id=H($A,$L)> $L </> </>
“Wrong” query number 2:
What is “wrong” here ?
Skolem Functions in XML-QL
{ WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <author id=F($A)> <lang id=H($A,$L)> $L </> </> }{ WHERE <person> <city> $C </> <fluent-in> $X </> </> in “www.a.b.c/bib.xml” CONSTRUCT <location id=G($C)> <lang id=H($C,$L)> $L </> </> }
{ WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <author id=F($A)> <lang id=H($A,$L)> $L </> </> }{ WHERE <person> <city> $C </> <fluent-in> $X </> </> in “www.a.b.c/bib.xml” CONSTRUCT <location id=G($C)> <lang id=H($C,$L)> $L </> </> }
“Wrong” query number 3:
Three Rules to Construct Only Trees
Rule 1: nested elements must have Skolem functions that are… [how ??]
Rule 2: an element that has an atomic content must have a Skolem function that is… [how ??]
Rule 3: if a Skolem function occurs in two different places than the following condition must hold… [which ??]
CONSTRUCT <tag1 id=F([args1])> <tag2 id=G([args2])> …</> </>
CONSTRUCT <tag1 id=F([args1])> <tag2 id=G([args2])> …</> </>
CONSTRUCT … <tag id=F([args])> $X </> …CONSTRUCT … <tag id=F([args])> $X </> …
{ CONSTRUCT <tag1 id=G([args1])> <tag id=F([args])> …</> </> }{ CONSTRUCT <tag1 id=H([args2])> <tag id=F([args])> …</> </> }
{ CONSTRUCT <tag1 id=G([args1])> <tag id=F([args])> …</> </> }{ CONSTRUCT <tag1 id=H([args2])> <tag id=F([args])> …</> </> }
XML-QL v.s. XQuery
• Xquery (=Quilt) v.s. XML-QL+ faithful XML data model
+ Xpath sublanguage
+ aggregate functions (like in SQL)
+ some features from XQL- Patterns- Skolem functions
A Different Paradigm:Structural Recursion
Data as sets with a union operator:
{a:3, a:{b:”one”, c:5}, b:4} =
{a:3} U {a:{b:”one”,c:5}} U {b:4}
Structural Recursion
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = f($T)f({}) = {}f($V) = if isInt($V) then {result: $V} else {}
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = f($T)f({}) = {}f($V) = if isInt($V) then {result: $V} else {}
Example: retrieve all integers in the data
a a b
b c3
“one” 5
4result result result
3 5 4
Structural Recursion
What does this do ?
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=a then {b:f($T)} else {$L:f($T)}f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=a then {b:f($T)} else {$L:f($T)}f({}) = {}f($V) = $V
Structural Recursion
What does this do ?
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:{$L:f($T)}}f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:{$L:f($T)}}f({}) = {}f($V) = $V
Input = tree with n nodesOutput = ???
Structural Recursion
Example: increase all engine prices by 10%
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= engine then {$L: g($T)} else {$L: f($T)}f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= engine then {$L: g($T)} else {$L: f($T)}f({}) = {}f($V) = $V
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= price then {$L:1.1*$T} else {$L: g($T)}g({}) = {}g($V) = $V
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= price then {$L:1.1*$T} else {$L: g($T)}g({}) = {}g($V) = $V
engine body
part price
price price
part price
100
1000
100
1000
engine body
part price
price price
part price
110
1100
100
1000
Structural Recursion
Retrieve all subtrees reachable by (a.b)*.aa
b
a
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }
Structural Recursion: General Form
f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }
f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }
fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }
fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }
. . . .
Each of E1, ..., Ek consists only of {_ : _}, U, if_then_else_
Evaluating Structural Recursion
Recursive Evaluation:
• Compute the functions recursively, starting with f1 at the root
Termination is guaranteed.
How efficiently can we evaluate this ?
Structural Recursion
Consider this:
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V
Naive Recursive Evaluation
a
b
c
d
a a
b b b b
c c c c c c c c
Input tree = n nodesOutput tree = 2n+1 – 1 nodes
Efficient Recursive EvaluationRecursive Evaluation with function memorization.PTIME complexity.
a
b
c
d
a
b
c
d
a
b
c
d
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V
Alternatively: apply the function in parallel to each input edge Bulk Evaluation
Bulk EvaluationSometimes f doesn’t return anything use edges
a
b
c
d
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=c then $T else f($T)f({}) = {}f($V) = $V
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=c then $T else f($T)f({}) = {}f($V) = $V
d
d
Epsilon Edges
Meaning of edges:
=
a b
c d
a b
c d
dc
Epsilon Edges
Note: union becomes easy to draw with edges:
Example:a b c d
eU =a b c d e
=
a b c d e
T1 T2U =T1 T2
Bulk Evaluation
Idea: “apply” E1, ..., Ek independently on each edge, then connect with edges PTIME
f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }
f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }
fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }
fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }
. . . .
Bulk Evaluation
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }
f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }
g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }
Recall (a.b)*.a:
a
b
a
a
a
b
b
a
c
ba
aa
b
b
a
c b
b
a
c
a
d
d
d
d
bc
Structural Recursion
• Can evaluate in two ways:– Recursively: memorize functions’ results– Bulk: apply all functions on all edges, in
parallel, connect, eliminate what is useless
• Complexity: PTIME– More precisely: NLOGSPACE
• Works on graphs with cycles too !
XSL
• two W3C drafts: XSLT and XPATH– http://www.w3.org/TR/xpath, 11/99– http://www.w3.org/TR/WD-xslt, 11/99
• in commercial products (e.g. IE5.0)
• purpose: stylesheet specification language:– stylesheet: XML -> HTML– in general: XML -> XML
XSL Templates and Rules
• query = collection of template rules
• template rule = match pattern + template
<xsl:template> <xsl:apply-templates/> </xsl:template>
<xsl:template match = “/bib/*/title”> <result> <xsl:value-of/> </result></xsl:template>
<xsl:template> <xsl:apply-templates/> </xsl:template>
<xsl:template match = “/bib/*/title”> <result> <xsl:value-of/> </result></xsl:template>
Retrieve all book titles:
Flow Control in XSL
<xsl:template> <xsl:apply-templates/> </xsl:template>
<xsl:template match=“a”> <A><xsl:apply-templates/></A></xsl:template>
<xsl:template match=“b”> <B><xsl:apply-templates/></B></xsl:template>
<xsl:template match=“c”> <C><xsl:value-of/></C></xsl:template>
<xsl:template> <xsl:apply-templates/> </xsl:template>
<xsl:template match=“a”> <A><xsl:apply-templates/></A></xsl:template>
<xsl:template match=“b”> <B><xsl:apply-templates/></B></xsl:template>
<xsl:template match=“c”> <C><xsl:value-of/></C></xsl:template>
<a> <e> <b> <c> 1 </c>
<c> 2 </c>
</b>
<a> <c> 3 </c>
</a>
</e>
<c> 4 </c>
</a>
<A> <B> <C> 1 </C>
<C> 2 </C>
</B>
<A> <C> 3 </C>
</A>
<C> 4 </C>
</A>
XSL is Structural Recursion
Equivalent to:
f(T1 U T2) = f(T1) U f(T2)f({L: T}) = if L= c then {C: t} else L= b then {B: f(t)} else L= a then {A: f(t)} else f(t)f({}) = {}f(V) = V
f(T1 U T2) = f(T1) U f(T2)f({L: T}) = if L= c then {C: t} else L= b then {B: f(t)} else L= a then {A: f(t)} else f(t)f({}) = {}f(V) = V
XSL query = single functionXSL query with modes = multiple function
XSL and Structural Recursion
XSL:• trees only• may loop
Structural Recursion:• arbitrary graphs• always terminates
<xsl:template match = “e”> <xsl:apply-patterns select=“/”/></xsl:template>
<xsl:template match = “e”> <xsl:apply-patterns select=“/”/></xsl:template>
stack overflow on IE 5.0
add the following rule: