XML Query: XQuery. Reference: XQuery by Priscilla Walmsley, published by O’Reilly.
Introduction to XQuery
Resources:
• Official URL: www.w3.org/TR/xquery
• Short intros: http://www.xml.com/pub/a/2002/10/16/xquery.html and www.brics.dk/~amoeller/XML/querying
• Or see the Ramakrishnan & Gehrke text
Lecture modified from slides by Dan Suciu
XML vs. Relational Data
Relation:
name   phone
John   3634
Sue    6343
Dick   6363
… in XML:
{ row: { name: "John", phone: 3634 },
  row: { name: "Sue", phone: 6343 },
  row: { name: "Dick", phone: 6363 }
}
[Figure: the same data drawn as a tree. The root has three row children; each row has a name leaf ("John", "Sue", "Dick") and a phone leaf (3634, 6343, 6363).]
Relational to XML Data
• A relation instance is basically a tree with:
  – Unbounded fanout at level 1 (i.e., any number of rows)
  – Fixed fanout at level 2 (i.e., a fixed number of fields)
• XML data is essentially an arbitrary tree:
  – Unbounded fanout at all nodes/levels
  – Any number of levels
  – Variable number of children at different nodes, variable path lengths
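The "relation as a tree" view can be sketched in a few lines of Python. This is only an illustration using the standard-library `xml.etree.ElementTree`; the `relation_to_xml` helper and the `relation` root tag are assumptions made for the example, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Rows taken from the slide's example relation.
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]
fields = ("name", "phone")

def relation_to_xml(rows, fields):
    """Build a two-level tree: any number of rows, a fixed set of fields per row."""
    root = ET.Element("relation")  # hypothetical root tag
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for field_name, value in zip(fields, row):
            ET.SubElement(row_el, field_name).text = str(value)
    return root

tree = relation_to_xml(rows, fields)
print(ET.tostring(tree, encoding="unicode"))
```

The fanout at level 1 follows the number of rows, while every row has exactly `len(fields)` children.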
Query Language for XML
• Must be high-level; "SQL for XML"
• Must conform to XML Schema
  – But also work in the absence of schema info
• Support simple and complex/nested datatypes
• Support universal and existential quantifiers, aggregation
• Operations on sequences and hierarchies of document structures
• Capability to transform and create XML structures
XQuery
• Influenced by XML-QL, Lorel, Quilt, YATL
  – Also, XPath and XML Schema
• Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values
  – Inputs/outputs are objects defined by the XML-Query data model, rather than strings in XML syntax
Overview of XQuery
• Path expressions
• Element constructors
• FLWOR ("flower") expressions
  – Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.
• Expressions evaluated w.r.t. a context:
  – Context item (current node)
  – Context position (in sequence being processed)
  – Context size (of the sequence being processed)
  – Context also includes namespaces, variables, functions, date, etc.
Path Expressions
Examples:
• Bib/paper
• Bib/book/publisher
• Bib/paper/author/lastname
Given an XML document, the value of a path expression p is an ordered sequence of objects (nodes), listed in document order
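As a rough analogue, the three path expressions can be evaluated with Python's standard-library `xml.etree.ElementTree`, which supports a small XPath subset. The miniature document below is made up for illustration; only the element names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the bibliography document.
BIB = ET.fromstring("""
<Bib>
  <paper><author><lastname>Abiteboul</lastname></author></paper>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <paper><author><lastname>Vianu</lastname></author></paper>
</Bib>
""")

# ElementTree paths are relative to the element they are called on,
# so Bib/paper becomes "paper" when starting from the Bib root.
papers = BIB.findall("paper")
publishers = [e.text for e in BIB.findall("book/publisher")]
lastnames = [e.text for e in BIB.findall("paper/author/lastname")]

print(len(papers), publishers, lastnames)
```

`findall` returns matches in document order, which matches the ordered-sequence reading of a path expression.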
Path Expression Examples
Doc =
[Figure: the bibliography document drawn as a tree of objects with identifiers such as &o1, &o12, &o24, &o29. The Bib node has paper, book, and paper children; edge labels include references, author, title, year, http, publisher, page, firstname, and lastname; leaf values include "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122, and 133.]
Bib/paper = <&o12,&o29>
Bib/book/publisher = <&o51>
Bib/paper/author/lastname = <&o71,&206>
Note that order of elements matters!
Element Construction
• An XQuery expression can construct new values or structures
• Example: Consider the path expressions from the previous slide.
  – Each of them returns a newly constructed sequence of elements
  – Key point is that we don’t just return existing structures or atomic values; we can re-arrange them as we wish into new structures
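A minimal Python sketch of the same idea, using the standard-library `xml.etree.ElementTree`: existing data is selected and then re-arranged into a newly constructed structure. The input document and the `titles`/`entry` element names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1997</year></book>
  <book><title>def</title><year>1992</year></book>
</bib>
""")

# Construct a new <titles> element and re-arrange existing data under it:
# the year moves into an attribute, the title becomes the text content.
titles = ET.Element("titles")
for book in BIB.findall("book"):
    entry = ET.SubElement(titles, "entry", year=book.findtext("year"))
    entry.text = book.findtext("title")

constructed = ET.tostring(titles, encoding="unicode")
print(constructed)
```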
FLWOR Expressions
• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
• Evaluation pipeline:
  FOR / LET Clauses → list of tuples → WHERE Clause → list of tuples → ORDERBY/RETURN Clause → instance of XQuery data model
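The pipeline can be imitated step by step in Python. This is a sketch under assumptions: the sample document is invented, and each "tuple" of variable bindings is modeled as a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
BIB = ET.fromstring("""
<bib>
  <book><title>ghi</title><year>1999</year></book>
  <book><title>abc</title><year>1997</year></book>
  <book><title>def</title><year>1990</year></book>
</bib>
""")

# FOR / LET clauses: produce a list of tuples (variable bindings).
tuples = [{"b": b, "year": int(b.findtext("year"))} for b in BIB.findall("book")]

# WHERE clause: filter the list of tuples.
kept = [t for t in tuples if t["year"] > 1995]

# ORDERBY clause: sort the surviving tuples.
kept.sort(key=lambda t: t["b"].findtext("title"))

# RETURN clause: build one result item per tuple.
result = [t["b"].findtext("title") for t in kept]
print(result)
```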
FOR vs. LET
• FOR $x IN list-expr
  – Binds $x in turn to each value in the list expr
• LET $x := list-expr
  – Binds $x to the entire list expr
  – Useful for common sub-expressions and for aggregations
FOR vs. LET: Example
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
Notice that the result has several elements.
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
Notice that the result has exactly one element.
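The difference between iterating (FOR) and binding the whole sequence (LET) can be sketched in Python. The `wrap` helper and the three-book document are assumptions for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input document with three books.
BIB = ET.fromstring("<bib><book>a</book><book>b</book><book>c</book></bib>")
books = BIB.findall("book")

def wrap(items):
    """Construct a <result> element containing copies of the given nodes."""
    result = ET.Element("result")
    result.extend([copy.deepcopy(item) for item in items])
    return result

# FOR: $x is bound to each book in turn, so RETURN runs once per book.
for_results = [wrap([x]) for x in books]

# LET: $x is bound to the entire sequence, so RETURN runs exactly once.
let_result = wrap(books)

print(len(for_results), len(let_result))
```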
XQuery Example 1
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
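For comparison, the same selection can be written as a Python list comprehension over a parsed tree. The sample data is hypothetical, and the year is converted with `int`, an assumption about how the comparison should treat the element's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1997</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

# FOR $x IN .../bib/book WHERE $x/year > 1995 RETURN $x/title
titles = [
    x.findtext("title")
    for x in BIB.findall("book")
    if int(x.findtext("year")) > 1995
]
print(titles)
```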
XQuery Example 2
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
  $a,
  FOR $t IN /bib/book[author=$a]/title
  RETURN $t
</result>
distinct = a function that eliminates duplicates (after converting inputs to atomic values)
Results for Example 2
<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
Observe how nested structure of result elements is determined by the nested structure of the query.
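The nesting can be mirrored in Python with an outer loop over distinct authors and an inner loop over their books. The sample document is invented, and duplicate elimination is approximated with `dict.fromkeys` on the authors' text values, which keeps first-occurrence order.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><author>Jones</author><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><author>Jones</author><publisher>Addison-Wesley</publisher></book>
  <book><title>ghi</title><author>Smith</author><publisher>Morgan Kaufmann</publisher></book>
</bib>
""")
books = BIB.findall("book")

# Outer FOR: distinct authors of Morgan Kaufmann books (compared as atomic values).
mk_authors = list(dict.fromkeys(
    a.text
    for b in books if b.findtext("publisher") == "Morgan Kaufmann"
    for a in b.findall("author")
))

# Inner FOR: all titles by that author, regardless of publisher.
results = {
    a: [b.findtext("title") for b in books
        if a in [x.text for x in b.findall("author")]]
    for a in mk_authors
}
print(results)
```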
XQuery Example 3
count = (aggregate) function that returns the number of elements
<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>
For each publisher p:
- Let the list of books published by p be b
- Count the number of books in b, and return p if the count > 100
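A small Python sketch of the same grouping-and-counting logic follows. The document is hypothetical, and the threshold is a parameter (`min_books`, an invented name) so the example can run on three books rather than more than 100.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>def</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>ghi</title><publisher>Addison-Wesley</publisher></book>
</bib>
""")

# For each distinct publisher p, count the books b published by p.
counts = Counter(b.findtext("publisher") for b in BIB.iter("book"))

def big_publishers(min_books):
    """Return publishers with more than min_books books, in first-seen order."""
    return [p for p, n in counts.items() if n > min_books]

print(big_publishers(1))
```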
XQuery Example 4
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
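In Python terms, the LET clause computes the aggregate once and the FOR clause then compares each book against it. The data below is made up, and prices are parsed with `float` as an assumption about the price format.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>
""")
books = BIB.findall("book")

# LET $a := avg(.../price): one binding to the aggregate over the whole sequence.
a = mean(float(b.findtext("price")) for b in books)

# FOR $b IN .../book WHERE $b/price > $a RETURN $b
above_average = [b for b in books if float(b.findtext("price")) > a]
print(a, [b.findtext("title") for b in above_average])
```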
Collections in XQuery
• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection
• Examples:
  – LET $a := /bib/book → $a is a collection; the statement iterates over all books in the collection
  – $b/author is also a collection (several authors...)
However:
RETURN <result> $b/author </result>
Returns a single collection!
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>
Collections in XQuery
What about collections in expressions?
• $b/price → list of n prices
• $b/price * 0.7 → list of n numbers??
• $b/price * $b/quantity → list of n x m numbers??
  – Valid only if the two sequences have at most one element
  – Atomization
• $book1/author eq "Kennedy" - value comparison
• $book1/author = "Kennedy" - general comparison
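The two comparison styles can be contrasted with a small Python sketch over already-atomized sequences (plain lists of strings). The function names are invented; the behavior modeled is that a general comparison succeeds if any item matches, while a value comparison only accepts operands with at most one item.

```python
def general_eq(seq, value):
    """General comparison (=): true if some item in the sequence equals value."""
    return any(item == value for item in seq)

def value_eq(seq, value):
    """Value comparison (eq): the operand must have at most one item.

    Returns None for an empty sequence, standing in for an empty result.
    """
    if len(seq) > 1:
        raise TypeError("eq requires at most one item per operand")
    return seq[0] == value if seq else None

authors = ["Kennedy", "Smith"]        # a book with several authors
print(general_eq(authors, "Kennedy"))  # existential match over the sequence
print(value_eq(["Kennedy"], "Kennedy"))
```

Calling `value_eq(authors, "Kennedy")` raises `TypeError` in this sketch, which is the point of the restriction on multi-item operands.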
Sorting in XQuery
<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  ORDERBY $p
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           ORDERBY $b/price DESCENDING
           RETURN <book>
                    $b/title ,
                    $b/price
                  </book>
         </publisher>
</publisher_list>
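The two-level ordering (publishers ascending, each publisher's books by descending price) can be sketched in Python as follows. The sample data is hypothetical, and the result is kept as nested tuples rather than constructed XML to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher><price>30</price></book>
  <book><title>def</title><publisher>Addison-Wesley</publisher><price>50</price></book>
  <book><title>ghi</title><publisher>Morgan Kaufmann</publisher><price>70</price></book>
</bib>
""")
books = [
    (b.findtext("publisher"), b.findtext("title"), float(b.findtext("price")))
    for b in BIB.iter("book")
]

publisher_list = []
# Outer FOR ... ORDERBY $p: distinct publishers in ascending order.
for p in sorted({pub for pub, _, _ in books}):
    # Inner FOR ... ORDERBY $b/price DESCENDING.
    by_price = sorted(
        ((title, price) for pub, title, price in books if pub == p),
        key=lambda item: item[1],
        reverse=True,
    )
    publisher_list.append((p, by_price))

print(publisher_list)
```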
Conditional Expressions: If-Then-Else
FOR $h IN //holding
ORDERBY $h/title
RETURN <holding>
  $h/title,
  IF $h/@type = "Journal"
  THEN $h/editor
  ELSE $h/author
</holding>
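A Python analogue of the conditional uses a conditional expression inside the loop. The holdings document is invented for illustration; each output row records the title plus whichever of editor or author was selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical holdings document.
HOLDINGS = ET.fromstring("""
<holdings>
  <holding type="Journal"><title>Zeta</title><editor>E1</editor><author>A1</author></holding>
  <holding type="Book"><title>Alpha</title><editor>E2</editor><author>A2</author></holding>
</holdings>
""")

rows = []
# FOR $h IN //holding ORDERBY $h/title
for h in sorted(HOLDINGS.iter("holding"), key=lambda h: h.findtext("title")):
    # IF $h/@type = "Journal" THEN $h/editor ELSE $h/author
    who = h.find("editor") if h.get("type") == "Journal" else h.find("author")
    rows.append((h.findtext("title"), who.tag, who.text))

print(rows)
```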
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
  contains($p, "sailing")
RETURN $b/title
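SOME and EVERY correspond closely to Python's `any` and `all`. The sketch below uses an invented document; note that a universal quantifier over an empty sequence is true, so a book with no paragraphs satisfies the EVERY query.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: book C has no <para> elements at all.
BIB = ET.fromstring("""
<bib>
  <book><title>A</title><para>sailing and windsurfing</para><para>sailing only</para></book>
  <book><title>B</title><para>sailing</para><para>cooking</para></book>
  <book><title>C</title></book>
</bib>
""")

def paras(book):
    """Text of every <para> descendant of the book ($b//para)."""
    return [p.text or "" for p in book.iter("para")]

# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [
    b.findtext("title") for b in BIB.findall("book")
    if any("sailing" in p and "windsurfing" in p for p in paras(b))
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [
    b.findtext("title") for b in BIB.findall("book")
    if all("sailing" in p for p in paras(b))
]

print(some_titles, every_titles)
```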
Other Stuff in XQuery
• Before and After
  – for dealing with order in the input
• Filter
  – deletes some edges in the result tree
• Recursive functions
• Namespaces
• References, links …
• Lots more stuff …
XML Schema
• Includes primitive data types (integers, strings, dates, etc.)
• Supports value-based constraints (integers > 100)
• User-definable structured types
• Inheritance (extension or restriction)
• Foreign keys
• Element-type reference constraints
Sample XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type> … </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string"/>
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>
XML-Query Data Model
• Describes XML data as a tree
• Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode
http://www.w3.org/TR/query-datamodel/ (2/2001)
XML-Query Data Model
Element node (simplified definition):
• elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
• QNameValue means "a tag name"
Reads: "Give me a tag, a set of attributes, a list of elements/values, and I will return an element"
XML Query Data Model
Example:
<book price="55" currency="USD">
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…
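The simplified element-node constructor can be modeled with Python dataclasses. This is only a sketch of the shape described above (a tag, a set of attributes, a list of element or value children); the class and field names are assumptions, not the data model's actual definitions, and the title text is kept elided as on the slide.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union

@dataclass(frozen=True)
class AttrNode:
    name: str
    value: str

@dataclass(frozen=True)
class ValueNode:
    value: str

@dataclass
class ElemNode:
    tag: str                                            # QNameValue: "a tag name"
    attrs: FrozenSet[AttrNode] = frozenset()            # unordered set of attributes
    children: List[Union["ElemNode", ValueNode]] = field(default_factory=list)  # ordered list

def leaf(tag, text):
    """Hypothetical helper: an element holding a single value node."""
    return ElemNode(tag, frozenset(), [ValueNode(text)])

book1 = ElemNode(
    "book",
    frozenset({AttrNode("price", "55"), AttrNode("currency", "USD")}),
    [
        leaf("title", "Foundations …"),
        leaf("author", "Abiteboul"),
        leaf("author", "Hull"),
        leaf("author", "Vianu"),
        leaf("year", "1995"),
    ],
)
print([child.tag for child in book1.children])
```

Attributes are stored in a frozenset because their order is not significant, while children stay in a list because their order is.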