Database Systems I Query Languages for XML

CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 1 Database Systems I Query Languages for XML



Query Languages for XML

XPath is a simple query language based on describing similar paths in XML documents.

XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc.

XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items.

Items can be primitive values or nodes (elements, attributes, documents).

The items returned do not need to be of the same type.


XPath

A path expression returns the sequence of all qualifying items that are reachable from the input item following the specified path.

A path expression is a sequence consisting of tags or attributes and special characters such as slashes ("/").

Absolute path expressions are applied to an XML document and return all elements that are reachable from the document's root element following the specified path.

Relative path expressions are applied to an arbitrary node.


XPath

<?xml version="1.0" standalone="yes"?>
<bibliography>
  <book bookID="b100">
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>

Applied to the above document, the XPath expression /bibliography/book/author returns the sequence

<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
. . .
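The same navigation can be tried without an XQuery processor. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath and evaluates paths relative to an element rather than from the document root. The document is a shortened copy of the one on the slide (title left elided as on the slide); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's document, with whitespace around values removed.
BIB = """
<bibliography>
  <book bookID="b100">
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(BIB)  # root is the <bibliography> element

# Counterpart of /bibliography/book/author, written relative to the root element.
authors = [a.text for a in root.findall("book/author")]
print(authors)
```

Because ElementTree starts at the root element, the leading /bibliography step is implicit in root.findall.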


Attributes

If we do not want to return the qualifying elements, but the value of one of their attributes, we end the path expression with @attribute.

Applied to the above document, the XPath expression /bibliography/book/@bookID returns the sequence

"b100" . . .
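As a rough illustration in Python: xml.etree.ElementTree does not accept a trailing @attribute step, so the sketch below selects the book elements with a path and then reads the attribute from each of them. The second book and its ID "b200" are hypothetical and only there to show that a sequence is returned.

```python
import xml.etree.ElementTree as ET

# "b200" is a made-up second book, not part of the slide's document.
root = ET.fromstring(
    '<bibliography><book bookID="b100"/><book bookID="b200"/></bibliography>'
)

# Counterpart of /bibliography/book/@bookID: select the books, then read the attribute.
book_ids = [book.get("bookID") for book in root.findall("book")]
print(book_ids)
```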


Axes

XPath provides a variety of axes, i.e. modes of navigation through semistructured data.

At each step of a path expression, we can prefix a tag or attribute name by an axis name and a colon.

For example, the path expression

/child::bibliography/child::book/attribute::bookID

is equivalent to /bibliography/book/@bookID.

Descendants are all direct and indirect children of a node.


Axes

Axes include parent, ancestor, descendant, following-sibling, preceding-sibling, self, and descendant-or-self.

XPath has the following shorthands for axes:

/    child
//   descendant-or-self
@    attribute
.    self
..   parent


Axes

<bibliography>
  <book bookID="b100">
    <title> Foundations… </title>
    <author affiliation="IBM"> Abiteboul </author>
    <author> Hull </author>
    . . .
  </book>
  <article articleID="a245">
    <header>
      <author authorID="a739"> Codd </author>
      <title> A relational database model </title>
    </header>
    <body> . . . </body>
  </article>
</bibliography>

Applied to the above document, the path expression /bibliography//author returns the sequence

<author> Abiteboul </author>
<author> Hull </author>
<author> Codd </author>


Wildcards

We can use wildcards instead of actual tags and attributes: * means any tag, and @* means any attribute.

Examples

/bibliography/*/author returns the sequence

<author> Abiteboul </author>
<author> Hull </author>

/bibliography//author/@* returns the sequence

"IBM"
"a739"
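The difference between the descendant step (//) and the * wildcard can be checked with a small Python sketch. It assumes a shortened copy of the book/article document (elided parts dropped, body left empty) and uses xml.etree.ElementTree, which supports // and * but not @*; the attribute values are therefore read from the selected elements instead.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's document; elided parts are simply left out.
DOC = """
<bibliography>
  <book bookID="b100">
    <title>Foundations…</title>
    <author affiliation="IBM">Abiteboul</author>
    <author>Hull</author>
  </book>
  <article articleID="a245">
    <header>
      <author authorID="a739">Codd</author>
      <title>A relational database model</title>
    </header>
    <body/>
  </article>
</bibliography>
"""

root = ET.fromstring(DOC)

# /bibliography//author: authors at any depth below the root.
any_depth = [a.text for a in root.findall(".//author")]

# /bibliography/*/author: authors exactly two levels below the root.
two_levels = [a.text for a in root.findall("*/author")]

# /bibliography//author/@*: all attribute values of all authors.
author_attrs = [v for a in root.iter("author") for v in a.attrib.values()]

print(any_depth, two_levels, author_attrs)
```

Codd appears only in the first result because his author element sits three levels below the root, inside header.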


Conditions

We can restrict the qualifying paths to those that satisfy a given condition, surrounded by square brackets.

Conditions can be anything returning a boolean value.

In particular, conditions can be:

[<subpath>=<value>]   there exists a subpath with the specified value

[i]   the element is the i-th element of the specified type

Example

/bibliography/book[title="Foundations…"]/author[2] returns <author> Hull </author>.
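Both kinds of condition are among the few predicates that Python's xml.etree.ElementTree understands, so the example can be sketched as follows. The document is a shortened copy of the earlier slide's, with whitespace around the title removed so that the string comparison matches exactly.

```python
import xml.etree.ElementTree as ET

BIB = """
<bibliography>
  <book bookID="b100">
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
</bibliography>
"""

root = ET.fromstring(BIB)

# book[title='...'] keeps books that have a title child with that text;
# author[2] keeps the second author child of each such book.
second_authors = [
    a.text for a in root.findall("book[title='Foundations…']/author[2]")
]
print(second_authors)
```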


XQuery

XQuery extends XPath, i.e. every XPath expression is an XQuery expression.

Beyond XPath expressions, XQuery introduces FLWOR expressions.

Format: for let where order-by return

[Figure: pipeline of for/let clauses, where clause, and order-by/return clause, each stage passing a sequence of items to the next.]


XQuery

FLWOR expressions are similar to SQL select . . . from . . . where . . . queries.

XQuery allows zero, one or more for clauses and zero, one or more let clauses, but a FLWOR expression needs at least one of them.

The where clause is optional.

There is one optional order-by clause.

Finally, there is exactly one return clause.

XQuery is case-sensitive.

XQuery and XPath are W3C standards.
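As a loose analogy only (this is Python, not XQuery), the stages of a FLWOR expression map onto familiar list operations. The book records below are hypothetical stand-ins for the items a path expression would return.

```python
# Hypothetical stand-in for the sequence produced by a for clause.
books = [
    {"title": "ghi", "year": 2001},
    {"title": "abc", "year": 1998},
    {"title": "old", "year": 1990},
]

# for + where: iterate over the items and keep the qualifying ones.
qualifying = [b for b in books if b["year"] > 1995]

# order by: sort the qualifying items.
ordered = sorted(qualifying, key=lambda b: b["title"])

# return: build one result item per remaining item.
titles = [b["title"] for b in ordered]
print(titles)
```

Each stage takes a sequence and hands a sequence to the next, which is the pipeline view of FLWOR expressions.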


XQuery

XQuery is a functional language.

Any XQuery expression can be used in any place that an expression is expected.

SQL also allows subqueries in many places. However, SQL does not, for example, allow an arbitrary subquery as an operand of an arbitrary comparison in a WHERE clause.

This implies that every XQuery operator must be defined for operands that are sequences of items, not just for individual items.


XQuery Clauses

for $x in expr

Defines node variable $x.

The expression expr evaluates to a sequence of items.

The variable $x is assigned to each item, in turn, and the body of the for clause is executed once for each assignment.

let $x := expr

Defines collection variable $x.

The expression expr evaluates to a sequence of items.

The variable is bound to the entire sequence of items.

Useful for common subexpressions and for aggregations.


XQuery Clauses

where condition

The condition is a boolean expression.

The clause is applied to some item.

If and only if the condition evaluates to true, the following return clause is executed for that item.

return expression

The result of a FLWOR expression is a sequence of items.

Expression defines the result format for the current (qualifying) item.

The sequence of items produced by expression is appended to the sequence of items produced so far.


Document Nodes

The context for a for or let clause is often provided by a document node.

Typically, the document comes from a file.

The doc function constructs a document node from a file with a given name.

Examples

doc("bib.xml")

doc("infolab.stanford.edu/~hector/movies.xml")


Interpretation as XQuery Expression

XQuery expressions can be used wherever an XML expression of any kind is permitted.

Any text string is acceptable as content of a tag or value of an attribute.

If a string contains an XQuery expression that should be evaluated, this substring must be surrounded by curly brackets {}.

Example

for $b in doc("bib.xml")/bibliography/book
return <result id="{$b/@bookID}">{$b/title}</result>


XQuery Examples

Find all books. for vs. let

for $x in doc("bib.xml")/bibliography/book
return <result> {$x} </result>

Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

let $x := doc("bib.xml")/bibliography/book
return <result> {$x} </result>

Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
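The contrast between binding a variable to each item (for) and to the whole sequence (let) can be mimicked in Python. This is only an analogy built with xml.etree.ElementTree; the three empty book elements and the helper wrap are made up for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<bibliography><book/><book/><book/></bibliography>")
books = root.findall("book")


def wrap(items):
    """Return a new <result> element containing the given elements."""
    result = ET.Element("result")
    result.extend(items)
    return result


# Like "for $x in ... return <result>{$x}</result>": one <result> per book.
per_book = [wrap([b]) for b in books]

# Like "let $x := ... return <result>{$x}</result>": one <result> for all books.
all_books = wrap(books)

print(len(per_book), len(all_books))
```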


XQuery Examples

Find all titles of books published after 1995.

for $x in doc("bib.xml")/bibliography/book
where $x/year > 1995
return $x/title

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>


Ordering the Query Result

The order-by clause allows you to order the results of an XQuery expression.

order by list-of-expressions

The sort order is based on the value of the first expression. Ties are broken based on the value of the second (if necessary third etc.) expression.

By default, the order is ascending.

A descending sort order can be specified using descending.


Elimination of Duplicates

The built-in function distinct-values eliminates duplicates from a sequence of result items.

In principle, it applies only to primitive (atomic) types.

It can also be applied to elements, but then it will remove their tags and return only their string values, shown in quotes "".

Example

If the path $b/title produces <title> aaa </title> <title> bbb </title> <title> aaa </title> then distinct-values($b/title) produces "aaa" "bbb".
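The effect on elements can be imitated in Python: take the text of each element, then drop repeated values. The wrapper element r is hypothetical. One difference to keep in mind is that this sketch preserves first-occurrence order, whereas XQuery leaves the order of the distinct-values result up to the implementation.

```python
import xml.etree.ElementTree as ET

# <r> is a made-up wrapper so that the three titles form one document.
titles = ET.fromstring(
    "<r><title>aaa</title><title>bbb</title><title>aaa</title></r>"
).findall("title")

# Strip the tags (keep only the text), then remove duplicates.
distinct = list(dict.fromkeys(t.text for t in titles))
print(distinct)
```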


XQuery Examples

Find all books published by Morgan Kaufmann and list them in descending order of their prices.

Uses order by with option descending.

for $b in doc("bib.xml")/bibliography/book[publisher="Morgan Kaufmann"]
order by $b/price descending
return $b


XQuery Examples

For each author of a book published by Morgan Kaufmann, list the author and the titles of all books she published.

Uses nested subquery and function distinct-values.

for $a in distinct-values(doc("bib.xml")/bibliography/book[publisher="Morgan Kaufmann"]/author)
return <result>
  <author>{$a}</author>
  {for $t in doc("bib.xml")/bibliography/book[author=$a]/title
   return $t}
</result>


XQuery Examples

Result:

<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>


Joins

We can join two or more documents by using one variable for each of the documents.

We let a variable range over the elements of the corresponding document, within a for-clause.

Need to be careful when comparing elements for equality, since their equality is by element identity, not by element content.

Typically, we want to compare the element content.

The built-in function data(E) returns the content of an element E.


Example

Find all pairs of titles of books from the same year.

Uses two variables ranging over books and the data function applied to their year elements.

let $books := doc("bib.xml")
for $b1 in $books/bibliography/book, $b2 in $books/bibliography/book
where data($b1/year) = data($b2/year)
return <result>{$b1/title} {$b2/title}</result>
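The join logic, two variables ranging over the same books and a comparison on content rather than on the elements themselves, can be sketched in Python. The three books and their years are hypothetical sample data.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample data: two books from 1998, one from 2001.
BIB = """
<bibliography>
  <book><title>abc</title><year>1998</year></book>
  <book><title>def</title><year>1998</year></book>
  <book><title>ghi</title><year>2001</year></book>
</bibliography>
"""

books = ET.fromstring(BIB).findall("book")

# product(...) plays the role of the two for variables; findtext compares
# the content of the year elements, much like data() on the slide.
pairs = [
    (b1.findtext("title"), b2.findtext("title"))
    for b1, b2 in product(books, repeat=2)
    if b1.findtext("year") == b2.findtext("year")
]
print(pairs)
```

Like the query on the slide, this sketch also pairs every book with itself and lists each pair in both orders.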


Comparison Operators

XQuery supports the standard comparison operators such as <, >, =.

Comparison operators are applied to a sequence of items.

Comparisons have an existential nature, i.e. they return true if and only if at least one of the items satisfies the condition of the comparison.

for $b in doc("bib.xml")/bibliography/book
where $b/author/firstname = "A"
  and $b/author/lastname = "B"
return $b

Books returned can have one author with firstname A and another author with lastname B.
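The consequence of existential comparisons is easy to reproduce in Python with any(). The book below is hypothetical: it has one author named A X and another named Y B, so no single author is called A B.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with two authors: "A X" and "Y B".
BOOK = """
<book>
  <author><firstname>A</firstname><lastname>X</lastname></author>
  <author><firstname>Y</firstname><lastname>B</lastname></author>
</book>
"""

book = ET.fromstring(BOOK)

# Two independent existential tests, as in the where clause on the slide.
first_ok = any(f.text == "A" for f in book.findall("author/firstname"))
last_ok = any(l.text == "B" for l in book.findall("author/lastname"))
qualifies = first_ok and last_ok

# Stricter test: one and the same author must satisfy both conditions.
same_author = any(
    a.findtext("firstname") == "A" and a.findtext("lastname") == "B"
    for a in book.findall("author")
)
print(qualifies, same_author)
```

The book qualifies under the two separate comparisons but not under the stricter per-author test.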


Comparison Operators

XQuery also supports special comparison operators that only compare sequences consisting of a single item: eq, ne, lt, le, gt, ge.

These comparisons fail if one of the operands contains more than one item.

XQuery also provides built-in functions for substring matching, in particular contains($p, "windsurfing").


Quantification

XQuery supports the existential and the universal quantifier.

Universal quantifier

every $v in expression1 satisfies expression2

Existential quantifier

some $v in expression1 satisfies expression2

Expression1 evaluates to a sequence of items, expression2 is a boolean expression.
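Python's any() and all() behave like the two quantifiers and make a convenient mental model. The paragraph strings below are hypothetical, and the substring test with "in" stands in for a function such as contains.

```python
# Hypothetical paragraph texts of one book.
paras = ["sailing and windsurfing", "sailing only"]

# some $p in paras satisfies (contains sailing and contains windsurfing)
some_both = any("sailing" in p and "windsurfing" in p for p in paras)

# every $p in paras satisfies contains sailing
every_sailing = all("sailing" in p for p in paras)

# every $p in paras satisfies contains windsurfing
every_windsurfing = all("windsurfing" in p for p in paras)

# A universal quantifier over an empty sequence is true.
every_on_empty = all("sailing" in p for p in [])

print(some_both, every_sailing, every_windsurfing, every_on_empty)
```

The last line matters in practice: a query using every also accepts items for which the sequence being quantified over is empty.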


Aggregation

XQuery provides built-in functions for the standard aggregations such as sum, min, count and avg.

They can be applied to any XQuery expression, i.e. to any sequence of items.

Example

avg(doc("bib.xml")/bibliography/book/price)

count(doc("bib.xml")/bibliography/book/price)

Computes the average book price and the number of book prices (i.e. the number of books, if each book has exactly one price), resp.


XQuery Examples

Find books whose price is larger than the average price.

Uses aggregate operator (avg), applied to the result of a path expression.

let $a := avg(doc("bib.xml")/bibliography/book/price)
for $b in doc("bib.xml")/bibliography/book
where $b/price > $a
return $b
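The same two-step pattern, compute an aggregate once and then filter against it, looks as follows in Python. The books and prices are hypothetical, and the titles are returned instead of whole book elements to keep the output short.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical prices: 30, 50 and 100, so the average is 60.
BIB = """
<bibliography>
  <book><title>abc</title><price>30</price></book>
  <book><title>def</title><price>50</price></book>
  <book><title>ghi</title><price>100</price></book>
</bibliography>
"""

books = ET.fromstring(BIB).findall("book")

# Counterpart of: let $a := avg(.../book/price)
average = mean(float(b.findtext("price")) for b in books)

# Counterpart of: for $b in .../book where $b/price > $a return ...
expensive = [
    b.findtext("title") for b in books if float(b.findtext("price")) > average
]
print(average, expensive)
```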


XQuery Examples

Find the titles of books with a paragraph containing the terms "sailing" and "windsurfing".

Uses existential quantifier (some) and string matching (contains).

for $b in doc("bib.xml")//book
where some $p in $b//para satisfies
      contains($p, "sailing") and contains($p, "windsurfing")
return $b/title


XQuery Examples

Find the titles of books where every paragraph contains the term "sailing".

Uses universal quantifier (every) and string matching (contains).

for $b in doc("bib.xml")//book
where every $p in $b//para satisfies
      contains($p, "sailing")
return $b/title


Summary

XQuery is the standard XML query language.

It is a functional language, i.e. any XQuery expression can be used in any place where an expression is expected.

A FLWOR expression consists of for, let, where, order by and return clauses, of which some are optional.

The main new concept compared to SQL is the path expression, which returns the sequence of elements reachable via the given path.

Path expressions are defined in XPath, a sublanguage of XQuery.

In addition, XQuery has equivalent constructs for most of the main SQL constructs, in particular quantifiers and aggregate functions.