Semistructured Data and XML
SEMISTRUCTURED DATA AND XML
HOW THE WEB IS TODAY
HTML documents
  often generated by applications
  consumed by humans only
  easy access: across platforms, across organizations
  only layout, no semantic information
No application interoperability:
  HTML not understood by applications
  screen scraping is brittle
Database technology: client-server
  still vendor specific
XML DATA EXCHANGE FORMAT
A standard from the W3C (World Wide Web Consortium, http://www.w3.org).
The mission of the W3C: “. . . developing common protocols that promote its evolution and ensure its interoperability. . .”.
Basic ideas
  XML = data
  XML generated by applications
  XML consumed by applications
  Easy access: across platforms, organizations.
PARADIGM SHIFT ON THE WEB
For web search engines:
  From documents (HTML) to data (XML)
  From document management to document understanding (e.g., question answering)
  From information retrieval to data management
For database systems:
  From relational (structured) model to semistructured data
  From data processing to data / query translation
  From storage to transport
THE SEMISTRUCTURED DATA MODEL
[Figure: Object Exchange Model (OEM) graph of a bibliography database. The root object &o1 is entered by the arc Bib and has paper and book children (&o12, &o24, &o29), some of which are linked by references arcs. Further arc labels: author, title, year, http, publisher, page, firstname, lastname, first, last. Leaf values include “Serge”, “Abiteboul”, 1997, “Victor”, “Vianu”, 122 and 133. Legend: complex object, atomic object.]
THE SEMISTRUCTURED DATA MODEL
Data is self-describing, i.e. the data description is integrated with the data itself rather than in a separate schema.
Database is a collection of nodes and arcs (directed graph).
Leaf nodes represent data of some atomic type (atomic objects, such as numbers or strings).
Interior nodes represent complex objects consisting of components (child nodes), connected by arcs to this node.
Arcs are directed and connect two nodes.
THE SEMISTRUCTURED DATA MODEL
Arc labels indicate the relationship between the two corresponding nodes.
The root node is the only interior node without in-arcs, representing the entire database.
All database objects are children of the root node.
Every node must be reachable from the root.
A general graph structure is possible, i.e. the graph need not be a tree structure.
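The rules above (labeled directed arcs, atomic leaves, a single root, reachability, sharing allowed) can be sketched in a few lines of Python. The dictionary encoding below is an assumption made for illustration, not part of OEM itself: a complex object maps to a list of (label, target) arcs and an atomic object maps to its value. The oids follow the bibliography example; the contents of &o12 and &o24 are left empty because the slides elide them.

```python
from collections import deque

# Hypothetical encoding: complex object -> list of (arc label, target oid),
# atomic object -> plain value.
db = {
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o12": [],  # contents elided in the slides
    "&o24": [],  # contents elided in the slides
    "&o29": [("title", "&o93"), ("references", "&o12"), ("references", "&o24")],
    "&o93": "Regular path queries with constraints",
}

def reachable(graph, root):
    """Return the set of oids reachable from root by following arcs."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        value = graph[node]
        if isinstance(value, list):  # only complex objects have out-arcs
            for _label, target in value:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen

def in_degree(graph, oid):
    """Count the arcs that point at oid."""
    return sum(
        1
        for value in graph.values()
        if isinstance(value, list)
        for _label, target in value
        if target == oid
    )

every_node_reachable = reachable(db, "&o1") == set(db)
shared = in_degree(db, "&o12")  # > 1 means the graph is not a tree
```

Here &o12 is entered both by a paper arc from the root and by a references arc from &o29, which is exactly the kind of sharing that makes the structure a graph rather than a tree.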
SYNTAX FOR SEMISTRUCTURED DATA
Bib: &o1 { paper: &o12 { … },
           book: &o24 { … },
           paper: &o29 { author: &o52 “Abiteboul”,
                         author: &o96 { firstname: &o243 “Victor”,
                                        lastname: &o206 “Vianu” },
                         title: &o93 “Regular path queries with constraints”,
                         references: &o12,
                         references: &o24,
                         page: &o25 { first: &o64 122, last: &o92 133 } } }
Observe: nested tuples, set values, oids!
SYNTAX FOR SEMISTRUCTURED DATA
May omit oids:
{ paper: { author: “Abiteboul”,
           author: { firstname: “Victor”, lastname: “Vianu” },
           title: “Regular path queries …”,
           page: { first: 122, last: 133 } } }
VS. RELATIONAL MODEL
Missing attributes
Additional attributes
Multiple attribute values (set-valued attributes)
Objects as attribute values
No global schema
Only the first characteristic (missing attributes, e.g. via NULL values) is supported by the relational model; the others are not.
VS. RELATIONAL MODEL
Semistructured data
  Self-describing,
  Irregular data,
  No a-priori structure.
Relational DB
  Separate schema,
  Regular data,
  A-priori structure.
XML
IMPORTANT XML STANDARDS
XSL/XSLT: presentation and transformation standards
RDF: resource description framework (meta-info such as ratings, categorizations, etc.)
XPath/XPointer/XLink: standards for linking to documents and elements within
Namespaces: for resolving name clashes
DOM: Document Object Model for manipulating XML documents
SAX: Simple API for XML parsing
XQuery: query language
XML
A W3C standard to complement HTML
Origins: structured text (SGML)
  Large-scale electronic publishing
  Data exchange on the web
Motivation:
  HTML describes presentation
  XML describes content
http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, Second Edition, 10/2000)
[Figure: relationship between SGML, XML and HTML 4.0]
FROM HTML TO XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
HTML describes the presentation
XML
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
XML describes the content
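Because the tags name the content, a program can pick out fields directly instead of scraping a layout. A minimal sketch using Python's standard-library ElementTree parser; the full book title is taken from the HTML slide, and the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

xml_text = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(xml_text)           # root is the <bibliography> element
book = root.find("book")                 # first <book> child
authors = [a.text for a in book.findall("author")]
year = int(book.findtext("year"))        # element text is a string, so convert
```

The same extraction from the HTML version would have to guess that the italic text is the title and that the comma-separated names are authors.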
WHY ARE WE DB’ERS INTERESTED?
It’s data. That’s us.
Database issues:
  How are we going to model XML? (graphs)
  How are we going to query XML? (XQuery)
  How are we going to store XML (in a relational database? object-oriented? native?)
  How are we going to process XML efficiently? (many interesting research questions!)
ELEMENTS
Tags
  book, title, author, …
  start tag: <book>, end tag: </book>
  defined by user / programmer (different from HTML!)
Elements
  <book>…</book>, <author>…</author>
  An element consists of a matching start and end tag and the enclosed content.
  Elements can be nested, i.e. the content of one element can consist of a sequence of other elements.
ATTRIBUTES
Attributes can be associated with any element.
Provide additional information about elements.
Attributes can have only one value.
Example
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Attributes can also be used to connect elements.
NON-TREE-LIKE XML
So far: only tree-like XML documents, i.e. each element is nested within at most one other element.
Attributes can also be used to create non-tree XML documents.
Attributes with a domain of ID serve as primary keys of elements.
Attributes with a domain of IDREF serve as foreign keys referencing the ID of another element.
NON-TREE-LIKE XML
Example of a non-tree structure
<persons>
  <person personid="o555">
    <name> Jane </name>
  </person>
  <person personid="o456">
    <name> Mary </name>
    <children refs="o123 o555"></children>
  </person>
  <person personid="o123" mother="o456">
    <name>John</name>
  </person>
</persons>
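Following such references works like a key lookup. The sketch below assumes, by convention only, that personid plays the role of the ID attribute and that refs and mother hold IDREF values; without a DTD the parser itself does not know this, so the index is built by hand.

```python
import xml.etree.ElementTree as ET

persons = ET.fromstring("""
<persons>
  <person personid="o555"><name>Jane</name></person>
  <person personid="o456">
    <name>Mary</name>
    <children refs="o123 o555"></children>
  </person>
  <person personid="o123" mother="o456"><name>John</name></person>
</persons>
""")

# "Primary key" index: ID value -> element
by_id = {p.get("personid"): p for p in persons.findall("person")}

# Follow the IDREFS list on Mary's <children> element
mary = by_id["o456"]
child_ids = mary.find("children").get("refs").split()
children_of_mary = [by_id[i].findtext("name") for i in child_ids]

# Follow the single IDREF on John's mother attribute
mother_of_john = by_id[by_id["o123"].get("mother")].findtext("name")
```

John is nested only inside persons, yet he is also reachable from Mary through the reference, which is what makes the document a graph.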
NAMESPACES
An XML document can involve tags that come from multiple sources.
One and the same tag can appear in more than one source.
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
NAMESPACES
Name conflicts can be resolved by prefixing tag names according to their source.
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
When using prefixes in XML, a namespace for the prefix must be defined.
The namespace must be referenced (via a URI) in the start tag of an enclosing element.
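A small sketch of how a parser sees the two prefixed tables once the prefixes are bound to URIs. The enclosing root element and both example.org URIs are invented for this illustration; the slide does not name them.

```python
import xml.etree.ElementTree as ET

# The wrapper <root> and the two namespace URIs are made up for this sketch.
doc = ET.fromstring("""
<root xmlns:h="http://example.org/html" xmlns:f="http://example.org/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
""")

# Prefix -> URI map used only by the queries; prefixes here need not match the file.
ns = {"h": "http://example.org/html", "f": "http://example.org/furniture"}

cells = [td.text for td in doc.findall("h:table/h:tr/h:td", ns)]
furniture_name = doc.findtext("f:table/f:name", namespaces=ns)
html_table_tag = doc.find("h:table", ns).tag  # expanded name: {URI}localname
```

The parser stores each tag under its expanded name, so the two table elements are distinct even though their local names are equal.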
WELL-FORMED XML
A well-formed XML document satisfies the following conditions:
  Begins with a declaration that it is XML (recommended; XML 1.0 treats the declaration as optional).
  Has a single root element that encloses the whole document.
  Consists of properly nested elements, i.e. start and end tag of an element are within the same enclosing element.
standalone="yes" states that the document does not depend on an external DTD.
In this mode, you can invent your own tags, like in the semistructured data model.
WELL-FORMED XML
<?xml version="1.0" standalone="yes" ?>
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  <book>
    <title> … </title>
    . . .
  </book>
  …
</bibliography>
WELL-FORMED XML
HTML browsers will display documents with errors (like missing end tags).
The W3C XML specification states that a program should stop processing an XML document if it finds an error.
The main reason is that XML is being consumed by programs rather than by humans (as HTML).
W3C provides a validator that checks whether an XML document is well-formed.
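The stop-on-error behaviour is easy to observe with any conforming parser. A minimal sketch with Python's standard-library parser, which raises ParseError on the first well-formedness violation; the helper name is_well_formed and the test strings are made up for this example, and the XML declaration is omitted, which XML 1.0 permits.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title></book>"
bad_nesting = "<book><title>Foundations</book></title>"  # tags overlap
missing_end_tag = "<book><title>Foundations</title>"     # <book> never closed
two_roots = "<book/><book/>"                              # no single root element
```

This only checks well-formedness; checking validity against a DTD or schema needs a validating parser.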
VALID XML
The validator can also check whether an XML document is valid, i.e. conforms to a Document Type Definition (DTD).
A DTD specifies the allowable tags and how they can be nested.
XML with a DTD is no longer semistructured (self-describing).
However, a DTD is less rigid than the schema of a relational DB. E.g., a DTD allows missing and multiple attributes / elements.
DTD
DOCUMENT TYPE DEFINITIONS
Document Type Definition (DTD): set of rules (grammar) specifying elements, attributes and all other aspects of XML documents.
For each element, specify name and content type.
Content type can, e.g., be
  #PCDATA (character string),
  other elements,
  regular expression made of the above content types
    * = zero or more occurrences
    ? = zero or one occurrence
    + = one or more occurrences
    , = sequence of elements.
DOCUMENT TYPE DESCRIPTORS
Sort of like a schema but not really.
Inherited from SGML DTD standard
BNF grammar establishing constraints on element structure and content
Definitions of entities
<!ELEMENT Book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name, address, age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
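A content model such as (title, author*) is a regular expression over child element names, so it can be checked with an ordinary regex. The sketch below is a hand translation for the Book element only, not a DTD validator: Python's standard-library ElementTree does not validate against DTDs, and the names BOOK_MODEL and matches_model are invented here.

```python
import re
import xml.etree.ElementTree as ET

# (title, author*) written by hand as a regex over space-terminated tag names
BOOK_MODEL = re.compile(r"title (author )*")

def matches_model(element, model):
    """Check the sequence of child tags of element against a content-model regex."""
    child_tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_tags) is not None

valid_book = ET.fromstring("<Book><title/><author/><author/></Book>")
no_authors = ET.fromstring("<Book><title/></Book>")          # author* allows zero
missing_title = ET.fromstring("<Book><author/></Book>")      # title is required
wrong_order = ET.fromstring("<Book><author/><title/></Book>")  # ',' fixes the order
```

The same idea extends to ? and +, which map directly onto the regex operators of the same name.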
EXAMPLE DTD: PRODUCT CATALOG
<!DOCTYPE CATALOG [
<!ELEMENT CATALOG (PRODUCT+)>
<!ELEMENT PRODUCT (SPECIFICATIONS+,OPTIONS?,PRICE+,NOTES?)>
<!ATTLIST PRODUCT
  NAME CDATA #IMPLIED
  CATEGORY (HandTool|Table|Shop-Professional) "HandTool"
  PARTNUM CDATA #IMPLIED
  PLANT (Pittsburgh|Milwaukee|Chicago) "Chicago"
  INVENTORY (InStock|Backordered|Discontinued) "InStock">
<!ELEMENT SPECIFICATIONS (#PCDATA)>
<!ATTLIST SPECIFICATIONS
  WEIGHT CDATA #IMPLIED
  POWER CDATA #IMPLIED>
<!ELEMENT OPTIONS (#PCDATA)>
<!ATTLIST OPTIONS
  FINISH (Metal|Polished|Matte) "Matte"
  ADAPTER (Included|Optional|NotApplicable) "Included"
  CASE (HardShell|Soft|NotApplicable) "HardShell">
<!ELEMENT PRICE (#PCDATA)>
<!ATTLIST PRICE
  MSRP CDATA #IMPLIED
  WHOLESALE CDATA #IMPLIED
  STREET CDATA #IMPLIED
  SHIPPING CDATA #IMPLIED>
<!ELEMENT NOTES (#PCDATA)>
]>
SHORTCOMINGS OF DTDS
Useful for documents, but not so good for data:
Element name and type are associated globally
No support for structural re-use
  Object-oriented-like structures aren’t supported
No support for data types
  Can’t do data validation
Can have a single key item (ID), but:
  No support for multi-attribute keys
  No support for foreign keys (references to other keys)
  No constraints on IDREFs (e.g. “reference only a Section” cannot be expressed)
XML SCHEMA
XML SCHEMA
The successor of DTDs to specify a schema for XML documents.
A W3C standard.
Includes and extends functionality of DTDs.
In particular, XML Schemas support data types. This makes it easier to validate the correctness of data and to work with data from a database.
XML Schemas are written in XML. You don't have to learn a new language and can use your XML parser to parse your Schema files.
EXAMPLE XML SCHEMA
<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author" type="xs:string"/>
  <xs:element name="date" type="xs:date"/>
  <xs:element name="abstract">
    <xs:complexType> … </xs:complexType>
  </xs:element>
  <xs:element name="paper">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="author" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="date"/>
        <xs:element ref="abstract" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="body"/>
      </xs:sequence>
      <xs:attribute name="keywords" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
SIMPLE ELEMENTS
Simple elements contain only text.
They can have one of the built-in datatypes: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time.
Example
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
SIMPLE ELEMENTS
Restrictions allow you to further constrain the content of simple elements.
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
ATTRIBUTES
Attributes can be specified using the attribute element:
<xs:attribute name="xxx" type="yyy"/>
Attribute elements are nested within the definition of the element with which they are associated.
By default, attributes are optional. To make an attribute mandatory, use
<xs:attribute name="lang" type="xs:string" use="required"/>
Attributes can have the same built-in datatypes as simple elements.
COMPLEX ELEMENTS
Complex elements can contain other elements and can have attributes.
Nested elements need to occur in the order specified (when declared inside xs:sequence).
The number of repetitions of an element is controlled by the attributes minOccurs and maxOccurs. The default is one repetition.
A complex element with an attribute:
<xs:element name="product">
  <xs:complexType>
    <xs:attribute name="prodid" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>
COMPLEX ELEMENTS
A complex element containing a sequence of nested (simple) elements:
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
COMPLEX ELEMENTS
If you name the complex type, other elements can reference and include it:
<xs:complexType name="persontype">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="person" type="persontype"/>
XML VS. SEMISTRUCTURED DATA
Both described best by a graph.
Both are schema-less, self-describing (XML without DTD / XML schema).
XML is ordered, semistructured data is not.
XML can mix text and elements:
<talk> Making Java easier to type and easier to type
  <speaker> Phil Wadler </speaker>
</talk>
XML has lots of other stuff: attributes, entities, processing instructions, comments.
XML-PATH = XPATH
QUERY LANGUAGES FOR XML
XPath is a simple query language based on describing similar paths in XML documents.
XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc.
XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items.
Items can be primitive values or nodes (elements, attributes, documents).
The items returned do not need to be of the same type.
XPATH
A path expression returns the sequence of all qualifying items that are reachable from the input item following the specified path.
A path expression is a sequence consisting of tags or attributes and special characters such as slashes (“/”).
Absolute path expressions are applied to some XML document and return all elements that are reachable from the document’s root element following the specified path.
Relative path expressions are applied to an arbitrary node.
XPATH
<?xml version="1.0" standalone="yes" ?>
<bibliography>
  <book bookID="b100">
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
Applied to the above document, the XPath expression /bibliography/book/author returns the sequence
<author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> . . .
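The same path can be tried out with Python's standard-library ElementTree, which implements a limited XPath subset. One difference to keep in mind: its paths are evaluated relative to an element, so the absolute path /bibliography/book/author becomes book/author applied to the root element. The shortened title is kept as in the slide.

```python
import xml.etree.ElementTree as ET

bibliography = ET.fromstring("""
<bibliography>
  <book bookID="b100">
    <title>Foundations</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
""")

# Relative form of /bibliography/book/author, evaluated at the root element
author_elements = bibliography.findall("book/author")
authors = [a.text for a in author_elements]
```

The result is a list of element nodes in document order, matching the sequence shown on the slide.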
ATTRIBUTES
If we do not want to return the qualifying elements, but the value of one of their attributes, we end the path expression with @attribute.
<?xml version="1.0" standalone="yes" ?>
<bibliography>
  <book bookID="b100">
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
Applied to the above document, the XPath expression /bibliography/book/@bookID returns the sequence "b100" . . .
WILDCARDS
We can use wildcards instead of actual tags and attributes: * means any tag, and @* means any attribute. A double slash (//) matches descendants at any depth.
Examples
/bibliography/*/author returns the sequence <author> Abiteboul </author> <author> Hull </author>.
/bibliography//author/@* returns the sequence "IBM" "a739".
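A hedged sketch of attribute selection and wildcards with Python's ElementTree. The sample document is invented to make the cases distinguishable: the article element, its nesting, and the attribute names affiliation and authorID are assumptions, chosen only so that the values "IBM" and "a739" from the slide appear. ElementTree cannot return attribute values from a path, so @bookID and @* are emulated with get and attrib.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the book part comes from the slides.
bibliography = ET.fromstring("""
<bibliography>
  <book bookID="b100">
    <title>Foundations</title>
    <author>Abiteboul</author>
    <author>Hull</author>
  </book>
  <article>
    <section>
      <author affiliation="IBM" authorID="a739">Smith</author>
    </section>
  </article>
</bibliography>
""")

# /bibliography/book/@bookID
book_ids = [b.get("bookID") for b in bibliography.findall("book")]

# /bibliography/*/author : exactly one level of any tag in between
one_level = [a.text for a in bibliography.findall("*/author")]

# /bibliography//author : any depth
any_depth = [a.text for a in bibliography.findall(".//author")]

# /bibliography//author/@* : all attribute values of those authors
attribute_values = sorted(
    value for a in bibliography.findall(".//author") for value in a.attrib.values()
)
```

The article author sits two levels below the root, so the single * does not reach it while // does.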
PATH EXPRESSIONS
Examples:
  Bib.paper
  Bib.book.publisher
  Bib.paper.author.lastname
Given an OEM instance, the value of a path expression p is a set of objects
PATH EXPRESSIONS
Examples:
DB =
[Figure: the OEM bibliography graph from the earlier slide, rooted at &o1 (Bib), extended with the objects &o44 to &o52, &o70 and &o71.]
Bib.paper = {&o12, &o29}
Bib.book.publisher = {&o51}
Bib.paper.author.lastname = {&o71, &o206}
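Evaluating a path expression amounts to following arcs label by label, starting from a set containing only the root. The graph below is a partial reconstruction made for this sketch: only the arcs needed for the three example paths are included, and which parent each arc leaves from is an assumption, since the figure did not survive extraction. The node name "root" and the function eval_path are also invented here.

```python
# Hypothetical partial OEM instance: node -> list of (arc label, target oid)
graph = {
    "root": [("Bib", "&o1")],
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o12": [("author", "&o43")],
    "&o24": [("publisher", "&o51")],
    "&o29": [("author", "&o96")],
    "&o43": [("firstname", "&o70"), ("lastname", "&o71")],
    "&o96": [("firstname", "&o243"), ("lastname", "&o206")],
}

def eval_path(graph, start, path):
    """Return the set of objects reached from start along a dotted label path."""
    current = {start}
    for label in path.split("."):
        current = {
            target
            for node in current
            for arc_label, target in graph.get(node, [])  # atomic objects have no arcs
            if arc_label == label
        }
    return current

papers = eval_path(graph, "root", "Bib.paper")
publishers = eval_path(graph, "root", "Bib.book.publisher")
lastnames = eval_path(graph, "root", "Bib.paper.author.lastname")
```

Because the result is a set, an object reached along two different routes appears only once.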
XML-QUERY = XQUERY
XQUERY
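Summary: FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
[Figure: processing pipeline. The FOR/LET clauses produce a list of tuples; the WHERE clause filters it into another list of tuples; the ORDERBY/RETURN clauses turn that into an instance of the XQuery data model.]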
XQUERY
FLWOR expressions are similar to SQL select . . . from . . . where . . . queries.
A FLWOR expression contains one or more for and let clauses, in any mix (at least one of them is required).
The where clause is optional.
There is one optional order by clause.
Finally, there is exactly one return clause.
XQuery is case-sensitive.
XQuery (and XPath) is a W3C standard.
XQUERY CLAUSES
for $x in expr
  Defines node variable $x.
  The expression expr evaluates to a sequence of items.
  The variable $x is assigned to each item, in turn, and the body of the for clause is executed once for each assignment.
let $x := expr
  Defines collection variable $x.
  The expression expr evaluates to a sequence of items.
  The variable is bound to the entire sequence of items.
  Useful for common subexpressions and for aggregations.
XQUERY CLAUSES
where condition
  The condition is a boolean expression.
  The clause is applied to some item.
  If and only if the condition evaluates to true, the following return clause is executed for that item.
return expression
  The result of a FLWOR expression is a sequence of items.
  Expression defines the result format for the current (qualifying) item.
  The sequence of items produced by expression is appended to the sequence of items produced so far.
INTERPRETATION AS XQUERY
XQuery expressions can be used wherever an XML expression of any kind is permitted.
Any text string is acceptable as content of a tag or value of an attribute.
If a string contains an XQuery expression that should be evaluated, this substring must be surrounded by curly brackets {}.
Example
for $b in doc("bib.xml")/bibliography/book
return <result id="{$b/@bookID}">{$b/title}</result>
FOR VS. LET
Find all books
for $x in doc("bib.xml")/bib/book
return <result> { $x } </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
let $x := doc("bib.xml")/bib/book
return <result> { $x } </result>
Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
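The difference is the same as between looping over a list and binding the whole list to one name. A small Python analogy, using made-up strings in place of book elements:

```python
# Stand-ins for the sequence produced by the path expression
books = ["<book>A</book>", "<book>B</book>", "<book>C</book>"]

# for: the variable is bound to each item in turn, one <result> per book
for_results = ["<result>" + x + "</result>" for x in books]

# let: the variable is bound to the entire sequence, one <result> overall
x = books
let_results = ["<result>" + "".join(x) + "</result>"]
```

With three books the for version yields three results and the let version yields one.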
XQUERY
Find all book titles published after 1995:
for $x in doc("bib.xml")/bib/book
where $x/year > 1995
return $x/title
Result:
<title> abc </title> <title> def </title> <title> ghi </title>
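For readers more familiar with general-purpose languages, a for/where/return query corresponds closely to a filtered comprehension. A sketch in Python over an invented three-book document (the years are made up; only the titles abc and def echo the slide):

```python
import xml.etree.ElementTree as ET

# Hypothetical contents standing in for bib.xml
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1995</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

titles = [
    x.findtext("title")                 # return $x/title
    for x in bib.findall("book")        # for $x in .../bib/book
    if int(x.findtext("year")) > 1995   # where $x/year > 1995
]
```

The book from 1995 itself is excluded because the comparison is strict.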
ORDERING THE QUERY RESULT
The order by clause allows you to order the results of an XQuery expression.
order by list of expressions
The sort order is based on the value of the first expression. Ties are broken based on the value of the second (if necessary third etc.) expression.
By default, the order is ascending.
A descending sort order can be specified using descending.
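The tie-breaking rule is ordinary lexicographic ordering on a tuple of sort keys. A Python sketch on made-up (title, price) pairs:

```python
# Hypothetical (title, price) pairs
books = [("b", 30), ("a", 30), ("c", 10)]

# order by price, title : price decides, title breaks ties
by_price_then_title = sorted(books, key=lambda book: (book[1], book[0]))

# order by price descending : Python's sort is stable, so equal prices keep input order
by_price_descending = sorted(books, key=lambda book: book[1], reverse=True)
```

In the first result the two books priced 30 are ordered by title; in the second they keep their original relative order.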
ELIMINATION OF DUPLICATES
The built-in function distinct-values eliminates duplicates from a sequence of result items.
In principle, it applies only to primitive (atomic) types.
It can also be applied to elements, but then it will remove their tags and return their content as atomic values (shown in quotes “”).
Example
If $b/title produces
<title> aaa </title> <title> bbb </title> <title> aaa </title>
then distinct-values($b/title) produces
“aaa” “bbb”.
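The two steps, extracting the content and then dropping duplicates, can be mimicked in Python. Keeping the first occurrence of each value and trimming surrounding whitespace are choices of this sketch, not something the slide specifies; the wrapper element r is invented.

```python
import xml.etree.ElementTree as ET

titles = ET.fromstring(
    "<r><title> aaa </title><title> bbb </title><title> aaa </title></r>"
).findall("title")

# Step 1: strip the tags, keeping only the text content
values = [t.text.strip() for t in titles]

# Step 2: drop duplicates, keeping the first occurrence of each value
distinct = list(dict.fromkeys(values))
```

The result contains plain strings rather than title elements, mirroring the loss of tags described above.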
XQUERY
For each author of a book by Morgan Kaufmann, list all books she published:
for $a in distinct-values(doc("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
return
  <result>
    <author>{ $a }</author>
    {
      for $t in doc("bib.xml")/bib/book[author = $a]/title
      return $t
    }
  </result>
distinct-values = a function that eliminates duplicates
Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
JOINS
We can join two or more documents, by using one variable for each of the documents.
We let a variable range over the elements of the corresponding document, within a for-clause.
Need to be careful when comparing elements for equality, since their equality is by element identity, not by element content.
Typically, we want to compare the element content.
The built-in function data(E) returns the content of an element E.
XQUERY
Find books whose price is larger than average:
let $a := avg(doc("bib.xml")/bib/book/price)
for $b in doc("bib.xml")/bib/book
where $b/price > $a
return $b
SORTING IN XQUERY
<publisher_list>
{
  for $p in distinct-values(doc("bib.xml")//publisher)
  order by $p
  return
    <publisher>
      <name>{ $p }</name>
      {
        for $b in doc("bib.xml")//book[publisher = $p]
        order by $b/price descending
        return <book>{ $b/title, $b/price }</book>
      }
    </publisher>
}
</publisher_list>
IF-THEN-ELSE
for $h in //holding
order by $h/title
return
  <holding>
    { $h/title }
    { if ($h/@type = "Journal") then $h/editor else $h/author }
  </holding>
EXISTENTIAL QUANTIFIERS
for $b in //book
where some $p in $b//para satisfies
  contains($p, "sailing") and contains($p, "windsurfing")
return $b/title
QUANTIFICATION
XQuery supports the existential and the universal quantifier.
Universal quantifier
  every $v in expression1 satisfies expression2
Existential quantifier
  some $v in expression1 satisfies expression2
expression1 evaluates to a sequence of items, expression2 is a boolean expression.
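The two quantifiers behave like Python's any and all over a sequence. A sketch with made-up paragraph strings:

```python
# Hypothetical paragraph texts of one book
paras = ["sailing and windsurfing in summer", "sailing only"]

# some $p in paras satisfies (contains sailing and contains windsurfing)
some_both = any("sailing" in p and "windsurfing" in p for p in paras)

# every $p in paras satisfies contains sailing
every_sailing = all("sailing" in p for p in paras)

# every $p in paras satisfies contains windsurfing
every_windsurfing = all("windsurfing" in p for p in paras)

# Over an empty sequence, "some" is false and "every" is true
some_on_empty = any(False for _ in [])
every_on_empty = all(False for _ in [])
```

The empty-sequence case is worth remembering: a universally quantified condition holds vacuously when there is nothing to check.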
AGGREGATION
XQuery provides built-in functions for the standard aggregations such as sum, min, count and avg (function names are lowercase, since XQuery is case-sensitive).
They can be applied to any XQuery expression, i.e. to any sequence of items.
Example
avg(doc("bib.xml")/bibliography/book/price)
count(doc("bib.xml")/bibliography/book/price)
Computes the average book price and the number of books, resp.
XQUERY EXAMPLES
Find books whose price is larger than the average price.
Uses aggregate operator (avg), applied to the result of a path expression.
let $a:=avg(doc("bib.xml")/bibliography/book/price)
for $b in doc("bib.xml")/bibliography/book
where $b/price > $a
return $b
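The let/for/where/return structure of this query maps onto three Python steps: aggregate once, iterate, filter. The document contents below are invented for the sketch (three books priced 55, 45 and 80), and prices are converted to numbers explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents standing in for bib.xml
bibliography = ET.fromstring("""
<bibliography>
  <book><title>abc</title><price>55</price></book>
  <book><title>def</title><price>45</price></book>
  <book><title>ghi</title><price>80</price></book>
</bibliography>
""")

# let $a := avg(.../book/price)  and  count(.../book/price)
prices = [float(p.text) for p in bibliography.findall("book/price")]
average = sum(prices) / len(prices)
count = len(prices)

# for $b in .../book  where $b/price > $a  return $b
above_average = [
    b.findtext("title")
    for b in bibliography.findall("book")
    if float(b.findtext("price")) > average
]
```

Computing the average once, outside the loop, is the role the let clause plays in the XQuery version.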
XQUERY EXAMPLES
Find title of books with a paragraph containing the terms “sailing” and “windsurfing”.
Uses existential quantifier (some) and string matching (contains).
for $b in doc("bib.xml")//book
where some $p in $b//para satisfies
contains($p, "sailing") and contains($p, "windsurfing")
return $b/title