about XML/Xquery/RDF
HTML vs. XML
HTML:
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
XML:
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
“Self-describing”
-Schema info part of the data
-Good for data exchange
(albeit baroque for storage)
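A minimal Python sketch of what "self-describing" buys a consumer: the tag names alone are enough to turn each book into a record, with no schema supplied separately. The sample document is an assumption modeled on the bibliography slide (the second book is filled in from the HTML version), and the helper name books_as_records is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the bibliography slide (an assumption, not a real feed).
BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
    <publisher>Morgan Kaufmann</publisher>
    <year>1999</year>
  </book>
</bibliography>
"""


def books_as_records(xml_text):
    """Turn each <book> into a dict, using the tag names as field names."""
    root = ET.fromstring(xml_text)
    records = []
    for book in root.findall("book"):
        record = {}
        for child in book:
            # Repeated tags (several <author> elements) accumulate in a list.
            record.setdefault(child.tag, []).append(child.text.strip())
        records.append(record)
    return records


records = books_as_records(BIBLIOGRAPHY)
```

The field names come from the data itself, which is why the same function would work on a document with extra or missing tags.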
Why are Database folks so excited about XML?
• XML is just a syntax for (self-describing) data
• This is still exciting because
  – There is no standard syntax for relational data
  – With XML, we can
    • Translate any legacy data to XML
    • Exchange data in XML format
      – Ship over the web, input to any application
XML: machine-accessible meaning (slides credited to Jim Hendler)
[Figures: a CV document with parts labeled name, education, work, private; the tag glyphs in the pictures are not recoverable]
• This is what a web page in natural language looks like for a machine
• XML allows “meaningful tags” to be added to parts of the text
• But to your machine, the tags look like this…
• Schemas help… by relating common terms between documents
• But other people use other schemas
  – Someone else has one like this… which doesn't fit in
Moral: There is still need for ontology mapping.
11/18
The X-standards…
• XML: an on-the-wire representation for data
  – Xquery: a query language for XML
  – Xschema: a schema description language for XML data
• RDF: a language for meta-data description
• WSDL/SOAP/UDDI: languages for describing services
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element
• well-formed XML document: one whose tags match and nest properly
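Well-formedness can be checked mechanically by any XML parser. A minimal sketch using Python's standard library; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, multiple roots, and similar errors.
        return False
```

For instance, is_well_formed("<book><title>T</book></title>") is rejected because the tags do not nest, and "<a/><b/>" is rejected because there are two root elements.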
HTML describes presentation
XML describes content
More XML: Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Attributes are single-valued
-- No guidance on when to use them (versus sub-elements)
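A short Python sketch of the single-valued point: an element can carry at most one value per attribute name, while a sub-element such as author can repeat. The sample book is an assumption based on the slide, with two more authors added for illustration.

```python
import xml.etree.ElementTree as ET

# Sample element based on the slide; the extra authors are illustrative.
book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "<year>1995</year>"
    "</book>"
)

price = book.get("price")  # exactly one string per attribute name
authors = [a.text for a in book.findall("author")]  # sub-elements may repeat


def duplicate_attribute_is_rejected():
    """A repeated attribute name makes the document ill-formed."""
    try:
        ET.fromstring('<book price="55" price="60"/>')
        return False
    except ET.ParseError:
        return True
```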
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name> <children idref="o123 o555"/> </person>
<person id="o123" mother="o456"> <name> John </name> </person>
Object identifiers (oids) and references in XML are just syntax
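Because the references are "just syntax", the application has to resolve them itself. A minimal Python sketch, assuming the three person elements are wrapped in a hypothetical <people> root (the slide shows no root element); the helper names are also hypothetical.

```python
import xml.etree.ElementTree as ET

# The <people> wrapper is an assumption; the slide lists the persons without a root.
PEOPLE = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(PEOPLE)
# Index every person by its object identifier.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(oid):
    """Look up the name of the person with the given oid."""
    return by_id[oid].findtext("name")


def children_of(oid):
    """Follow the space-separated references in <children idref="...">."""
    refs = by_id[oid].find("children")
    if refs is None:
        return []
    return [name_of(ref) for ref in refs.get("idref", "").split()]
```

Nothing in the parser checks that "o123" actually exists; a dangling reference would only surface as a KeyError in the lookup above.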
XML vs. Relational Data
• XML is meant as a language that supports both Text and Structured Data
  – Conflicting demands...
• XML supports semi-structured data
  – In essence, the schema can be a union of multiple schemas
    • Easy to represent books with or without prices, books with any number of authors, etc.
• XML supports free mixing of text and data
  – using the #PCDATA type
• XML is ordered (while relational data is unordered)
[Figure: a spectrum from less structure to more structure, with Text at one end, Structured (relational) Data at the other, and XML in between]
DTDs
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
Notice that a DTD is not in XML syntax…
Semi-structured: a section is either a title followed by nested sections, or just text
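The recursive content model of the paper DTD can be sketched as a hand-written check in Python. This is not a DTD validator: it hard-codes the four element declarations from the slide, ignores stray text between elements, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def valid_section(section):
    """section ::= (title, section*) | text"""
    tags = [child.tag for child in section]
    if tags == ["text"]:
        return len(section[0]) == 0  # text is #PCDATA: no child elements
    if tags and tags[0] == "title" and all(t == "section" for t in tags[1:]):
        title_ok = len(section[0]) == 0  # title is #PCDATA as well
        return title_ok and all(valid_section(s) for s in section[1:])
    return False


def valid_paper(xml_text):
    """paper ::= section*"""
    root = ET.fromstring(xml_text)
    return root.tag == "paper" and all(
        child.tag == "section" and valid_section(child) for child in root
    )
```

The recursion in valid_section mirrors the recursion in the DTD: sections may nest to any depth, which is what a fixed relational schema cannot express directly.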
XML Schemas
• More recent proposal (with XML syntax)
• unifies previous schema proposals
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2
RDF: Meta-data Standard for Web
<rdf:Description about="www.mypage.com">
  <about> birds, butterflies, snakes </about>
  <author>
    <rdf:Description>
      <firstname> John </firstname>
      <lastname> Smith </lastname>
    </rdf:Description>
  </author>
</rdf:Description>
[Figure: the same description drawn as a graph. The node www.mypage.com has an "about" edge to "birds, butterflies, snakes" and an "author" edge to a node with "firstname" John and "lastname" Smith]
Good ol' semantic networks..?
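The graph reading of the RDF description can be sketched as a list of (subject, predicate, object) triples in Python. The blank-node label "_:a1" for the unnamed author description and the helper names are assumptions made for illustration.

```python
# Triples read off the description; "_:a1" is an assumed label for the
# anonymous author node.
TRIPLES = [
    ("www.mypage.com", "about", "birds, butterflies, snakes"),
    ("www.mypage.com", "author", "_:a1"),
    ("_:a1", "firstname", "John"),
    ("_:a1", "lastname", "Smith"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects reachable from subject along edges labeled predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def author_names(page):
    """Follow author edges, then firstname/lastname edges."""
    names = []
    for author in objects(page, "author"):
        parts = objects(author, "firstname") + objects(author, "lastname")
        names.append(" ".join(parts))
    return names
```

Traversing labeled edges like this is exactly how a semantic network is queried, which is the point of the slide's closing remark.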
Querying XML
• Requirements:
  – Need to handle lack of schema.
    • We may not know much about the data, so we need to navigate the XML.
  – Need to support both “information retrieval” and “SQL-style” queries.
    • Ordered vs. un-ordered XML
  – “Human readable”
    • like SQL?
• Candidates
  – Many… based on conflicting requirements
    • XSL: Makes IR folks happy
    • XML-QL: Makes DB folks happy
    • Xquery: W3C's attempt to make everybody (un)happy
11/20
Agenda: Xquery examples
Information Integration
Xquery Resources
• XQuery 1.0: An XML Query Language
  – W3C Working Draft 20 December 2001
• XML Query Use Cases
  – W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
  – http://131.107.228.20/
  – Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
  – www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
FLoWeR Expressions
Xquery queries are made up of FLWR expressions that work on “paths”
• For binds variables to nodes, one node at a time
• Let binds a variable to a whole sequence (typically used to compute aggregates)
• Where applies a formula to find matching elements
• Return constructs the output elements
Path expressions are of the form: element//element/element[@attrib=value]
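A rough Python analogue of the four FLWR clauses and of a path expression with an attribute predicate. This is a sketch, not Xquery: it uses the limited path syntax of Python's xml.etree.ElementTree, and the sample data and function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Small sample in the shape of the bib.xml use case (assumed data).
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""

root = ET.fromstring(BIB)

# Path expression with an attribute predicate, like book[@year=2000]/title.
titles_2000 = [t.text for t in root.findall("./book[@year='2000']/title")]


def titles_after(root, year):
    books = root.findall("book")      # For: iterate over the matching nodes
    total = len(books)                # Let: one value for the whole sequence
    return [
        {"title": b.findtext("title"), "out_of": total}  # Return: build output
        for b in books
        if int(b.get("year")) > year  # Where: keep only matching elements
    ]
```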
Comparison to SQL
• Look at the use case description in the Xquery manual
• Supports all (?) SQL-style queries (with different syntax, of course) [default queries in the demo]
• Has support for
  – “construction”: outputting the answers in arbitrary XML formats (use case XMP)
  – “path expressions”: navigating the XML tree (use case SEQ)
  – Simple text queries (use case TEXT)
  – Allows queries on “Tag” elements
    • Removes the “data/meta-data” barrier in queries
    • For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]
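The "construction" idea in XMP use case 6 can be sketched in Python: build a new output tree whose shape differs from the input. This is an illustration under assumptions (sample data in the shape of bib.xml, hypothetical function name), not the Xquery formulation from the use cases document.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed sample: one book with three authors, one with only an editor.
SAMPLE = """<bib>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first></editor>
  </book>
</bib>"""


def title_and_first_two_authors(bib_xml):
    """Title plus first two authors, and <et-al/> if there are more."""
    out = ET.Element("bib")
    for book in ET.fromstring(bib_xml).findall("book"):
        authors = book.findall("author")
        if not authors:
            continue  # only books with at least one author
        result = ET.SubElement(out, "book")
        result.append(copy.deepcopy(book.find("title")))
        for author in authors[:2]:
            result.append(copy.deepcopy(author))
        if len(authors) > 2:
            ET.SubElement(result, "et-al")  # empty marker element
    return out


result = title_and_first_two_authors(SAMPLE)
```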
DTD for http://www.bn.com/bib.xml
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
Example Query
“For all Addison-Wesley books published after 1991, return the title, with the year as an attribute”
Query:
<bib> {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year={ $b/@year }> { $b/title } </book>
} </bib>
Result:
<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>
Example Query (2)
• Return the books that cost more at amazon than fatbrain
let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>
Join
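The join can be sketched in Python as a hash join on isbn. The two inline documents are made-up stand-ins for the amazon and fatbrain catalogs, the function name is hypothetical, and the sketch assumes each isbn appears at most once per catalog.

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for the two catalogs named in the query.
AMAZON = """<books>
  <book><isbn>1</isbn><title>A</title><price>50.00</price></book>
  <book><isbn>2</isbn><title>B</title><price>20.00</price></book>
  <book><isbn>3</isbn><title>C</title><price>10.00</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>1</isbn><title>A</title><price>45.00</price></book>
  <book><isbn>2</isbn><title>B</title><price>25.00</price></book>
</books>"""


def pricier_in_first(first_xml, second_xml):
    """Books present in both catalogs that cost more in the first one."""
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    # Build side of the join: index the second catalog by isbn.
    second_by_isbn = {b.findtext("isbn"): b for b in second.findall("book")}
    result = []
    for book in first.findall("book"):
        match = second_by_isbn.get(book.findtext("isbn"))
        if match is None:
            continue  # no join partner
        first_price = float(book.findtext("price"))
        second_price = float(match.findtext("price"))
        if first_price > second_price:
            result.append((book.findtext("title"), first_price, second_price))
    return result
```

Prices are compared as numbers here; comparing the raw element text would order "9.00" after "10.00".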
XML frenzy in the DB Community
• Now that XML is there, what can we do with it?
  – Convert all databases from Relational to XML?
    • Or provide XML views of relational databases?
  – Develop theory of native XML databases?
    • Or assume that XML data will be stored in relational databases..
  – Issues: What sort of storage mechanisms? What sort of indices?
XML middleware for Databases
• XML adapters (middle-ware) received significant attention in DB community
  – SilkRoute (AT&T)
  – Xperanto (IBM)
• Issues:
  – Need to convert relational data into XML
    • Tagging (easy)
  – Need to convert Xquery queries into equivalent SQL queries
    • Trickier as Xquery supports schema querying
[Figure: the adapter accepts Xquery and returns XML; underneath, the database accepts SQL and returns relations]
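The "tagging" step, wrapping relational rows in XML, can be sketched in a few lines of Python. The row element name, the function name, and the sample table are assumptions; real adapters such as SilkRoute let the view definition choose the XML shape.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table, columns, rows):
    """Tag each row as <row> and each column value as <column-name>."""
    root = ET.Element(table)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            if value is None:
                continue  # a NULL is represented by omitting the element
            ET.SubElement(row_el, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(
    "books",
    ["title", "year", "price"],
    [("TCP/IP Illustrated", 1994, 65.95), ("Data on the Web", 2000, None)],
)
```

The reverse direction, translating an Xquery over this view into SQL, is the hard part the slide refers to and is not attempted here.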
Xquery Tutorial
Craig Knoblock, University of Southern California
Data for www.bn.com/bib.xml
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price> 65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price> 39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
Document References
• Document can either be referenced explicitly or in the default namespace
• In the Microsoft Demo
  – /bib = document("http://www.bn.com/bib.xml")/bib
• We will use /bib throughout, but you must use the expansion to run the demo
• In Theseus the document for xquery is passed as input
Projection
• Return the names of all authors of books
/bib/book/author
=
<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
Projection (cont.)
• The same query can also be written as a for loop
/bib/book/author
=
for $bk in /bib/book return
  for $aut in $bk/author return $aut
=
<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
Selection
• Return the titles of all books published before 1997
/bib/book[@year < "1997"]/title
=
for $bk in /bib/book
where $bk/@year < "1997"
return $bk/title
=
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix environment</title>
Selection (cont.)
• Return the book with the title “Data on the Web”
/bib/book[title = "Data on the Web"]
=
<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price> 39.95</price>
</book>
Selection (cont.)
• Return the price of the book “Data on the Web”
/bib/book[title = "Data on the Web"]/price
=
<price> 39.95</price>
How would you return the book with a price of $39.95?
Selection (cont.)
• Return the book with a price of $39.95
for $bk in /bib/book
where $bk/price = " 39.95"
return $bk
=
<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price> 39.95</price>
</book>
Note the leading space in the comparison string: the price is matched as text, exactly as it is stored in the data.
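The price selection depends on how the value is compared. A small Python sketch of the difference between matching the stored text and matching the number; the two-book sample keeps the leading space from the bib.xml data, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Two books from the data, keeping the leading space inside <price>.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price> 65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price> 39.95</price></book>
</bib>"""


def titles_with_price_text(bib_xml, text):
    """Compare the stored string exactly, whitespace included."""
    root = ET.fromstring(bib_xml)
    return [b.findtext("title") for b in root.findall("book")
            if b.findtext("price") == text]


def titles_with_price_value(bib_xml, price):
    """Compare numerically after stripping surrounding whitespace."""
    root = ET.fromstring(bib_xml)
    return [b.findtext("title") for b in root.findall("book")
            if Decimal(b.findtext("price").strip()) == Decimal(price)]
```

With this data, the text comparison against "39.95" finds nothing while " 39.95" finds the book; the numeric comparison is insensitive to the padding.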
Construction
• Return year and title of all books published before 1997
for $bk in /bib/book
where $bk/@year < "1997"
return <book>{ $bk/@year, $bk/title }</book>
=
<book year="1994"> <title>TCP/IP Illustrated</title> </book>
<book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
Grouping
• Return titles for each author
for $author in distinct(/bib/book/author/last)
return <author name={ $author/text() }>
  { /bib/book[author/last = $author]/title }
</author>
=
<author name="Stevens">
  <title>TCP/IP Illustrated</title>
  <title>Advanced Programming in the Unix environment</title>
</author>
<author name="Abiteboul">
  <title>Data on the Web</title>
</author>
…
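The grouping query can be sketched in Python with a dictionary keyed by the author's last name. The sample data is a trimmed version of bib.xml and the function name is hypothetical; as in the query, authors are identified by last name only.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the bib.xml data (titles and author last names only).
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""


def titles_by_author(bib_xml):
    """Map each distinct author last name to the titles of their books."""
    groups = {}
    for book in ET.fromstring(bib_xml).findall("book"):
        title = book.findtext("title")
        for last in book.findall("author/last"):
            groups.setdefault(last.text, []).append(title)
    return groups


groups = titles_by_author(BIB)
```

The dictionary keys play the role of distinct(...) in the query, and each value list corresponds to the titles nested inside one <author> result element.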
Join
• Return the books that cost more at amazon than fatbrain
let $amazon := document("http://www.amazon.com/books.xml")
let $fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
    $fat in $fatbrain/books/book
where $am/isbn = $fat/isbn and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>
Example Query 1
<bib> {
  for $b in /bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return <book year={ $b/@year }> { $b/title } </book>
} </bib>
What does this do?
Result Query 1
<bib>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
</bib>
Example Query 2
<results> {
  for $b in document("http://www.bn.com/bib.xml")/bib/book,
      $t in $b/title,
      $a in $b/author
  return <result> { $t } { $a } </result>
} </results>
Result Query 2
<results>
  <result> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> </result>
  <result> <title>Advanced Programming in the Unix environment</title> <author><last>Stevens</last><first>W.</first></author> </result>
  <result> <title>Data on the Web</title> <author><last>Abiteboul</last><first>Serge</first></author> </result>
  <result> <title>Data on the Web</title> <author><last>Buneman</last><first>Peter</first></author> </result>
  <result> <title>Data on the Web</title> <author><last>Suciu</last><first>Dan</first></author> </result>
</results>
Example Query 3
<books-with-prices> {
  for $b in document("http://www.bn.com/bib.xml")//book,
      $a in document("http://www.amazon.com/reviews.xml")//entry
  where $b/title = $a/title
  return
    <book-with-prices>
      { $b/title }
      <price-amazon>{ $a/price/text() }</price-amazon>
      <price-bn>{ $b/price/text() }</price-bn>
    </book-with-prices>
} </books-with-prices>
Result Query 3
<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-amazon>65.95</price-amazon>
    <price-bn> 65.95</price-bn>
  </book-with-prices>
  <book-with-prices>
    <title>Advanced Programming in the Unix environment</title>
    <price-amazon>65.95</price-amazon>
    <price-bn>65.95</price-bn>
  </book-with-prices>
  <book-with-prices>
    <title>Data on the Web</title>
    <price-amazon>34.95</price-amazon>
    <price-bn> 39.95</price-bn>
  </book-with-prices>
</books-with-prices>
Example Query 4
<bib> {
  for $b in document("www.bn.com/bib.xml")//book
  where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
  return <book> { $b/@year } { $b/title } </book>
  sortby (title)
} </bib>
Example Result 4
<bib>
  <book year="1992"> <title>Advanced Programming in the Unix environment</title> </book>
  <book year="1994"> <title>TCP/IP Illustrated</title> </book>
</bib>
Impact of XML on Integration
• If and when all sources accept Xqueries and exchange data in XML format, then
  – Mediator can accept user queries in Xquery
  – Access sources using Xquery
  – Get data back in XML format
  – Merge results and send to user in XML format
• How about now?
  – Sources can use XML adapters (middle-ware)
[Figure: the mediator receives Xquery and returns XML; it sends Xquery to each source and gets XML back; an XML adapter in front of a relational source translates Xquery into SQL and relations into XML]
Is XML standardization a magical solution for Integration?
• If all WEB sources standardize into XML format
  – Source access (wrapper generation issues) becomes easier to manage
  – BUT all other problems remain
    • Still need to relate source (XML) schemas to mediator (XML) schema
    • Still need to reason about source overlap, source access limitations etc.
    • Still need to manage execution in the presence of source/network uncertainties
[Figure: information integration architecture. Inputs and outputs: Query, Requests, Preference/Utility Model, Answers. Components: Source Fusion/Query Planning (needs to handle multiple objectives, service composition, source quality & overlap); Executor (needs to handle source/network interruptions, runtime uncertainty, replanning); Monitor (probing queries, updating statistics); Source Trust, Ontologies, Source/Service Descriptions. Source calls go to Services, Webpages, Structured data, and Sensors (streaming data). The mediator exchanges Xquery and XML with the sources.]
“Semantic Web”
• The LAV/GAV approaches assume that some human expert will do the actual schema mapping
• The “semantic-web” initiative attempts to automate schema mapping
  – Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags
  – Support automatic inference of relations between source and mediator schema using these rules
    • DAML+OIL