ADT 2008, Lecture 3
XML/XQuery Data Management
Beyond Chapter 10 of Silberschatz, Korth, Sudarshan: “Database System Concepts”
Stefan Manegold
http://www.cwi.nl/~manegold/
[email protected] Lecture 3: XML/XQuery Data Management ADT 2008
XML Databases
why
• Motivation & The Big Picture
what
• Crash Course XQuery
WHO
• XML files: Saxon, Galax, GNU Qexo
• XML DBMS: eXist, BerkeleyDB, MonetDB, X-Hive, Tamino, Xyleme
• XML EAI: BEA Liquid Data, DataDirect, Mark Logic, Mono, Ipedo
• XML RDBMS: Oracle10g, SQL Server 2005, DB2
how
• Under The Hood of MonetDB/XQuery
• Some Benchmarks
XML Data Management Systems
XML File Processors
• Used as part of a document processing pipeline
• Small documents (messages)
[Figure: XML input, XQuery Processor, XML output]
XML Data Management Systems
XML Databases
• Manage large collections of XML documents
• Text keyword search support (XML contains text)
• Integration with web servers/platforms
[Figure: web browser, the internet, web server (application logic), XML DBMS; labels: request, HTML, XQuery, XML]
XML Data Management Systems
[Figure: web browser, the internet, web server (XSLT rendering), XML DBMS; labels: request, XHTML, XQuery, XML]
XML Data Management Systems
XML Integration Platforms
• XML as lingua franca to integrate data
• XML data: integrate data sources in XML; realtime access
• XML messaging: SOA frameworks (SOAP/WSDL, UDDI, BPEL)
• XQuery/XSLT for querying/transforming XML
• Integrated in J2EE/.NET application server (XQJ, XLinq)
[Figure: SOA framework and application server framework over a unified XML data layer. Components shown: applications (AppX) with WSDL and SOAP, XML message transformation, integrated virtual XML data model, data caching, mediators, XML DBMSs, RDBMSs, other data sources]
XML Data Management Systems
RDBMS with XML Functionality
• Easily mix relational and XML data; can be very useful for that .ini/properties file
• Query the XML VARCHAR with SQL/XML: ugly, but it works
[Figure: web browser, the internet, web server (application logic), RDBMS; labels: request, HTML, SQL/XML, XML]
Google Hits
(poor-man's) market share analysis

Search term               Hits    Category
BEA Liquid Data xquery    458K    EAI
NeoCore XMS xquery        18K     xmldb
Xyleme xquery             24K     xmldb
Ipedo xquery              22K     EAI
eXist xquery              1440K   xmldb
Galax xquery              17K     file
Openlink Virtuoso xquery  24K     EAI
Qexo xquery               41K     file
Tamino xquery             56K     xmldb
X-Hive xquery             65K     xmldb
MonetDB xquery            87K     xmldb
Sonic XML xquery          88K     EAI
BerkeleyDB XML xquery     142K    xmldb
Xindice xquery            225K    xmldb
Mono xquery               419K    EAI
Mark Logic xquery         438K    EAI
Saxon xquery              743K    file
SQLServer xquery          1220K   rdbms
Oracle xquery             1470K   rdbms
DataDirect xquery         2170K   EAI
XML Data Management Systems
Categories: XML File Processors, XML Databases, XML Integration Platforms (EAI), RDBMS with SQL/XML Functionality

Rank in category  Hits    Search term
#3                17K     Galax xquery
#2                41K     Qexo xquery
#1                743K    Saxon xquery
#2                438K    Mark Logic xquery
#1                1440K   eXist xquery
#6                56K     Tamino xquery
#5                65K     X-Hive xquery
#4                87K     MonetDB xquery
#3                142K    BerkeleyDB XML xquery
#2                225K    Xindice xquery
#2                458K    BEA Liquid Data xquery
#6                22K     Ipedo xquery
#5                24K     Openlink Virtuoso xquery
#4                88K     Sonic XML xquery
#3                419K    Mono xquery
#1                2170K   DataDirect xquery
#2                1220K   SQLServer xquery
#1                1470K   Oracle xquery
XML Databases
Querying
• How well is XPath/XQuery implemented?
  • All axes? Collations? XML Schema? Dynamic typing? Modules? Recursive UDFs?
Updates
• What update dialect is used? (XUpdate / updateX / WebDAV / other)
  • W3C Update Facility proposal (since Jan 2006!)
DB properties
• Query performance/throughput (benchmarks published?)
• Update consistency model (fully serializable / snapshot consistency / something less)
• Replication, failover, backup facilities
APIs
• SOAP / WSDL
  • Call a query from outside <=> calling out from a query
Web Support
• Cocoon/Apache modules (web sessions, low-overhead web queries)
• XML Beans (J2EE served from XMLDB)
• WebDAV
Language bindings
• XQJ => Java binding / XLinq => C# binding
XML DBMS comparison
+
+
-
+
-
++
+
fulltext
++++++Mark Logic
+++++++Tamino
+++++++++++X-Hive
+-+++++++MonetDB/XQuery
++++++++BerkeleyDB XML
+--+-Apache Xindice
+++-+++eXist
APIs  replication  scalability  updates  xquery
XML Data Integration (EII)
• Virtual XML View Functionality
  • Can you define (and query/update) a virtual XML view instead of the explicit sources?
  • Note: no W3C standard for XML views yet
• Supported Backends
  • What data sources can be backends? ODBC, JDBC, Excel, Word, XML
  • Which kinds of queries can be pushed down?
• SOA Framework Integration
  • SOAP, WSDL, UDDI, BPEL
• Application Framework (J2EE/.NET) integration
  • Business logic in a 3GPL
  • XML beans or XML object binding
  • XQJ / XLinq: Java/C# XQuery interface
• Scalability
  • Load balancing, replication, data caching (!)
XML Data Integration (EII)
+++++Ipedo
+++++OpenLink Virtuoso
+-++++Sonic XML
++-++Mono
++++++++BEA Liquid Data
--++-DataDirect
J2EE/.NET  xml view  mediators  SOA
XML in a Relational DBMS
• Store XML as a BLOB in a relational database
  • Index by materializing indexed expressions in separate columns
  • Plus: store XML in parsed and validated form
  • Minus: proprietary solution (BLOB is a black box)
  • Minus: replicate data for indexing
• Schema-based Shredding
  • Map XML Schema / DTD to SQL DDL
  • Plus: integrates well with relational data
  • Minus: missing tools, complicated
• SQL/XML
  • Extend SQL with XML data type
  • Plus: integrates well with relational data
  • Minus: not clear how it integrates with the application, odd "marriage"
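The first option (XML kept as an opaque value, with an indexed expression materialized in a separate column) can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 and ElementTree modules; the table `docs`, the column `name`, and the helpers `store_document` and `find_by_name` are hypothetical names, not taken from any of the systems discussed.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# body holds the XML text as a black box; name is the materialized expression.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.execute("CREATE INDEX docs_name ON docs (name)")

def store_document(conn, xml_text):
    """Store the XML text and replicate one value (/row/Name) into an indexed column."""
    root = ET.fromstring(xml_text)      # parsing also checks well-formedness
    name = root.findtext("Name")        # the "indexed expression"
    conn.execute("INSERT INTO docs (name, body) VALUES (?, ?)", (name, xml_text))

def find_by_name(conn, name):
    """Look up documents through the relational column, never opening the XML."""
    rows = conn.execute("SELECT body FROM docs WHERE name = ?", (name,)).fetchall()
    return [body for (body,) in rows]

for doc in ("<row><Id>4711</Id><Name>Wutz</Name></row>",
            "<row><Id>911</Id><Name>Potter</Name></row>"):
    store_document(conn, doc)

print(find_by_name(conn, "Potter"))
```

The sketch also makes the listed drawback visible: the name is stored twice, once inside the XML text and once in the indexed column.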
History of SQL/XML
• First edition part of SQL:2003
  • Part 14 of the SQL standard
  • Predates XQuery standard!!!
  • Limited functionality: storage and publishing
• Second edition
  • More complete integration of XQuery + XQuery Data Model
  • Advanced query capabilities
  • Published in 2006
Publishing Rel. Data as XML (1/2)

Phantasy-People
Id    Name
4711  Wutz
911   Potter

<PhantasyPeople>
  <row>
    <Id>4711</Id>
    <Name>Wutz</Name>
  </row>
  <row>
    <Id>911</Id>
    <Name>Potter</Name>
  </row>
</PhantasyPeople>
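The table-to-XML mapping on this slide (one `row` element per tuple, one child element per column) can be mimicked in a few lines of Python. This is a sketch with ElementTree, not the SQL/XML publishing functions themselves; `publish_table` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def publish_table(table_name, columns, rows):
    """Render a relational table as XML: <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = publish_table("PhantasyPeople", ["Id", "Name"],
                         [(4711, "Wutz"), (911, "Potter")])
print(xml_text)
```

The result is the document shown on the slide, serialized without indentation.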
Publishing Rel. Data as XML (2/2)
Publishing XML as Rel. Table
('$MyDB//customer'
XML Type in SQL
• A new type (like varchar, date, numeric)
• SQL:2003 XML type restricted to
  • XML document or
  • XML element or
  • Sequence of XML elements
• SQL/XML, 2nd edition
  • Full support of XQuery Data Model
  • XML(SEQUENCE), XML(ANY CONTENT), ...
Example (SQL:2003)
create table books(title varchar(20), authors XML);

Title       Authors
MonetDB     „P. Boncz“
XQuery 1.0  <author>D. Chamberlin</author>
            <author>D. Florescu</author>
            <author>et al.</author>

No schema validation, no typing!
• SQL:2006 => explicit validation with XMLVALIDATE()
XMLQuery
XMLQuery(
  XQuery-expression
  PASSING { BY REF | BY VALUE }
  (value-expression AS identifier [BY REF | BY VALUE])*
  RETURNING { CONTENT | SEQUENCE } { BY REF | BY VALUE }
)
• If a PASSING value has no identifier, then it is the context node
• BY REF preserves the Id (of an XML type)
• BY VALUE creates a copy of the data
Oracle10g/11g: XQuery Support
1. XMLDB integrated database engine
  • SQL/XML standard support
  • Optimized queries: rewrite to relational
2. Standalone Java query engine
  • 100% Java
  • Integrated into Oracle App Server XDS
  • Interoperates with XSLT/XPath
First relational database to ship an XQuery implementation!
Oracle 10g/11g: XQuery database support
• Production in Oracle Database 10gR2
• Supports XMLQuery and XMLTable constructs
• Native compilation into SQL/XML structures
• Returns XMLType(Content)
• Can query over relational, OR, XMLType data
• fn:doc maps to XDB Repository on server
• SQLPlus provides xquery command to execute XQuery
• XSLT will also get compiled to XQuery
Microsoft SQL Server 2005: Overview
[Figure: SQL Server 2005 XML architecture. Components shown: XML, XML parser, validation, XML Schemata in a schema collection, XML datatype (binary XML), query() and modify() methods, PRIMARY XML INDEX (node table), and PATH, PROP and VALUE indexes]
Microsoft SQL Server 2005: Indexing
• Create XML index on XML column
  CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
• Creates secondary indexes on tags, values, paths
• Speeds up queries
  • Results can be served directly from index
  • Entire query is optimized
    • Same award-winning cost-based optimizer
  • Indexes are used as available
IBM DB2 v9: Overview
IBM DB2 v9: Storage
IBM DB2 v9: SQL in XQuery
IBM DB2 v9: XQuery in SQL
IBM DB2 v9: Indexing
XML Databases
why
• Motivation & The Big Picture
what
• Crash Course XQuery
who
• XML files: Saxon, Galax, GNU Qexo
• XML DBMS: eXist, BerkeleyDB, MonetDB, X-Hive, Tamino, Xyleme
• XML EAI: BEA Liquid Data, DataDirect, Mark Logic, Mono, Ipedo
• XML RDBMS: Oracle10g, SQL Server 2005, DB2
HOW
• Under The Hood of MonetDB/XQuery
• Some Benchmarks
XQuery Systems: 2 Approaches
Native
• Tree is basic data structure
  • tree storage manager
  • tree query processing (algebra)
  • tree query optimization
• Reinventing the wheel?
Relational
• Leverage RDBMS storage, query processing & optimization
  • XML shredded into tables
  • XQuery translated into SQL
• Let's use the old rim, but make a new tyre!
Systems shown: X-Hive, Timber, DB2, BDB-XML, Galax, Microsoft, Oracle, MonetDB/XQuery
The Idea
[Figure: wheel with the labels Steel, Rim, Aluminium, Tyre]
Storing XML in Relations: Schema-Based Approach
Schema-Based Approach: Benefits & Drawbacks
Peter Boncz, Stefan Manegold (CWI Amsterdam)
Torsten Grust, Jens Teubner, Jan Rittinger (Technische Universität München)
Maurice van Keulen (Technische Universiteit Twente)
MonetDB/XQuery
A Fast XQuery Processor Powered by a Relational Engine
[ACM/SIGMOD 2006]
MonetDB
open-source Mozilla license => download at monetdb-xquery.org
MonetDB/XQuery
Pathfinder Project
Torsten Grust, Jens Teubner, Jan Rittinger
Maurice van Keulen
open-source Mozilla license => download at monetdb-xquery.org
Schema-Oblivious Storage: XPath Accelerator
XPath Accelerator: pre/post plane
Node-based relational encoding of XQuery's data model

<a> <b> <c/> </b> <d/> <e> <f> <g/> <h/> </f> <i> <j/> </i> </e> </a>

Each node gets its preorder rank (written before the opening tag) and its postorder rank (written after the closing tag):

0<a> 1<b> 2<c/>0 </b>1 3<d/>2 4<e> 5<f> 6<g/>3 7<h/>4 </f>5 8<i> 9<j/>6 </i>7 </e>8 </a>9

① With f as the context node, each major XPath axis is a rectangular region of the pre/post plane:
f/following:  SELECT * FROM pre_post WHERE pre > f.pre AND post > f.post
f/descendant: SELECT * FROM pre_post WHERE pre > f.pre AND post < f.post
f/preceding:  SELECT * FROM pre_post WHERE pre < f.pre AND post < f.post
f/ancestor:   SELECT * FROM pre_post WHERE pre < f.pre AND post > f.post
Similar queries for other XPath axes
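The four region predicates can be checked by hand on the ten-node example. The sketch below hard-codes the (name, pre, post) values of the example document and applies the same comparisons in plain Python; `NODES` and `axis` are illustrative names, not part of MonetDB/XQuery.

```python
# (name, pre, post) for the example document <a> ... </a>
NODES = [("a", 0, 9), ("b", 1, 1), ("c", 2, 0), ("d", 3, 2), ("e", 4, 8),
         ("f", 5, 5), ("g", 6, 3), ("h", 7, 4), ("i", 8, 7), ("j", 9, 6)]

def axis(context, axis_name):
    """Return the names of all nodes on the given axis of the context node."""
    _, cpre, cpost = next(node for node in NODES if node[0] == context)
    predicates = {
        "following":  lambda pre, post: pre > cpre and post > cpost,
        "descendant": lambda pre, post: pre > cpre and post < cpost,
        "preceding":  lambda pre, post: pre < cpre and post < cpost,
        "ancestor":   lambda pre, post: pre < cpre and post > cpost,
    }
    test = predicates[axis_name]
    return [name for name, pre, post in NODES if test(pre, post)]

for name in ("following", "descendant", "preceding", "ancestor"):
    print(name, axis("f", name))
```

For context node f this yields i and j as following, g and h as descendants, b, c and d as preceding, and a and e as ancestors, which matches the nesting of the example document.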
pre/post Table & pre/size/level Table

<a> <b> <c/> </b> <d/> <e> <f> <g/> <h/> </f> <i> <j/> </i> </e> </a>

     Pre  Post        Pre  Size  Level
a    0    9           0    9     0
b    1    1           1    1     1
c    2    0           2    0     2
d    3    2           3    0     1
e    4    8           4    5     1
f    5    5           5    2     2
g    6    3           6    0     3
h    7    4           7    0     3
i    8    7           8    1     2
j    9    6           9    0     3

Post = pre + size - level
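Both tables can be derived in a single depth-first traversal of the document. The following sketch uses Python's ElementTree; the function `encode` and the dictionary layout are assumptions made for illustration, and keying rows by tag name only works because every tag in this example is unique.

```python
import xml.etree.ElementTree as ET

DOC = "<a><b><c/></b><d/><e><f><g/><h/></f><i><j/></i></e></a>"

def encode(xml_text):
    """Compute pre, post, size and level for every element of the document."""
    rows = {}
    counters = {"pre": 0, "post": 0}

    def visit(element, level):
        pre = counters["pre"]              # rank on entering the node
        counters["pre"] += 1
        for child in element:
            visit(child, level + 1)
        post = counters["post"]            # rank on leaving the node
        counters["post"] += 1
        size = counters["pre"] - pre - 1   # number of descendants
        rows[element.tag] = {"pre": pre, "post": post, "size": size, "level": level}

    visit(ET.fromstring(xml_text), 0)
    return rows

table = encode(DOC)
for tag, row in sorted(table.items(), key=lambda item: item[1]["pre"]):
    # The identity from the slide holds for every node.
    assert row["post"] == row["pre"] + row["size"] - row["level"]
    print(tag, row)
```

Because post can be recomputed from pre, size and level, the pre/size/level table carries the same information as the pre/post table.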
Complete XML Storage Schema
XPath evaluation (SQL)
Example query: /descendant::open_auction[./bidder]/annotation

SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post                             <- descendant
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post AND b.level = oa.level + 1  } child
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post AND a.level = oa.level + 1  } child
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
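To see the translated query run, the sketch below shreds a small, made-up auction document into a `doc(pre, post, level, name, kind)` table in SQLite and evaluates the self-join, with every step tested as kind = 'elem'. The sample document and the `shred` helper are assumptions for illustration. As a simplification, pre = 0 is the root element here rather than a separate document node.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first open_auction has a bidder.
DOC = ("<site>"
       "<open_auction><bidder/><annotation/></open_auction>"
       "<open_auction><annotation/></open_auction>"
       "<closed_auction><annotation/></closed_auction>"
       "</site>")

def shred(xml_text):
    """Turn the document into (pre, post, level, name, kind) tuples."""
    rows, counters = [], {"pre": 0, "post": 0}

    def visit(element, level):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in element:
            visit(child, level + 1)
        post = counters["post"]
        counters["post"] += 1
        rows.append((pre, post, level, element.tag, "elem"))

    visit(ET.fromstring(xml_text), 0)
    return rows

QUERY = """
SELECT DISTINCT a.pre
FROM doc r, doc oa, doc b, doc a
WHERE r.pre = 0
  AND oa.pre > r.pre AND oa.post < r.post
  AND oa.name = 'open_auction' AND oa.kind = 'elem'
  AND b.pre > oa.pre AND b.post < oa.post AND b.level = oa.level + 1
  AND b.name = 'bidder' AND b.kind = 'elem'
  AND a.pre > oa.pre AND a.post < oa.post AND a.level = oa.level + 1
  AND a.name = 'annotation' AND a.kind = 'elem'
ORDER BY a.pre
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (pre INTEGER, post INTEGER, level INTEGER, name TEXT, kind TEXT)")
conn.executemany("INSERT INTO doc VALUES (?, ?, ?, ?, ?)", shred(DOC))
result = [pre for (pre,) in conn.execute(QUERY)]
print(result)
```

Only the annotation of the first open_auction (pre = 3) qualifies, since the second open_auction has no bidder child and the third annotation sits under closed_auction.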