M3 XML Processing - Sepp Hochreiter
Module 3:
XML Processing
a.Univ.-Prof. Dr. Werner Retschitzegger
Lecture "IFS in Bioinformatics" (Vorlesung IFS in der Bioinformatik)
Summer semester (SS) 2011
Johannes Kepler University Linz, www.jku.ac.at
Institute of Bioinformatics, www.bioinf.jku.at
IFS Information Systems Group, www.ifs.uni-linz.ac.at
© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
- Introduction
  - Motivation
  - XML Processing Alternatives – Overview
  - Extensions of Existing Languages
  - Interfaces to Existing Languages
  - Native XML Processing
- XPath
- XQuery
- XML & DB
The following slides are based (among others) on:
- Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
- Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
- Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.
Motivation
- Huge amount of XML data, steadily growing
- We need to "process" it, including its "storage"
  - Filter, search, select, join, aggregate
  - Create new pieces of information
  - Clean, normalize the data
  - Update it
  - Verify the correctness
  - Take actions based on the existing data
  - Write complex execution flows
  - Store it efficiently
- No common architecture like for RDBS
  - Applications are too heterogeneous
XML Processing Alternatives – Overview
- (1) Existing Language Extensions
  - Procedural: JavaScript (ECMA), AJAX, PHP
  - Declarative: SQL/XML – part of the SQL:2003 standard
- (2) Interfaces to Existing Languages
  - XML APIs – Generic Mapping: DOM, SAX, StAX
  - XML Data Binding – Non-Generic Mapping
    - JAXB 2.0 – Java Architecture for XML Binding
    - SDO – Service Data Objects (J2EE platform)
    - ADO – ActiveX Data Objects (.NET platform)
    - EMF – Eclipse Modeling Framework
- (3) Native XML Processing
  - Pure XML Type System: XPath, XSLT and XQuery
[Slide annotations: the alternatives are covered in UE IFS2, VO IFS2 and VO/UE Model Engineering]
(1) Extensions to Existing Languages
- Extension of the type system of existing languages with XML types
  - Import of XML data into this type system
- Extension of the API
  - XML retrieval and manipulation
  - XPath-based or XPath-inspired
- Example: SQL/XML
Relational Data → XML Data (table EMPLOYEES with columns EMPLOYEE_ID, FIRST_NAME, LAST_NAME):
    SELECT e.employee_id,
           XMLElement("Emp", e.first_name||' '||e.last_name) AS result
    FROM employees e
    WHERE employee_id > 200;

    EMPLOYEE_ID RESULT
    ----------- ----------------------------
            201 <Emp>Michael Hartstein</Emp>
            202 <Emp>Pat Fay</Emp>
            203 <Emp>Susan Mavris</Emp>
XML Data → Relational Data (table EMP_RESUMES with XML column RESUME):
    <RESUME>
      <FULL_NAME>S.King</FULL_NAME>
      <JOB_HISTORY><JOB_ID>AD_PRES</JOB_ID></JOB_HISTORY>
      …
    </RESUME>

    SELECT e.resume.extract('//JOB_ID/text()') result
    FROM emp_resumes e
    WHERE e.employee_id = 100;

    RESULT
    -------
    AD_PRES
(2) Interfaces to Existing Languages: XML APIs
Generic Mappings
- Mapping of XML data to generic XML programmatic APIs
- Programming languages (e.g. Java, C#) are used to manipulate the data
- Re-serialize it at the end
- More details later on …
Different documents such as
    <purchaseOrder>
      <lineItem>…</lineItem>
      <lineItem>…</lineItem>
    </purchaseOrder>
and
    <book>
      <author>…</author>
      <title>…</title>
      …
    </book>
are mapped to the same generic node class:
    class DomNode {
      public String getNodeName();
      public String getNodeValue();
      public void setNodeValue(String nodeValue);
      public short getNodeType();
    }
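The generic mapping can be sketched with Python's standard `xml.dom.minidom` module. The slide names Java and C#; Python is used here only for brevity. The line item contents and the helper `node_names` are assumptions made up for this illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; the slide elides the lineItem content.
SOURCE = "<purchaseOrder><lineItem>pen</lineItem><lineItem>ink</lineItem></purchaseOrder>"

def node_names(node):
    """Collect the names of all element nodes at or below node, depth-first."""
    names = []
    if node.nodeType == node.ELEMENT_NODE:
        names.append(node.nodeName)
    for child in node.childNodes:
        names.extend(node_names(child))
    return names

doc = minidom.parseString(SOURCE)      # XML text -> tree of generic nodes
names = node_names(doc)                # the same code works for any document

# Manipulate the data through the generic API: change the first item's text node.
first_item = doc.getElementsByTagName("lineItem")[0]
first_item.firstChild.nodeValue = "pencil"

result = doc.documentElement.toxml()   # re-serialize at the end
print(names)
print(result)
```

Nothing in this code knows about purchase orders; element names appear only as strings, which is what makes the mapping generic.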
(2) Interfaces to Existing Languages: XML Data Binding
Non-Generic Mappings
- Mapping of the XML Schema of the XML data to appropriate code in the target language
- Based on this mapping, marshalling / unmarshalling between XML and objects
- Advantages
  - Abstraction from low-level APIs and the details of the parsing process
  - Development effort and error-proneness can be reduced
- Disadvantages
  - High memory demands for large XML documents
  - XML Schema evolution requires the corresponding classes to be generated anew
Schema (abbreviated notation):
    <type name="book-type">
      <sequence>
        <attribute name="year" type="xs:integer"/>
        <element name="title" type="xs:string"/>
        <sequence minOccurs="0">
          <element name="author" type="xs:string"/>
        </sequence>
      </sequence>
    </type>
    <element name="book" type="book-type"/>
Derived class:
    class BookType {
      public int getYear();
      public String getTitle();
      public List getAuthors();
    }
[Figure: data binding framework. A binding compiler translates the XML Schema into derived classes and interfaces with getter/setter methods (data abstraction; customization of the translation is possible). XML documents are deserialized (unmarshalled) into object instances and serialized (marshalled) back, with validation.]
Source: http://www.rpbourret.com/xml/XMLDataBinding.htm
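To make marshalling and unmarshalling concrete, here is a minimal hand-written sketch in Python. A real binding compiler such as JAXB would generate the class from the schema; the `Book` class, the functions `unmarshal` and `marshal`, and the sample data are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    """Hand-written stand-in for a class a binding compiler would derive from book-type."""
    year: int
    title: str
    authors: List[str] = field(default_factory=list)

def unmarshal(xml_text: str) -> Book:
    """Deserialize: XML document -> typed object."""
    elem = ET.fromstring(xml_text)
    return Book(
        year=int(elem.get("year")),
        title=elem.findtext("title"),
        authors=[a.text for a in elem.findall("author")],
    )

def marshal(book: Book) -> str:
    """Serialize: typed object -> XML document."""
    elem = ET.Element("book", {"year": str(book.year)})
    ET.SubElement(elem, "title").text = book.title
    for name in book.authors:
        ET.SubElement(elem, "author").text = name
    return ET.tostring(elem, encoding="unicode")

# Assumed sample instance document.
SOURCE = '<book year="2007"><title>XQuery</title><author>Priscilla Walmsley</author></book>'
book = unmarshal(SOURCE)
print(book.year + 1)      # year is an int, not a string taken from a DOM node
print(marshal(book))
```

Application code works with `book.year` and `book.authors` instead of node names passed as strings, which is the abstraction the slide lists as an advantage.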
(3) Native XML Processing
- Most promising alternative for the future!
- The only alternative such that …
  - the data is modeled only once
  - it is well integrated with the XML Schema type system
  - it preserves the logical/physical data independence
  - the code deals with non-generic structures
  - the code can be optimized automatically
- Data is stored
  - in plain file systems or
  - in dedicated data stores, e.g. XML extensions of RDBS
- Missing pieces, under development
  - procedural logic
  - update language
  - …
© 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
- Introduction
- XPath
  - Introduction
  - XPath 1.0
  - XPath 2.0
- XQuery
- XML & DB
Introduction: Overview
- Purpose
  - Original goal: selecting document parts for layout purposes (XSL)
  - Now used for various XML standards, e.g. XML Schema, XPointer
  - No XML syntax used, but a syntax of its own
  - Various selection criteria, e.g. element/attribute names, content, type
- Basic Processing Principle
  - Tree-based navigation, similar to navigation in a file system
  - Starting point is always a certain context, i.e. a tree node specified by an XPath expression
  - Navigation and filters modify the context
  - Result of an XPath expression = context computed in the last step
- Read-only language
  - It cannot create nodes or modify existing nodes, except by calling functions written in another language
  - However, it can create new atomic values and sequences of existing nodes
- W3C Standards
  - XPath 1.0, Nov. 1999, ~ 44 pages
  - XPath 2.0, Jan. 2007, ~ 250 pages
XPath 1.0: XPath Data Model – 7 Node Types
Note: Root is NOT equal to the root (i.e. outermost) element but rather represents the whole XML document ("document entity")
[Figure: UML class diagram of the 7 node types Root, Element, Attribute, Text, Comment, ProcessingInstruction and Namespace. Every Node has a StringValue: String. Nodes are divided into NodeWithChildren (Root, Element) and NodeWithoutChildren. Associations shown: parent/child (*), outermost element (1), attribute (*), namespace (*), declares (*), isDefinedBy (0..1).]
XPath 1.0: XPath Data Model – Example HandyCatalog1.xml
[Figure: UML object diagram of HandyCatalog1.xml. Legend: each object is labelled "Node Name: Node Type" with its node value below; edges denote part-of. The root node contains the root (outermost) element HandyCatalog. Below it are Producer elements (attribute name, e.g. NOKIA) with a ProducerNo element (attribute no, e.g. h1234), a comment node and Type elements (attribute name, e.g. 8210 and 7110). Type elements contain Weight (text, e.g. 141g) and Price elements (attribute contract with values yes/no; text values such as 999 and 4999).]
XPath 1.0: XPath Navigation – 13 Axis Names
- Parts of an XML document represent nodes of a tree
- Processing direction of the XPath processor is depth-first
[Figure: document tree showing the axes relative to the context node: self, ancestor, ancestor-or-self, parent, child, descendant, descendant-or-self, following, following-sibling, preceding, preceding-sibling]
- Further axis names: attribute, namespace
XPath 1.0: Hierarchical Operators, Elements/Attributes
[Figure: document tree: root → HandyCatalog → Producer (@name) → ProducerNo (@no) and Type (@name) → Weight and Price (@contract)]
- Hierarchical Operators / and //
    /                      root node
    //Type                 all Type elements at arbitrary depth
    //Type/Price           all Price child elements of Type elements at arbitrary depth
- Access to Elements *
    /*                     root element
    //*                    all elements, including the root element
    /HandyCatalog/*/Type   all Type elements which are grandchildren of the HandyCatalog element
- Access to Attributes @
    //@*                   all attributes
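The expressions above can be tried out with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath 1.0. The sample document is an assumption modelled on the HandyCatalog tree; its values are illustrative. ElementTree paths are relative to an element, so `//Type` becomes `.//Type`.

```python
import xml.etree.ElementTree as ET

# Assumed sample data in the shape of the HandyCatalog tree; values are illustrative.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""

root = ET.fromstring(CATALOG)                 # the outermost element, not the XPath root node

all_types = root.findall(".//Type")           # ~ //Type
type_prices = root.findall(".//Type/Price")   # ~ //Type/Price
grandchildren = root.findall("./*/Type")      # ~ /HandyCatalog/*/Type
all_elements = list(root.iter())              # ~ //*  (includes the root element)

# ElementTree cannot select attribute nodes, so //@* is emulated by hand.
all_attributes = [(e.tag, k, v) for e in root.iter() for k, v in e.attrib.items()]

print(len(all_types), len(type_prices), len(grandchildren))
print([e.tag for e in all_elements])
print(all_attributes)
```

Note that `ET.fromstring` returns the HandyCatalog element itself, which mirrors the slide's distinction between the root node and the root (outermost) element.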
XPath 1.0: Filter
[Figure: document tree as on the previous slide: root → HandyCatalog → Producer (@name) → ProducerNo (@no) and Type (@name) → Weight and Price (@contract)]
- //Type[Price]
  all Type elements containing a Price child element
- //Producer[ProducerNo]/Type[Price]
  all Type elements containing a Price child element, whereby the Type elements must be child elements of a Producer element which contains a ProducerNo child element
- //Producer[Type/Price]
  all Producer elements containing a Type child element which in turn contains a Price child element
- //Type[Weight and Price]
  all Type elements having Weight and Price child elements
- //Type[Weight = "141g"]
  all Type elements containing a Weight child element with value 141g
- //Type[@name = "7110"]
  all Type elements containing an attribute name with value 7110
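The filters can be sketched with Python's `xml.etree.ElementTree`. Its XPath subset has no `and` operator and no path inside a predicate, so two of the expressions are rewritten: `[Weight and Price]` becomes the equivalent chain `[Weight][Price]`, and `//Producer[Type/Price]` is emulated in Python. The sample document, including the second producer ACME, is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; ACME and its Type are invented to make the filters selective.
root = ET.fromstring("""
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
  <Producer name="ACME">
    <Type name="A1"><Weight>90g</Weight></Type>
  </Producer>
</HandyCatalog>
""")

def names(path):
    """Return the name attribute of every element selected by path."""
    return [e.get("name") for e in root.findall(path)]

with_price = names(".//Type[Price]")                          # //Type[Price]
in_numbered = names(".//Producer[ProducerNo]/Type[Price]")    # //Producer[ProducerNo]/Type[Price]
weight_and_price = names(".//Type[Weight][Price]")            # //Type[Weight and Price]
weight_141 = names(".//Type[Weight='141g']")                  # //Type[Weight = "141g"]
named_7110 = names(".//Type[@name='7110']")                   # //Type[@name = "7110"]

# //Producer[Type/Price]: the nested path is checked in Python instead.
producers = [p.get("name") for p in root.findall(".//Producer")
             if p.find("Type/Price") is not None]

print(with_price, weight_and_price, producers)
```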
XPath 1.0: Union, Index-based Access, Variables
[Figure: document tree as on the previous slides: root → HandyCatalog → Producer (@name) → ProducerNo (@no) and Type (@name) → Weight and Price (@contract)]
- Union |
    //Type/Weight | //Type/Price   all Weight and Price child elements of Type elements
- Index-based access via the node's context position
    //Type[1]      first Type element (more precisely: every Type element that is the first Type child of its parent)
    Type[last()]   last Type child element of the context node
- Variable $qname
  - from within XPath 1.0, variables can only be referenced
  - the variable $qname has to be defined by the application using XPath 1.0 (e.g. XSLT or XQuery)
  - Note: XPath 2.0 can also bind values to variables ("for" clause)
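Union and positional access can be illustrated with Python's `xml.etree.ElementTree`. It has no `|` operator, so the helper `union` below is a hypothetical emulation that removes duplicates and returns document order. The sample document is assumed.

```python
import xml.etree.ElementTree as ET

# Assumed sample data with two producers.
DOC = ET.fromstring(
    "<HandyCatalog>"
    "<Producer name='NOKIA'>"
    "<Type name='8210'><Weight>141g</Weight><Price>4999</Price></Type>"
    "<Type name='7110'><Price>999</Price></Type>"
    "</Producer>"
    "<Producer name='ACME'>"
    "<Type name='A1'><Weight>90g</Weight></Type>"
    "</Producer>"
    "</HandyCatalog>"
)

def union(root, *node_lists):
    """Emulate the XPath | operator: no duplicates, result in document order."""
    wanted = {id(node) for nodes in node_lists for node in nodes}
    return [node for node in root.iter() if id(node) in wanted]

# //Type/Weight | //Type/Price
weights_and_prices = union(DOC, DOC.findall(".//Type/Weight"), DOC.findall(".//Type/Price"))

# //Type[1]: the first Type child of EACH parent, not one global first element
first_types = [t.get("name") for t in DOC.findall(".//Type[1]")]

# Type[last()] relative to a context node (here: the first Producer)
nokia = DOC.find("Producer")
last_type = [t.get("name") for t in nokia.findall("Type[last()]")]

print([n.tag for n in weights_and_prices], first_types, last_type)
```

With two producers, `//Type[1]` selects two elements, because the position is counted among the children of each parent.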
XPath 1.0: Path Expressions 1/2
- Relative Path: Location Step[/Location Step]*
  - Processing starts at the current context node (determined e.g. by the preceding location step)
  - Location steps are chained with "/"
- Absolute Path: /Path
  - Processing starts at the root node ("/") INDEPENDENT of the current context
- Location Step: AxisName::NodeTest('['predicate']')*
  - AxisName – navigation via axis name (ancestor, etc.)
  - Short forms for some axis names:
      child::element-name            element-name
      attribute::attname             @attname
      /descendant-or-self::node()/   //
      self::node()                   .
      parent::node()                 ..
XPath 1.0: Path Expressions 2/2
- ::NodeTest – node filtering (1)
  - Name of the node, or
  - Wildcard: "*" for arbitrary elements, "@*" for arbitrary attributes, or
  - Type of the node on basis of a function (text(), comment(), processing-instruction(), node())
  - Result = set of nodes
- [predicate] – node filtering (2)
  - Is a filter on all nodes selected by NodeTest, e.g. specification of the context position via the node's number
  - Multiple predicates are processed from left to right
  - Result = Boolean value
  - Predicates may again contain location paths, e.g. selection of a node in case that certain elements/attributes exist in the context of this node:
      //address[tel/@type="work"]
XPath 1.0: Operators and Functions
- XPath Operators
  - Node Set Operators: |, [expr], /, //
  - Boolean and Comparison Operators: or, and, =, !=, <=, <, >=, >
  - Arithmetic Operators: +, -, *, div, mod
- XPath Core Function Library: 27 functions available
  - Node Set Functions (7), e.g. last(), position(), count(), id(), local-name()
  - String Functions (10), e.g. contains(string s1, string s2), concat(string s1, string s2, string sn*)
  - Boolean Functions (5), e.g. boolean true(), boolean false()
  - Number Functions (5), e.g. number round(number), number sum(node-set)
XPath 2.0: Goals of XPath 2.0
- Simplify manipulation of XML Schema-typed content
  - Introduction of a type system based on XML Schema
- Simplify manipulation of string content
  - Regular expressions, changing strings to upper and lower case, etc.
- Support related XML standards
  - Supports common underlying semantics for XSLT 2.0 and XQuery 1.0
  - Data model based on the InfoSet W3C standard
- Improve ease of use
  - New string / aggregation functions, conditional expression, etc.
- Improve interoperability
  - Different implementations of the specifications should produce the same result
- Improve i18n support
  - Support the needs of different languages and cultures worldwide
- Maintain backward compatibility
  - Large gratuitous incompatibilities were avoided
  - Ability to run in backward compatibility mode
- Enable improved processor efficiency
XPath 2.0: XPath 2.0 vs. XPath 1.0
- 70% more language concepts than XPath 1.0
- Number of operators has doubled
- Number of functions in the standard function library has grown by a factor of four
- Minor changes in core syntax
- Introduction of a new type system based on XML Schema
  - represents a pretty radical overhaul of the language semantics
XPath 2.0: New Features in XPath 2.0 – Overview
- Everything is a "sequence" and Sequence Processing
  - Construction operators
  - Filter
  - New set operators in addition to UNION
  - Functions for list manipulation
  - Aggregation functions
- Support of XML Schema's Type System
  - Type annotations
  - Typed values
  - Type expressions
- Changes to Path Expressions
  - Node tests now also on basis of XML Schema types
  - Location steps can now be defined by function calls
- New Expressions
  - Control primitives: «for» and «if»
  - Quantifiers: «some» and «every»
- New Operators and New Functions
XPath 2.0: "Everything is a sequence" 1/2
- XPath 1.0: sets of nodes only
  - Unordered
  - Can't contain duplicates
- Sequences
  - Are ordered: (1, 2, 3, 4) is different from (4, 3, 2, 1)
  - Can have duplicates: (1, 2, 3, 4) is different from (1, 1, 2, 3, 4)
  - Can have heterogeneous items: (1, 2, 3, "foo")
  - Can't be nested: (1, 2, (3, 4)) is the same as (1, 2, 3, 4)
    NOTE: Although syntactically correct, nested sequences become unnested
- Identity
  - YES: nodes
  - NONE: atomic values and sequences; 1 is the same as (1)
[Figure: a Sequence contains * Items; Item {abstract} is specialized into Atomic Value and Node]
Remember Lisp?
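The sequence rules can be mimicked in a few lines of Python. This is only an analogy using tuples; the helper `seq` is hypothetical and not part of any XPath library.

```python
def seq(*items):
    """Build an XPath-2.0-style sequence: ordered, duplicates kept, never nested."""
    flat = []
    for item in items:
        if isinstance(item, tuple):      # a nested sequence is spliced in
            flat.extend(seq(*item))
        else:
            flat.append(item)
    return tuple(flat)

print(seq(1, 2, (3, 4)))                      # nesting disappears
print(seq(1) == seq((1,)))                    # an item equals the singleton sequence holding it
print(seq(1, 2, 3, 4) == seq(4, 3, 2, 1))     # order matters
print(seq(1, 1, 2) == seq(1, 2))              # duplicates are kept
```

Unlike Lisp lists, which can nest, a sequence built this way is always flat; that is the point of the note about unnesting.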
XPath 2.0: "Everything is a sequence" 2/2
- Consequence of "everything is a sequence"
  - Every operand of an expression is a sequence
  - Every result of an expression is a sequence
- 2 characteristics: closure and composability
  - The language is closed: every possible operation applied to a sequence again generates a sequence
  - Therefore expressions can be nested arbitrarily (composability)
- Example: sum(//Type/Price)
  - Result = sequence (here containing a single number)
XPath 2.0: Sequence Processing 1/2
- Union (alternative: | as in XPath 1.0)
    (A, B) union (A, B)   →  (A, B)
    (A, B) union (B, C)   →  (A, B, C)
- Intersection
    (A, B) intersect (A, B)   →  (A, B)
    (A, B) intersect (B, C)   →  (B)
  - XPath 1.0 versus XPath 2.0: determine whether the node $x is included in the /foo/bar node-set
      XPath 1.0: count(/foo/bar) = count(/foo/bar | $x)
      XPath 2.0: $x intersect /foo/bar
- Difference
    (A, B) except (A, B)   →  ()
    (A, B) except (B, C)   →  (A)
  - XPath 1.0 versus XPath 2.0: select all attributes except the one with a given NS-qualified name
      XPath 1.0: @*[not(namespace-uri()='http://example.com' and local-name()='foo')]
      XPath 2.0: @* except @exc:foo
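The three node-set operators compare nodes by identity and return their result in document order without duplicates. A minimal Python emulation over `xml.etree.ElementTree` elements might look as follows; all function names and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def _normalize(root, nodes):
    """Remove duplicate nodes (by identity) and sort them into document order."""
    position = {id(n): i for i, n in enumerate(root.iter())}
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: position[id(n)])

def union(root, left, right):
    return _normalize(root, list(left) + list(right))

def intersect(root, left, right):
    right_ids = {id(n) for n in right}
    return _normalize(root, [n for n in left if id(n) in right_ids])

def except_(root, left, right):
    right_ids = {id(n) for n in right}
    return _normalize(root, [n for n in left if id(n) not in right_ids])

root = ET.fromstring("<r><A/><B/><C/></r>")   # assumed sample nodes A, B, C
a, b, c = list(root)

def tags(nodes):
    return [n.tag for n in nodes]

print(tags(union(root, [a, b], [b, c])))       # (A, B) union (B, C)
print(tags(intersect(root, [a, b], [b, c])))   # (A, B) intersect (B, C)
print(tags(except_(root, [a, b], [b, c])))     # (A, B) except (B, C)

# Membership test from the slide: is $x included in the node-set?
x_is_member = bool(intersect(root, [b], [a, b]))
```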
XPath 2.0: Sequence Processing 2/2
- List functions
    insert-before((1, 3, 4), 2, 2)   →  (1, 2, 3, 4)
    remove((1, 2, 3), 2)             →  (1, 3)
    index-of((10, 20, 30), 20)       →  2
    empty(())                        →  true
    exists((1, 2, 3))                →  true
- Aggregation functions (the argument is a single sequence, hence the double parentheses)
    sum((1, 2, 3))     →  6   (sum() already supported in XPath 1.0)
    count((1, 2, 3))   →  3   (count() already supported in XPath 1.0)
    avg((1, 2, 3))     →  2
    min((1, 2, 3))     →  1
    max((1, 2, 3))     →  3
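The list functions use 1-based positions. The following Python sketch emulates three of them on tuples; the Python function names are stand-ins, and only the behaviour shown in the examples above plus the out-of-range cases is covered.

```python
def insert_before(target, position, inserts):
    """Like fn:insert-before: 1-based; positions below 1 insert at the front, large ones append."""
    position = max(position, 1)
    return target[:position - 1] + inserts + target[position - 1:]

def remove(target, position):
    """Like fn:remove: drop the item at the 1-based position; out-of-range leaves target unchanged."""
    if position < 1 or position > len(target):
        return target
    return target[:position - 1] + target[position:]

def index_of(seq, search):
    """Like fn:index-of: all 1-based positions at which search occurs."""
    return tuple(i for i, item in enumerate(seq, start=1) if item == search)

print(insert_before((1, 3, 4), 2, (2,)))
print(remove((1, 2, 3), 2))
print(index_of((10, 20, 30), 20))
```

`index_of` returns a sequence of positions, so a value that occurs twice yields two numbers and a missing value yields the empty sequence.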
XPath 2.0: Type System
- XPath 1.0 supports
  - Node-sets
  - Booleans
  - Strings
  - A single numeric data type (double precision floating point)
  - Weakly typed language
- XPath 2.0 supports
  - Sequences as a data type
  - All 19 primitive simple types built into XML Schema, like integers, decimals, single precision, dates, times, durations, …
  - User-defined data types
  - Strong type checking as well as weak type checking
  - Hybrid language: satisfies the data-oriented and the document-oriented world
XPath 2.0: Type System – Changes to XPath 1.0 Data Model
[Figure: UML class diagram of the XPath 2.0 data model. The structure of the XPath 1.0 diagram is kept (Node with StringValue: String, NodeWithChildren, NodeWithoutChildren, the associations parent/child, outermost element, attribute, namespace, declares, isDefinedBy), but the Root node is now called Document. Document, Attribute and Element are typed nodes with the properties Name: QName?, TypedValue: AtomicValue* and TypeAnnotation: QName?. Type annotations (0..1) refer to XML Schema types (ComplexTypes, SimpleTypes); an AtomicValue carries a TypeAnnotation as well.]
XPath 2.0: Path Expressions – Node Test by Schema Type
- Node tests in XPath 1.0
  - On basis of the node's name and its 7 predefined types
- Node tests in XPath 2.0
  - Also on basis of the node's type as defined by XML Schema
  - For example, select all elements of type Person, regardless of the name
  - Useful especially when using a schema with a rich type hierarchy in which many elements can be derived from the same type definition
XPath 2.0: Path Expressions – Function as Location Step
- Now, a function call can be used as a location step
  - Allows following logical relationships in the document's structure, not just physical relationships given by the hierarchy
  - Example: «customer[@id="123"]/find-orders(.)/order-value»
  - The person writing a path expression doesn't necessarily need to know how the orders for a customer are found
    - supports a kind of information hiding (encapsulation)
    - the way they are found can change without invalidating the expression (locality of change)
- XPath itself does not allow writing the find-orders() function
  - this can be done in XQuery or XSLT
XPath 2.0: «for» Expression
- Enables iteration over sequences, returning a new value for each member in the argument sequence
    for $line in /po:PurchaseOrder/po:OrderLines/po:Line
    return $line/po:Price * $line/po:Quantity
- Similar to xsl:for-each, but different in that it is an actual expression that returns a sequence, which can in turn be processed as such
    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return $line/po:Price * $line/po:Quantity
    )
[Figure: document tree: PurchaseOrder → Seller and OrderLines → Line → Code, Price, Quantity]
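For readers more familiar with general-purpose languages, the «for» expression behaves much like a list comprehension. The sketch below rebuilds the example in Python; the sample order and its values are assumed, and the `po:` namespace prefix is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Assumed sample order without namespaces.
ORDER = ET.fromstring(
    "<PurchaseOrder><Seller>Bookstore</Seller><OrderLines>"
    "<Line><Code>B1</Code><Price>10.50</Price><Quantity>2</Quantity></Line>"
    "<Line><Price>4.00</Price><Quantity>3</Quantity></Line>"
    "</OrderLines></PurchaseOrder>"
)

# for $line in /PurchaseOrder/OrderLines/Line return $line/Price * $line/Quantity
line_totals = [
    Decimal(line.findtext("Price")) * Decimal(line.findtext("Quantity"))
    for line in ORDER.findall("OrderLines/Line")
]

# fn:sum(for ... return ...): the resulting sequence is processed further
order_total = sum(line_totals, Decimal(0))

print(line_totals, order_total)
```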
XPath 2.0: «if» Expression
- Depending on whether the expression in parentheses evaluates to true or false, the expression returns the then or the else section
    if (/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'
- The power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests
    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return
        if ($line/po:Code) then $line/po:Price * $line/po:Quantity
        else ()
    )
[Figure: document tree: PurchaseOrder → Seller and OrderLines → Line → Code, Price, Quantity]
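In Python terms, the «if» expression corresponds to a conditional expression, and returning the empty sequence `()` inside a «for» corresponds to a filter in a comprehension. The sample order below is assumed and omits the `po:` namespace prefix.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Assumed sample order; only the first line carries a Code element.
ORDER = ET.fromstring(
    "<PurchaseOrder><Seller>Bookstore</Seller><OrderLines>"
    "<Line><Code>B1</Code><Price>10.50</Price><Quantity>2</Quantity></Line>"
    "<Line><Price>4.00</Price><Quantity>3</Quantity></Line>"
    "</OrderLines></PurchaseOrder>"
)

# if (/PurchaseOrder/Seller = 'Bookstore') then 'ok' else 'ko'
status = "ok" if ORDER.findtext("Seller") == "Bookstore" else "ko"

# fn:sum(for $line in ... return if ($line/Code) then Price * Quantity else ())
coded_total = sum(
    (
        Decimal(line.findtext("Price")) * Decimal(line.findtext("Quantity"))
        for line in ORDER.findall("OrderLines/Line")
        if line.find("Code") is not None     # "else ()" contributes nothing
    ),
    Decimal(0),
)

print(status, coded_total)
```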
XPath 2.0: Existential «some» and Universal «every» Quantifiers
- The XPath 1.0 equals operator (=) could compare node-sets
  - /students/student/name = "Fred" returns true if any student name is equal to "Fred" (existential quantification)
  - The same applies to !=, <, >, …; e.g. /students/student/name != "Fred" returns true if any student name is not equal to "Fred"
- XPath 2.0 makes it possible to write explicitly quantified expressions, existentially and universally quantified
    some $x in /students/student/name satisfies $x = "Fred"
    every $x in /students/student/name satisfies $x = "Fred"
- This formulation is more powerful, because the constraining condition can be anything (not just =, !=, < and so on)
    some $item in //LineItem satisfies (($item/Price * $item/Quantity) > 100)
    some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
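The two quantifiers map directly onto Python's built-in `any` and `all`. The student names below are assumed sample data.

```python
from itertools import product

names = ["Fred", "Anna"]          # assumed student names

# some $x in /students/student/name satisfies $x = "Fred"
some_fred = any(x == "Fred" for x in names)

# every $x in /students/student/name satisfies $x = "Fred"
every_fred = all(x == "Fred" for x in names)

# XPath 1.0: name != "Fred" is also existential, true as soon as ONE name differs
any_not_fred = any(x != "Fred" for x in names)

# some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
pair_sums_to_four = any(x + y == 4 for x, y in product((1, 2, 3), (2, 3, 4)))

print(some_fred, every_fred, any_not_fred, pair_sums_to_four)
```

Here both `= "Fred"` and `!= "Fred"` are true at the same time, which is why the explicit quantifiers are easier to reason about.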
XPath 2.0: String Support Improved
- Case conversion
    upper-case('Michael')   →  'MICHAEL'
- String concatenation
    concat('Jane', ' ', 'Brown')   →  'Jane Brown'
- Complementing the starts-with() function of XPath 1.0: ends-with() function
- Regular expressions supported by 3 functions
  - matches(), replace(), and tokenize()
  - Example: matches(SSNumber, '\d{3}-\d{2}-\d{4}')
- All functions that perform comparison of strings can now use a user-specified collation to do the string comparison
  - This allows more intelligent localization of string matching according to the conventions of different languages
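Rough Python counterparts of these string functions are shown below, for orientation only. The regular expression dialects of XPath 2.0 and Python's `re` module differ in details, and the input strings are assumed sample values.

```python
import re

upper = "Michael".upper()                    # upper-case('Michael')
joined = "".join(["Jane", " ", "Brown"])     # concat('Jane', ' ', 'Brown')
ends = "XQuery".endswith("ery")              # ends-with('XQuery', 'ery')

SSN = r"\d{3}-\d{2}-\d{4}"
# fn:matches is unanchored, so re.search (not re.fullmatch) is the closer analogue.
is_ssn = re.search(SSN, "123-45-6789") is not None      # matches()
masked = re.sub(r"\d", "#", "123-45-6789")               # replace()
parts = re.split(r"-", "123-45-6789")                    # tokenize()

print(upper, joined, ends, is_ssn, masked, parts)
```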
XPath 2.0: XPath Functions by Category 1/2
- Boolean Functions: boolean(), false(), true()
- Numeric Functions: abs(), avg(), max(), min()
- String Functions: compare(), concat(), contains()
- Date and Time Functions: current-date(), current-time()
- Duration Functions: days-from-dayTimeDuration(), hours-from-dayTimeDuration()
- Aggregation Functions: count(), avg(), max(), min(), sum()
- Functions on URIs: base-uri(), collection(), doc()
- Functions on QNames: expanded-QName(), local-name-from-QName()
XPath 2.0: XPath Functions by Category 2/2
- Functions on Sequences: empty(), exists()
- Functions that Return Properties of Nodes: base-uri(), data(), document-uri()
- Functions that Find Nodes: collection(), doc(), id(), root()
- Functions that Return Context Information: base-uri(), collection(), current-date()
- Diagnostic Functions: error(), trace()
- Functions that Assert a Static Type: exactly-one(), one-or-more(), zero-or-one()
Outline
- Introduction
- XPath
- XQuery
  - Introduction
  - For and let clauses
  - Adding Elements/Attributes to Results
  - Conditional Expressions
  - Joins
  - Quantifiers
  - Distinctness & Grouping
  - Sorting & Aggregating
  - Structure of an XQuery Program
  - Appendix
- XML & DB
Introduction: Why XQuery?
- Why a "query" language for XML?
  - Preserve logical/physical data independence
    - Based on an abstract data model, independent of physical data storage
  - Declarative programming
    - Describe the "what", not the "how"
  - Commonalities with functional, imperative and query languages
- Why a native query language? Why not SQL?
  - We need to deal with the peculiarities of XML
  - Hierarchical, ordered, textual, potentially schema-less structure
- Why another XML processing language? Why not XSLT?
  - The template nature of XSLT was not appealing to DB people
  - Not declarative enough
[Figure: SQL and XQuery side by side, each characterized by persistent data, transacted data and declarative processing]
Introduction: XPath – XSLT – XQuery
[Figure: timeline from 1999 to 2007. 1999: XSLT 1.0 uses XPath 1.0. 2007: XSLT 2.0 uses XPath 2.0, and XQuery 1.0 extends XPath 2.0. XPath 2.0 provides the library of functions & operators; the 2007 languages share a common data model and use XML Schema. XSLT has an XML-based syntax; XPath and XQuery have a non-XML-based syntax.]
Introduction: XPath – XSLT – XQuery
- XPath 2.0
  - Common language for navigation, selection, extraction
  - Used in XSLT, XQuery, XPointer, XML Schema, XForms, etc.
- XSLT 2.0: XML ⇒ XML, HTML, Text
  - Loosely-typed scripting language
  - Format XML in HTML for display in browser
  - Must be highly tolerant of variability/errors in data
- XQuery 1.0: XML ⇒ XML
  - Strongly-typed query language – enforces input and output types
  - Must guarantee safety/correctness of operations on data – side-effect free
  - Large-scale database access
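Introduction: History
[Figure: ancestry of XQuery. XSL and XPointer lead to XPath; XQL, XQL-99 and XPath contribute navigation and path expressions; XML-QL contributes variable bindings and flexible structuring of the result; SQL and OQL contribute expressions. These feed into Quilt, which leads to XQuery.]
- Main basis for XQuery was "Quilt"
  - XML query language from IBM, INRIA and Software AG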
Introduction: XQuery Family of Standards
- W3C Recommendations (Jan. 2007)
  - XQuery 1.0 and XPath 2.0 Functions and Operators: the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
  - XQuery 1.0 and XPath 2.0 Data Model (XDM): representation and access for both XML and non-XML sources
  - XSLT 2.0 and XQuery 1.0 Serialization: how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
  - XML Syntax for XQuery 1.0 (XQueryX): an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  - XQuery 1.0 and XPath 2.0 Formal Semantics: the type system used in XQuery and XSLT 2.0 via XPath, defined precisely for implementers
- W3C Working Drafts / Java Community Process
  - XQuery Update – Candidate Recommendation since August 2008!
  - XQuery and XPath Full Text Search
  - XQJ – Query API for Java (~ JDBC)
http://www.w3.org/TR/xquery/
Introduction: XQuery = 80% XPath 2.0 + 20% …
- FLWOR (for-let-where-order-return) expressions
  - ~ SQL's SELECT-FROM-WHERE
- XML construction
  - Adding new elements and attributes as well as transformations
- Sorting of the result
- Operators on types
  - Compile-time & run-time type tests
- User-defined functions
  - Modularize large queries
  - Process recursive data
- Strong typing
  - Guarantees that the result value conforms to the output type
  - Enforced statically or dynamically
Introduction: FLWOR ['floωer] Expression 1/2
Processing pipeline, starting from the XML document:
- FOR/LET: Iteration (cf. FROM in SQL) and variable binding
  - Variables are bound to values of expressions (using XPath)
  - Output: ordered list of tuples of bound variables
- WHERE: Selection (cf. WHERE in SQL)
  - Filtering of tuples on basis of predicates (optional)
  - Output: filtered list of tuples of bound variables
- ORDER: Ordering (cf. ORDER BY in SQL)
  - Ordering of tuples on basis of predicates (optional)
  - Output: ordered list of tuples
- RETURN: Construction (cf. SELECT in SQL)
  - Composition of the result (single nodes, ordered forest of nodes or atomic value)
  - Result = instance of the XPath/XQuery Data Model
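The four stages can be traced in plain Python, one statement per clause. The catalog data and the query are assumptions chosen for this sketch; the emulated query is roughly `for $p in /catalog/product where $p/@dept = "A" order by $p/name return <item>{data($p/name)}</item>`.

```python
import xml.etree.ElementTree as ET

# Assumed sample catalog.
CATALOG = ET.fromstring(
    "<catalog>"
    "<product dept='A'><number>3</number><name>Gamma</name></product>"
    "<product dept='B'><number>1</number><name>Alpha</name></product>"
    "<product dept='A'><number>2</number><name>Beta</name></product>"
    "</catalog>"
)

# FOR: one tuple of bound variables per product, in document order
tuples = [{"p": p} for p in CATALOG.findall("product")]

# WHERE: keep only the tuples that satisfy the predicate
tuples = [t for t in tuples if t["p"].get("dept") == "A"]

# ORDER BY: sort the remaining tuples
tuples = sorted(tuples, key=lambda t: t["p"].findtext("name"))

# RETURN: construct one result item per tuple
result = [f"<item>{t['p'].findtext('name')}</item>" for t in tuples]

print(result)
```

A real XQuery processor is free to evaluate these stages in any way that yields the same result; the sketch only mirrors the logical pipeline.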
Introduction: FLWOR ['floωer] Expression 2/2
[Figure: syntax diagram, approximately:
    ( FOR $var IN expr {, $var IN expr} | LET $var := expr {, $var := expr} )+ [WHERE expr] [ORDER expr] RETURN expr
with annotations for variable binding, variable reference, XPath expression and function call]
- FLWOR Expressions
  - Allow sorting
  - Allow joining
  - Allow adding elements/attributes to results
  - Verbose, but can be clearer
- Path Expressions
  - Great if just copying certain elements and attributes as is
Introduction: XQuery Syntax – Some Important Issues
- Nested expressions
- Compact, non-XML syntax
  - BUT all names must be valid XML names: variables, functions, elements, etc.
  - Names can be associated with a namespace (NS)
- No reserved words
- Case-sensitive
  - keywords are written in lowercase
- No special end-of-line character
- XQuery comments are delimited by (: and :)
  - allowed anywhere (insignificant) whitespace is allowed
  - do not appear in the result
  - may extend over multiple lines
- Whitespace
  - allowed almost anywhere – has no significance
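Introduction: The XQuery Processing Model
[Figure: an XML processor parses the source document (XML) into a source tree. The XQuery processor analyzes and evaluates the XQuery query against the source tree and produces a result tree, which is serialized into a result document (XML) or passed on.]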
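Running Example
[Figure: structure of the three example documents.
Catalog.xml: Catalog contains * Product elements (attribute dept); each Product has 1 Number (text), 1 Name (attribute language; text), 0..1 ColorChoices (text) and 0..1 Desc (text).
Prices.xml: Prices contains 1 PriceList (attribute effDate) with 1..* Prod elements (attribute num); each Prod has 1 Price (attribute currency; text) and 0..1 Discount (attribute type; text).
Order.xml: Order (attributes num, date, cust) contains * Item elements (attributes dept, num, quantity, color).]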
M3-49
for/let and Enclosed Expressions
Using a let clause with a range expression
Using a range expression in a for clause
Multiple for clauses
Multiple variable bindings in one for clause
[Figure: Catalog.xml schema diagram, see Running Example]
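The difference between for and let on this slide can be sketched in Python as follows. The oneEval element name and the concrete ranges are illustrative assumptions; the XQuery in the comments shows the corresponding expression, while the Python functions only imitate its result as strings.

```python
from itertools import product


def for_clause(seq):
    # cf. XQuery: for $i in 1 to 3 return <oneEval>{$i}</oneEval>
    # "for" iterates: the return clause is evaluated once per item.
    return [f"<oneEval>{i}</oneEval>" for i in seq]


def let_clause(seq):
    # cf. XQuery: let $i := (1 to 3) return <oneEval>{$i}</oneEval>
    # "let" binds the whole sequence: the return clause is evaluated once.
    i = list(seq)
    return [f"<oneEval>{' '.join(map(str, i))}</oneEval>"]


def multiple_for(xs, ys):
    # cf. XQuery: for $i in (1, 2), $j in ("a", "b") return ...
    # Several bindings behave like nested loops; the first variable changes slowest.
    return [f"i is {i} and j is {j}" for i, j in product(xs, ys)]


print(for_clause(range(1, 4)))   # range(1, 4) plays the role of "1 to 3"
print(let_clause(range(1, 4)))
print(multiple_for((1, 2), ("a", "b")))
```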
M3-50
Adding Elements/Attributes to Results: Three Use Cases
(1) 1:1 copying of elements/attributes from the input document
  Simple elements
  Complex elements – along with their attributes and children, if any (not just their atomic values!)
  No opportunity to change attributes, children, etc.
(2) Direct element/attribute constructors – a mixture of ...
  Literal content ("hard-coded") – appears as is in the output document
  Expressions within "{}" evaluating to any kind of node (elements, attributes, etc.) and to atomic values
  Using XML syntax (proper nesting, case sensitivity, etc.)
(3) Computed constructors
  Allow for dynamic names of nodes and dynamic values
  Copying tags from the input document but making minor changes (e.g., adding an attribute)
  Turning content from the input document into markup
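Use cases (1) and (2) can be sketched with Python's ElementTree as a stand-in for an XQuery processor. The sample catalog and the ul/li wrapper with its type and dept attributes are assumptions made for illustration; the XQuery in the comments is the form such a query would typically take.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the Catalog.xml running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name language="en">Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name language="en">Floppy Sun Hat</name></product>'
    '</catalog>'
)

# (1) 1:1 copy, cf. XQuery: doc("catalog.xml")//product/name
# The elements arrive in the result with their attributes, unchanged.
copied = [copy.deepcopy(n) for n in CATALOG.findall("./product/name")]

# (2) Direct constructor, cf. XQuery:
#   <ul type="square">{
#     for $p in doc("catalog.xml")//product
#     return <li dept="{$p/@dept}">{data($p/name)}</li>
#   }</ul>
# "ul", "li" and type="square" are literal content; the parts in {} are computed.
ul = ET.Element("ul", type="square")
for p in CATALOG.findall("./product"):
    li = ET.SubElement(ul, "li", dept=p.get("dept"))
    li.text = p.findtext("name")

print([ET.tostring(n, encoding="unicode") for n in copied])
print(ET.tostring(ul, encoding="unicode"))
```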
M3-51
Adding Elements/Attributes to Results: (1) 1:1 Copying from the Input Document
Copy simple elements – name
Copy complex elements – product
[Figure: Catalog.xml schema diagram, see Running Example]
M3-52
Adding Elements/Attributes to Results: (2) Direct Constructors 1/3
Wrap the whole result (name elements) in a new ul element
In addition, wrap each resulting name element in an li element
[Figure: Catalog.xml schema diagram, see Running Example; the ul and li tags in the queries are marked as literal content]
M3-53
Adding Elements/Attributes to Results: (2) Direct Constructors 2/3
Add new attributes, copy attribute values / element content
Copy element content and use it as attribute values with prefix "P"
Annotations on the queries:
  New attribute name & new value
  New attribute name & copy of an existing value
  data() function not necessary – automatic "atomization"
  Copy element content (or attribute content), i.e. its typed value, via the data() function
[Figure: Catalog.xml schema diagram, see Running Example]
M3-54
Adding Elements/Attributes to Results: (2) Direct Constructors 3/3
Copy attributes/elements & eliminate certain elements
  Copy dept attributes to the new element new_product
  Copy product elements and add them as subelements to new_product
  Eliminate the number subelements of product
[Figure: Catalog.xml schema diagram, see Running Example]
M3-55
Adding Elements/Attributes to Results: (3) Computed Constructors
Turning content into markup
  Attribute values → elements
  Explicit element constructor
[Figure: Catalog.xml schema diagram, see Running Example]
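A minimal sketch of "turning content into markup": the value of each dept attribute becomes the name of a new element. The sample data and the departments wrapper element are assumptions for illustration; the XQuery in the docstring uses the computed element constructor `element {name} {content}`, and the Python code imitates it with ElementTree.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the Catalog.xml running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>'
    '<product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>'
    '</catalog>'
)


def by_department(catalog):
    """cf. XQuery:
         <departments>{
           for $d in distinct-values(doc("catalog.xml")//product/@dept)
           return element {$d} { doc("catalog.xml")//product[@dept = $d]/name }
         }</departments>
    """
    result = ET.Element("departments")
    groups = {}
    for p in catalog.findall("./product"):
        dept = p.get("dept")
        if dept not in groups:
            # The element name is taken from data, not written literally.
            groups[dept] = ET.SubElement(result, dept)
        groups[dept].append(copy.deepcopy(p.find("name")))
    return result


print(ET.tostring(by_department(CATALOG), encoding="unicode"))
```

This only works because the attribute values happen to be valid XML names; values such as "2nd floor" would need to be cleaned first.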
M3-56
Conditional Expressions
[Figure: Catalog.xml schema diagram, see Running Example]
M3-57
Joins 1/2
Two-way join in a predicate
Two-way join in a where clause
[Figure: Order.xml and Catalog.xml schema diagrams, see Running Example]
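Both join styles can be sketched as follows. The order and catalog data are made up, and the XQuery in the comments is one plausible form of the queries the captions name; the Python functions compute the same pairs with ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>'
    '<product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>'
    '</catalog>'
)
ORDER = ET.fromstring(
    '<order num="1" date="2011-03-01" cust="c1">'
    '<item dept="WMN" num="557" quantity="1" color="navy"/>'
    '<item dept="ACC" num="563" quantity="1"/>'
    '<item dept="ACC" num="443" quantity="2"/>'
    '</order>'
)


def join_where(order, catalog):
    # cf. XQuery: for $item in doc("order.xml")//item,
    #                 $product in doc("catalog.xml")//product
    #             where $item/@num = $product/number
    #             return ...
    rows = []
    for item in order.iter("item"):
        for prod in catalog.iter("product"):
            if item.get("num") == prod.findtext("number"):
                rows.append((item.get("num"), prod.findtext("name"), int(item.get("quantity"))))
    return rows


def join_predicate(order, catalog):
    # cf. XQuery: for $item in doc("order.xml")//item
    #             for $product in doc("catalog.xml")//product[number = $item/@num]
    #             return ...
    rows = []
    for item in order.iter("item"):
        for prod in catalog.findall(f"./product[number='{item.get('num')}']"):
            rows.append((item.get("num"), prod.findtext("name"), int(item.get("quantity"))))
    return rows


print(join_where(ORDER, CATALOG))
print(join_predicate(ORDER, CATALOG))
```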
M3-58
Joins 2/2
Three-way join in a where clause
Outer Join
[Figure: Prices.xml, Order.xml and Catalog.xml schema diagrams, see Running Example]
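The outer join can be sketched as follows: every product appears in the result, with a price only where the price list has a matching entry. The data is made up, and the XQuery in the docstring is one common way to phrase such a query; the Python function is an analogue using ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number><name>Fleece Pullover</name></product>'
    '<product dept="ACC"><number>563</number><name>Floppy Sun Hat</name></product>'
    '<product dept="ACC"><number>443</number><name>Deluxe Travel Bag</name></product>'
    '</catalog>'
)
PRICES = ET.fromstring(
    '<prices><priceList effDate="2011-03-01">'
    '<prod num="557"><price currency="USD">29.99</price><discount type="CLR">10.00</discount></prod>'
    '<prod num="563"><price currency="USD">69.99</price></prod>'
    '</priceList></prices>'
)


def outer_join(catalog, prices):
    """cf. XQuery:
         for $product in doc("catalog.xml")//product
         return <product number="{$product/number}">{
                  for $prod in doc("prices.xml")//prod
                  where $prod/@num = $product/number
                  return $prod/price
                }</product>
    The inner FLWOR may return nothing; the outer product is kept anyway.
    """
    price_of = {p.get("num"): p.findtext("price") for p in prices.iter("prod")}
    return [
        (prod.findtext("number"), price_of.get(prod.findtext("number")))
        for prod in catalog.iter("product")
    ]


print(outer_join(CATALOG, PRICES))
```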
M3-59
Quantifiers
Quantified expression using the some keyword
Quantified expression using the every keyword
Combining the not function with the some keyword
Binding multiple variables in a quantified expression
[Figure: Catalog.xml schema diagram, see Running Example]
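The four quantified expressions have close counterparts in Python's any() and all(). The data and the tested conditions are made up for illustration; each comment shows an XQuery expression of the kind the caption names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the Catalog.xml running example.
CATALOG = ET.fromstring(
    '<catalog>'
    '<product dept="WMN"><number>557</number></product>'
    '<product dept="ACC"><number>563</number></product>'
    '<product dept="ACC"><number>443</number></product>'
    '</catalog>'
)
depts = [p.get("dept") for p in CATALOG.iter("product")]

# some $d in doc("catalog.xml")//product/@dept satisfies ($d = "ACC")
some_acc = any(d == "ACC" for d in depts)

# every $d in doc("catalog.xml")//product/@dept satisfies ($d = "ACC")
every_acc = all(d == "ACC" for d in depts)

# not(some $d in doc("catalog.xml")//product/@dept satisfies ($d = "XYZ"))
no_xyz = not any(d == "XYZ" for d in depts)

# some $i in (1 to 3), $j in (10, 11) satisfies $j - $i = 7
some_pair = any(j - i == 7 for i in range(1, 4) for j in (10, 11))

print(some_acc, every_acc, no_xyz, some_pair)
```

As in XQuery, the universal quantifier is true for an empty sequence: all([]) is True, just as every ... satisfies ... holds when there is nothing to test.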
M3-60
Distinctness & Grouping
... by department
[Figure: Order.xml schema diagram, see Running Example]
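Grouping by department can be sketched as follows. XQuery 1.0 has no group by clause, so the usual idiom iterates over distinct-values() and re-selects the members of each group; the query in the docstring and the order data are illustrative assumptions, and the Python function computes the same totals with a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the Order.xml running example.
ORDER = ET.fromstring(
    '<order num="1" date="2011-03-01" cust="c1">'
    '<item dept="WMN" num="557" quantity="1" color="navy"/>'
    '<item dept="ACC" num="563" quantity="1"/>'
    '<item dept="ACC" num="443" quantity="2"/>'
    '<item dept="MEN" num="784" quantity="1" color="white"/>'
    '</order>'
)


def quantity_by_department(order):
    """cf. XQuery:
         for $d in distinct-values(doc("order.xml")//item/@dept)
         let $items := doc("order.xml")//item[@dept = $d]
         order by $d
         return <department name="{$d}" totQuantity="{sum($items/@quantity)}"/>
    """
    totals = {}
    for item in order.iter("item"):
        dept = item.get("dept")
        totals[dept] = totals.get(dept, 0) + int(item.get("quantity"))
    return sorted(totals.items())  # "order by $d"


print(quantity_by_department(ORDER))
```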
M3-61
Sorting & Aggregating
[Figure: Order.xml schema diagram, see Running Example]
M3-62
Structure of an XQuery Program: Prolog, Body, Modules 1/3
Prolog
  Role
    is the link between the XQuery expression and the environment in which the expression is embedded
  Parts
    namespace declarations
    schema imports
    default element and function namespace
    function declarations
    function library imports
    global and external variable definitions, etc.
    each declaration is terminated by a semicolon
Body
  Contains the XQuery expression to be evaluated (function bodies and enclosed expressions are written within { })
Note!
  A function does not inherit the context from the main body of the query; rather, the context has to be passed as a parameter
M3-63
Structure of an XQuery Program: Prolog, Body, Modules 2/3
[Figure: Example 1 and Example 2, each query annotated with its prolog and its body]
M3-64
Structure of an XQuery Program: Prolog, Body, Modules 3/3
Module
XQuery style conventions: http://www.xqdoc.org/xquery-style.html
Useful functions available at: http://www.xqueryfunctions.com
M3-65
Appendix: for and let Clauses
Simple for and let clause
Intermingled for and let clauses
[Figure: Catalog.xml schema diagram, see Running Example]
M3-66
Appendix: Direct Constructors 1/3
Wrap the content of each number and name element
Get the content of each name element / order by
[Figure: Catalog.xml schema diagram, see Running Example]
M3-67
Appendix: Direct Constructors 2/3
Aggregation function – no tags from the input document included
Add attributes class & dep
[Figure: Catalog.xml schema diagram, see Running Example]
M3-68
Appendix: Direct Constructors 3/3
Enclosed expressions that evaluate to elements
Enclosed expressions that evaluate to attributes
Enclosed expressions with multiple subexpressions
[Figure: Catalog.xml schema diagram, see Running Example]
M3-69
Appendix: Conditional Expressions
Simple conditional expression
Conditional expression returning multiple expressions
[Figure: Catalog.xml schema diagram, see Running Example]
M3-70
Appendix: Quantifiers
A where clause with multiple expressions and an exists quantifier
[Figure: Catalog.xml schema diagram, see Running Example]
M3-71
Appendix: Ordering
The order by clause
Using multiple ordering specifications
[Figure: Order.xml schema diagram, see Running Example]
M3-72
Appendix: Distinctness & Aggregation 1/3
Distinctness on a combination of values
Aggregation – sum
[Figure: Order.xml and Catalog.xml schema diagrams, see Running Example]
M3-73
Appendix: Distinctness & Aggregation 2/3
Aggregation – count, sum
[Figure: Order.xml schema diagram, see Running Example]
M3-74
Appendix: Distinctness & Aggregation 3/3
Aggregation on multiple values
[Figure: Order.xml schema diagram, see Running Example]
M3-75
Outline
Introduction
XPath
XQuery
XML & DB
  Motivation
  Storage Alternatives
  Access Alternatives
  SQL/XML – SQL:2003-Standard
The following slides are based (among others) on:
  Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
  Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
  Klettke, Meike, Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.
M3-76
Motivation: XML and DB – Why?
Existing DBs store large amounts of data
  Publish data as XML documents
Existing DBs should store existing XML documents
  Storage in the DB along with additional "meta" information
Well-known benefits of DBs
  Efficient storage of large amounts of well-structured data
  Structured query language (SQL)
  Optimization
  Views and security mechanisms
  Concurrency control / transactions – more fine-grained than just on a document level
  Recovery techniques
[Figure: database tables next to XML documents of the form <a><b>...</b><c d=.../></a>, illustrating publishing from and storage in a DB]
DBs are essential cornerstones of today's IT infrastructures – the importance of DBs for Web applications steadily increases
"... The Web is one huge database..." [The Asilomar Report on Database Research, SIGMOD Record 27(4), Dec. 1998]
M3-77
Motivation: The Challenge – Different Categories of XML Documents
Data-oriented
  Well-known, fine-grained, typed structure
  Ordering of subelements doesn't matter
  Schema available, defining the structure
  Examples: order, invoice
  <Order orderNr="1012">
    <CustomerNr>8596</CustomerNr>
    <Position posNr="1">
      <ProductNr>14896612</ProductNr>
      <Amount>2</Amount>
      ...
    </Position>
    ...
  </Order>
Document-oriented
  Semi-structured, coarse-grained, untyped
  Ordering of subelements significant
  Mixed content common
  Schema often non-existent or very generic
  Example: claim
  <Claim>A severe <Reason>fire</Reason> damaged the building and claimed <DeathToll>12</DeathToll> lives. First investigations done by police indicate fire raising with <Motive>criminal intent</Motive>.</Claim>
Mixture
  Example: email
  <Email>
    <Sender>[email protected]</Sender>
    ...
    <Recipient>[email protected]</Recipient>
    <Content>All the best to your 110th birthday!</Content>
  </Email>
M3-78
Storage Alternatives: Overview
[Figure: classification tree. Storage alternatives: file system, DBS, hybrid. DBS: native DBS or conventional DBS. Data model: RM, OR, OO, XML. Schema language: no schema, DTD, XML Schema. Non-shredded vs. shredded.]
File system
  XML documents stored as files at operating system level
  Additional descriptive attributes and file references can be stored within a DBS
DBS
  XML document stored in the DBS as a whole or shredded, possibly together with descriptive attributes
Hybrid
  XML document or parts thereof stored across DBS and file system
  Redundant or non-redundant storage possible
M3-79
Storage Alternatives: Native Storage
Conceptual XML mapping to a fine-grained storage structure
  Transformation into an internal XML tree
  The internal tree often resembles a DOM tree
  Element names are replaced by means of a dictionary
http://www.idealliance.org/proceedings/xml05/ship/58/Native_XML_Databases.HTML
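One possible reading of "element names are replaced by means of a dictionary" is sketched below: each distinct name gets a small integer code, and the tree is stored as node records that carry the code instead of the name. The record layout (node id, parent id, name code, text) is an assumption made for illustration and does not describe any particular native XML database.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the hotel name is made up.
DOC = ET.fromstring(
    '<hotel><name>Example Inn</name>'
    '<room><roomCat>Suite</roomCat></room>'
    '<room><roomCat>Single</roomCat></room></hotel>'
)


def encode(root):
    """Return (dictionary, nodes) with element names replaced by integer codes."""
    dictionary = {}  # element name -> code
    nodes = []       # (node_id, parent_id, name_code, text)

    def visit(elem, parent_id):
        # A name seen for the first time gets the next free code.
        code = dictionary.setdefault(elem.tag, len(dictionary))
        node_id = len(nodes)
        nodes.append((node_id, parent_id, code, (elem.text or "").strip()))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return dictionary, nodes


dictionary, nodes = encode(DOC)
print(dictionary)
for node in nodes:
    print(node)
```

Repeated names such as room and roomCat are stored once in the dictionary, which is where the space saving comes from.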
M3-80
Storage Alternatives: Relational Storage – Heterogeneity 1/2
[Figure: XML concepts vs. relational concepts on three levels]
Data model level (M2)
  XML concepts: element type, attribute, element, element value, attribute value
  Relational concepts: relation, attribute, tuple, value
Schema level (M1)
  XML concepts: DTD / XML Schema (optional) with element types a, b, ... and attributes x, y, ...
  Relational concepts: relational schema with relations A, B, ... and attributes X, Y, ...
Instance level (M0)
  XML concepts: XML document
  Relational concepts: relational DB
Legend: ... consistsOf, ... mayConsistOf
M3-81
Storage Alternatives: Relational Storage – Heterogeneity 2/2
Aspect: XML (DTD) vs. RDBS
  Structure: nested vs. flat
  Datatypes: basically "STRING" only vs. numerous
  Values: stored within attributes and ETs vs. stored within attributes
  Order: elements are ordered vs. tuples are not ordered
  Identification: just a single attribute of type ID vs. composite key possible
  Relationships: IDREFs (untyped) and nested ETs (typed) vs. foreign key (typed)
  Schema: optional, can also be added after instance creation, schema in the form of tags is part of the instance data ("self-describing") vs. necessary, created prior to instances, not part of the instances
M3-82
Storage Alternatives: Relational Storage – Example
UML Diagram:
[Figure: hotelChain contains * hotel (attribute hotelID); each hotel has 1 name, 1 category, 1 location, * telephone and * room; each room has 1 roomCat and 1 price]
DTD:
<!ELEMENT hotelChain (hotel*)>
<!ELEMENT hotel (name, category, location, telephone*, room*)>
<!ATTLIST hotel hotelID CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT telephone (#PCDATA)>
<!ELEMENT room (roomCat, price)>
<!ELEMENT roomCat (#PCDATA)>
<!ELEMENT price (#PCDATA)>
M3-83
Storage Alternatives: Relational Storage – Mapping Onto a Schema
Fixed Schema
  Schema is domain-independent (e.g., not specific to a mobile phone catalog) and independent of the target schema
  No decomposition: the XML document is stored as a whole
  Decomposition: the XML document is "shredded"
  Similarities with the generic XML API approach
Derived Schema
  Schema is derived from the other one
  Similarities with the XML Data Binding approach
User-Defined Schema
  Schema is domain-dependent, but has been designed independently of the target schema
[Figure: matrix of the schema on the DB side (fixed, derived, user-defined) against the schema on the XML side (fixed, derived, user-defined)]
M3-84
Storage Alternatives: Mapping Onto a Fixed RDB-Schema
[Figure: instance tree :hotelChain, :hotel («attribute» :hotelID), :name, :category, :location, :telephone, :room, :roomCat, :price]
Element name → DB value
Attribute name → DB value
XML value → DB value
[cf. Florescu et al., IEEE Data Engineering, 1999]
Example: Decomposition of the document (content and schema) into a single table
FixedMappingTable (Source, Ordinal, Name, Target/Value); only the Name and Target/Value cells of the sample rows are recoverable:
  ...
  location, Vienna
  telephone, 0043/732/2468
  room
  roomCat, Suite
  ...
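A minimal sketch of such a decomposition into a single table, using Python's sqlite3 and ElementTree. The column split into target (node id of the element) and value (its text), the "@" prefix for attribute names, and the hotel name are assumptions made for illustration; the slide only names the columns Source, Ordinal, Name and Target/Value.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document following the hotelChain DTD.
DOC = ET.fromstring(
    '<hotelChain><hotel hotelID="h1">'
    '<name>Example Inn</name><category>4</category><location>Vienna</location>'
    '<telephone>0043/732/2468</telephone>'
    '<room><roomCat>Suite</roomCat><price>200</price></room>'
    '</hotel></hotelChain>'
)


def shred(root, conn):
    """Store every element and attribute as one row of a single generic table."""
    conn.execute(
        "CREATE TABLE FixedMappingTable "
        "(source INTEGER, ordinal INTEGER, name TEXT, target INTEGER, value TEXT)"
    )
    next_id = [0]

    def visit(elem, source, ordinal):
        next_id[0] += 1
        node_id = next_id[0]
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO FixedMappingTable VALUES (?, ?, ?, ?, ?)",
            (source, ordinal, elem.tag, node_id, text),
        )
        pos = 0
        for attr_name, attr_value in elem.attrib.items():
            pos += 1
            conn.execute(
                "INSERT INTO FixedMappingTable VALUES (?, ?, ?, NULL, ?)",
                (node_id, pos, "@" + attr_name, attr_value),
            )
        for child in elem:
            pos += 1
            visit(child, node_id, pos)

    visit(root, 0, 1)  # source 0 stands for the document itself


conn = sqlite3.connect(":memory:")
shred(DOC, conn)

# Even a simple question ("categories of all rooms") needs a self-join.
room_cats = conn.execute(
    "SELECT c.value FROM FixedMappingTable r "
    "JOIN FixedMappingTable c ON c.source = r.target "
    "WHERE r.name = 'room' AND c.name = 'roomCat'"
).fetchall()
print(room_cats)
```

The table definition never changes, whatever the document looks like, which is why no schema is needed on the XML side; the price is that the domain is invisible to the DBS and queries turn into chains of self-joins.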
M3-85
Storage Alternatives: Mapping Onto a Derived RDB-Schema
Element type → DB relation
Attribute → DB attribute
Foreign keys connect elements
Example: Decomposition of the XML Schema into tables ("Basic Inlining") [cf. Shanmugasundaram et al., VLDB, 1999]
  hotelChain(hcID)
  hotel(hID, hcID, hotelID)
  name(nID, hID, value)
  category(cID, hID, value)
  location(lID, hID, value)
  telephone(tID, hID, value)
  room(rID, hID)
  roomCat(rcID, rID, value)
  price(pID, rID, value)
Problem: Fragmentation
[Figure: hotelChain UML diagram, see Relational Storage – Example]
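The fragmentation problem can be made concrete with a small sqlite3 sketch. The tables follow the relation names and columns listed on the slide (a subset, for brevity); the column types and the inserted rows are assumptions made for illustration.

```python
import sqlite3

# Subset of the derived schema; column types are assumed.
SCHEMA = """
CREATE TABLE hotelChain (hcID INTEGER PRIMARY KEY);
CREATE TABLE hotel    (hID  INTEGER PRIMARY KEY, hcID INTEGER REFERENCES hotelChain, hotelID TEXT);
CREATE TABLE name     (nID  INTEGER PRIMARY KEY, hID INTEGER REFERENCES hotel, value TEXT);
CREATE TABLE location (lID  INTEGER PRIMARY KEY, hID INTEGER REFERENCES hotel, value TEXT);
CREATE TABLE room     (rID  INTEGER PRIMARY KEY, hID INTEGER REFERENCES hotel);
CREATE TABLE roomCat  (rcID INTEGER PRIMARY KEY, rID INTEGER REFERENCES room, value TEXT);
CREATE TABLE price    (pID  INTEGER PRIMARY KEY, rID INTEGER REFERENCES room, value TEXT);
"""

# Made-up instance data: one hotel with two rooms.
DATA = """
INSERT INTO hotelChain VALUES (1);
INSERT INTO hotel VALUES (1, 1, 'h1');
INSERT INTO name VALUES (1, 1, 'Example Inn');
INSERT INTO location VALUES (1, 1, 'Vienna');
INSERT INTO room VALUES (1, 1);
INSERT INTO room VALUES (2, 1);
INSERT INTO roomCat VALUES (1, 1, 'Suite');
INSERT INTO roomCat VALUES (2, 2, 'Single');
INSERT INTO price VALUES (1, 1, '200');
INSERT INTO price VALUES (2, 2, '90');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(DATA)

# Reassembling even a flat view of one hotel already takes five joins.
rows = conn.execute("""
    SELECT n.value, l.value, rc.value, p.value
    FROM hotel h
    JOIN name n      ON n.hID = h.hID
    JOIN location l  ON l.hID = h.hID
    JOIN room r      ON r.hID = h.hID
    JOIN roomCat rc  ON rc.rID = r.rID
    JOIN price p     ON p.rID = r.rID
    ORDER BY r.rID
""").fetchall()
print(rows)
```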
M3-86
Storage Alternatives: Mapping Onto a User-Defined RDB-Schema
Example: Mapping of the XML Schema into existing tables and attributes
  Accommodation(ID, Name, Category, TownID)
  Town(TownID, TownName, Country)
  Phone(ID, Phone#, Desc)
  RoomRates(ID, RoomCat, Rate)
[Figure: hotelChain UML diagram (see Relational Storage – Example) mapped onto these tables]
M3-87
Storage Alternatives: Mapping Options – Advantages/Disadvantages
Fixed
  - Domain not represented in schema
  - Queries/optimization hard to realize
  + Fixed at DB side: no schema at XML side necessary, best suited for document-oriented XML
Derived
  - The schema on the other side must exist
User-Defined
  + Schema can be designed independently of the target schema
  + Data of existing DBs can be used!
  - Heterogeneity problem!
Derived / User-Defined
  + Domain is represented in schema
  + Optimization mechanisms usable
  + Suited especially for data-oriented XML
[Figure: 3×3 matrix of the XML-side schema (fixed, derived, user-defined) against the DB-side schema (fixed, derived, user-defined); the cells are labeled fixed mapping, derived mapping, user-defined mapping or n.a.]
M3-88
Storage Alternatives: Representation of Mapping Knowledge
"Template-Driven"
  mapping knowledge hard-coded
    queries
    transformation programs
"Model-Driven"
  mapping knowledge reified (i.e., stored as meta data)
    as a file, e.g., as an XML document
    in the DB, usage of DB functionality
Example (template with an embedded query):
<?xml version="1.0" ?>
<Accommodation xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:query>
    SELECT * FROM Accommodation FOR XML AUTO, ELEMENTS
  </sql:query>
</Accommodation>
Example (annotated mapping schema):
<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <ElementType name="Phone" content="textOnly" />
  <ElementType name="Accommodation" sql:relation="Accommodation">
    <element type="Phone" sql:relation="Phone" sql:field="Number">
      <sql:relationship
        key-relation="Accommodation"
        key="AcID"
        foreign-key="AccID"
        foreign-relation="Phone" />
    </element>
  </ElementType>
</Schema>
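The model-driven idea can be sketched in a few lines: the mapping is plain data, and one generic routine interprets it. The MAPPING structure, the column name AccID used on both sides, the Result wrapper element and the sample rows are all hypothetical; the sketch does not reproduce the annotated-schema mechanism shown above, it only imitates its principle with sqlite3 and ElementTree.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping description, stored as data rather than hard-coded in queries.
MAPPING = {
    "element": "Accommodation",
    "relation": "Accommodation",
    "key": "AccID",
    "children": [
        {"element": "Phone", "relation": "Phone", "field": "Number", "foreign_key": "AccID"},
    ],
}


def publish(conn, mapping):
    """Generic publisher: builds XML from relational rows as the mapping dictates.

    Table and column names are interpolated into SQL, so the mapping must come
    from a trusted source.
    """
    root = ET.Element("Result")
    keys = conn.execute(
        f'SELECT {mapping["key"]} FROM {mapping["relation"]} ORDER BY {mapping["key"]}'
    ).fetchall()
    for (key,) in keys:
        elem = ET.SubElement(root, mapping["element"])
        for child in mapping["children"]:
            rows = conn.execute(
                f'SELECT {child["field"]} FROM {child["relation"]} '
                f'WHERE {child["foreign_key"]} = ? ORDER BY {child["field"]}',
                (key,),
            )
            for (value,) in rows:
                ET.SubElement(elem, child["element"]).text = value
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accommodation (AccID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Phone (AccID INTEGER, Number TEXT);
INSERT INTO Accommodation VALUES (1, 'Example Inn');
INSERT INTO Phone VALUES (1, '0043/732/2468');
INSERT INTO Phone VALUES (1, '0043/732/1357');
""")
print(ET.tostring(publish(conn, MAPPING), encoding="unicode"))
```

Changing the published structure then means editing MAPPING, not the code; in the template-driven style the same change would require touching each query or transformation program.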
M3-89
Access Alternatives: Read-only Query vs. Data Manipulation
Read-only Query
  XML-centered
    Access via XML-based language
    W3C XQuery standard
  DB-centered
    Access via SQL-based language
    SQL/XML – part of the current SQL:2003 standard
  Proprietary mechanism
    Neither DB- nor XML-centered
Data Manipulation
  Current research area
  XQuery Update Facility, W3C Candidate Rec. Aug. 2008
  http://www.w3.org/TR/xqupdate/
M3-90
SQL/XML First Edition: Part of SQL:2003-Standard
Storage of XML documents
  Introduction of new datatype XMLType
  Automatic shredding which can be customized
Publishing stored data by extending SQL with XML functions
  Functions for retrieving relational data and transforming it into XML (e.g., XMLGen, XMLElement, XMLAgg)
Unfortunately, SQL:2003 pre-dated the XQuery standard
  Therefore no full XQuery functionality available
  cf. SQL:2007 ...
[Figure: RDBS with XMLType storage; XML documents are stored and published via SQL and the SQL/XML functions]
M3-91
SQL/XML Second Edition: Part of Forthcoming SQL:2007-Standard
More complete integration of the XQuery data model
  XML datatype will support the XQuery data model
    heterogeneous sequences
    non well-formed XML data
    full XML Schema support and validation
Advanced query capabilities
  XMLQuery() function
    create XML content using XQuery
  XMLTable() function
    shred XML to relational data using XQuery
Mapping between SQL & XQuery data model
  XMLCAST between XML and SQL types
[Figure: "IBM DB2", from an article by Holger Seubert]
M3-92
SQL/XML: What is it Good For ...
Benefits
  Takes advantage of the entire SQL infrastructure (e.g., triggers, PL/SQL)
  Transactional support
  Scalability, clustering, reliability
  Global optimization (XML and relational)
  Standard implemented and supported by Microsoft, Oracle, IBM, etc.
Drawbacks
  Requires data to be loaded into the DB
    not good for temporary XML data
    not worth the effort for small volumes of data
  Blending of the two languages (SQL, XQuery) isn't natural
  XQuery not supported entirely by DB engines
    No XML updates à la XQuery yet
M3-93
Literature
Standard Specifications
  http://www.w3.org/TR/xpath20/
  http://www.w3.org/TR/xquery/
  http://www.sqlx.org (SQL/XML Standard)
Best source (!) on XML & DBS incl. an extensive overview of available systems:
  http://www.rpbourret.com/xml/XMLDatabaseProds.htm
  http://www.rpbourret.com/xml/XMLAndDatabases.htm
Interesting collection of papers:
  http://www.cs.cornell.edu/People/jai/pubs.html#PaperCategory:PublishingRelationalDataAsXML
GI-Working Group "Web und Datenbanken": http://dbs.uni-leipzig.de/webdb/
M. Koran, Evaluierung von XML Datenbanken, Master Thesis, Universität Zürich, October 2006 [http://www.ifi.uzh.ch/index.php?id=490&print=1&no_cache=1]
Books
  H. Katz et al., XQuery from the Experts, Addison Wesley, 2004.
  J. Melton et al., Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann/Elsevier, 2006.
  M. Klettke, H. Meyer, XML & Datenbanken: Konzepte, Sprachen und Systeme, dpunkt, 2003, http://www.xml-und-datenbanken.de/
  E. Rahm, G. Vossen (eds.), Web & Datenbanken: Konzepte, Architekturen, Anwendungen, dpunkt, 2003.
  B. Gorke, XML-Datenbanken in der Praxis, bomots Verlag, 2006.