San Diego Supercomputer Center, XMLDM'02, Prague
Time to Leave the Trees: From Syntactic to Conceptual Querying of XML
Bertram Ludäscher
Ilkay Altintas
Amarnath Gupta
San Diego Supercomputer Center
U.C. San Diego
Overview
• Motivating Example:
– querying XML w/o and w/ conceptual-level information
– “syntactic” vs. “conceptual” querying of XML
• Distilling conceptual-level information:
– MXS (abstract Model for XML Schema)
• XPathT:
– Incorporating conceptual-level information in XPath
Motivating Example
• Example: “Books DB” (yes, more complex examples exist... ;)
– elements: <myDB> ... <book> .... <price> .... <author> ...
• Sample Queries:
– Q1: Which <book>s have a <price> below $80?
– Q2: What’s the count and average <price> of <book>s?
• (Nice) Try:
– Q1: myDB//book[price<80]
– Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N;
• But what about ...
– ... <book>s with multiple <price>s?
– ... <awe> (award-winning-exemplars) elements (= subtype of book having subelement <award>): we forgot those!
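A minimal Python sketch of why the naive queries fall short, assuming a made-up instance: the tag names come from the slide, while the authors, prices and award text are hypothetical. It relies only on the standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: tag names follow the slide, all values are invented.
DOC = """
<myDB>
  <book><author>A</author><price>50</price></book>
  <book><author>B</author><price>70</price><price>90</price></book>
  <awe><author>C</author><price>60</price><award>some award</award></awe>
</myDB>
"""

root = ET.fromstring(DOC)


def prices(elem):
    """All <price> children of an element, as numbers."""
    return [float(p.text) for p in elem.findall("price")]


# Naive Q1: only elements tagged <book> are inspected.
naive_q1 = [b for b in root.iter("book") if any(p < 80 for p in prices(b))]

# Naive Q2: N counts books, S sums every price, Avg = S/N.
n = len(list(root.iter("book")))
s = sum(p for b in root.iter("book") for p in prices(b))
naive_avg = s / n

# The <awe> element is a book too, but neither query ever looks at it.
missed = [e for e in root.iter("awe") if any(p < 80 for p in prices(e))]

print(len(naive_q1), naive_avg, len(missed))  # 2 105.0 1
```

On this instance the "average" of 105 is higher than any single price in the document, because the second book contributes two prices but is counted once, and the cheap <awe> never shows up in either result.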
Schema Information to the Rescue!
• XML & Semistructured Data Model:
– labeled ordered trees
– “instance contains its own schema information”
– XML instances and DTDs have very little “schema info”:
• tag names (aka element “types”) = attribute names
• element nesting = object (“slot”) structure
no data types, constraints, classes, class hierarchy, ...
• Schemas are Good for You!
– link to conceptual models/DB design, query formulation,
– validation, storage layout (optimization),
– query processing (optimization), ...
XML Schema
Motivating Example (Cont’d)
• Q1 after studying <myDB> and/or its XML Schema:
– there is a type hierarchy below type bookT
– tag names are bound to those types
– but XPath doesn’t know this => use Syntactic Queries:
//*[book OR tbook OR cbook OR...OR awe] [price<80]
– tedious and error-prone (do-it-yourself: Appendix A)
– e.g. you overlooked <publication xsi:type=“bookT”>! (usually schema info not contained in the XML instance)
– small changes in the schema (adding a new subtype) require rewriting of your query...
From Syntactic to Conceptual XML Queries
1. Distill conceptual information from the XML Schema
– Abstract Model of XML Schema (MXS)
2. Incorporate MXS information into the query language
– XPathT (“XPath with types/classes”)
• turn Syntactic XML Query
//*[book OR tbook OR cbook OR ... OR awe] [price<80]
• into a more adequate Conceptual XML Query:
//*[ts(bookT)][price<80] /* works for any subtype of bookT */
– more robust w.r.t. schema changes
– new opportunities for semantic query optimization
Abstract Model of XML Schema (MXS)
• Basic Ideas:
– Formal abstract model (never mind the XML Schema syntax!), inspired by Model Schema Language (MSL) [Brown-Fuchs-Robie-Wadler-WWW10-2001]
– “Types as Classes”
• XML Schema Names:
– T: Type Names
– E: Element Names
– A: Attribute Names
• XML Instances...
– ... usually contain only element names (tags) E and attributes A (exception: “xsi:type = ...”)
Abstract Model of XML Schema (MXS)
• MXS Names
– T: Types, E: Elements, A: Attributes
• Kinds of Types
– simple vs. complex: T_s, T_c
– abstract vs. concrete: T_a, T_na
• Type Hierarchy
– restrict ⊆ (T_s × T_s) ∪ (T_c × T_c)
• restricts possible instances, keeping structure
– extend ⊆ (T_s ∪ T_c) × T_c
• adds “slots” (elements and attributes)
– subtype = extend ∪ restrict
• extend and restrict are subtyping mechanisms
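The three relations can be sketched as plain sets of pairs in Python. This is only an illustration under stated assumptions: it reads restrict as a relation within one kind of type (simple to simple, or complex to complex), extend as a relation from any type to a complex type, and subtype as their union; pairs are written (base type, derived type); priceT and the publicationT/bookT/textBookT edges are hypothetical.

```python
# T_s and T_c. priceT is a hypothetical name; the others appear on the slides.
SIMPLE = {"priceT", "expPriceT"}
COMPLEX = {"publicationT", "bookT", "textBookT"}

# Assumed direction of every pair: (base type, derived type).
restrict = {("priceT", "expPriceT")}
extend = {("publicationT", "bookT"), ("bookT", "textBookT")}


def restrict_ok(base, derived):
    """restrict stays within one kind: simple->simple or complex->complex."""
    return {base, derived} <= SIMPLE or {base, derived} <= COMPLEX


def extend_ok(base, derived):
    """extend may start from any known type but always yields a complex type."""
    return base in SIMPLE | COMPLEX and derived in COMPLEX


def well_formed(restrict, extend):
    """Check both relations against the kinds of types they may connect."""
    return (all(restrict_ok(b, d) for b, d in restrict)
            and all(extend_ok(b, d) for b, d in extend))


# subtype is simply the union of the two derivation mechanisms.
subtype = extend | restrict

print(well_formed(restrict, extend), len(subtype))
```

Keeping the kind checks separate from the relations makes it easy to reject, say, a restriction from a complex type to a simple one before any query is evaluated.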
Type (Class) Hierarchy in XML Schema
• Convention: user-defined type names end with “T”
– authorT, publicationT, bookT, ...
Inheritance in XML Schema (I)
expTextBookT ::= SUBTYPE(bookT) that RESTRICTs <price> to expPriceT and EXTENDs with <recommended_for>
[Figure: schema diagram with parts labeled EXTEND, RESTRICT, SUBTYPE]
Inheritance in XML Schema (II)
19thcenturyTextBookType ::= SUBTYPE {textBookT, c19bookT}
[Figure: multiple inheritance vs. single inheritance]
XML Schema type system does not know the two are equivalent!
Framework for Conceptual Queries in XML
• Binding Types to Elements
– bind ⊆ (E × (T_s ∪ T_c)) ∪ (A × T_s)
• binds element names to simple or complex types
• binds attribute names to simple types
• Syntactic XML Instance: D
– root(NodeId), child(NodeId,Integer,NodeId), tag(NodeId,Tagname), data(NodeId,Data)
• Conceptual XML Instance: D+
– restrict(T, T), extend(T, T), subtype(T, T),
– bind(E T, T)
– ...
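As an illustration of the syntactic instance D, the following Python sketch flattens a small XML document into root/child/tag/data facts. The node numbering (pre-order, starting at 1), the dictionary layout and the helper name to_facts are assumptions made for this example, not part of the slides.

```python
import xml.etree.ElementTree as ET
from itertools import count


def to_facts(xml_text):
    """Flatten an XML document into root/child/tag/data relations."""
    ids = count(1)  # node ids are handed out in document (pre-)order
    facts = {"root": set(), "child": set(), "tag": set(), "data": set()}

    def visit(elem):
        nid = next(ids)
        facts["tag"].add((nid, elem.tag))
        text = (elem.text or "").strip()
        if text:
            facts["data"].add((nid, text))
        # child(Parent, Position, Child): Position keeps the sibling order
        for pos, sub in enumerate(elem, start=1):
            facts["child"].add((nid, pos, visit(sub)))
        return nid

    facts["root"].add((visit(ET.fromstring(xml_text)),))
    return facts


facts = to_facts("<myDB><book><price>50</price></book></myDB>")
for relation in ("root", "child", "tag", "data"):
    print(relation, sorted(facts[relation]))
```

The conceptual instance D+ would add the schema-level relations (restrict, extend, subtype, bind) as further sets alongside these four.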
XPathT: Incorporating Type (Class) Information in XPath
• XPath patterns p and qualifiers q: p[q] returns matches of p which qualify according to q
• New XPathT patterns:
• r(t), e(t), s(t): restrict, extend, subtype type t
• tr(t), te(t), ts(t): transitive versions
Semantics of XPathT
• Example:
“transitive subtype”:
SEM( ts(t) ) := { t’ | subtype*(t,t’) }
from types to element names:
SEM( [T] ) := { e | bind(t,e), t ∈ T }
SEM( [ts(bookT)] ) := {book,ebook,tbook, ...}
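The two semantic rules can be sketched in Python as a reachability computation followed by a lookup. The schema facts below are hypothetical (eBookT and the exact shape of the hierarchy are assumptions), subtype pairs are written (base type, derived type), subtype* is read as the reflexive-transitive closure, and bind pairs are written (element name, type).

```python
def ts(t, subtype):
    """SEM(ts(t)): every t' with subtype*(t, t'), including t itself."""
    reached, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for base, derived in subtype:
            if base == current and derived not in reached:
                reached.add(derived)
                todo.append(derived)
    return reached


def elements(types, bind):
    """SEM([T]): element names bound to some type in T."""
    return {e for e, t in bind if t in types}


def syntactic_query(t, subtype, bind, qualifier):
    """Spell out //*[ts(t)][qualifier] in the slides' OR notation."""
    names = sorted(elements(ts(t, subtype), bind))
    return "//*[" + " OR ".join(names) + "][" + qualifier + "]"


# Hypothetical schema facts; eBookT and the hierarchy shape are assumptions.
subtype = {("bookT", "textBookT"), ("bookT", "eBookT"),
           ("textBookT", "expTextBookT")}
bind = {("book", "bookT"), ("tbook", "textBookT"),
        ("ebook", "eBookT"), ("author", "authorT")}

print(syntactic_query("bookT", subtype, bind, "price<80"))
# //*[book OR ebook OR tbook][price<80]
```

Adding a new subtype and its element binding to the two sets changes the generated syntactic query, while the conceptual query //*[ts(bookT)][price<80] stays as written.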
Conceptual(-level) XML Queries in XPathT
• Which books have price below $80?
//*[ts(bookT)][price<80]
• Semantic-aware equivalent rewriting:
//*[ts(bookT)][NOT ts(expTextBookT)][price<80]
• Logic XPathT Query Plan:
[Figure: query plan combining tree structure information and conceptual information]
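One plausible reading of why the rewriting is equivalent, offered as an assumption rather than as the authors' stated argument: if expPriceT only admits prices at or above a bound that is itself at least 80, then no element of type expTextBookT, or of any of its subtypes, can satisfy price<80, so those types can be dropped before the instance is touched. The Python sketch below uses a hypothetical lower bound of 100, a hypothetical hierarchy (rareTextBookT is invented), and assumes the hierarchy is acyclic.

```python
def ts(t, subtype):
    """All types reachable from t via subtype, t included (acyclic hierarchy)."""
    result = {t}
    for base, derived in subtype:
        if base == t:
            result |= ts(derived, subtype)
    return result


def candidate_types(root_type, subtype, min_price, bound):
    """Types that can still contain an element with price < bound."""
    excluded = set()
    for t, low in min_price.items():
        if low >= bound:                # price < bound is unsatisfiable for t
            excluded |= ts(t, subtype)  # and for every type derived from it
    return ts(root_type, subtype) - excluded


# Hypothetical schema facts: pairs are (base type, derived type).
subtype = {("bookT", "textBookT"), ("bookT", "expTextBookT"),
           ("expTextBookT", "rareTextBookT")}
# Assumed constraint: expPriceT restricts <price> to values of at least 100.
min_price = {"expTextBookT": 100}

print(sorted(candidate_types("bookT", subtype, min_price, 80)))
# ['bookT', 'textBookT']
```

With a qualifier such as price<150 the same constraint no longer rules anything out, so all four types stay in the candidate set; the pruning is only valid when the schema constraint contradicts the qualifier.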
Summary
• Complex domains require conceptual level modeling and querying capabilities beyond just tree structure
• Status Quo: XML Schema: simple “conceptual model” with many ad-hoc “design decisions”/restrictions
– Abstract Model of XML Schema (MXS)
– XPathT: first step towards “conceptual” or “semantic” XML query language extensions
– more concise, intuitive, flexible, and robust queries
– the system maps conceptual to syntactic queries, not the programmer/query designer!
Next Steps & Outlook
• extend MXS to include more conceptual information
• develop formal semantics
– XPathT, extensions: XPathC, XQueryC
• research problems:
– mapping: XPathC queries => equivalent XPath queries
– formalize equivalence, always possible? Then, conventional XML query processors can be used!
– “proxy XML Schema doc”: instead of rewriting into XPath over the original instance, can one materialize some conceptual info as a “proxy XML doc” such that conceptual queries become conventional queries against the proxy...
– semantic query optimization: equivalent rewritings given the conceptual level constraints