XML-λ: an extendible framework for manipulating XML data


Page 1

XML-KSI, 2004

XML-λ: an extendible framework for manipulating XML data

Jaroslav Pokorny

Charles University

Praha

Page 2

Two approaches to XML

logical or physical

Idea: XML as a database
– DB of XML documents
– "mix" of (relational) DB and XML data
– XML views (over non-XML and/or XML data)

Advantages:
– independence from the original platforms and models of the processed data
– more flexibility in design and manipulation (integration, updates, querying)

Page 3

Two approaches to XML

Implications:
– implementations: XML DBs (native, via relational, OO, OR)
– special demands on query languages
  • how to make them powerful
  • how to describe their semantics
  • how to implement them
– new types of software: wrappers, mediators

(Personal) goal: to develop a powerful formal approach appropriate for manipulating both XML and non-XML data

Page 4

Outline

XML in brief

XML – functional data model

functional typing of XML (and non-XML) data

LT language

XML-schema, XML-database

XML-λ framework

Conclusions

Page 5

XML – an example

<!DOCTYPE biblio [
<!ELEMENT biblio (book | monograph)*>
<!ELEMENT book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT monograph (title, author, editor)>
<!ATTLIST monograph year CDATA #REQUIRED>
<!ELEMENT editor (monograph*)>
<!ELEMENT author (name, address?)>
<!ELEMENT name (firstname?, surname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT address (locality, ZIP)>
<!ELEMENT locality (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
]>

Page 6

XML – an example

<book>
  <title> Fundamentals of DBS </title>
  <author>
    <name>
      <firstname> Ramez </firstname>
      <surname> Elmasri </surname>
    </name>
    <address>
      <locality> Arlington </locality>
      <ZIP> 76019 </ZIP>
    </address>
  </author>
  <author>
    <name>
      <firstname> Shamkant </firstname>
      <surname> Navathe </surname>
    </name>
  </author>
</book>

Page 7

XML model

Usually: tree- or graph-oriented

Here: inspired by the functional approach to conceptual modelling

[Diagram labels: DEPARTMENT, MEMBER*, PROJECT*]

For example, the HIT data model from the 80s.

Page 8

Synopsis of the approach

Typing XML data

Background:
– a functional type system (a base of primitive types + functions, tuples, and unions)

Extensions to:
– typing XML regular expressions,
– typing XML elements.

Querying XML elements
– a general typed λ-calculus (functional variables and constants, tuples, applications of functions, λ-abstractions)
  • XML-database schema as a set of typed variables,
  • XML-database as any valuation of these variables
– XML-λ: a syntactic variant of the typed λ-calculus over XML data

Page 9

Typing XML data - informally

E … a set of abstract elements.

The content of an abstract element is either a string from PCDATA (the simplest case), a sequence of abstract subelements (or groups), or empty.

Ex.: <phone>781 7090</phone> is an instance of a phone element object.

For an e ∈ E, phone(e) returns e.g. the phone number '781 7090'.

The phone element object is conceived as a (partial) function from E into PCDATA.
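The idea of an element object as a partial function can be sketched in a few lines of Python. This is only an illustration: modelling abstract elements as integer identifiers, and the names `AbstractElement`, `phone` and `apply_phone`, are assumptions made here and do not come from the slides.

```python
from typing import Dict, Optional

# Assumption: members of E are represented by opaque integer identifiers.
AbstractElement = int

# The phone element object: a partial function from E into PCDATA,
# stored as a finite mapping. Elements missing from the mapping are
# exactly those on which the function is undefined.
phone: Dict[AbstractElement, str] = {1: "781 7090"}


def apply_phone(e: AbstractElement) -> Optional[str]:
    """Return phone(e), or None where the partial function is undefined."""
    return phone.get(e)
```

Here `apply_phone(1)` yields the stored PCDATA string, while `apply_phone(2)` yields `None` because element 2 has no phone content.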

Page 10

Typing XML data - informally

Ex.: <!ELEMENT name (firstname?, surname)> is conceived as a set of functions from E into E × E.

The current name element object, i.e. the one stored in a given XML database, is a function assigning to each abstract element e ∈ E at most one couple of abstract elements.

Hierarchy of notions: element type, element object, element
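A possible reading of the name element object, sketched in Python under stated assumptions: abstract elements are integers, the optional firstname component is represented by `None` when absent, and the sample identifiers and the helper `full_name` are hypothetical.

```python
from typing import Dict, Optional, Tuple

AbstractElement = int  # assumption: members of E are integer identifiers

# name: a partial function from E into E x E. The first component
# (firstname) is optional in the DTD, so it may be None here.
name: Dict[AbstractElement, Tuple[Optional[AbstractElement], AbstractElement]] = {
    10: (11, 12),    # name element 10 -> (firstname 11, surname 12)
    20: (None, 22),  # name element 20 has a surname only
}

# firstname and surname are partial functions from E into PCDATA.
firstname: Dict[AbstractElement, str] = {11: "Ramez"}
surname: Dict[AbstractElement, str] = {12: "Elmasri", 22: "Navathe"}


def full_name(e: AbstractElement) -> Optional[str]:
    """Compose name with firstname/surname; None where name is undefined."""
    couple = name.get(e)
    if couple is None:
        return None
    first, last = couple
    parts = [firstname[first]] if first is not None else []
    parts.append(surname[last])
    return " ".join(parts)
```

The dictionary `name` plays the role of the current element object; the set of all such dictionaries corresponds to the element type.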

Page 11

Functional typing

B … a set of symbols (the base)

T ::= S            primitive type, where S ∈ B
      (T1 → T2)    functional type
      (T1,...,Tn)  tuple type
      (T1 + T2)    union type

Remark: relations are ((T1,...,Tn) → BOOL)-objects!

Page 12

Functional typing

Interpretation:
– members of B … mutually disjoint non-empty sets,
– (T1 → T2) … the set of all (total or partial) functions from T1 into T2,
– (T1,...,Tn) … T1 × ... × Tn,
– (T1 + … + Tn) … ∪ Ti

Exs:

arithmetic operations: +, -, *, / are ((NUMBER, NUMBER) → NUMBER)-objects.

logic:
– and/((BOOL, BOOL) → BOOL),
– the universal R-quantifier ∀R and the existential R-quantifier ∃R are ((R → BOOL) → BOOL)-objects,
– the R-identity =R is a ((R,R) → BOOL)-object.

aggregation functions: COUNTR/((R → BOOL) → NUMBER)
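The typing of quantifiers and aggregation functions as higher-order objects can be illustrated with a short Python sketch. It assumes a small finite carrier for the type R (`R_DOMAIN`) so that the functions can be computed by enumeration; that restriction and all names below are assumptions of this example, not part of the slides.

```python
from typing import Callable, Tuple

# Assumption: R is interpreted by a small finite set of integers.
R_DOMAIN = range(1, 11)

Pred = Callable[[int], bool]  # an (R -> BOOL)-object, i.e. a set of R-objects


def forall_r(p: Pred) -> bool:
    """Universal R-quantifier: an ((R -> BOOL) -> BOOL)-object."""
    return all(p(x) for x in R_DOMAIN)


def exists_r(p: Pred) -> bool:
    """Existential R-quantifier: an ((R -> BOOL) -> BOOL)-object."""
    return any(p(x) for x in R_DOMAIN)


def count_r(p: Pred) -> int:
    """COUNT over R: an ((R -> BOOL) -> NUMBER)-object."""
    return sum(1 for x in R_DOMAIN if p(x))


def less_than(pair: Tuple[int, int]) -> bool:
    """A binary relation as an ((R, R) -> BOOL)-object."""
    a, b = pair
    return a < b


def is_even(x: int) -> bool:
    """A unary predicate, i.e. the characteristic function of a set."""
    return x % 2 == 0
```

With this carrier, `count_r(is_even)` counts the even numbers between 1 and 10, and `less_than((1, 2))` shows a relation used as a characteristic function, matching the remark that relations are ((T1,...,Tn) → BOOL)-objects.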

Page 13

Typing XML regular expressions

Let B = {PCDATA, BOOL, NAME}. The type system Treg over B is recursively defined as follows.

T ::= tag:PCDATA   tag:   elementary regular expression, where tag ∈ NAME
      T*                  zero or more
      T+                  one or more
      T?                  zero or one
      (T1 | T2)           alternative

where T in T*, T+ and T? is an alternative or an elementary regular expression.

Page 14

Typing XML regular expressions

Interpretation:

Ex.:

(T1 | T2) … a set of objects of type T1 + T2.

T* … (T → BOOL) /partially ordered model/

T* … ((T, NUMBER) → BOOL) /ordered model/

– Consider a function f of this type. For a couple (t, i), f(t, i) = TRUE iff t is the i-th object in an (ordered) set of T-objects.
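The two interpretations of T* can be sketched in Python as characteristic functions built from a sequence of T-objects. The helper names `ordered_model` and `unordered_model`, and the choice of 1-based positions, are assumptions made for this illustration.

```python
from typing import Callable, Hashable, Sequence


def ordered_model(items: Sequence[Hashable]) -> Callable[[Hashable, int], bool]:
    """Build f of type ((T, NUMBER) -> BOOL) for the ordered model.

    f(t, i) is True iff t is the i-th object of the sequence
    (positions are counted from 1, which is an assumption here).
    """
    def f(t: Hashable, i: int) -> bool:
        return 1 <= i <= len(items) and items[i - 1] == t
    return f


def unordered_model(items: Sequence[Hashable]) -> Callable[[Hashable], bool]:
    """Build a (T -> BOOL)-object: membership only, positions are forgotten."""
    members = set(items)
    return lambda t: t in members


surnames = ["Elmasri", "Navathe"]
f = ordered_model(surnames)
g = unordered_model(surnames)
```

In the ordered model `f("Navathe", 2)` holds while `f("Navathe", 1)` does not; the unordered model `g` can only answer whether a surname occurs at all.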

Page 15

Typing XML elements and attributes

Treg over B, E.

The type system TE induced by Treg (or TE if Treg is understood) contains the regular element expressions given by the following rules:

E ::= TAG:T   TAG:       elementary element types, where tag:T and tag: are elementary regular expressions over B
      E*   E+   E?   (E1 | E2)
      TAG:(E1,..., En)

where tag ∈ NAME.

Elementary element types and regular element expressions TAG:(E1,...,En) are called element types.

Page 16

Typing XML elements and attributes

Semantics of element types:

TAG:PCDATA … the set of all (partial) functions from E to tag:PCDATA

… etc.

Attributes are also functions.

Ex.: year (of monograph) is a function assigning to each monograph its year (of issue).

Notation: E_MONOGRAPH → CDATA

Page 17

Example: BIBLIO element types

TITLE:PCDATA
FIRSTNAME:PCDATA
SURNAME:PCDATA
LOCALITY:PCDATA
ZIP:PCDATA
ADDRESS:(LOCALITY, ZIP)
BOOK:(TITLE, AUTHOR*)
NAME:(FIRSTNAME?, SURNAME)
MONOGRAPH:(TITLE, AUTHOR, EDITOR)
YEAR/(MONOGRAPH → CDATA)
EDITOR:MONOGRAPH*
AUTHOR:(NAME, ADDRESS?)
BIBLIO:(BOOK | MONOGRAPH)*

Page 18

LT language (Language of Terms)

Func ... constants, each of a fixed type; variables for each type from T.

Let types T, T1, ..., Tn (n ≥ 1) be members of T.

Typed constants and variables are terms.

M(M1,...,Mn)       application

λx1,...,xn(M)      λ-abstraction, where x1,...,xn are distinct variables

(M1,...,Mn)        tuple

Mi                 projections, for a term M = (M1,...,Mn)

K:M                tagged term, where K/NAME. If M/T, then K:M/(E → T).

Page 19

Schema and DB

An XML-database schema, SXML, is a set of variables of types from TE.

Given a database schema SXML, an XML-database is any valuation of these variables.

Ex.: SURNAME, AUTHOR
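The distinction between schema and database can be sketched as follows. This is a simplified illustration: types are kept as plain strings, element objects as finite dictionaries over integer identifiers, and `is_valuation` only checks that each variable is assigned a value, not that the value has the declared type. All of these choices are assumptions of the example.

```python
from typing import Dict

# Schema: a set of typed variables (the type is a descriptive string only).
schema: Dict[str, str] = {
    "SURNAME": "SURNAME:PCDATA",
    "AUTHOR": "AUTHOR:(NAME, ADDRESS?)",
}

# One possible database: a valuation assigning an element object
# (a partial function, stored as a dict) to every schema variable.
database: Dict[str, dict] = {
    "SURNAME": {12: "Elmasri", 22: "Navathe"},
    "AUTHOR": {1: (10, 13), 2: (20, None)},  # author -> (name, address?)
}

# A different valuation of the same variables is a different database.
empty_database: Dict[str, dict] = {"SURNAME": {}, "AUTHOR": {}}


def is_valuation(schema: Dict[str, str], db: Dict[str, dict]) -> bool:
    """True iff db assigns a value to exactly the schema variables.

    Type conformance of the assigned functions is not checked here.
    """
    return set(schema) == set(db)
```

Both `database` and `empty_database` are valuations of the same schema, which mirrors the statement that an XML-database is any valuation of the schema variables.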

Page 20

XML-λ framework

What is it? The XML-λ framework is a subset of LT + syntactic sugar.

Features:

queries are expressed by terms

Ex.: AUTHOR (1)

(RESULT: AUTHOR … more "XML-like")

Typically: λ.. ( λ.. …(expression)…), where expression/BOOL

λx (AUTHOR(x)) does the same as (1)

paths as compositions of functions

Ex.: SURNAME(NAME(AUTHOR(m))), where m is a monograph abstract element object

Notation: m.AUTHOR.NAME.SURNAME
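Path navigation as function composition might look like this in Python. The sketch simplifies each element object to a dictionary that maps a parent element directly to the relevant child (or to its PCDATA content); the sample identifiers and the helper `path` are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumption: each element object is a partial function stored as a dict.
AUTHOR: Dict[int, int] = {1: 2}            # monograph 1 -> author element 2
NAME: Dict[int, int] = {2: 3}              # author 2 -> name element 3
SURNAME: Dict[int, str] = {3: "Elmasri"}   # name 3 -> PCDATA


def path(start: Any, *steps: Dict) -> Optional[Any]:
    """Apply partial functions left to right, as in m.AUTHOR.NAME.SURNAME.

    The result is None as soon as one step is undefined.
    """
    current = start
    for step in steps:
        if current is None:
            return None
        current = step.get(current)
    return current


m = 1
dotted = path(m, AUTHOR, NAME, SURNAME)   # m.AUTHOR.NAME.SURNAME
nested = SURNAME[NAME[AUTHOR[m]]]         # SURNAME(NAME(AUTHOR(m)))
```

Both forms denote the same value; the dotted notation is only a left-to-right rendering of the nested applications.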

Page 21

XML-λ framework

applications of logic, arithmetic, … functions

∃e (b.AUTHOR(e) and e.NAME.SURNAME = 'Smith')

where b is a book abstract element object

∃b ∃e (b.AUTHOR(e) and e.NAME.SURNAME = 'Smith')

is a YES/NO query.
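A hedged Python sketch of these two expressions follows. It assumes that the symbols lost in front of the variables are existential quantifiers, that b.AUTHOR(e) holds iff e is one of the authors of b, and that the sample data and function names below are purely illustrative.

```python
from typing import Dict, Set

# b.AUTHOR(e) is modelled as membership of e in AUTHOR[b].
AUTHOR: Dict[int, Set[int]] = {100: {1, 2}, 200: {3}}
NAME: Dict[int, int] = {1: 11, 2: 12, 3: 13}
SURNAME: Dict[int, str] = {11: "Smith", 12: "Navathe", 13: "Elmasri"}


def has_author_named(b: int, wanted: str) -> bool:
    """Exists e (b.AUTHOR(e) and e.NAME.SURNAME = wanted), for a given book b."""
    return any(
        SURNAME.get(NAME.get(e)) == wanted
        for e in AUTHOR.get(b, set())
    )


def some_book_has_author_named(wanted: str) -> bool:
    """The closed YES/NO query: exists b, exists e (...)."""
    return any(has_author_named(b, wanted) for b in AUTHOR)
```

The first function leaves b free, so it is a predicate on books; the second closes the formula and therefore returns a single truth value.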

Page 22

XML-λ framework

restructuring

λ name:x.NAME (λ title:y (.BOOK.(AUTHOR(x) and TITLE = y)))

λ title:y (λ name:x.NAME (.BOOK.(AUTHOR(x) and TITLE = y)))

Notation: tagged variables, content of abstract elements by y, x

aggregations + nesting

D. For each book, find the number of its authors.

λ x, n (.BOOK..(TITLE = x and COUNT(AUTHOR) = n))

Notation: dots .. for omitting parts of paths and prefixes

possibility to embed any user-defined function
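Query D can be approximated in Python over the same dictionary-based model. The answer is computed as a set of (x, n) pairs, which corresponds to the characteristic function denoted by the query term. The second book and all identifiers are made-up sample data, and `query_d` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

# Hypothetical sample data: book -> title, book -> set of author elements.
TITLE: Dict[int, str] = {100: "Fundamentals of DBS", 200: "Untitled draft"}
AUTHOR: Dict[int, Set[int]] = {100: {1, 2}, 200: set()}


def count(members: Set[int]) -> int:
    """COUNT applied to a set of abstract elements."""
    return len(members)


def query_d() -> Set[Tuple[str, int]]:
    """All pairs (x, n) such that some book has TITLE = x and COUNT(AUTHOR) = n."""
    return {(TITLE[b], count(AUTHOR.get(b, set()))) for b in TITLE}
```

Because `count` is an ordinary function, any user-defined aggregate could be substituted for it in the same position.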

Page 23

XML-λ framework

D (XQuery):

for $x in doc("biblio1.xml")//book
let $n := count($x/author)
return <book>
         <name>{$x/title/text()}</name>
         <numb_of_auth>{$n}</numb_of_auth>
       </book>
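For comparison, the same query can be run against an actual document with Python's standard `xml.etree.ElementTree` module. This is a sketch rather than a translation of the authors' implementation: the inline `BIBLIO` string stands in for biblio1.xml, and the wrapper element `result` and the function name `authors_per_book` are assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for biblio1.xml, based on the book example from the slides.
BIBLIO = """<biblio>
  <book>
    <title> Fundamentals of DBS </title>
    <author><name><firstname>Ramez</firstname><surname>Elmasri</surname></name></author>
    <author><name><firstname>Shamkant</firstname><surname>Navathe</surname></name></author>
  </book>
</biblio>"""


def authors_per_book(xml_text: str) -> ET.Element:
    """For each book, emit <book><name>title</name><numb_of_auth>n</numb_of_auth></book>."""
    root = ET.fromstring(xml_text)
    result = ET.Element("result")
    for book in root.iter("book"):
        out = ET.SubElement(result, "book")
        # Title text is stripped of the surrounding whitespace used in the slides.
        ET.SubElement(out, "name").text = book.findtext("title", default="").strip()
        # Only direct author children are counted, as in $x/author.
        ET.SubElement(out, "numb_of_auth").text = str(len(book.findall("author")))
    return result


report = authors_per_book(BIBLIO)
print(ET.tostring(report, encoding="unicode"))
```

The loop corresponds to the `for` clause, the `len(...)` call to `count($x/author)`, and the two `SubElement` calls to the element constructor in the `return` clause.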

Page 24

Integration of heterogeneous information sources

[Diagram labels: user; query, answer; typed objects; relational schemes, DTDs, ADTs, classes in OO]

Page 25

Conclusions

Issues:
– finding appropriate restrictions of XML-λ for querying
– implementation is in progress

The forthcoming paper:
– cleaning the model (ordered and unordered)
– formal semantics of types, extensions to tagged variables

Future:
– XML-λ with tag variables
– semantics of XQuery in the XML-λ framework