
A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS
Ali El bekai, Nick Rossiter
School of Informatics, Northumbria University
Email: ali.elbekai@unn.ac.uk



Overview

• Framework in algebra for processing XML data
• Review related work
• Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees
• Describe the input and output of the algebraic operators
• Define the syntax of relationships/operators and their semantics in terms of algorithms
• Give examples in the domain-specific XML query language
• Discuss closure and application


Related Work

• IBM (Beech & Rys, 1999)
• Lore (McHugh et al., 1997)
• YATL (Christophides et al., 2000)
• Niagara (Galanis et al., 2001)
• AT&T (W3C)
• TAX (Jagadish et al., 2001)
• Problems identified with complexity and generality


Tree Algebra

• True tree
  – Each node has one parent but may have many children
  – A single root node
• Leaves of the tree
  – Correspond to different sources: object, relational
• Two types of operators
  – Algebraic operators
  – Relational operators


Concepts in Tree Model

• Root (ultimate ancestor of all other nodes)
• Node (parent or child)
• Edge (link from a parent to a child)
• Leaf (node with no children, holding an atomic value)
• Path (sequence of edges between nodes)
• Descendants (all successor nodes of a node)
• Ancestors (all nodes on the path from a node's parent up to the root)
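The tree concepts above can be sketched as a small Python data structure. This is a minimal illustration, not the authors' Java implementation: the class `Node` and its methods (`add`, `descendants`, `ancestors`, `root`, `path`) are hypothetical names, and rejecting a second parent is one assumed way of enforcing the true-tree property.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """A tree node: a label, an optional atomic value, and ordered children."""
    label: str
    value: Optional[str] = None
    children: List[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        # True tree: many children are allowed, but only one parent per node.
        if child.parent is not None:
            raise ValueError(f"{child.label} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        # A leaf has no children (it carries an atomic value).
        return not self.children

    def descendants(self) -> Iterator[Node]:
        # All successor nodes, in depth-first order.
        for child in self.children:
            yield child
            yield from child.descendants()

    def ancestors(self) -> Iterator[Node]:
        # Parent, grandparent, ... up to and including the root.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def root(self) -> Node:
        # The ultimate ancestor: the only node without a parent.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def path(self) -> List[str]:
        # Labels along the edges from the root down to this node.
        return [n.label for n in reversed([self, *self.ancestors()])]


# Illustrative tree using labels from the example slides.
collection = Node("collection")
object1 = collection.add(Node("object1"))
info = object1.add(Node("objectInfor1"))
leaf = info.add(Node("Info_id", "10"))
print(leaf.path())  # ['collection', 'object1', 'objectInfor1', 'Info_id']
```

Keeping a `parent` link makes ancestors and paths cheap to compute; a purely child-linked tree would need a search from the root instead.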


Mappings

• XML document → tree
• Element → node (root, parent, child)
• Leaf → child node holding atomic values
• Attribute → function, values
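A minimal sketch of these mappings using Python's standard `xml.etree.ElementTree` parser. It assumes that elements become nodes, elements without sub-elements become leaves holding an atomic value, and attributes are kept as a name-to-value map (one reading of "function, values"). The names `Node`, `to_tree` and `parse_document`, and the `id` attribute in the sample, are hypothetical; mixed content and tail text are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str                                                # element name
    value: Optional[str] = None                               # atomic value (leaves only)
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute name -> value
    children: List["Node"] = field(default_factory=list)


def to_tree(element: ET.Element) -> Node:
    """Map one XML element, and everything below it, onto a tree node."""
    node = Node(element.tag, attributes=dict(element.attrib))
    child_elements = list(element)
    if child_elements:
        # Element with sub-elements: an inner node (parent of its children).
        node.children = [to_tree(child) for child in child_elements]
    else:
        # Element without sub-elements: a leaf holding an atomic value.
        text = (element.text or "").strip()
        node.value = text or None
    return node


def parse_document(xml_text: str) -> Node:
    # The document element becomes the root node of the tree.
    return to_tree(ET.fromstring(xml_text))


doc = parse_document(
    '<collection><object1 id="o1">'  # hypothetical attribute
    "<objectInfor1><Info_id>10</Info_id><des_date>12.10.98</des_date></objectInfor1>"
    "</object1></collection>"
)
print([(c.label, c.value) for c in doc.children[0].children[0].children])
# [('Info_id', '10'), ('des_date', '12.10.98')]
```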


Example XML Tree

[Figure: XML tree for document Doc. Edges are labelled pc (parent-child) or ad (ancestor-descendant); the legend distinguishes element edges from parent edges. Node labels include objid, objNumber, objectInfor1 to objectInfor4 (with Info_id and des_date leaves), referenceinfo, F_format and imageinfor (with Img_id, ref_id, title and type leaves).]

Root – collection element; object1, object3 – sub-elements.
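The pc and ad edge labels correspond closely to the child and descendant steps of XPath (`/` and `//`). The sketch below assumes that correspondence and uses the limited XPath support in Python's `xml.etree.ElementTree`. The XML fragment is a hypothetical document shaped like the example tree, not the data in the figure, and the helper names `pc` and `ad` are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document shaped like the example tree (values are illustrative).
DOC = """
<collection>
  <object1>
    <objectInfor1><Info_id>10</Info_id><des_date>12.10.98</des_date></objectInfor1>
    <objectInfor2><Info_id>20</Info_id><des_date>12.12.98</des_date></objectInfor2>
  </object1>
  <object3>
    <objectInfor3><Info_id>03</Info_id><des_date>12.10.99</des_date></objectInfor3>
  </object3>
</collection>
"""


def pc(parent: ET.Element, tag: str) -> List[ET.Element]:
    """Parent-child (pc) edge: direct children of `parent` named `tag`."""
    return parent.findall(tag)


def ad(ancestor: ET.Element, tag: str) -> List[ET.Element]:
    """Ancestor-descendant (ad) edge: elements named `tag` at any depth below `ancestor`."""
    return ancestor.findall(".//" + tag)


root = ET.fromstring(DOC.strip())
print([e.tag for e in pc(root, "object1")])   # ['object1']
print(len(pc(root, "Info_id")))               # 0: Info_id is not a child of collection
print([e.text for e in ad(root, "Info_id")])  # ['10', '20', '03']
```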


Algebraic Relationships

Comparison of trees:
• Universal (unary) – Defines a tree containing all information
• Similarity (binary) – Two trees have the same structure
• Equivalence (binary) – Two trees are indistinguishable: same structure and same content
• Subsumption (binary) – One tree is subsumed in another
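A minimal sketch of similarity and equivalence. It assumes that trees are compared below their roots (so differently named roots such as collection3 and collection4 can still match), that sibling order matters, and that leaf values are the "content". The frozen `Node` class and all function names are hypothetical, not taken from the TA implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None          # leaf content
    children: Tuple["Node", ...] = ()


def _same(a: Node, b: Node, with_values: bool) -> bool:
    # Two nodes match if labels agree (and, optionally, values) and their children match.
    return (
        a.label == b.label
        and (not with_values or a.value == b.value)
        and _same_children(a, b, with_values)
    )


def _same_children(a: Node, b: Node, with_values: bool) -> bool:
    # Children are compared position by position (sibling order is assumed to matter).
    return len(a.children) == len(b.children) and all(
        _same(x, y, with_values) for x, y in zip(a.children, b.children)
    )


def similar(a: Node, b: Node) -> bool:
    """Same structure below the root; leaf values are ignored."""
    return _same_children(a, b, with_values=False)


def equivalent(a: Node, b: Node) -> bool:
    """Same structure below the root and no mismatch in content."""
    return _same_children(a, b, with_values=True)


def info(name: str, info_id: str, date: str) -> Node:
    # Illustrative objectInfor node with Info_id and desc_date leaves.
    return Node(name, children=(Node("Info_id", info_id), Node("desc_date", date)))


c3 = Node("collection3", children=(Node("object3", children=(info("objectInfor1", "10", "12.10.98"),)),))
c4 = Node("collection4", children=(Node("object3", children=(info("objectInfor1", "10", "12.10.98"),)),))
c5 = Node("collection5", children=(Node("object3", children=(info("objectInfor1", "11", "12.10.98"),)),))
print(equivalent(c3, c4), similar(c3, c5), equivalent(c3, c5))  # True True False
```

Every equivalent pair is also similar; the converse fails as soon as one leaf value differs, as with `c3` and `c5`.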


Example Equivalence Relationship

[Figure: tree collection3 (Doc3) and tree collection4 (Doc4), linked by the ~ symbol. Both contain objectInfor1 and objectInfor2 nodes whose Info_id and desc_date leaves hold the values 10, 12.10.98 and 20, 12.12.98.]

XML tree collection3 is equivalent to collection4: same node structure, no mismatch in content.


Example Subsumption Relationship

[Figure: the smaller tree collection3 (Doc3), with object3 containing objectInfor1 and objectInfor2 (Info_id and desc_date leaves holding 10, 12.10.98 and 20, 12.12.98), shown beside the larger tree collection1 (Doc1), whose sub-elements object1 and object3 contain objectInfor1 to objectInfor4, referenceinfo, format and imageinfor (Img_id, ref_id, type).]

Collection3 is part of collection1 (structure and content).
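Subsumption can be sketched as tree embedding: every subtree under the first tree's root must occur, with the same labels and values, somewhere under the second tree's root. This is an assumed reading of "part of (structure and content)"; the names `embeds` and `subsumed` are hypothetical, and for brevity two identical sibling subtrees are allowed to match the same node.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def descendants(node: Node) -> Iterator[Node]:
    # Every node strictly below `node`, depth-first.
    for child in node.children:
        yield child
        yield from descendants(child)


def embeds(pattern: Node, target: Node) -> bool:
    """`pattern` occurs at `target`: same label and value, and each child of
    `pattern` occurs at some child of `target`."""
    return (
        pattern.label == target.label
        and pattern.value == target.value
        and all(any(embeds(p, t) for t in target.children) for p in pattern.children)
    )


def subsumed(a: Node, b: Node) -> bool:
    """True if every subtree under a's root occurs somewhere under b's root."""
    return all(any(embeds(sub, n) for n in descendants(b)) for sub in a.children)


def info(name: str, info_id: str, date: str) -> Node:
    # Illustrative objectInfor node with Info_id and desc_date leaves.
    return Node(name, children=(Node("Info_id", info_id), Node("desc_date", date)))


small = Node("collection3", children=(
    Node("object3", children=(info("objectInfor1", "10", "12.10.98"),)),
))
large = Node("collection1", children=(
    Node("object1"),
    Node("object3", children=(info("objectInfor1", "10", "12.10.98"),
                              info("objectInfor3", "03", "12.10.99"))),
))
print(subsumed(small, large), subsumed(large, small))  # True False
```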


Algebraic Operators for Trees

• Join (binary, input two trees, output one tree, commutative, associative) – Joins the trees on a predicate
• Union (binary, input two trees, output one tree, commutative, associative, disjoint) – Sums the trees together
• Complement (binary, input two trees, output one tree, not commutative, not associative) – Nodes in one tree that are not found in the other
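The join and union operators can be sketched on an immutable tree. This is one possible reading, not the authors' algorithm: `union` merges children that share a label and value and keeps everything else, while `join` pairs child subtrees of the two inputs that satisfy a caller-supplied predicate. Both results are single trees; commutativity holds here only up to sibling order and naming. All names, including the fresh root label `union`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def _key(node: Node) -> Tuple[str, Optional[str]]:
    return (node.label, node.value)


def union(a: Node, b: Node) -> Node:
    """Sum two trees: children with the same label and value are merged
    recursively; all other children of either tree are kept."""
    if _key(a) != _key(b):
        # Assumed choice: hang differently rooted trees under a new root.
        return Node("union", children=(a, b))
    merged = list(a.children)
    for child in b.children:
        for i, existing in enumerate(merged):
            if _key(existing) == _key(child):
                merged[i] = union(existing, child)
                break
        else:
            merged.append(child)  # only in b
    return Node(a.label, a.value, tuple(merged))


def join(a: Node, b: Node, predicate: Callable[[Node, Node], bool]) -> Node:
    """Pair each child subtree of `a` with each child subtree of `b` that
    satisfies `predicate`; each pair becomes one node holding both child lists."""
    pairs = tuple(
        Node(x.label, x.value, x.children + y.children)
        for x in a.children
        for y in b.children
        if predicate(x, y)
    )
    return Node(a.label, a.value, pairs)


def info_id(node: Node) -> Optional[str]:
    # Illustrative join key: the value of the node's Info_id child, if any.
    return next((c.value for c in node.children if c.label == "Info_id"), None)


a = Node("collection", children=(
    Node("object1", children=(Node("Info_id", "10"), Node("title", "t"))),
))
b = Node("collection", children=(
    Node("object1", children=(Node("Info_id", "10"), Node("type", "pdf"))),
    Node("object3", children=(Node("Info_id", "03"),)),
))
u = union(a, b)
print([c.label for c in u.children])              # ['object1', 'object3']
print([c.label for c in u.children[0].children])  # ['Info_id', 'title', 'type']
j = join(a, b, lambda x, y: info_id(x) == info_id(y))
print([c.label for c in j.children[0].children])  # ['Info_id', 'title', 'Info_id', 'type']
```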


Algorithm for Complement Operator

// Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree)
// Output: DOCnm Tree = (DOCn Tree - DOCm Tree)
1 Start from the root node of DOCn Tree
2 If the root nodes of DOCn Tree and DOCm Tree have parent/child nodes
  2.1 Perform a depth-first traversal
  2.2 If DOCn Tree has a parent node that does not exist in DOCm Tree
    2.2.1 Add that parent node of DOCn Tree to the new DOCnm Tree
    2.2.2 While that parent node has a child node that does not exist in DOCm Tree
      2.2.2.1 Add the child node to DOCnm Tree
      2.2.2.2 If the child node has a leaf node that does not exist in DOCm Tree
        2.2.2.2.1 Add the leaf node to DOCnm Tree
      2.2.2.3 Set null to DOCnm Tree
    2.2.3 Repeat
  2.3 Set null to DOCnm Tree
3 Set the root node in DOCnm Tree and terminate
4 End/terminate
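A compact recursive version of the complement algorithm, assuming that a node of DOCn "exists in DOCm" when a node with the same label and value sits under the corresponding parent. Following step 3, the root of DOCn is always kept, and a node that does exist in DOCm is still kept when some of its descendants do not, so the output remains a tree. The names `complement` and `_difference` and the sample trees are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def complement(n: Node, m: Node) -> Node:
    """DOCnm = DOCn - DOCm. The root of DOCn is always kept (step 3)."""
    return Node(n.label, n.value, _difference(n.children, m.children))


def _difference(n_kids: Tuple[Node, ...], m_kids: Tuple[Node, ...]) -> Tuple[Node, ...]:
    kept = []
    for child in n_kids:  # depth-first over DOCn
        match = next(
            (c for c in m_kids if (c.label, c.value) == (child.label, child.value)),
            None,
        )
        if match is None:
            kept.append(child)  # not in DOCm: copy the whole subtree
        else:
            rest = _difference(child.children, match.children)
            if rest:
                # In DOCm, but some descendants are not: keep it as their parent.
                kept.append(Node(child.label, child.value, rest))
    return tuple(kept)


doc_n = Node("collection1", children=(
    Node("object1", children=(Node("Info_id", "10"),)),
    Node("object3", children=(Node("Info_id", "03"), Node("format", "16"))),
))
doc_m = Node("collection3", children=(
    Node("object3", children=(Node("Info_id", "03"),)),
))
result = complement(doc_n, doc_m)
print([c.label for c in result.children])  # ['object1', 'object3']
print(complement(doc_m, doc_n).children)   # (): not commutative
```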


Projection Algebra Operator (unary, input one tree, output one tree): Example

[Figure: source tree collection1 (Doc1) with sub-elements object1 and object3, and the result tree Doc1p, which retains only object3 and its descendants (referenceinfo, format, objectInfor3, objectInfor4 and imageinfor, with their leaves).]

Eliminates nodes other than those specified: projection of object3.
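Projection can be sketched as a depth-first search that keeps the subtrees rooted at the named node and drops everything else, re-rooting the kept subtrees under a copy of the original root so the output is still a tree. The function name `project`, the reuse of the input root and the sample tree are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def project(tree: Node, label: str) -> Node:
    """Keep only the subtrees rooted at nodes named `label`; drop all other nodes.
    The kept subtrees are placed under a copy of the original root."""
    kept: List[Node] = []

    def visit(node: Node) -> None:
        for child in node.children:  # depth-first
            if child.label == label:
                kept.append(child)   # keep the subtree whole; no need to descend
            else:
                visit(child)

    visit(tree)
    return Node(tree.label, tree.value, tuple(kept))


doc1 = Node("collection1", children=(
    Node("object1", children=(Node("objectInfor1"), Node("objectInfor2"))),
    Node("object3", children=(Node("objectInfor3"), Node("objectInfor4"), Node("imageinfor"))),
))
doc1p = project(doc1, "object3")
print([c.label for c in doc1p.children])              # ['object3']
print([c.label for c in doc1p.children[0].children])  # ['objectInfor3', 'objectInfor4', 'imageinfor']
```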


Algebra Operators (continued)

• Select (unary, input one tree, output one tree) – Filters nodes according to a predicate
• Expose (unary, input one tree, output one tree) – Retrieves specific elements/nodes given by parent/child boundaries
• Vertex (unary, input one tree, output one tree) – Creates the vertex encompassing all nodes created by the expose operator
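Select can be sketched as pruning with a predicate: a subtree is kept when its root satisfies the predicate, and other nodes survive only as ancestors linking kept subtrees to the root. This is an assumed semantics (the slide only says "filters nodes according to a predicate"); `select`, the helpers and the example predicate are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def select(tree: Node, predicate: Callable[[Node], bool]) -> Node:
    """Keep subtrees whose root satisfies `predicate`, plus the ancestors that
    connect them to the root; every other node is filtered out."""

    def keep(node: Node) -> Optional[Node]:
        if predicate(node):
            return node
        kids = tuple(k for k in map(keep, node.children) if k is not None)
        return Node(node.label, node.value, kids) if kids else None

    kids = tuple(k for k in map(keep, tree.children) if k is not None)
    return Node(tree.label, tree.value, kids)  # the root is always kept


def info(name: str, info_id: str, date: str) -> Node:
    # Illustrative objectInfor node with Info_id and des_date leaves.
    return Node(name, children=(Node("Info_id", info_id), Node("des_date", date)))


def date_of(node: Node) -> Optional[str]:
    # Value of the node's des_date child, or None if it has none.
    return next((c.value for c in node.children if c.label == "des_date"), None)


doc = Node("collection", children=(
    Node("object1", children=(info("objectInfor1", "10", "12.10.98"),
                              info("objectInfor2", "20", "12.12.98"))),
    Node("object3", children=(info("objectInfor3", "03", "12.10.99"),)),
))
# Hypothetical predicate: nodes with a des_date child ending in "99".
result = select(doc, lambda n: (date_of(n) or "").endswith("99"))
print([c.label for c in result.children])  # ['object3']
```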


Algorithm for Expose Operator

// Input: one DOC tree or one XML document
// Output: one DOC tree or one XML document
1 Start with the entry point, which is the root node
2 Perform a depth-first traversal
  2.1 If the parameter is equal to the specific node to be exposed
    2.1.1 Return the specific node
    2.1.2 Set the specific node in the new tree
  2.2 If the exposed element does not exist, terminate
3 End/terminate
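The expose algorithm and the vertex operator can be sketched together: `expose` performs an iterative depth-first traversal from the root and returns each node whose label equals the parameter (an empty list when none exists), and `vertex` wraps the exposed nodes under one new root so that the result is again a tree. The function names and the default root label `vertex` are hypothetical, and a matched node's subtree is returned whole rather than searched again.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def expose(tree: Node, label: str) -> List[Node]:
    """Depth-first search from the root (the entry point) for nodes named `label`.
    Returns an empty list when the element does not exist."""
    exposed: List[Node] = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.label == label:
            exposed.append(node)  # return the node together with its subtree
        else:
            stack.extend(reversed(node.children))  # visit children left to right
    return exposed


def vertex(nodes: Sequence[Node], label: str = "vertex") -> Node:
    """Create one new vertex (root) encompassing all exposed nodes."""
    return Node(label, None, tuple(nodes))


doc = Node("collection", children=(
    Node("object1", children=(Node("Info_id", "10"), Node("Info_id", "20"))),
    Node("object3", children=(Node("Info_id", "03"),)),
))
ids = expose(doc, "Info_id")
print([n.value for n in ids])                         # ['10', '20', '03']
print(vertex(ids).label, len(vertex(ids).children))   # vertex 3
```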


Results

• Developed
  – Domain-specific algebra
  – Tree algebra
  – Algebraic relationships: universal, similarity, equivalence, subsumption
  – Algebraic operators: join, union, complement, project, select, expose, vertex
  – Closure: the output is always a tree


Verification

• All operators
  – Presented as algorithms
  – Implemented in Java
• Case study
  – Virtual museum application
  – Implemented code employed to satisfy the museum's requirements


Further Work

• Investigate
  – Extent to which limitations in the operators affect usability
  – Does the domain need extending?
• Further experimentation
  – Examine feedback from the museum study
  – Look at further areas