Post on 29-Dec-2015
A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS
Ali El bekai, Nick Rossiter
School of Informatics, Northumbria University
Email: ali.elbekai@unn.ac.uk, nick.rossiter@unn.ac.uk
Overview
• Framework in algebra for processing XML data
• Review related work
• Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees
• Describe the input and output of the algebraic operators
• Define the syntax of the relationships/operators and their semantics in terms of algorithms
• Give examples in the domain-specific XML query language
• Discuss closure and application
Related Work
• IBM (Beech & Rys, 1999)
• Lore (McHugh et al., 1997)
• YATL (Christophides et al., 2000)
• Niagara (Galanis et al., 2001)
• AT&T (W3C)
• TAX (Jagadish et al., 2001)
• Problems identified in complexity and generality
Tree Algebra
• True tree
  – Each node has one parent but may have many children
  – Single root node
• Leaves of tree
  – Correspond to different sources, e.g. object-relational
• Two types of operators
  – Algebraic operators
  – Relational operators
Concepts in Tree Model
• Root (ultimate ancestor or parent)
• Node (parent or child)
• Edge (link from a parent to a child)
• Leaf (atomic values; a node with no children)
• Path (sequence of edges between nodes)
• Descendants (all successor nodes of a node)
• Ancestors (all nodes on the path from a node up to the root)
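The concepts above map directly onto a small node type. The following is a minimal Python sketch, not the authors' Java implementation; the class name `Node` and its methods (`add`, `ancestors`, `descendants`, `path`) are hypothetical names chosen for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A tree node: each node has at most one parent but may have many children."""
    label: str
    value: Optional[str] = None  # atomic value, only meaningful for leaves
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        """Attach `child` via a parent-child edge and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def ancestors(self) -> list[Node]:
        """All nodes above this one, nearest first, ending at the root."""
        out, node = [], self.parent
        while node is not None:
            out.append(node)
            node = node.parent
        return out

    def descendants(self) -> list[Node]:
        """All successor nodes, in depth-first pre-order."""
        out = []
        for child in self.children:
            out.append(child)
            out.extend(child.descendants())
        return out

    def path(self) -> list[str]:
        """Labels along the path of edges from the root down to this node."""
        return [n.label for n in reversed(self.ancestors())] + [self.label]


root = Node("collection")
obj1 = root.add(Node("object1"))
info = obj1.add(Node("objectInfor1"))
leaf = info.add(Node("Info_id", "10"))
print(leaf.path())                            # ['collection', 'object1', 'objectInfor1', 'Info_id']
print([n.label for n in leaf.ancestors()])    # ['objectInfor1', 'object1', 'collection']
print([n.label for n in root.descendants()])  # ['object1', 'objectInfor1', 'Info_id']
```

Storing a single `parent` reference per node is what makes this a true tree: ancestors are found by walking up, descendants by walking down.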
Mappings
• XML document → tree
• Element → node (root, parent or child)
• Leaf → child node holding atomic values
• Attribute → function yielding values
[Figure: example XML tree (Doc) with root collection and sub-elements object1 and object3; inner nodes include objectInfor1 to objectInfor4, referenceinfo and imageinfor, with leaves such as Info_id and des_date. Edge legend: pc = parent-child edge, ad = ancestor-descendant edge.]
Example XML Tree
The root is the collection element; object1 and object3 are its sub-elements.
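The mapping and the two edge kinds in the figure can be checked against a real XML parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the document is hypothetical (element names are taken from the figure, but the exact nesting and the `objid` attribute are assumptions), and the helpers `pc_edges` and `ad_edges` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document loosely modelled on the example tree: the element
# names come from the figure, but the nesting and the objid attribute are assumed.
DOC = """
<collection>
  <object1>
    <objectInfor1><Info_id>10</Info_id><des_date>12.10.98</des_date></objectInfor1>
  </object1>
  <object3 objid="301">
    <objectInfor3><Info_id>03</Info_id><des_date>12.10.99</des_date></objectInfor3>
  </object3>
</collection>
"""


def pc_edges(elem):
    """Parent-child (pc) edges: one per element and each of its direct children."""
    for child in elem:
        yield (elem.tag, child.tag)
        yield from pc_edges(child)


def ad_edges(elem):
    """Ancestor-descendant (ad) edges: one per element and each node anywhere below it."""
    for desc in elem.iter():
        if desc is not elem:
            yield (elem.tag, desc.tag)
    for child in elem:
        yield from ad_edges(child)


root = ET.fromstring(DOC.strip())
# Leaves (elements without children) carry the atomic values.
leaves = [(e.tag, e.text) for e in root.iter() if len(e) == 0]
print(len(list(pc_edges(root))), len(list(ad_edges(root))))  # 8 18
print(leaves[:2])                         # [('Info_id', '10'), ('des_date', '12.10.98')]
# An attribute behaves like a function from a name to a value.
print(root.find("object3").get("objid"))  # 301
```

Every pc edge is also an ad edge, which is why the ad count (18) exceeds the pc count (8) for the same nine elements.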
Algebraic Relationships
• Comparison of two trees
• Universal (unary)
  – Defines a tree containing all information
• Similarity (binary)
  – Two trees have the same structure
• Equivalence (binary)
  – Two trees are indistinguishable
• Subsumption (binary)
  – One tree is subsumed in another
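The three binary relationships can be expressed as recursive predicates. The following Python sketch is one possible reading, not the authors' definitions: it assumes ordered children, treats leaf values as "content", and the tuple type `T` and the helper `embeds` are hypothetical. The universal relationship is not modelled here.

```python
from typing import NamedTuple, Optional


class T(NamedTuple):
    """Immutable tree node: element label, optional atomic value, ordered children."""
    label: str
    value: Optional[str] = None
    children: tuple = ()


def similar(a: T, b: T) -> bool:
    """Same structure: equal labels and shape at every level; leaf values ignored."""
    return (a.label == b.label
            and len(a.children) == len(b.children)
            and all(similar(x, y) for x, y in zip(a.children, b.children)))


def equivalent(a: T, b: T) -> bool:
    """Indistinguishable: same structure and no mismatch in content."""
    return (a.label == b.label and a.value == b.value
            and len(a.children) == len(b.children)
            and all(equivalent(x, y) for x, y in zip(a.children, b.children)))


def embeds(a: T, b: T) -> bool:
    """a matches at b: same label, same value (if a has one),
    and every child of a matches some child of b."""
    return (a.label == b.label
            and (a.value is None or a.value == b.value)
            and all(any(embeds(x, y) for y in b.children) for x in a.children))


def subsumed(a: T, b: T) -> bool:
    """a is subsumed in b: a embeds at b's root or somewhere inside b."""
    return embeds(a, b) or any(subsumed(a, c) for c in b.children)


info1 = T("objectInfor1", None, (T("Info_id", "10"), T("desc_date", "12.10.98")))
info2 = T("objectInfor2", None, (T("Info_id", "20"), T("desc_date", "12.12.98")))
info2b = T("objectInfor2", None, (T("Info_id", "21"), T("desc_date", "12.12.98")))
c3 = T("collection", None, (T("object3", None, (info1, info2)),))
c4 = T("collection", None, (T("object3", None, (info1, info2)),))
c3b = T("collection", None, (T("object3", None, (info1, info2b)),))
c1 = T("collection", None, (T("object1"), T("object3", None, (info1, info2, T("imageinfor")))))

print(equivalent(c3, c4))                 # True
print(similar(c3, c3b), equivalent(c3, c3b))  # True False
print(subsumed(c3, c1), subsumed(c1, c3))     # True False
```

Both roots are labelled `collection` here so that the root labels do not cause a mismatch; `c3b` differs from `c3` only in one leaf value, so it is similar but not equivalent.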
[Figure: XML trees collection3 (Doc3) and collection4 (Doc4), related by ~; both contain Info_id/desc_date leaves with the values 10, 12.10.98 and 20, 12.12.98]
Example Equivalence Relationship
XML tree collection3 is equivalent to collection4: same node structure, no mismatch in content.
[Figure: XML trees collection3 (Doc3) and collection1 (Doc1); collection1 has sub-elements object1 and object3 and additional nodes such as objectInfor3, objectInfor4, referenceinfo and imageinfor]
Example Subsumption Relationship
Collection3 is part of collection4 (structure and content)
Algebraic Operators for Trees
• Join (binary, input two trees, output one tree, commutative, associative)
  – Joined on a predicate
• Union (binary, input two trees, output one tree, commutative, associative, disjoint)
  – Summing trees together
• Complement (binary, input two trees, output one tree, not commutative, not associative)
  – Nodes in one tree not found in another
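The slides do not spell out how trees are summed or joined, so the following Python sketch makes its own assumptions: nodes match when their label and atomic value agree (the hypothetical `key` function), union merges matching children recursively and sorts siblings by key, and join keeps only matching child pairs that satisfy a caller-supplied predicate. It illustrates the commutativity property listed above, not the authors' algorithms.

```python
from typing import Callable, NamedTuple, Optional


class T(NamedTuple):
    label: str
    value: Optional[str] = None
    children: tuple = ()


def key(t: T) -> tuple:
    # Hypothetical node identity used for matching: label plus atomic value.
    return (t.label, t.value or "")


def union(a: T, b: T) -> T:
    """Sum two trees: children with the same key are merged recursively and the
    rest are kept. Siblings are sorted by key, so for inputs without duplicate
    sibling keys union(a, b) == union(b, a)."""
    if key(a) != key(b):
        raise ValueError("this sketch assumes both trees share the same root")
    merged: dict = {}
    for child in a.children + b.children:
        k = key(child)
        merged[k] = union(merged[k], child) if k in merged else child
    return T(a.label, a.value, tuple(merged[k] for k in sorted(merged)))


def join(a: T, b: T, pred: Callable[[T, T], bool]) -> T:
    """Keep only pairs of same-key children that satisfy `pred`, merging each pair."""
    if key(a) != key(b):
        raise ValueError("this sketch assumes both trees share the same root")
    pairs = [union(x, y) for x in a.children for y in b.children
             if key(x) == key(y) and pred(x, y)]
    return T(a.label, a.value, tuple(sorted(pairs, key=key)))


a = T("collection", None, (T("object1", None, (T("objNumber", "100"),)),
                           T("object3", None, (T("objNumber", "3239"),))))
b = T("collection", None, (T("object1", None, (T("objid", "1234"),)),
                           T("object3", None, (T("objid", "301"),))))

u = union(a, b)
j = join(a, b, lambda x, y: x.label == "object1")
print([c.label for c in u.children])              # ['object1', 'object3']
print([g.label for g in j.children[0].children])  # ['objNumber', 'objid']
```

Sorting siblings is a design choice of this sketch: it makes the result independent of argument order, which is one way to obtain the commutativity the slides claim.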
// Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree)
// Output: DOCnm Tree = (DOCn Tree - DOCm Tree)
1 Start from the root node of DOCn
2 If the root nodes of DOCn Tree and DOCm Tree have parent/child nodes
  2.1 Perform a depth-first traversal
  2.2 If DOCn Tree has a parent node that does not exist in DOCm Tree
    2.2.1 Add that parent node of DOCn Tree to the new DOCnm Tree
    2.2.2 While that parent node has a child node that does not exist in DOCm Tree
      2.2.2.1 Add the child node of DOCn Tree to DOCnm Tree
      2.2.2.2 If the child node has a leaf node that does not exist in DOCm Tree
        2.2.2.2.1 Add the leaf node of DOCn Tree to DOCnm Tree
      2.2.2.3 Set null to DOCnm Tree
    2.2.3 Repeat
  2.3 Set null to DOCnm Tree
3 Set the root node of DOCnm Tree and terminate
4 End/terminate
Algorithm for Complement Operator
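The complement algorithm can be sketched in Python as a depth-first walk over DOCn that copies any parent, child or leaf node without a counterpart in DOCm. This is a sketch under assumptions, not the authors' Java code: nodes are matched by label and atomic value (the hypothetical `key`), and a matched node is kept only if something unmatched remains beneath it. The root is always kept, so the output is still a tree.

```python
from typing import NamedTuple, Optional


class T(NamedTuple):
    label: str
    value: Optional[str] = None
    children: tuple = ()


def key(t: T) -> tuple:
    # Hypothetical node identity used for matching: label plus atomic value.
    return (t.label, t.value or "")


def complement(n: T, m: T) -> T:
    """DOCn - DOCm: keep the parts of n that have no counterpart in m."""
    m_children = {key(c): c for c in m.children}
    kept = []
    for child in n.children:               # depth-first over DOCn
        other = m_children.get(key(child))
        if other is None:
            kept.append(child)             # not in DOCm: copy the whole subtree
        else:
            rest = complement(child, other)
            if rest.children:              # keep only if something unmatched remains below
                kept.append(rest)
    return T(n.label, n.value, tuple(kept))  # step 3: root of DOCnm Tree


object1 = T("object1", None, (
    T("objectInfor1", None, (T("Info_id", "10"),)),
    T("objectInfor2", None, (T("Info_id", "20"),))))
object3 = T("object3", None, (
    T("objectInfor3", None, (T("Info_id", "03"),)),
    T("imageinfor", None, (T("type", "Bibliographic"),))))
doc1 = T("collection", None, (object1, object3))
doc3 = T("collection", None, (object1,))

print([c.label for c in complement(doc1, doc3).children])  # ['object3']
print(complement(doc3, doc1).children)                       # ()
```

Swapping the arguments gives a different result, which matches the slide's note that complement is not commutative.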
[Figure: input tree collection1 (Doc1) and output tree Doc1p obtained by projecting object3, which retains the object3 subtree (objectInfor3, objectInfor4, referenceinfo, imageinfor and their leaves)]
Projection Algebra Operator (unary, input one tree, output one tree): Example
Eliminates nodes other than those specified. Projection of object3.
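Projection can be sketched as a recursive filter that keeps every subtree rooted at a node with the requested label. One assumption here is that the path from the original root down to each kept subtree is preserved, so the result stays a single tree; the slide's Doc1p may instead use object3 as its root. The function name `project` is hypothetical.

```python
from typing import NamedTuple, Optional


class T(NamedTuple):
    label: str
    value: Optional[str] = None
    children: tuple = ()


def project(t: T, label: str) -> T:
    """Keep each subtree rooted at a node labelled `label`, plus the path to it;
    all other nodes are eliminated."""
    kept = []
    for child in t.children:
        if child.label == label:
            kept.append(child)           # specified node: keep its whole subtree
        else:
            sub = project(child, label)
            if sub.children:             # an occurrence was found further down
                kept.append(sub)
    return T(t.label, t.value, tuple(kept))


object1 = T("object1", None, (T("objectInfor1", None, (T("Info_id", "10"),)),))
object3 = T("object3", None, (
    T("objectInfor3", None, (T("Info_id", "03"),)),
    T("imageinfor", None, (T("type", "Bibliographic"),))))
doc1 = T("collection", None, (object1, object3))

doc1p = project(doc1, "object3")
print([c.label for c in doc1p.children])  # ['object3']
```

Projecting on a leaf label such as `Info_id` keeps both Info_id leaves together with the object and objectInfor nodes above them.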
Algebra Operators (continued)
• Select (unary, input one tree, output one tree)
  – Filters nodes according to a predicate
• Expose (unary, input one tree, output one tree)
  – Retrieves specific elements/nodes given by parent/child boundaries
• Vertex (unary, input one tree, output one tree)
  – Creates the vertex encompassing all nodes created by the expose operator
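Select filters nodes by a predicate. The slides do not fix how filtering interacts with the tree shape, so this Python sketch assumes that a child subtree is dropped whenever the predicate fails at its root, while the input root is always kept. The predicate factory `info_id_is` is a hypothetical example that constrains only objectInfor nodes.

```python
from typing import Callable, NamedTuple, Optional


class T(NamedTuple):
    label: str
    value: Optional[str] = None
    children: tuple = ()


def select(t: T, pred: Callable[[T], bool]) -> T:
    """Keep a child subtree only if `pred` holds at its root, then filter inside it."""
    kept = tuple(select(c, pred) for c in t.children if pred(c))
    return T(t.label, t.value, kept)


def info_id_is(v: str) -> Callable[[T], bool]:
    """Hypothetical predicate: objectInfor nodes must have an Info_id leaf equal to v."""
    def pred(n: T) -> bool:
        if not n.label.startswith("objectInfor"):
            return True                  # other nodes are not constrained
        return any(c.label == "Info_id" and c.value == v for c in n.children)
    return pred


info1 = T("objectInfor1", None, (T("Info_id", "10"), T("desc_date", "12.10.98")))
info2 = T("objectInfor2", None, (T("Info_id", "20"), T("desc_date", "12.12.98")))
doc = T("collection", None, (T("object1", None, (info1, info2)),))

result = select(doc, info_id_is("10"))
print([c.label for c in result.children[0].children])  # ['objectInfor1']
```

Because the output is always rooted at the input root, the result of select is again a tree and can be fed into any other operator.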
// Input: one DOC tree or one XML document
// Output: one DOC tree or one XML document
1 Start with the entry point, which is the root node
2 Perform a depth-first traversal
  2.1 If the parameter equals the specific node to be exposed
    2.1.1 Return the specific node
    2.1.2 Set the specific node in the new tree
  2.2 If the exposed element does not exist, terminate
3 End/terminate
Algorithm for Expose Operator
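The expose algorithm can be sketched as an iterative depth-first search from the root, with vertex then wrapping the exposed nodes under one new root. This is an illustrative Python sketch, not the authors' implementation: it assumes matching is done on the element label, that the search does not descend into a node once it is exposed, and that the new root is labelled `"vertex"` (a hypothetical default).

```python
from typing import List, NamedTuple, Optional


class T(NamedTuple):
    label: str
    value: Optional[str] = None
    children: tuple = ()


def expose(t: T, label: str) -> List[T]:
    """Depth-first search from the root (the entry point) for nodes labelled `label`."""
    found, stack = [], [t]
    while stack:
        node = stack.pop()
        if node.label == label:
            found.append(node)           # step 2.1: return the specific node
        else:
            # push children right-to-left so they are visited left-to-right
            stack.extend(reversed(node.children))
    return found                         # step 2.2: empty if the element does not exist


def vertex(nodes: List[T], root_label: str = "vertex") -> T:
    """Hypothetical vertex operator: one new root encompassing all exposed nodes."""
    return T(root_label, None, tuple(nodes))


doc = T("collection", None, (
    T("object1", None, (T("objectInfor1", None, (T("Info_id", "10"),)),
                        T("objectInfor2", None, (T("Info_id", "20"),)))),
    T("object3", None, (T("objectInfor3", None, (T("Info_id", "03"),)),))))

ids = expose(doc, "Info_id")
print([n.value for n in ids])                    # ['10', '20', '03']
print(vertex(ids).label, len(vertex(ids).children))  # vertex 3
```

Expose on its own returns a list of nodes; combining it with vertex is what restores a single tree, in line with the closure property.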
Results
• Developed
  – Domain-specific algebra
  – Tree algebra
  – Algebraic relationships
    • Universal, similarity, equivalence, subsumption
  – Algebraic operators
    • Join, union, complement, project, select, expose, vertex
  – Closure: output is always a tree
Verification
• All operators
  – Presented as algorithms
  – Implemented in Java
• Case study
  – Virtual museum application
  – Implemented code used to satisfy the museum requirements
Further Work
• Investigate
  – Extent to which limitations in the operators affect usability
  – Does the domain need extending?
• Further experimentation
  – Examine feedback from the museum study
  – Look at further areas