Min Lu TIMBER: A Native XML DB 1

TIMBER: A Native XML Database

Authors: H.V. Jagadish et al.

Presenter: Min Lu

Date: Apr 5, 2005

Introduction

• Growing XML – XML repository
• New approach – native XML DB
• TIMBER: Tree-structured native XML database Implemented at the University of Michigan by Bright Energetic Researchers

Topics of Discussion

• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion

Motivation

• XML characteristics
  * Tree structured: elements can be structurally related, and these relationships are meaningful
  * Flexibility
• Mapping XML to a relational DB
  * Unnormalized relational representation
  * Or a large number of tables

Motivation

• Native XML DBs
• Tamino: a commercial one
• Natix: a native XML data management system, designed for storing and processing XML data
• TIMBER: built on the “Shore” storage manager

Topics of Discussion

• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion

TIMBER Architecture

[Figure: architecture diagram; one component is labeled (Shore)]

Shore:
• Disk memory management
• Buffering
• Concurrency control

TIMBER Architecture – Data Flow

[Figure: architecture diagram annotated with the data flow. Visible labels: Parse tree, (Shore), Internal representation, Interface, One node at a time]

TIMBER Architecture – Query Flow

[Figure: architecture diagram annotated with the query flow. Visible labels: Operator tree, (Shore), Call]

Nodes in TIMBER

• One node for each element
• All attributes clubbed into one node
• Content of element pulled into a child node
• Processing instructions and comments are simply ignored
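The node model above can be sketched in a few lines of Python. This is only an illustration of the four bullets, not TIMBER's storage code: the `Node` class and `to_nodes` function are hypothetical names, and mixed content (text after a child element) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; the kinds mirror the bullets on the slide.
    kind: str                      # "element", "attributes", or "content"
    name: str = ""                 # tag name, used by element nodes
    value: object = None           # attribute dict or text content
    children: list = field(default_factory=list)


def to_nodes(elem):
    """Convert an ElementTree element into the node layout described above."""
    node = Node("element", name=elem.tag)
    if elem.attrib:
        # All attributes of the element go into a single child node.
        node.children.append(Node("attributes", value=dict(elem.attrib)))
    text = (elem.text or "").strip()
    if text:
        # Element content is pulled into its own child node.
        node.children.append(Node("content", value=text))
    for child in elem:
        node.children.append(to_nodes(child))
    return node


# The default ElementTree parser drops comments and processing instructions,
# which matches the "simply ignored" behaviour on the slide.
doc = ET.fromstring(
    '<book year="2002" lang="en"><!-- note --><title>TIMBER</title></book>'
)
root = to_nodes(doc)
```

Here `root` is an element node for `book` with two children: one attributes node holding both attributes, and the `title` element, whose text sits in a content child.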

Node Labels

• Determining parent-child (PC) and ancestor-descendant (AD) relationships is a frequent operation
• Label each node with a triple
• Start, end, level: (S, E, L)
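One common way to obtain such triples is a depth-first traversal that hands out one number when a node is entered and another when it is left. The slides do not say how TIMBER assigns its labels, so the sketch below is an assumption; `assign_labels` and the `(name, children)` tree encoding are hypothetical, and node names are assumed unique so they can serve as dictionary keys.

```python
from itertools import count


def assign_labels(tree, level=1, counter=None, labels=None):
    """Assign (start, end, level) to every node of a (name, children) tree."""
    if counter is None:
        counter, labels = count(1), {}
    name, children = tree
    start = float(next(counter))       # number taken on entering the node
    for child in children:
        assign_labels(child, level + 1, counter, labels)
    end = float(next(counter))         # number taken on leaving the node
    labels[name] = (start, end, level)
    return labels


doc = ("A", [("B", [("D", [])]), ("C", [])])
labels = assign_labels(doc)
```

Because a node is left only after all of its descendants have been visited, every descendant's interval ends up strictly inside its ancestors' intervals.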

Triple Labels for AD & PC

• AD: (S1, E1, L1) – (S2, E2, L2) <=> S1 < S2 & E1 > E2
  ex. (1.0, 9.0, 1) – (3.0, 6.0, 5)
• PC: (S1, E1, L1) – (S2, E2, L2) <=> S1 < S2 & E1 > E2 & L1 = L2 - 1
  ex. (1.0, 9.0, 1) – (2.0, 8.0, 2)

[Figure: number line marked 1.0, 3.0, 6.0, 9.0; the descendant interval lies inside the ancestor interval]
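The two conditions translate directly into predicates over label triples. The function names `is_ancestor` and `is_parent` are chosen here for illustration; the checks use the two examples from the slide.

```python
def is_ancestor(a, d):
    """AD test: the ancestor's interval strictly contains the descendant's."""
    (s1, e1, _), (s2, e2, _) = a, d
    return s1 < s2 and e1 > e2


def is_parent(p, c):
    """PC test: containment plus exactly one level of difference."""
    return is_ancestor(p, c) and p[2] == c[2] - 1


root = (1.0, 9.0, 1)
print(is_ancestor(root, (3.0, 6.0, 5)))   # AD example from the slide
print(is_parent(root, (2.0, 8.0, 2)))     # PC example from the slide
```

Both tests need only a constant number of comparisons on the labels, with no traversal of the tree.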

Triple Label Benefits

• Updates: no re-labeling
• Use double values to leave gaps for new nodes
• Serves as a node identifier
• Store nodes by their start labels to cluster sub-elements together with them
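A minimal sketch of how gaps between double-valued labels allow an insert without re-labeling, and how sorting by start label keeps a subtree together. The placement rule (splitting the free gap into thirds) and the name `label_for_new_last_child` are assumptions made for this example; the slides do not describe TIMBER's actual policy.

```python
def label_for_new_last_child(parent, last_sibling=None):
    """Pick a label in the gap between the last child and the parent's end.

    Existing labels are left untouched; the new node only consumes part of
    the free interval. The thirds rule is an arbitrary illustrative choice.
    """
    p_start, p_end, p_level = parent
    lo = last_sibling[1] if last_sibling else p_start
    gap = p_end - lo
    return (lo + gap / 3, lo + 2 * gap / 3, p_level + 1)


nodes = {
    "root": (1.0, 9.0, 1),
    "child": (2.0, 8.0, 2),
    "grandchild": (3.0, 6.0, 3),
}
nodes["new"] = label_for_new_last_child(nodes["root"], nodes["child"])

# Sorting by start label gives document order, so each node is followed
# immediately by its sub-elements.
order = sorted(nodes, key=lambda name: nodes[name][0])
```

The new node lands between 8.0 and 9.0, after the existing child and inside the root, and `order` lists the child's subtree contiguously before the new node.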

Topics of Discussion

• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion

Tree Algebra (TAX)

• Set-at-a-time for efficiency
• Bulk algebra: takes one or more sets of trees as input and outputs a set of trees
• Pattern tree: specifies the portion of interest
• Witness tree: bears witness to the success of the pattern match on the input tree

Pattern Tree & Witness Tree

[Figure: pattern tree and witness tree; visible node labels: A, B, C]
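To make the pattern tree and witness tree idea concrete, here is a small sketch that matches a pattern with root A and children B and C against one data tree and returns one witness tree per successful match. Several things are assumptions for illustration: data nodes are encoded as `(tag, id, children)`, pattern nodes as `(tag, sub_patterns)`, every pattern edge is treated as ancestor-descendant, and matching is on tag only. It is a naive enumeration, not TIMBER's evaluation strategy.

```python
from itertools import product


def descendants(node):
    """Yield all proper descendants of a (tag, id, children) data node."""
    for child in node[2]:
        yield child
        yield from descendants(child)


def witnesses(pattern, node):
    """Yield one witness tree per embedding of `pattern` rooted at `node`."""
    tag, sub_patterns = pattern
    if node[0] != tag:
        return
    # For each pattern child, collect the witness subtrees found below `node`.
    options = [
        [w for d in descendants(node) for w in witnesses(sub, d)]
        for sub in sub_patterns
    ]
    # Every combination of one choice per pattern child is a separate match.
    for combo in product(*options):
        yield (node[0], node[1], list(combo))


def match(pattern, tree):
    """Return all witness trees of `pattern` anywhere in `tree`."""
    nodes = [tree, *descendants(tree)]
    return [w for n in nodes for w in witnesses(pattern, n)]


data = ("A", 1, [("B", 2, []), ("X", 3, [("C", 4, []), ("B", 5, [])])])
pattern = ("A", [("B", []), ("C", [])])
result = match(pattern, data)
```

The data tree contains two B nodes and one C node below A, so `result` holds two witness trees. Each keeps only the matched nodes; the unmatched X node does not appear.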

Operators in TAX

• Algebra operations developed: Selection, Projection, Product, Set union, Set difference, Renaming, Reordering, Grouping
• The core of XQuery can be parsed to TAX operators

Projection Operator in TAX

Input C: collection of trees
Parameter P: pattern tree
Parameter PL: projection list (the info to keep in the output)
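A simplified sketch of what a projection over trees does: nodes that are not wanted are dropped, and the kept nodes retain their relative ancestor-descendant structure. To stay short, the pattern tree P is reduced here to a plain tag test, so the projection list PL is just a set of tags. That reduction, the `(tag, children)` encoding, and the name `project` are all assumptions of this example rather than the TAX definition.

```python
def project(tree, keep):
    """Return the forest of kept nodes from `tree`, preserving hierarchy.

    `keep` is a predicate standing in for "this node matches a pattern node
    that is named in the projection list".
    """
    tag, children = tree
    kept_below = [t for child in children for t in project(child, keep)]
    if keep(tree):
        return [(tag, kept_below)]
    # A dropped node hands its kept descendants up to the nearest kept ancestor.
    return kept_below


def project_collection(collection, projection_list):
    """Apply the projection to every tree in the input collection C."""
    keep = lambda node: node[0] in projection_list
    return [t for tree in collection for t in project(tree, keep)]


C = [("book", [("title", []),
               ("authors", [("author", []), ("author", [])]),
               ("year", [])])]
out = project_collection(C, {"book", "author"})
```

With `{"book", "author"}` as the projection list, the output is a single `book` tree whose children are the two `author` nodes; the intermediate `authors` node is removed but the ancestor relationship is kept.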

Topics of Discussion

• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion

Query Optimization

• One plan: join the faculty node with the secretary node first, then join the result with the RA node.
• Another plan: join the faculty node with the RA node first, then join the result with the secretary node.

Query Optimizer

• The query optimizer enumerates all evaluation plans, estimates their costs, then chooses the optimal one.
• An algorithm, FP_Optimization, is used for finding the best evaluation plan.
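The enumerate, estimate, choose loop can be illustrated on the faculty / secretary / RA example. The sketch below is a plain exhaustive search with a toy cost model (the sum of estimated intermediate result sizes); it is not FP_Optimization, whose details are not given on the slides. All cardinalities and selectivities are invented for illustration.

```python
from itertools import permutations

# Hypothetical statistics, invented for this example only.
CARD = {"faculty": 100, "secretary": 50, "RA": 2000}
# Assumed fraction of (faculty, x) pairs that are structurally related.
SEL = {"secretary": 0.01, "RA": 0.002}


def plan_cost(order):
    """Sum of estimated intermediate sizes when joining faculty with
    the node types in `order`, one after another."""
    size, cost = CARD["faculty"], 0.0
    for other in order:
        size = size * CARD[other] * SEL[other]
        cost += size
    return cost


def best_plan(others=("secretary", "RA")):
    """Enumerate every join order, estimate its cost, return the cheapest."""
    plans = {order: plan_cost(order) for order in permutations(others)}
    return min(plans, key=plans.get), plans


best, plans = best_plan()
```

Under these made-up numbers, joining with secretary first keeps the intermediate result small, so that order is chosen. With different statistics the other order could win, which is why the optimizer needs cost estimates rather than a fixed rule.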

Case Study for Query Optimization

• Consider the query against the DB “mBench 0.1x data set” with about 130,000 nodes

[Figure: query pattern with nodes A, B, C, D, E, F, G]

Query Optimization

Five alternative query plans with different orders and combinations of operators.

Performance Study

Topics of Discussion

• Motivation
• TIMBER Architecture
• Tree Algebra (TAX)
• Query Optimization
• Conclusion

Conclusion

• TIMBER provides comprehensive set-at-a-time query processing in a native XML store, with all the standard components of relational query processing.

• New access methods have been developed to evaluate queries over XML.

• New cost estimation and query optimization techniques have been developed.

Work to be Done

• Currently all processing instructions, comments, and such are simply ignored.
  - An extra child node of the element node with all such data needs to be created.
• TIMBER was developed when XQuery didn’t support updates.
  - 11th Feb 2005: First Public Working Draft of the XQuery Update Facility Requirements
  - A parser has to be implemented to support updates.
• During an extremely localized sequence of inserts, the start and end labels become an issue, since the gaps left for new nodes can run out.
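The last point can be demonstrated with ordinary double arithmetic. The sketch below assumes new labels are placed by bisecting the free gap (an assumption; the slides only say that double values leave gaps) and counts how many inserts fit at the same spot before no representable value remains. The function name `inserts_until_exhausted` is hypothetical.

```python
def inserts_until_exhausted(lo=1.0, hi=2.0):
    """Count how many labels can be squeezed in just after `lo` by bisection
    before floating-point precision runs out."""
    n = 0
    while True:
        mid = (lo + hi) / 2
        if mid <= lo or mid >= hi:   # no representable value left in the gap
            return n
        hi = mid                     # the next insert lands in the same place
        n += 1


n = inserts_until_exhausted()
print(n)
```

The count is bounded by the precision of a double's mantissa, so a run of inserts concentrated at one position uses up the gap quickly even though the rest of the label space is almost empty. At that point some re-labeling would be needed.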

Questions?

Thank you!