TIMBER A Native XML Database


Page 1: TIMBER A Native XML Database

TIMBER: A Native XML Database

Xiali He

An Overview of the TIMBER System at the University of Michigan

Page 2: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues

Page 3: TIMBER A Native XML Database

Introduction

Why a Native XML Database?
  Mapping XML data onto existing databases is problematic because of the flexible nature of XML:
    It results in an unnormalized relational representation
    It results in a large number of tables

Challenges in the TIMBER system:
  Start from scratch
  Retain XML data's natural structure, flexibility, and heterogeneity
  Efficient processing on tree structures
  Updates

Page 4: TIMBER A Native XML Database

Reuse the existing database technologies
  Transaction Management Facilities
  Declarative Querying
  Set-at-a-time Processing

Redesign and tailor certain components for the XML domain
  Bulk Algebra: TAX
  Query Evaluation
  Query Optimization

Page 5: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues

Page 6: TIMBER A Native XML Database

Motivations and Related Work

Mapping techniques from tree-based XML data to a flat relational schema

Problems:
  XML has a very rich tree structure.
  Relational databases have a rigid table structure.
  A simple tree schema produces a complex relational schema with many tables.
  A simple XML query gets translated into expensive sequences of joins in a relational database.

Page 7: TIMBER A Native XML Database

Other direct XML data management systems:
  Procedural implementation
  Tuple-at-a-time processing
  Poor performance

Built on top of object-oriented databases and semi-structured databases

Page 8: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues
System Study

Page 9: TIMBER A Native XML Database

System Architecture

Data Storage
Index Storage
Metadata Storage
Query Processing

[Figure: TIMBER: An efficient XML database engine]

Page 10: TIMBER A Native XML Database

System Architecture: Data Storage

Nodes in the Timber system:
  Node for each element
  Child node for each sub-element
  Child node for all attributes of an element
  Child node for the content of an element node
  Child node for all processing instructions and comments (in future)

Node identifier in the Timber system:
  (S, E, L): Start label, End label, Level label

Physical storage order:
  Nodes are sorted by the value of their start labels.
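The (S, E, L) scheme can be illustrated with a short sketch. The Python below is only an illustration, not TIMBER's code: it assumes labels are consecutive integers handed out during a depth-first walk, and it labels elements only (attribute and content nodes are skipped). The names `Label`, `label_tree`, `is_ancestor`, and `is_parent` are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Label(NamedTuple):
    start: int
    end: int
    level: int


def label_tree(root):
    """Return [(tag, Label)] for every element, sorted by start label."""
    labels = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(labels)
        labels.append(None)  # reserve the slot so the list stays in start order
        for child in elem:
            visit(child, level + 1)
        counter += 1  # the end label is assigned after all descendants
        labels[slot] = (elem.tag, Label(start, counter, level))

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """a contains d when d's interval lies strictly inside a's interval."""
    return a.start < d.start and d.end < a.end


def is_parent(a, d):
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and d.level == a.level + 1


doc = ET.fromstring("<faculty><name/><RA><name/></RA></faculty>")
labels = label_tree(doc)
# faculty (1, 8, 0), name (2, 3, 1), RA (4, 7, 1), name (5, 6, 2)
```

Because the list comes back in start-label order, it mirrors the physical storage order described on this slide, and containment tests need only the labels, never the data.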

Page 11: TIMBER A Native XML Database

System Architecture: Index Storage

Indices in the Timber system:
  On attribute values
  On element content
  On tag name

Index structures return lists of (S, E, L) labels.
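As a rough picture of what "an index returns a list of (S, E, L) labels" means, the sketch below builds a tag-name index and an element-content index over a handful of labeled nodes. The node data, the in-memory dictionaries, and the function name `build_indices` are assumptions for illustration; the slides do not describe the actual index structures.

```python
from collections import defaultdict

# Hypothetical labeled nodes: (tag, text content, (start, end, level)).
nodes = [
    ("department", None, (1, 14, 0)),
    ("faculty", None, (2, 7, 1)),
    ("name", "Ann", (3, 4, 2)),
    ("secretary", "Bob", (5, 6, 2)),
    ("faculty", None, (8, 13, 1)),
    ("name", "Cy", (9, 10, 2)),
    ("RA", "Dee", (11, 12, 2)),
]


def build_indices(nodes):
    """Map each tag and each content value to its labels, sorted by start."""
    tag_index = defaultdict(list)
    content_index = defaultdict(list)
    for tag, text, label in sorted(nodes, key=lambda n: n[2][0]):
        tag_index[tag].append(label)
        if text is not None:
            content_index[text].append(label)
    return tag_index, content_index


tag_index, content_index = build_indices(nodes)
print(tag_index["faculty"])   # [(2, 7, 1), (8, 13, 1)]
print(content_index["Bob"])   # [(5, 6, 2)]
```

Keeping each list sorted by start label is a deliberate choice in this sketch: merge-style structural joins consume their inputs in that order.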

Page 12: TIMBER A Native XML Database

System Architecture: Metadata Storage

Use histograms for cost estimation

Timber is independent of XML schema

System Architecture: Query Processing

Page 13: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues
System Study

Page 14: TIMBER A Native XML Database

Tree Algebra - TAX

The Timber system develops a suite of operators suited to manipulating trees instead of tuples:
  Selection
  Projection
  Ordering
  Grouping
  Product
  Set Union
  Set Difference
  Renaming

Page 15: TIMBER A Native XML Database
Page 16: TIMBER A Native XML Database

Tree Algebra - TAX: Pattern Tree

XML: the components of a tree cannot be referenced by position or name!

Solution: pattern trees specify homogeneous tuples of node bindings.

A witness tree is produced for each combination of node bindings that matches the pattern.

A pattern tree can bind as many variables as there are nodes in the pattern tree, while XPath binds only one variable.

[Figure: a pattern tree and a witness tree]
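To make the idea of "one witness per combination of bindings" concrete, here is a small matcher. It is a naive recursive sketch, not how TIMBER evaluates patterns: the `Node` class, the tuple encoding of a pattern as `(variable, tag, [(edge, child pattern), ...])`, and the helper names are all assumptions made for this example. Edges are `"pc"` (parent-child) or `"ad"` (ancestor-descendant).

```python
from itertools import product


class Node:
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def all_nodes(node):
    yield node
    yield from descendants(node)


def embed(pattern, node):
    """Yield every binding {variable: data node} that embeds pattern at node."""
    var, tag, edges = pattern
    if node.tag != tag:
        return
    per_child = []
    for edge, child_pattern in edges:
        candidates = node.children if edge == "pc" else descendants(node)
        matches = [b for c in candidates for b in embed(child_pattern, c)]
        if not matches:
            return  # one unmatched pattern node rules out this embedding
        per_child.append(matches)
    # Every combination of child bindings gives a separate witness.
    for combo in product(*per_child):
        binding = {var: node}
        for child_binding in combo:
            binding.update(child_binding)
        yield binding


def witnesses(pattern, root):
    return [b for n in all_nodes(root) for b in embed(pattern, n)]


ra1, ra2 = Node("RA"), Node("RA")
fac1 = Node("faculty", [Node("name"), ra1, ra2])
fac2 = Node("faculty", [Node("name"), Node("TA")])
dept = Node("department", [fac1, fac2])

pattern = ("$1", "faculty", [("pc", ("$2", "RA", [])), ("pc", ("$3", "name", []))])
found = witnesses(pattern, dept)
print(len(found))  # 2: one witness per RA under the first faculty
```

Each witness binds all three variables at once, which is the contrast with XPath drawn on this slide.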

Page 17: TIMBER A Native XML Database

A pattern tree can also constrain element content, etc. (another example)

Page 18: TIMBER A Native XML Database

Tree Algebra - TAX: Selection

C: Collection
P: Pattern
SL: Selection List (lists nodes from P for which not just the nodes themselves, but all descendants, are to be returned in the output)

Output: the witness tree induced by each embedding of P into C, modified as prescribed by SL.

More than just a filter!

Order is preserved!

Page 19: TIMBER A Native XML Database

Tree Algebra - TAX: Projection

C: Collection
P: Pattern
PL: Projection List (a list of node labels from P, possibly with *)

Output: a projection can yield zero, one, or more output trees.

Page 20: TIMBER A Native XML Database

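Example - Projection

Pattern tree: $1 with children $2 and $3, each connected by a pc (parent-child) edge
  $1.tag = faculty & $2.tag = RA & $3.tag = name
PL: $1, $3

Input tree 1: faculty with children RA, name, and TA
  Projection output: faculty with child name

Input tree 2: faculty with children name and TA (no RA)
  Projection output: no match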
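The projection example can be replayed with a few lines of Python. This is a deliberately simplified sketch: it only tries to match the pattern at the root of each input tree and only handles a flat pattern (one root with pc children), and the function name `project` and its parameters are invented here rather than taken from TAX.

```python
# Trees are (tag, [children]) tuples. The defaults encode the slide's pattern:
# a faculty node ($1) with an RA child ($2) and a name child ($3); PL = $1, $3.
def project(tree, root_tag="faculty", required=("RA", "name"), keep=("name",)):
    tag, children = tree
    child_tags = {child[0] for child in children}
    if tag != root_tag or not set(required) <= child_tags:
        return None  # the pattern has no embedding, so nothing is output
    # Keep the root and only the children bound to a variable listed in PL.
    # Without a "*" in PL, the descendants of kept nodes are dropped.
    return (tag, [(child[0], []) for child in children if child[0] in keep])


tree1 = ("faculty", [("RA", []), ("name", []), ("TA", [])])
tree2 = ("faculty", [("name", []), ("TA", [])])

print(project(tree1))  # ('faculty', [('name', [])])
print(project(tree2))  # None
```

Note that $2 (RA) must be present for the pattern to match even though it is not in PL, which is why the second tree produces no output.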

Page 21: TIMBER A Native XML Database

Tree Algebra - TAX: Ordering

The Timber system treats pattern trees as unordered, except where ordering constraints are explicitly specified!

Page 22: TIMBER A Native XML Database

Tree Algebra - TAX: Grouping

C: Collection
P: Pattern
GB: Grouping basis (lists elements, by label in P, whose values are used to partition the set W of witness trees of P against the collection C)
OL: Ordering list (each entry is composed of an order direction and an element or element attribute, with values drawn from an ordered domain)

Output: the output tree Si corresponding to each group Wi of witness trees is shown on the next page.

With the use of grouping, we can produce a simpler and more efficient execution!

Grouping may not induce a partitioning.

Page 23: TIMBER A Native XML Database

Output tree Si:

tax_group_root
  tax_grouping_basis: one child for each element in the grouping basis
  tax_group_subroot: the roots of the input trees in C that correspond to Wi
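The shape of the grouping output, and the remark that grouping may not induce a partitioning, can be seen in a small sketch. The bibliography data and the function `group_by_author` are hypothetical; only the three node names (`tax_group_root`, `tax_grouping_basis`, `tax_group_subroot`) come from the slide.

```python
from collections import defaultdict

# Hypothetical input collection: one tree per article.
articles = [
    {"title": "T1", "authors": ["Jack", "Jill"]},
    {"title": "T2", "authors": ["Jack"]},
]


def group_by_author(articles):
    """Build one output tree per distinct author value."""
    groups = defaultdict(list)
    # Each (article, author) pair is one witness of the pattern article/author.
    for article in articles:
        for author in article["authors"]:
            groups[author].append(article)
    return [
        ("tax_group_root", [
            ("tax_grouping_basis", [("author", author)]),
            ("tax_group_subroot", members),
        ])
        for author, members in sorted(groups.items(), key=lambda kv: kv[0])
    ]


output = group_by_author(articles)
for _, (basis, subroot) in output:
    print(basis[1], [a["title"] for a in subroot[1]])
```

Article T1 has two authors, so it appears under both the Jack group and the Jill group: the groups overlap instead of partitioning the input.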

Page 24: TIMBER A Native XML Database

How can FLWR execution be made more efficient by using the grouping operator?

FOR $a IN distinct-values(document("bib.xml")//author)
RETURN
  <authorpubs>
    {$a}
    {
      FOR $b IN document("bib.xml")//article
      WHERE $a = $b/author
      RETURN $b/title
    }
  </authorpubs>

Page 25: TIMBER A Native XML Database

Algorithm:

1. Construct an initial pattern tree from the "inner" FLWR statement, consisting of the bound variables and their paths from the document root.
   Pattern tree: $1 connected to $2 (edge label in the figure: pc)
   $1.tag = doc_root & $2.tag = article

2. Construct the input for the GROUPBY operator.
   Pattern tree: $1 connected to $2 (edge label in the figure: pc)
   $1.tag = article & $2.tag = author

Page 26: TIMBER A Native XML Database

3. Apply the GROUPBY operator to the collection of trees generated in step 1.

[Figure: a TAX group root with two children: a TAX group basis holding an author node, and a TAX group subroot holding two article subtrees, each with title, year, and author children]

Page 27: TIMBER A Native XML Database

4. A projection is necessary to extract, from the intermediate grouping, the nodes needed for the outcome.

5. Use the rename operator to change the dummy root to the tag specified in the return clause.

Pattern tree (nodes $1 to $6):
  $1.tag = TAX group root &
  $2.tag = TAX grouping basis &
  $3.tag = TAX group subroot &
  $4.tag = author &
  $5.tag = article &
  $6.tag = title

PL: $1, $4*, $6*
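The payoff of the rewrite can be sketched outside XQuery. The Python below contrasts a direct reading of the nested FLWR query, which rescans all articles once per distinct author, with a single grouping pass followed by a projection onto titles. The data and function names are hypothetical, and the sketch only mirrors the logic of the query, not TIMBER's operators.

```python
# Hypothetical stand-in for bib.xml: (title, [authors]) per article.
bib = [
    ("T1", ["Jack", "Jill"]),
    ("T2", ["Jack"]),
    ("T3", ["Jill", "Joe"]),
]


def nested_loop(bib):
    """Direct reading of the query: one scan of all articles per author."""
    authors = sorted({a for _, auths in bib for a in auths})  # distinct values
    return [(a, [t for t, auths in bib if a in auths]) for a in authors]


def grouped(bib):
    """Group once by author, then keep only the titles (the projection step)."""
    groups = {}
    for title, auths in bib:
        for a in set(auths):
            groups.setdefault(a, []).append(title)
    return sorted(groups.items())


print(grouped(bib))
# [('Jack', ['T1', 'T2']), ('Jill', ['T1', 'T3']), ('Joe', ['T3'])]
```

Both functions return the same author-to-titles pairs, but the grouped version touches each article once instead of once per author.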

Page 28: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues

Page 29: TIMBER A Native XML Database

Query Evaluation

Physical Algebra
  Separation of physical algebra and logical algebra
  Pattern Tree Reuse
  Node Materialization

Structural Joins in Pattern Tree Matching

GroupBy

Page 30: TIMBER A Native XML Database

Query Evaluation: Physical Algebra: Pattern Tree Reuse

Query: find the secretary for each faculty?

Pattern tree (nodes $1 to $4):
  $1.tag = department & $2.tag = faculty & $3.tag = RA & $4.tag = name

Pattern tree (nodes $1, $2):
  Isroot($1) & $2.tag = secretary

Pattern tree (nodes $1, $2):
  $1.tag = PID1WID2 & $2.tag = secretary

[Figure labels: Selection, Projection]

Page 31: TIMBER A Native XML Database

Node Materialization
  The Timber system has a materialization operator in the physical algebra, which takes one or more node identifiers as input and returns the corresponding set of XML trees.
  Partial materialization is needed to minimize the size of the intermediate results being manipulated.

Page 32: TIMBER A Native XML Database

Query Evaluation: Structural Joins in Pattern Tree Matching

For performance reasons, a full database scan is not a viable way to find all the matches in a single pass.

Locating one node of each pattern match through an index and then scanning part of the database is better, but still expensive.

Timber: use all available indices to independently locate candidates for as many nodes of the pattern tree as possible.

Page 33: TIMBER A Native XML Database

Q: Seeking a faculty who has a secretary reporting to them

Page 34: TIMBER A Native XML Database

The Whole Stack-Tree Family of Structural Join Algorithms

[Figure: a stack-based structural join over an AList and a DList, showing the stack with push and merge steps]
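A stack-based structural join of this kind can be sketched as follows. This is one simplified member of the family (an ancestor-descendant join whose output is ordered by descendant), written from the general idea of merging two start-sorted lists with a stack; it is not TIMBER's implementation. `Label` and `stack_tree_join` are names chosen for this example, and the labels are hypothetical.

```python
from typing import NamedTuple


class Label(NamedTuple):
    start: int
    end: int
    level: int


def stack_tree_join(alist, dlist):
    """Return (ancestor, descendant) pairs; both inputs are sorted by start."""
    out, stack = [], []
    a = d = 0
    while d < len(dlist) and (a < len(alist) or stack):
        # The next node in document order is the one with the smaller start.
        take_anc = a < len(alist) and alist[a].start < dlist[d].start
        nxt = alist[a] if take_anc else dlist[d]
        # Ancestors that ended before this node cannot contain it or anything later.
        while stack and stack[-1].end < nxt.start:
            stack.pop()
        if take_anc:
            stack.append(nxt)  # push: nested inside everything still on the stack
            a += 1
        else:
            out.extend((anc, nxt) for anc in stack)  # every stacked node contains nxt
            d += 1
    return out


faculty = [Label(2, 7, 1), Label(8, 11, 1), Label(12, 19, 1)]
secretary = [Label(3, 4, 2), Label(14, 15, 3)]

pairs = stack_tree_join(faculty, secretary)
# Keep only parent-child pairs, e.g. a secretary directly under a faculty.
direct = [(f, s) for f, s in pairs if s.level == f.level + 1]
```

Each input list is read once and no data pages are touched: the join works on node identifiers alone, with the level label separating parent-child matches from deeper ones.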

Page 35: TIMBER A Native XML Database

Query Evaluation: GroupBy

RDBMS implementations of grouping rely on sorting (or hashing).

Grouping over tree structures does not necessarily partition the set. The Timber system therefore uses a pattern tree to identify the group-list nodes and thus produce all possible tuples of bindings. Sorting (or hashing) can then be performed on those tuples.

Page 36: TIMBER A Native XML Database

Query Optimization

Structural Join Order Selection
  In relational query processing, it is almost always a good idea to evaluate selections first.
  Not in XML! A structural join may sometimes be more selective than a selection predicate. Also, structural joins can be computed with node identifiers alone, while a selection predicate may require access to the actual data.
  The best fully pipelined evaluation plan is found by using the FP-Optimization algorithm.

Page 37: TIMBER A Native XML Database
Page 38: TIMBER A Native XML Database

Result Size Estimation
  An accurate estimate is needed of the cardinality of the final query result, as well as of each intermediate result, for each query plan!

Position Histogram

[Figure: position histograms for faculty and TA; X axis: START, Y axis: END]

Upper bound on the number of matches from the histograms = 2*2 + 1*3 = 7

Product of the cardinalities: 5 (faculty) * 3 (TA) = 15
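The reasoning behind a position-histogram bound can be sketched in code. The version below is a coarse illustration under stated assumptions, not the estimator from the slides: labels are bucketed into a grid of (start, end) cells, and a descendant cell is paired only with ancestor cells whose start bucket is not later and whose end bucket is not earlier. The labels, the bucket width of 10, and all function names are hypothetical, and the numbers do not reproduce the figure above.

```python
from collections import Counter


def histogram(labels, width):
    """Count (start, end) labels per grid cell of the given bucket width."""
    return Counter((s // width, e // width) for s, e in labels)


def upper_bound(anc_hist, desc_hist):
    """Pair each descendant cell only with cells that could hold its ancestors."""
    total = 0
    for (d_s, d_e), d_count in desc_hist.items():
        for (a_s, a_e), a_count in anc_hist.items():
            if a_s <= d_s and a_e >= d_e:
                total += a_count * d_count
    return total


def exact_matches(ancestors, descendants):
    return sum(1 for a in ancestors for d in descendants
               if a[0] < d[0] and d[1] < a[1])


# Hypothetical (start, end) labels: 5 faculty nodes and 3 TA nodes.
faculty = [(1, 8), (9, 14), (15, 30), (31, 34), (35, 38)]
tas = [(2, 3), (16, 17), (20, 21)]

bound = upper_bound(histogram(faculty, 10), histogram(tas, 10))
naive = len(faculty) * len(tas)
exact = exact_matches(faculty, tas)
print(exact, bound, naive)  # 3 5 15
```

As on the slide, using position information gives a much tighter bound than multiplying the two cardinalities, while never underestimating the true number of matches.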

Page 39: TIMBER A Native XML Database

Outline

Introduction
Motivations and Related Work
System Architecture
Tree Algebra
Query Evaluation
Query Optimization
Update Issues

Page 40: TIMBER A Native XML Database

Update Issue

Start and End labels? (floating-point numbers)

Changes in the sizes and numbers of elements could cause pages to overflow or underflow. Space management!
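One reading of the floating-point remark is that a new node can take start and end labels strictly between two existing label values, so existing nodes need no relabeling. The sketch below follows that reading only; the slides do not say how TIMBER actually assigns labels on update, and `labels_between` is a hypothetical helper.

```python
def labels_between(left, right):
    """Return (start, end) labels strictly between two existing label values."""
    if not left < right:
        raise ValueError("left must be smaller than right")
    gap = right - left
    start = left + gap / 3
    end = left + 2 * gap / 3
    if not (left < start < end < right):
        # Floating-point precision is used up; existing nodes must be relabeled.
        raise OverflowError("no room left between the labels")
    return start, end


# Hypothetical parent (1.0, 4.0) with one existing child (2.0, 3.0):
parent = (1.0, 4.0)
child = (2.0, 3.0)

# Insert a new last child between the existing child's end and the parent's end.
new_child = labels_between(child[1], parent[1])
print(new_child)
```

The gap shrinks with every insertion at the same spot, so repeated inserts eventually raise `OverflowError`; floating-point labels postpone relabeling but do not remove the need for it.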

Page 41: TIMBER A Native XML Database

DISCUSSIONS

Thank You!