TIMBER A Native XML Database
TIMBER: A Native XML Database
Xiali He
An Overview of the TIMBER System at the University of Michigan
Outline
  Introduction
  Motivations and Related Work
  System Architecture
  Tree Algebra
  Query Evaluation
  Query Optimization
  Updates Issue
Introduction
Why a native XML database?
  Mapping between XML data and an existing database has problems due to the flexible nature of XML:
    It results in an unnormalized relational representation.
    It results in a large number of tables.
Challenges in the TIMBER system:
  Start from scratch.
  Retain the natural structure, flexibility, and heterogeneity of XML data.
  Efficient processing on tree structures.
  Updates.
Reuse the existing database technologies:
  Transaction management facilities
  Declarative querying
  Set-at-a-time processing
Redesign and tailor certain components for the XML domain:
  Bulk algebra (TAX)
  Query evaluation
  Query optimization
Motivations and Related Work
Mapping techniques from tree-based XML data to a flat relational schema
  Problems:
    XML has a very rich tree structure; a relational database has a rigid table structure.
    A simple tree schema produces a complex relational schema with many tables.
    A simple XML query gets translated into expensive sequences of joins in the relational database.
Other direct XML data management systems:
  Procedural implementation
  Tuple-at-a-time processing
  Poor performance
  Built on top of object-oriented and semi-structured databases
System Architecture
  Data Storage
  Index Storage
  Metadata Storage
  Query Processing
TIMBER: An efficient XML database engine
Data Storage
Nodes in the TIMBER system:
  A node for each element
  A child node for each sub-element
  A child node for all attributes of an element
  A child node for the content of an element node
  A child node for all processing instructions and comments (in future)
Node identifier in the TIMBER system:
  (S, E, L): Start label, End label, Level label
Physical storage order:
  Nodes are sorted by the value of their start labels.
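The (S, E, L) labeling can be illustrated with a small sketch. It is not TIMBER's storage code: the function names and the sample document are assumptions made for illustration, and only element nodes are labeled (attribute and content nodes are left out). Start and end labels come from one depth-first traversal, so an ancestor's interval encloses the intervals of all its descendants.

```python
import xml.etree.ElementTree as ET

def label_nodes(root):
    """Assign (start, end, level) labels with one depth-first traversal."""
    labels = []      # (tag, start, end, level), kept in start-label order
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter
        slot = len(labels)
        labels.append(None)          # reserve the slot so start order is kept
        for child in elem:
            visit(child, level + 1)
        counter += 1                 # the end label is taken after all children
        labels[slot] = (elem.tag, start, counter, level)

    visit(root, 0)
    return labels

def is_ancestor(a, d):
    """a encloses d: a.start < d.start and d.end < a.end."""
    return a[1] < d[1] and d[2] < a[2]

def is_parent(a, d):
    """Parent-child is ancestor-descendant plus a level difference of one."""
    return is_ancestor(a, d) and d[3] == a[3] + 1

# Hypothetical sample document.
doc = ET.fromstring(
    "<department><faculty><name/><RA/></faculty><secretary/></department>"
)
labels = label_nodes(doc)
for row in labels:
    print(row)
```

With labels like these, structural relationships are decided by comparing numbers only, without touching the stored data.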
Index Storage
Indices in the TIMBER system:
  On attribute values
  On element content
  On tag name
The index structures return lists of (S, E, L) labels.
Metadata Storage
  Histograms are used for cost estimation.
  TIMBER is independent of the XML schema.
Query Processing
Tree Algebra - TAX
The TIMBER system develops a suite of operators suited to manipulating trees instead of tuples:
  Selection
  Projection
  Ordering
  Grouping
  Product
  Set Union
  Set Difference
  Renaming
Pattern Tree
  Problem: in XML, a component of a tree cannot be referenced by position or by name!
  Solution: pattern trees specify homogeneous tuples of node bindings.
  A witness tree is produced for each combination of node bindings that matches the pattern.
  A pattern tree can bind as many variables as it has nodes, while XPath binds only one variable.
Figure: a pattern tree and the corresponding witness tree.
A pattern tree can also refer to element content, etc. (another example).
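A minimal sketch of pattern tree matching may help here. It is an illustration only, not TIMBER's matcher: the classes `Node` and `PNode`, the function names, and the sample data are all assumptions. Each pattern node carries a tag predicate and an edge type to its parent (pc for parent-child, ad for ancestor-descendant), and every combination of node bindings that satisfies the pattern gives one witness, returned here as a dictionary from variable to data node.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    tag: str
    children: list = field(default_factory=list)

@dataclass(eq=False)
class PNode:
    var: str                  # variable name, e.g. "$1"
    tag: str                  # tag the bound data node must have
    edge: str = "pc"          # edge from the parent pattern node: "pc" or "ad"
    children: list = field(default_factory=list)

def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)

def all_nodes(root):
    yield root
    yield from descendants(root)

def embed(p, n):
    """Yield every binding {var: data node} that maps pattern node p onto n."""
    if p.tag != n.tag:
        return
    partial = [{p.var: n}]
    for child_p in p.children:
        if child_p.edge == "pc":
            candidates = list(n.children)
        else:
            candidates = list(descendants(n))
        # Extend each partial binding with each way of matching this child.
        partial = [
            {**binding, **sub}
            for binding in partial
            for cand in candidates
            for sub in embed(child_p, cand)
        ]
    yield from partial

def witnesses(pattern, root):
    """One binding (witness) per combination of nodes matching the pattern."""
    return [b for n in all_nodes(root) for b in embed(pattern, n)]

# Hypothetical data: the first faculty has two RAs, the second has none.
data = Node("department", [
    Node("faculty", [Node("RA"), Node("RA"), Node("name")]),
    Node("faculty", [Node("name"), Node("TA")]),
])
pattern = PNode("$1", "faculty", children=[PNode("$2", "RA"), PNode("$3", "name")])
result = witnesses(pattern, data)
print(len(result), "witness trees")
```

The first faculty yields two witnesses, one per RA, and the second yields none, which shows how one pattern binds several variables at once.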
Selection
Inputs:
  C: collection
  P: pattern
  SL: selection list (lists the nodes from P for which not just the nodes themselves, but all their descendants, are to be returned in the output)
Output: the witness tree induced by each embedding of P into C, modified as possibly prescribed in SL.
More than just a filter!
Order is preserved!
Projection
Inputs:
  C: collection
  P: pattern
  PL: projection list (a list of node labels from P, possibly with *)
Output: a projection can produce zero, one, or more output trees.
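Example - Projection
Pattern tree: $1 with children $2 and $3 (both edges pc)
  $1.tag = faculty &
  $2.tag = RA &
  $3.tag = name
PL: $1, $3
Input tree 1: faculty with children RA, name, and TA
  Projection result: faculty with child name
Input tree 2: faculty with children name and TA
  Projection result: no match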
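The projection example can be replayed with a short sketch. It hard-codes this one pattern (faculty with an RA child and a name child, PL = $1, $3) rather than implementing a general TAX projection, and the function name is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def project_faculty_name(tree):
    """Projection for the pattern $1=faculty, $2=RA, $3=name (pc edges), PL = $1, $3."""
    out = []
    for fac in tree.iter("faculty"):
        ras = fac.findall("RA")        # direct children only, i.e. pc edges
        names = fac.findall("name")
        if not ras or not names:
            continue                   # the pattern does not embed: no output tree
        kept = ET.Element("faculty")   # $1 is kept
        for _ in names:
            ET.SubElement(kept, "name")  # $3 is kept; RA ($2) and TA are dropped
        out.append(kept)
    return out

tree1 = ET.fromstring("<faculty><RA/><name/><TA/></faculty>")
tree2 = ET.fromstring("<faculty><name/><TA/></faculty>")
out1 = project_faculty_name(tree1)
out2 = project_faculty_name(tree2)
print([ET.tostring(t, encoding="unicode") for t in out1], out2)
```

The second input has no RA child, so the pattern has no embedding and the projection returns nothing for it.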
Ordering
The TIMBER system specifies pattern trees to be unordered, except where ordering constraints are explicitly specified!
Grouping
Inputs:
  C: collection
  P: pattern
  GB: grouping basis (lists elements by label in P, whose values are used to partition the set W of witness trees of P against the collection C)
  OL: ordering list (each entry is composed of an order direction and an element or element attribute, with values drawn from an ordered domain)
Output: one output tree Si for each group Wi of witness trees.
Grouping may not induce a partitioning.
With the use of grouping, we can produce a simpler and more efficient execution!
Output tree Si:
  tax_group_root
    tax_grouping_basis: one child for each element in the grouping basis
    tax_group_subroot: the roots of the input trees in C that correspond to Wi
How to make FLWR execution more efficient by using the grouping operator?
FOR $a IN distinct-values(document("bib.xml")//author)
RETURN
<authorpubs>
{$a}
{
FOR $b IN document("bib.xml")//article
WHERE $a = $b/author
RETURN $b/title
}
</authorpubs>
Algorithm:
1. Construct an initial pattern tree from the "inner" FLWR statement, consisting of the bound variables and their paths from the document root.
   Pattern tree: $1 with child $2
     $1.tag = doc_root &
     $2.tag = article
2. Construct the input for the GROUPBY operator.
   Pattern tree: $1 with child $2 (edge pc)
     $1.tag = article &
     $2.tag = author
3. Apply the GROUPBY operator to the collection of trees generated from step 1.
   Resulting tree: TAX group root with children TAX grouping basis (child: author) and TAX group subroot (children: article, article; each article has title, year, and author children)
4. A projection is necessary to extract, from the intermediate grouping, the nodes necessary for the outcome.
   Pattern tree: $1 with children $2 and $3; $4 under $2; $5 under $3; $6 under $5
     $1.tag = TAX group root &
     $2.tag = TAX grouping basis &
     $3.tag = TAX group subroot &
     $4.tag = author &
     $5.tag = article &
     $6.tag = title
   PL: $1, $4*, $6*
5. Use the rename operator to change the dummy root to the tag specified in the RETURN clause.
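The effect of the grouping plan can be sketched in a few lines. This is not TAX or TIMBER code: it imitates the end result of steps 1 to 5 with ordinary dictionaries, and the sample bibliography and the function name `authorpubs` are assumptions. The articles are read once and grouped by author, instead of re-scanning all articles for every distinct author as the nested FLWR expression suggests.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data standing in for bib.xml.
BIB = """<bib>
  <article><title>T1</title><author>Ann</author><author>Bob</author></article>
  <article><title>T2</title><author>Ann</author></article>
</bib>"""

def authorpubs(xml_text):
    root = ET.fromstring(xml_text)
    # Group step: one pass over (article, author) witness pairs.
    groups = defaultdict(list)
    for article in root.iter("article"):
        for author in article.findall("author"):
            groups[author.text].append(article)
    # Projection and rename steps: keep author and titles under <authorpubs>.
    result = []
    for author, articles in groups.items():
        node = ET.Element("authorpubs")
        ET.SubElement(node, "author").text = author
        for art in articles:
            for title in art.findall("title"):
                ET.SubElement(node, "title").text = title.text
        result.append(node)
    return result

out = authorpubs(BIB)
for node in out:
    print(ET.tostring(node, encoding="unicode"))
```

Article T1 has two authors, so it appears in both groups: grouping on tree-structured data does not necessarily partition the input.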
Query Evaluation
  Physical algebra
    Separation of physical algebra and logical algebra
    Pattern tree reuse
    Node materialization
  Structural joins in pattern tree matching
  GroupBy
Physical Algebra: Pattern Tree Reuse
Query: find the secretary for each faculty.
Figure: three pattern trees, used by a selection and a projection.
Pattern tree with nodes $1 to $4:
  $1.tag = department &
  $2.tag = faculty &
  $3.tag = RA &
  $4.tag = name
Pattern tree with nodes $1 and $2:
  Isroot($1) &
  $2.tag = secretary
Pattern tree with nodes $1 and $2:
  $1.tag = PID1WID2 &
  $2.tag = secretary
Node Materialization
  The TIMBER system has a materialization operator in the physical algebra, which takes node identifier(s) as input and returns the corresponding set of XML tree(s).
  Partial materialization is needed to minimize the size of the intermediate results being manipulated.
Structural Joins in Pattern Tree Matching
  A full database scan is a poor choice for performance reasons, and it cannot find all the matches in a single pass.
  Locating one node of each pattern match through an index and then scanning part of the database is better, but still expensive.
  TIMBER: use all available indices to independently locate candidates for as many nodes in the pattern tree as possible.
Q: Find a faculty member who has a secretary reporting to them.
The whole Stack-Tree family of structural join algorithms.
Figure: AList and DList of node labels combined through a stack, with push and merge steps.
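A stack-based structural join in the spirit of the Stack-Tree family can be sketched as follows. It is a simplified illustration, not the published algorithm or TIMBER's implementation: the function name, the `parent_child_only` flag, and the sample labels are assumptions, and the two input lists are assumed to hold different nodes. Both lists are sorted by start label, the stack holds the chain of ancestors that are still open at the current position, and each descendant is paired with everything on the stack.

```python
def stack_tree_join(alist, dlist, parent_child_only=False):
    """Join ancestors with descendants.

    alist, dlist: lists of (start, end, level) labels sorted by start.
    Returns (ancestor, descendant) pairs ordered by descendant.
    """
    out, stack = [], []
    a = d = 0
    while d < len(dlist):
        # Take whichever list has the node with the smaller start label.
        take_anc = a < len(alist) and alist[a][0] < dlist[d][0]
        nxt = alist[a] if take_anc else dlist[d]
        # Pop ancestors that ended before the next node starts.
        while stack and stack[-1][1] < nxt[0]:
            stack.pop()
        if take_anc:
            stack.append(nxt)          # push: a candidate ancestor is now open
            a += 1
        else:
            for anc in stack:          # merge: every open ancestor encloses nxt
                if not parent_child_only or nxt[2] == anc[2] + 1:
                    out.append((anc, nxt))
            d += 1
    return out

# Hypothetical labels: two faculty nodes and two secretary nodes.
faculty = [(2, 9, 1), (10, 13, 1)]
secretary = [(5, 6, 2), (14, 15, 1)]
pairs = stack_tree_join(faculty, secretary)
print(pairs)
```

Each input list is read once and only node labels are compared, so the join never needs the element data itself.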
GroupBy
  RDBMS implementations of grouping rely on sorting (or hashing).
  Grouping on tree structures does not necessarily partition the set. So the TIMBER system uses a pattern tree to identify the group list nodes and thus produce all possible tuples of bindings. Sorting (or hashing) can then be performed on these tuples.
Query Optimization
Structural join order selection
  In relational query processing, it is almost always a good idea to evaluate selections first.
  Not in XML! A structural join may sometimes be more selective than a selection predicate. Also, structural joins can be computed with node identifiers alone, while a selection predicate may require access to the actual data.
  The best fully pipelined evaluation plan is found by using the FP-Optimization algorithm.
Result size estimation
  An accurate estimate is needed of the cardinality of the final query result, as well as of each intermediate result, for each query plan!
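Position Histogram
Figure: position histograms for faculty and TA nodes, with start position on the X axis and end position on the Y axis.
Upper bound on the number of matches = 2*2 + 1*3 = 7
Compared with the product of the node counts: 5 (faculty) * 3 (TA) = 15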
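The idea behind a position histogram bound can be sketched with made-up data. This is a simplified bound, not the estimator TIMBER uses, and it does not reproduce the figure's numbers: the labels, the bucket width, and the function names are assumptions. Nodes are counted per grid cell of (start bucket, end bucket), and an ancestor cell can only pair with descendant cells whose start bucket is not smaller and whose end bucket is not larger.

```python
from collections import Counter

def position_histogram(labels, width):
    """Count nodes per grid cell (start bucket, end bucket)."""
    return Counter((s // width, e // width) for s, e in labels)

def upper_bound(anc_hist, desc_hist):
    """Sum count products over cell pairs that can contain ancestor-descendant pairs."""
    total = 0
    for (a_start, a_end), n_anc in anc_hist.items():
        for (d_start, d_end), n_desc in desc_hist.items():
            if d_start >= a_start and d_end <= a_end:
                total += n_anc * n_desc
    return total

# Hypothetical (start, end) labels.
faculty = [(2, 9), (10, 13), (21, 28)]
ta = [(5, 6), (25, 26), (31, 32)]

bound = upper_bound(position_histogram(faculty, 10), position_histogram(ta, 10))
naive = len(faculty) * len(ta)
actual = sum(1 for fs, fe in faculty for ts, te in ta if fs < ts and te < fe)
print(bound, naive, actual)
```

The histogram bound is much tighter than the plain product of the two node counts, which is the point the slide makes with 7 versus 15.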
Update Issue
  Start and end labels? (Use floating-point numbers.)
  Changes in the sizes and numbers of elements could cause pages to overflow or underflow. Space management!
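One way to read the floating-point suggestion is that a new node can take labels from the gap between two existing labels, so existing nodes need no relabeling. The sketch below assumes that reading; the function name and the sample labels are made up for illustration.

```python
def label_between(lo, hi):
    """Return (start, end) labels strictly inside the open gap (lo, hi)."""
    if not lo < hi:
        raise ValueError("no room between the two labels")
    third = (hi - lo) / 3
    return lo + third, lo + 2 * third

# Hypothetical case: parent faculty has labels (2, 9, 1) and a child name (3, 4, 2).
# A new child is inserted after name, in the gap between name's end label (4)
# and the parent's end label (9); its level is the parent's level plus one.
parent = (2, 9, 1)
start, end = label_between(4, parent[1])
new_node = (start, end, parent[2] + 1)
print(new_node)

# A second insert, placed before the node just added, reuses the smaller gap.
start2, end2 = label_between(4, start)
```

Each insert shrinks the remaining gap, so floating-point precision eventually runs out and some relabeling is still needed, in addition to the page space management noted above.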
DISCUSSIONS
Thank You!