Trees & Hierarchies in SQL (files.meetup.com/274991/TREES.pdf)

Trees & Hierarchies in SQL Joe Celko Copyright 2009




Trees in SQL

Trees are graph structures used to represent

– Hierarchies

– Parts explosions

– Organizational charts

Three major methods in SQL

– Adjacency list model

– Nested Sets Model

– Path enumeration


Trees in SQL -2

Trees are not hierarchies

– Hierarchies have subordination

– Kill your captain, you still have to take orders from your general

– Break an edge in a tree, and you have two or more disjoint trees.

This means an adjacency list model is a tree, but not a hierarchy

– It is also not normalized!


Trees as Nested Sets

[Figure: nested-sets diagram. The root set contains A0 and B0; A0 contains A1 and A2.]


Graphs as Tables

Nodes and edges are not the same kind of things

– Organizational chart & Personnel file

You should use separate tables for the structure and the elements

– You can put more than one structure table on the same elements – very useful for data mining and reporting


Adjacency List Model

CREATE TABLE AdjTree
(node_name CHAR(2) NOT NULL,
 parent_name CHAR(2), -- null is root
 << other data >>);

The structure and node data should be in different tables

– Nobody does this in practice

– Look at Oracle's Scott/Tiger sample


Adjacency List Model -2

Programmers do not add constraints:

– Trees have no cycles

– Number of edges = number of nodes – 1

– Look up others in any book on graph theory

The result is that adjacency list models get corrupted

– If you use stored procedures, they have to check every row for cycles

– Because the model has to be accessed procedurally, the corruption is not discovered


Adjacency List Model -3

The adjacency list model is not normalized

– Change ‘B0’ to ‘Bx’ - a single fact is changed

– You must change it in his node – one place

– You must also change it in his subordinates – many places!!

Try to write triggers or DRI actions for adjacency list model

– The longest cycle is the number of nodes in the entire tree


Adjacency List Model -4

Many SQL products do not support table-level constraints. Example:

CHECK ((SELECT COUNT(*) FROM AdjTree) - 1          -- number of nodes - 1
       = (SELECT COUNT(parent_name) FROM AdjTree)) -- number of edges (non-NULL parents)


Nested Sets as Numbers

Basic nested sets model for tree structure

– Does not show the nodes table

– This does not show all constraints

CREATE TABLE Tree
(node_id INTEGER NOT NULL
   REFERENCES Nodes(node_id),
 lft INTEGER NOT NULL,
 rgt INTEGER NOT NULL);


Problems with Adjacency list

You have to use cursors or self-joins to traverse the tree

Cursors are not a table -- their order has meaning -- Closure violation!

Cursors take MUCH longer than queries

Ten-level self-joins are worse than cursors


Problems with Path Enumeration

Path can get long in a deep tree

Great for searching down the tree, but not up the tree

– SELECT * FROM Tree WHERE path LIKE 'Root,%';

– SELECT * FROM Tree WHERE path LIKE '%,B0';

Inserting and deleting nodes is complicated

– Requires string manipulation to change all the paths beneath the insertion or deletion point
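The two search directions can be tried on a small sample. This is a minimal sketch using Python's sqlite3 module; the table name PathTree, its columns, and the five-node sample tree are assumptions made for illustration, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical path-enumeration table: each row stores the full path from the root.
conn.execute("CREATE TABLE PathTree (node TEXT PRIMARY KEY, path TEXT NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO PathTree (node, path) VALUES (?, ?)",
    [("Root", "Root"), ("A0", "Root,A0"), ("A1", "Root,A0,A1"),
     ("A2", "Root,A0,A2"), ("B0", "Root,B0")],
)

# Down the tree: a prefix match finds everything beneath A0.
subordinates = [row[0] for row in conn.execute(
    "SELECT node FROM PathTree WHERE path LIKE 'Root,A0,%' ORDER BY path")]

# Up the tree: every stored path that is a proper prefix of A1's path.
# The pattern is built from another row, so this is a join, not a simple prefix search.
superiors = [row[0] for row in conn.execute(
    """SELECT S.node
         FROM PathTree AS T, PathTree AS S
        WHERE T.node = 'A1'
          AND T.path LIKE S.path || ',%'
        ORDER BY LENGTH(S.path)""")]

print(subordinates, superiors)
```

The downward search needs only a constant prefix pattern, while the upward search has to compare A1's path against every other stored path.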


Problems with Nested Sets

Not good for traversals

– Too set oriented – great for hierarchical summaries

Inserting and deleting nodes is complicated

– Requires stored procedures for renumbering

– Not as bad as you think!

The rows are VERY short, so a lot of them fit onto a page

Math is simple


Tree Aggregates

Give me the total cost for all subtrees

– (root, 13.75) -- sum of every node in tree

– (A0, 7.25) -- sum of “A0” subtree

– (A1, 2.00)

– (A2, 3.50)

Dropping A2 would reduce all superior rows by 3.50, but would not change A1


Find Root of Tree

SELECT * FROM Tree WHERE lft = 1;

It helps to have an index on the lft column

The root's rgt value will be twice the number of nodes in the tree.

General rule: The number of nodes in any subtree is ((rgt - lft) + 1) / 2
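As a quick check of the root lookup and the subtree-size rule, here is a small sketch in Python with sqlite3. The node column and the five-node sample numbering (Root 1-10, A0 2-7, A1 3-4, A2 5-6, B0 8-9) are assumptions chosen to match the example tree used elsewhere in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Tree
                (node TEXT PRIMARY KEY,
                 lft INTEGER NOT NULL UNIQUE,
                 rgt INTEGER NOT NULL UNIQUE,
                 CHECK (lft < rgt))""")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)],
)

# The root is the row whose lft is 1.
root = conn.execute("SELECT node, lft, rgt FROM Tree WHERE lft = 1").fetchone()

# Nodes in each subtree, counting the subtree's own root: ((rgt - lft) + 1) / 2.
subtree_sizes = dict(conn.execute("SELECT node, ((rgt - lft) + 1) / 2 FROM Tree"))

node_count = conn.execute("SELECT COUNT(*) FROM Tree").fetchone()[0]
print(root, subtree_sizes, node_count)
```

With five nodes, the root's rgt comes out as 10, which is twice the node count.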


Find All Leaf Nodes

SELECT * FROM Tree WHERE lft = rgt - 1;

An index on lft will help

A covering index on (lft, rgt) is even better


Find Superiors of X

SELECT Sup.*
  FROM Tree AS T1, Tree AS Sup
 WHERE T1.node = 'X'
   AND T1.lft BETWEEN Sup.lft AND Sup.rgt;

This is the most important trick in this method

The BETWEEN predicates preserve subordination in the hierarchy

One query for any depth tree


Find Subordinates of X

SELECT Sub.*
  FROM Tree AS T1, Tree AS Sub
 WHERE T1.node = 'X'
   AND Sub.lft BETWEEN T1.lft AND T1.rgt;

This is the same pattern as finding superiors
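The superiors and subordinates queries can be run side by side on a sample tree. This sketch uses Python's sqlite3 module; the helper functions superiors and subordinates and the sample numbering are illustrative assumptions. Note that both BETWEEN patterns include the node itself in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)],
)

def superiors(node):
    """Every set that contains the node's lft value, from the root down."""
    return [row[0] for row in conn.execute(
        """SELECT Sup.node
             FROM Tree AS T1, Tree AS Sup
            WHERE T1.node = ?
              AND T1.lft BETWEEN Sup.lft AND Sup.rgt
            ORDER BY Sup.lft""", (node,))]

def subordinates(node):
    """Every node whose lft value falls inside the node's (lft, rgt) range."""
    return [row[0] for row in conn.execute(
        """SELECT Sub.node
             FROM Tree AS T1, Tree AS Sub
            WHERE T1.node = ?
              AND Sub.lft BETWEEN T1.lft AND T1.rgt
            ORDER BY Sub.lft""", (node,))]

print(superiors("A1"), subordinates("A0"))
```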


Find Depth of Tree

SELECT T1.node, COUNT(T2.node) AS lvl
  FROM Tree AS T1, Tree AS T2
 WHERE T1.lft BETWEEN T2.lft AND T2.rgt
 GROUP BY T1.node;

Count the containing nested sets for levels

The closer to the root a node is, the greater the value of (rgt - lft)


Totals by Level in Tree

SELECT T1.node, SUM(T2.cost) AS tot_level_cost
  FROM Tree AS T1, Tree AS T2
 WHERE T2.lft BETWEEN T1.lft AND T1.rgt
 GROUP BY T1.node;

Uses any aggregate function the same way
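The subtree totals from the Tree Aggregates slide can be reproduced with this query. In the sketch below (Python with sqlite3), the individual costs for Root (4.00), A0 (1.75), and B0 (2.50) are assumptions picked so that the totals match the figures quoted earlier; only the A1 and A2 costs and the two subtree totals come from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Tree
                (node TEXT PRIMARY KEY,
                 lft INTEGER NOT NULL,
                 rgt INTEGER NOT NULL,
                 cost REAL NOT NULL)""")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt, cost) VALUES (?, ?, ?, ?)",
    [("Root", 1, 10, 4.00), ("A0", 2, 7, 1.75), ("A1", 3, 4, 2.00),
     ("A2", 5, 6, 3.50), ("B0", 8, 9, 2.50)],
)

SUBTREE_TOTALS = """SELECT T1.node, SUM(T2.cost) AS tot_level_cost
                      FROM Tree AS T1, Tree AS T2
                     WHERE T2.lft BETWEEN T1.lft AND T1.rgt
                     GROUP BY T1.node"""

totals_before = dict(conn.execute(SUBTREE_TOTALS))

# Dropping A2 lowers every superior's total by 3.50 and leaves A1 alone.
conn.execute("DELETE FROM Tree WHERE node = 'A2'")
totals_after = dict(conn.execute(SUBTREE_TOTALS))

print(totals_before)
print(totals_after)
```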


CTEs and Adjacency List – 1

Part of SQL-99

– Oracle

– DB2

– SQL Server

– others

WITH RECURSIVE <temp table> <column list> AS
(<seed statement>
 UNION ALL
 <recursive statement>)
<statement>;


CTEs and Adjacency List – 2

Can do tree traversals and many other things

WITH RECURSIVE OrgChart (emp_id, mgr_id, lvl) AS
(SELECT P1.emp_id, P1.mgr_id, 0
   FROM Personnel AS P1
  WHERE P1.emp_id = :my_guy
 UNION ALL
 SELECT P2.emp_id, P2.mgr_id, (lvl + 1)
   FROM Personnel AS P2, OrgChart AS C1
  WHERE P2.mgr_id = C1.emp_id)
SELECT P3.emp_title, P3.emp_id, P3.mgr_id, C2.lvl
  FROM Personnel AS P3, OrgChart AS C2
 WHERE P3.emp_id = C2.emp_id
   AND P3.emp_id <> :my_guy;
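SQLite also accepts WITH RECURSIVE, so the traversal can be tried from Python's sqlite3 module. This is a sketch only: the Personnel rows and job titles are invented sample data, and an ORDER BY has been added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Personnel
                (emp_id INTEGER PRIMARY KEY,
                 mgr_id INTEGER REFERENCES Personnel (emp_id),  -- NULL for the top boss
                 emp_title TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO Personnel (emp_id, mgr_id, emp_title) VALUES (?, ?, ?)",
    [(1, None, "President"), (2, 1, "VP Sales"), (3, 1, "VP Engineering"), (4, 3, "Engineer")],
)

# Walk down from :my_guy, carrying the level, then drop :my_guy himself.
subordinates = conn.execute(
    """WITH RECURSIVE OrgChart (emp_id, mgr_id, lvl) AS
       (SELECT P1.emp_id, P1.mgr_id, 0
          FROM Personnel AS P1
         WHERE P1.emp_id = :my_guy
        UNION ALL
        SELECT P2.emp_id, P2.mgr_id, C1.lvl + 1
          FROM Personnel AS P2, OrgChart AS C1
         WHERE P2.mgr_id = C1.emp_id)
       SELECT P3.emp_title, P3.emp_id, P3.mgr_id, C2.lvl
         FROM Personnel AS P3, OrgChart AS C2
        WHERE P3.emp_id = C2.emp_id
          AND P3.emp_id <> :my_guy
        ORDER BY C2.lvl, P3.emp_id""",
    {"my_guy": 1},
).fetchall()

print(subordinates)
```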


Delete a Subtree

Remove subtree rooted at :my_node

DELETE FROM Tree
 WHERE lft BETWEEN
       (SELECT lft
          FROM Tree
         WHERE node = :my_node)
   AND (SELECT rgt
          FROM Tree
         WHERE node = :my_node);
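A small run of the subtree delete, sketched with Python's sqlite3 module on an assumed five-node sample tree. After the delete, the survivors keep their old numbers, which is where the gaps discussed later come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)],
)

# Remove A0 and everything nested inside its (lft, rgt) range.
conn.execute(
    """DELETE FROM Tree
        WHERE lft BETWEEN (SELECT lft FROM Tree WHERE node = :my_node)
                      AND (SELECT rgt FROM Tree WHERE node = :my_node)""",
    {"my_node": "A0"},
)

remaining = conn.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(remaining)
```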


Delete & Promote Oldest - 1

Delete A0 node

[Figure: tree before the delete. Root has children A0 and B0; A0 has children A1 and A2.]


Delete & Promote Oldest - 2

[Figure: tree after the delete. A1 takes A0's place with A2 beneath it; root has children A1 and B0.]


Delete & Promote Subtree - 1

Delete A0 node

[Figure: tree before the delete. Root has children A0 and B0; A0 has children A1 and A2.]


Delete & Promote Subtree - 2

[Figure: tree after the delete. A1, A2, and B0 are all children of root.]


Delete a Single Node

Method one - promote a child to the parent’s prior position in the tree.

– Oldest son inherits family business

– Requires business rules and stored procedures

Method two - subordinate the entire subtree to the grandparent.

– Orphans go live with grandmother

– This is the default in nested sets model

Renumbering is not required


Useful View

CREATE VIEW LftRgt (seq_nbr)
AS SELECT lft FROM Tree
   UNION ALL
   SELECT rgt FROM Tree;

You can use this to find gaps in the node numbering


Gaps in Nested Sets Model -1

Deleted nodes leave gaps in the numbering of the lft and rgt values.

Fill in gaps by sliding everyone over to the left until there are no gaps.

UPDATE Tree
   SET lft = (SELECT COUNT(*)
                FROM LftRgt
               WHERE seq_nbr <= Tree.lft),
       rgt = (SELECT COUNT(*)
                FROM LftRgt
               WHERE seq_nbr <= Tree.rgt);
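The renumbering idea can be sketched with Python's sqlite3 module. One deliberate difference from the single UPDATE above: this sketch first materializes the old-to-new mapping in a temporary table (Renumber, a hypothetical name), so the result does not depend on how a given engine evaluates subqueries against the table it is updating. The starting rows are the assumed sample tree after its A0 subtree was removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No UNIQUE constraints here, so transient duplicates during the update cannot fail.
conn.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("B0", 8, 9)],  # numbers 2 through 7 are missing
)
conn.execute("""CREATE VIEW LftRgt (seq_nbr)
                AS SELECT lft FROM Tree
                   UNION ALL
                   SELECT rgt FROM Tree""")

# Step 1: each old number maps to its rank among all lft/rgt values.
conn.execute("""CREATE TEMP TABLE Renumber AS
                SELECT L1.seq_nbr AS old_nbr,
                       (SELECT COUNT(*)
                          FROM LftRgt AS L2
                         WHERE L2.seq_nbr <= L1.seq_nbr) AS new_nbr
                  FROM LftRgt AS L1""")

# Step 2: apply the frozen mapping to both columns.
conn.execute("""UPDATE Tree
                   SET lft = (SELECT new_nbr FROM Renumber WHERE old_nbr = Tree.lft),
                       rgt = (SELECT new_nbr FROM Renumber WHERE old_nbr = Tree.rgt)""")

compacted = conn.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(compacted)
```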


Gaps in Nested Sets Model -2

ROW_NUMBER () function to close up gaps

WITH X (lr, seq_nbr)
AS
(SELECT lr, seq_nbr
   FROM (SELECT ROW_NUMBER() OVER (ORDER BY seq_nbr) AS lr, seq_nbr
           FROM LftRgt) AS Numbered
  WHERE lr <> seq_nbr)
UPDATE Tree
   SET lft = COALESCE((SELECT lr
                         FROM X
                        WHERE X.seq_nbr = Tree.lft), lft), -- keep values that have no gap before them
       rgt = COALESCE((SELECT lr
                         FROM X
                        WHERE X.seq_nbr = Tree.rgt), rgt);


Inserting into a Tree

The real trick is numbering the subtree correctly before inserting it.

The basic idea is to spread the Nested Sets numbers apart to make a gap the size of the subtree, then add the subtree.

The position of the subtree within the siblings of the new parent in the tree is another decision.
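For the simplest case, a single new leaf, the gap is two numbers wide. The sketch below (Python with sqlite3) adds a node named New as the rightmost child of A0; the node names, the sample numbering, and the choice of the rightmost sibling position are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No UNIQUE constraints, so the row-by-row shift cannot hit a transient duplicate.
conn.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)],
)

# The new leaf goes where the parent's rgt value currently sits.
parent_rgt = conn.execute("SELECT rgt FROM Tree WHERE node = ?", ("A0",)).fetchone()[0]

# Spread the numbers: everything at or to the right of that position moves up by 2.
conn.execute(
    """UPDATE Tree
          SET lft = CASE WHEN lft > :p THEN lft + 2 ELSE lft END,
              rgt = CASE WHEN rgt >= :p THEN rgt + 2 ELSE rgt END
        WHERE rgt >= :p""",
    {"p": parent_rgt},
)

# Drop the new leaf into the gap.
conn.execute("INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)",
             ("New", parent_rgt, parent_rgt + 1))

after_insert = conn.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(after_insert)
```

Inserting a whole subtree follows the same shape with a gap of twice the subtree's node count.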


Inserting into a Tree -2

If you frequently update the tree structure, then use a bigger spread in the numbering.

At higher levels, use steps of 100,000, then 10,000 and so forth.

Most SQL products can handle DECIMAL(p,s) with a precision of 30 or more digits.


Inserting into a Tree -3

[Figure: tree with nodes Root, A0, A1, A2, and B, plus a node labeled "New". Label: Slide everyone to the left]


Creating a Tree -1

If you want to have all the constraints for a proper hierarchy, then it is complicated.

CREATE TABLE Tree
(node_id INTEGER NOT NULL -- primary key optional
   REFERENCES Nodes(node_id)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
 UNIQUE (lft, rgt), -- redundant, but useful
 CHECK (lft < rgt)
);


Creating a Tree -2

Other needed constraints – no overlaps in the nodes

SELECT *
  FROM Tree AS T1
 WHERE EXISTS
       (SELECT *
          FROM Tree AS T2
         WHERE T1.lft BETWEEN T2.lft AND T2.rgt
           AND T1.rgt NOT BETWEEN T2.lft AND T2.rgt);
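The overlap test can be exercised on a valid tree and on one with a deliberately bad row. This is a sketch in Python with sqlite3; the helper overlapping_nodes and both sample data sets are assumptions for illustration.

```python
import sqlite3

OVERLAPS = """SELECT T1.node
                FROM Tree AS T1
               WHERE EXISTS
                     (SELECT *
                        FROM Tree AS T2
                       WHERE T1.lft BETWEEN T2.lft AND T2.rgt
                         AND T1.rgt NOT BETWEEN T2.lft AND T2.rgt)
               ORDER BY T1.lft"""

def overlapping_nodes(rows):
    """Load (node, lft, rgt) rows into a scratch table and return the offenders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Tree (node, lft, rgt) VALUES (?, ?, ?)", rows)
    return [row[0] for row in conn.execute(OVERLAPS)]

good = [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)]
# "Bad" starts inside A0 (6 is within 2..7) but ends outside it (9 is not).
bad = [("Root", 1, 10), ("A0", 2, 7), ("Bad", 6, 9)]

print(overlapping_nodes(good), overlapping_nodes(bad))
```

The query reports the row that starts inside another set and ends outside it; the set it straddles is not listed.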


Creating a Tree -3

Other needed constraints – no disjoint nodes

SELECT *
  FROM Tree AS T1
 WHERE T1.lft >          -- node starts after the root's set has closed
       (SELECT rgt
          FROM Tree
         WHERE lft = 1);


Creating a Tree -4

If you do not have triggers or CREATE ASSERTION, you can use an updatable view

CREATE VIEW GoodTree (node, lft, rgt)
AS
SELECT T1.node, T1.lft, T1.rgt
  FROM Tree AS T1
 WHERE NOT EXISTS (<overlaps>)
   AND NOT EXISTS (<disjoint>)
  WITH CHECK OPTION;


Converting an Adjacency Model into a Nested Sets Model

Current best method is to load nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers.

Adjacency list method does not order siblings; Nested Sets Model does this automatically

Classic push down stack algorithm works

You can keep both models in one table with a column for the immediate superior
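A host-language conversion along these lines might look like the following Python sketch. The function name adjacency_to_nested_sets, the (node, parent) input format, and the rule that siblings are numbered in input order are all assumptions; it uses plain recursion rather than an explicit push-down stack, so a very deep tree would need the stack version.

```python
from collections import defaultdict

def adjacency_to_nested_sets(rows):
    """Turn (node, parent) pairs into {node: (lft, rgt)}.

    The parent is None for the root. Siblings are numbered in the order
    in which they appear in the input.
    """
    children = defaultdict(list)
    root = None
    for node, parent in rows:
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    numbering = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter              # assigned on the way down
        for child in children[node]:
            visit(child)
        counter += 1
        numbering[node] = (lft, counter)  # rgt assigned on the way back up

    visit(root)
    return numbering

adjacency = [("Root", None), ("A0", "Root"), ("A1", "A0"), ("A2", "A0"), ("B0", "Root")]
nested = adjacency_to_nested_sets(adjacency)
print(nested)
```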


Converting a Nested Sets Model into an Adjacency Model

This is actually pretty straightforward; you can put it into a single view

SELECT B.emp AS boss, P.emp
  FROM OrgChart AS P
       LEFT OUTER JOIN
       OrgChart AS B
       ON B.lft
          = (SELECT MAX(lft)
               FROM OrgChart AS S
              WHERE P.lft > S.lft
                AND P.lft < S.rgt);
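Running the query on a sample shows the boss/employee pairs it produces. This is a sketch with Python's sqlite3 module; the OrgChart rows reuse the assumed five-node sample tree, and an ORDER BY is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrgChart (emp TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO OrgChart (emp, lft, rgt) VALUES (?, ?, ?)",
    [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)],
)

# The boss is the enclosing set with the largest lft, i.e. the tightest container.
edges = conn.execute(
    """SELECT B.emp AS boss, P.emp
         FROM OrgChart AS P
              LEFT OUTER JOIN
              OrgChart AS B
              ON B.lft = (SELECT MAX(S.lft)
                            FROM OrgChart AS S
                           WHERE P.lft > S.lft
                             AND P.lft < S.rgt)
        ORDER BY P.lft"""
).fetchall()

print(edges)
```

The root has no enclosing set, so the outer join gives it a NULL boss, matching the adjacency list convention that a NULL parent marks the root.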


Structure versus Contents

Nested Sets Model allows the structure of trees to be compared

– For each tree, find the lft value of its root node

– Make a canonical form and UNION ALL them

EXISTS (SELECT *
          FROM (SELECT (lft - lftmost), (rgt - lftmost)
                  FROM Tree1
                UNION ALL
                SELECT (lft - lftmost), (rgt - lftmost)
                  FROM Tree2) AS BothTrees (lft, rgt)
         GROUP BY BothTrees.lft, BothTrees.rgt
        HAVING COUNT(*) = 1);


Questions & Answers

?