9 trees iv

Post on 15-Jul-2015


TREES

Table of Contents:

• Heapsort

• B Tree

• Huffman’s Algorithm

Heap

• Suppose H is a complete binary tree with n elements. Then H is called a heap, or a maxheap, if each node N of H has the property that the value at N is greater than or equal to the value at each of the children of N.

• Analogously, a minheap is a heap such that the value at N is less than or equal to the value at each of its children.

[Figure: Example of Max Heap. The top three levels read 97; 88, 95; 66, 55, 87, 48.]

Inserting an Element in a Heap

Suppose H is a heap with N elements, and suppose an ITEM of information is given. We insert ITEM into the heap H as follows:

• First adjoin the ITEM at the end of H so that H is still a complete tree but not necessarily a heap.

• Then let the ITEM rise to its appropriate place in H so that H is finally a heap.

[A heap is implemented more efficiently with an array than with a linked list. With the root stored at position 1, the parent of the node at position PTR is at position ⌊PTR/2⌋.]

Build a Maxheap

Following are the elements:

44,30,50, 22,60,55,77,55

Algorithm: INSHEAP( TREE, N, ITEM)

A heap H with N elements is stored in the array TREE and an ITEM of information is given. This procedure inserts the ITEM as the new element of H. PTR gives the location of ITEM as it rises in the tree and PAR denotes the parent of ITEM

1. [Add new node to H and Initialize PTR]

Set N:= N +1 and PTR:=N

2. [Find Location to Insert ITEM]

Repeat steps 3 to 6 while PTR > 1

3. Set PAR:= ⌊PTR/2⌋ [Location of Parent node]

4. If ITEM ≤ TREE[PAR], then:

Set TREE[PTR]:=ITEM and Return

[End of If Structure]

5. Set TREE[PTR]:=TREE[PAR] [Moves node down]

6. Set PTR:=PAR [updates PTR]

[End of step 2 Loop]

7. Set TREE[1]:=ITEM

8. Return
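The INSHEAP steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `insheap`, the convention of leaving index 0 of the list unused (so that positions match the 1-based TREE array), and returning the new size instead of updating N in place are all assumptions of this sketch. It builds a maxheap from the elements listed under "Build a Maxheap".

```python
def insheap(tree, n, item):
    """Insert item into the maxheap stored in tree[1..n]; return the new size.

    tree is a Python list used with 1-based positions (tree[0] is unused),
    mirroring the TREE array of the pseudocode.
    """
    n += 1
    if len(tree) <= n:
        tree.append(None)          # make room for position n
    ptr = n
    while ptr > 1:
        par = ptr // 2             # floor(PTR/2): location of the parent
        if item <= tree[par]:
            tree[ptr] = item       # item has found its place
            return n
        tree[ptr] = tree[par]      # move the parent down
        ptr = par
    tree[1] = item                 # item rose all the way to the root
    return n


tree, n = [None], 0
for x in [44, 30, 50, 22, 60, 55, 77, 55]:
    n = insheap(tree, n, x)
print(tree[1:])  # [77, 55, 60, 50, 30, 44, 55, 22]
```

Each insertion moves at most one node per level, so it costs at most about log₂ n comparisons.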

Deleting the Root node in a heap

Suppose H is a heap with N elements and suppose we want to delete the root R of H. This is accomplished as follows:

• Assign the root R to some variable ITEM

• Replace the deleted node R by last node L of H so that H is still a complete tree but not necessarily a heap.

• Let L sink to its appropriate place in H so that H is finally a heap.

[Figure: the heap 95; 85, 70; 55, 33, 30, 65; 15, 20, 15, 22 (level by level), from which the root 95 is to be deleted]

Algorithm: DELHEAP( TREE, N , ITEM )

A heap H with N elements is stored in the array TREE. This algorithm assigns the root TREE[1] of H to the variable ITEM and then reheaps the remaining elements. The variable LAST stores the value of the original last node of H. The pointers PTR, LEFT and RIGHT give the Location of LAST and its left and right children as LAST sinks into the tree.

1: Set ITEM:=TREE[1] [removes root of H]

2: Set LAST:=TREE[N] and N:=N-1 [removes last node of H]

3: Set PTR:=1, LEFT:=2 and RIGHT:=3

4: Repeat step 5 to 7 while RIGHT ≤ N:

5: If LAST ≥ TREE[LEFT] and LAST ≥ TREE [RIGHT] , then:

Set TREE[PTR]:=LAST and Return

6: If TREE[RIGHT]≤ TREE[LEFT], then:

Set TREE[PTR]:=TREE[LEFT]

Set PTR:=LEFT

Else:

Set TREE[PTR]:=TREE[RIGHT] and PTR:=RIGHT

[End of If structure]

7: Set LEFT:= 2* PTR and RIGHT:=LEFT + 1

[End of Loop]

8: If LEFT=N and If LAST < TREE[LEFT], then:

Set TREE[PTR]:=TREE[LEFT] and Set PTR:=LEFT

9: Set TREE[PTR]:=LAST

10: Return
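A Python sketch of DELHEAP, under the same assumptions as the pseudocode (1-based positions, with index 0 of the list unused). The name `delheap` and the choice to return the deleted item together with the new size are conventions of this sketch only. The example deletes the root 95 from the heap 95; 85, 70; 55, 33, 30, 65; 15, 20, 15, 22.

```python
def delheap(tree, n):
    """Remove the root of the maxheap in tree[1..n]; return (item, new_n)."""
    item = tree[1]                 # the root being deleted
    last = tree[n]                 # value of the last node, which must sink
    n -= 1
    ptr, left, right = 1, 2, 3
    while right <= n:
        if last >= tree[left] and last >= tree[right]:
            tree[ptr] = last       # both children are smaller: done
            return item, n
        if tree[right] <= tree[left]:
            tree[ptr] = tree[left]     # promote the larger (left) child
            ptr = left
        else:
            tree[ptr] = tree[right]    # promote the larger (right) child
            ptr = right
        left = 2 * ptr
        right = left + 1
    if left == n and last < tree[left]:
        tree[ptr] = tree[left]     # node with a single (left) child
        ptr = left
    tree[ptr] = last
    return item, n


tree = [None, 95, 85, 70, 55, 33, 30, 65, 15, 20, 15, 22]
item, n = delheap(tree, 11)
print(item, tree[1:n + 1])  # 95 [85, 55, 70, 22, 33, 30, 65, 15, 20, 15]
```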

Application of Heap

HeapSort- One of the important applications of heap is sorting of an array using heapsort method. Suppose an array A with N elements is to be sorted. The heapsort algorithm sorts the array in two phases:

• Phase A: Build a heap H out of the elements of A

• Phase B: Repeatedly delete the root element of H

Since the root of a maxheap contains the largest element of the heap, phase B deletes the elements in decreasing order; storing each deleted element in the position just freed at the end of the array leaves A sorted in increasing order. In a minheap the root is the smallest element, so the elements are deleted in increasing order instead.

Algorithm: HEAPSORT(A,N)

An array A with N elements is given. This algorithm sorts the elements of the array

• Step 1: [Build a heap H, using the procedure INSHEAP]

Repeat for J=1 to N-1:

Call INSHEAP(A, J, A[J+1])

[End of Loop]

• Step 2: [Sort A by repeatedly deleting the root of H]

Repeat while N > 1:

(a) Call DELHEAP( A, N, ITEM)

(b) Set A[N + 1] := ITEM [Store the elements deleted from the heap]

[End of loop]

• Step 3: Exit
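The two phases of HEAPSORT can be sketched in Python as below. The helper names and the 1-based copy of the input are assumptions of this sketch, and the two helpers are written in a condensed form that is intended to behave like INSHEAP and DELHEAP rather than to transcribe them step by step.

```python
def insheap(tree, n, item):
    """Let item rise from position n + 1 of the maxheap tree[1..n]."""
    ptr = n + 1
    while ptr > 1 and item > tree[ptr // 2]:
        tree[ptr] = tree[ptr // 2]     # move the parent down
        ptr //= 2
    tree[ptr] = item
    return n + 1


def delheap(tree, n):
    """Delete the root of the maxheap tree[1..n]; return (item, new_n)."""
    item, last = tree[1], tree[n]
    n -= 1
    ptr = 1
    while 2 * ptr <= n:
        child = 2 * ptr
        if child + 1 <= n and tree[child + 1] > tree[child]:
            child += 1                 # pick the larger child
        if last >= tree[child]:
            break
        tree[ptr] = tree[child]        # promote the larger child
        ptr = child
    tree[ptr] = last
    return item, n


def heapsort(values):
    a = [None] + list(values)          # 1-based working copy
    n = len(values)
    for j in range(1, n):              # Phase A: build the heap
        insheap(a, j, a[j + 1])
    while n > 1:                       # Phase B: repeatedly delete the root
        item, n = delheap(a, n)
        a[n + 1] = item                # store it in the slot just freed
    return a[1:]


print(heapsort([44, 30, 50, 22, 60, 55, 77, 55]))
# [22, 30, 44, 50, 55, 55, 60, 77]
```

Because the largest remaining element is written to the end of the array each time, the maxheap yields an array in increasing order.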

Complexity of HeapSort

• Phase 1 (Build a heap H out of the n elements of A):

g(n) ≤ n log₂ n

• Phase 2 (Repeatedly delete the root element of H):

h(n) ≤ n log₂ n

Therefore, f(n) = O(n log₂ n) (in the worst case)

• This worst case is better than that of Bubblesort (O(n²)) and Quicksort (average O(n log₂ n), worst O(n²))

• Problem: Create a Heap out of the following data:

jan feb mar apr may jun jul aug sept oct nov dec

B-Trees

Motivation for B-Trees

• Index structures for large datasets may be too big to hold in main memory

• Storing them on disk requires a different approach to efficiency

• Assuming that a disk spins at 3600 RPM, one revolution occurs in 1/60 of a second, or 16.7ms

• Crudely speaking, one disk access takes about the same time as 200,000 instructions

Motivation (cont.)

• Assume that we use an AVL tree to store about 20 million records

• We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so, assuming an average wait of half a revolution (about 8.3 ms) per access, a search takes about 0.2 seconds

• We know we can’t improve on the log n lower bound on search for a binary tree

• But, the solution is to use more branches and thus reduce the height of the tree!

– As branching increases, depth decreases
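The arithmetic behind these figures can be checked with a few lines of Python. The half-revolution average wait per access and the sample branching factors are assumptions made for illustration; they are not values given in the slides.

```python
import math

records = 20_000_000
revolution_ms = 60_000 / 3600            # one revolution at 3600 RPM, in ms
binary_levels = math.log2(records)       # levels in a balanced binary tree
avg_wait_ms = revolution_ms / 2          # assumption: half a revolution per access

print(round(revolution_ms, 1))                           # 16.7
print(round(binary_levels))                              # 24
print(round(binary_levels * avg_wait_ms / 1000, 1))      # 0.2


def levels(branching):
    """Levels needed to reach `records` keys with the given branching factor."""
    return math.ceil(math.log(records, branching))


# Hypothetical branching factors, showing how the depth shrinks
for m in (2, 10, 100, 1000):
    print(m, levels(m))
```

With a branching factor in the hundreds, the same 20 million records are reachable in a handful of disk accesses rather than about 24.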

Definition of a B-tree

• A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:

1. the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree

2. all leaves are on the same level

3. all non-leaf nodes except the root have at least ⌈m / 2⌉ children

4. the root is either a leaf node, or it has from two to m children

5. a leaf node contains no more than m – 1 keys

• The number m is taken to be odd here, so that an over-full node splits evenly around its middle key

An example B-Tree

Root: [26]

Internal nodes: [6 12] [42 51 62]

Leaves: [1 2 4] [7 8] [13 15 18 25] [27 29] [45 46 48 53] [55 60] [64 70 90]

A B-tree of order 5 containing 26 items

Note that all the leaves are at the same level

Constructing a B-tree

• Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45

• We want to construct a B-tree of order 5

• The first four items go into the root:

[1 2 8 12]

• To put the fifth item in the root would violate condition 5

• Therefore, when 25 arrives, pick the middle key to make a new root

Constructing a B-tree (contd.)

Root: [8]

Leaves: [1 2] [12 25]

6, 14, 28 get added to the leaf nodes:

Root: [8]

Leaves: [1 2 6] [12 14 25 28]

Constructing a B-tree (contd.)

Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf

Root: [8 17]

Leaves: [1 2 6] [12 14] [25 28]

7, 52, 16, 48 get added to the leaf nodes

Root: [8 17]

Leaves: [1 2 6 7] [12 14 16] [25 28 48 52]

Constructing a B-tree (contd.)

Adding 68 causes us to split the rightmost leaf, promoting 48 to the root, and adding 3 causes us to split the leftmost leaf, promoting 3 to the root; 26, 29, 53, 55 then go into the leaves

Root: [3 8 17 48]

Leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]

Adding 45 causes a split of 25 26 28 29, and promoting 28 to the root then causes the root to split

Constructing a B-tree (contd.)

Root: [17]

Internal nodes: [3 8] [28 48]

Leaves: [1 2] [6 7] [12 14 16] [25 26] [29 45] [52 53 55 68]

Inserting into a B-Tree

• Attempt to insert the new key into a leaf

• If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent

• If this would result in the parent becoming too big, split the parent into two, promoting the middle key

• This strategy might have to be repeated all the way to the top

• If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
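The split-and-promote strategy above can be sketched in Python as follows. The class `Node`, the functions `insert`, `_insert` and `levels`, and the list-based node layout are all hypothetical names chosen for this illustration; duplicate keys and deletion are not handled. The example replays the key sequence from the construction walkthrough, taking the sixth key as 6 as in the figures.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty list for a leaf


def _insert(node, key, order):
    """Insert key below node; return (middle_key, new_right_node) on a split."""
    if not node.children:                  # leaf: the key goes here
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)    # child whose range contains key
        split = _insert(node.children[i], key, order)
        if split is not None:              # child split: absorb promoted key
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) < order:             # at most order - 1 keys: no overflow
        return None
    m = len(node.keys) // 2                # middle key is promoted
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return mid, right


def insert(root, key, order=5):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, order)
    if split is None:
        return root
    mid, right = split
    return Node([mid], [root, right])      # the tree grows one level higher


def levels(root):
    """Keys of every node, level by level."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


root = Node()
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    root = insert(root, k)
for row in levels(root):
    print(row)
```

The printed rows should match the final tree of the walkthrough: root 17, internal nodes 3 8 and 28 48, and six leaves all on the same level.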

Exercise in Inserting into a B-Tree

• Insert the following keys into a 5-way B-tree:

3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

Removal from a B-tree

• During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are four cases to consider:

CASE 1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.

CASE 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf. In this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position.

Removal from a B-tree (2)

• If (1) or (2) leads to a leaf node containing fewer than the minimum number of keys, then we have to look at the siblings immediately adjacent to the leaf in question:

CASE 3 - If one of them has more than the minimum number of keys, then we can promote one of its keys to the parent and take the parent key into our lacking leaf.

CASE 4 - If neither of them has more than the minimum number of keys, then the lacking leaf and one of its neighbours can be combined with their shared parent key (the opposite of promoting a key), and the new leaf will have the correct number of keys. If this step leaves the parent with too few keys, then we repeat the process up to the root itself, if required.

Type #1: Simple leaf deletion

Assuming a 5-way B-Tree, as before...

Root: [12 29 52]

Leaves: [2 7 9] [15 22] [31 43] [56 69 72]

Delete 2: Since there are enough keys in the node, just delete it

Type #2: Simple non-leaf deletion

Root: [12 29 52]

Leaves: [7 9] [15 22] [31 43] [56 69 72]

Delete 52: Borrow the predecessor or (in this case) successor, 56

Root: [12 29 56]

Leaves: [7 9] [15 22] [31 43] [69 72]

Type #4: Too few keys in node and its siblings

Root: [12 29 56]

Leaves: [7 9] [15 22] [31 43] [69 72]

Delete 72: Too few keys! Join back together

Root: [12 29]

Leaves: [7 9] [15 22] [31 43 56 69]

Type #3: Enough siblings

Root: [12 29]

Leaves: [7 9] [15 22] [31 43 56 69]

Delete 22: Demote root key and promote leaf key

Root: [12 31]

Leaves: [7 9] [15 29] [43 56 69]

Deletion Exercise

Given a B-tree of order 5 [figure not reproduced]

• Delete 95 and 226 [figure: result after deletion of 95 and 226]

• Delete 221 [figure: result after deletion of 221]

• Delete 70 [figure: B-tree after deletion of 70]

Exercise to do

• Given 5-way B-tree created by these data (last exercise):

3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

• Then add the following keys:

– 2, 6,12

• Delete the following keys:

– 4, 5, 7, 3, 14

Comparing Trees

• Binary trees

– Can become unbalanced and lose their good time complexity (big O)

– AVL trees are strict binary trees that overcome the balance problem

– Heaps remain balanced but only prioritise (not order) the keys

• Multi-way trees

– B-Trees are m-way: a node can have up to m children (with m odd in these notes)

– One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations

Huffman Coding: An Application of Binary Trees and Priority Queues

Encoding and Compression of Data

• ASCII

• Variations on ASCII

– min number of bits needed

– cost of savings

– patterns

– modifications

Purpose of Huffman Coding

• Proposed by Dr. David A. Huffman in 1952

– “A Method for the Construction of Minimum Redundancy Codes”

• Applicable to many forms of data transmission

– Our example: text files

The Basic Algorithm

• Huffman coding is a form of statistical coding

• Not all characters occur with the same frequency!

• Yet all characters are allocated the same amount of space

– 1 char = 1 byte, be it e or x

The Basic Algorithm

• Any savings in tailoring codes to frequency of character?

• Code word lengths are no longer fixed like ASCII.

• Code word lengths vary and will be shorter for the more frequently used characters.

The (Real) Basic Algorithm

1. Scan text to be compressed and tally occurrence of all characters.

2. Sort or prioritize characters based on number of occurrences in text.

3. Build Huffman code tree based on prioritized list.

4. Perform a traversal of tree to determine all code words.

5. Scan text again and create new file using the Huffman codes.

Building a Tree: Scan the original text

• Consider the following short text:

Eerie eyes seen near lake.

• Count up the occurrences of all characters in the text

• What characters are present?

E e r i space y s n a l k .

• What is the frequency of each character in the text?

Char   Freq
E      1
e      8
r      2
i      1
space  4
y      1
s      2
n      2
a      2
l      1
k      1
.      1
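The tally can be reproduced with Python's `collections.Counter`. This is only a sketch of the scanning step; the variable names are illustrative. The stable sort by frequency keeps ties in order of first appearance in the text.

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)                       # character -> number of occurrences

# Ascending frequency; ties stay in order of first appearance
ordered = sorted(freq.items(), key=lambda kv: kv[1])
for ch, f in ordered:
    print(repr(ch), f)

print(sum(freq.values()))  # 26
```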

Building a Tree: Prioritize characters

• Create binary tree nodes with character and frequency of each character

• Place nodes in a priority queue

– The lower the occurrence, the higher the priority in the queue

Building a Tree

• The queue after inserting all nodes

• Null Pointers are not shown

E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8

Building a Tree

• While priority queue contains two or more nodes

– Create new node

– Dequeue node and make it left subtree

– Dequeue next node and make it right subtree

– Frequency of new node equals sum of frequency of left and right children

– Enqueue new node back into queue
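The loop above can be sketched with Python's `heapq` module as the priority queue. The representation is an assumption of this sketch: leaves are single characters, internal nodes are `(left, right)` tuples, and a running counter breaks ties so that, among equal frequencies, the entry that has been in the queue longest is dequeued first.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Return (root_frequency, tree, merge_weights) for the given text."""
    freq = Counter(text)
    tie = count()                          # tie-breaker: older entries first
    heap = [(f, next(tie), ch)
            for ch, f in sorted(freq.items(), key=lambda kv: kv[1])]
    heapq.heapify(heap)
    merges = []
    while len(heap) >= 2:
        f1, _, left = heapq.heappop(heap)      # becomes the left subtree
        f2, _, right = heapq.heappop(heap)     # becomes the right subtree
        merges.append(f1 + f2)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    total, _, tree = heap[0]
    return total, tree, merges


total, tree, merges = build_huffman_tree("Eerie eyes seen near lake.")
print(total)   # 26
print(merges)  # [2, 2, 2, 4, 4, 4, 6, 8, 10, 16, 26]
```

Other tie-breaking rules give differently shaped trees, but the total encoded length is the same for any of them.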

Building a Tree

[Figures: successive states of the priority queue, one merge per slide. The merges, in order:]

• E (1) + i (1) → 2

• y (1) + l (1) → 2

• k (1) + . (1) → 2

• r (2) + s (2) → 4

• n (2) + a (2) → 4

• (E i) (2) + (y l) (2) → 4

• (k .) (2) + sp (4) → 6

What is happening to the characters with a low number of occurrences?

• (r s) (4) + (n a) (4) → 8

• (E i y l) (4) + (k . sp) (6) → 10

• e (8) + (r s n a) (8) → 16

• 10 + 16 → 26

After enqueueing this node there is only one node left in priority queue.

Building a Tree

Dequeue the single node left in the queue.

This tree contains the new code words for each character.

Frequency of root node should equal number of characters in text.

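[Figure: the completed Huffman tree, shown here as an indented outline with the left child listed first]

26
  10
    4
      2: E (1), i (1)
      2: y (1), l (1)
    6
      2: k (1), . (1)
      sp (4)
  16
    e (8)
    8
      4: r (2), s (2)
      4: n (2), a (2)

Eerie eyes seen near lake. → 26 characters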

Encoding the File: Traverse Tree for Codes

• Perform a traversal of the tree to obtain new code words

• Going left is a 0; going right is a 1

• A code word is only completed when a leaf node is reached

[Figure: the completed Huffman tree]

Encoding the File: Traverse Tree for Codes

Char   Code
E      0000
i      0001
y      0010
l      0011
k      0100
.      0101
space  011
e      10
r      1100
s      1101
n      1110
a      1111
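The traversal can be sketched as a short recursive function. The tree is written out here as nested `(left, right)` tuples with characters at the leaves; that representation and the name `assign_codes` are assumptions of this sketch, and the tree literal is the one implied by the code table above.

```python
# The Huffman tree for "Eerie eyes seen near lake." as nested (left, right) pairs
TREE = (
    ((("E", "i"), ("y", "l")), (("k", "."), " ")),   # subtree of weight 10
    ("e", (("r", "s"), ("n", "a"))),                 # subtree of weight 16
)


def assign_codes(node, prefix="", table=None):
    """Walk the tree: left appends '0', right appends '1', leaves get a code."""
    if table is None:
        table = {}
    if isinstance(node, str):              # leaf: the code word is complete
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table


codes = assign_codes(TREE)
for ch, code in codes.items():
    print(repr(ch), code)
```

The most frequent character, e, ends up with the shortest code word (2 bits), while characters that occur once get 4 bits.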

Encoding the File

• Rescan text and encode file using new code words

Eerie eyes seen near lake.

000010110000011001110001010110101111011010111001111101011111100011001111110100100101

• Why is there no need for a separator character?

Encoding the File: Results

• Have we made things any better?

• 84 bits to encode the text

• ASCII would take 8 * 26 = 208 bits

• If a modified fixed-length code were used, 4 bits per character would be needed (there are 12 distinct characters), for a total of 4 * 26 = 104 bits. The savings are then not as great.
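The bit counts can be checked by encoding the text with the code table. The dictionary below copies the table from the slides; the helper `is_prefix_free` is a hypothetical name for a check of the property that makes a separator unnecessary, namely that no code word is the beginning of another.

```python
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}


def is_prefix_free(codes):
    """True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


text = "Eerie eyes seen near lake."
bits = "".join(CODES[ch] for ch in text)

print(len(bits))              # 84 bits with the Huffman codes
print(8 * len(text))          # 208 bits with 8-bit characters
print(4 * len(text))          # 104 bits with a fixed 4-bit code
print(is_prefix_free(CODES))  # True
```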

Decoding the File

• How does receiver know what the codes are?

• Tree constructed for each text file.

– Considers frequency for each file

– Big hit on compression, especially for smaller files

• Tree predetermined

– based on statistical analysis of text files or file types

• Data transmission is bit based versus byte based

Decoding the File

• Once receiver has tree it scans incoming bit stream

• 0 ⇒ go left

• 1 ⇒ go right

[Figure: the completed Huffman tree]

10100011011110111101111110000110101
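Decoding can be sketched by rebuilding the tree from the code table and walking it bit by bit, restarting at the root whenever a leaf is reached. The nested-dictionary tree and the function names are assumptions of this sketch; the code table is the one from the slides.

```python
CODES = {
    "E": "0000", "i": "0001", "y": "0010", "l": "0011",
    "k": "0100", ".": "0101", " ": "011", "e": "10",
    "r": "1100", "s": "1101", "n": "1110", "a": "1111",
}


def build_tree(codes):
    """Rebuild the code tree as nested dicts keyed by '0' (left) and '1' (right)."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = ch                # the last bit leads to the leaf
    return root


def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]                   # 0: go left, 1: go right
        if isinstance(node, str):          # leaf reached: emit and restart
            out.append(node)
            node = root
    return "".join(out)


root = build_tree(CODES)
print(decode("10100011011110111101111110000110101", root))  # eel snarl.
```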

Summary

• Huffman coding is a technique used to compress files for transmission

• Uses statistical coding

– more frequently used symbols have shorter code words

• Works well for text and fax transmissions

• An application that uses several data structures

HUFFMAN’S Algorithm

Data item   A   B   C   D   E   F   G   H
Weight      22  5   11  19  2   11  25  5

[Figures not recoverable: trees over the data items A to H, and a tree with nodes labelled P, Q, R, S, T, W, X, Y, Z]