Brought to you by Max (ICQ:31252512 TEL:61337706) February 5, 2005 Advanced Data Structures...


Advanced Data Structures: Introduction


Outline

• Review of some data structures
  - Array
  - Linked List
  - Sorted Array
• New stuff
  - 3 of the most important data structures in OI (and your own programming)
  - Binary Search Tree
  - Heap (Priority Queue)
  - Hash Table


Review

• How to measure the merits of a data structure?
• Time complexity of common operations
  - Function Find(T : DataType) : Element
  - Function Find_Min() : Element
  - Procedure Add(T : DataType)
  - Procedure Remove(E : Element)
  - Procedure Remove_Min()


Review - Array

• Here Element is simply the integer index of the array cell
• Find(T)
  - Must scan the whole array, O(N)
• Find_Min()
  - Also needs to scan the whole array, O(N)
• Add(T)
  - Simply add it to the end of the array, O(1)
• Remove(E)
  - Deleting an element creates a hole
  - Copy the last element to fill the hole, O(1)
• Remove_Min()
  - Need to Find_Min() then Remove(), O(N)
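The array operations above can be sketched in Python as follows. This is only an illustrative sketch: the class name `UnsortedArray` and its method names are assumptions made for the example, not part of the original slides.

```python
class UnsortedArray:
    """Unsorted array; an Element is the integer index of a cell."""

    def __init__(self):
        self.cells = []

    def add(self, value):
        # Append at the end: O(1) (amortized for a Python list).
        self.cells.append(value)
        return len(self.cells) - 1

    def find(self, value):
        # Linear scan: O(N). Returns the index, or -1 if absent.
        for i, v in enumerate(self.cells):
            if v == value:
                return i
        return -1

    def find_min(self):
        # Linear scan for the smallest value: O(N). Assumes a non-empty array.
        return min(range(len(self.cells)), key=self.cells.__getitem__)

    def remove(self, index):
        # Fill the hole with the last element, then drop the last cell: O(1).
        self.cells[index] = self.cells[-1]
        self.cells.pop()

    def remove_min(self):
        # Find_Min() then Remove(): O(N) overall.
        self.remove(self.find_min())
```

Note that the O(1) Remove() changes the order of the remaining elements, which is acceptable only because the array is unsorted.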


Review - Linked List

• Element is a pointer to the object
• Find(T)
  - Scan the whole list, O(N)
• Find_Min()
  - Scan the whole list, O(N)
• Add(T)
  - Just add it to a convenient position (e.g. head), O(1)
• Remove(E)
  - With suitable implementation, O(1)
• Remove_Min()
  - Need to Find_Min() then Remove(), O(N)
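One "suitable implementation" for an O(1) Remove() is a doubly linked list, where each node knows both neighbours. The sketch below assumes that choice; the names `ListNode` and `LinkedList` are hypothetical and chosen only for illustration.

```python
class ListNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def add(self, value):
        # Insert at the head: O(1). The returned node is the Element.
        node = ListNode(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        return node

    def find(self, value):
        # Walk the list: O(N). Returns the node, or None if absent.
        node = self.head
        while node is not None and node.value != value:
            node = node.next
        return node

    def remove(self, node):
        # Unlink through the prev/next pointers: O(1), no scan needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev

    def to_list(self):
        # Helper for inspection only: collect values from head to tail.
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

With a singly linked list, Remove(E) would first need the predecessor of E, which costs a scan unless it is tracked separately.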


Review - Sorted Array

• Like array, Element is the integer index of the cell
• Find(T)
  - We can use binary search, O(logN)
• Find_Min()
  - The first element must be the minimum, O(1)
• Add(T)
  - First we need to find the correct place, O(logN)
  - Then we need to shift the rest of the array by 1 cell, O(N)
• Remove(E)
  - Deleting an element creates a hole
  - Need to shift the rest of the array by 1 cell, O(N)
• Remove_Min()
  - Can be O(1) or O(N) depending on choice of implementation
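A minimal sketch of the sorted-array operations, using Python's standard `bisect` module for the binary search. The function names are assumptions for this example; the list is kept in ascending order.

```python
import bisect


def find(cells, value):
    # Binary search: O(logN). Returns the index, or -1 if absent.
    i = bisect.bisect_left(cells, value)
    return i if i < len(cells) and cells[i] == value else -1


def find_min(cells):
    # In ascending order the minimum is always the first cell: O(1).
    return 0


def add(cells, value):
    # O(logN) to locate the position, then O(N) to shift the tail.
    bisect.insort(cells, value)


def remove(cells, index):
    # Closing the hole shifts every later cell by one: O(N).
    del cells[index]
```

If the array were stored in descending order instead, the minimum would sit in the last cell and Remove_Min() could simply drop it in O(1); that is one way to read the "depending on choice of implementation" remark.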


Review - Summary

• If we are going to perform a lot of these operations (e.g. N=100000), none of these is fast enough!

                Array    Linked List    Sorted Array
Find            O(N)     O(N)           O(logN)
Find_Min        O(N)     O(N)           O(1)
Add             O(1)     O(1)           O(N)
Remove          O(1)     O(1)           O(N)
Remove_Min      O(N)     O(N)           O(1) or O(N)


Advanced Data Structures: Binary Search Tree


What is a Binary Search Tree?

• Use a binary tree to store the data
• Maintain this property
  - Left Subtree < Node < Right Subtree


Binary Search Tree - Implementation

• Definition of a Node:

    Node = Record
      Left, Right : ^Node;
      Value : Integer;
    End;

• To search for a value (pseudocode)

    Node Find(Node N, Value V) :-
      If (N.Value = V)
        Return N;
      Else If (V < N.Value) and (N.Left != NULL)
        Return Find(N.Left, V);
      Else If (V > N.Value) and (N.Right != NULL)
        Return Find(N.Right, V);
      Else
        Return NULL; // not found


Binary Search Tree - Remove

• Case I : Removing a leaf node
  - Easy
• Case II : Removing a node with a single child
  - Replace the removed node with its child
• Case III : Removing a node with 2 children
  - Replace the removed node with the minimum element in the right subtree (or maximum element in the left subtree)
  - This may create a hole again
  - Apply Case I or II
• Sometimes you can avoid this by using “Lazy Deletion”
  - Mark a node as removed instead of actually removing it
  - Less coding, performance hit not big if you are not doing this frequently (may even save time)
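The three removal cases can be sketched as a recursive Python function. This is one possible formulation under stated assumptions, not the only one: the names `Node`, `insert`, `remove` and `inorder` are hypothetical, and Case III here always uses the minimum of the right subtree.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    # Helper to build a tree for the example; duplicates are ignored.
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def remove(root, value):
    """Return the root of the subtree after removing value (if present)."""
    if root is None:
        return None
    if value < root.value:
        root.left = remove(root.left, value)
    elif value > root.value:
        root.right = remove(root.right, value)
    elif root.left is None:
        return root.right            # Case I (leaf) or Case II (right child only)
    elif root.right is None:
        return root.left             # Case II (left child only)
    else:
        succ = root.right            # Case III: find the minimum of the right subtree
        while succ.left is not None:
            succ = succ.left
        root.value = succ.value      # copy it into the node being removed
        root.right = remove(root.right, succ.value)  # the new hole is Case I or II
    return root


def inorder(root):
    # Sorted listing of the tree, used only to inspect the result.
    return [] if root is None else inorder(root.left) + [root.value] + inorder(root.right)
```

The minimum of the right subtree never has a left child, which is why the second removal always falls into Case I or II.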


Binary Search Tree - Summary

• Add() is similar to Find()
• Find_Min()
  - Just walk to the left, easy
• Remove_Min()
  - Equivalent to Find_Min() then Remove()
• Summary
  - Find() : O(logN)
  - Find_Min() : O(logN)
  - Remove_Min() : O(logN)
  - Add() : O(logN)
  - Remove() : O(logN)
  - The BST is “supposed” to behave like that
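As a rough illustration of why Add() is similar to Find() and why Find_Min() only walks left, here is an iterative Python sketch. The names `Node`, `add` and `find_min` are assumptions for the example, and duplicates are simply ignored.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def add(root, value):
    """Insert value and return the (possibly new) root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        # Same descent as Find(); attach the new node where the search falls off.
        if value < node.value:
            if node.left is None:
                node.left = Node(value)
                break
            node = node.left
        elif value > node.value:
            if node.right is None:
                node.right = Node(value)
                break
            node = node.right
        else:
            break  # already present
    return root


def find_min(root):
    """Walk left until there is no left child. Assumes a non-empty tree."""
    node = root
    while node.left is not None:
        node = node.left
    return node
```

Both loops follow a single root-to-leaf path, so their cost is proportional to the height of the tree.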


Binary Search Tree - Problems

• In reality…
  - All these operations are O(logN) only if the tree is balanced
  - Inserting a sorted sequence makes the tree degenerate into a linked list
• The real upper bounds
  - Find() : O(N)
  - Find_Min() : O(N)
  - Remove_Min() : O(N)
  - Add() : O(N)
  - Remove() : O(N)
• Solution
  - AVL Tree, Red Black Tree
  - Use “rotations” to maintain balance
  - Both are difficult to implement, rarely used
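To see the degeneration concretely, the sketch below inserts values into a plain (unbalanced) BST and reports the resulting height. It is illustrative only: `build_height` is a hypothetical helper, it assumes all values are distinct, and it stores child links in dictionaries keyed by value to keep the code short.

```python
import random


def build_height(values):
    """Insert distinct values into a plain BST in order and return its height."""
    left, right = {}, {}            # child links keyed by the parent's value
    root, height = None, 0
    for v in values:
        if root is None:
            root, height = v, 1
            continue
        node, depth = root, 1
        while True:
            children = left if v < node else right
            if node not in children:
                children[node] = v  # attach v as a new leaf below node
                break
            node = children[node]
            depth += 1
        height = max(height, depth + 1)
    return height


sorted_height = build_height(range(1, 1001))   # sorted input: one long chain

shuffled = list(range(1, 1001))
random.Random(0).shuffle(shuffled)             # fixed seed, for repeatability
shuffled_height = build_height(shuffled)       # random order: far shallower
```

With 1000 sorted keys the height equals 1000, so every operation walks the whole chain; a shuffled insertion order typically gives a height of a few dozen at most.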


Advanced Data Structures: Heap (Priority Queue)


What is a Heap?

• A (usually) complete binary tree for Priority Queue
  - Enqueue = Add
  - Dequeue = Find_Min and Remove_Min
• Heap Property
  - Every node’s value is less than or equal to those of its descendants (a min-heap, so the minimum is at the root; a max-heap reverses the comparison)


Heap - Implementation

• Usually we use an array to simulate a heap
• Assume nodes are indexed 1, 2, 3, ...
  - Parent = ⌊Node / 2⌋
  - Left Child = Node*2
  - Right Child = Node*2 + 1


Heap - Add

• Append the new element at the end
• Shift it up until the heap property is restored
• Why does this always work?
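A minimal sketch of Add() for a min-heap stored in a 1-indexed array. To match the index formulas Parent = Node / 2, the Python list keeps an unused placeholder in cell 0; that convention and the name `heap_add` are assumptions of this example.

```python
def heap_add(heap, value):
    """Add value to a min-heap held in a 1-indexed list (heap[0] is unused)."""
    heap.append(value)              # append the new element at the end
    i = len(heap) - 1
    # Shift up: swap with the parent while the new value is smaller.
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2


heap = [None]                       # empty heap with the placeholder cell
for v in [5, 3, 8, 1]:
    heap_add(heap, v)
```

Each swap only moves a smaller value above a larger one along a single root path, so the rest of the tree is untouched and at most O(logN) swaps are needed.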


Heap - Remove_Min

• Replace the root with the last element
• Shift it down until the heap property is restored
• Again, why does this always work?
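Remove_Min() can be sketched the same way, again assuming a min-heap in a 1-indexed Python list whose cell 0 is an unused placeholder. The names `sift_down` and `heap_remove_min` are hypothetical.

```python
def sift_down(heap, i):
    """Shift heap[i] down until both children are no smaller than it."""
    n = len(heap) - 1               # index of the last element
    while 2 * i <= n:
        child = 2 * i               # left child
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1              # right child is the smaller one
        if heap[i] <= heap[child]:
            break                   # heap property restored
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def heap_remove_min(heap):
    """Remove and return the minimum of a non-empty 1-indexed min-heap."""
    smallest = heap[1]
    last = heap.pop()               # take the last element off the array
    if len(heap) > 1:
        heap[1] = last              # replace the root with it
        sift_down(heap, 1)
    return smallest
```

Swapping with the smaller child is what keeps the parent no larger than both children after each step.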


Heap - Build_Heap

• There is a special operation called Build_Heap
  - Transform an ordinary array into a heap without using extra memory
• The Remove_Min operation has two steps
  - Replace the root with a leaf node
  - Restore the heap structure by shifting the node down
• The second step is called “Heapify”
• If we apply the Heapify step to ALL internal nodes, bottom up, we get a heap
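Build_Heap might look like the following sketch, under the same assumptions as before (min-heap, 1-indexed list with an unused cell 0). The function names are hypothetical; `heapify` here is the shift-down step described above.

```python
def heapify(heap, i):
    """Shift heap[i] down until the min-heap property holds below it."""
    n = len(heap) - 1
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def build_heap(heap):
    """Turn an arbitrary 1-indexed list into a min-heap in place."""
    n = len(heap) - 1
    # Nodes n//2 + 1 .. n are leaves; the internal nodes are 1 .. n//2.
    for i in range(n // 2, 0, -1):
        heapify(heap, i)


data = [None, 9, 4, 7, 1, 8, 2, 6]
build_heap(data)
```

Processing bottom up guarantees that both subtrees of a node are already heaps when that node is shifted down.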


Heap - Summary

• Find() is usually not supported by a heap
  - You may scan the whole tree / array if you really want
• Remove() is equivalent to applying Remove_Min() on a subtree
  - Remember that any subtree of a heap is also a heap
  - In the array implementation, the element moved into the hole may also need to be shifted up
• Summary
  - Find() : O(N) // We usually don’t use Heap for this
  - Find_Min() : O(1)
  - Remove_Min() : O(logN)
  - Add() : O(logN)
  - Remove() : O(logN)


Advanced Data Structures: Hash Table


What is a Hash Table?

• Question
  - We have a Mark Six result (6 integers in the range 1..49)
  - We want to check if our bet matches it
  - What is the most efficient way?
• Answer
  - Use a boolean array with 49 cells
  - Checking a number is O(1)
• Problem
  - What if the range of numbers is very large?
  - What if we need to store strings?
• Solution
  - Use a “Hash Function” to compress the range of values


Hash Table

• Suppose we need to store values between 0 and 99, but only have an array with 10 cells

• We can map the values [0,99] to [0,9] by taking modulo 10. The result is the “Hash Value”

• Adding, finding and removing an element are O(1)

• It is even possible to map the strings to integers, e.g. “ATE” to (1*26*26+20*26+5) mod 10
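The string example can be checked with a short sketch. It assumes uppercase words over A..Z with A=1 through Z=26, read as a base-26 number; the function name `letter_hash` is hypothetical.

```python
def letter_hash(word, cells=10):
    """Hash an uppercase A-Z word into the range [0, cells-1]."""
    value = 0
    for ch in word:
        # A=1, B=2, ..., Z=26; treat the word as a base-26 number.
        value = value * 26 + (ord(ch) - ord("A") + 1)
    return value % cells


ate = letter_hash("ATE")   # (1*26*26 + 20*26 + 5) mod 10 = 1201 mod 10
```

For plain integers in [0,99], the corresponding hash is simply `value % 10`.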


Hash Table - Collision

• But this approach has an inherent problem
  - What happens if two items have the same hash value?
• Two major methods to deal with this
  - Chaining (Also called Open Hashing)
  - Open Addressing (Also called Closed Hashing)


Hash Table - Chaining

• Keep a linked list at each hash table cell
• On average, Add / Find / Remove is O(1+α)
  - α = Load Factor = # of stored elements / # of cells
• If the hash function is “random” enough, we can usually get the average case
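A small chaining sketch follows. It is an illustration under assumptions: the class name `ChainedHashTable` is hypothetical, Python lists stand in for the linked lists, and Python's built-in `hash()` modulo the number of cells plays the role of the hash function.

```python
class ChainedHashTable:
    def __init__(self, cells=10):
        self.buckets = [[] for _ in range(cells)]  # one chain per cell
        self.count = 0

    def _bucket(self, key):
        # Compress the key into a cell index.
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key):
        bucket = self._bucket(key)
        if key not in bucket:       # scans one chain only
            bucket.append(key)
            self.count += 1

    def find(self, key):
        return key in self._bucket(key)

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)
            self.count -= 1

    def load_factor(self):
        # Stored elements divided by number of cells.
        return self.count / len(self.buckets)
```

Each operation scans only one chain, and the average chain length is the load factor, which is where the 1 + load factor cost comes from.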


Hash Table - Open Addressing

• If you don’t want to implement a linked list…
• An alternative is to skip a cell if it is occupied
• Moving forward one cell at a time until a free cell is found is called “Linear Probing”


• Find() must continue until a blank cell is reached
• Remove() must use Lazy Deletion, otherwise further operations may fail
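The sketch below shows linear probing with lazy deletion. It is a minimal illustration, not a tuned implementation: the class name `LinearProbingTable` and the two sentinel objects are assumptions of this example, and the table never grows.

```python
EMPTY = object()     # cell that has never been used
DELETED = object()   # lazily deleted cell


class LinearProbingTable:
    def __init__(self, cells=10):
        self.cells = [EMPTY] * cells

    def _probe(self, key):
        # Visit every cell once, starting at the hash value.
        n = len(self.cells)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def find(self, key):
        for i in self._probe(key):
            if self.cells[i] is EMPTY:
                return -1            # blank cell reached: key is not present
            if self.cells[i] is not DELETED and self.cells[i] == key:
                return i
        return -1

    def add(self, key):
        if self.find(key) != -1:
            return                   # already stored
        for i in self._probe(key):
            if self.cells[i] is EMPTY or self.cells[i] is DELETED:
                self.cells[i] = key  # deleted cells can be reused
                return
        raise OverflowError("table is full")

    def remove(self, key):
        i = self.find(key)
        if i != -1:
            self.cells[i] = DELETED  # mark only, so later probes do not stop early
```

If Remove() reset the cell to blank instead, a later Find() for a key stored further along the same probe sequence would stop at that blank and wrongly report the key as missing.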


Hash Table - Summary

• Find_Min() and Remove_Min() are usually not supported in a Hash Table
  - You may scan the whole table if you really want
• For Chaining
  - Find() : O(1+α)
  - Add() : O(1+α)
  - Remove() : O(1+α)
• For Open Addressing
  - Find() : O(1/(1-α))
  - Add() : O(1/(1-α))
  - Remove() : O(ln(1/(1-α))/α + 1/α)
• Both are close to O(1) if α is kept small (< 50%)


Miscellaneous Stuff

• Judge problems
  - 1020 – Left Join
  - 1021 – Inner Join
  - 1019 – Addition II
• Past contest problems
  - NOI2004 Day 1 – Cashier
  - Any more?
• Good place to find related information - Wikipedia
  - http://en.wikipedia.org/wiki/Binary_search_tree
  - http://en.wikipedia.org/wiki/Binary_heap
  - http://en.wikipedia.org/wiki/Hash_table