Hash Tables

Transcript of Hash Tables

Page 1: Hash Tables

Hash Tables
- A hash table is an array of size Tsize, with index positions 0 .. Tsize-1.
- Two types of hash tables:
  - Closed hash table
    - array element type is a <key, value> pair
    - all items are stored in the array
  - Chained (open) hash table
    - array element type is a pointer to a linked list of nodes containing <key, value> pairs
    - items are stored in the linked-list nodes
- Keys are used to generate an array index, the home address (0 .. Tsize-1).
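The two layouts can be sketched as C++ types. This is only an illustration, not code from the slides: the names Slot, ClosedTable, Node and ChainedTable, and the choice of std::string keys with int values, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed key and value types, for illustration only.
using Key = std::string;
using Value = int;

// Closed hash table: each <key, value> pair lives directly in the array.
struct Slot {
    bool used = false;  // false means the slot holds no item
    Key key;
    Value value{};
};

struct ClosedTable {
    std::vector<Slot> slots;
    explicit ClosedTable(std::size_t tsize) : slots(tsize) { assert(tsize > 0); }
};

// Chained (open) hash table: each array element points to a linked list
// of nodes, and the items live in those nodes.
struct Node {
    Key key;
    Value value;
    Node* next;
};

struct ChainedTable {
    std::vector<Node*> heads;  // nullptr means an empty chain
    explicit ChainedTable(std::size_t tsize) : heads(tsize, nullptr) {
        assert(tsize > 0);
    }
};
```

In both cases the array has Tsize elements; what differs is whether an element is the item itself or the head of a list of items.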

Page 2: Hash Tables

Faster Searching
- "Balanced" search trees guarantee an O(log2 n) search path by controlling the height of the search tree
  - AVL tree
  - 2-3-4 tree
  - red-black tree (typically used by STL associative container classes)
- A hash table allows for O(1) search performance on average
  - search time does not increase as n increases, provided the load factor is kept bounded

Page 3: Hash Tables

Considerations
- How big an array?
  - the load factor of a hash table is n/Tsize
- Which hash function to use?
  - int hash(KeyType key) // returns 0 .. Tsize-1
- Which collision resolution strategy?
  - a hash function is many-to-one

Page 4: Hash Tables

Hash Function
- A hash function is used to map a key to an array index (home address)
  - the search starts from here
- insert, retrieve, update, and delete all start by applying the hash function to the key

Page 5: Hash Tables

Some Hash Functions
- if KeyType is int: key % Tsize
- if KeyType is a string: convert to an integer and then % Tsize
- goals for a hash function
  - fast to compute
  - even distribution
- cannot guarantee no collisions unless all key values are known in advance
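As a rough sketch of the two cases above, the functions below reduce an integer key and a string key to a home address. The names hash_int and hash_string, the use of an unsigned key, and the multiplier 31 for folding characters are assumptions made for illustration; the slides do not prescribe a particular string-to-integer conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Integer keys: reduce the key modulo the table size.
// The key is taken as unsigned so the result is always in 0 .. tsize-1.
std::size_t hash_int(unsigned key, std::size_t tsize) {
    assert(tsize > 0);
    return key % tsize;
}

// String keys: fold the characters into one integer, then reduce modulo
// tsize. The multiplier 31 is one common choice, assumed here.
std::size_t hash_string(const std::string& key, std::size_t tsize) {
    assert(tsize > 0);
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow
    }
    return h % tsize;
}
```

With Tsize = 7, the keys 10 and 17 both map to home address 3, which shows why collisions cannot be ruled out for arbitrary keys.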

Page 6: Hash Tables

A Closed Hash Table

hash(key) produces an index in the range 0 to 6. That index is the “home address”.

Some insertions: K1 --> 3, K2 --> 5, K3 --> 2

index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6

Page 7: Hash Tables

Handling Collisions

Table before the new insertions:

index  key  value
0
1
2      K3   K3info
3      K1   K1info
4
5      K2   K2info
6

Some more insertions: K4 --> 3, K5 --> 2, K6 --> 4 (items to insert: K4 K4info, K5 K5info, K6 K6info)

Collision resolution strategy: linear probing
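A minimal sketch of linear probing, assuming an empty string marks a free slot and that the home addresses from the slides are passed in directly instead of being computed by a hash function. The names insert_linear and build_example_table are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Insert `key` starting at its home address and probe forward, wrapping
// around the end of the array, until a free slot is found.
// Returns the index where the key was stored, or -1 if the table is full.
int insert_linear(std::vector<std::string>& table, const std::string& key,
                  std::size_t home) {
    assert(!table.empty() && home < table.size());
    for (std::size_t step = 0; step < table.size(); ++step) {
        std::size_t i = (home + step) % table.size();
        if (table[i].empty()) {
            table[i] = key;
            return static_cast<int>(i);
        }
    }
    return -1;  // every slot is occupied
}

// Replays the six insertions from the slides with Tsize = 7.
std::vector<std::string> build_example_table() {
    std::vector<std::string> table(7);
    insert_linear(table, "K1", 3);
    insert_linear(table, "K2", 5);
    insert_linear(table, "K3", 2);
    insert_linear(table, "K4", 3);  // 3 is taken, lands in 4
    insert_linear(table, "K5", 2);  // 2..5 are taken, lands in 6
    insert_linear(table, "K6", 4);  // 4..6 are taken, wraps to 0
    return table;
}
```

Under these assumptions K4 ends up in slot 4, K5 in slot 6, and K6 wraps around to slot 0, leaving only slot 1 free.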

Page 8: Hash Tables

Search Performance

Table after all six insertions (linear probing):

index  key  value
0      K6   K6info
1
2      K3   K3info
3      K1   K1info
4      K4   K4info
5      K2   K2info
6      K5   K5info

Average number of probes needed to retrieve the value with key K?

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        5
K6   4        4

Successful search: 14/6 = 2.33 probes on average
Unsuccessful search?
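The probe counts can be checked with a small search routine. This is a sketch under stated assumptions: the table contents are the result of linear-probing insertion of the six keys, an empty string marks a never-used slot, keys are non-empty, and the names probes_for and kTable are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count the slots examined when searching for `key` from its home address.
// The search stops when it finds the key (success) or reaches a never-used
// slot (failure); the slot that ends the search is counted as a probe.
int probes_for(const std::vector<std::string>& table, const std::string& key,
               std::size_t home) {
    assert(!table.empty() && home < table.size());
    int probes = 0;
    for (std::size_t step = 0; step < table.size(); ++step) {
        ++probes;
        const std::string& slot = table[(home + step) % table.size()];
        if (slot == key || slot.empty()) break;
    }
    return probes;
}

// Slots 0..6 after inserting K1..K6 with linear probing.
const std::vector<std::string> kTable = {"K6", "", "K3", "K1",
                                         "K4", "K2", "K5"};
```

Summing probes_for over the six stored keys gives 14, hence the 14/6 average. For the unsuccessful case, a missing key whose home address is 2 examines all 7 slots before reaching the empty slot 1. Averaged over the seven possible home addresses, and counting the empty slot that ends the search, an unsuccessful search costs 28/7 = 4 probes in this table.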

Page 9: Hash Tables

A Chained Hash Table

Insert keys: K1 --> 3, K2 --> 5, K3 --> 2, K4 --> 3, K5 --> 2, K6 --> 4

Each array element heads a linked list of synonyms:

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6

Page 10: Hash Tables

Search Performance

Average number of probes needed to retrieve the value with key K?

K    hash(K)  #probes
K1   3        1
K2   5        1
K3   2        1
K4   3        2
K5   2        2
K6   4        1

Successful search: 8/6 = 1.33 probes on average

index  chain
0
1
2      K3 K3info -> K5 K5info
3      K1 K1info -> K4 K4info
4      K6 K6info
5      K2 K2info
6

Unsuccessful search?
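The chained table and its probe counts can be reproduced with a short sketch. It assumes std::list stands in for the hand-built linked lists, that new synonyms are appended at the end of a chain (which is what the probe counts imply), and that home addresses are passed in directly. The names Chains, insert_chained, probes_chained and build_chained_example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

using Chains = std::vector<std::list<std::string>>;

// Append `key` to the chain at its home address.
void insert_chained(Chains& table, const std::string& key, std::size_t home) {
    assert(home < table.size());
    table[home].push_back(key);
}

// Count the nodes examined while searching the chain at `home` for `key`.
// For a missing key this is the full length of that chain.
int probes_chained(const Chains& table, const std::string& key,
                   std::size_t home) {
    assert(home < table.size());
    int probes = 0;
    for (const std::string& k : table[home]) {
        ++probes;
        if (k == key) break;
    }
    return probes;
}

// Replays the six insertions from the slides with Tsize = 7.
Chains build_chained_example() {
    Chains table(7);
    insert_chained(table, "K1", 3);
    insert_chained(table, "K2", 5);
    insert_chained(table, "K3", 2);
    insert_chained(table, "K4", 3);
    insert_chained(table, "K5", 2);
    insert_chained(table, "K6", 4);
    return table;
}
```

The six successful searches examine 8 nodes in total. An unsuccessful search examines only the nodes in one chain, so with this counting it costs 0, 1, or 2 probes here depending on the home address.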

Page 11: Hash Tables

Successful Search Performance

Average number of probes for a successful search:

load factor   open addressing     open addressing     chaining
              (linear probing)    (double hashing)
0.5           1.50                1.39                1.25
0.7           2.17                1.72                1.35
0.9           5.50                2.56                1.45
1.0           ----                ----                1.50
2.0           ----                ----                2.00
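The figures in this table are consistent with the standard textbook approximations for the expected number of probes in a successful search at load factor a. The slide does not state the formulas, so treating them as the source of the numbers is an assumption; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Linear probing: (1 + 1/(1 - a)) / 2, defined for 0 <= a < 1.
double linear_probing(double a) {
    assert(a >= 0.0 && a < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Double hashing: (1/a) * ln(1/(1 - a)), defined for 0 < a < 1.
double double_hashing(double a) {
    assert(a > 0.0 && a < 1.0);
    return -std::log(1.0 - a) / a;
}

// Chaining: 1 + a/2, defined for any a >= 0 (a may exceed 1).
double chaining(double a) {
    assert(a >= 0.0);
    return 1.0 + a / 2.0;
}
```

The open-addressing formulas are undefined at a = 1 and beyond, which matches the dashes in those rows: a closed table cannot hold more items than it has slots.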

Page 12: Hash Tables

Factors Affecting Search Performance
- quality of the hash function
  - how uniform?
  - depends on the actual data
- collision resolution strategy used
- load factor of the hash table
  - n/Tsize
  - the lower the load factor, the better the search performance

Page 13: Hash Tables

Traversal
- Visit each item in the hash table
- Closed hash table
  - O(Tsize) to visit all n items
  - Tsize is larger than n
- Chained hash table
  - O(Tsize + n) to visit all n items
- Items are not visited in order of key value

Page 14: Hash Tables

Deletions?
- search for the item to be deleted
- chained hash table
  - find the node and delete it
- closed hash table (open addressing)
  - must mark the vacated spot as “deleted”, which is different from “never used”
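The reason for the separate “deleted” mark can be sketched with a three-state slot. This assumes linear probing in a table where items are stored in the array itself; the names State, Slot, find_slot and erase_key are hypothetical, and home addresses are passed in directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class State { NeverUsed, Occupied, Deleted };

struct Slot {
    State state = State::NeverUsed;
    std::string key;
};

// Linear-probing search. A deleted slot is skipped, not treated as the end
// of the probe sequence; only a never-used slot ends the search early.
// Returns the index of the key, or -1 if it is absent.
int find_slot(const std::vector<Slot>& table, const std::string& key,
              std::size_t home) {
    assert(!table.empty() && home < table.size());
    for (std::size_t step = 0; step < table.size(); ++step) {
        std::size_t i = (home + step) % table.size();
        if (table[i].state == State::NeverUsed) return -1;
        if (table[i].state == State::Occupied && table[i].key == key) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Delete by marking the slot, so keys stored past it stay reachable.
bool erase_key(std::vector<Slot>& table, const std::string& key,
               std::size_t home) {
    int i = find_slot(table, key, home);
    if (i < 0) return false;
    table[i].state = State::Deleted;
    return true;
}
```

If K1 (home 3, stored in slot 3) is deleted and slot 3 were simply reset to never-used, a later search for K4 (home 3, stored in slot 4) would stop at slot 3 and wrongly report K4 as missing.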

Page 15: Hash Tables

Hash Table Summary
- search speed depends on the load factor and the quality of the hash function
  - the load factor should be less than 0.75 for open addressing
  - the load factor can be more than 1 for chaining
- items are not kept sorted by key
- very good for fast access to unordered data with a known upper bound on n (needed to pick a good Tsize)

Page 16: Hash Tables

A heap is a binary tree that
- is complete
- has the heap-order property
  - max heap - the item stored in each node has a key/priority that is >= the priority of the items stored in each of its children
  - min heap - the item stored in each node has a key/priority that is <= the priority of the items stored in each of its children

A heap is
- an efficient data structure for the PriorityQueue ADT
  - requires the ability to compare items based on their priorities
- the basis for the heapsort algorithm

Page 17: Hash Tables

Two Heaps

A heap is always a complete binary tree.

Max heap (level order: 23 18 9 8 12 7 1 4 2)

              23
         18        9
       8    12   7   1
      4 2

Min heap (level order: 1 4 2 9 8 7 18 23 12)

               1
          4        2
        9    8   7   18
      23 12

Page 18: Hash Tables

A complete binary tree can be stored in an array

Tree (level order): 23 18 9 8 12 7 1 4 2

For the item in A[i]:
- leftChild is in A[2i+1]
- rightChild is in A[2i+2]
- parent is in A[(i-1)/2] (integer division)

index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2

Size = 9
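The index arithmetic can be written as three small helpers, with a check of the max-heap order on the example array. The function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::size_t left_child(std::size_t i) { return 2 * i + 1; }
std::size_t right_child(std::size_t i) { return 2 * i + 2; }

// Integer division rounds down, so both children map back to one parent.
std::size_t parent(std::size_t i) {
    assert(i > 0);  // the root has no parent
    return (i - 1) / 2;
}

// Max-heap order holds when no item outranks its parent.
bool is_max_heap(const std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] > a[parent(i)]) return false;
    }
    return true;
}
```

For the array above, the children of A[1] = 18 are A[3] = 8 and A[4] = 12, and the parent of A[8] = 2 is A[3] = 8.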

Page 19: Hash Tables

PriorityQueue ADT
- Data items
  - a collection of items which can be ordered by priority
- Operations
  - constructor - creates an empty PQ
  - empty() - returns true iff a PQ is empty
  - size() - returns the number of items in a PQ
  - push(item) - adds an item to a PQ
  - top() - returns the item in a PQ with the highest priority
  - pop() - removes the item with the highest priority from a PQ

Page 20: Hash Tables

PQ Data Structures
- unordered array or linked list
  - push is O(1)
  - top and pop are O(n)
- ordered array or linked list
  - push is O(n)
  - top and pop are O(1)
- heap
  - top is O(1)
  - push and pop are O(log2 n)
- the STL has a priority_queue class that is implemented using a heap
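A short usage sketch of std::priority_queue, exercising the push, top, pop and empty operations listed for the ADT. The helper name drain_in_priority_order is hypothetical; with the default comparison the largest int is treated as the highest priority.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Push every item, then repeatedly take the highest-priority item.
// Returns the items in the order pop() removes them.
std::vector<int> drain_in_priority_order(const std::vector<int>& items) {
    std::priority_queue<int> pq;  // max-priority queue by default
    for (int x : items) pq.push(x);
    assert(pq.size() == items.size());

    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());  // top() only looks at the item
        pq.pop();                 // pop() removes it and returns nothing
    }
    return out;
}
```

Note that in the STL class pop() does not return the removed item, so top() is called first.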

Page 21: Hash Tables

PQ Operations
- top
  - return the item at A[0]
- push and pop must maintain the heap-order property
- push
  - put the new item at the end (in A[size])
  - re-establish the heap-order property by moving the new item to where it belongs
- pop
  - A[0] is the item to delete
  - swap A[0] and A[size-1]
  - move the item at A[0] down a path to where it belongs
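The push and pop steps can be sketched for a max heap of ints stored in a std::vector, where the vector's size plays the role of size. The names heap_push and heap_pop are hypothetical, and this is one straightforward way to write the two loops, not necessarily how the slides' author implemented them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// push: put the new item at the end, then move it up while it outranks
// its parent.
void heap_push(std::vector<int>& a, int item) {
    a.push_back(item);
    std::size_t i = a.size() - 1;
    while (i > 0 && a[i] > a[(i - 1) / 2]) {
        std::swap(a[i], a[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

// pop: swap A[0] with the last item, shrink the heap by one, then move the
// new root down toward its larger child until heap order holds again.
// Returns the removed (highest-priority) item.
int heap_pop(std::vector<int>& a) {
    assert(!a.empty());
    std::swap(a.front(), a.back());
    int top = a.back();
    a.pop_back();
    std::size_t i = 0;
    for (;;) {
        std::size_t largest = i;
        std::size_t l = 2 * i + 1;
        std::size_t r = 2 * i + 2;
        if (l < a.size() && a[l] > a[largest]) largest = l;
        if (r < a.size() && a[r] > a[largest]) largest = r;
        if (largest == i) break;
        std::swap(a[i], a[largest]);
        i = largest;
    }
    return top;
}
```

Applied to the array 23 18 9 8 12 7 1 4 2, heap_pop returns 23 and leaves 18 12 9 8 2 7 1 4. Each loop follows a single root-to-leaf path, which is where the O(log2 n) bound for push and pop comes from.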

Page 22: Hash Tables

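pop()

Before (Size = 9):

index  0   1   2   3   4   5   6   7   8
A      23  18  9   8   12  7   1   4   2

After (Size = 8):

index  0   1   2   3   4   5   6   7
A      18  12  9   8   2   7   1   4

The item 23 is removed. The item 2 is swapped into A[0] and then moves down the path past 18 and 12.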

Page 23: Hash Tables

Balanced Search Trees
- several varieties (Ch. 13)
  - AVL trees
  - 2-3-4 trees
  - Red-Black trees
  - B-Trees (used for searching secondary memory)
- nodes are added and deleted so that the height of the tree is kept under control
- insert and delete take more work, but retrieval (also insert & delete) never takes more than O(log2 n) time because the height is controlled