Hashing

Alan, Tam Siu Lung 96397999 Tam@SiuLung.com 99967891


Prerequisites
• List ADT
  – Linked List
• Table ADT
  – Array
• Mathematics
  – Modular Arithmetic
• Computer Organization
  – ASCII
• Algorithm
  – Order Analysis


Basic Data Types

Pascal Type               Storage                  Operations
Word                      A non-negative integer   +, -, *, div, mod
Double                    A real number            +, -, *, /, int, frac
Array[1..12] of Boolean   A sequence of 12 bits    y := a[i] (get), a[i] := y (set)


Abstract Data Types (ADT)

• Stack<v>
  – Can add and remove in LIFO order
• Queue<v>
  – Can add and remove in FIFO order
• Priority Queue<v>
  – Can add. Removes the largest element first; v must be comparable.
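A minimal Python sketch of the three ADTs, assuming Python's built-in list, `collections.deque` and `heapq` as the underlying storage (the slides do not name an implementation). `heapq` is a min-heap, so the values are negated to get the larger-first removal order described above.

```python
from collections import deque
import heapq

# Stack<v>: list append/pop gives LIFO order
stack = []
for v in [1, 2, 3]:
    stack.append(v)
stack_order = [stack.pop() for _ in range(3)]

# Queue<v>: deque append/popleft gives FIFO order
queue = deque()
for v in [1, 2, 3]:
    queue.append(v)
queue_order = [queue.popleft() for _ in range(3)]

# Priority Queue<v>: heapq is a min-heap, so store -v to remove the largest first
pq = []
for v in [2, 3, 1]:
    heapq.heappush(pq, -v)
pq_order = [-heapq.heappop(pq) for _ in range(3)]
```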


Data Structure
• An ADT, implemented by a Data Type
• E.g.
  – ArrayList, using an array to implement a List ADT
  – ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)
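As an illustration of an ArrayHeap, here is a small max-heap kept in a Python list. The class and method names (`ArrayHeap`, `add`, `remove_max`) are hypothetical. This is a sketch of the usual sift-up/sift-down scheme, not the course's own code.

```python
class ArrayHeap:
    """Max-heap stored in a plain list: the children of index i are 2i+1 and 2i+2."""

    def __init__(self):
        self.a = []

    def add(self, v):
        a = self.a
        a.append(v)
        i = len(a) - 1
        # Sift up: swap with the parent while larger than it
        while i > 0 and a[(i - 1) // 2] < a[i]:
            p = (i - 1) // 2
            a[p], a[i] = a[i], a[p]
            i = p

    def remove_max(self):
        a = self.a
        top = a[0]               # raises IndexError on an empty heap
        last = a.pop()
        if a:
            a[0] = last
            i = 0
            # Sift down: swap with the larger child while smaller than it
            while True:
                l, r, big = 2 * i + 1, 2 * i + 2, i
                if l < len(a) and a[l] > a[big]:
                    big = l
                if r < len(a) and a[r] > a[big]:
                    big = r
                if big == i:
                    break
                a[i], a[big] = a[big], a[i]
                i = big
        return top

    def __len__(self):
        return len(self.a)


h = ArrayHeap()
for v in [5, 1, 8, 3, 9, 2]:
    h.add(v)
order = [h.remove_max() for _ in range(len(h))]
```

When the heap is used as a Priority Queue<v>, repeated `remove_max` calls return the values largest first.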


Dictionary<k, v> ADT
• Add(k, v)
  – Add a key-value pair
• Remove(k)
  – Remove a key-value pair given the key
• Search(k) : v
  – Search for the value given the key

A Table ADT differs only in that the key is an integer within a fixed range.
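One straightforward way to realise the Dictionary ADT is an unsorted list of pairs. This hypothetical `ListDictionary` matches the Array row of the complexity table in this deck: O(1) add, O(n) remove and search. It assumes keys are compared with `==` and that `add` is never called twice with the same key.

```python
class ListDictionary:
    """Dictionary<k, v> backed by an unsorted list of (k, v) pairs."""

    def __init__(self):
        self.pairs = []

    def add(self, k, v):
        # O(1): append; no duplicate check (assumes k is new)
        self.pairs.append((k, v))

    def remove(self, k):
        # O(n): find the pair, move the last pair into its place, then pop
        for i, (key, _) in enumerate(self.pairs):
            if key == k:
                self.pairs[i] = self.pairs[-1]
                self.pairs.pop()
                return True
        return False

    def search(self, k):
        # O(n): linear scan; None if the key is absent
        for key, v in self.pairs:
            if key == k:
                return v
        return None


d = ListDictionary()
d.add("Alan", "D")
d.add("Max", "Z")
found = d.search("Max")
removed = d.remove("Alan")
```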


Direct Addressing
• Use the Table ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible if the key can range from 1 to 20000000000, or if the key is not numeric ...

0 Ant

5 Boy

99 Car
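A direct-address table is just an array indexed by the key. The sketch below uses a hypothetical class name and a size of 100 to match the Ant/Boy/Car example. It shows why every operation is O(1), and also why the table needs one slot for every possible key.

```python
class DirectAddressTable:
    """Table ADT: keys are integers in [0, size), and the key *is* the slot."""

    def __init__(self, size):
        self.slots = [None] * size   # None marks an empty slot

    def add(self, k, v):
        self.slots[k] = v            # O(1)

    def remove(self, k):
        self.slots[k] = None         # O(1)

    def search(self, k):
        return self.slots[k]         # O(1); None if the slot is empty


t = DirectAddressTable(100)
t.add(0, "Ant")
t.add(5, "Boy")
t.add(99, "Car")
```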


Time Complexity (Average Case)

               Add       Remove    Search
Array          O(1)      O(n)      O(n)
Sorted Array   O(n)      O(n)      O(lg n)
Linked List    O(1)      O(n)      O(n)
BST            O(lg n)   O(lg n)   O(lg n)
Hash Table     ~O(1)     ~O(1)     ~O(1)

Note: For the sorted array and the BST, keys have to be ordered (comparable).


Hash Function
• Hash function: h_m(k)
• Maps all keys into an integer domain, e.g. 0 to m - 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
  – Alan: 1598313570
  – Max: 3452409927
  – Man: 943766770
  – On: 2246271074

Note: We won’t use such a big m in our programs!
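Python's standard `zlib.crc32` computes the CRC-32 checksum, so a string can be hashed and then reduced to a small table size m. We have not checked that this CRC-32 variant reproduces the exact numbers listed above, so the sketch only prints its own values. `crc32_hash` is a hypothetical helper name.

```python
import zlib

def crc32_hash(key: str, m: int) -> int:
    """Hash a string with CRC32 (range 0 .. 2**32 - 1), then reduce it to 0 .. m - 1."""
    return zlib.crc32(key.encode("ascii")) % m

for name in ["Alan", "Max", "Man", "On"]:
    print(name, zlib.crc32(name.encode("ascii")), crc32_hash(name, 1000))
```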


Hash Table
• Use a Table<int, v> ADT of size m
• Use h(k) as the key
• All operations can then be done as with a Table
• Solved, except:
  – Collision: what to do if two different k have the same h(k)
  – How to find a suitable hash function
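To make the collision problem concrete, here is a deliberately naive table that stores one entry per slot and refuses a second key that lands in an occupied slot. The table size M = 10, h(k) = k mod M and the helper `naive_add` are illustrative assumptions.

```python
M = 10  # deliberately tiny table size for illustration

def h(k: int) -> int:
    return k % M

table = [None] * M   # Table<int, v>: one entry per slot, no collision handling

def naive_add(k, v) -> bool:
    slot = h(k)
    if table[slot] is not None and table[slot][0] != k:
        return False          # collision: a different key already occupies this slot
    table[slot] = (k, v)      # keep the key too, so a collision can be detected
    return True

ok1 = naive_add(3, "three")
ok2 = naive_add(13, "thirteen")   # 13 mod 10 == 3 mod 10
```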


Hash Functions
• If k is an integer, use h(k) = k mod m
• More advanced: h(k) = floor(m * frac(k * A)) for some 0 < A < 1
• If k is a string, convert it to an integer, e.g.
  h(‘Alan’) = [ASC(‘A’)*256^3 + ASC(‘l’)*256^2 + ASC(‘a’)*256 + ASC(‘n’)] mod m
• If k is another data type, try to combine all features of the type
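The three recipes above can be sketched in Python as follows. The constant A = (sqrt(5) - 1) / 2 is one commonly suggested choice for the multiplication method. It is an assumption here, because the slide does not fix A. The string hash evaluates the base-256 expression with Horner's rule and takes mod m at each step, which gives the same result as the formula.

```python
import math

def div_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

A = (math.sqrt(5) - 1) / 2   # assumed constant, 0 < A < 1

def mul_hash(k: int, m: int) -> int:
    """Multiplication method: h(k) = floor(m * frac(k * A))."""
    frac = (k * A) % 1.0
    return math.floor(m * frac)

def str_hash(s: str, m: int) -> int:
    """Read s as a base-256 number whose digits are the ASCII codes, reduced mod m.
    Horner's rule, with mod m at every step so the value stays small."""
    h = 0
    for ch in s:
        h = (h * 256 + ord(ch)) % m
    return h
```

For example, `str_hash("Alan", 2**32)` is 65*256^3 + 108*256^2 + 97*256 + 110 = 1097621870.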


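Chaining (a.k.a. Open Hashing)

• Use Table<int, List<v>> instead, where each list entry also keeps its key k so that colliding entries can be told apart
• When there are multiple k’s with the same h(k), add each of them to the list (usually a linked list)
• When searching, scan the list for k; when removing, delete the entry from the list
• Order: O(length of the list at h(k))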
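A minimal chaining table in Python. It assumes each slot holds a Python list of (k, v) pairs and that new keys go to the front of the list, as in the samples. Inserting at the front of a Python list costs O(length of the list), whereas a real linked list would make that step O(1). The class name and the default `hash(k) % m` hash function are assumptions for illustration.

```python
class ChainedHashTable:
    """Chaining: slot i holds a list of (k, v) pairs whose h(k) == i."""

    def __init__(self, m, h=None):
        self.m = m
        self.h = h if h is not None else (lambda k: hash(k) % m)
        self.slots = [[] for _ in range(m)]

    def add(self, k, v):
        chain = self.slots[self.h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, v)   # key already present: overwrite its value
                return
        chain.insert(0, (k, v))     # new key: put it at the front of the list

    def search(self, k):
        for key, v in self.slots[self.h(k)]:
            if key == k:
                return v
        return None

    def remove(self, k):
        chain = self.slots[self.h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                return True
        return False


t = ChainedHashTable(10)
for k, v in [(3, "a"), (13, "b"), (23, "c")]:
    t.add(k, v)          # all three keys hash to slot 3
found = t.search(13)
removed = t.remove(13)
```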

Chaining Samples

Table slots 0 to 99; h(‘Alan’) = h(‘Man’) = h(‘On’) = 0, h(‘Max’) = 5.
Operations: Add <Alan, D>, Add <Max, Z>, Add <Man, X>, Add <On, Y>, Search for Max, Remove Man.

Lists after each step (new entries go to the front of the list):
• Add <Alan, D>: slot 0: Alan D
• Add <Max, Z>: slot 0: Alan D; slot 5: Max Z
• Add <Man, X>: slot 0: Man X, Alan D; slot 5: Max Z
• Add <On, Y>: slot 0: On Y, Man X, Alan D; slot 5: Max Z
• Search for Max: scan the list at slot 5 and find Max Z (nothing changes)
• Remove Man: slot 0: On Y, Alan D; slot 5: Max Z
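The sample above can be replayed with a few lines of Python. The hash values are the hypothetical ones given on the slide, stored in a dictionary `H`, and new entries are placed at the front of each list to match the pictures.

```python
# Hash values taken from the slide
H = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
m = 100
slots = [[] for _ in range(m)]   # each slot is a list of (key, value) pairs

def add(k, v):
    slots[H[k]].insert(0, (k, v))   # new entries go to the front, as in the slides

def search(k):
    return next((v for key, v in slots[H[k]] if key == k), None)

def remove(k):
    chain = slots[H[k]]
    chain[:] = [(key, v) for key, v in chain if key != k]

for k, v in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    add(k, v)
found = search("Max")
remove("Man")
```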


Chaining (Optional)
• Note that the Table can be Table<int, Container<v>> for any Container supporting Add, Remove and Search.
• Why not consider other things, say another hash table? A BST?
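As a sketch of the "any Container" idea, the table below takes a bucket class as a parameter. The hypothetical `SortedBucket` keeps its keys in a sorted array and binary-searches them with Python's `bisect` module. It stands in for a BST here: search is O(lg n), but insertion and removal still shift elements. Keys must be mutually comparable, and `hash(k) % m` is an assumed hash function.

```python
from bisect import bisect_left

class SortedBucket:
    """Container with add/remove/search: sorted keys plus parallel values."""

    def __init__(self):
        self.keys, self.vals = [], []

    def _find(self, k):
        i = bisect_left(self.keys, k)
        return i, i < len(self.keys) and self.keys[i] == k

    def add(self, k, v):
        i, present = self._find(k)
        if present:
            self.vals[i] = v          # overwrite an existing key
        else:
            self.keys.insert(i, k)    # keep keys sorted
            self.vals.insert(i, v)

    def search(self, k):
        i, present = self._find(k)
        return self.vals[i] if present else None

    def remove(self, k):
        i, present = self._find(k)
        if present:
            del self.keys[i]
            del self.vals[i]
        return present


class BucketHashTable:
    """Table<int, Container<v>>: any bucket class with add/remove/search works."""

    def __init__(self, m, make_bucket=SortedBucket):
        self.m = m
        self.buckets = [make_bucket() for _ in range(m)]

    def _bucket(self, k):
        return self.buckets[hash(k) % self.m]

    def add(self, k, v):
        self._bucket(k).add(k, v)

    def search(self, k):
        return self._bucket(k).search(k)

    def remove(self, k):
        return self._bucket(k).remove(k)


t = BucketHashTable(4)
for k in range(20):
    t.add(k, k * k)
found = t.search(7)
removed = t.remove(7)
```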


Open Addressing (a.k.a. Closed Hashing)

• On a collision, find another slot for the entry
• E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc.
• Define the probe sequence <h(k, 0), h(k, 1), ..., h(k, m – 1)> as the sequence of slots to try (it should be a permutation of <0, 1, ..., m – 1>)
• Since add and search follow the same sequence, a search for k must find the pair <k, v> before it reaches an empty slot (if k is present)
• How about delete? Search and mark it empty? Not quite: emptying the slot would cut the probe sequence of any key stored beyond it, so the slot is marked as deleted instead (add may reuse a deleted slot, but a search does not stop there)
• Order: O(length of probe sequence)
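A minimal linear-probing table in Python. It assumes `hash(k) % m` as h(k) and a unique `DELETED` object as the "deleted" mark. `add` remembers the first deleted slot it passes but keeps probing until it reaches an empty slot, so it never stores a second copy of a key that sits further along the sequence. All names are illustrative.

```python
EMPTY = None
DELETED = object()   # "deleted" mark (shown as Del in the samples)

class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m   # each slot: EMPTY, DELETED or a (k, v) pair

    def _probe(self, k):
        # Probe sequence h(k, i) = (h(k) + i) mod m for i = 0 .. m-1
        start = hash(k) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, k, v):
        first_free = None
        for s in self._probe(k):
            cur = self.slots[s]
            if cur is EMPTY:
                if first_free is None:
                    first_free = s
                break                      # k cannot be further along the sequence
            if cur is DELETED:
                if first_free is None:
                    first_free = s         # reusable, but keep looking for k
            elif cur[0] == k:
                self.slots[s] = (k, v)     # key already present: overwrite
                return
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = (k, v)

    def search(self, k):
        for s in self._probe(k):
            cur = self.slots[s]
            if cur is EMPTY:
                return None                # an empty slot ends the search
            if cur is not DELETED and cur[0] == k:
                return cur[1]
        return None

    def remove(self, k):
        for s in self._probe(k):
            cur = self.slots[s]
            if cur is EMPTY:
                return False
            if cur is not DELETED and cur[0] == k:
                self.slots[s] = DELETED    # mark, don't empty: later keys may probe past here
                return True
        return False


t = LinearProbingTable(10)
for k, v in [(3, "a"), (13, "b"), (23, "c")]:
    t.add(k, v)                  # all hash to slot 3, so they land in slots 3, 4, 5
t.remove(13)                     # slot 4 becomes a deleted mark
after_remove = t.search(23)      # still found: the search skips the deleted slot
t.add(33, "d")                   # reuses the deleted slot 4
```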

Open Addressing Samples

Same hash values as before (h(‘Alan’) = h(‘Man’) = h(‘On’) = 0, h(‘Max’) = 5), slots 0 to 99, probing h(k), h(k)+1, h(k)+2, ...
• Start: slot 0: Alan D; every other slot Nil
• Add Max: slot 5 is Nil, so Max Z goes to slot 5
• Add Man: slot 0 is taken and slot 1 is Nil, so Man X goes to slot 1
• Add On: slots 0 and 1 are taken, so On Y goes to slot 2
• Search for Max: slot 5 holds Max, found (nothing changes)
• Delete Man: probe slot 0, then slot 1 (found); slot 1 becomes Del X rather than Nil

Table after all operations:
0 Alan D
1 Del X
2 On Y
3 Nil
4 Nil
5 Max Z
...
99 Nil
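The following sketch replays the samples and then compares two ways of deleting Man. It assumes the slide's hash values (stored in `H`) and uses the strings "Nil" and "Del" as slot markers to mirror the pictures. Duplicate keys are not checked on insertion.

```python
# Hash values from the slides; slots 0..99; linear probing
H = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
M = 100
NIL, DEL = "Nil", "Del"

def probe(k):
    return [(H[k] + i) % M for i in range(M)]

def search(slots, k):
    for s in probe(k):
        if slots[s] == NIL:
            return None              # an empty slot ends the search
        if slots[s] != DEL and slots[s][0] == k:
            return slots[s][1]
    return None

slots = [NIL] * M
for k, v in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    for s in probe(k):
        if slots[s] in (NIL, DEL):   # first free slot on the probe sequence
            slots[s] = (k, v)
            break

good = list(slots)
good[1] = DEL    # delete Man by marking the slot as Del
bad = list(slots)
bad[1] = NIL     # delete Man by resetting the slot to Nil
```

With the Del mark, a search for On probes slots 0, 1 and 2 and finds it. With slot 1 reset to Nil, the same search stops at slot 1 and wrongly reports On as missing.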


Collision Resolution
• The method outlined above is called linear probing
  – In general, h(k, i) = (h(k) + c*i) mod m
  – Forms primary clustering
• There is also quadratic probing
  – In general, h(k, i) = (h(k) + c1*i^2 + c2*i) mod m
  – Avoids primary clustering but still forms secondary clustering
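The probe sequences can be compared directly. This sketch fixes c = 1 for linear probing and c1 = c2 = 1/2 for quadratic probing, so the quadratic offsets are the triangular numbers i(i+1)/2. These constants are assumptions chosen for illustration. With m a power of 2, this quadratic sequence is known to visit every slot, while other sizes may not. Two keys with the same h(k) still get identical sequences, which is the secondary clustering mentioned above.

```python
def linear_probe(h0, m, c=1):
    """h(k, i) = (h(k) + c*i) mod m, where h0 = h(k)."""
    return [(h0 + c * i) % m for i in range(m)]

def quadratic_probe(h0, m):
    """h(k, i) = (h(k) + i/2 + i*i/2) mod m, i.e. c1 = c2 = 1/2."""
    return [(h0 + (i * i + i) // 2) % m for i in range(m)]

def is_permutation(seq, m):
    """True if seq visits every slot 0 .. m-1 exactly once."""
    return sorted(seq) == list(range(m))

print(linear_probe(3, 8))
print(quadratic_probe(3, 8))
```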


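Double Hashing (Optional)

• h(k, i) = (h(k) + i*h’(k)) mod m
• Note: h’(k) cannot be 0
• Meaningful h’(k) should be in [1, m), and relatively prime to m so that the probe sequence visits every slot (e.g. choose m prime)
• E.g. h’(k) = 1 + (k mod (m – 1))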
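A sketch of the double-hashing probe sequence for integer keys, assuming h(k) = k mod m and h'(k) = 1 + (k mod (m - 1)). With m = 13 (prime), every h'(k) is relatively prime to m, so each sequence is a permutation of the slots. With m = 12, some sequences are not.

```python
def double_hash_probe(k, m):
    """h(k, i) = (h1(k) + i * h2(k)) mod m with
    h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), so h2(k) is in [1, m)."""
    h1 = k % m
    h2 = 1 + k % (m - 1)
    return [(h1 + i * h2) % m for i in range(m)]

# Keys 1 and 14 share h1 = 1 when m = 13, but their step sizes differ
print(double_hash_probe(1, 13)[:3], double_hash_probe(14, 13)[:3])
```

Unlike linear or quadratic probing, two keys with the same h(k) usually get different sequences, which avoids secondary clustering.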


How good is Hashing?
• Nearly constant time if the lists are very short or the probing rate is very low
• So we need
  – A uniform hash function (your job)
  – A larger hash table (trade this off against the memory limit)
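Under the textbook assumption of uniform hashing (an assumption about the hash function, not something these slides prove), the usual bounds make "very short list" and "very low probing rate" precise. They are expressed in terms of the load factor α for n stored keys. The open-addressing bound assumes the whole probe sequence is uniformly random. Linear probing does worse than this bound because of primary clustering.

```latex
\alpha = \frac{n}{m}, \qquad
\text{chaining: } \Theta(1 + \alpha) \text{ expected time per search}, \qquad
\text{open addressing, unsuccessful search: } \le \frac{1}{1 - \alpha} \text{ expected probes } (\alpha < 1)
```

For example, a half-full open-addressing table (α = 0.5) needs at most 2 expected probes for a miss, while α = 0.9 allows up to 10. This is why a larger table helps.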


Size too small? (Optional)

• Create a new, larger hash table and re-hash all entries (not useful for OI use)
• If using open addressing, you need to re-hash anyway to remove the deleted items
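A sketch of re-hashing a linear-probing table into a larger one. It assumes each slot holds `None` (empty), a `DELETED` marker, or a (k, v) pair, and it uses `hash(k) % new_m` as the new hash function. Deleted marks are simply not copied, which is the clean-up mentioned above. `new_m` must exceed the number of live entries, or the probing loop would never find a free slot.

```python
DELETED = object()   # deleted mark, as in the open-addressing samples

def rehash(slots, new_m):
    """Build a new linear-probing table of size new_m from the live (k, v) pairs
    in slots, dropping empty (None) and DELETED slots along the way."""
    new_slots = [None] * new_m
    for entry in slots:
        if entry is None or entry is DELETED:
            continue
        k, v = entry
        s = hash(k) % new_m
        while new_slots[s] is not None:   # linear probing; no deleted marks exist yet
            s = (s + 1) % new_m
        new_slots[s] = (k, v)
    return new_slots

# m = 4: keys 1, 5, 3 were added (5 probed past slot 1), then key 1 was deleted
old = [None, DELETED, (5, "b"), (3, "c")]
new = rehash(old, 8)
```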


Extensible Hashing (Optional)

• Use Table<int, Ptr> (Ptr is like the list in chaining)
• The size m = 2^d
• Given any uniform hash function h(k), g(k) = last d bits of h(k)
• Each Ptr points to an array of size r, each slot storing an entry
• The problem: what to do when the array is full

Extensible Hashing (Optional)

h(‘Alan’) = 0, h(‘Man’) = 4, h(‘On’) = 12, h(‘Ben’) = 5, h(‘Max’) = 5
Directory 00, 01, 10, 11 (m = 4, so g(k) is the last 2 bits of h(k)); arrays in use:
• 00: Alan, Man, On
• 01: Ben, Max

Extensible Hashing (Optional)

Add Si where h(‘Si’) = 9, i.e. g(‘Si’) = 01 (9 = 1001 in binary):
• 00: Alan, Man, On
• 01: Ben, Si, Max

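Extensible Hashing (Optional)

Add Unu where h(‘Unu’) = 4. With 2 bits, g(‘Unu’) = 00, but that array is already full. The directory therefore doubles to 3 bits (000 to 111), and the first array is split according to the last 3 bits of h(k), so now g(‘Unu’) = 100:
• 000: Alan (h = 0)
• 100: Man, On, Unu (h = 4, 12, 4; 12 = 1100 in binary)
• 001 and 101: Ben, Si, Max (this array was not full, so both directory entries still share it)

Still need to chain? (If more than r keys have identical h(k), no amount of splitting separates them.)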
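The whole walkthrough can be reproduced with a small sketch of this scheme, which is usually called extendible hashing in the literature. The array capacity r = 3 and the starting depth d = 2 are assumptions read off the pictures. The hash values are the ones listed on the slides, and the class and method names are hypothetical. Each array records how many low bits its keys share (its local depth). When a full array already uses as many bits as the directory, the directory doubles before the array is split. If more than r keys share a hash value, no split can separate them, so the sketch raises an error at `max_depth` instead of recursing forever.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: number of low bits all keys here share
        self.items = []      # (key, hash value) pairs, at most r of them


class ExtendibleHash:
    def __init__(self, r, h, depth=2, max_depth=32):
        self.r = r                    # capacity of each array
        self.h = h                    # hash function: key -> non-negative int
        self.global_depth = depth     # d, so the directory has m = 2**d entries
        self.max_depth = max_depth
        self.dir = [Bucket(depth) for _ in range(2 ** depth)]

    def _g(self, hv):
        # g(k): the last d bits of h(k)
        return hv & ((1 << self.global_depth) - 1)

    def search(self, key):
        bucket = self.dir[self._g(self.h(key))]
        return any(k == key for k, _ in bucket.items)

    def add(self, key):
        hv = self.h(key)
        b = self.dir[self._g(hv)]
        if len(b.items) < self.r:
            b.items.append((key, hv))
            return
        if b.depth >= self.max_depth:
            raise RuntimeError("too many keys share a hash value: chaining needed")
        if b.depth == self.global_depth:
            # Double the directory: entry j + 2**d points where entry j did
            self.dir = self.dir + self.dir
            self.global_depth += 1
        # Split b using the next bit of h(k) (bit number b.depth, counting from 0)
        bit = 1 << b.depth
        low, high = Bucket(b.depth + 1), Bucket(b.depth + 1)
        for item in b.items:
            (high if item[1] & bit else low).items.append(item)
        for j in range(len(self.dir)):
            if self.dir[j] is b:
                self.dir[j] = high if j & bit else low
        self.add(key)   # retry: the target array may still be full and split again

    def keys_at(self, j):
        return [k for k, _ in self.dir[j].items]


# Hash values from the slides
H = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9, "Unu": 4}
eh = ExtendibleHash(r=3, h=H.__getitem__)
for name in ["Alan", "Man", "On", "Ben", "Max", "Si", "Unu"]:
    eh.add(name)
```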