Hashing
Alan, Tam Siu Lung
96397999 [email protected] 99967891
2
Prerequisites
• List ADT
– Linked List
• Table ADT
– Array
• Mathematics
– Modular Arithmetic
• Computer Organization
– ASCII
• Algorithm
– Order Analysis
3
Basic Data Types
Pascal Type               Storage                  Operations
Word                      A non-negative integer   +, -, *, div, mod
Double                    A real number            +, -, *, /, int, frac
Array[1..12] of Boolean   A sequence of 12 bits    y := a[] (get), a[] := y (set)
4
Abstract Data Types (ADT)
• Stack<v>
– Can add and remove in LIFO order
• Queue<v>
– Can add and remove in FIFO order
• Priority Queue<v>
– Can add. Can remove in largest-first order. v is comparable.
5
Data Structure
• An ADT, implemented by a Data Type
• E.g.
– ArrayList, using an array to implement a List ADT
– ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)
6
Dictionary<k, v> ADT
• Add(k, v)
– Add a key-value pair
• Remove(k)
– Remove a key-value pair given the key
• Search(k) : v
– Search for the value given the key
A Table ADT differs only in that the key is an integer in a fixed range.
7
Direct Addressing
• Use the Table ADT
• The key is the location
• Efficient: O(1) for all operations
• Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...
0 Ant
5 Boy
99 Car
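A minimal sketch of direct addressing, assuming keys are integers in 0..99 (the range suggested by the slots shown above). The names `M`, `table`, `add`, `remove` and `search` are illustrative only.

```python
# Direct addressing: the key itself is the index into a fixed-size table.
M = 100                 # assumed key range 0..99

table = [None] * M      # None plays the role of an empty slot

def add(key, value):
    table[key] = value  # O(1): no searching, the key is the location

def remove(key):
    table[key] = None   # O(1)

def search(key):
    return table[key]   # O(1); None means "not present"

add(0, "Ant")
add(5, "Boy")
add(99, "Car")
print(search(5))        # Boy
remove(5)
print(search(5))        # None
```

The table needs one slot per possible key, which is why the approach breaks down for very large or non-numeric key ranges.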
8
Time Complexity
Average Case
               Add       Remove    Search
Array          O(1)      O(n)      O(n)
Sorted Array   O(n)      O(lg n)   O(lg n)
Linked List    O(1)      O(n)      O(n)
BST            O(lg n)   O(lg n)   O(lg n)
Hash Table     ~O(1)     ~O(1)     ~O(1)
Note: For sorted array and BST, keys have to be ordered.
9
Hash Function
• Hash Function: h_m(k)
• Map all keys into an integer domain, e.g. 0 to m - 1
• E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
– Alan: 1598313570
– Max: 3452409927
– Man: 943766770
– On: 2246271074
Note: We won’t use such a big m in our programs!
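As a rough illustration, Python's standard `zlib.crc32` can be used to hash the same strings and then reduce the result to a small table size. Whether it reproduces the exact numbers listed above depends on the CRC variant and string encoding used for the slides, so no specific values are claimed here. The helper names and the table size of 100 are assumptions.

```python
import zlib

def crc32_hash(key: str) -> int:
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3,
    # i.e. a value in the range 0 .. 2**32 - 1.
    return zlib.crc32(key.encode("ascii"))

def small_hash(key: str, m: int) -> int:
    # Reduce the 32-bit value to a slot number in 0 .. m - 1.
    return crc32_hash(key) % m

for name in ["Alan", "Max", "Man", "On"]:
    print(name, crc32_hash(name), small_hash(name, 100))
```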
10
Hash Table
• Use a Table<int, v> ADT of size m
• Use h(k) as the key
• All operations can be done as with a Table
• Solved, except:
– Collision: what to do if two different k have the same h(k)
– How to find a suitable hash function
11
Hash Functions
• If k is an integer, use h(k) = k mod m
• More advanced: h(k) = floor(m * frac(k * A)) for some 0 < A < 1
• If k is a string, convert it to an integer, e.g.
• h(‘Alan’) = [ASC(‘A’)*256^3 + ASC(‘l’)*256^2 + ASC(‘a’)*256 + ASC(‘n’)] mod m
• If k is another data type, try to combine all features of the type
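The three recipes above can be sketched as follows. The function names are illustrative, and the default A = 0.6180339887 (roughly (sqrt(5) - 1) / 2) is an assumption: it is a commonly suggested constant, not one given in the slides.

```python
import math

def hash_division(k: int, m: int) -> int:
    # h(k) = k mod m
    return k % m

def hash_multiplication(k: int, m: int, A: float = 0.6180339887) -> int:
    # h(k) = floor(m * frac(k * A)) for some 0 < A < 1
    frac = (k * A) % 1.0
    return math.floor(m * frac)

def string_to_int(s: str) -> int:
    # Treat the string as a base-256 number, one ASCII code per digit:
    # 'Alan' -> ASC('A')*256^3 + ASC('l')*256^2 + ASC('a')*256 + ASC('n')
    n = 0
    for ch in s:
        n = n * 256 + ord(ch)
    return n

def hash_string(s: str, m: int) -> int:
    return string_to_int(s) % m

print(string_to_int("Alan"))     # 1097621870
print(hash_string("Alan", 100))  # 70
print(hash_multiplication(123456, 100))
```

Python integers do not overflow, so long strings are safe here; in Pascal or C the running value would need to be reduced mod m inside the loop.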
12
Chaining (a.k.a. Open Hashing)
• Use Table<int, List<v> > instead
• When there are multiple k’s with the same h(k), add each to the list at that slot (usually a linked list)
• When searching or removing, look for k in the list at slot h(k)
• Order: O(length of the list searched)
13
Chaining Samples
h(‘Alan’) = h(‘Man’) = h(‘On’) = 0, h(‘Max’) = 5
Table slots shown: 0, 5, 99 (slot 99 stays empty)
Operations, with the lists after each step:
• Add <Alan, D>
– 0: Alan D
• Add <Max, Z>
– 0: Alan D
– 5: Max Z
• Add <Man, X>
– 0: Man X, Alan D
– 5: Max Z
• Add <On, Y>
– 0: On Y, Man X, Alan D
– 5: Max Z
• Search for Max (lists unchanged)
• Remove Man
– 0: On Y, Alan D
– 5: Max Z
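The chaining trace above can be replayed with a small sketch. The class name `ChainedHashTable` is hypothetical, the hash values are fixed by hand to match the sample, and a Python list stands in for the linked list. New entries are placed at the front of the chain, as in the sample.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[self.hash_fn(key) % self.m]

    def add(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))    # otherwise add at the front

    def search(self, key):
        for k, v in self._chain(key):    # cost is the length of this one chain
            if k == key:
                return v
        return None

    def remove(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

# Hash values fixed by hand to match the sample (an assumption for the demo).
fixed_hash = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
table = ChainedHashTable(100, fixed_hash.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    table.add(key, value)
print(table.slots[0])        # [('On', 'Y'), ('Man', 'X'), ('Alan', 'D')]
print(table.search("Max"))   # Z
table.remove("Man")
print(table.slots[0])        # [('On', 'Y'), ('Alan', 'D')]
```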
21
Chaining (Optional)
• Note that the Table can be Table<int, Container<v> > for any Container supporting Add, Remove and Search.
• Why not consider other things, say another hash table? A BST?
22
Open Addressing (a.k.a. Closed Hashing)
• During a collision, find another slot for the entry
• E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc.
• Define the probe sequence <h(k, 0), h(k, 1), ..., h(k, m – 1)> to be the sequence of slots to try (it should be a permutation of <0, 1, ..., m – 1>)
• Both add and search try the same sequence, so a search finds the pair <k, v>, if present, before it reaches an empty slot
• How about delete? Search and mark it empty?
• Order: O(length of probe sequence)
23
Open Addressing Samples
Table contents after each operation (slots 0 to 5 and 99 shown):
Slot   Start    Add Max   Add Man   Add On   Delete Man
0      Alan D   Alan D    Alan D    Alan D   Alan D
1      Nil      Nil       Man X     Man X    Del X
2      Nil      Nil       Nil       On Y     On Y
3      Nil      Nil       Nil       Nil      Nil
4      Nil      Nil       Nil       Nil      Nil
5      Nil      Max Z     Max Z     Max Z    Max Z
99     Nil      Nil       Nil       Nil      Nil
Search for Max (after Add On) leaves the table unchanged.
Delete Man marks slot 1 as deleted (Del) rather than Nil, so a later search for On does not stop early at slot 1.
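The open addressing sample can be replayed with the sketch below, which assumes linear probing with step 1 and a tombstone marker for deleted slots. The class name `LinearProbingTable`, the `DELETED` sentinel and the hand-fixed hash values are all illustrative assumptions.

```python
EMPTY = None
DELETED = object()   # tombstone: the slot was used once and then deleted

class LinearProbingTable:
    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # Probe sequence h(k), h(k)+1, ... taken mod m.
        start = self.hash_fn(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, key, value):
        first_free = None
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is EMPTY:
                if first_free is None:
                    first_free = j
                break                        # key cannot be further along
            if entry is DELETED:
                if first_free is None:
                    first_free = j           # reusable, but keep looking for key
            elif entry[0] == key:
                self.slots[j] = (key, value) # overwrite existing key
                return j
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = (key, value)
        return first_free

    def search(self, key):
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is EMPTY:
                return None                  # an empty slot ends the search
            if entry is not DELETED and entry[0] == key:
                return entry[1]
        return None

    def remove(self, key):
        for j in self._probe(key):
            entry = self.slots[j]
            if entry is EMPTY:
                return False
            if entry is not DELETED and entry[0] == key:
                self.slots[j] = DELETED      # mark deleted, not empty
                return True
        return False

fixed_hash = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
t = LinearProbingTable(100, fixed_hash.__getitem__)
placed = [t.add(k, v) for k, v in
          [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]]
print(placed)             # [0, 5, 1, 2]
t.remove("Man")
print(t.search("On"))     # Y
```

If `remove` reset the slot to empty instead, the search for On would stop at slot 1 and wrongly report it missing.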
26
Collision Resolution
• The method outlined above is called linear probing
– In general, h(k, i) = (h(k) + c i) mod m
– Forms Primary Clustering
• There is also quadratic probing
– In general, h(k, i) = (h(k) + c1 i^2 + c2 i) mod m
– Still forms Secondary Clustering
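To make the two formulas concrete, the sketch below lists the probe sequences they generate for a small table. The function names and the choice m = 8 are assumptions. The third function uses the offsets i(i+1)/2, which corresponds to c1 = c2 = 1/2.

```python
def linear_probe(h0, m, c=1):
    # h(k, i) = (h(k) + c*i) mod m
    return [(h0 + c * i) % m for i in range(m)]

def quadratic_probe(h0, m, c1=1, c2=0):
    # h(k, i) = (h(k) + c1*i^2 + c2*i) mod m
    return [(h0 + c1 * i * i + c2 * i) % m for i in range(m)]

def triangular_probe(h0, m):
    # Quadratic probing with c1 = c2 = 1/2, written with integer arithmetic.
    return [(h0 + i * (i + 1) // 2) % m for i in range(m)]

m = 8
print(linear_probe(3, m))       # [3, 4, 5, 6, 7, 0, 1, 2]
print(quadratic_probe(3, m))    # [3, 4, 7, 4, 3, 4, 7, 4]
print(triangular_probe(3, m))   # [3, 4, 6, 1, 5, 2, 0, 7]
```

With c1 = 1, c2 = 0 and m = 8 the quadratic sequence visits only three distinct slots, so the constants have to be chosen together with m if the sequence is to be a permutation of all slots.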
27
Double Hashing (Optional)
• h(k, i) = ( h(k) + i h’(k) ) mod m
• Note: h’(k) cannot be 0
• Meaningful h’(k) should be in [1, m)
• E.g. h’(k) = m – k mod (m – 1)
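A small sketch of double hashing for integer keys. It assumes m is prime, so that any step size in [1, m) visits every slot. For the second hash it uses 1 + (k mod (m – 1)), a common textbook choice swapped in here because it clearly stays within [1, m); it is not the example formula from the slide.

```python
def h1(k, m):
    return k % m

def h2(k, m):
    # Second hash: always in 1 .. m-1, so the step is never 0.
    return 1 + (k % (m - 1))

def double_hash_probe(k, m):
    # h(k, i) = (h1(k) + i * h2(k)) mod m for i = 0 .. m-1
    return [(h1(k, m) + i * h2(k, m)) % m for i in range(m)]

m = 7  # assumed prime table size
print(double_hash_probe(10, m))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_probe(17, m))  # [3, 2, 1, 0, 6, 5, 4]
```

Keys 10 and 17 collide on their first slot but follow different sequences afterwards, which is what distinguishes double hashing from linear and quadratic probing.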
28
How good is Hashing?
• Nearly constant time if the lists are very short or the probing rate is very low
• So we need
– A uniform hash function (your job)
– A larger hash table (trade it off against the memory limit)
29
Size too small? (Optional)
• Create a new hash table and re-hash all entries (not useful for OI use)
• If using open addressing, you need to re-hash to remove the deleted items anyway
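A minimal sketch of re-hashing an open-addressing table into a larger one, assuming integer keys, h(k) = k mod m and linear probing. The `DELETED` marker and function names are illustrative. Deleted markers are simply not copied, so the new table starts without them.

```python
DELETED = object()   # tombstone left behind by an earlier delete

def insert(slots, key, value):
    """Linear-probing insert into a table that contains no tombstones."""
    m = len(slots)
    for i in range(m):
        j = (key + i) % m
        if slots[j] is None:
            slots[j] = (key, value)
            return
    raise RuntimeError("table is full")

def rehash(old_slots, new_m):
    """Build a table of size new_m holding only the live entries."""
    new_slots = [None] * new_m
    for entry in old_slots:
        if entry is not None and entry is not DELETED:
            insert(new_slots, entry[0], entry[1])
    return new_slots

# Old table with m = 4: key 10 collided with key 2 and sits in slot 3;
# slot 1 holds a tombstone.
old = [(8, "a"), DELETED, (2, "b"), (10, "c")]
new = rehash(old, 8)
print(new)
```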
30
Extensible Hashing (Optional)
• Use Table<int, Ptr> (Ptr is like the list in chaining)
• The size m = 2^k
• Given any uniform hash function h(key), g(key) = last k bits of h(key)
• Ptr points to an array of size r, each storing an entry
• The problem: what to do when the array is full
31
Extensible Hashing (Optional)
00
01
10
11
Alan Man On
Ben
Max
h(‘Alan’) = 0, h(‘Man’) = 4, h(‘On’) = 12, h(‘Ben’) = 5, h(‘Max’)=5
32
Extensible Hashing (Optional)
00
01
10
11
Alan Man On
Ben Si
Max
Add Si where h(‘Si’) = 9, i.e. g(‘Si’) = 01
33
Extensible Hashing (Optional)
000
001
010
011
100
101
110
111
Alan On
Ben Si
Max
Add Unu where h(‘Unu’) = 4, i.e. g(‘Unu’) = 100
The first array is split according to the h(k) of its entries
Still need to chain?
Man Unu
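One possible way to handle a full array is sketched below: split the bucket on one more hash bit, doubling the directory when needed. This is a sketch under assumptions, not the slides' exact procedure. The class names, the bucket size r = 3, the starting directory of size 1 and the `max_depth` cap are all hypothetical. The hash values are the ones quoted in the slides, but the bucket contents this code produces follow from those numbers and this splitting rule and are not meant to reproduce the slide diagrams.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of hash bits this bucket is keyed on
        self.items = {}      # key -> value

class ExtendibleHash:
    def __init__(self, hash_fn, r=3, max_depth=16):
        self.hash_fn = hash_fn
        self.r = r                   # bucket capacity
        self.max_depth = max_depth   # beyond this, buckets just overflow
        self.k = 0                   # directory has 2^k entries
        self.directory = [Bucket(0)]

    def _index(self, key):
        return self.hash_fn(key) & ((1 << self.k) - 1)   # last k bits

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

    def add(self, key, value):
        while True:
            b = self.directory[self._index(key)]
            if (key in b.items or len(b.items) < self.r
                    or b.depth >= self.max_depth):
                b.items[key] = value   # at max_depth this acts like chaining
                return
            self._split(b)

    def _split(self, b):
        if b.depth == self.k:
            # Double the directory: entries i and i + 2^k share a bucket.
            self.directory = self.directory + self.directory
            self.k += 1
        b.depth += 1
        new_bit = 1 << (b.depth - 1)
        sibling = Bucket(b.depth)
        for key in list(b.items):
            if self.hash_fn(key) & new_bit:      # move entries whose new bit is 1
                sibling.items[key] = b.items.pop(key)
        for i, entry in enumerate(self.directory):
            if entry is b and (i & new_bit):
                self.directory[i] = sibling

# Hash values as quoted in the slides.
h = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9, "Unu": 4}
eh = ExtendibleHash(h.__getitem__, r=3)
for name in ["Alan", "Man", "On", "Ben", "Max", "Si", "Unu"]:
    eh.add(name, h[name])
print(eh.k)                               # 3
print(sorted(eh.directory[0].items))      # ['Alan']
print(sorted(eh.directory[4].items))      # ['Man', 'On', 'Unu']
```

When more than r keys share all their hash bits, no amount of splitting separates them, so the cap on depth lets the bucket overflow; this is one answer to the question of whether chaining is still needed.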