Topic 6 Hashing
-
Upload
haire-kahfi-maa-takaful -
Category
Documents
-
view
221 -
download
0
Transcript of Topic 6 Hashing
-
8/14/2019 Topic 6 Hashing
1/31
CSC508
DATA STRUCTURES
TOPIC 6
Hashing1
Zulaile
Mabni
-
8/14/2019 Topic 6 Hashing
2/31
CHAPTER OBJECTIVES
Learn about hashing
Learn about hash methods
Mid-square
Folding
Division
Learn about collision
Open addressing
Chaining
2
Malik,DataStructuresUsingJava
-
8/14/2019 Topic 6 Hashing
3/31
WHYHASHING?
Need a data structure in which finds/searches are
very fast
Insert and Delete process should be fast too
Objects have unique keys A key may be a single property/attribute value
Or may be created from multiple properties/values
-
8/14/2019 Topic 6 Hashing
4/31
HASH TABLES VS. OTHER DATA
STRUCTURES
Maximize efficiency: implement the operations Insert(),
Delete() and Search()/Find() efficiently.
Arrays:
not space efficient (assumes we leave empty space for
keys not currently in the structure) Linked List
space efficient
Insert(), Delete() and Search()/Find() not too efficient
Hash Tables: Better than the above in terms of space and efficiency
-
8/14/2019 Topic 6 Hashing
5/31
HASH TABLES
Very useful data structure
Good for storing and retrieving key-value pairs
Not good for iterating through a list of items
Example applications: Storing objects according to ID numbers
When the ID numbers are widely spread out
When you dont need to access items in ID order
-
8/14/2019 Topic 6 Hashing
6/31
HASH INDEX/VALUE
A hash value or hash index is used to
index the hash table (array)
A hash function takes akey and returns a
hash value/index The hash index is a integer (to index an array)
The key is specific value associated with a
specific object being stored in the hash
table
It is important that the key remain constant
for the lifetime of the object
-
8/14/2019 Topic 6 Hashing
7/31
Hash TablesConceptual View
Obj5
key=1
obj1
key=15
Obj4
key=2
Obj2
key=36
7
6
5
4
3
21
0
table
Obj3
key=4
buckets
hash
value/inde
x
b1
b2
b3
b4
-
8/14/2019 Topic 6 Hashing
8/31
Hash Tables solve these problems by using a much smaller array and
mapping keys with a hash function.
Let universe of keys U and an array of size m. A hash function h is a functionfrom U to 0m, that is:
h U 0m
HASH TABLES
U
(universe of keys)
k1 k2
k3 k4
k6
0
1
2
3
4
5
6
7
h (k2)=2
h (k1)=h (k3)=3
h (k6)=5
h (k4)=7
-
8/14/2019 Topic 6 Hashing
9/31
HASH FUNCTION
You want a hash function/algorithm that is:
Fast
Easy to compute
Minimize the number of collisions
Creates a good distribution of hash values so that the
items (based on their keys) are distributed evenly
through the array
Hash functions can use as input
Integer key values String key values
Multipart key values
Multipart fields, and/or
Multiple fields
-
8/14/2019 Topic 6 Hashing
10/31
CHOOSING AHASH FUNCTION
The performance of the hash table depends onhaving a hash function that evenly distributesthe keys:uniform hashingis the ideal target
Choosing a good hash function requires taking
into account the kind of data that will be used. E.g., Choosing the first letter of a last name will likely
cause lots of collisions depending on the nationality ofthe population.
Most programming languages (including java)
have hash functions built in.
-
8/14/2019 Topic 6 Hashing
11/31
COMMONLYUSED HASH METHODS
Mid-Square
Hash method, h, computed by squaring the
identifier
Using appropriate number of bits from themiddle of the square to obtain the bucket
address
Middle bits of a square usually depend on all
the characters, it is expected that differentkeys will yield different hash addresses with
high probability, even if some of the characters
are the same11
Malik,DataStructuresUsingJava
-
8/14/2019 Topic 6 Hashing
12/31
COMMONLYUSED HASH METHODS
Folding
KeyX is partitioned into parts such that all the parts,
except possibly the last parts, are of equal length
Parts then added, in convenient way, to obtain hash
address
Division (Modular arithmetic)
key mod m
m is the array size; in general, it should be prime
number
KeyX is converted into an integer iX
This integer divided by size of hash table to get
remainder, giving address ofX in HT 12
Malik,DataStructuresUsingJava
-
8/14/2019 Topic 6 Hashing
13/31
THE MOD FUNCTION
Stands for modulo
When you divide x by y, you get a result and a remainder
Mod is the remainder
8 mod 5 = 3
9 mod 5 = 4
10 mod 5 = 0
15 mod 5 = 0
Thus for key-value mod M, multiples of M give the sameresult, 0
But multiples of other numbers do not give the same result
-
8/14/2019 Topic 6 Hashing
14/31
COMMONLYUSED HASH METHODS
Multiplication method
Floor ((key*someFraction mod 1)*arraySize)
Where some fraction is typically 0.618
Java Hash Map method
Create a hash by performing a series of shifts, adds,
and xors on the key
index = hash mod arraySize
-
8/14/2019 Topic 6 Hashing
15/31
COMMONLYUSED HASH METHODS
Suppose that each key is a string. Thefollowing Java method uses the divisionmethod to compute the address of the key:
int hashmethod(String insertKey)
{
int sum = 0;
for(int j = 0; j
-
8/14/2019 Topic 6 Hashing
16/31
HASH FUNCTIONS & INSERT()
Usage summary:
int hashValue = hashFunction (int key);
Or hashValue = hashFunction (String key);
Or hashValue = hashFunction (itemType item);
Insert method:
public void insert (int key, itemType item)
{
hashValue = hashFunction (key);table[hashValue] = item;
}
-
8/14/2019 Topic 6 Hashing
17/31
HASH TABLES: INSERT EXAMPLE
For example, if we hash keys 01000 into a hash table with 5
entries and use h(key) = key mod 5 , we get the followingsequence of events:
0
1
2
3
4
key dataInsert 2
2
0
1
2
3
4
key dataInsert 21
2
21
0
1
2
3
4
key dataInsert 34
2
21
34
Insert 54
There is a
collisionatarray entry #4
???
-
8/14/2019 Topic 6 Hashing
18/31
COLLISION RESOLUTION
Algorithms to handle collisions
Two categories of collision resolution techniques
Open addressing (closed hashing)
Chaining (open hashing)
18
Malik,Data
StructuresUsingJava
-
8/14/2019 Topic 6 Hashing
19/31
A problem arises when we have two keys thathash in the same array entrythis is called acollision.
There are two ways to resolve collision:
Hashing with Chaining (a.k.a. SeparateChaining): every hash table entry contains a pointerto a linked list of keys that hash in the same entry
Hashing with Open Addressing: every hash tableentry contains only one key. If a new key hashes to atable entry which is filled, systematically examineother table entries until you find one empty entry toplace the new key
DEALING WITH COLLISIONS
-
8/14/2019 Topic 6 Hashing
20/31
HASHING WITH CHAINING
The problem is that keys 34 and 54 hash in the same entry (4). We
solve this collisionby placing all keys that hash in the same hash table
entry in a chain (linked list) or bucket (array) pointed by this entry:
0
1
2
3
4
other
key key data
Insert 54
2
21
54 34
CHAIN
0
1
2
3
4
Insert 101
2
101
54 34
21
-
8/14/2019 Topic 6 Hashing
21/31
OPENADDRESSING
Collisions are resolved by systematically examining other tableindexes, i0 , i1 , i2 , until an empty slot is located.
The key is first mapped to an array cell using the hash function (e.g.
key % array-size)
If there is a collision find an available array cell
There are different algorithms to find (to probe for) the next array
cell
Linear probing
Quadratic probing
Random probing Double Hashing
-
8/14/2019 Topic 6 Hashing
22/31
OPENADDRESSING: LINEAR PROBING
Suppose that an item with keyX is to be inserted in HT
Use hash function to compute index h(X) of item in HT
Suppose h(X) = t.
If HT[t] is empty, store item into array slot.
Suppose HT[t] already occupied by another item; collisionoccurs
Linear probing: starting at location t, search array sequentiallyto find next available array slot:
(t + 1) % HTSize, (t + 2) % HTSize,,(t + j) % HTSize
Be sure to wrap around the end of the array! Stop when you have tried all possible array indices
If the array is full, you need to throw an exception or, betteryet, resize the array
22
Malik,Data
StructuresUsingJava
-
8/14/2019 Topic 6 Hashing
23/31
COLLISION RESOLUTION:
OPENADDRESSING
Pseudocode implementing linear probing:
hIndex = hashmethod(insertKey);
found = false;
while(HT[hIndex] != emptyKey && !found)
if(HT[hIndex].key == key)
found = true;
else
hIndex = (hIndex + 1) % HTSize;
if(found)
System.out.println(Duplicate items not allowed);else
HT[hIndex] = newItem;
23
Malik,Data
StructuresUsingJava
-
8/14/2019 Topic 6 Hashing
24/31
HASH TABLESOPENADDRESSING
(LINEAR PROBING)
Obj5
key=1
obj1
key=15
Obj4
key=2
Obj2
key=28
7
6
5
4
3
2
1
0
table
Obj3key=4Index=5
hashv
alue/index
Index=4
h(key) = key mod 8
-
8/14/2019 Topic 6 Hashing
25/31
RANDOM PROBING
Uses a random number generator to findthe next available slot
ith slot in the probe sequence is: (h(X) +ri) %HTSize whereri is theith value in arandom permutation of the numbers 1 toHTSize1
Suppose HTSize = 101, for h(X) = 26, and r1= 2, r2 = 5, r3 = 8.
The probe sequence of X has the elements26, 28,31,34
All insertions and searches use the samesequence of random numbers
25
Malik,Data
StructuresUsingJava
-
8/14/2019 Topic 6 Hashing
26/31
QUADRATIC PROBING
In Quadratic probing, starting at position
t, check the array locations ( t + 1) %
HTSize, (t + 2) % HTSize,, (t + i) %
HTSize.We do not know if it probes all the
positions in the table
WhenHTSize is prime, quadratic probing
probes about half the table before
repeating the probe sequence
26
Malik,Data
StructuresUsingJava
-
8/14/2019 Topic 6 Hashing
27/31
DOUBLE HASHING
Apply a second hash function after the first
The second hash function, like the first, is
dependent on the key
Secondary hash function must
Be different than the first
And, obviously, not generate a zero
Good algorithm:
arrayIndex = (arrayIndex + stepSize) % arraySize;
Where stepSize = constant(key % constant)
And constant is a prime less than the array size
-
8/14/2019 Topic 6 Hashing
28/31
-
8/14/2019 Topic 6 Hashing
29/31
HASH TABLES IN JAVA
Java supports a number of hash table classes
Hashtable, HashMap, LinkedHashMap, HashSet,
See Sun Java API Documentation
http://java.sun.com/j2se/1.4.1/docs/api/
As a programmer, you dont see the collision detection,chaining, etc.
You can set
The initial table size The load factor (Default is .75)
hashCode()hash function
-
8/14/2019 Topic 6 Hashing
30/31
HASHINGANIMATION
http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html
-
8/14/2019 Topic 6 Hashing
31/31
REFERENCES
Malik D.S., Nair P.S., Data Structures Using
Java, Course Technology, 2003.
Weiss Mark Allen,Data Structures & Algorithm
Analysis in C++, Pearson Education
International Inc, 2003.
31
Malik,Data
StructuresUsingJava