STIA2023 Hashing
8/12/2019 STIA2023 Hashing
1/36
Data Structures & Algorithm Analysis
Week 13: Hashing
-
Objectives:
At the end of this lesson, the student will be able to:
Describe the basic idea of hashing,
Describe the purpose of a hash table and a hash function,
Describe how a hash function compresses a hash code into an index to the hash table,
Explain what collisions are and why they occur,
Describe open addressing as a method to resolve collisions,
Describe linear probing and quadratic probing as particular open addressing schemes,
Describe separate chaining as a method to resolve collisions, and
Describe the relative efficiencies of various collision resolution techniques.
-
Chapter Contents
What is Hashing?
Hash Functions
Computing Hash Codes
Compressing a Hash Code into an Index for the Hash Table
Resolving Collisions
Open Addressing with Linear Probing
Open Addressing with Quadratic Probing
Separate Chaining
-
Chapter Contents (ctd.)
Efficiency
The Load Factor
The Cost of Open Addressing
The Cost of Separate Chaining
-
What is Hashing?
A technique that determines an index or location for storage of an item in a data structure
The hash function receives the search key
Returns the index of an element in an array called the hash table
The index is known as the hash index
A perfect hash function maps each search key into a different integer suitable as an index to the hash table
-
What is Hashing?
Fig. 1: A hash function indexes its hash table.
-
What is Hashing?
Two steps of the hash function
Convert the search key into an integer called the hash code
Compress the hash code into the range of indices for the hash table
Typical hash functions are not perfect
They can allow more than one search key to map into a single index
This is known as a collision
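The two steps, and a collision between two distinct keys, can be sketched in Java as follows. This is only an illustration: the class name HashStepsDemo, its method names, and the table size of 7 are assumptions and do not come from the slides.

```java
// Sketch only: class and method names are illustrative, not from the slides.
class HashStepsDemo {
    // Step 1: convert the search key into an integer hash code.
    static int hashCodeOf(String key) {
        return key.hashCode();
    }

    // Step 2: compress the hash code into the range 0 .. tableSize - 1.
    static int compress(int hashCode, int tableSize) {
        return Math.floorMod(hashCode, tableSize);
    }

    static int hashIndex(String key, int tableSize) {
        return compress(hashCodeOf(key), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 7; // assumed table size
        // "Aa" and "BB" are different keys with the same Java hash code (2112),
        // so they necessarily map to the same index: a collision.
        System.out.println(hashIndex("Aa", tableSize)); // prints 5
        System.out.println(hashIndex("BB", tableSize)); // prints 5
    }
}
```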
-
What is Hashing?
Fig. 2: A collision caused by the hash function h
-
Hash Functions
General characteristics of a good hash function
Minimize collisions
Distribute entries uniformly throughout the hash table
Be fast to compute
-
Computing Hash Codes
We will override the hashCode method of Object
Guidelines
If a class overrides the method equals, it should override hashCode
If the method equals considers two objects equal, hashCode must return the same value for both objects
If an object invokes hashCode more than once during execution of a program on the same data, it must return the same hash code
An object's hash code during one execution of a program can differ from its hash code during another execution of the same program
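A minimal sketch of a key class that follows these guidelines is shown below. StudentId and its fields are hypothetical, chosen only for illustration; Objects.hash is one convenient way to build the hash code from the same fields that equals compares.

```java
import java.util.Objects;

// Hypothetical key class, used only to illustrate the guidelines.
final class StudentId {
    private final String faculty;
    private final int number;

    StudentId(String faculty, int number) {
        this.faculty = faculty;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentId)) return false;
        StudentId that = (StudentId) other;
        return number == that.number && faculty.equals(that.faculty);
    }

    // Uses exactly the fields that equals compares, so two objects that
    // equals considers equal always return the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(faculty, number);
    }
}
```

Because both fields are final, repeated calls to hashCode on the same object return the same value during one execution.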
-
Computing Hash Codes
The hash code for a string, s
int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++)
    hash = g * hash + s.charAt(i);
// g is a positive constant
Hash code for a primitive type
Use the primitive typed key itself
Manipulate internal binary representations
Use folding
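As a runnable sketch of both ideas, the class below computes the string hash code with g = 31 and folds a long key into an int. The value 31 is an assumption (the slide only says g is a positive constant); it happens to be the constant Java's own String.hashCode uses. The class and method names are illustrative.

```java
// Sketch only: names are illustrative, and g = 31 is an assumed constant.
class HashCodeSketch {
    // Polynomial hash code for a string, as in the loop on the slide.
    static int stringHash(String s) {
        final int g = 31;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = g * hash + s.charAt(i); // int overflow simply wraps around
        }
        return hash;
    }

    // Folding: combine the two 32-bit halves of a long with exclusive or.
    static int foldLong(long key) {
        return (int) (key ^ (key >>> 32));
    }
}
```

With these choices, stringHash agrees with String.hashCode and foldLong agrees with Long.hashCode from the Java standard library.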
-
Compressing a Hash Code
Must compress the hash code so it fits into the index range
Typical method for a code c is to compute c modulo n
n is a prime number (the size of the table)
Index will then be between 0 and n - 1
private int getHashIndex(Object key)
{
    int hashIndex = key.hashCode() % hashTable.length;
    if (hashIndex < 0)
        hashIndex = hashIndex + hashTable.length;
    return hashIndex;
} // end getHashIndex
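The adjustment for a negative remainder matters because Java's % operator can return a negative result when the hash code is negative. The sketch below restates the method with the hash code and table length passed in as parameters (an assumption made so that the example is self-contained) and compares it with Math.floorMod from the standard library.

```java
// Sketch only: a standalone variant of getHashIndex for experimentation.
class CompressionSketch {
    static int getHashIndex(int hashCode, int tableLength) {
        int hashIndex = hashCode % tableLength; // may be negative in Java
        if (hashIndex < 0) {
            hashIndex = hashIndex + tableLength;
        }
        return hashIndex;
    }

    public static void main(String[] args) {
        System.out.println(-17 % 11);                  // prints -6
        System.out.println(getHashIndex(-17, 11));     // prints 5
        System.out.println(Math.floorMod(-17, 11));    // prints 5
    }
}
```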
-
Resolving Collisions
Options when the hash function returns a location already used in the table
Use another location in the table (open addressing)
Change the structure of the hash table so that each array location can represent multiple values (separate chaining)
-
Open Addressing with Linear Probing
Open addressing scheme locates an alternate location
New location must be open, available
Linear probing
If a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, ...
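A minimal sketch of the linear probe is given below. It assumes that null marks an open location and that the probe wraps around to index 0 after the last location; the wrap-around and the names used are assumptions, not taken from the slide.

```java
// Sketch only: findOpen is an illustrative helper, not from the slides.
class LinearProbeSketch {
    // Returns the first open (null) location at k, k + 1, k + 2, ...,
    // wrapping around the end of the array, or -1 if the table is full.
    static int findOpen(Object[] hashTable, int k) {
        for (int step = 0; step < hashTable.length; step++) {
            int index = (k + step) % hashTable.length;
            if (hashTable[index] == null) {
                return index;
            }
        }
        return -1;
    }
}
```

For example, in a table of length 5 with locations 3 and 4 occupied, a key that hashes to 3 is placed at index 0.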
-
Open Addressing with Linear Probing
Fig. 3: The effect of linear probing after adding four entries whose search keys hash to the same index.
-
Open Addressing with Linear Probing
Fig. 4: A revision of the hash table shown in 19-3 when linear probing resolves collisions; each entry contains a search key and its associated value
-
Removals
Fig. 5: A hash table if remove used null to remove entries.
-
Open Addressing with Linear Probing
Fig. 6: A linear probe sequence (a) after adding an entry;
(b) after removing two entries;
-
Open Addressing with Linear Probing
Fig. 6: A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.
-
Searches that Dictionary Operations Require
To retrieve an entry
Search the probe sequence for the key
Examine entries that are present, ignore locations in available state
Stop search when key is found or null reached
To remove an entry
Search the probe sequence same as for retrieval
If key is found, mark location as available
To add an entry
Search probe sequence same as for retrieval
Note first available slot
Use available slot if the key is not found
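The three searches can be sketched with linear probing as follows. ProbingTable, the AVAILABLE marker object, and the choice to replace the value when the key is already present are all assumptions made for illustration; the slides do not prescribe them.

```java
// Sketch only: an open-addressing table with an "available" state.
class ProbingTable {
    private static final Object AVAILABLE = new Object(); // marks a removed entry
    private final Object[] keys;
    private final Object[] values;

    ProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int home(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns the index of key, or -1 once null is reached
    // or every location has been examined.
    private int locate(Object key) {
        int index = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            Object current = keys[index];
            if (current == null) return -1; // never-used location ends the search
            if (current != AVAILABLE && current.equals(key)) return index;
            index = (index + 1) % keys.length; // skip available or non-matching slots
        }
        return -1;
    }

    Object get(Object key) {
        int index = locate(key);
        return index < 0 ? null : values[index];
    }

    boolean remove(Object key) {
        int index = locate(key);
        if (index < 0) return false;
        keys[index] = AVAILABLE; // not null, so later searches keep probing past it
        values[index] = null;
        return true;
    }

    // Returns false only when the key is absent and no slot is free.
    boolean add(Object key, Object value) {
        int index = home(key);
        int firstFree = -1; // first available or null slot seen so far
        for (int probes = 0; probes < keys.length; probes++) {
            Object current = keys[index];
            if (current == null) {
                if (firstFree < 0) firstFree = index;
                break; // key cannot be further along the probe sequence
            }
            if (current == AVAILABLE) {
                if (firstFree < 0) firstFree = index;
            } else if (current.equals(key)) {
                values[index] = value; // assumed policy: replace the existing value
                return true;
            }
            index = (index + 1) % keys.length;
        }
        if (firstFree < 0) return false;
        keys[firstFree] = key;
        values[firstFree] = value;
        return true;
    }
}
```

Marking a removed slot as available instead of setting it to null is what lets a later retrieval continue past that slot to entries further along the same probe sequence.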
-
Open Addressing, Quadratic Probing
Change the probe sequence
Given a search key that hashes to index k
Probe to k + 1, k + 2^2, k + 3^2, ..., k + n^2
Reaches half of the locations in the hash table if the table size is a prime number
Avoids primary clustering
But can lead to secondary clustering
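The quadratic probe sequence can be explored with the short sketch below, which collects the distinct indices visited by k, k + 1^2, k + 2^2, ... taken modulo the table size. The names are illustrative, and k is assumed to be a non-negative hash index. For a prime table size n the sketch visits (n + 1) / 2 distinct locations, so a search is only guaranteed to find an open location while the table is less than half full.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: probeSequence is an illustrative helper, not from the slides.
class QuadraticProbeSketch {
    // Distinct indices visited by k + j * j (mod tableSize) for j = 0, 1, 2, ...,
    // in order of first visit. Assumes k >= 0 and tableSize > 0.
    static Set<Integer> probeSequence(int k, int tableSize) {
        Set<Integer> visited = new LinkedHashSet<>();
        for (long j = 0; j < tableSize; j++) {
            visited.add((int) ((k + j * j) % tableSize));
        }
        return visited;
    }

    public static void main(String[] args) {
        // Table size 11 (prime), home index 3.
        System.out.println(probeSequence(3, 11)); // prints [3, 4, 7, 1, 8, 6]
    }
}
```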
-
Separate Chaining
Alter the structure of the hash table
Each location can represent multiple values
Each location called a bucket
Bucket can be a(n)
List
Sorted list
Chain of linked nodes
Array
Vector
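A minimal sketch of separate chaining is shown below, using a java.util.LinkedList as each bucket. ChainedTable and its members are illustrative names, new entries are added at the front of an unsorted bucket, and an existing key has its value replaced; all three choices are assumptions.

```java
import java.util.LinkedList;

// Sketch only: ChainedTable and its members are illustrative names.
class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>(); // every location starts as an empty bucket
        }
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    void add(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                entry.value = value; // key already present: replace its value
                return;
            }
        }
        bucket.addFirst(new Entry<>(key, value)); // unsorted bucket: insert at the front
    }

    V get(K key) {
        for (Entry<K, V> entry : bucketFor(key)) {
            if (entry.key.equals(key)) return entry.value;
        }
        return null;
    }

    boolean remove(K key) {
        return bucketFor(key).removeIf(entry -> entry.key.equals(key));
    }
}
```

Unlike open addressing, removal here needs no available marker: the entry is simply unlinked from its bucket.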
-
Separate Chaining
Fig. 9: A hash table for use with separate chaining; each
bucket is a chain of linked nodes.
-
Separate Chaining
Fig. 10: Where new entry is inserted into linked bucket
when integer search keys are (a) duplicate and unsorted;
-
Separate Chaining
Fig. 10: Where new entry is inserted into linked bucket
when integer search keys are (b) distinct and unsorted;
-
Separate Chaining
Fig. 10: Where new entry is inserted into linked bucket
when integer search keys are (c) distinct and sorted
-
Efficiency Observations
Successful retrieval or removal
Same efficiency as successful search
Unsuccessful retrieval or removal
Same efficiency as unsuccessful search
Successful addition
Same efficiency as unsuccessful search
Unsuccessful addition
Same efficiency as successful search
-
Load Factor
Perfect hash function not always possible or practical
Thus, collisions likely to occur
As hash table fills
Collisions occur more often
The load factor λ measures how full the table is: λ = (number of entries in the table) / (number of locations in the table)
-
Cost of Open Addressing
Fig. 11: The average number of comparisons required by
a search of the hash table for given values of the load
factor when using linear probing.
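The figure itself is not reproduced in this transcript. As a stand-in, the sketch below evaluates the standard approximations for linear probing from Knuth's analysis, where lambda is the load factor (entries divided by table locations, with lambda < 1). That the figure tabulates exactly these formulas is an assumption.

```java
// Sketch only: standard approximations for linear probing, lambda < 1.
class LinearProbingCost {
    // Average comparisons for an unsuccessful search: (1/2) * (1 + 1 / (1 - lambda)^2)
    static double unsuccessful(double lambda) {
        return 0.5 * (1 + 1 / ((1 - lambda) * (1 - lambda)));
    }

    // Average comparisons for a successful search: (1/2) * (1 + 1 / (1 - lambda))
    static double successful(double lambda) {
        return 0.5 * (1 + 1 / (1 - lambda));
    }
}
```

At lambda = 0.5 these give 2.5 comparisons for an unsuccessful search and 1.5 for a successful one; both grow quickly as lambda approaches 1.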
-
Cost of Open Addressing
Fig. 12: The average number of comparisons
required by a search of the hash table for given
values of the load factor when using either
quadratic probing or double hashing.
Note: for quadratic probing or double hashing, the load factor λ should be < 0.5
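The figure is not reproduced here. The sketch below evaluates the standard approximations usually quoted for quadratic probing and double hashing, with lambda the load factor (0 < lambda < 1). Treating these as the values behind the figure is an assumption.

```java
// Sketch only: standard approximations for quadratic probing or double hashing.
class DoubleHashingCost {
    // Average comparisons for an unsuccessful search: 1 / (1 - lambda)
    static double unsuccessful(double lambda) {
        return 1 / (1 - lambda);
    }

    // Average comparisons for a successful search: (1 / lambda) * ln(1 / (1 - lambda))
    static double successful(double lambda) {
        return (1 / lambda) * Math.log(1 / (1 - lambda));
    }
}
```

At lambda = 0.5 an unsuccessful search averages 2 comparisons and a successful search about 1.39.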
-
Cost of Separate Chaining
Fig. 13: Average number of comparisons required by
search of hash table for given values of load factor
when using separate chaining.
Note: Reasonable efficiency requires only that the load factor λ be < 1
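The figure is not reproduced here. For separate chaining with unsorted buckets, the usual approximations are simple enough to state directly, with lambda the load factor (the average number of entries per bucket). That the figure uses these formulas is an assumption.

```java
// Sketch only: standard approximations for separate chaining.
class ChainingCost {
    // Unsuccessful search examines a whole bucket: about lambda comparisons.
    static double unsuccessful(double lambda) {
        return lambda;
    }

    // Successful search examines about half a bucket: 1 + lambda / 2 comparisons.
    static double successful(double lambda) {
        return 1 + lambda / 2;
    }
}
```

At lambda = 1 these give 1 and 1.5 comparisons. The cost grows only linearly with lambda, which is why separate chaining tolerates a fuller table than open addressing.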
-
Conclusion
Q & A Session
-
Question 1
A good hashing technique is NOT expected to ______________.
A) produce keys uniformly distributed over the range
B) be easy to program
C) produce no collisions
D) minimize collisions
-
Question 2
Assume that a hash function has the following
characteristics:
Keys 77, 355, and 276 hash to 3.
Keys 945 and 579 hash to 5.
Key 517 hashes to 0.
Key 155 hashes to 2.
Perform the insertions in the order 945, 77, 276, 355, 517, 155, 579.
A) Using the linear probing technique, indicate the position of each key in the table.
B) Which element requires the largest number of probes to be found in the table (if more than one, give the element with the smallest index in the hash table)?
C) Which element(s) can we access with a single probe?
D) What is the load factor?