Searching and Hashing
Uploaded by jillian-erickson
Concepts This Lecture
• Searching an array
• Linear search
• Binary search
• Comparing algorithm performance
Searching
• Searching = looking for something
• Searching an array is particularly common
• Goal: determine if a particular value is in the array
• We'll see that more than one algorithm will work
Searching Algorithms
The algorithm a person uses to find a number in a phone book is practical and efficient for humans, but not so good for computers: it's not precise and it's not consistent.
Let's imagine another scenario. Suppose that you have a pile of cards containing the names of customers. They are not organized in any particular way. You want to find the card with the name Sarah (your key).
The procedure you will use is likely to be: look at each card's key, one by one, until one matches your target. This is an algorithm, and it is called linear search.
Searching as a Function
Specification: Let b be the array to be searched, n the size of the array, and x the value being searched for. If x appears in b[0..n-1], return its index, i.e., return k such that b[k]==x. If x is not found, return -1.
The function does not change the array, its size, or x. The outline below reports its result through the reference parameters found and loc instead of a return value.
Function outline:
void Lookup (const int vec[ ], int vSize, int key, bool& found, int& loc) {
...
}
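Linear Search
Algorithm: start at the beginning of the array and examine each element until x is found, or all elements have been examined.
// Sets found to true and loc to the position of key if key is in vec[0..vSize-1];
// otherwise sets found to false (and loc to vSize).
void Lookup (const int vec[ ], int vSize, int key, bool& found, int& loc) {
    loc = 0;
    while (loc < vSize && vec[loc] != key)   // stop at a match or at the end
        loc++;
    found = (loc < vSize);                   // loc == vSize means we ran off the end
}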
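The specification asks for the index k (or -1) as the return value, whereas Lookup reports its result through found and loc. Below is a minimal sketch that follows the specification directly; linearSearchIndex is a hypothetical name, not one used in the lecture.

```cpp
#include <cassert>

// Hypothetical helper following the specification: return k with b[k] == x,
// or -1 if x does not appear in b[0..n-1]. The array is not modified.
int linearSearchIndex(const int b[], int n, int x) {
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        if (b[k] == x)
            return k;      // first match wins
    }
    return -1;             // examined all n elements without a match
}
```

For b = 3 12 -5 6 142 21 -17 45 (the test array on the next slides), linearSearchIndex(b, 8, 6) returns 3 and linearSearchIndex(b, 8, 15) returns -1.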
Linear Search
Test: search(v, 8, 6)
b: 3 12 -5 6 142 21 -17 45
Found it! (6 is at index 3)
Linear Search
Test: search(v, 8, 15)
b: 3 12 -5 6 142 21 -17 45
Ran off the end! Not found.
Linear Search
Note: The loop condition is written so vec[loc] is not accessed if loc >= vSize.
while ( loc < vSize && vec[loc] != key )
(Why is this true? Why does it matter?)
b: 3 12 -5 6 142 21 -17 45
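One way to see why the order of the two tests matters: the sketch below records which indices the loop actually reads. It relies on C++ evaluating the right operand of && only when the left operand is true (short-circuit evaluation). readAt and lookupTracked are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: returns vec[i] and records the largest index read so far.
int readAt(const int vec[], int i, int& maxIndexRead) {
    if (i > maxIndexRead)
        maxIndexRead = i;
    return vec[i];
}

// Same loop as Lookup. Because && only evaluates its right operand when the
// left operand is true, readAt is never called with loc == vSize.
// Returns the position of key, or vSize if key is not present.
int lookupTracked(const int vec[], int vSize, int key, int& maxIndexRead) {
    assert(vSize >= 0);
    maxIndexRead = -1;
    int loc = 0;
    while (loc < vSize && readAt(vec, loc, maxIndexRead) != key)
        loc++;
    return loc;
}
```

With the array b above and key 15, the loop stops at loc == 8 having read only indices 0 to 7. If the two operands were swapped, the final test would read vec[8], one element past the end of the array, which is undefined behavior.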
Write a Recursive Linear Search
// Returns a pointer to the first node whose key equals target, or NULL if there is none.
NodeType* linearSearch(NodeType *start, int target) {
    if (start == NULL)               // base case: end of list, not found
        return NULL;
    if (start->key == target)        // base case: found
        return start;
    return linearSearch(start->next, target);   // search the rest of the list
}
Linear Search: Linked List
for each item in the list
    if the item's key matches the target
        stop and report "success"
report failure
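The same "for each item" loop can be expressed with the C++ standard library, where std::forward_list is a singly linked list and std::find performs exactly this front-to-back scan. A short sketch; listContains is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>

// std::find walks the singly linked list from the front and stops at the
// first element equal to target, or returns end() after visiting every node.
bool listContains(const std::forward_list<int>& list, int target) {
    return std::find(list.begin(), list.end(), target) != list.end();
}
```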
Linear Search (target = 9)
head -> 5 -> 12 -> 9 -> NULL
The search starts at head and compares 5, then 12, then 9, where it stops: found.
Linear Search (target = n)
NodeType* linearSearch(NodeType *start, int target) {
    NodeType *temp = start;
    while (temp != NULL) {           // visit each node in turn
        if (temp->key == target)
            return temp;             // found: return a pointer to the node
        temp = temp->next;
    }
    return NULL;                     // reached the end of the list: not found
}
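To run either version, NodeType needs a definition; the slides only use key and next, so the struct below is an assumption. A sketch with both the recursive and the iterative search (renamed linearSearchRec and linearSearchIter so they can sit side by side), each returning a pointer to the matching node or NULL, and checked on the example list 5, 12, 9.

```cpp
#include <cassert>
#include <cstddef>

// Assumed node layout: the slides use ->key and ->next but never show the struct.
struct NodeType {
    int key;
    NodeType* next;
};

// Recursive version: check for the end of the list before dereferencing.
NodeType* linearSearchRec(NodeType* start, int target) {
    if (start == NULL) return NULL;          // rest of list is empty: not found
    if (start->key == target) return start;  // found
    return linearSearchRec(start->next, target);
}

// Iterative version: same visiting order, written as a loop.
NodeType* linearSearchIter(NodeType* start, int target) {
    for (NodeType* temp = start; temp != NULL; temp = temp->next)
        if (temp->key == target) return temp;
    return NULL;
}
```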
Analyzing Linear Search
Best case analysis
The element is found in the first position of the list, so we do one comparison: O(1).
Worst case analysis
The element is not present in the list. We have to go through the whole list, doing n comparisons where n is the size of the list, to be sure whether the element is present: O(n).
Average case analysis
The search key can be found anywhere in the list. If we "run" the algorithm for each position where the key may appear and average the number of comparisons, we get (with n = vSize):
$$\frac{1 + 2 + \cdots + n}{n} = \frac{n(n+1)/2}{n} = \frac{n+1}{2} = O(n)$$
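The average-case formula can be checked empirically: run linear search once for each position the key may occupy and average the comparison counts. A sketch, assuming the array values are distinct; comparisonsToFind and averageComparisons are hypothetical names.

```cpp
#include <cassert>

// Number of key comparisons linear search makes before stopping at b[pos].
// Assumes the values in b are distinct, so the first match is at pos.
int comparisonsToFind(const int b[], int n, int pos) {
    int comparisons = 0;
    for (int k = 0; k < n; k++) {
        comparisons++;
        if (b[k] == b[pos])
            break;
    }
    return comparisons;
}

// Averages the comparison counts over every position the key may occupy.
// By the formula above this should equal (n+1)/2.
double averageComparisons(const int b[], int n) {
    assert(n > 0);
    int total = 0;
    for (int pos = 0; pos < n; pos++)
        total += comparisonsToFind(b, n, pos);
    return static_cast<double>(total) / n;
}
```

For the 8-element test array the total is 1 + 2 + ... + 8 = 36 comparisons, so the average is 36 / 8 = 4.5 = (8 + 1) / 2.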
Can we do better?
Time needed for linear search is proportional to the size of the array.
An alternative algorithm, "binary search," works if the array is sorted:
1. Look for the target in the middle.
2. If you don't find it, you can ignore the half of the array that cannot contain the target, and repeat the process with the other half.
Example: Find first page of pizza listings in the yellow pages
Binary Search
In some cases, you get a list that is already ordered. In this case we can use algorithms that take this into consideration. The idea of binary search is:
• Split the list into two halves and compare the target with the key in the middle of the list.
• Based on this comparison, we can tell which half of the list may contain the target.
• Binary search eliminates half of the list at each iteration.
• It requires direct (random) access to the list elements.
Binary Search Strategy
What we want: find the split between values larger and smaller than x:
b[0..L] <= x, b[R..n-1] > x, with R = L+1
Situation while searching:
b[0..L] <= x, b[L+1..R-1] unknown (?), b[R..n-1] > x
Step: look at b[(L+R)/2]. Move L or R to the middle depending on the test.
Binary Search Strategy
More precisely:
• Values in b[0..L] are <= x
• Values in b[R..n-1] are > x
• Values in b[L+1..R-1] are unknown
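The invariant can be turned into a loop almost word for word. A sketch under stated assumptions: b is sorted in ascending order, and L starts at -1 and R at n so that both known regions are empty at the start (the slides do not give initial values). This variant always narrows down to the split point and only checks b[L] == x at the end; it is not the front/back version developed on the following slides. splitSearch is a hypothetical name.

```cpp
#include <cassert>

// Invariant: b[0..L] <= x, b[R..n-1] > x, b[L+1..R-1] unknown.
// Returns an index k with b[k] == x, or -1 if x is not in b[0..n-1].
int splitSearch(const int b[], int n, int x) {
    assert(n >= 0);
    int L = -1, R = n;                 // both known regions start out empty
    while (L + 1 != R) {               // unknown region b[L+1..R-1] is non-empty
        int mid = L + (R - L) / 2;     // always strictly between L and R
        if (b[mid] <= x)
            L = mid;                   // now b[0..mid] <= x
        else
            R = mid;                   // now b[mid..n-1] > x
    }
    // The split is found: b[0..L] <= x < b[R..n-1]. x is present iff b[L] == x.
    return (L >= 0 && b[L] == x) ? L : -1;
}
```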
Binary Search: Iterative Approach
The function is built up in four steps: the loop body, the loop termination condition, the initialization, and the returned result.
/* If target appears as a key in list[0..size-1], return its location, i.e., return k so that list[k].key == target. If target is not found, return -1. */
int binarySearch(const NodeType list[], int size, int target) {
    int front(0);              // initialization: all of list[0..size-1] is still unknown
    int back(size-1);
    int mid;
    while (front <= back) {    // loop termination: stop when the unknown range is empty
        mid = (front+back)/2;  // loop body: examine the middle element
        if (target == list[mid].key)
            return mid;
        else if (target < list[mid].key)
            back = mid-1;      // target can only be in list[front..mid-1]
        else
            front = mid+1;     // target can only be in list[mid+1..back]
    }
    return -1;                 // return result: target was not found
}
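A self-contained version of the iterative code, so that the traces on the following slides can be checked. Assumptions: NodeType is reduced to a struct holding only an int key, the function returns the index k (or -1) as its comment asks, and the extra probes parameter (hypothetical) counts loop iterations.

```cpp
#include <cassert>

// Assumed record type: the search only relies on a .key field.
struct NodeType {
    int key;
};

// Iterative binary search over list[0..size-1], sorted by key.
// Returns the index of target, or -1; probes counts loop iterations.
int binarySearch(const NodeType list[], int size, int target, int& probes) {
    int front = 0, back = size - 1;
    probes = 0;
    while (front <= back) {
        int mid = (front + back) / 2;
        probes++;
        if (target == list[mid].key)
            return mid;
        else if (target < list[mid].key)
            back = mid - 1;
        else
            front = mid + 1;
    }
    return -1;
}
```

For v = -17 -5 3 6 12 21 45 142, searching for 3 takes 3 iterations and returns 2; searching for 143 takes 4 iterations and returns -1.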
Binary Search
Test: bsearch(v,8,3);
index: 0 1 2 3 4 5 6 7
b: -17 -5 3 6 12 21 45 142
front=0, back=7: mid=3, b[3]=6 > 3, so back=2
front=0, back=2: mid=1, b[1]=-5 < 3, so front=2
front=2, back=2: mid=2, b[2]=3, found
Binary Search
Test: bsearch(v,8,17);
index: 0 1 2 3 4 5 6 7
b: -17 -5 3 6 12 21 45 142
front=0, back=7: mid=3, b[3]=6 < 17, so front=4
front=4, back=7: mid=5, b[5]=21 > 17, so back=4
front=4, back=4: mid=4, b[4]=12 < 17, so front=5
front=5 > back=4: the loop ends, not found
Binary Search
Test: bsearch(v,8,143);
index: 0 1 2 3 4 5 6 7
b: -17 -5 3 6 12 21 45 142
front=0, back=7: mid=3, b[3]=6 < 143, so front=4
front=4, back=7: mid=5, b[5]=21 < 143, so front=6
front=6, back=7: mid=6, b[6]=45 < 143, so front=7
front=7, back=7: mid=7, b[7]=142 < 143, so front=8
front=8 > back=7: the loop ends, not found
Binary Search
Test: bsearch(v,8,-143);
index: 0 1 2 3 4 5 6 7
b: -17 -5 3 6 12 21 45 142
front=0, back=7: mid=3, b[3]=6 > -143, so back=2
front=0, back=2: mid=1, b[1]=-5 > -143, so back=0
front=0, back=0: mid=0, b[0]=-17 > -143, so back=-1
front=0 > back=-1: the loop ends, not found
Binary Search (target = 7)
index: 0 1 2 3 4 5 6 7 8
list: 4 6 7 12 18 22 23 28 30
front(0), back(8): mid(4), list[4]=18 > 7, so back(3)
front(0), back(3): mid(1), list[1]=6 < 7, so front(2)
front(2), back(3): mid(2), list[2]=7, found
Is it worth the trouble?
Suppose you had 1000 elements.
Ordinary (linear) search would require about 500 comparisons on average.
Binary search:
• after the 1st compare, throw away half, leaving 500 elements to be searched;
• after the 2nd compare, throw away half, leaving 250; then 125, 63, 32, 16, 8, 4, 2, 1 are left.
After at most 10 steps, you're done! What if you had 1,000,000 elements??
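The halving argument can be turned into a count. A sketch, where maxProbes is a hypothetical name: it counts worst-case comparisons for the front/back version of binary search, using the fact that after an unsuccessful probe on a range of s elements, the larger remaining half has at most s/2 (rounded down) elements. It answers the question above: at most 20 comparisons for 1,000,000 elements, versus about 500,000 on average for linear search.

```cpp
#include <cassert>

// Worst-case number of comparisons for binary search on n sorted elements:
// count how many halvings it takes to go from n elements to an empty range.
int maxProbes(long n) {
    assert(n >= 0);
    int probes = 0;
    while (n > 0) {
        probes++;   // one comparison against the middle element
        n /= 2;     // the larger remaining half has at most n/2 elements
    }
    return probes;
}
```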
How Fast Is It?
Another way to look at it: How big an array can you search if you examine a given number of array elements?
# comps Array size
1 1
2 2
3 4
4 8
5 16
6 32
7 64
8 128
… …
11 1,024
… …
21 1,048,576
Analyzing Binary Search
• We only need to concentrate on the main loop.
• The loop is different from the linear search loop because its number of executions is not proportional to n (the list size): the size of the remaining range is halved in each iteration.
• This should already give a "hint" of which function describes this algorithm, but let's use a table (worst-case loop iterations):
List size   Loop iterations
1           1
3           2
7           3
15          4
31          5
63          6
127         7
The table shows that the number of iterations grows proportionally to the logarithm base 2 of the size of the list: O(log n).
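The table can be reproduced by measurement. A sketch, where worstCaseIterations is a hypothetical name: it fills an array with the sorted values 0, 2, 4, ..., runs the front/back loop for every stored value and for every gap around them (so both successful and unsuccessful searches are covered), and keeps the largest iteration count.

```cpp
#include <cassert>
#include <vector>

// Largest number of loop iterations the front/back binary search needs
// on a sorted array of n elements, over all successful and failed searches.
int worstCaseIterations(int n) {
    std::vector<int> list(n);
    for (int i = 0; i < n; i++)
        list[i] = 2 * i;                          // sorted, with gaps between values
    int worst = 0;
    for (int target = -1; target <= 2 * n - 1; target++) {
        int front = 0, back = n - 1, iterations = 0;
        while (front <= back) {
            int mid = (front + back) / 2;
            iterations++;
            if (target == list[mid]) break;
            else if (target < list[mid]) back = mid - 1;
            else front = mid + 1;
        }
        if (iterations > worst)
            worst = iterations;
    }
    return worst;
}
```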
Time for Binary Search
Key observation for binary search: the size n of the array that can be searched with k comparisons is $n \approx 2^k$.
Number of comparisons k as a function of array size n: $k \approx \log_2 n$.
This is fundamentally faster than linear search (where $k \approx n$).
Write a Recursive Binary Search Function BinarySearch( )
BinarySearch takes a sorted array vec, a key, and two subscripts, fromLoc and toLoc, as arguments. It returns false if key is not found in the elements vec[fromLoc..toLoc]; otherwise, it returns true.
BinarySearch is O(log2 N).
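found = BinarySearch(vec, 25, 0, 14);   (key = 25, fromLoc = 0, toLoc = 14)
index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
vec:   0 2 4 6 8 10 12 14 16 18 20 22 24 26 28
Subarrays searched (the element examined at mid is shown in brackets):
vec[0..14]:  0 2 4 6 8 10 12 [14] 16 18 20 22 24 26 28
vec[8..14]:  16 18 20 [22] 24 26 28
vec[12..14]: 24 [26] 28
vec[12..12]: [24]
vec[13..12] is empty, so BinarySearch returns false.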
Recursive Binary Search -- basic idea
• This is an example of a recursive function where the search range is halved in each call.
• Given: a sorted array a of values (integers, strings, ...) from range [s,t]
• Task: search if a value x is in the array. If yes, return its position, otherwise -1.
Recursive Binary Search -- basic idea
• Consider how you search for a name in a phone book: you don't use algorithm 1, linear search (otherwise it would take ages to find a name starting with Z).
• Instead, you open the book somewhere and continue searching in the half that contains the name, then open it somewhere in that half and continue searching in the portion that contains the name, etc.
Recursive Binary Search -- basic idea
Now let's do this for a sorted array of integers, but let's always check the middle of the remaining range. Example: search for 7 in the following array:
index: 0 1 2 3 4 5 6 7 8 9
a: 2 5 7 11 17 24 31 38 40 41
low=0, high=9: mid = (0+9)/2 = 4, 7 < a[4] = 17, so look in the lower half: 2 5 7 11
low=0, high=3: mid = (0+3)/2 = 1, 7 > a[1] = 5, so look in the upper half: 7 11
low=2, high=3: mid = (2+3)/2 = 2, 7 == a[2], found!
Recursive Binary Search -- basic idea
• Example: the array contains 3, 5. Search for 4. (0+1)/2 is 0 (integer division), and 4 > a[0], so the search continues in the upper part. If we don't exclude mid, that subarray starts again at index 0 and ends at 1 => an infinite number of recursive calls. In the code on the next page, mid is excluded from the subarray to prevent this.
Recursive Binary Search -- basic idea
• Let's think about the design of the recursive function before coding it:
1. Recursive calls: call the function on the half of the current subrange that contains x. Define the subrange with a start and an end index.
2. Base case: when should the recursive calls stop? When we find x.
• What if x is not in the array? Stop when the subrange is a single cell that does not contain x (or is empty). Check: does the (start + end)/2 procedure always end in an array of length 1? A: it depends on how you implement it. You must ensure that the subarray gets smaller by at least 1 in every call.
Recursive Binary Search
bool BinarySearch ( int vec[ ], int key, int fromLoc, int toLoc )
// PRE:  vec[fromLoc..toLoc] sorted in ascending order
// POST: FCTVAL == (key in vec[fromLoc..toLoc])
{
    int mid;
    if ( fromLoc > toLoc )                 // base case -- not found
        return false;
    else {
        mid = ( fromLoc + toLoc ) / 2;
        if ( vec[mid] == key )             // base case -- found at mid
            return true;
        else if ( key < vec[mid] )         // search lower half
            return BinarySearch( vec, key, fromLoc, mid-1 );
        else                               // search upper half
            return BinarySearch( vec, key, mid + 1, toLoc );
    }
}
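To check the trace for found = BinarySearch(vec, 25, 0, 14), here is the same recursion instrumented to record which indices are examined. A sketch: binarySearchTraced and its examined parameter are hypothetical additions, not part of the lecture code. For vec = 0, 2, ..., 28 and key 25 it examines indices 7, 11, 13, 12 and returns false.

```cpp
#include <cassert>
#include <vector>

// Same recursion as BinarySearch, plus a list of every index compared at mid.
bool binarySearchTraced(const int vec[], int key, int fromLoc, int toLoc,
                        std::vector<int>& examined) {
    if (fromLoc > toLoc)                 // base case: empty range, not found
        return false;
    int mid = (fromLoc + toLoc) / 2;
    examined.push_back(mid);
    if (vec[mid] == key)                 // base case: found at mid
        return true;
    else if (key < vec[mid])             // search lower half, excluding mid
        return binarySearchTraced(vec, key, fromLoc, mid - 1, examined);
    else                                 // search upper half, excluding mid
        return binarySearchTraced(vec, key, mid + 1, toLoc, examined);
}
```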
#include <stdio.h>

/* prototype */
int binSearch(int array[], int first, int last, int N);

int main(void) {
    int index;
    int value;
    int list[] = {1,2,3,5,6};
    printf("Enter a search value: ");
    scanf("%i", &value);
    /* the function binSearch returns the index of the array */
    /* where the match is found, otherwise a -1 */
    index = binSearch(list, 0, 4, value);
    if (index == -1)
        printf("Value not found!\n");
    else
        printf("Value matches element number %i in the array!\n", ++index);
    return 0;
}
/* code continued on next slide */
/* array is the name of the array (or sub-array) to be searched */
/* first is the left-most index of the array being searched */
/* last is the right-most index of the array being searched */
/* N is the value being searched for */
int binSearch(int array[], int first, int last, int N) {
    int midpt;
    if (N < array[first] || N > array[last])
        return -1;              /* N lies outside array[first..last]: not found */
    /* didn't meet our error condition */
    midpt = (first+last)/2;
    if (array[midpt] == N)
        return midpt;
    /* recursive calls */
    else if (array[midpt] > N)
        return binSearch(array, first, midpt - 1, N);
    else
        return binSearch(array, midpt + 1, last, N);
}
Recursive Binary Search
Note the contents of the "stack" when we execute a call to binSearch from main, searching for 2 in list = {1,2,3,5,6} (some of the details are simplified):
(push) return binSearch( array, 0, 1, 2);  (first = 0, last = 4)
(push) return binSearch( array, 1, 1, 2);  (first = 0, last = 1)
(pop) return 1;  (first = 1, last = 1)
(pop) return 1;  (first = 0, last = 1)
(pop) return 1;  (first = 0, last = 4)
Recursive Binary Search
Note the contents of the "stack" when we execute a call to binSearch from main, searching for 4 (some of the details are simplified):
(push) return binSearch( array, 2+1, 4, 4);  (first = 0, last = 4)
(pop) return -1;  (first = 3, last = 4)
(pop) return -1;  (first = 0, last = 4)
Iteration vs. Recursion
• It turns out that any iterative algorithm can be reworked to use recursion instead (and vice versa).
• There are programming languages where recursion is the only choice(!)
• Some algorithms are more naturally written with recursion, but naïve applications of recursion can be inefficient.
Binary Search
Several comments on binary search:
• Binary search assumes that the elements are sorted. If they are not sorted, you won't know in which half to continue searching.
• Binary search is not a great idea for linked lists, since you can't just jump to the middle element. You'd have to iterate through the list to get there, so you could just as well check for x while you are doing that.
Summary
• Linear search and binary search are two different algorithms for searching an array.
• Binary search is vastly more efficient: O(log n) comparisons instead of O(n).
• But binary search only works if the array elements are in order.
Hashing
Tables: rows & columns of information
• A table has several fields (types of information).
  A telephone book may have fields name, address, phone number.
  A user account table may have fields user id, password, home folder.
• To find an entry in the table, you only need to know the contents of one of the fields (not all of them). This field is the key.
  In a telephone book, the key is usually name.
  In a user account table, the key is usually user id.
• Ideally, a key uniquely identifies an entry.
  If the key is name and no two entries in the telephone book have the same name, the key uniquely identifies the entries.
The Table ADT: operations
• insert: given a key and an entry, inserts the entry into the table
• find: given a key, finds the entry associated with the key
• remove: given a key, finds the entry associated with the key, and removes it
• Also: getIterator: returns an iterator, which visits each of the entries one by one (the order may or may not be defined), etc.
Table ADTs
• We are familiar with direct access structures (arrays) and linear access structures (linked lists). Both have their advantages and disadvantages.
• The crucial point against direct access structures is that we need to allocate the size of the structure in advance.
  In all likelihood, we tend to overestimate its size and end up with a very sparse structure.
  We tend to think that the actual number of keys to be stored is equivalent to the universe of possible keys.
• In some problems, the number of keys to be stored is much smaller than the number of keys in the universe. In this case a hash table may save us a lot of space.
How should we implement a table?
Our choice of representation for the Table ADT depends on the answers to the following questions:
• How often are entries inserted and removed?
• How many of the possible key values are likely to be used?
• What is the likely pattern of searching for keys? e.g. Will most of the accesses be to just one or two key values?
• Is the table small enough to fit into memory?
• How long will the table exist?
TableNode: a key and its entry
For searching purposes, it is best to store the key and the entry separately (even though the key's value may be inside the entry).
key       entry
"Smith"   "Smith", "124 Hawkers Lane", "9675846"
"Yeo"     "Yeo", "1 Apple Crescent", "0044 1970 622455"
Each row (key + entry) is one TableNode.
Implementation 1: unsorted sequential array
An array in which TableNodes are stored consecutively in any order:
• insert: add to the back of the array; O(1)
• find: search through the keys one at a time, potentially all of the keys; O(n)
• remove: find + replace the removed node with the last node; O(n)
(Figure: array positions 0, 1, 2, 3, and so on, each holding a key and its entry.)
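A sketch of Implementation 1 in C++, assuming string keys and entries; TableNode, UnsortedTable and their member names are hypothetical. find returns a pointer to the entry, or nullptr when the key is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical TableNode: the key is stored separately from the entry.
struct TableNode {
    std::string key;
    std::string entry;
};

// TableNodes kept in any order in a vector.
class UnsortedTable {
public:
    void insert(const std::string& key, const std::string& entry) {
        nodes.push_back(TableNode{key, entry});   // add to back: O(1) amortized
    }
    const std::string* find(const std::string& key) const {
        for (const TableNode& n : nodes)          // O(n): may scan every key
            if (n.key == key) return &n.entry;
        return nullptr;
    }
    bool remove(const std::string& key) {
        for (std::size_t i = 0; i < nodes.size(); i++) {
            if (nodes[i].key == key) {            // O(n) to find the node...
                nodes[i] = nodes.back();          // ...then O(1): move the last node into the gap
                nodes.pop_back();
                return true;
            }
        }
        return false;
    }
private:
    std::vector<TableNode> nodes;
};
```

Moving the last node into the gap is why remove stays O(n) overall: finding the node dominates, while filling the gap costs O(1) instead of shifting every later node.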
Implementation 2: sorted sequential array
An array in which TableNodes are stored consecutively, sorted by key:
• insert: add in sorted order; O(n)
• find: binary search; O(log n)
• remove: find, remove the node and shuffle the later nodes down; O(n)
We can use binary search because the array elements are sorted.
(Figure: array positions 0, 1, 2, 3, and so on, each holding a key and its entry.)
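The sorted version can reuse binary search through the standard library: std::lower_bound returns the first element whose key is not less than the search key, using O(log n) comparisons. A sketch assuming string keys and entries; the TableNode struct (matching the TableNode slide), SortedTable, and the member names are hypothetical. insert and remove stay O(n) because std::vector has to shift the later nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct TableNode {          // hypothetical node: key stored separately from entry
    std::string key;
    std::string entry;
};

// TableNodes kept sorted by key.
class SortedTable {
public:
    void insert(const std::string& key, const std::string& entry) {
        nodes.insert(position(key), TableNode{key, entry});  // O(n): later nodes shift up
    }
    const std::string* find(const std::string& key) const {
        auto it = position(key);                              // O(log n) binary search
        return (it != nodes.end() && it->key == key) ? &it->entry : nullptr;
    }
    bool remove(const std::string& key) {
        auto it = position(key);
        if (it == nodes.end() || it->key != key) return false;
        nodes.erase(it);                                      // O(n): later nodes shift down
        return true;
    }
private:
    std::vector<TableNode> nodes;
    // First node whose key is not less than 'key'.
    std::vector<TableNode>::const_iterator position(const std::string& key) const {
        return std::lower_bound(nodes.begin(), nodes.end(), key,
            [](const TableNode& n, const std::string& k) { return n.key < k; });
    }
};
```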
Implementation 3: linked list (unsorted or sorted)
TableNodes are again stored consecutively (one after another, linked by pointers):
• insert: add to front; O(1) (or O(n) for a sorted list)
• find: search through potentially all the keys, one at a time; O(n) (still O(n) for a sorted list)
• remove: find, then remove using pointer alterations; O(n)
(Figure: a chain of nodes, each holding a key and its entry, and so on.)
Implementation 5: hashing
An array in which TableNodes are not stored consecutively: their place of storage is calculated using the key and a hash function.
• Hashed key: the result of applying a hash function to a key.
• Keys and entries are scattered throughout the array.
(Figure: key -> hash function -> array index; example positions 4, 10, 123.)
Implementation 5: hashing
• insert: calculate place of storage, insert TableNode; O(1)
• find: calculate place of storage, retrieve entry; O(1)
• remove: calculate place of storage, set it to null; O(1)
All are O(1)! (This assumes a good hash function and few collisions; collisions are discussed below.)
Hash Functions
Hash tables aim to keep the main property of direct access structures: O(1) (constant) time access to an element.
With a direct access structure, a key k is normally stored in slot k. In a hash table, this element is stored in slot h(k).
h is a hash function. It maps the universe U of keys into the slots of a hash table (which is smaller than the universe):
$$h : U \to \{0, 1, \ldots, m-1\}$$
where m is the size of the table.
Hashing example: a fruit shop
10 stock details, 10 table positions (0 to 9)
Stock numbers are between 0 and 1000. Use hash function: stock no. / 100 (integer division).
position  key  entry
0         85   85, apples
3         323  323, guava
4         462  462, pears
9         912  912, papaya
What if we now insert stock no. 350? 350 / 100 = 3, and position 3 is occupied: there is a collision.
Collision resolution strategy: insert in the next free position (linear probing). Position 4 is also occupied, so 350, oranges goes into position 5.
Given a stock number, we find stock by using the hash function again, and use the collision resolution strategy if necessary.
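A sketch of the fruit-shop table with linear probing, assuming stock numbers 0 to 999 (so stock no. / 100 is always a valid position) and no removals; Slot, hashStock, insertStock and findStock are hypothetical names. Probing wraps around from position 9 to position 0.

```cpp
#include <cassert>
#include <string>

const int TABLE_SIZE = 10;

struct Slot {
    bool used = false;
    int stockNo = 0;
    std::string fruit;
};

int hashStock(int stockNo) { return stockNo / 100; }   // assumes 0 <= stockNo <= 999

// Returns the position where the item was stored, or -1 if the table is full.
int insertStock(Slot table[], int stockNo, const std::string& fruit) {
    int pos = hashStock(stockNo);
    for (int tries = 0; tries < TABLE_SIZE; tries++) {
        if (!table[pos].used) {
            table[pos].used = true;
            table[pos].stockNo = stockNo;
            table[pos].fruit = fruit;
            return pos;
        }
        pos = (pos + 1) % TABLE_SIZE;       // collision: probe the next position
    }
    return -1;
}

// Follows the same probe sequence; reaching an empty slot means the key is absent.
const std::string* findStock(const Slot table[], int stockNo) {
    int pos = hashStock(stockNo);
    for (int tries = 0; tries < TABLE_SIZE && table[pos].used; tries++) {
        if (table[pos].stockNo == stockNo) return &table[pos].fruit;
        pos = (pos + 1) % TABLE_SIZE;
    }
    return nullptr;
}
```

Inserting the five items in the order of the example places 350 at position 5. A lookup for 350 probes positions 3, 4 and 5, and a lookup for an absent key such as 351 stops at the first empty position (6).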
Pictorial view of Hash Tables
[Figure: keys k1, k2, k3 and k4 from the universe of keys mapped by h into slots of the table]
![Page 71: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/71.jpg)
71
Pictorial view of Hash Tables
[Figure: keys k1, k2, k3, k4 and k5 mapped by h into slots of the table]
![Page 72: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/72.jpg)
72
Three factors affecting the performance of hashing
The hash function
Ideally, it should distribute keys and entries evenly throughout the table
It should minimise collisions, where the position given by the hash function is already occupied
The collision resolution strategy
Separate chaining: chain together several keys/entries in each position
Open addressing: store the key/entry in a different position
The size of the table
Too big will waste memory; too small will increase collisions and may eventually force rehashing (copying into a larger table)
Should be appropriate for the hash function used, and a prime number is best
![Page 73: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/73.jpg)
73
Choosing a hash function: turning a key into a table position
Truncation: ignore part of the key and use the rest as the array index (converting non-numeric parts)
A fast technique, but check for an even distribution throughout the table
Folding: partition the key into several parts and then combine them in any convenient way
Unlike truncation, uses information from the whole key
Modular arithmetic (used by truncation and folding, and on its own)
To keep the calculated table position within the table, divide the position by the size of the table, and take the remainder as the new position
![Page 74: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/74.jpg)
74
Examples of hash functions (1)
Truncation: if students have a 9-digit identification number, take the last 3 digits as the table position, e.g. 925371622 becomes 622
Folding: split a 9-digit number into three 3-digit numbers and add them, e.g. 925371622 becomes 925 + 371 + 622 = 1918
Modular arithmetic: if the table size is 1000, the first example always stays within the table range, but the second example does not (it should be taken mod 1000), e.g. 1918 mod 1000 = 918 (in Java: 1918 % 1000)
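The three techniques can be written as small C functions. This is a sketch only: the function names are hypothetical, and the key is assumed to be a non-negative 9-digit number held in a `long`.

```c
/* Truncation: keep only the last 3 digits of the key. */
long truncate_hash(long id) {
    return id % 1000;
}

/* Folding: split a 9-digit key into three 3-digit parts and add them,
   then apply modular arithmetic so the result fits in the table. */
long fold_hash(long id, long table_size) {
    long high = id / 1000000;        /* first 3 digits  */
    long mid  = (id / 1000) % 1000;  /* middle 3 digits */
    long low  = id % 1000;           /* last 3 digits   */
    return (high + mid + low) % table_size;
}
```

For 925371622, `truncate_hash` gives 622 and `fold_hash(925371622, 1000)` gives 1918 mod 1000 = 918.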
![Page 75: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/75.jpg)
75
Examples of hash functions (2)
Using a telephone number as a key
The area code is not random, so it will not spread the keys/entries evenly through the table (many collisions)
The last 3 digits are more random
Using a name as a key
Use the full name rather than the surname (surnames are not particularly random)
Assign numbers to the characters (e.g. a = 1, b = 2; or use Unicode values)
Strategy 1: add the resulting numbers. Bad for a large table size, because the sums stay fairly small, so the higher positions are never used.
Strategy 2: call the number of possible characters c (e.g. c = 54 for the alphabet in upper and lower case, plus space and hyphen). Then multiply each character in the name by increasing powers of c, and add together.
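Both strategies for names are sketched below in C. The character numbering in `char_code` (a to z as 1 to 26, A to Z as 27 to 52, space 53, hyphen 54) is an assumption chosen to match c = 54, and the function names are hypothetical. `poly_hash` evaluates the powers of c with Horner's rule (the first character receives the highest power) and reduces modulo the table size at each step so the running value never overflows.

```c
#include <stddef.h>

/* Hypothetical numbering of the 54 characters (assumes ASCII letter order). */
int char_code(char ch) {
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    if (ch >= 'A' && ch <= 'Z') return ch - 'A' + 27;
    if (ch == ' ') return 53;
    if (ch == '-') return 54;
    return 0;   /* any other character: treated as 0 in this sketch */
}

/* Strategy 1: add the character numbers (anagrams always collide). */
unsigned long sum_hash(const char *name, unsigned long table_size) {
    unsigned long sum = 0;
    for (size_t i = 0; name[i] != '\0'; i++)
        sum += (unsigned long)char_code(name[i]);
    return sum % table_size;
}

/* Strategy 2: weight each character by a power of c (Horner's rule),
   reducing modulo table_size at every step. */
unsigned long poly_hash(const char *name, unsigned long c, unsigned long table_size) {
    unsigned long h = 0;
    for (size_t i = 0; name[i] != '\0'; i++)
        h = (h * c + (unsigned long)char_code(name[i])) % table_size;
    return h;
}
```

With a table of size 701, the anagrams "amy" and "may" both give 39 under Strategy 1, but 138 and 133 under Strategy 2.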
![Page 76: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/76.jpg)
76
What is a Hash Table?
The simplest kind of hash table is an array of records.
This example has 701 records.
[Figure: an array of records with indexes [0], [1], [2], [3], [4], [5], ..., [700]]
![Page 77: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/77.jpg)
77
What is a Hash Table?
Each record has a special field, called its key.
In this example, the key is a long integer field called Number.
[Figure: the array [0] to [700]; the record at [4] has Number 506643548]
![Page 78: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/78.jpg)
78
What is a Hash Table?
The number might be a person's identification number, and the rest of the record has information about the person.
[Figure: the array [0] to [700]; the record at [4] has Number 506643548]
![Page 79: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/79.jpg)
79
What is a Hash Table?
When a hash table is in use, some spots contain valid records, and other spots are "empty".
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 506643548 at [4] and 155778322 at [700]; all other spots empty]
![Page 80: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/80.jpg)
80
Inserting a New Record
In order to insert a new record, the key must somehow be converted to an array index.
The index is called the hash value of the key.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 580625685]
![Page 81: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/81.jpg)
81
Inserting a New Record
Typical way to create a hash value:
(Number mod 701)
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 580625685]
What is (580625685 mod 701)?
![Page 82: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/82.jpg)
82
Inserting a New Record
Typical way to create a hash value:
(Number mod 701)
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 580625685]
What is (580625685 mod 701)? Answer: 3, since 580625685 = 701 × 828282 + 3
![Page 83: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/83.jpg)
83
Inserting a New Record
The hash value is used for the location of the new record.
[Figure: Number 580625685 goes to [3]; the table already holds Number 281942902 at [1], 233667136 at [2], 506643548 at [4] and 155778322 at [700]]
![Page 84: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/84.jpg)
84
Inserting a New Record
The hash value is used for the location of the new record.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]]
![Page 85: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/85.jpg)
85
Collisions
Here is another new record to insert, with a hash value of 2.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 701466868]
Number 701466868: "My hash value is [2]."
![Page 86: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/86.jpg)
86
Collisions
This is called a collision, because there is already another valid record at [2].
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 701466868]
When a collision occurs, move forward until you find an empty spot.
![Page 87: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/87.jpg)
87
Collisions
This is called a collision, because there is already another valid record at [2].
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 701466868]
When a collision occurs, move forward until you find an empty spot.
![Page 88: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/88.jpg)
88
Collisions
This is called a collision, because there is already another valid record at [2].
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]; new record to insert: Number 701466868]
When a collision occurs, move forward until you find an empty spot.
![Page 89: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/89.jpg)
89
Collisions
This is called a collision, because there is already another valid record at [2].
[Figure: Number 701466868 placed at [5], the first empty spot after [2]; the table also holds Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4] and 155778322 at [700]]
The new record goes in the empty spot.
![Page 90: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/90.jpg)
90
A Quiz
Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]]
![Page 91: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/91.jpg)
91
Searching for a Key
The data that's attached to a key can be found fairly quickly.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
![Page 92: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/92.jpg)
92
Searching for a Key
Calculate the hash value. Check that location of the array for the key.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
Number 701466868: "My hash value is [2]."
Record at [2]: "Not me."
![Page 93: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/93.jpg)
93
Searching for a Key
Keep moving forward until you find the key, or you reach an empty spot.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
Number 701466868: "My hash value is [2]."
Record at [3]: "Not me."
![Page 94: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/94.jpg)
94
Searching for a Key
Keep moving forward until you find the key, or you reach an empty spot.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
Number 701466868: "My hash value is [2]."
Record at [4]: "Not me."
![Page 95: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/95.jpg)
95
Searching for a Key
Keep moving forward until you find the key, or you reach an empty spot.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
Number 701466868: "My hash value is [2]."
Record at [5]: "Yes!"
![Page 96: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/96.jpg)
96
Searching for a Key
When the item is found, the information can be copied to the necessary location.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]; search key: Number 701466868]
Number 701466868: "My hash value is [2]."
Record at [5]: "Yes!"
![Page 97: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/97.jpg)
97
Deleting a Record
Records may also be deleted from a hash table.
[Figure: table [0] to [700] with Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 506643548 at [4], 701466868 at [5] and 155778322 at [700]]
Number 506643548 (at [4]): "Please delete me."
![Page 98: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/98.jpg)
98
Deleting a Record
Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot", since that could interfere with searches.
[Figure: [4] is now vacant; the table still holds Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 701466868 at [5] and 155778322 at [700]]
![Page 99: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/99.jpg)
99
Deleting a Record
[Figure: [4] is now vacant; the table still holds Number 281942902 at [1], 233667136 at [2], 580625685 at [3], 701466868 at [5] and 155778322 at [700]]
Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot", since that could interfere with searches: a search for 701466868 starts at [2] and must move past [4] to reach [5]. The location must be marked in some special way so that a search can tell that the spot used to have something in it.
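The insert, search and delete steps on the 701-record table can be sketched in C as follows. This is a hedged illustration: the type `HashTable`, the functions `ht_init`, `ht_hash`, `ht_insert`, `ht_search`, `ht_delete` and the markers `EMPTY` and `DELETED` are hypothetical names, only the key Number is stored, and keys are assumed to be non-negative and not already present when inserted.

```c
#include <stdbool.h>

#define SIZE 701
#define EMPTY   (-1L)   /* spot has never held a record: ends a search */
#define DELETED (-2L)   /* spot used to hold a record: a search must continue past it */

/* Only the key (Number) is stored; the rest of each record is omitted. */
typedef struct {
    long number[SIZE];
} HashTable;

void ht_init(HashTable *t) {
    for (int i = 0; i < SIZE; i++) t->number[i] = EMPTY;
}

int ht_hash(long number) {
    return (int)(number % SIZE);   /* Number mod 701 */
}

/* Insert: start at the hash value and move forward to the first empty or
   deleted spot. Returns the index used, or -1 if the table is full. */
int ht_insert(HashTable *t, long number) {
    int i = ht_hash(number);
    for (int probes = 0; probes < SIZE; probes++) {
        if (t->number[i] == EMPTY || t->number[i] == DELETED) {
            t->number[i] = number;
            return i;
        }
        i = (i + 1) % SIZE;
    }
    return -1;
}

/* Search: move forward until the key is found or a never-used spot is
   reached; DELETED spots do not stop the search. Returns the index or -1. */
int ht_search(const HashTable *t, long number) {
    int i = ht_hash(number);
    for (int probes = 0; probes < SIZE; probes++) {
        if (t->number[i] == EMPTY) return -1;
        if (t->number[i] == number) return i;
        i = (i + 1) % SIZE;
    }
    return -1;
}

/* Delete: mark the spot as DELETED rather than EMPTY. */
bool ht_delete(HashTable *t, long number) {
    int i = ht_search(t, number);
    if (i < 0) return false;
    t->number[i] = DELETED;
    return true;
}
```

Inserting 506643548, 233667136, 281942902 and 155778322, then 580625685 and 701466868, should give indexes 4, 2, 1, 700, 3 and 5. After `ht_delete` marks [4] as `DELETED`, a search for 701466868 still walks [2], [3], [4] and finds it at [5]; had [4] been reset to `EMPTY`, the search would have stopped there and wrongly reported the key missing.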
![Page 100: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/100.jpg)
100
Using a hash function
[Figure: array values[0] to values[99]: values[1] = 4501, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[2], values[4] and values[97] are Empty]
HandyParts company makes no more than 100 different parts. But the parts all have four-digit numbers.
This hash function can be used to store and retrieve parts in an array.
Hash(key) = partNum % 100
![Page 101: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/101.jpg)
101
Placing elements in the array
Use the hash function Hash(key) = partNum % 100 to place the element with part number 5502 in the array.
[Figure: array values[0] to values[99]: values[1] = 4501, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[2], values[4] and values[97] are Empty]
![Page 102: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/102.jpg)
102
Placing elements in the array
Next place part number 6702 in the array.
Hash(key) = partNum % 100
6702 % 100 = 2
But values[2] is already occupied.
COLLISION OCCURS
[Figure: array values[0] to values[99]: values[1] = 4501, values[2] = 5502, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[4] and values[97] are Empty]
![Page 103: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/103.jpg)
103
How to resolve the collision?
One way is by linear probing. This uses the rehash function
(HashValue + 1) % 100
repeatedly until an empty location is found for part number 6702.
[Figure: array values[0] to values[99]: values[1] = 4501, values[2] = 5502, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[4] and values[97] are Empty]
![Page 104: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/104.jpg)
104
Resolving the collision
Still looking for a place for 6702 using the function
(HashValue + 1) % 100
[Figure: array values[0] to values[99]: values[1] = 4501, values[2] = 5502, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[4] and values[97] are Empty]
![Page 105: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/105.jpg)
105
Collision resolved
Part 6702 can be placed at the location with index 4.
[Figure: array values[0] to values[99]: values[1] = 4501, values[2] = 5502, values[3] = 7803, values[98] = 2298, values[99] = 3699; values[0], values[4] and values[97] are Empty]
![Page 106: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/106.jpg)
106
Collision resolved
Part 6702 is placed at the location with index 4.
Where would the part with number 4598 be placed using linear probing?
[Figure: array values[0] to values[99]: values[1] = 4501, values[2] = 5502, values[3] = 7803, values[4] = 6702, values[98] = 2298, values[99] = 3699; values[0] and values[97] are Empty]
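The HandyParts placements can be reproduced with a short C sketch. `part_hash`, `place_part` and the marker `EMPTY_PART` are hypothetical names; storing 0 for "Empty" assumes that no part number is 0.

```c
#define PARTS_SIZE 100
#define EMPTY_PART 0   /* hypothetical "Empty" marker: four-digit part numbers are never 0 */

/* Hash(key) = partNum % 100, as on the slides. */
int part_hash(int part_num) {
    return part_num % PARTS_SIZE;
}

/* Linear probing with the rehash function (HashValue + 1) % 100: move to the
   next index, wrapping from 99 back to 0, until an empty location is found.
   Returns the index used, or -1 if all 100 locations are occupied. */
int place_part(int values[], int part_num) {
    int index = part_hash(part_num);
    for (int probes = 0; probes < PARTS_SIZE; probes++) {
        if (values[index] == EMPTY_PART) {
            values[index] = part_num;
            return index;
        }
        index = (index + 1) % PARTS_SIZE;
    }
    return -1;
}
```

Starting from the slide's table (4501, 7803, 2298, 3699), placing 5502 returns 2 and 6702 returns 4. For 4598, positions 98 and 99 are taken, so the rehash function wraps around and the part goes to index 0.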
![Page 107: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/107.jpg)
107
Choosing the table size to minimise collisions
As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical
If the table size is 100 and all the hashed keys are divisible by 10, there will be many collisions, since only the 10 positions 0, 10, ..., 90 can ever be used! This is particularly bad if the table size is a power of a small integer such as 2 or 10
More generally, collisions may be more frequent if gcd(hashed keys, table size) > 1
Therefore, make the table size a prime number (then the gcd is 1 for every key that is not a multiple of the table size)
Collisions may still happen, so we need a collision resolution strategy
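A small experiment in C makes the gcd argument concrete. The key set 0, 10, 20, ... (all divisible by 10) and the helper names `gcd` and `distinct_slots` are assumptions for illustration.

```c
#include <stdbool.h>

/* Euclid's algorithm. */
int gcd(int a, int b) {
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Hypothetical key set 0, 10, 20, ..., 10*(n_keys-1): counts how many distinct
   positions key % table_size produces. Assumes 1 <= table_size <= 1024
   (fixed scratch array); returns -1 otherwise. */
int distinct_slots(int n_keys, int table_size) {
    bool used[1024] = { false };
    int count = 0;
    if (table_size < 1 || table_size > 1024) return -1;
    for (int i = 0; i < n_keys; i++) {
        int slot = (10 * i) % table_size;
        if (!used[slot]) {
            used[slot] = true;
            count++;
        }
    }
    return count;
}
```

With 100 such keys, a table of size 100 (gcd 10) uses only 10 distinct positions, while a table of size 101 (prime, gcd 1) uses 100.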
![Page 108: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/108.jpg)
108
Collision resolution techniques
We will review a simple technique called chaining. However, there are those who argue against this approach and point out other techniques, such as:
Linear probing: very simple. If position h(key) is occupied, do a linear search in the table until you find an empty slot. The slots are searched in this order: h(key), h(key)+1, h(key)+2, ..., h(key)+c
Quadratic probing: a variant of the above where the term being added to the hash result is squared: h(key)+c²
Random probing: another variant where the term being added to the hash result is a random number, h(key)+random(); the pseudo-random sequence must be reproducible so that a later search follows the same path
Rehashing: a sequence of hash functions h1, h2, ..., hk is defined. If a collision occurs, the functions are used in this order
(All positions are taken mod the table size.)
![Page 109: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/109.jpg)
109
Collision resolution:open addressing (1)
Probing: if the table position given by the hashed key is already occupied, increase the position by some amount until an empty position is found
Linear probing: increase by 1 each time [mod table size!]
Quadratic probing: to the original position, add 1, 4, 9, 16, …
Use the collision resolution strategy when inserting and when finding (ensure that the search key and the found keys match)
May also use double hashing: the amount added at each step is the result of another hash function
With open addressing, the table size should be double the expected number of elements
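Double hashing can be sketched with one common pair of functions, h1(k) = k mod m for the starting position and h2(k) = 1 + (k mod (m - 1)) for the step. This particular h2 is an assumption, not taken from the slides; the function names are hypothetical and keys are assumed non-negative.

```c
/* Starting position. */
int h1(int key, int m) {
    return key % m;
}

/* Step size: never 0, so every probe moves to a new position. */
int h2(int key, int m) {
    return 1 + key % (m - 1);
}

/* Position examined on the i-th probe (i = 0 is the original position). */
int double_hash_probe(int key, int i, int m) {
    return (h1(key, m) + i * h2(key, m)) % m;
}
```

With m = 13, keys 14 and 27 both start at position 1, but 14 then steps by 3 (1, 4, 7, 10, 0, ...) and 27 by 4 (1, 5, 9, ...), so keys that collide once need not keep following each other as they would with linear probing. Because m is prime and the step lies between 1 and m - 1, every position is eventually examined.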
![Page 110: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/110.jpg)
110
Clustering
Clustering is the tendency of elements to become unevenly distributed in the hash table, with many elements clustering around a single hash location.
One problem with linear probing is that it results in clustering.
![Page 111: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/111.jpg)
111
Collision resolution:open addressing (2)
If the table is fairly empty but there are many collisions, linear probing may cluster (group) keys/entries. This increases the time to insert and to find
[Figure: a row of table positions numbered 1 to 8, some already occupied]
For a table of size n, if the table is empty, the probability of the next entry going to any particular place is 1/n
In the diagram, the probability of position 2 getting filled next is 2/n (either a hash to 1 or to 2 fills it)
Once 2 is full, the probability of 4 being filled next is 4/n, and then of 7 is 7/n (i.e. the probability of getting long strings steadily increases)
![Page 112: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/112.jpg)
112
Collision resolution:open addressing (3)
An empty key/entry marks the end of a cluster, and so can be used to terminate a find operation
So, if we remove an entry within a cluster, we should not empty it!
To allow probing to continue, the removed entry must be marked as ‘removed but cluster continues’
![Page 113: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/113.jpg)
113
Collision resolution:open addressing (4)
Quadratic probing is a solution to the clustering problem
Linear probing adds 1, 2, 3, etc. to the original hashed key
Quadratic probing adds 1², 2², 3², etc. to the original hashed key
However, whereas linear probing guarantees that all empty positions will be examined if necessary, quadratic probing does not
e.g. table size 16 and original hashed key 3 gives the sequence: 3, 4, 7, 12, 3, 12, 7, 4, …
More generally, with quadratic probing, insertion may be impossible if the table is more than half-full! Need to rehash (see later)
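The probe sequence from the example can be checked in C. `quadratic_probe` and `reachable_positions` are hypothetical helper names, and the fixed scratch array limits the table size to 1024.

```c
#include <stdbool.h>

/* Position examined on the i-th quadratic probe (i = 0 is the original hashed key). */
int quadratic_probe(int home, int i, int table_size) {
    return (home + i * i) % table_size;
}

/* Counts the distinct positions quadratic probing can ever reach from home.
   i*i mod table_size repeats with period table_size, so the first table_size
   probes cover every reachable position. Returns -1 for unsupported sizes. */
int reachable_positions(int home, int table_size) {
    bool seen[1024] = { false };
    int count = 0;
    if (table_size < 1 || table_size > 1024) return -1;
    for (int i = 0; i < table_size; i++) {
        int pos = quadratic_probe(home, i, table_size);
        if (!seen[pos]) {
            seen[pos] = true;
            count++;
        }
    }
    return count;
}
```

For table size 16 and hashed key 3 the first eight probes are 3, 4, 7, 12, 3, 12, 7, 4, and only 4 of the 16 positions are ever reachable. With the prime size 17, 9 positions (just over half) are reachable, which is why a prime-sized table that is at most half full still leaves an empty position that the probe sequence can find.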
![Page 114: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/114.jpg)
114
Collision resolution: chaining
Each slot of a hash table will be a pointer to a linked list
Add the keys and entries anywhere in the list (front easiest)
Advantages over open addressing:
Simpler insertion and removal
Array size is not a limitation (but should still minimise collisions: make table size roughly equal to expected number of keys and entries)
Disadvantage:
Memory overhead is large if entries are small
[Figure: slots 4, 10 and 123 each point to a short linked list of key/entry nodes]
No need to change position!
![Page 115: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/115.jpg)
115
Chaining
Chaining is another means (besides linear probing) used to handle collisions that arise from the use of a hash function.
Chaining uses the hash value, not as the actual location of the element, but as the index into an array of pointers. A chain is a linked list of elements that share the same hash location.
FOR EXAMPLE . . .
![Page 116: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/116.jpg)
116
Using hashing and chaining
HandyParts company makes no more than 100 different parts. But the parts all have four-digit numbers.
Use this hash function to store and retrieve parts in the chains.
Hash(key) = partNum % 100
[Figure: array pointers[0] to pointers[99], each the head of a chain: 4501 at [1], 7803 at [3], 2298 at [98], 3699 at [99]; the other chains are empty]
![Page 117: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/117.jpg)
117
Using chaining
[Figure: array pointers[0] to pointers[99], each the head of a chain: 4501 at [1], 7803 at [3], 2298 at [98], 3699 at [99]; new element to place: 5502]
Use the hash function Hash(key) = partNum % 100 to place the element with part number 5502 in a chain.
![Page 118: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/118.jpg)
118
Using chaining
[Figure: array pointers[0] to pointers[99], each the head of a chain: 4501 at [1], 5502 at [2], 7803 at [3], 2298 at [98], 3699 at [99]; new element to place: 6702]
Next place part number 6702 in a chain.
Hash(key) = partNum % 100
6702 % 100 = 2
![Page 119: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/119.jpg)
119
Using chaining
[Figure: array pointers[0] to pointers[99]: the chain at [2] now holds 5502 followed by 6702; the other chains hold 4501 at [1], 7803 at [3], 2298 at [98], 3699 at [99]]
Where would the part with number 4598 be placed using chaining?
![Page 120: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/120.jpg)
120
More Chaining
![Page 121: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/121.jpg)
121
Hashing(103)
h(103) = 103 mod 10 = 3
![Page 122: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/122.jpg)
122
Hashing(103)
h(103) = 103 mod 10 = 3
[Figure: table of size 10: slot 3: 103; all other slots empty]
![Page 123: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/123.jpg)
123
Hashing(69)
h(69) = 69 mod 10 = 9
[Figure: table of size 10: slot 3: 103; slot 9: 69; all other slots empty]
![Page 124: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/124.jpg)
124
Hashing(20)
h(20) = 20 mod 10 = 0
[Figure: table of size 10: slot 0: 20; slot 3: 103; slot 9: 69; all other slots empty]
![Page 125: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/125.jpg)
125
Hashing(13)
h(13) = 13 mod 10 = 3
[Figure: table of size 10: slot 0: 20; slot 3: 103 → 13; slot 9: 69; all other slots empty]
![Page 126: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/126.jpg)
126
Hashing(110)
h(110) = 110 mod 10 = 0
[Figure: table of size 10: slot 0: 20 → 110; slot 3: 103 → 13; slot 9: 69; all other slots empty]
![Page 127: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/127.jpg)
127
Hashing(53)
h(53) = 53 mod 10 = 3
[Figure: table of size 10: slot 0: 20 → 110; slot 3: 103 → 13 → 53; slot 9: 69; all other slots empty]
![Page 128: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/128.jpg)
128
Final Hash Table
[Figure: table of size 10 (slots 0 to 9): slot 0: 20 → 110; slot 3: 103 → 13 → 53; slot 9: 69; all other slots empty]
![Page 129: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/129.jpg)
129
Searching in a Hash Table
Like any other structure, searching is a common task with hash tables
Searching works as below:
Given a target, hash the target
Take the hash value of the target and go to that slot. If the target exists, it must be in this slot
Search the list in the current slot using a linear search.
![Page 130: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/130.jpg)
130
Searching for 53
[Figure: the final hash table: slot 0: 20 → 110; slot 3: 103 → 13 → 53; slot 9: 69]
![Page 131: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/131.jpg)
131
Searching for 53
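[Figure: the final hash table: slot 0: 20 → 110; slot 3: 103 → 13 → 53; slot 9: 69; 53 mod 10 = 3, so only the chain at slot 3 needs to be searched]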
![Page 132: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/132.jpg)
132
Searching for 53
[Figure: the final hash table, with pointer temp walking along the chain at slot 3 (103 → 13 → 53)]
![Page 133: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/133.jpg)
133
Searching for 53
[Figure: the final hash table, with pointer temp walking along the chain at slot 3 (103 → 13 → 53)]
![Page 134: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/134.jpg)
134
Searching for 53
[Figure: the final hash table, with pointer temp walking along the chain at slot 3 (103 → 13 → 53)]
![Page 135: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/135.jpg)
135
Searching for 53
[Figure: the final hash table, with pointer temp walking along the chain at slot 3 (103 → 13 → 53)]
![Page 136: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/136.jpg)
136
hashSearch(n)
NodeType* hashSearch(NodeType* table[], int target)
{
    int index = hash(target);           /* the only slot the target can be in */
    NodeType *temp = table[index];      /* head of that slot's chain */
    return linearSearch(temp, target);  /* matching node (assumed NULL if absent) */
}
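A self-contained C version of the chained table from the worked example is sketched below. The definitions of `NodeType`, `hash` and `linearSearch`, and the `hashInsert` helper, are assumptions filled in for illustration; `hashInsert` appends at the tail of the chain to match the order in the example (inserting at the front would also work), and keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

typedef struct NodeType {
    int key;
    struct NodeType *next;
} NodeType;

/* h(n) = n mod 10, as in the worked example. */
int hash(int key) {
    return key % TABLE_SIZE;
}

/* Append a new node at the tail of the chain for this key's slot. */
void hashInsert(NodeType *table[], int key) {
    NodeType *node = malloc(sizeof *node);
    if (node == NULL) return;   /* allocation failure: skipped in this sketch */
    node->key = key;
    node->next = NULL;
    NodeType **link = &table[hash(key)];
    while (*link != NULL) link = &(*link)->next;
    *link = node;
}

/* Linear search along one chain: returns the matching node or NULL. */
NodeType *linearSearch(NodeType *temp, int target) {
    while (temp != NULL && temp->key != target) temp = temp->next;
    return temp;
}

NodeType *hashSearch(NodeType *table[], int target) {
    int index = hash(target);
    NodeType *temp = table[index];
    return linearSearch(temp, target);
}
```

Inserting 103, 69, 20, 13, 110 and 53 builds the final hash table shown earlier; `hashSearch(table, 53)` hashes to slot 3 and walks 103, 13, 53, while `hashSearch(table, 63)` walks the same chain and returns NULL.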
![Page 137: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/137.jpg)
137
Rehashing: enlarging the table
To rehash:
Create a new table of double the size (adjusting upwards until the size is again prime)
Transfer the entries in the old table to the new table, recomputing their positions (using the hash function)
When should we rehash?
When the table is completely full
With quadratic probing, when the table is half-full or insertion fails
Why double the size?
If n is the number of elements in the table, there must have been n/2 insertions since the previous rehash (if rehashing is done when the table is full)
So by making the table size 2n, the cost of copying the n elements is spread over those n/2 insertions, adding only a constant cost to each insertion
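Rehashing for a linear-probing table can be sketched as below. The `OpenTable` type and all function names are hypothetical; following the slide, the table is rehashed only when it is completely full, the new size is the first prime at or above double the old size, and keys are assumed to be non-negative ints.

```c
#include <stdbool.h>
#include <stdlib.h>

#define EMPTY_SLOT (-1)

typedef struct {
    int *slots;
    int size;    /* kept prime */
    int count;   /* number of keys stored */
} OpenTable;

bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

int next_prime(int n) {
    while (!is_prime(n)) n++;
    return n;
}

/* Linear-probing insert into a raw slot array (caller guarantees a free slot). */
void probe_insert(int *slots, int size, int key) {
    int i = key % size;
    while (slots[i] != EMPTY_SLOT) i = (i + 1) % size;
    slots[i] = key;
}

bool table_create(OpenTable *t, int size) {
    t->size = next_prime(size);
    t->count = 0;
    t->slots = malloc((size_t)t->size * sizeof *t->slots);
    if (t->slots == NULL) return false;
    for (int i = 0; i < t->size; i++) t->slots[i] = EMPTY_SLOT;
    return true;
}

/* Rehash: allocate a table of double the size (adjusted up to a prime) and
   re-insert every key, recomputing its position with the new size. */
bool rehash(OpenTable *t) {
    int new_size = next_prime(2 * t->size);
    int *new_slots = malloc((size_t)new_size * sizeof *new_slots);
    if (new_slots == NULL) return false;
    for (int i = 0; i < new_size; i++) new_slots[i] = EMPTY_SLOT;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY_SLOT) probe_insert(new_slots, new_size, t->slots[i]);
    free(t->slots);
    t->slots = new_slots;
    t->size = new_size;
    return true;
}

/* Add a key, rehashing first when the table is completely full. */
bool table_add(OpenTable *t, int key) {
    if (t->count == t->size && !rehash(t)) return false;
    probe_insert(t->slots, t->size, key);
    t->count++;
    return true;
}

bool table_contains(const OpenTable *t, int key) {
    int i = key % t->size;
    for (int probes = 0; probes < t->size; probes++) {
        if (t->slots[i] == EMPTY_SLOT) return false;
        if (t->slots[i] == key) return true;
        i = (i + 1) % t->size;
    }
    return false;
}
```

Starting from size 7, the eighth insertion triggers a rehash to size 17 (the first prime at or above 14), and all eight keys are still found afterwards.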
![Page 138: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/138.jpg)
138
Comparison of collision techniques
[Figure: expected number of probes against load factor (n/size) for linear probing, random probing and chaining]
![Page 139: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/139.jpg)
139
Applications of Hashing
Compilers use hash tables to keep track of declared variables
A hash table can be used for on-line spelling checkers: if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time
Game-playing programs use hash tables to store positions already seen, thereby saving computation time if a position is encountered again
Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different
Storing sparse data
![Page 140: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/140.jpg)
140
When are other representations more suitable than hashing?
Hash tables are very good if there is a need for many searches in a reasonably stable table
Hash tables are not so good if there are many insertions and deletions, or if traversals of the table in key order are needed; in this case, AVL trees are better
If there are more data than available memory, then use a B-tree
Also, hashing is very slow for any operations that require the entries to be sorted, e.g. finding the minimum key
![Page 141: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/141.jpg)
141
Performance of Hashing
The number of probes depends on the load factor (usually denoted by α), which is the ratio of the number of entries present in the table to the number of positions in the array
We also need to consider successful and unsuccessful searches separately
For a chained hash table, the average number of probes for an unsuccessful search is α, and for a successful search it is 1 + α/2
![Page 142: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/142.jpg)
142
Performance of Hashing (2)
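For open addressing, the formulae are more complicated, but typical values are:

| Load factor | 0.1 | 0.5 | 0.8 | 0.9 | 0.99 |
|---|---|---|---|---|---|
| Successful search, linear probing | 1.05 | 1.6 | 3.4 | 6.2 | 21.3 |
| Successful search, quadratic probing | 1.04 | 1.5 | 2.1 | 2.7 | 5.2 |
| Unsuccessful search, linear probing | 1.13 | 2.7 | 15.4 | 59.8 | 430 |
| Unsuccessful search, quadratic probing | 1.13 | 2.2 | 5.2 | 11.9 | 126 |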
Note that these do not depend on the size of the array or the number of entries present but only on the ratio (the load factor)
![Page 143: Searching and Hashing](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813698550346895d9e29e6/html5/thumbnails/143.jpg)
143
Summary
Hash tables store a collection of records with keys.
The location of a record depends on the hash value of the record's key.
With linear probing, when a collision occurs, the next available location is used.
Searching for a particular key is generally quick.
When an item is deleted, the location must be marked in a special way, so that searches know that the spot used to be used.