Post on 18-Dec-2015
Expected Running Times and Randomized Algorithms
Instructor: Neelima Gupta (ngupta@cs.du.ac.in)
Expected Running Time of Insertion Sort
Input: $x_1, x_2, \ldots, x_{i-1}, x_i, \ldots, x_n$
For i = 2 to n
    Insert the ith element $x_i$ into the partially sorted list $x_1, x_2, \ldots, x_{i-1}$ (at the rth position)
• Let $X_i$ be the random variable that represents the number of comparisons required to insert the ith element of the input array into the sorted subarray of the first i-1 elements.
• $X_i$ takes the values $x_{i1}, x_{i2}, \ldots, x_{ii}$, and
  $E(X_i) = \sum_{j=1}^{i} x_{ij} \, p(x_{ij})$
  where $E(X_i)$ is the expected value of $X_i$ and $p(x_{ij})$ is the probability of inserting $x_i$ at the jth position, $1 \le j \le i$.
Expected Running Time of Insertion Sort
Sorted prefix and next element: $x_1, x_2, \ldots, x_{i-1}, x_i, \ldots, x_n$
How many comparisons does it take to insert the ith element at the jth position?
• Position    # of comparisons
  i           1
  i-1         2
  i-2         3
  ...         ...
  2           i-1
  1           i-1
Note: Positions 2 and 1 both need i-1 comparisons. Why? To insert the element at position 2 we must compare it with the element previously in first place, and after that comparison we already know which of the two comes first and which comes second.
Thus $E(X_i) = \frac{1}{i} \left( \sum_{k=1}^{i-1} k + (i-1) \right)$
where 1/i is the probability of inserting at the jth position among the i possible positions.
For n elements, the expected total number of comparisons is
$T = E(X_1 + X_2 + \cdots + X_n) = \sum_{i=2}^{n} E(X_i)$   (the first element needs no comparisons, so $X_1 = 0$)
$= \sum_{i=2}^{n} \frac{1}{i} \left( \sum_{k=1}^{i-1} k + (i-1) \right) = \sum_{i=2}^{n} \left( \frac{i-1}{2} + \frac{i-1}{i} \right)$
Since $0 < \frac{i-1}{i} < 1$, this gives $\frac{n(n-1)}{4} \le T \le \frac{(n-1)(n+4)}{4}$.
Therefore the average case of insertion sort takes $\Theta(n^2)$.
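The expectation above can be checked on small inputs. The following is a minimal sketch, assuming distinct keys, a uniformly random input order, and an insertion sort that scans the sorted prefix from right to left; the function names are illustrative and not from the slides. It compares the sum of E(X_i) with the exact average over all permutations.

```python
from fractions import Fraction
from itertools import permutations


def insertion_sort_comparisons(items):
    """Sort a copy of items and return the number of key comparisons made."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # compare key with a[j]
            if a[j] <= key:
                break                 # key belongs right after a[j]
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons


def expected_comparisons_formula(n):
    """Sum over i = 2..n of (1/i) * (sum_{k=1}^{i-1} k + (i-1))."""
    total = Fraction(0)
    for i in range(2, n + 1):
        total += Fraction(sum(range(1, i)) + (i - 1), i)
    return total


def average_over_all_permutations(n):
    """Exact average number of comparisons over all n! input orders."""
    perms = list(permutations(range(n)))
    return Fraction(sum(insertion_sort_comparisons(p) for p in perms), len(perms))
```

For small n (say up to 6) the brute-force average and the formula agree exactly, and both stay below (n-1)(n+4)/4.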
Quick-Sort
• Pick the first item from the array and call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
Quicksort: Expected number of comparisons
• Partition may generate the splits (0 : n-1), (1 : n-2), (2 : n-3), ..., (n-2 : 1), (n-1 : 0), each with probability 1/n.
• If T(n) is the expected running time, then
  $T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \left[ T(k) + T(n-1-k) \right] + \Theta(n)$
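To see how this recurrence grows, it can be evaluated exactly for small n. The sketch below makes one assumption that is not on the slide: the Θ(n) term is taken to be exactly n - 1 comparisons per partition step, with C(0) = 0. Under that assumption the recurrence has the well-known closed form 2(n+1)H_n - 4n, where H_n is the nth harmonic number, which is Θ(n log n).

```python
from fractions import Fraction


def expected_quicksort_comparisons(n_max):
    """Return [C(0), ..., C(n_max)] for
    C(n) = (1/n) * sum_{k=0}^{n-1} [C(k) + C(n-1-k)] + (n - 1), C(0) = 0."""
    c = [Fraction(0)]
    prefix = Fraction(0)              # running sum C(0) + ... + C(n-1)
    for n in range(1, n_max + 1):
        prefix += c[n - 1]
        # the two halves of the sum are equal, hence the factor 2
        c.append(2 * prefix / n + (n - 1))
    return c


def closed_form(n):
    """2(n+1)H_n - 4n, with H_n the nth harmonic number."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n
```

Comparing `expected_quicksort_comparisons(30)` with `closed_form(n)` entry by entry shows that the two agree.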
Randomized Quick-Sort
• Pick an element from the array (chosen at random) and call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
Remarks
• Not much different from Quick-Sort, except that earlier the algorithm was deterministic and the bounds were probabilistic.
• Here the algorithm itself is also randomized: we pick the pivot at random. From that point onwards the algorithm behaves exactly as before.
• In the earlier case we can identify a worst-case input. Here no particular input is the worst case.
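The steps above can be sketched as follows. This is an illustrative out-of-place version, not the in-place partition the slides have in mind: it splits into three groups (less than, equal to, greater than the pivot) so that duplicate keys are handled simply. The `rng` parameter is an assumption added so the random source can be swapped for a seeded one.

```python
import random


def randomized_quicksort(items, rng=random):
    """Return a sorted copy of items, using a uniformly random pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]      # random pivot choice
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]     # pivot and its duplicates
    right = [x for x in items if x > pivot]
    return randomized_quicksort(left, rng) + middle + randomized_quicksort(right, rng)
```

Because the pivot is chosen by the algorithm and not fixed by the input, an already sorted array is no worse than any other input.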
Randomized Select
$T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k, n-1-k)) + \Theta(n)$
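The recurrence recurses only into the side of the partition that contains the answer, which in the worst case is the larger side. A minimal sketch of such a selection routine follows, assuming 0-based ranks and an out-of-place partition; the function name and the `rng` parameter are illustrative and not from the slides.

```python
import random


def randomized_select(items, k, rng=random):
    """Return the k-th smallest element of items (k = 0 is the minimum)."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    candidates = list(items)
    while True:
        pivot = candidates[rng.randrange(len(candidates))]
        smaller = [x for x in candidates if x < pivot]
        equal_count = sum(1 for x in candidates if x == pivot)
        if k < len(smaller):
            candidates = smaller                  # answer is left of the pivot
        elif k < len(smaller) + equal_count:
            return pivot                          # the pivot has rank k
        else:
            k -= len(smaller) + equal_count       # answer is right of the pivot
            candidates = [x for x in candidates if x > pivot]
```

Unlike quicksort, only one of the two partitions is processed further, which is why the recurrence contains a single T term.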
Randomized Algorithms
• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution.
• b ← random()
  if b = 0
      do A ...
  else (i.e., b = 1)
      do B ...
• Its running time depends on the outcomes of the coin tosses.
Assumptions
• The coins are unbiased, and
• the coin tosses are independent.
• The worst-case running time of a randomized algorithm may be large, but it occurs with very low probability (e.g., it occurs when all the coin tosses give "heads").
Monte Carlo Algorithms
• Running times are guaranteed, but the output may not be completely correct.
• The probability of error is low.
Las Vegas Algorithms
• The output is guaranteed to be correct.
• Bounds on running times hold with high probability.
• What type of algorithm is Randomized Quick-Sort?
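The difference between the two classes can be illustrated on a toy problem that is not in the slides: finding the index of a 1 in a 0/1 array. The sketch below assumes the array contains at least one 1 for the Las Vegas version; both function names are hypothetical.

```python
import random


def find_one_las_vegas(bits, rng=random):
    """Always returns a correct index; only the number of probes is random.
    Assumes bits contains at least one 1 (otherwise this never terminates)."""
    while True:
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i


def find_one_monte_carlo(bits, max_probes, rng=random):
    """Makes at most max_probes probes, so the running time is guaranteed,
    but it may give up and return None even though a 1 exists."""
    for _ in range(max_probes):
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i
    return None
```

If half of the entries are 1, the Monte Carlo version fails with probability 2^(-max_probes), while the Las Vegas version needs two probes in expectation.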
Why expected running times?
• Markov's inequality: for a non-negative random variable X with E(X) > 0 and any k > 0,
  $P(X > k \, E(X)) < \frac{1}{k}$
  i.e., the probability that the algorithm takes more than 2 E(X) time is less than 1/2,
  or the probability that the algorithm takes more than 10 E(X) time is less than 1/10.
  This is the reason why Quick-Sort does well in practice.
Markov's Bound
$P(X > k\mu) < \frac{1}{k}$, where k is a constant and $\mu = E(X)$
Chernoff's Bound
$P(X > 2\mu) < \frac{1}{2}$
A Stronger Result
$P(X > k\mu) < \frac{1}{n^k}$, where k is a constant
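Markov's bound can be checked exactly on any small discrete distribution. The sketch below assumes the distribution is given as a dictionary mapping non-negative values to probabilities (as `Fraction` objects); the helper names are illustrative.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def tail_probability(dist, threshold):
    """P(X > threshold)."""
    return sum(prob for value, prob in dist.items() if value > threshold)


def markov_holds(dist, k):
    """Check P(X > k * E(X)) < 1/k for this distribution."""
    return tail_probability(dist, k * expectation(dist)) < Fraction(1, k)


# A fair six-sided die and a heavily skewed distribution.
fair_die = {v: Fraction(1, 6) for v in range(1, 7)}
skewed = {0: Fraction(9, 10), 10: Fraction(1, 10)}
```

The skewed example shows that the bound cannot be improved by much in general: X exceeds 5 E(X) with probability 1/10, against a bound of 1/5.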
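A binary search tree can be built randomly
• A randomly selected key x, with Rank(x) = i, becomes the root.
• Pivot element = root.
[Figure: root x, with keys < x in the left subtree and keys > x in the right subtree]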
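RANDOMLY BUILT BST
• $X_i$: the height of the tree rooted at a node with rank = i
• $Y_i$: the exponential height of the tree, $Y_i = 2^{X_i}$
• $H = \max(H_1, H_2) + 1$
  where $H_1$ = height of the left subtree,
        $H_2$ = height of the right subtree,
        H = height of the tree rooted at x
HEIGHT OF THE TREE
• $Y = 2^H = 2 \cdot \max(2^{H_1}, 2^{H_2})$
• Expected value of the exponential height of a tree with n nodes, writing EH(T(n)) for the exponential height and H(T(n)) for the height:
  $E[EH(T(n))] = \frac{2}{n} \sum_{k=0}^{n-1} E[\max(EH(T(k)), EH(T(n-1-k)))] = O(n^3)$
  Since $2^{E[H(T(n))]} \le E[2^{H(T(n))}]$ (Jensen's inequality), it follows that $E[H(T(n))] = O(\log n)$.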
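A randomly built BST is easy to simulate: insert the keys in a uniformly random order and measure the height. The following is a minimal sketch, assuming the convention used above (an empty tree has height 0, so H = max(H1, H2) + 1 counts nodes on the longest path); the nested-dictionary representation and function names are illustrative choices.

```python
import random


def insert(tree, key):
    """Insert key into a BST stored as nested dicts; return the root."""
    new_node = {"key": key, "left": None, "right": None}
    if tree is None:
        return new_node
    node = tree
    while True:
        side = "left" if key < node["key"] else "right"
        if node[side] is None:
            node[side] = new_node
            return tree
        node = node[side]


def height(tree):
    """Number of nodes on the longest root-to-leaf path (0 if empty)."""
    best, stack = 0, [(tree, 1)]
    while stack:                      # iterative, so deep trees are safe
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node["left"], depth + 1))
        stack.append((node["right"], depth + 1))
    return best


def random_bst_height(n, rng=random):
    """Height of a BST built from the keys 0..n-1 in random order."""
    keys = list(range(n))
    rng.shuffle(keys)
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return height(tree)
```

Inserting the keys in sorted order gives height n, whereas random orders typically give a height that is a small multiple of log n.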
A skip list is a data structure that can be used to maintain a dictionary.
Given n keys, we insert them into a linked list that has -∞ as its first node and +∞ as its last node. This is the initial list S0.
Then we flip a coin for each element of Si; if a tail occurs, we insert the element into the next list Si+1. We repeat this level by level until only one element is left in Si.
S0: -∞, 5, 9, 25, 30, 35, 38, 40, +∞
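Skip List: Dictionary as ADT
[Figure: a skip list with levels S3 (top) down to S0 (bottom). Every level runs from the head -∞ to the tail +∞. S0 holds 5, 9, 25, 30, 35, 38, 40; the higher levels hold progressively smaller subsets such as 9, 30, 38 and 30.]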
Operations that can be performed on a skip list
Each node has two pointers: right and down.
1. Drop down
• This operation is performed when after(p) > key.
• The pointer p moves down to the list on the immediately lower level.
[Figure: nodes with right and down pointers on levels S1 and S0; p moves from its node in S1 to the same key in S0 (after drop down).]
2. Scan forward
• This operation is performed when after(p) < key.
• The pointer p moves to the next element in the list.
• E.g., here key = 28 and p is at 9; after(9) = 25 < 28, so scan forward.
[Figure: S0 = -∞, 9, 25, 30, +∞, with p advancing along the list.]
Searching for a key k
Keep a pointer p to the first node (-∞) of the highest list Sh. Scur is the current skip list, initially Sh.
loop
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e., update p ← after(p)
    else (after(p) > k)
        if (Scur == S0)
            report "key k not found" and exit
        drop down to the next lower skip list
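The search procedure, together with insertion and deletion, can be sketched as below. This is one possible realization under stated assumptions, not the instructor's implementation: keys are finite numbers, each node has `right` and `down` pointers as described above, a fair coin decides promotion, and new top levels are created on demand. The helper `_predecessors` is a hypothetical name.

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")


class Node:
    def __init__(self, key, right=None, down=None):
        self.key, self.right, self.down = key, right, down


class SkipList:
    def __init__(self, rng=random):
        self.rng = rng
        self.head = Node(NEG_INF, Node(POS_INF))   # S0: -inf -> +inf

    def _predecessors(self, key):
        """Per level, top first: the last node whose key is < key."""
        path, p = [], self.head
        while p is not None:
            while p.right.key < key:               # scan forward
                p = p.right
            path.append(p)
            p = p.down                             # drop down
        return path

    def search(self, key):
        return self._predecessors(key)[-1].right.key == key

    def insert(self, key):
        path = self._predecessors(key)
        if path[-1].right.key == key:
            return                                 # already present
        below = None
        while True:
            if path:
                p = path.pop()                     # next level up
            else:                                  # grow a new top level
                self.head = Node(NEG_INF, Node(POS_INF), self.head)
                p = self.head
            p.right = Node(key, p.right, below)
            below = p.right
            if self.rng.random() < 0.5:            # coin says stop promoting
                return

    def delete(self, key):
        found = False
        for p in self._predecessors(key):
            if p.right.key == key:                 # unlink on this level
                p.right = p.right.right
                found = True
        return found
```

Deletion reuses the search path: on every level where the key appears, its predecessor is already known, so unlinking takes constant time per level.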
Searching for a key 25
[Figure: the skip list with levels S3 to S0 (S0 holds 5, 9, 25, 30, 35, 38, 40), showing the successive positions of the pointer p as it drops down and scans forward. The search ends at 25 in S0: Key found.]
Searching for a key 28
[Figure: the same skip list, showing the successive positions of the pointer p. The search reaches S0 with 28 lying between p and after(p): Key not found.]
Deletion of a key
E.g., delete 30.
[Figure: the same skip list with levels S3 to S0, showing the positions of the pointer p visited while locating 30 for deletion.]
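Analysis
1. An element $x_k$ is in $S_i$ with probability $1/2^i$ (true for all elements).
   $E(|S_i|) = \sum_{k=1}^{n} E(X_{ki}) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$, where $X_{ki} = 1$ if $x_k$ is in $S_i$ and 0 otherwise.
   $E(\text{total size}) = E\left( \sum_{i} |S_i| \right) = \sum_{i=0}^{\infty} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: h ≈ log n
   $\frac{n}{2^h} = 1 \Rightarrow h \simeq \log n$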
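The two estimates above are simple enough to tabulate. The short sketch below computes the expected level sizes n/2^i exactly; the function name is illustrative, and the choice n = 8 in the usage note is only an example.

```python
from fractions import Fraction


def expected_level_sizes(n, num_levels):
    """Expected sizes E(|S_0|), ..., E(|S_{num_levels-1}|) = n / 2^i."""
    return [Fraction(n, 2 ** i) for i in range(num_levels)]
```

For n = 8 the expected sizes are 8, 4, 2, 1, 1/2, ...: the level with one expected element is h = 3 = log2(8), and the partial sums always stay below 2n = 16.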
Analysis (contd.)
3. Drop down: O(log n)
   The pointer p can drop at most h times, i.e., the height of the skip list, until S0 is reached, and h ≈ log n.
4. Scan forward: O(log n)
   # of elements to scan at each level | Total no. of levels | Total cost
   O(1)                                | O(log n)            | O(log n)
   The expected number of elements scanned at the ith level is a constant (about 2) because:
   • The key lies between p and after(p) on the (i+1)th level (that is why we came down to the ith level).
   • Each element of Si is promoted to Si+1 with probability 1/2, so in expectation only about one element of Si lies between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
   Thus in expectation we scan about two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  – A compiler uses a symbol table to relate symbols to associated data.
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  – For a symbol table (also called a dictionary), we care about search, insertion, and deletion.
  – We typically don't care about sorted order.
Hash Tables
• More formally:
  – Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  – We want these to be fast, but don't care about sorting the records.
• The structure we will use is a hash table.
  – Supports all the above in O(1) expected time.
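Hash Functions
• Next problem: collision
[Figure: a hash function h maps keys from K (the actual keys, a subset of the universe of keys U) into slots 0 to m-1 of table T. Keys k1, k3, and k4 land in slots h(k1), h(k3), and h(k4), while k2 and k5 collide because h(k2) = h(k5).]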
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining.
• Another solution is open addressing.
Chaining
• Chaining puts elements that hash to the same slot in a linked list.
[Figure: keys k1 to k8 from K (the actual keys, inside the universe of keys U) hashed into table T. Each non-empty slot points to a linked list of the keys that hash there, e.g. k1 → k4, k5 → k2, and k8 → k6; empty slots hold null pointers.]
Chaining
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot.
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  – In other words, we can make the expected cost of searching constant if we make α constant.
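The three chaining operations and the load factor can be sketched as follows. This is a minimal illustration under stated assumptions: Python's built-in `hash` stands in for the hash function h, each chain is a Python list instead of a linked list, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]   # one chain per slot
        self.size = 0                                 # number of keys, n

    def _chain(self, key):
        """The chain for slot h(key) = hash(key) mod m."""
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)               # overwrite existing key
                return
        chain.append((key, value))
        self.size += 1

    def search(self, key):
        """Return the value stored with key, or None if it is absent."""
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        """Remove key; return True if it was present."""
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]
                self.size -= 1
                return True
        return False

    def load_factor(self):
        """alpha = n / m, the average chain length."""
        return self.size / len(self.slots)
```

Every operation walks a single chain, so its cost is proportional to that chain's length, which is α on average under simple uniform hashing.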
A Final Word About Randomized Algorithms
If we could prove this:
• $P(\text{failure}) < 1/k$ (we are sort of happy)
• $P(\text{failure}) < 1/n^k$ (most of the time this is true and we are happy)
• $P(\text{failure}) < 1/2^n$ (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of the MSc (CS) batch of 2009
END
bull Let Xi be the random variable which represents the number of comparisons required to insert ith element of the input array in the sorted sub array of first i-1 elements
bull Xi xi1xi2hellipxii
E(Xi) = Σj xijp(xij )
where E(Xi) is the expected value Xi
And p(xij) is the probability of inserting xi in the jth position 1lejlei
Expected Running Time of Insertion Sort
x1x2 xi-1xihellipxn
How many comparisons it makes to insert ith element in jth position
(at jth position)
Expected Running Time of Insertion Sort
bull Position of Comparisionsi 1i-1 2i-2 3
2 i-11 i-1
Note Here both position 2 and 1 have of Comparisions equal to i-1 Why Because to insert element at position 2 we have to compare with previously
first element and after that comparison we know which of them come first and which at second
Thus E(Xi) = (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at jth position in the i possible positions
For n elements
E(X1 + X2 + +Xn)
= nΣi=2 E(Xi)
= nΣi=2 (1i) i-1Σk=1k + (i-1) = (n-1)(n-4)4
Therefore average case of insertion sort takes Θ(n2)
For n number of elements expected time taken is
T = nΣi=2 (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at rth position in the i possible positions
E(X1 + X2 + +Xn) = nΣi=1 E(Xi)WhereXi is expected value of inserting Xi element
T = (n-1)(n-4)4Therefore average case of insertion sort takes
Θ(n2)
Quick-Sort
bull Pick the first item from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Quicksort Expected number of comparisons
bull Partition may generate splits (0n-1 1n-2 2n-3 hellip n-21 n-
10) each with probability 1n
bull If T(n) is the expected running time
euro
T n( ) =1
nT k( ) + T n minus1minus k( )[ ] + Θ n( )
k= 0
nminus1
sum
Randomized Quick-Sort
bull Pick an element from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Remarksbull Not much different from the Q-sort except
that earlier the algorithm was deterministic and the bounds were probabilistic
bull Here the algorithm is also randomized We pick an element to be a pivot randomly Notice that there isnrsquot any difference as to how does the algorithm behave there onwards
bull In the earlier case we can identify the worst case input Here no input is worst case
Randomized Select
1
0
1max1 n
k
nknTkTn
nT
Randomized Algorithms
bull A randomized algorithm performs coin tosses (ieuses random bits) to control its execution
bull b larr random()if b = 0do A hellipelse ie b = 1do B hellip
bull Its running time depends on the outcomes of the coin tosses
Assumptions
bull 1048708 the coins are unbiased andbull 1048708 the coin tosses are independent
bull The worst-case running time of a randomized algorithm may be large but occurs with very low probability (eg it occurs when all the coin tosses give ldquoheadsrdquo)
Monte Carlo Algorithms
bull Running times are guaranteed but the output may not be completely correct
bull Probability of error is low
Las Vegas Algorithms
bull Output is guaranteed to be correct
bull Bounds on running times hold with high probability
bull What type of algorithm is Randomized Qsort
Why expected running times
bull Markovrsquos inequality
P( X gt k E(X)) lt 1k
ie the probability that the algorithm will take more than O(2 E(X)) time is less than 12
Or the probability that the algorithm will take more than O(10 E(X)) time is less than 110
This is the reason why Qsort does well in practice
Markovrsquos Bound
P(XltkM)lt 1k where k is a constant
Chernouffrsquos Bound
P(Xgt2μ)lt frac12
A More Stronger Result
P(Xgtk μ )lt 1nk where k is a constant
Binary search tree can be built randomly
Rank(x)=i
Randomly selected key becomes the root
Pivot element=root
x
gtlt
RANDOMLY BUILT BST
bull Xi the height of the tree rooted at a node with rank=i
bull Yi exponential height of the tree=2^Xi
bull H=maxH1H2 + 1
where H1 ht of left subtree
H2 htof right subtree
H ht of the tree rooted at x
HEIGHT OF THE TREE
bull Y=2^H
=2max2^H12^H2
bull Expected value of exponential ht of the tree with lsquonrsquo nodes
=E(EH(T(X)))
=2n sum maxEH(T(k))EH(T(n-1-k))
=O(n^3)=E(H(T(n)))=O(log n)
Skip list is a data structure that can be used to maintain dictionary
Given n keys we insert these n keys in a linked list that has -infin as first node and infin as last node
Initial list S0
Then we flip coin a coin for each element until only one is left in Si if a tail occurswe insert it into next list Si+1 and so on
-infin infin5 9 25 30 35 38 40
Skip List Dictionary as ADT
-infin
-infin
-infin
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
9 30 38
30
38 30
head Tail
Operations that can be performed on skip list
Each node has two pointers right and down
1 Drop down
bull This operation is performed when after(p)gtkey
bull In this operation pointer p moves down to immediate lower level list
(after drop down)
right
down
-infin
-infin infin
infin 30
309 38
p
S1
S0
2Scan forward
bull This operation is performed when after(p)ltkey
bull Here the pointer p moves to the next element in the list
bull eg here key=28 amp p is at 9 after(9)lt28 so scan forward
-infin 9 infin 25 30
p p pS0
Searching a key kKeep a ptr p to the first node in the highest list Sh
while (after(p)gtk)
if (Scur==S0) Scur is the current skip list
then ldquokey k not foundrdquo
exit
if (after(p)gtk)
drop down to next skip list
If (after(p)ltk)
scan forward ie update pafter(p)
if (after(p)==k)
return after(p)
-infin
-infin
-infin
Searching for a key 25
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key found
p
p
p
-infin
-infin
-infin
Searching for a key 28
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key not found
p
p
p
p
-infin
-infin
-infin
Deletion of a key
S3 eg delete 30
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
p
p
Analysis
1 An element xk is in Si with probability 12i true forall elements
E(Si ) = sum 12i Xki where Xki = 1 if xk is in Si
0 otherwise
= n2i
E(total size) = E(sum ISi I)
= sum n2i le 2n
2 Expected height of a skip listh = log n
n2h =1
h ≃ log n
n
k=1
k=1
infin
Analysis(contd)
3 Drop down O(log n)
Since pointer p can drop atmost h times
ieheight of the skip list until S0 is reached
and h = logn
4 Scan forward O(log n)
of elements Total no of levels Total Cost
to scan at each level
O(1) O(log n ) O(log n )
The number of elements scanned at ith level is no more than 2 because
The key lies between p and after(p) on the (i+1)th level (thatrsquos why we came down to ith level) And there is only one element between p and after(p) of
(i+1)th level in Si the element pointed to by after(p) in Si
Thus we scan at most two elements at Si the element pointed to by p (when we came down) and after(p) in Si
Hashing
bull Motivation symbol tablesndash A compiler uses a symbol table to relate
symbols to associated databull Symbols variable names procedure names etcbull Associated data memory location call graph etc
ndash For a symbol table (also called a dictionary) we care about search insertion and deletion
ndash We typically donrsquot care about sorted order
Hash Tables
bull More formallyndash Given a table T and a record x with key (= symbol)
and satellite data we need to supportbull Insert (T x)bull Delete (T x)bull Search(T x)
ndash We want these to be fast but donrsquot care about sorting the records
bull The structure we will use is a hash tablendash Supports all the above in O(1) expected time
Hash Functions
bull Next problem collision T
0
m - 1
h(k1)
h(k4)
h(k2) = h(k5)
h(k3)
k4
k2 k3
k1
k5
U(universe of keys)
K(actualkeys)
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
bull How do we insert an element
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
bull How do we delete an element
Chaining
bull How do we search for a element with a given key
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Analysis of Chaining
• Assume simple uniform hashing: each key is equally likely to be hashed to any slot of the table
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  – In other words, we can make the expected cost of searching constant if we keep α constant
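To make the load factor concrete, here is a small sketch that throws n keys into m slots uniformly at random and reports the chain lengths. The function name `chain_lengths`, the seed, and the values of n and m are illustrative assumptions. The uniform random slot choice stands in for simple uniform hashing.

```python
import random

def chain_lengths(n, m, seed=0):
    """Throw n keys into m slots uniformly at random; return the m chain lengths."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1
    return lengths

n, m = 1000, 250
lengths = chain_lengths(n, m)
alpha = n / m                          # load factor
average = sum(lengths) / m             # average keys per slot, always equal to alpha
print(alpha, average, max(lengths))    # the longest chain varies with the seed
```

The average chain length equals α by construction. An unsuccessful search pays for one hash computation plus one full chain, which is where the O(1 + α) bound comes from.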
A Final Word About Randomized Algorithms
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of M.Sc. (CS), batch 2009
END
x1, x2, …, x(i-1), xi, …, xn
How many comparisons does it take to insert the ith element at the jth position?
Expected Running Time of Insertion Sort
• Position    # of Comparisons
  i           1
  i-1         2
  i-2         3
  …           …
  2           i-1
  1           i-1
Note: positions 2 and 1 both need i-1 comparisons. Why? To insert the element at position 2 we must compare it with the element that is currently first, and after that comparison we know which of the two comes first and which comes second.
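Thus
$E(X_i) = \frac{1}{i} \left[ \sum_{k=1}^{i-1} k + (i-1) \right]$
where 1/i is the probability of inserting at the jth position among the i possible positions.
For n elements (X_1 = 0, since the first element needs no comparisons):
$E(X_1 + X_2 + \dots + X_n) = \sum_{i=2}^{n} E(X_i) = \sum_{i=2}^{n} \frac{1}{i} \left[ \sum_{k=1}^{i-1} k + (i-1) \right] = \frac{n(n-1)}{4} + n - H_n \le \frac{(n-1)(n+4)}{4}$
where H_n = 1 + 1/2 + … + 1/n is the nth harmonic number.
Therefore the average case of insertion sort takes Θ(n^2) comparisons.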
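For n elements, the expected time taken is
$T = \sum_{i=2}^{n} \frac{1}{i} \left[ \sum_{k=1}^{i-1} k + (i-1) \right]$
where 1/i is the probability of inserting at the rth position among the i possible positions.
$E(X_1 + X_2 + \dots + X_n) = \sum_{i=1}^{n} E(X_i)$, where X_i is the number of comparisons needed to insert element x_i (linearity of expectation).
$T = \frac{n(n-1)}{4} + n - H_n \le \frac{(n-1)(n+4)}{4}$, where H_n = 1 + 1/2 + … + 1/n.
Therefore the average case of insertion sort takes Θ(n^2).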
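The expectation above can be checked by brute force for small n. The sketch below sums E(X_i) with exact fractions and compares it with the average number of comparisons insertion sort makes over all n! input orders. The function names are illustrative, and the check assumes every input order is equally likely.

```python
from fractions import Fraction
from itertools import permutations

def expected_comparisons(n):
    """Sum of E(X_i) for i = 2..n, with E(X_i) = (1/i) * (sum_{k=1}^{i-1} k + (i-1))."""
    total = Fraction(0)
    for i in range(2, n + 1):
        total += Fraction(sum(range(1, i)) + (i - 1), i)
    return total

def count_comparisons(items):
    """Run insertion sort on a copy of items; return the number of key comparisons."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1               # compare key with a[j]
            if a[j] <= key:
                break                      # found the insertion position
            a[j + 1] = a[j]                # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons

def average_comparisons(n):
    """Average comparison count over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_comparisons(p) for p in perms), len(perms))
```

For n = 3 both functions give 8/3, which matches n(n-1)/4 + n - H_n = 3/2 + 3 - 11/6.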
Quick-Sort
• Pick the first item from the array and call it the pivot
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
• Use recursion to sort the two partitions
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
Quicksort: Expected number of comparisons
• Partition may generate the splits (0 : n-1, 1 : n-2, 2 : n-3, …, n-2 : 1, n-1 : 0), each with probability 1/n
• If T(n) is the expected running time:
$T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \left[ T(k) + T(n-1-k) \right] + \Theta(n)$
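The recurrence can be evaluated numerically to see how it grows. The sketch below assumes the Θ(n) term is exactly n - 1 comparisons per partition step (an assumption, since the slide leaves the constant open) and compares the result with the well-known closed form 2(n + 1)H_n - 4n for that case. Function names are illustrative.

```python
from fractions import Fraction

def expected_quicksort_comparisons(n_max):
    """Return [C(0), ..., C(n_max)] for
    C(n) = (n - 1) + (1/n) * sum_{k=0}^{n-1} [C(k) + C(n-1-k)], with C(0) = C(1) = 0."""
    c = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        split_sum = sum(c[k] + c[n - 1 - k] for k in range(n))
        c[n] = (n - 1) + split_sum / n
    return c

def closed_form(n):
    """2(n + 1)H_n - 4n, where H_n is the nth harmonic number."""
    h_n = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * h_n - 4 * n
```

Since H_n grows like ln n, the expected number of comparisons is Θ(n log n).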
Randomized Quick-Sort
• Pick an element from the array at random and call it the pivot
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
• Use recursion to sort the two partitions
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
Remarks
• Not much different from quicksort, except that earlier the algorithm was deterministic and the bounds were probabilistic
• Here the algorithm itself is also randomized: we pick the pivot at random. From then on, the algorithm behaves exactly as before
• In the earlier case we can identify a worst-case input. Here no input is the worst case
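The three steps can be sketched as follows. This version builds new lists rather than partitioning the array in place, which is a simplification chosen for readability and not the in-place partition the slides depict. The function name and the `rng` parameter are illustrative.

```python
import random

def randomized_quicksort(items, rng=random):
    """Return a sorted copy of items, choosing each pivot uniformly at random."""
    items = list(items)
    if len(items) <= 1:
        return items
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    left = [x for x in rest if x <= pivot]        # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]        # partition 2: items > pivot
    return randomized_quicksort(left, rng) + [pivot] + randomized_quicksort(right, rng)
```

Because the pivot choice is random, an already sorted input is no worse than any other input.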
Randomized Select
$T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k, n-1-k)) + \Theta(n)$
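The slide gives only the recurrence. A minimal sketch of the selection algorithm it usually describes is shown below, assuming the standard formulation: pick a random pivot, then continue only on the side that contains the kth smallest element. The function name and the 0-based rank k are illustrative choices.

```python
import random

def randomized_select(items, k, rng=random):
    """Return the k-th smallest element of items (k = 0 is the minimum)."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    items = list(items)
    while True:
        pivot = items[rng.randrange(len(items))]
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            items = smaller                         # answer lies among the smaller items
        elif k < len(smaller) + len(equal):
            return pivot                            # the pivot itself is the answer
        else:
            k -= len(smaller) + len(equal)          # skip past smaller and equal items
            items = [x for x in items if x > pivot]
```

Only one side is explored after each partition, which is why the recurrence has a single T(max(…)) term instead of the two terms in the quicksort recurrence.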
Randomized Algorithms
• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
• b ← random()
  if b = 0
      do A …
  else { i.e. b = 1 }
      do B …
• Its running time depends on the outcomes of the coin tosses
Assumptions
• the coins are unbiased, and
• the coin tosses are independent
• The worst-case running time of a randomized algorithm may be large but occurs with very low probability (e.g., it occurs when all the coin tosses give "heads")
Monte Carlo Algorithms
• Running times are guaranteed, but the output may not be completely correct
• Probability of error is low
Las Vegas Algorithms
• Output is guaranteed to be correct
• Bounds on running times hold with high probability
• What type of algorithm is randomized quicksort?
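The difference between the two types can be seen on a toy problem: find the position of a 1 in an array in which half the entries are 1. The problem, the function names, and the probe limit are illustrative assumptions, not from the slides.

```python
import random

def las_vegas_find(bits, rng):
    """Probe random positions until a 1 is found.
    The answer is always correct; only the running time is random."""
    while True:
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i

def monte_carlo_find(bits, rng, max_probes=10):
    """Probe at most max_probes random positions.
    The running time is bounded, but the search may fail and return None."""
    for _ in range(max_probes):
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i
    return None
```

With half the entries equal to 1, the Monte Carlo version fails with probability 1/2^10. Randomized quicksort always returns a correctly sorted array and only its running time varies, which matches the Las Vegas description.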
Why expected running times?
• Markov's inequality (for a non-negative random variable X, such as a running time, and k > 0)
P(X > k·E(X)) < 1/k
i.e. the probability that the algorithm will take more than 2·E(X) time is less than 1/2,
or the probability that the algorithm will take more than 10·E(X) time is less than 1/10.
This is the reason why quicksort does well in practice.
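A tiny exact check of the inequality is sketched below on a made-up running-time distribution: fast 90% of the time, a hundred times slower otherwise. The distribution and the function name are hypothetical.

```python
from fractions import Fraction

def tail_probability(dist, threshold):
    """P(X > threshold) for a finite distribution given as {value: probability}."""
    return sum(p for x, p in dist.items() if x > threshold)

# Hypothetical running-time distribution: mostly fast, occasionally very slow.
dist = {1: Fraction(9, 10), 100: Fraction(1, 10)}
mean = sum(x * p for x, p in dist.items())          # E(X) = 109/10

for k in (2, 5, 10):
    # Markov: the chance of exceeding k times the mean is below 1/k.
    assert tail_probability(dist, k * mean) < Fraction(1, k)
```

Even for this skewed distribution, the chance of exceeding twice the mean is 1/10, well under the bound of 1/2.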
Markov's Bound
P(X > kM) < 1/k, where M = E(X) and k is a constant
Chernoff's Bound
P(X > 2μ) < 1/2
A Stronger Result
P(X > kμ) < 1/n^k, where k is a constant
A binary search tree can be built randomly
• A randomly selected key x becomes the root (pivot element = root), with Rank(x) = i
[Figure: root x; keys < x go to the left subtree, keys > x go to the right subtree.]
RANDOMLY BUILT BST
• X_i: the height of the tree rooted at a node with rank i
• Y_i: the exponential height of the tree = 2^X_i
• H = max{H1, H2} + 1
  where H1 = height of the left subtree
        H2 = height of the right subtree
        H = height of the tree rooted at x
HEIGHT OF THE TREE
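• Y = 2^H = 2 · max{2^H1, 2^H2}
• Expected value of the exponential height of the tree with n nodes:
$E(EH(T(n))) = \frac{2}{n} \sum_{k=0}^{n-1} E\left( \max\{ EH(T(k)), EH(T(n-1-k)) \} \right) = O(n^3)$
• Since 2^E(H) ≤ E(2^H) (Jensen's inequality), this gives E(H(T(n))) = O(log n)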
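The O(log n) expected height can be observed experimentally. The sketch below inserts keys in a random order into an unbalanced binary search tree and measures the height in edges. The function names, the list-based node layout, and the trial count are illustrative assumptions.

```python
import random

def bst_height(keys):
    """Insert keys in the given order into an unbalanced BST; return its height in edges."""
    root = None                        # each node is a list [key, left, right]
    height = -1                        # height of an empty tree
    for key in keys:
        depth = 0
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                depth += 1
                side = 1 if key < node[0] else 2
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height

def average_height(n, trials=200, seed=0):
    """Average height over several uniformly random insertion orders."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        keys = list(range(n))
        rng.shuffle(keys)
        total += bst_height(keys)
    return total / trials
```

Inserting keys in sorted order gives the worst case, height n - 1, while random orders typically give a height that is a small multiple of log n.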
A skip list is a data structure that can be used to maintain a dictionary.
Given n keys, we insert them into a linked list that has -∞ as its first node and ∞ as its last node.
Initial list S0:
-∞  5  9  25  30  35  38  40  ∞
Then we flip a coin for each element of Si; if a tail occurs, we insert the element into the next list S(i+1), and so on, until only one element is left in Si.
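Skip List: Dictionary as ADT
[Figure: a skip list with lists S0 to S3 between head (-∞) and tail (∞). S0 holds all the keys 5, 9, 25, 30, 35, 38, 40; S1 holds 9, 30, 38; the higher lists hold progressively fewer keys, with 30 reaching the upper levels.]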
Operations that can be performed on a skip list
Each node has two pointers: right and down
1. Drop down
• This operation is performed when after(p) > key
• In this operation, pointer p moves down to the immediately lower-level list
[Figure: pointer p on S1 moves to the same node on S0 (after drop down); each node has a right pointer and a down pointer.]
2. Scan forward
• This operation is performed when after(p) < key
• Here the pointer p moves to the next element in the list
• e.g. here key = 28 and p is at 9; after(9) < 28, so scan forward
[Figure: on S0 (-∞, 9, 25, 30, ∞), pointer p steps forward while after(p) < 28.]
Searching for a key k
Keep a pointer p to the first node in the highest list Sh
while (true)
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e. update p ← after(p)
    else    { after(p) > k }
        if (Scur == S0)    { Scur is the current skip list }
            then "key k not found"; exit
        drop down to the next lower skip list
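The search procedure, together with coin-flip insertion and deletion, can be sketched as a small class. The names `Node`, `SkipList`, `levels`, and the `seed` parameter are illustrative assumptions. New top lists are created on demand here, which is one possible design and not necessarily the one used in the slides.

```python
import random

NEG_INF, POS_INF = float("-inf"), float("inf")

class Node:
    def __init__(self, key, right=None, down=None):
        self.key, self.right, self.down = key, right, down

class SkipList:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(NEG_INF, Node(POS_INF))       # head of the highest list

    def search(self, key):
        p = self.head
        while p is not None:
            while p.right.key < key:                   # scan forward
                p = p.right
            if p.right.key == key:
                return True
            p = p.down                                 # drop down (None below S0)
        return False

    def insert(self, key):
        preds, p = [], self.head
        while p is not None:                           # remember where we dropped down
            while p.right.key < key:
                p = p.right
            if p.right.key == key:
                return                                 # key already present
            preds.append(p)
            p = p.down
        below = None
        while True:
            if preds:
                pred = preds.pop()                     # predecessor one level higher
            else:                                      # out of levels: add a new top list
                self.head = pred = Node(NEG_INF, Node(POS_INF), self.head)
            below = pred.right = Node(key, pred.right, below)
            if self.rng.random() < 0.5:                # coin flip: stop promoting
                break

    def delete(self, key):
        found, p = False, self.head
        while p is not None:
            while p.right.key < key:
                p = p.right
            if p.right.key == key:                     # unlink the key on this level
                p.right = p.right.right
                found = True
            p = p.down
        return found

    def levels(self):
        """Keys on each list, highest list first and S0 last."""
        out, head = [], self.head
        while head is not None:
            keys, node = [], head.right
            while node.key != POS_INF:
                keys.append(node.key)
                node = node.right
            out.append(keys)
            head = head.down
        return out
```

Inserting 5, 9, 25, 30, 35, 38, 40 and then searching for 25 and 28 reproduces the "key found" and "key not found" cases of the slides, although the keys promoted to the upper lists depend on the coin flips.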
Searching for a key 25
[Figure: the skip list S0 to S3 from above; pointer p starts at the head of the top list and moves by drop down and scan forward until it reaches 25 on S0.]
Key found
Searching for a key 28
[Figure: the skip list S0 to S3 from above; pointer p starts at the head of the top list and moves by drop down and scan forward until it reaches S0 without finding 28.]
Key not found
Deletion of a key
e.g. delete 30
[Figure: the skip list S0 to S3 from above, with pointer p tracing the search for 30 through the lists.]
Analysis
1. An element x_k is in S_i with probability 1/2^i (true for all elements).
   $|S_i| = \sum_{k=1}^{n} X_{ki}$, where X_{ki} = 1 if x_k is in S_i, and 0 otherwise
   $E(|S_i|) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$
   $E(\text{total size}) = E\left( \sum_{i \ge 0} |S_i| \right) = \sum_{i \ge 0} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: h ≈ log n
   n/2^h = 1
   h ≈ log n
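The size estimates can be compared against a simulated coin-flip construction. In the sketch below the construction continues until a list is empty, which is an assumption made to give a simple stopping rule. The function name, seed, and n are illustrative.

```python
import random

def build_levels(keys, seed=0):
    """S0 holds every key; each key of S_i is copied to S_(i+1) when its coin flip succeeds."""
    rng = random.Random(seed)
    levels = [sorted(keys)]
    while levels[-1]:
        promoted = [k for k in levels[-1] if rng.random() < 0.5]
        levels.append(promoted)
    return levels                      # the last list is empty (only the sentinels remain)

n = 1024
levels = build_levels(range(n))
for i, level in enumerate(levels):
    print(i, len(level), n / 2 ** i)   # observed size of S_i next to its expectation n / 2^i
```

The observed sizes roughly halve from one list to the next, and the number of non-empty lists stays close to log n, in line with the two estimates above.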
Analysis (contd.)
3. Drop down: O(log n)
   Pointer p can drop at most h times, i.e. the height of the skip list, until S0 is reached, and h ≈ log n.
4. Scan forward: O(log n)
   # of elements to scan at each level    Total no. of levels    Total cost
   O(1)                                   O(log n)               O(log n)
The expected number of elements scanned at the ith level is no more than 2, because:
The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level). In expectation, there is only one element of Si between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
Thus we expect to scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  – A compiler uses a symbol table to relate symbols to associated data
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  – For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  – We typically don't care about sorted order
Hash Tables
bull More formallyndash Given a table T and a record x with key (= symbol)
and satellite data we need to supportbull Insert (T x)bull Delete (T x)bull Search(T x)
ndash We want these to be fast but donrsquot care about sorting the records
bull The structure we will use is a hash tablendash Supports all the above in O(1) expected time
Hash Functions
bull Next problem collision T
0
m - 1
h(k1)
h(k4)
h(k2) = h(k5)
h(k3)
k4
k2 k3
k1
k5
U(universe of keys)
K(actualkeys)
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
bull How do we insert an element
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
bull How do we delete an element
Chaining
bull How do we search for a element with a given key
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
bull What will be the average cost of a successful search
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
bull What will be the average cost of a successful search A O((1 + )2) = O(1 + )
Analysis of Chaining Continued
bull So the cost of searching = O(1 + )
bull If the number of keys n is proportional to the number of slots in the table what is
bull A = O(1)ndash In other words we can make the expected
cost of searching constant if we make constant
If we could prove this
P(failure)lt1k (we are sort of happy)
P(failure)lt1nk (most of times this is true and wersquore
happy )
P(failure)lt12n (this is difficult but still we want this)
A Final Word About Randomized Algorithms
Acknowledgements
bull Kunal Verma
bull Nidhi Aggarwal
bull And other students of MSc(CS) batch 2009
END
bull Position of Comparisionsi 1i-1 2i-2 3
2 i-11 i-1
Note Here both position 2 and 1 have of Comparisions equal to i-1 Why Because to insert element at position 2 we have to compare with previously
first element and after that comparison we know which of them come first and which at second
Thus E(Xi) = (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at jth position in the i possible positions
For n elements
E(X1 + X2 + +Xn)
= nΣi=2 E(Xi)
= nΣi=2 (1i) i-1Σk=1k + (i-1) = (n-1)(n-4)4
Therefore average case of insertion sort takes Θ(n2)
For n number of elements expected time taken is
T = nΣi=2 (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at rth position in the i possible positions
E(X1 + X2 + +Xn) = nΣi=1 E(Xi)WhereXi is expected value of inserting Xi element
T = (n-1)(n-4)4Therefore average case of insertion sort takes
Θ(n2)
Quick-Sort
bull Pick the first item from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Quicksort Expected number of comparisons
bull Partition may generate splits (0n-1 1n-2 2n-3 hellip n-21 n-
10) each with probability 1n
bull If T(n) is the expected running time
euro
T n( ) =1
nT k( ) + T n minus1minus k( )[ ] + Θ n( )
k= 0
nminus1
sum
Randomized Quick-Sort
bull Pick an element from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Remarksbull Not much different from the Q-sort except
that earlier the algorithm was deterministic and the bounds were probabilistic
bull Here the algorithm is also randomized We pick an element to be a pivot randomly Notice that there isnrsquot any difference as to how does the algorithm behave there onwards
bull In the earlier case we can identify the worst case input Here no input is worst case
Randomized Select
1
0
1max1 n
k
nknTkTn
nT
Randomized Algorithms
bull A randomized algorithm performs coin tosses (ieuses random bits) to control its execution
bull b larr random()if b = 0do A hellipelse ie b = 1do B hellip
bull Its running time depends on the outcomes of the coin tosses
Assumptions
bull 1048708 the coins are unbiased andbull 1048708 the coin tosses are independent
bull The worst-case running time of a randomized algorithm may be large but occurs with very low probability (eg it occurs when all the coin tosses give ldquoheadsrdquo)
Monte Carlo Algorithms
bull Running times are guaranteed but the output may not be completely correct
bull Probability of error is low
Las Vegas Algorithms
bull Output is guaranteed to be correct
bull Bounds on running times hold with high probability
bull What type of algorithm is Randomized Qsort
Why expected running times
bull Markovrsquos inequality
P( X gt k E(X)) lt 1k
ie the probability that the algorithm will take more than O(2 E(X)) time is less than 12
Or the probability that the algorithm will take more than O(10 E(X)) time is less than 110
This is the reason why Qsort does well in practice
Markovrsquos Bound
P(XltkM)lt 1k where k is a constant
Chernouffrsquos Bound
P(Xgt2μ)lt frac12
A More Stronger Result
P(Xgtk μ )lt 1nk where k is a constant
Binary search tree can be built randomly
Rank(x)=i
Randomly selected key becomes the root
Pivot element=root
x
gtlt
RANDOMLY BUILT BST
bull Xi the height of the tree rooted at a node with rank=i
bull Yi exponential height of the tree=2^Xi
bull H=maxH1H2 + 1
where H1 ht of left subtree
H2 htof right subtree
H ht of the tree rooted at x
HEIGHT OF THE TREE
bull Y=2^H
=2max2^H12^H2
bull Expected value of exponential ht of the tree with lsquonrsquo nodes
=E(EH(T(X)))
=2n sum maxEH(T(k))EH(T(n-1-k))
=O(n^3)=E(H(T(n)))=O(log n)
Skip list is a data structure that can be used to maintain dictionary
Given n keys we insert these n keys in a linked list that has -infin as first node and infin as last node
Initial list S0
Then we flip coin a coin for each element until only one is left in Si if a tail occurswe insert it into next list Si+1 and so on
-infin infin5 9 25 30 35 38 40
Skip List Dictionary as ADT
-infin
-infin
-infin
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
9 30 38
30
38 30
head Tail
Operations that can be performed on skip list
Each node has two pointers right and down
1 Drop down
bull This operation is performed when after(p)gtkey
bull In this operation pointer p moves down to immediate lower level list
(after drop down)
right
down
-infin
-infin infin
infin 30
309 38
p
S1
S0
2Scan forward
bull This operation is performed when after(p)ltkey
bull Here the pointer p moves to the next element in the list
bull eg here key=28 amp p is at 9 after(9)lt28 so scan forward
-infin 9 infin 25 30
p p pS0
Searching a key kKeep a ptr p to the first node in the highest list Sh
while (after(p)gtk)
if (Scur==S0) Scur is the current skip list
then ldquokey k not foundrdquo
exit
if (after(p)gtk)
drop down to next skip list
If (after(p)ltk)
scan forward ie update pafter(p)
if (after(p)==k)
return after(p)
-infin
-infin
-infin
Searching for a key 25
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key found
p
p
p
-infin
-infin
-infin
Searching for a key 28
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key not found
p
p
p
p
-infin
-infin
-infin
Deletion of a key
S3 eg delete 30
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
p
p
Analysis
1 An element xk is in Si with probability 12i true forall elements
E(Si ) = sum 12i Xki where Xki = 1 if xk is in Si
0 otherwise
= n2i
E(total size) = E(sum ISi I)
= sum n2i le 2n
2 Expected height of a skip listh = log n
n2h =1
h ≃ log n
n
k=1
k=1
infin
Analysis(contd)
3 Drop down O(log n)
Since pointer p can drop atmost h times
ieheight of the skip list until S0 is reached
and h = logn
4 Scan forward O(log n)
of elements Total no of levels Total Cost
to scan at each level
O(1) O(log n ) O(log n )
The number of elements scanned at ith level is no more than 2 because
The key lies between p and after(p) on the (i+1)th level (thatrsquos why we came down to ith level) And there is only one element between p and after(p) of
(i+1)th level in Si the element pointed to by after(p) in Si
Thus we scan at most two elements at Si the element pointed to by p (when we came down) and after(p) in Si
Hashing
bull Motivation symbol tablesndash A compiler uses a symbol table to relate
symbols to associated databull Symbols variable names procedure names etcbull Associated data memory location call graph etc
ndash For a symbol table (also called a dictionary) we care about search insertion and deletion
ndash We typically donrsquot care about sorted order
Hash Tables
bull More formallyndash Given a table T and a record x with key (= symbol)
and satellite data we need to supportbull Insert (T x)bull Delete (T x)bull Search(T x)
ndash We want these to be fast but donrsquot care about sorting the records
bull The structure we will use is a hash tablendash Supports all the above in O(1) expected time
Hash Functions
bull Next problem collision T
0
m - 1
h(k1)
h(k4)
h(k2) = h(k5)
h(k3)
k4
k2 k3
k1
k5
U(universe of keys)
K(actualkeys)
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
bull How do we insert an element
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
bull How do we delete an element
Chaining
bull How do we search for a element with a given key
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
bull What will be the average cost of a successful search
Analysis of Chaining
bull Assume simple uniform hashing each key in table is equally likely to be hashed to any slot
bull Given n keys and m slots in the table the load factor = nm = average keys per slot
bull What will be the average cost of an unsuccessful search for a key A O(1+)
bull What will be the average cost of a successful search A O((1 + )2) = O(1 + )
Analysis of Chaining Continued
bull So the cost of searching = O(1 + )
bull If the number of keys n is proportional to the number of slots in the table what is
bull A = O(1)ndash In other words we can make the expected
cost of searching constant if we make constant
If we could prove this
P(failure)lt1k (we are sort of happy)
P(failure)lt1nk (most of times this is true and wersquore
happy )
P(failure)lt12n (this is difficult but still we want this)
A Final Word About Randomized Algorithms
Acknowledgements
bull Kunal Verma
bull Nidhi Aggarwal
bull And other students of MSc(CS) batch 2009
END
Thus E(Xi) = (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at jth position in the i possible positions
For n elements
E(X1 + X2 + +Xn)
= nΣi=2 E(Xi)
= nΣi=2 (1i) i-1Σk=1k + (i-1) = (n-1)(n-4)4
Therefore average case of insertion sort takes Θ(n2)
For n number of elements expected time taken is
T = nΣi=2 (1i) i-1Σk=1k + (i-1)
where 1i is the probability to insert at rth position in the i possible positions
E(X1 + X2 + +Xn) = nΣi=1 E(Xi)WhereXi is expected value of inserting Xi element
T = (n-1)(n-4)4Therefore average case of insertion sort takes
Θ(n2)
Quick-Sort
bull Pick the first item from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Quicksort Expected number of comparisons
bull Partition may generate splits (0n-1 1n-2 2n-3 hellip n-21 n-
10) each with probability 1n
bull If T(n) is the expected running time
euro
T n( ) =1
nT k( ) + T n minus1minus k( )[ ] + Θ n( )
k= 0
nminus1
sum
Randomized Quick-Sort
bull Pick an element from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Remarksbull Not much different from the Q-sort except
that earlier the algorithm was deterministic and the bounds were probabilistic
bull Here the algorithm is also randomized We pick an element to be a pivot randomly Notice that there isnrsquot any difference as to how does the algorithm behave there onwards
bull In the earlier case we can identify the worst case input Here no input is worst case
Randomized Select
1
0
1max1 n
k
nknTkTn
nT
Randomized Algorithms
bull A randomized algorithm performs coin tosses (ieuses random bits) to control its execution
bull b larr random()if b = 0do A hellipelse ie b = 1do B hellip
bull Its running time depends on the outcomes of the coin tosses
Assumptions
bull 1048708 the coins are unbiased andbull 1048708 the coin tosses are independent
bull The worst-case running time of a randomized algorithm may be large but occurs with very low probability (eg it occurs when all the coin tosses give ldquoheadsrdquo)
Monte Carlo Algorithms
bull Running times are guaranteed but the output may not be completely correct
bull Probability of error is low
Las Vegas Algorithms
bull Output is guaranteed to be correct
bull Bounds on running times hold with high probability
bull What type of algorithm is Randomized Qsort
Why expected running times
bull Markovrsquos inequality
P( X gt k E(X)) lt 1k
ie the probability that the algorithm will take more than O(2 E(X)) time is less than 12
Or the probability that the algorithm will take more than O(10 E(X)) time is less than 110
This is the reason why Qsort does well in practice
Markovrsquos Bound
P(XltkM)lt 1k where k is a constant
Chernouffrsquos Bound
P(Xgt2μ)lt frac12
A More Stronger Result
P(Xgtk μ )lt 1nk where k is a constant
Binary search tree can be built randomly
Rank(x)=i
Randomly selected key becomes the root
Pivot element=root
x
gtlt
RANDOMLY BUILT BST
bull Xi the height of the tree rooted at a node with rank=i
bull Yi exponential height of the tree=2^Xi
bull H=maxH1H2 + 1
where H1 ht of left subtree
H2 htof right subtree
H ht of the tree rooted at x
HEIGHT OF THE TREE
bull Y=2^H
=2max2^H12^H2
bull Expected value of exponential ht of the tree with lsquonrsquo nodes
=E(EH(T(X)))
=2n sum maxEH(T(k))EH(T(n-1-k))
=O(n^3)=E(H(T(n)))=O(log n)
Skip list is a data structure that can be used to maintain dictionary
Given n keys we insert these n keys in a linked list that has -infin as first node and infin as last node
Initial list S0
Then we flip coin a coin for each element until only one is left in Si if a tail occurswe insert it into next list Si+1 and so on
-infin infin5 9 25 30 35 38 40
Skip List Dictionary as ADT
-infin
-infin
-infin
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
9 30 38
30
38 30
head Tail
Operations that can be performed on skip list
Each node has two pointers right and down
1 Drop down
bull This operation is performed when after(p)gtkey
bull In this operation pointer p moves down to immediate lower level list
(after drop down)
right
down
-infin
-infin infin
infin 30
309 38
p
S1
S0
2Scan forward
bull This operation is performed when after(p)ltkey
bull Here the pointer p moves to the next element in the list
bull eg here key=28 amp p is at 9 after(9)lt28 so scan forward
-infin 9 infin 25 30
p p pS0
Searching a key kKeep a ptr p to the first node in the highest list Sh
while (after(p)gtk)
if (Scur==S0) Scur is the current skip list
then ldquokey k not foundrdquo
exit
if (after(p)gtk)
drop down to next skip list
If (after(p)ltk)
scan forward ie update pafter(p)
if (after(p)==k)
return after(p)
-infin
-infin
-infin
Searching for a key 25
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key found
p
p
p
-infin
-infin
-infin
Searching for a key 28
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key not found
p
p
p
p
-infin
-infin
-infin
Deletion of a key
S3 eg delete 30
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
p
p
Analysis
1 An element xk is in Si with probability 12i true forall elements
E(Si ) = sum 12i Xki where Xki = 1 if xk is in Si
0 otherwise
= n2i
E(total size) = E(sum ISi I)
= sum n2i le 2n
2 Expected height of a skip listh = log n
n2h =1
h ≃ log n
n
k=1
k=1
infin
Analysis(contd)
3 Drop down O(log n)
Since pointer p can drop atmost h times
ieheight of the skip list until S0 is reached
and h = logn
4 Scan forward O(log n)
of elements Total no of levels Total Cost
to scan at each level
O(1) O(log n ) O(log n )
The number of elements scanned at ith level is no more than 2 because
The key lies between p and after(p) on the (i+1)th level (thatrsquos why we came down to ith level) And there is only one element between p and after(p) of
(i+1)th level in Si the element pointed to by after(p) in Si
Thus we scan at most two elements at Si the element pointed to by p (when we came down) and after(p) in Si
Hashing
bull Motivation symbol tablesndash A compiler uses a symbol table to relate
symbols to associated databull Symbols variable names procedure names etcbull Associated data memory location call graph etc
ndash For a symbol table (also called a dictionary) we care about search insertion and deletion
ndash We typically donrsquot care about sorted order
Hash Tables
bull More formallyndash Given a table T and a record x with key (= symbol)
and satellite data we need to supportbull Insert (T x)bull Delete (T x)bull Search(T x)
ndash We want these to be fast but donrsquot care about sorting the records
bull The structure we will use is a hash tablendash Supports all the above in O(1) expected time
Hash Functions
bull Next problem collision T
0
m - 1
h(k1)
h(k4)
h(k2) = h(k5)
h(k3)
k4
k2 k3
k1
k5
U(universe of keys)
K(actualkeys)
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
Quick-Sort
• Pick the first item from the array; call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
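The three steps above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the name `quick_sort` is hypothetical, and for readability it builds new lists rather than partitioning in place.

```python
def quick_sort(items):
    """Sort a list using the first item as the pivot; returns a new list."""
    if len(items) <= 1:
        return list(items)
    pivot = items[0]                          # first item is the pivot
    rest = items[1:]
    left = [x for x in rest if x <= pivot]    # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]    # partition 2: items > pivot
    return quick_sort(left) + [pivot] + quick_sort(right)
```

On an already sorted input every split is (0 : n-1), which is the worst case for this choice of pivot.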
Quicksort: Expected number of comparisons
• Partition may generate the splits (0 : n-1), (1 : n-2), (2 : n-3), …, (n-2 : 1), (n-1 : 0), each with probability 1/n.
• If T(n) is the expected running time, then
  $T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \left[ T(k) + T(n-1-k) \right] + \Theta(n)$
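The recurrence can be evaluated numerically to see the n log n growth. The sketch below assumes the Θ(n) term is exactly n - 1 comparisons per partition step (an assumption; the slide only states Θ(n)) and compares the result with the known closed form 2(n+1)H_n - 4n, where H_n is the n-th harmonic number. Both function names are hypothetical.

```python
from fractions import Fraction

def expected_comparisons(n_max):
    """C(n) = (n - 1) + (1/n) * sum_{k=0}^{n-1} [C(k) + C(n-1-k)], with C(0) = 0."""
    c = [Fraction(0)]
    prefix = Fraction(0)                 # running sum C(0) + ... + C(n-1)
    for n in range(1, n_max + 1):
        prefix += c[n - 1]
        # The two halves of the sum are equal, hence the factor 2.
        c.append((n - 1) + 2 * prefix / n)
    return c

def closed_form(n):
    """2(n+1)H_n - 4n, the known solution of the recurrence above."""
    h_n = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * h_n - 4 * n
```

Since H_n grows like ln n, the closed form grows like 2n ln n, i.e. Θ(n log n).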
Randomized Quick-Sort
• Pick an element from the array at random; call it the pivot.
• Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
• Use recursion to sort the two partitions.
[Figure: partition 1 (items ≤ pivot) | pivot | partition 2 (items > pivot)]
Remarks
• Not much different from Quick-Sort, except that earlier the algorithm was deterministic and the bounds were probabilistic.
• Here the algorithm itself is randomized: we pick the pivot at random. From there onwards the algorithm behaves exactly as before.
• In the earlier case we can identify a worst-case input. Here no particular input is the worst case.
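A minimal in-place sketch of the randomized variant follows. The function name, the `rng` parameter, and the particular partition scheme are illustrative choices, not taken from the slides; the only change from the deterministic version is the random pivot index.

```python
import random

def randomized_quick_sort(a, lo=0, hi=None, rng=random):
    """Sort list a in place; the pivot is chosen uniformly at random."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = rng.randint(lo, hi)          # random pivot index, bounds inclusive
    a[lo], a[p] = a[p], a[lo]        # move the pivot to the front
    pivot = a[lo]
    i = lo                           # a[lo+1..i] holds items <= pivot
    for j in range(lo + 1, hi + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]        # place the pivot between the partitions
    randomized_quick_sort(a, lo, i - 1, rng)
    randomized_quick_sort(a, i + 1, hi, rng)
```

Because the pivot no longer depends on the input order, a sorted array is no worse than any other arrangement of the same keys.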
Randomized Select
  $T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k,\, n-k-1)) + \Theta(n)$
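The recurrence reflects that selection recurses into only one side of the partition, at worst the larger one. A minimal sketch, assuming 0-based ranks and a non-empty input (the name `randomized_select` and its signature are illustrative):

```python
import random

def randomized_select(items, i, rng=random):
    """Return the i-th smallest item (i = 0 is the minimum) of a non-empty list."""
    pool = list(items)
    while True:
        pivot = rng.choice(pool)                 # random pivot
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if i < len(smaller):
            pool = smaller                       # answer lies left of the pivot
        elif i < len(smaller) + len(equal):
            return pivot                         # the pivot itself is the answer
        else:
            i -= len(smaller) + len(equal)       # answer lies right of the pivot
            pool = [x for x in pool if x > pivot]
```

Unlike quick-sort, only one of the two partitions is kept at each step.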
Randomized Algorithms
• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution.
• b ← random()
  if b = 0
      do A …
  else (i.e., b = 1)
      do B …
• Its running time depends on the outcomes of the coin tosses.
Assumptions
• The coins are unbiased, and
• the coin tosses are independent.
• The worst-case running time of a randomized algorithm may be large, but it occurs with very low probability (e.g., it occurs when all the coin tosses give "heads").
Monte Carlo Algorithms
• Running times are guaranteed, but the output may not be completely correct.
• The probability of error is low.
Las Vegas Algorithms
• Output is guaranteed to be correct.
• Bounds on running times hold with high probability.
• What type of algorithm is Randomized Quick-Sort?
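Randomized Quick-Sort always returns a correctly sorted array and only its running time is random, so it is of the Las Vegas type. For contrast, here is a small Monte Carlo example that is not from the slides: Freivalds' check of a matrix product. Its running time is fixed, but a "True" answer can be wrong with small probability. The function name and the list-of-lists matrix representation are assumptions made for this sketch.

```python
import random

def freivalds(a, b, c, trials=20, rng=random):
    """Monte Carlo test of A x B == C for n x n integer matrices (lists of lists).

    Returns False only if the product is definitely wrong. Returns True if all
    trials pass, which is wrong with probability at most 2**-trials.
    """
    n = len(a)
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(n)]          # random 0/1 vector
        br = [sum(b[i][j] * r[j] for j in range(n)) for i in range(n)]
        abr = [sum(a[i][j] * br[j] for j in range(n)) for i in range(n)]
        cr = [sum(c[i][j] * r[j] for j in range(n)) for i in range(n)]
        if abr != cr:                                      # A(Br) != Cr proves AB != C
            return False
    return True
```

Each trial costs O(n²) instead of the O(n³) of a full multiplication, and more trials lower the error probability.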
Why expected running times?
• Markov's inequality: for a non-negative random variable X with E(X) > 0 and any k > 0,
  $P(X > k\,E(X)) < 1/k$
  i.e., the probability that the algorithm will take more than 2 E(X) time is less than 1/2,
  or the probability that the algorithm will take more than 10 E(X) time is less than 1/10.
• This is the reason why Quick-Sort does well in practice.
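For reference, the standard one-line argument behind Markov's inequality, written here for a non-negative discrete random variable X and a threshold a > 0 (the symbol a is introduced only for this derivation):

```latex
E(X) = \sum_{x} x \, P(X = x)
     \;\ge\; \sum_{x \ge a} x \, P(X = x)
     \;\ge\; a \sum_{x \ge a} P(X = x)
     \;=\; a \, P(X \ge a)
\quad\Longrightarrow\quad
P(X \ge a) \le \frac{E(X)}{a}
```

Substituting a = k E(X) gives P(X ≥ k E(X)) ≤ 1/k, which is the form used for running times.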
Markov's Bound
P(X > kμ) < 1/k, where μ = E(X) and k is a constant
Chernoff's Bound
P(X > 2μ) < 1/2
A Stronger Result
P(X > kμ) < 1/n^k, where k is a constant
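A binary search tree can be built randomly
• A randomly selected key x, with Rank(x) = i, becomes the root.
• Pivot element = root.
[Figure: root x, with the keys < x in its left subtree and the keys > x in its right subtree]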
RANDOMLY BUILT BST
• X_i: the height of the tree rooted at a node with rank = i
• Y_i: the exponential height of the tree = 2^{X_i}
• H = max(H1, H2) + 1
  where H1 = height of the left subtree,
        H2 = height of the right subtree,
        H = height of the tree rooted at x
HEIGHT OF THE TREE
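• Y = 2^H = 2 · max(2^{H1}, 2^{H2})
• Expected value of the exponential height of the tree with n nodes:
  $E(EH(T(n))) = \frac{2}{n} \sum_{k=0}^{n-1} E\left[\max\left(EH(T(k)),\, EH(T(n-1-k))\right)\right] = O(n^3)$
• Since 2^{E(H)} ≤ E(2^H) (Jensen's inequality), it follows that E(H(T(n))) = O(log n).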
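The O(log n) expected height can be observed empirically. The sketch below builds a BST by inserting the keys 0..n-1 in a random order and reports its height in edges; the dictionary-based representation and the name `random_bst_height` are illustrative assumptions, and n is assumed to be at least 1.

```python
import random

def random_bst_height(n, rng=random):
    """Height (in edges) of a BST built by inserting 0..n-1 in random order."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}               # child links, keyed by parent key
    root, height = keys[0], 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            child = left if key < node else right
            depth += 1
            if node not in child:
                child[node] = key      # attach the new key as a leaf here
                break
            node = child[node]
        height = max(height, depth)
    return height
```

Averaging the result over several runs for growing n should show the height increasing roughly in proportion to log n, far below the worst case of n - 1.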
A skip list is a data structure that can be used to maintain a dictionary.
Given n keys, we insert them into a sorted linked list that has -∞ as its first node and +∞ as its last node. This is the initial list S0.
Then we flip a coin for each element of Si; if a tail occurs, we insert the element into the next list Si+1, and so on, until only one element is left in Si.
S0: -∞  5  9  25  30  35  38  40  +∞
Skip List: Dictionary as an ADT
[Figure: a skip list with levels S0 to S3; every level starts at the head sentinel -∞ and ends at the tail sentinel +∞]
S3: -∞                               +∞
S2: -∞            30                 +∞
S1: -∞     9      30       38        +∞
S0: -∞  5  9  25  30  35  38  40     +∞
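The coin-flipping construction can be sketched as follows. For brevity the levels are plain sorted Python lists and the ±∞ sentinels are left implicit; `build_levels` is a hypothetical name, and treating `rng.random() < 0.5` as "tail" is an assumption of this sketch.

```python
import random

def build_levels(keys, rng=random):
    """Return [S0, S1, ...]: each Si+1 keeps the keys of Si whose coin shows tails."""
    levels = [sorted(keys)]
    while len(levels[-1]) > 1:
        # Flip one fair coin per key of the current top level.
        promoted = [k for k in levels[-1] if rng.random() < 0.5]
        levels.append(promoted)
    return levels
```

Each level is a subsequence of the one below it, and on average about half of the keys survive each promotion.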
Operations that can be performed on a skip list
Each node has two pointers: right and down.
1. Drop down
• This operation is performed when after(p) > key.
• In this operation the pointer p moves down to the immediately lower-level list.
[Figure: the right and down pointers of a node, and the position of p in S1 and in S0 before and after a drop down]
2. Scan forward
• This operation is performed when after(p) < key.
• Here the pointer p moves to the next element in the list.
• E.g., here key = 28 and p is at 9; after(9) = 25 < 28, so scan forward.
[Figure: S0 = -∞, 9, 25, 30, +∞, with p advancing along the list]
Searching for a key k
Keep a pointer p to the first node in the highest list Sh.
loop
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e., update p ← after(p)
    else                              // after(p) > k
        if (Scur == S0)               // Scur is the current skip list
            report "key k not found" and exit
        drop down to the next lower skip list
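A runnable sketch of the search, using nodes with `right` and `down` pointers as described in the slides. The `Node` class, the `build` helper, and the use of `math.inf` for the sentinels are assumptions made for this illustration; the levels are supplied explicitly rather than generated by coin flips, so the example is deterministic.

```python
import math

class Node:
    def __init__(self, key, right=None, down=None):
        self.key, self.right, self.down = key, right, down

def build(levels):
    """Link sorted key lists [S0, S1, ...] into nodes; return the head of the top list."""
    below = {}                                   # key -> node in the list below
    head = None
    for keys in levels:
        row = [-math.inf] + list(keys) + [math.inf]
        nodes = [Node(k, down=below.get(k)) for k in row]
        for a, b in zip(nodes, nodes[1:]):
            a.right = b
        below = {n.key: n for n in nodes}
        head = nodes[0]
    return head

def search(head, k):
    """Return a node holding the finite key k, or None if k is absent."""
    p = head
    while True:
        if p.right.key == k:
            return p.right
        if p.right.key < k:
            p = p.right                          # scan forward
        elif p.down is None:
            return None                          # already in S0: key not found
        else:
            p = p.down                           # drop down
```

For example, with `head = build([[5, 9, 25, 30, 35, 38, 40], [9, 30, 38], [30], []])`, calling `search(head, 25)` finds the key, while `search(head, 28)` ends in S0 without a match.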
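Searching for a key 25
[Figure: skip list with S0 = (-∞, 5, 9, 25, 30, 35, 38, 40, +∞), S1 = (9, 30, 38), S2 = (30) and S3 empty; the pointer p starts at -∞ in S3 and alternates drop-down and scan-forward steps until after(p) = 25 in S0. Key found.]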
Searching for a key 28
[Figure: the same skip list; the pointer p starts at -∞ in S3 and alternates drop-down and scan-forward steps until it reaches 25 in S0, where after(p) = 30 > 28. Key not found.]
Deletion of a key
E.g., delete 30
[Figure: the same skip list, with the successive positions of the pointer p while the key 30 is located for deletion]
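Analysis
1. An element x_k is in S_i with probability 1/2^i; this is true for all elements.
   Let X_{ki} = 1 if x_k is in S_i, and 0 otherwise. Then
   $E(|S_i|) = E\left(\sum_{k=1}^{n} X_{ki}\right) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$
   $E(\text{total size}) = E\left(\sum_{i=0}^{\infty} |S_i|\right) = \sum_{i=0}^{\infty} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: h ≈ log n
   Setting n/2^h = 1 gives h ≈ log n.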
Analysis (contd.)
3. Drop down: O(log n)
   The pointer p can drop at most h times (h is the height of the skip list) before S0 is reached, and h ≈ log n.
4. Scan forward: O(log n)
   No. of elements to scan at each level | Total no. of levels | Total cost
   O(1)                                  | O(log n)            | O(log n)
   The expected number of elements scanned at the i-th level is no more than 2, because:
   • The key lies between p and after(p) on the (i+1)-th level (that is why we came down to the i-th level).
   • In expectation there is only one element of Si between p and after(p) of the (i+1)-th level: the element pointed to by after(p) in Si.
   Thus we scan at most two elements in Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  - A compiler uses a symbol table to relate symbols to associated data.
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  - For a symbol table (also called a dictionary), we care about search, insertion, and deletion.
  - We typically don't care about sorted order.
Hash Tables
• More formally:
  - Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  - We want these to be fast, but don't care about sorting the records.
• The structure we will use is a hash table.
  - Supports all the above in O(1) expected time.
Hash Functions
• Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, …, k5}, drawn from the universe of keys U, to the slots 0 … m-1 of table T; k2 and k5 collide because h(k2) = h(k5)]
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining.
• Another solution: open addressing.
Chaining
• Chaining puts elements that hash to the same slot in a linked list.
[Figure: each non-empty slot of table T points to a linked list of the keys from K that hash to it, e.g. k1 → k4 and k8 → k6; the remaining slots are empty]
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
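One way to answer the three questions above is sketched below. It is a minimal illustration under stated assumptions: the class name `ChainedHashTable` is hypothetical, each chain is a Python list standing in for a linked list, and the hash function is taken to be Python's built-in `hash` reduced modulo the number of slots m.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list acting as the chain."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]   # h(k) = hash(k) mod m

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)      # key already present: overwrite
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):        # walk only the chain of slot h(key)
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

All three operations touch a single chain, so their cost is governed by the length of that chain rather than by the total number of keys.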
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot.
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot.
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining (Continued)
• So the cost of searching = O(1 + α).
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  - In other words, we can make the expected cost of searching constant if we make α constant.
A Final Word About Randomized Algorithms
If we could prove this:
• P(failure) < 1/k (we are sort of happy)
• P(failure) < 1/n^k (most of the time this is true, and we're happy)
• P(failure) < 1/2^n (this is difficult, but we still want it)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of MSc (CS), batch 2009
END
Randomized Quick-Sort
bull Pick an element from the array--call it the pivotbull Partition the items in the array around the pivot so all
elements to the left are to the pivot and all elements to the right are greater than the pivot
bull Use recursion to sort the two partitions
pivotpartition items gt pivotpartition 1 items pivot
Remarksbull Not much different from the Q-sort except
that earlier the algorithm was deterministic and the bounds were probabilistic
bull Here the algorithm is also randomized We pick an element to be a pivot randomly Notice that there isnrsquot any difference as to how does the algorithm behave there onwards
bull In the earlier case we can identify the worst case input Here no input is worst case
Randomized Select
1
0
1max1 n
k
nknTkTn
nT
Randomized Algorithms
bull A randomized algorithm performs coin tosses (ieuses random bits) to control its execution
bull b larr random()if b = 0do A hellipelse ie b = 1do B hellip
bull Its running time depends on the outcomes of the coin tosses
Assumptions
bull 1048708 the coins are unbiased andbull 1048708 the coin tosses are independent
bull The worst-case running time of a randomized algorithm may be large but occurs with very low probability (eg it occurs when all the coin tosses give ldquoheadsrdquo)
Monte Carlo Algorithms
bull Running times are guaranteed but the output may not be completely correct
bull Probability of error is low
Las Vegas Algorithms
bull Output is guaranteed to be correct
bull Bounds on running times hold with high probability
bull What type of algorithm is Randomized Qsort
Why expected running times
bull Markovrsquos inequality
P( X gt k E(X)) lt 1k
ie the probability that the algorithm will take more than O(2 E(X)) time is less than 12
Or the probability that the algorithm will take more than O(10 E(X)) time is less than 110
This is the reason why Qsort does well in practice
Markovrsquos Bound
P(XltkM)lt 1k where k is a constant
Chernouffrsquos Bound
P(Xgt2μ)lt frac12
A More Stronger Result
P(Xgtk μ )lt 1nk where k is a constant
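A binary search tree can be built randomly
• A randomly selected key x, with Rank(x) = i, becomes the root (pivot element = root)
• Keys smaller than x go into the left subtree; keys greater than x go into the right subtree
RANDOMLY BUILT BST
• $X_i$: the height of the tree rooted at a node with rank = i
• $Y_i$: the exponential height of the tree, $Y_i = 2^{X_i}$
• $H = \max\{H_1, H_2\} + 1$
  where $H_1$ = height of the left subtree
        $H_2$ = height of the right subtree
        $H$ = height of the tree rooted at x
HEIGHT OF THE TREE
• $Y = 2^{H} = 2 \cdot \max\{2^{H_1}, 2^{H_2}\}$
• Expected value of the exponential height EH of the tree T(n) with n nodes:
  $E(EH(T(n))) = \frac{2}{n} \sum_{k=0}^{n-1} E(\max\{EH(T(k)),\, EH(T(n-1-k))\}) = O(n^3)$
  Since $2^{E(H)} \le E(2^{H})$ (Jensen's inequality), it follows that $E(H(T(n))) = O(\log n)$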
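The O(log n) bound can be observed empirically. The sketch below inserts the keys 0..n-1 in a uniformly random order into an unbalanced BST, which gives the same distribution of trees as choosing a random key as the root and recursing. The function name, the dictionary-based child links, and the seed are assumptions made for this illustration; height is counted in nodes, so an empty tree has height 0 and H = max(H1, H2) + 1.

```python
import math
import random

def random_bst_height(n, rng):
    """Insert the keys 0..n-1 in random order into an unbalanced BST; return its height."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}          # child links, indexed by the parent's key
    root, height = None, 0
    for key in keys:
        if root is None:
            root, height = key, 1
            continue
        node, depth = root, 1
        while True:
            children = left if key < node else right
            depth += 1                    # depth of the position below node
            if node not in children:
                children[node] = key      # attach the new key as a leaf
                break
            node = children[node]
        height = max(height, depth)
    return height

if __name__ == "__main__":
    rng = random.Random(2009)             # fixed seed, chosen arbitrarily
    n, trials = 1000, 20
    average = sum(random_bst_height(n, rng) for _ in range(trials)) / trials
    print(f"n={n}: average height {average:.1f}, log2(n) = {math.log2(n):.1f}")
```

The measured average should be a small constant multiple of log2(n), far below the worst-case height n.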
A skip list is a data structure that can be used to maintain a dictionary.
Given n keys, we insert these n keys in a linked list that has -∞ as its first node and ∞ as its last node.
This is the initial list S0.
Then we flip a coin for each element of Si; if a tail occurs, we insert the element into the next list Si+1. We repeat this, list by list, until only one element is left in Si.
S0: -∞ 5 9 25 30 35 38 40 ∞
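The construction can be sketched as follows. This is a simplified model under stated assumptions: each list is represented as a plain sorted Python list, the sentinels -∞ and ∞ are left out, and the function name `build_skip_list_levels` is hypothetical.

```python
import random

def build_skip_list_levels(keys, rng):
    """Return [S0, S1, ...]; each list holds the keys present at that level, sorted."""
    levels = [sorted(keys)]
    while len(levels[-1]) > 1:
        # One coin flip per element of the current top list; a tail promotes it.
        promoted = [key for key in levels[-1] if rng.random() < 0.5]
        if not promoted:
            break                      # nobody was promoted, so this is the top list
        levels.append(promoted)
    return levels

if __name__ == "__main__":
    rng = random.Random(2009)          # fixed seed, chosen arbitrarily
    for i, level in enumerate(build_skip_list_levels([5, 9, 25, 30, 35, 38, 40], rng)):
        print(f"S{i}: {level}")
```

Because each list is filtered from the one below it, every Si+1 is a subset of Si, which is what lets a search drop down without losing its position.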
Skip List Dictionary as ADT
[Figure: a skip list with lists S3 (top) to S0 (bottom), each running from the head -∞ to the tail ∞. S0: -∞ 5 9 25 30 35 38 40 ∞; the higher lists hold subsets of these keys (9 30 38; 30 38; 30).]
Operations that can be performed on a skip list
Each node has two pointers: right and down
1. Drop down
• This operation is performed when after(p) > key
• In this operation pointer p moves down to the immediately lower-level list
[Figure: pointer p in S1 and, after the drop down, in S0; every node has a right pointer and a down pointer.]
2. Scan forward
• This operation is performed when after(p) < key
• Here the pointer p moves to the next element in the list
• e.g., here key = 28 and p is at 9; after(9) = 25 < 28, so scan forward
[Figure: list S0 (-∞ … 9 25 30 … ∞) showing the successive positions of p.]
Searching a key k
Keep a pointer p to the first node in the highest list Sh
repeat
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e., update p ← after(p)
    else                          (here after(p) > k)
        if (Scur == S0)           (Scur is the current skip list)
            then "key k not found"
            exit
        drop down to the next skip list
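The search procedure above can be sketched with explicit right and down pointers. The `Node` class, the helper `build_from_levels`, and the contents of the upper lists (S1 = 9 30 38, S2 = 30, S3 empty) are assumptions made for this illustration; the lists are fixed by hand rather than built from coin flips, so the run is repeatable.

```python
import math

class Node:
    def __init__(self, key):
        self.key = key
        self.right = None   # next node in the same list
        self.down = None    # node with the same key in the list one level below

def build_from_levels(levels):
    """Link the lists given bottom-up (S0 first); return the head (-inf) of the top list."""
    below = {}              # key -> node in the list one level down
    head = None
    for keys in levels:
        nodes = [Node(k) for k in [-math.inf, *keys, math.inf]]
        for a, b in zip(nodes, nodes[1:]):
            a.right = b
        for node in nodes:
            node.down = below.get(node.key)   # None for nodes of S0
        below = {node.key: node for node in nodes}
        head = nodes[0]
    return head

def search(head, k):
    """Return the node holding k, or None if k is absent."""
    p = head
    while True:
        nxt = p.right                   # after(p)
        if nxt.key == k:
            return nxt
        if nxt.key < k:
            p = nxt                     # scan forward
        elif p.down is None:
            return None                 # already in S0: key not found
        else:
            p = p.down                  # drop down

levels = [[5, 9, 25, 30, 35, 38, 40], [9, 30, 38], [30], []]   # S0 .. S3
head = build_from_levels(levels)
print(search(head, 25) is not None)
print(search(head, 28) is not None)
```

The ∞ sentinel guarantees that every scan forward eventually meets a key that is not smaller than k, so the loop always terminates.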
Searching for a key 25
[Figure: skip list with lists S3 to S0 (S0: -∞ 5 9 25 30 35 38 40 ∞; the lists above hold 9 30 38 and 30), showing the successive positions of p. Key found.]
Searching for a key 28
[Figure: the same skip list, showing the successive positions of p. Key not found.]
Deletion of a key
[Figure: the same skip list, showing the successive positions of p; e.g., delete 30.]
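Analysis
1. An element $x_k$ is in $S_i$ with probability $1/2^i$ (true for all elements)
   $E(|S_i|) = \sum_{k=1}^{n} E(X_{ki}) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$, where $X_{ki} = 1$ if $x_k$ is in $S_i$ and 0 otherwise
   $E(\text{total size}) = E\left(\sum_{i=0}^{\infty} |S_i|\right) = \sum_{i=0}^{\infty} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: h ≈ log n
   The top list is the one in which about one element is expected to remain: $n/2^h = 1$, so $h \simeq \log n$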
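Both estimates can be checked with a short simulation. The sketch assumes the standard model in which every element is promoted independently for as long as its coin shows a tail; it only counts how many lists contain each element and does not build the lists. The function names and the seed are hypothetical.

```python
import math
import random

def copies_of_one_element(rng):
    """Number of lists S0, S1, ... that contain one element (always at least S0)."""
    copies = 1
    while rng.random() < 0.5:      # tail: the element is promoted to the next list
        copies += 1
    return copies

def simulate(n, rng):
    """Return (total size over all lists, height = number of non-empty lists)."""
    counts = [copies_of_one_element(rng) for _ in range(n)]
    return sum(counts), max(counts)

if __name__ == "__main__":
    rng = random.Random(2009)      # fixed seed, chosen arbitrarily
    n, trials = 1000, 200
    results = [simulate(n, rng) for _ in range(trials)]
    avg_size = sum(size for size, _ in results) / trials
    avg_height = sum(height for _, height in results) / trials
    print(f"average total size {avg_size:.0f} (2n = {2 * n})")
    print(f"average height {avg_height:.1f} (log2 n = {math.log2(n):.1f})")
```

The average total size should be close to 2n, and the average height close to log2 n plus a small constant.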
Analysis (contd.)
3. Drop down: O(log n)
   Since pointer p can drop at most h times (i.e., the height of the skip list) until S0 is reached, and h = log n
4. Scan forward: O(log n)
   # of elements to scan at each level    Total no. of levels    Total cost
   O(1)                                   O(log n)               O(log n)
The number of elements scanned at the ith level is, in expectation, no more than 2, because:
• The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level).
• On average there is only one element of Si between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
Thus we expect to scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  – A compiler uses a symbol table to relate symbols to associated data
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  – For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  – We typically don't care about sorted order
Hash Tables
• More formally:
  – Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  – We want these to be fast, but don't care about sorting the records
• The structure we will use is a hash table
  – Supports all the above in O(1) expected time
Hash Functions
• Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, …, k5}, a subset of the universe of keys U, to slots 0 … m − 1 of the table T; h(k2) = h(k5) is a collision.]
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining
• Another solution is open addressing
Chaining
• Chaining puts elements that hash to the same slot in a linked list
[Figure: table T in which each slot points to a linked list of the keys from K (k1 … k8) that hash to it, e.g. k1 → k4 and k8 → k6; empty slots hold a nil pointer.]
Chaining
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
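One possible answer to the three questions above is sketched below. It is a minimal illustration, not the slides' implementation: a Python list stands in for each linked list, Python's built-in `hash` plays the role of h, and the class name `ChainedHashTable` and its default of 8 slots are assumptions.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list that serves as the chain."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        # h(key): map the key to one of the m slots.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)     # key already present: overwrite
                return
        chain.append((key, value))

    def search(self, key):
        for existing, value in self._chain(key):
            if existing == key:
                return value
        return None                         # unsuccessful search

    def delete(self, key):
        chain = self._chain(key)
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                del chain[i]
                return True
        return False

if __name__ == "__main__":
    symbols = ChainedHashTable()
    symbols.insert("count", "memory location 0x10")
    symbols.insert("main", "procedure entry")
    print(symbols.search("count"))
    print(symbols.delete("main"), symbols.search("main"))
```

All three operations touch only the chain of one slot, so their cost is governed by the length of that chain.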
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  – In other words, we can make the expected cost of searching constant if we make α constant
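The role of the load factor can be seen in a small simulation. The sketch assumes simple uniform hashing is modelled by drawing a slot uniformly at random for every key; the function name `chain_lengths`, the sizes n and m, and the seed are arbitrary choices for this illustration.

```python
import random

def chain_lengths(n, m, rng):
    """Hash n keys into m slots uniformly at random; return the m chain lengths."""
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1     # simple uniform hashing, simulated
    return lengths

rng = random.Random(11)                    # fixed seed, chosen arbitrarily
n, m = 10_000, 1_000
lengths = chain_lengths(n, m, rng)
alpha = n / m                              # load factor
# An unsuccessful search that lands in a uniformly random slot scans that whole
# chain, so its average cost over the slots is the average chain length.
average_unsuccessful = sum(lengths) / m
print(f"alpha = {alpha}, average chain length = {average_unsuccessful}, longest chain = {max(lengths)}")
```

The average chain length always equals α because the n keys are spread over m chains; uniform hashing matters for keeping individual chains close to that average.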
A Final Word About Randomized Algorithms
If we could prove this:
• $P(\text{failure}) < 1/k$ (we are sort of happy)
• $P(\text{failure}) < 1/n^k$ (most of the time this is true, and we're happy)
• $P(\text{failure}) < 1/2^n$ (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of MSc (CS) batch 2009
END
Assumptions
bull 1048708 the coins are unbiased andbull 1048708 the coin tosses are independent
bull The worst-case running time of a randomized algorithm may be large but occurs with very low probability (eg it occurs when all the coin tosses give ldquoheadsrdquo)
Monte Carlo Algorithms
• Running times are guaranteed, but the output may not be completely correct
• Probability of error is low
Las Vegas Algorithms
• Output is guaranteed to be correct
• Bounds on running times hold with high probability
• What type of algorithm is Randomized Qsort?
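To make the two definitions concrete, here is a minimal sketch on a toy problem that does not come from the slides: finding the index of a 1 in a 0/1 array, assumed to contain a 1 in half of its positions. The function names and the `max_probes` parameter are hypothetical.

```python
import random

def las_vegas_find_one(bits, rng):
    """Sample until a 1 is found: the answer is always correct, the running time is random."""
    probes = 0
    while True:
        i = rng.randrange(len(bits))
        probes += 1
        if bits[i] == 1:
            return i, probes

def monte_carlo_find_one(bits, rng, max_probes):
    """Sample at most max_probes times: the running time is bounded, but the search may fail."""
    for _ in range(max_probes):
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i
    return None   # fails with probability (1/2)**max_probes when half the entries are 1

rng = random.Random(1)
bits = [0, 1] * 50
print(las_vegas_find_one(bits, rng))
print(monte_carlo_find_one(bits, rng, max_probes=20))
```

The Las Vegas version never returns a wrong index, though its number of probes varies from run to run. The Monte Carlo version stops after a fixed number of probes and accepts a small chance of reporting failure.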
Why expected running times?
• Markov's inequality
P(X > k·E(X)) < 1/k
i.e. the probability that the algorithm will take more than 2·E(X) time is less than 1/2,
or the probability that the algorithm will take more than 10·E(X) time is less than 1/10.
This is the reason why Qsort does well in practice.
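The bound can be observed empirically. The sketch below is an illustration, not the instructor's code: it counts the comparisons made by quicksort with a random pivot over many runs and reports how often a run exceeds k times the sample mean. It assumes distinct keys, and the function names are hypothetical.

```python
import random

def quicksort_comparisons(items, rng):
    """Count key comparisons made by quicksort with a uniformly random pivot."""
    if len(items) <= 1:
        return 0
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    # Each non-pivot element is compared with the pivot once (keys assumed distinct).
    return (len(items) - 1
            + quicksort_comparisons(smaller, rng)
            + quicksort_comparisons(larger, rng))

def tail_fraction(samples, k):
    """Fraction of samples that exceed k times the sample mean."""
    mean = sum(samples) / len(samples)
    return sum(1 for s in samples if s > k * mean) / len(samples)

rng = random.Random(0)
runs = [quicksort_comparisons(list(range(200)), rng) for _ in range(300)]
for k in (2, 10):
    print(k, tail_fraction(runs, k), "<", 1 / k)
```

In practice the observed fractions tend to be far below 1/k, which suggests that Markov's inequality is a loose bound for this algorithm.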
Markov's Bound
P(X > kM) < 1/k, where k is a constant
Chernoff's Bound
P(X > 2μ) < ½
A Stronger Result
P(X > kμ) < 1/n^k, where k is a constant
Binary search tree can be built randomly
• A randomly selected key x, with Rank(x) = i, becomes the root
• Pivot element = root
[Figure: root x, with the keys < x in one subtree and the keys > x in the other]
RANDOMLY BUILT BST
• Xi: the height of the tree rooted at a node with rank = i
• Yi: exponential height of the tree = 2^Xi
• H = max{H1, H2} + 1
where H1: height of the left subtree
H2: height of the right subtree
H: height of the tree rooted at x
HEIGHT OF THE TREE
• Y = 2^H = 2·max{2^H1, 2^H2}
• Expected value of the exponential height EH of the tree T(n) with n nodes:
E(EH(T(n))) = (2/n) Σ_{k=0}^{n-1} E(max{EH(T(k)), EH(T(n-1-k))})
= O(n^3)
⇒ E(H(T(n))) = O(log n), since 2^E(H) ≤ E(2^H) by Jensen's inequality
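The O(log n) expected height can be checked by simulation. This is a minimal sketch under stated assumptions: the tree is built by inserting the keys 0..n-1 in a uniformly random order, height is counted in edges, and the function name `random_bst_height` is hypothetical.

```python
import math
import random

def random_bst_height(n, rng):
    """Height (in edges) of a BST built by inserting the keys 0..n-1 in random order."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}            # child links, indexed by the parent's key
    root, height = keys[0], 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            children = left if key < node else right
            depth += 1              # depth of the position one step below node
            if node not in children:
                children[node] = key
                break
            node = children[node]
        height = max(height, depth)
    return height

rng = random.Random(0)
n = 1000
average = sum(random_bst_height(n, rng) for _ in range(20)) / 20
print(average, math.log2(n))
```

The average height should come out as a small constant multiple of log2 n, far below the worst case of n - 1.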
A skip list is a data structure that can be used to maintain a dictionary.
Given n keys, we insert these n keys into a linked list that has -∞ as its first node and ∞ as its last node.
This is the initial list S0.
Then we flip a coin for each element of Si; if a tail occurs, we insert the element into the next list Si+1, and so on until only one element is left in Si.
S0: -∞ 5 9 25 30 35 38 40 ∞
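The coin-flipping construction can be sketched as follows. This is an illustration under assumptions, not the instructor's code: each level is a plain sorted Python list, the -∞ and ∞ sentinels are omitted, and the loop stops once a level holds at most one key (so the top level may be empty). The name `build_levels` is hypothetical.

```python
import random

def build_levels(keys, rng):
    """Return the lists S0, S1, ... of a skip list as sorted Python lists (sentinels omitted)."""
    levels = [sorted(keys)]
    while len(levels[-1]) > 1:
        # Flip a fair coin per element; on a tail the element is promoted to the next list.
        promoted = [key for key in levels[-1] if rng.random() < 0.5]
        levels.append(promoted)
    return levels

rng = random.Random(0)
for i, level in enumerate(build_levels([5, 9, 25, 30, 35, 38, 40], rng)):
    print(f"S{i}:", level)
```

Each list is a subset of the one below it, and about half of the elements survive each round of coin flips.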
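Skip List Dictionary as ADT
[Figure: a skip list with lists S3, S2, S1 and S0 over the keys 5, 9, 25, 30, 35, 38, 40. Every list starts at a head node -∞ and ends at a tail node ∞; the higher lists hold subsets of the keys, such as 9, 30, 38 and 30.]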
Operations that can be performed on a skip list
Each node has two pointers: right and down
1. Drop down
• This operation is performed when after(p) > key
• In this operation the pointer p moves down to the immediate lower-level list
[Figure: pointer p before and after a drop down from list S1 to list S0, with the right and down pointers of a node labelled]
2. Scan forward
• This operation is performed when after(p) < key
• Here the pointer p moves to the next element in the list
• e.g. here key = 28 and p is at 9; after(9) < 28, so scan forward
[Figure: list S0 = -∞ 9 25 30 ∞, with the successive positions of p marked]
Searching for a key k
Keep a pointer p to the first node in the highest list Sh
while (true)
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e. update p ← after(p)
    else (after(p) > k)
        if (Scur == S0), where Scur is the current skip list
            then "key k not found"
            exit
        drop down to the next skip list
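The search procedure above can be sketched in runnable form. This is a minimal illustration under assumptions: the `Node` class, `build_skip_list` and `search` are hypothetical names, the levels are supplied by hand instead of by coin flips, and the example levels are inferred from the slide figures for the keys 5, 9, 25, 30, 35, 38, 40.

```python
import math

class Node:
    def __init__(self, key):
        self.key = key
        self.right = None   # after(p) in the text
        self.down = None    # node with the same key, one list lower

def build_skip_list(levels):
    """Link the given levels (levels[0] is S0) into nodes; return the head of the top list."""
    below = {}              # key -> node in the list built just before (one level down)
    head = None
    for keys in levels:
        nodes = [Node(k) for k in [-math.inf, *keys, math.inf]]
        for a, b in zip(nodes, nodes[1:]):
            a.right = b
        for node in nodes:
            node.down = below.get(node.key)
        below = {node.key: node for node in nodes}
        head = nodes[0]
    return head

def search(head, k):
    """Return a node holding k, or None if k is not in the skip list."""
    p = head
    while True:
        nxt = p.right
        if nxt.key == k:
            return nxt
        if nxt.key < k:
            p = nxt               # scan forward
        elif p.down is None:
            return None           # already in S0: key not found
        else:
            p = p.down            # drop down

# Assumed example levels S0..S3 (S3 holds only the sentinels).
head = build_skip_list([[5, 9, 25, 30, 35, 38, 40], [9, 30, 38], [30], []])
print(search(head, 25) is not None, search(head, 28) is not None)
```

With these levels the search for 25 drops down to S1, scans forward to 9, drops to S0 and finds 25 there. The search for 28 follows the same path, scans forward to 25 in S0, and stops because after(p) = 30 exceeds the key.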
Searching for a key 25
[Figure: the successive positions of pointer p while searching the skip list below for key 25. Result: key found.]
S3: -∞ ∞
S2: -∞ 30 ∞
S1: -∞ 9 30 38 ∞
S0: -∞ 5 9 25 30 35 38 40 ∞
Searching for a key 28
[Figure: the successive positions of pointer p while searching the lists S3, S2, S1 and S0 of the same skip list for key 28. Result: key not found.]
Deletion of a key
e.g. delete 30
[Figure: the successive positions of pointer p in the lists S3, S2, S1 and S0 while locating key 30, which is to be deleted]
Analysis
1. An element x_k is in S_i with probability 1/2^i (true for all elements).
   |S_i| = Σ_{k=1}^{n} X_ki, where X_ki = 1 if x_k is in S_i, and 0 otherwise
   E(|S_i|) = Σ_{k=1}^{n} 1/2^i = n/2^i
   E(total size) = E(Σ_i |S_i|) = Σ_{i≥0} n/2^i ≤ 2n
2. Expected height of a skip list: h = log n
   n/2^h = 1
   h ≃ log n
Analysis (contd)
3. Drop down: O(log n)
   The pointer p can drop at most h times, i.e. the height of the skip list, until S0 is reached, and h = log n.
4. Scan forward: O(log n)
   No. of elements to scan at each level    Total no. of levels    Total cost
   O(1)                                     O(log n)               O(log n)
The expected number of elements scanned at the ith level is no more than 2, because:
The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level). And there is, on average, only one element in Si between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
Thus we scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  – A compiler uses a symbol table to relate symbols to associated data
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  – For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  – We typically don't care about sorted order
Hash Tables
• More formally:
  – Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  – We want these to be fast, but don't care about sorting the records
• The structure we will use is a hash table
  – Supports all the above in O(1) expected time
Hash Functions
• Next problem: collision
[Figure: the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, mapped by h into slots 0 to m - 1 of table T; h(k2) = h(k5) is a collision]
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining
• Other solutions: open addressing
Chaining
• Chaining puts elements that hash to the same slot in a linked list
[Figure: table T whose slots point to linked lists of the actual keys K, a subset of the universe of keys U. The chains shown are k1 → k4, k5 → k2 → k3, k8 → k6 and k7; empty slots and chain ends are marked with a dash.]
Chaining
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
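One possible answer to the three questions is sketched below. It is an illustration under assumptions, not the instructor's implementation: Python lists stand in for the linked lists, Python's built-in `hash()` plays the role of h(k), and the class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Hash table with collisions resolved by chaining."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]   # one chain per slot 0..m-1

    def _chain(self, key):
        # hash() stands in for h(k); the modulus maps it to a slot index.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)       # key already present: overwrite its data
                return
        chain.insert(0, (key, value))         # new key goes to the head of the chain

    def search(self, key):
        for k, value in self._chain(key):
            if k == key:
                return value
        return None                           # key not in the table

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

table = ChainedHashTable(m=4)
for name in ("x", "y", "main", "count"):
    table.insert(name, f"address of {name}")
print(table.search("main"), table.delete("y"), table.search("y"))
```

All three operations hash the key to a slot and then walk only that slot's chain, so their cost is proportional to the chain length.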
Markovrsquos Bound
P(XltkM)lt 1k where k is a constant
Chernouffrsquos Bound
P(Xgt2μ)lt frac12
A More Stronger Result
P(Xgtk μ )lt 1nk where k is a constant
Binary search tree can be built randomly
Rank(x)=i
Randomly selected key becomes the root
Pivot element=root
x
gtlt
RANDOMLY BUILT BST
bull Xi the height of the tree rooted at a node with rank=i
bull Yi exponential height of the tree=2^Xi
bull H=maxH1H2 + 1
where H1 ht of left subtree
H2 htof right subtree
H ht of the tree rooted at x
HEIGHT OF THE TREE
bull Y=2^H
=2max2^H12^H2
bull Expected value of exponential ht of the tree with lsquonrsquo nodes
=E(EH(T(X)))
=2n sum maxEH(T(k))EH(T(n-1-k))
=O(n^3)=E(H(T(n)))=O(log n)
Skip list is a data structure that can be used to maintain dictionary
Given n keys we insert these n keys in a linked list that has -infin as first node and infin as last node
Initial list S0
Then we flip coin a coin for each element until only one is left in Si if a tail occurswe insert it into next list Si+1 and so on
-infin infin5 9 25 30 35 38 40
Skip List Dictionary as ADT
-infin
-infin
-infin
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
9 30 38
30
38 30
head Tail
Operations that can be performed on a skip list
Each node has two pointers: right and down.
1. Drop down
• This operation is performed when after(p) > key
• In this operation the pointer p moves down to the immediate lower-level list
[Figure: pointer p drops down from its node in S1 to the copy of the same node in S0.]
2. Scan forward
• This operation is performed when after(p) < key
• Here the pointer p moves to the next element in the list
• E.g., here key = 28 and p is at 9; after(9) < 28, so scan forward
[Figure: in S0 = -∞ 9 25 30 ∞, pointer p scans forward while after(p) < 28.]
Searching for a key k
Keep a pointer p to the first node in the highest list Sh.
repeat
    if (after(p) == k)
        return after(p)
    if (after(p) < k)
        scan forward, i.e. update p = after(p)
    else (after(p) > k)
        if (Scur == S0)      (Scur is the current list)
            report "key k not found" and exit
        drop down to the next lower list
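The search procedure can be written out as runnable code. The sketch below is an assumption-laden illustration rather than the lecture's own implementation: the `Node` class, the `build` helper, and the fixed example levels (S1 = 9 30 38, S2 = 30 over the S0 keys used on the slides) are chosen here for the example. Each node carries the `right` and `down` pointers described above.

```python
import math


class Node:
    def __init__(self, key):
        self.key = key
        self.right = None  # next node in the same list
        self.down = None   # node with the same key one level below


def build(levels):
    """Link sorted key lists (S0 first) into a skip list; return the top-left node."""
    below = {}  # key -> node in the list one level down
    head = None
    for keys in levels:
        row = [Node(k) for k in [-math.inf, *keys, math.inf]]
        for a, b in zip(row, row[1:]):
            a.right = b
        for node in row:
            node.down = below.get(node.key)
        below = {node.key: node for node in row}
        head = row[0]
    return head


def search(head, k):
    """Return a node holding k, or None if k is absent."""
    p = head
    while True:
        nxt = p.right            # after(p)
        if nxt.key == k:
            return nxt
        if nxt.key < k:
            p = nxt              # scan forward
        elif p.down is None:     # already in S0: nowhere left to drop
            return None
        else:
            p = p.down           # drop down


head = build([[5, 9, 25, 30, 35, 38, 40], [9, 30, 38], [30]])
for key in (25, 28):
    print(key, "found" if search(head, key) else "not found")
```

The ∞ sentinel guarantees that `after(p)` always exists, so the loop never has to test for the end of a list.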
Searching for a key 25
[Figure: skip list with levels S3 to S0 over the keys 5 9 25 30 35 38 40; pointer p starts at the top-left -∞ node and alternates drop down and scan forward until it reaches 25 in S0.]
Key found
Searching for a key 28
[Figure: the same skip list; pointer p drops down and scans forward until it stands at 25 in S0, where after(p) = 30 > 28 and no lower list exists.]
Key not found
Deletion of a key
E.g., delete 30
[Figure: the same skip list with levels S3 to S0; pointer p searches for 30, which is then removed from every list Si that contains it.]
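Analysis
1. An element xk is in Si with probability 1/2^i (true for all elements).
   Let Xki = 1 if xk is in Si, and 0 otherwise. Then |Si| = Σ_{k=1}^{n} Xki and
   E(|Si|) = Σ_{k=1}^{n} E(Xki) = Σ_{k=1}^{n} 1/2^i = n/2^i
   E(total size) = E(Σ_{i≥0} |Si|) = Σ_{i≥0} n/2^i ≤ 2n
2. Expected height of a skip list: h = log n
   The top list holds about one element, so n/2^h = 1, which gives h ≃ log n (logarithm to base 2).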
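Both estimates can be checked with a short simulation. This is a sketch under assumptions of its own: only the sizes of the lists are tracked, the function name `level_sizes` is hypothetical, and levels are generated until a list comes out empty (a slightly different stopping rule from the slides, chosen to keep the code short).

```python
import math
import random


def level_sizes(n, rng):
    """Sizes |S0|, |S1|, ... when each element moves up with probability 1/2."""
    sizes = [n]
    while sizes[-1] > 0:
        sizes.append(sum(rng.random() < 0.5 for _ in range(sizes[-1])))
    return sizes[:-1]  # drop the final empty level


rng = random.Random(1)  # fixed seed so that runs are repeatable
n, trials = 1024, 200
runs = [level_sizes(n, rng) for _ in range(trials)]
avg_total = sum(sum(s) for s in runs) / trials
avg_levels = sum(len(s) for s in runs) / trials
print(f"average total size {avg_total:.0f} (2n = {2 * n})")
print(f"average number of levels {avg_levels:.1f} (log2 n = {math.log2(n):.0f})")
```

The average total size should stay close to 2n, and the number of levels should exceed log2 n only by a small additive amount.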
Analysis (contd)
3. Drop down: O(log n)
   The pointer p can drop at most h times (h is the height of the skip list) before S0 is reached, and h = log n.
4. Scan forward: O(log n)
   Number of elements to scan at each level | Total number of levels | Total cost
   O(1)                                     | O(log n)               | O(log n)
   The number of elements scanned at the ith level is, in expectation, no more than 2, because:
   • The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level).
   • In Si we expect only one element between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
   Thus we scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
  - A compiler uses a symbol table to relate symbols to associated data
    • Symbols: variable names, procedure names, etc.
    • Associated data: memory location, call graph, etc.
  - For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  - We typically don't care about sorted order
Hash Tables
• More formally:
  - Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  - We want these to be fast, but don't care about sorting the records
• The structure we will use is a hash table
  - Supports all the above in O(1) expected time
Hash Functions
• Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, to slots 0 .. m-1 of table T; k2 and k5 collide because h(k2) = h(k5).]
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining
• Another solution is open addressing
Chaining
• Chaining puts elements that hash to the same slot in a linked list
[Figure: each slot of table T points to a linked list of the keys from K (a subset of the universe U) that hash to it, e.g. k1 and k4 share one list and k8 and k6 another; empty slots hold a null pointer.]
Chaining
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
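One possible answer to the three questions above is sketched below. It is an illustration, not the lecture's implementation: the class name `ChainedHashTable` is hypothetical, Python's built-in `hash` stands in for the hash function h, and each chain is a Python list used in place of a linked list.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _chain(self, key):
        # h(key): the built-in hash reduced modulo the number of slots
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite its data
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):  # walk only the chain of one slot
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(m=4)
for key, value in [(1, "a"), (5, "b"), (9, "c")]:  # 1, 5 and 9 all hash to slot 1
    table.insert(key, value)
print(table.search(5), table.delete(5), table.search(5))
```

All three operations touch a single chain, so their cost is governed by the length of that chain.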
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the load factor α = n/m = average keys per slot
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  - In other words, we can make the expected cost of searching constant if we make α constant
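The role of the load factor n/m (written `alpha` below) can be seen in a small experiment. This is a sketch under the simple uniform hashing assumption, with a hypothetical helper `average_keys_examined`: only chain lengths are tracked, each key lands in a uniformly random slot, and an unsuccessful search is modelled as scanning the whole chain of a random slot.

```python
import random


def average_keys_examined(n, m, rng, lookups=10_000):
    """Average number of keys examined by an unsuccessful search."""
    chain_len = [0] * m
    for _ in range(n):
        chain_len[rng.randrange(m)] += 1  # simple uniform hashing
    return sum(chain_len[rng.randrange(m)] for _ in range(lookups)) / lookups


rng = random.Random(0)  # fixed seed so that runs are repeatable
m = 1000
for n in (500, 1000, 2000, 4000):
    alpha = n / m
    print(f"alpha = {alpha:.1f}: about {average_keys_examined(n, m, rng):.2f} keys examined")
```

The measured averages should track alpha, so keeping n proportional to m keeps the expected search cost constant.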
A Final Word About Randomized Algorithms
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of the MSc (CS) batch of 2009
END
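[Figure: a skip list with levels S0 to S3. Every level starts at the head sentinel -∞ and ends at the tail sentinel +∞. S0 holds all keys: 5, 9, 25, 30, 35, 38, 40; S1 holds 9, 30, 38; the higher levels hold progressively fewer keys.]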
Operations that can be performed on a skip list
Each node has two pointers: right and down.
1. Drop down
• This operation is performed when after(p) > key.
• In this operation, pointer p moves down to the immediately lower-level list.
[Figure: levels S1 and S0 with the right and down pointers marked; p is shown after dropping down from S1 to S0.]
2. Scan forward
• This operation is performed when after(p) < key.
• Here the pointer p moves to the next element in the list.
• E.g., here key = 28 and p is at 9; after(9) < 28, so scan forward.
[Figure: list S0 with -∞, 9, 25, 30, +∞; p advances along S0.]
Searching for a key k
Keep a pointer p to the first node in the highest list Sh.
repeat
    if (after(p) == k)
        return after(p)
    else if (after(p) < k)
        scan forward, i.e. update p ← after(p)
    else (after(p) > k)
        if (Scur == S0), where Scur is the current list,
            then report "key k not found" and exit
        otherwise drop down to the next lower list
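The search procedure above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the `Node` class, the `build` helper (which wires up a skip list from explicitly given levels instead of coin flips), and the convention that `search` returns `None` for a missing key are all assumptions made for the example.

```python
import math

class Node:
    """A skip-list node with the two pointers used above: right and down."""
    def __init__(self, key):
        self.key = key
        self.right = None
        self.down = None

def build(levels):
    """Build a skip list from explicit levels, bottom list (S0) first.

    Each level is a sorted list of keys; the -inf and +inf sentinels are
    added here. Returns the -inf node of the highest list.
    """
    below = {}                      # key -> node on the level just below
    head = None
    for keys in levels:
        current = {}
        prev = None
        for key in [-math.inf, *keys, math.inf]:
            node = Node(key)
            node.down = below.get(key)
            if prev is not None:
                prev.right = node
            prev = node
            current[key] = node
        head = current[-math.inf]
        below = current
    return head

def search(head, k):
    """Return the node holding k, or None if k is not in the skip list."""
    p = head
    while True:
        nxt = p.right.key           # after(p)
        if nxt == k:
            return p.right
        if nxt < k:
            p = p.right             # scan forward
        elif p.down is None:        # already in S0, cannot drop further
            return None
        else:
            p = p.down              # drop down
```

With the levels taken from the figure, `search(build([[5, 9, 25, 30, 35, 38, 40], [9, 30, 38], [30], []]), 25)` returns the node for 25, while searching for 28 returns `None`.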
Searching for a key 25
[Figure: the skip list S0 to S3 (S0: 5, 9, 25, 30, 35, 38, 40; S1: 9, 30, 38). Pointer p starts at -∞ in S3, drops down to S1, scans forward to 9, drops down to S0 and finds 25 as after(9). Key found.]
Searching for a key 28
[Figure: the same skip list. Pointer p starts at -∞ in S3, drops down to S1, scans forward to 9, drops down to S0 and scans forward to 25; after(25) = 30 > 28 and p is already in S0. Key not found.]
Deletion of a key
E.g., delete 30.
[Figure: the skip list S0 to S3 (S0: 5, 9, 25, 30, 35, 38, 40; S1: 9, 30, 38), with pointer p tracing the path to 30 through the levels.]
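The slide gives only the picture for deletion. A common way to implement it, assumed here rather than taken from the lecture, is to walk down from the top list and unlink the key from every list in which it appears. The `Node`, `build`, `delete` and `level_keys` names below are hypothetical helpers for this sketch.

```python
import math

class Node:
    """Skip-list node with right and down pointers."""
    def __init__(self, key, right=None, down=None):
        self.key, self.right, self.down = key, right, down

def build(levels):
    """Build from explicit levels (S0 first); returns the head of the top list."""
    below, head = {}, None
    for keys in levels:
        current, nxt = {}, None
        # Create nodes right to left so each one can point at its successor.
        for key in reversed([-math.inf, *keys, math.inf]):
            nxt = Node(key, right=nxt, down=below.get(key))
            current[key] = nxt
        head, below = nxt, current
    return head

def delete(head, k):
    """Unlink k from every list that contains it; return True if k was present."""
    found = False
    p = head
    while p is not None:
        while p.right.key < k:       # scan forward to the predecessor of k
            p = p.right
        if p.right.key == k:         # k is on this level: splice it out
            p.right = p.right.right
            found = True
        p = p.down                   # drop down and repeat on the next list
    return found

def level_keys(head):
    """Keys of each list from the top down, without the sentinels."""
    out = []
    while head is not None:
        row, node = [], head.right
        while node.right is not None:
            row.append(node.key)
            node = node.right
        out.append(row)
        head = head.down
    return out
```

Because p always stops at the predecessor of k before dropping down, each level is entered at a position from which k, if present, is still ahead.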
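Analysis
1. An element $x_k$ is in $S_i$ with probability $1/2^i$ (true for all elements).
$E(|S_i|) = \sum_{k=1}^{n} E(X_{ki}) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$, where $X_{ki} = 1$ if $x_k$ is in $S_i$ and $0$ otherwise.
$E(\text{total size}) = E\left(\sum_{i} |S_i|\right) = \sum_{i=0}^{\infty} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: $h = \log n$
$n / 2^h = 1 \Rightarrow h \simeq \log_2 n$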
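As a worked check of the two bounds, assuming the levels are indexed from $i = 0$ so that $S_0$ holds all $n$ elements, the geometric series can be summed explicitly and evaluated for the illustrative value $n = 1024$:

```latex
\begin{align*}
\sum_{i=0}^{\infty} \frac{n}{2^i}
  &= n \cdot \frac{1}{1 - \tfrac{1}{2}} = 2n
  && \text{(for } n = 1024\text{: } 2048 \text{ nodes in expectation)} \\
\frac{n}{2^h} = 1
  &\iff h = \log_2 n
  && \text{(for } n = 1024\text{: } h = 10\text{)}
\end{align*}
```

So the extra levels cost, in expectation, no more space than the bottom list itself.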
Analysis (contd.)
3. Drop down: O(log n)
Pointer p can drop at most h times (h is the height of the skip list) before S0 is reached, and h = log n.
4. Scan forward: O(log n)
Elements scanned per level | Total no. of levels | Total cost
O(1)                       | O(log n)            | O(log n)
The expected number of elements scanned at the ith level is no more than 2, because:
The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level). In expectation, Si has only one element between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
Thus we scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
Hashing
• Motivation: symbol tables
– A compiler uses a symbol table to relate symbols to associated data
  • Symbols: variable names, procedure names, etc.
  • Associated data: memory location, call graph, etc.
– For a symbol table (also called a dictionary), we care about search, insertion, and deletion
– We typically don't care about sorted order
Hash Tables
• More formally:
– Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
  • Insert(T, x)
  • Delete(T, x)
  • Search(T, x)
– We want these to be fast, but don't care about sorting the records
• The structure we will use is a hash table
– Supports all the above in O(1) expected time
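Hash Functions
• Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, to slots 0 to m - 1 of table T. Slots h(k1), h(k4) and h(k3) each hold one key, while h(k2) = h(k5) is a collision.]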
Resolving Collisions
• How can we solve the problem of collisions?
• One solution is chaining
• Another solution: open addressing
Chaining
• Chaining puts elements that hash to the same slot in a linked list
[Figure: table T, where each slot is either empty or points to a linked list of the keys from K (k1 to k8) that hash to it; for example k1 and k4 share a list, as do k5, k2 and k7, and k8 and k6.]
Chaining
• How do we insert an element?
• How do we delete an element?
• How do we search for an element with a given key?
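One possible answer to the three questions above is sketched below. It is an illustration under stated assumptions, not the lecture's implementation: the class name `ChainedHashTable`, the use of Python lists as the chains, the built-in `hash` reduced modulo m as the hash function, and the choice to overwrite on duplicate keys are all assumptions.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a chain of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m                                  # number of slots
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        """Return the chain for the slot that key hashes to."""
        return self.slots[hash(key) % self.m]

    def insert(self, key, value):
        """Add key to its chain, replacing the value if key is already there."""
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        """Walk the chain; return the value for key, or None if absent."""
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        """Remove key from its chain; return True if it was present."""
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```

All three operations touch only one chain, so their cost is proportional to the length of that chain.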
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
– In other words, we can make the expected cost of searching constant if we make α constant
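A small numeric illustration of the load factor follows. It assumes the hash function h(k) = k mod m and the keys 0 to n - 1, an idealized case in which the keys spread perfectly evenly; the function name `chain_lengths` and the values of n and m are chosen only for the example.

```python
def chain_lengths(keys, m):
    """Count how many keys land in each of m slots under h(k) = k mod m."""
    counts = [0] * m
    for k in keys:
        counts[k % m] += 1
    return counts

n, m = 1000, 250
counts = chain_lengths(range(n), m)

alpha = n / m                   # load factor
average = sum(counts) / m       # average chain length over all slots
longest = max(counts)           # an unsuccessful search scans a whole chain
```

Here the average chain length equals the load factor, and doubling m to 500 would halve it, which is the sense in which keeping n proportional to m keeps the expected search cost constant.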
A Final Word About Randomized Algorithms
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of M.Sc. (CS), batch 2009
END
-infin
-infin
-infin
Searching for a key 28
S3
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
Key not found
p
p
p
p
-infin
-infin
-infin
Deletion of a key
S3 eg delete 30
S2
S1
S0
-infin 5 infin 4038 353025 9
infin
infin
infin
30
9 30 38
p
p
p
p
p
Analysis
1 An element xk is in Si with probability 12i true forall elements
E(Si ) = sum 12i Xki where Xki = 1 if xk is in Si
0 otherwise
= n2i
E(total size) = E(sum ISi I)
= sum n2i le 2n
2 Expected height of a skip listh = log n
n2h =1
h ≃ log n
n
k=1
k=1
infin
Analysis(contd)
3 Drop down O(log n)
Since pointer p can drop atmost h times
ieheight of the skip list until S0 is reached
and h = logn
4 Scan forward O(log n)
of elements Total no of levels Total Cost
to scan at each level
O(1) O(log n ) O(log n )
The number of elements scanned at ith level is no more than 2 because
The key lies between p and after(p) on the (i+1)th level (thatrsquos why we came down to ith level) And there is only one element between p and after(p) of
(i+1)th level in Si the element pointed to by after(p) in Si
Thus we scan at most two elements at Si the element pointed to by p (when we came down) and after(p) in Si
Hashing
bull Motivation symbol tablesndash A compiler uses a symbol table to relate
symbols to associated databull Symbols variable names procedure names etcbull Associated data memory location call graph etc
ndash For a symbol table (also called a dictionary) we care about search insertion and deletion
ndash We typically donrsquot care about sorted order
Hash Tables
bull More formallyndash Given a table T and a record x with key (= symbol)
and satellite data we need to supportbull Insert (T x)bull Delete (T x)bull Search(T x)
ndash We want these to be fast but donrsquot care about sorting the records
bull The structure we will use is a hash tablendash Supports all the above in O(1) expected time
Hash Functions
bull Next problem collision T
0
m - 1
h(k1)
h(k4)
h(k2) = h(k5)
h(k3)
k4
k2 k3
k1
k5
U(universe of keys)
K(actualkeys)
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
bull How do we insert an element
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
bull How do we delete an element
Chaining
• How do we search for an element with a given key?
[Figure: the chained table T, with keys k1 .. k8 stored in per-slot linked lists.]
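The three questions above (insert, delete, search) can be illustrated with a minimal sketch of a chained table. The class name `ChainedHashTable`, the method names, and the choice of Python's built-in `hash` reduced modulo m are assumptions made for illustration, not taken from the slides; each chain is a Python list standing in for the linked list.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one chain per slot)."""

    def __init__(self, m=8):
        self.m = m                             # number of slots
        self.slots = [[] for _ in range(m)]    # each chain holds (key, data) pairs

    def _h(self, key):
        # Assumed hash function: built-in hash reduced modulo m.
        return hash(key) % self.m

    def insert(self, key, data):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                       # key already present: overwrite its data
                chain[i] = (key, data)
                return
        chain.append((key, data))              # otherwise add it to the chain

    def search(self, key):
        # Only the chain of slot h(key) is scanned.
        for k, data in self.slots[self._h(key)]:
            if k == key:
                return data
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(m=4)
for k in range(10):
    table.insert(k, k * k)
print(table.search(3), table.search(42))
```

Every operation touches a single chain, so its cost is governed by the length of that chain.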
Analysis of Chaining
• Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
• Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  - In other words, we can make the expected cost of searching constant if we make α constant
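The search bounds follow from a short expectation argument under simple uniform hashing. The sketch below uses n_j for the length of the chain in slot j, a symbol introduced here for illustration. The exact successful-search expression is the standard textbook result, not something stated on the slides.

```latex
\begin{align*}
E[n_j] &= \sum_{k=1}^{n} \Pr[h(x_k) = j] = \frac{n}{m} = \alpha \\
\text{unsuccessful search} &: \; 1 + E[n_{h(k)}] = 1 + \alpha = O(1 + \alpha) \\
\text{successful search} &: \; 1 + \frac{\alpha}{2} - \frac{\alpha}{2n} = O(1 + \alpha)
\end{align*}
```

The leading 1 in each line accounts for computing h(k) and reaching the slot.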
A Final Word About Randomized Algorithms
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of the M.Sc. (CS) batch of 2009
END
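Deletion of a key
[Figure: a skip list with levels S0 to S3, each bounded by -∞ and +∞; S0 holds 5, 9, 25, 30, 35, 38, 40. Example: delete 30, which appears in S2 (30), S1 (9, 30, 38), and S0. The pointer p marks the search path from S3 down to S0.]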
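Analysis
1. An element x_k is in S_i with probability 1/2^i (true for all elements).
   Let X_{ki} = 1 if x_k is in S_i, and 0 otherwise. Then
   $E(|S_i|) = E\left(\sum_{k=1}^{n} X_{ki}\right) = \sum_{k=1}^{n} \frac{1}{2^i} = \frac{n}{2^i}$
   $E(\text{total size}) = E\left(\sum_{i=0}^{\infty} |S_i|\right) = \sum_{i=0}^{\infty} \frac{n}{2^i} \le 2n$
2. Expected height of a skip list: h = log n.
   The top level is expected to hold a single element, so $n / 2^h = 1$, which gives $h \simeq \log_2 n$.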
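The size argument can be checked empirically with a small simulation. This is a sketch under stated assumptions: `build_levels` is a hypothetical helper, and the fair coin is modelled with `random.Random`. It only generates the level sets and is not a full skip list.

```python
import random


def build_levels(keys, rng):
    """Return the level sets S0, S1, ...; each key of S_i enters S_{i+1} on a fair coin flip."""
    levels = [sorted(keys)]
    while levels[-1]:
        promoted = [x for x in levels[-1] if rng.random() < 0.5]
        levels.append(promoted)
    return levels[:-1]          # drop the final, empty level


rng = random.Random(0)          # fixed seed so runs are repeatable
n = 1024
levels = build_levels(range(n), rng)
sizes = [len(s) for s in levels]
print(sizes)                    # roughly n, n/2, n/4, ...
print(sum(sizes), 2 * n)        # total size next to the 2n bound on its expectation
print(len(levels))              # number of levels, to compare with log2(n) = 10
```

A single run can land above or below 2n, since the bound applies to the expectation and not to every outcome.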
Analysis (contd.)
3. Drop down: O(log n)
   The pointer p can drop at most h times (i.e. the height of the skip list) until S0 is reached, and h = log n.
4. Scan forward: O(log n)
   Number of elements to scan at each level | Total number of levels | Total cost
   O(1)                                     | O(log n)              | O(log n)
   The number of elements scanned at the ith level is, in expectation, no more than 2, because:
   The key lies between p and after(p) on the (i+1)th level (that's why we came down to the ith level), and, in expectation, only one element of Si lies between p and after(p) of the (i+1)th level: the element pointed to by after(p) in Si.
   Thus we scan at most two elements at Si: the element pointed to by p (when we came down) and after(p) in Si.
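The drop-down and scan-forward steps can be counted in a small sketch. It rests on several assumptions: levels are stored as sorted Python lists with ±∞ sentinels instead of linked nodes, `build_levels` and `search` are hypothetical names, and the drop from S_{i+1} to S_i is emulated with `bisect_left` where a real skip list would follow a down pointer in O(1).

```python
import random
from bisect import bisect_left

NEG_INF, POS_INF = float("-inf"), float("inf")


def build_levels(keys, rng):
    """Levels S0..Sh as sorted lists with sentinels; the top level holds only sentinels."""
    levels = [[NEG_INF] + sorted(keys) + [POS_INF]]
    while len(levels[-1]) > 2:
        inner = [x for x in levels[-1][1:-1] if rng.random() < 0.5]
        levels.append([NEG_INF] + inner + [POS_INF])
    return levels


def search(levels, key):
    """Return (found, drops, scans) for a finite key."""
    drops = scans = 0
    p = NEG_INF
    for i in range(len(levels) - 1, -1, -1):
        level = levels[i]
        pos = bisect_left(level, p)        # locate p in S_i (it is there, since S_{i+1} is a subset of S_i)
        while level[pos + 1] <= key:       # scan forward while after(p) <= key
            pos += 1
            scans += 1
        p = level[pos]
        if i > 0:
            drops += 1                     # drop down one level
    return p == key, drops, scans


levels = build_levels(range(0, 2000, 2), random.Random(1))
found, drops, scans = search(levels, 1234)
print(found, drops, scans, len(levels))
```

The number of drops always equals the number of levels minus one, while the number of scan-forward steps varies with the coin flips.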
Resolving Collisions
bull How can we solve the problem of collisions
bull One of the solution is chaining
bull Other solutions open addressing
Chaining
bull Chaining puts elements that hash to the same slot in a linked list
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
bull How do we insert an element
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
Chaining
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
mdashmdash
T
k4
k2k3
k1
k5
U(universe of keys)
K(actualkeys)
k6
k8
k7
k1 k4 mdashmdash
k5 k2
k3
k8 k6 mdashmdash
mdashmdash
k7 mdashmdash
bull How do we delete an element
• How do we search for an element with a given key?
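The three questions above (insert, delete, search) can be answered with a short sketch. This is only an illustration, not the lecture's own code: the class name `ChainedHashTable`, the parameter `num_slots`, and the use of Python's built-in `hash()` modulo the table size are all assumptions, and a Python list stands in for each linked list.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, num_slots=8):
        # T: one chain per slot. A Python list stands in for the linked list.
        self.slots = [[] for _ in range(num_slots)]

    def _slot(self, key):
        # Assumption: the hash function is Python's hash() modulo the table size.
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        chain = self.slots[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: overwrite it
                return
        # New key: place it at the head of the chain. With a real linked list
        # this step is O(1); list.insert(0, ...) is used here only for brevity.
        chain.insert(0, (key, value))

    def search(self, key):
        # Scan only the chain of the slot the key hashes to.
        for k, v in self.slots[self._slot(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(num_slots=4)
for key, value in [(1, "a"), (5, "b"), (9, "c")]:
    table.insert(key, value)  # 1, 5 and 9 all land in slot 1 and share a chain
```

Search and delete both walk a single chain, so their cost depends on the chain length rather than on the total number of keys.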
Analysis of Chaining
• Assume simple uniform hashing: each key is equally likely to be hashed to any slot of the table.
• Given n keys and m slots in the table, the load factor α = n/m is the average number of keys per slot.
• What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
• What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
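A small simulation can make these two costs concrete. It is a sketch under stated assumptions: simple uniform hashing is modelled by choosing a slot uniformly at random, cost is counted as one slot lookup plus chain elements examined, and the function name `simulate_chaining` and its parameters are hypothetical.

```python
import random


def simulate_chaining(n, m, seed=0):
    """Hash n distinct keys uniformly into m slots and measure search costs."""
    rng = random.Random(seed)
    chains = [[] for _ in range(m)]
    for key in range(n):
        # Simple uniform hashing: every slot is equally likely for every key.
        chains[rng.randrange(m)].append(key)

    # Unsuccessful search: one slot lookup plus a scan of the whole chain,
    # averaged over all m slots.
    unsuccessful = 1 + sum(len(chain) for chain in chains) / m

    # Successful search: elements examined up to and including the key,
    # averaged over all n stored keys.
    examined = sum(pos + 1 for chain in chains for pos in range(len(chain)))
    successful = examined / n
    return unsuccessful, successful


unsuccessful, successful = simulate_chaining(n=10_000, m=1_000)
```

The chain lengths always sum to n, so `unsuccessful` equals 1 + n/m exactly. `successful` varies with the random placement, but it examines on average about half a chain and stays within the O(1 + α) bound.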
Analysis of Chaining Continued
• So the cost of searching = O(1 + α)
• If the number of keys n is proportional to the number of slots in the table, what is α?
• A: α = O(1)
  – In other words, we can make the expected cost of searching constant if we make α constant.
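As a quick arithmetic check of this point, the sketch below compares a table whose size grows with the number of keys against a table of fixed size. The ratio n = 2m and the table sizes are arbitrary choices made for illustration, and `load_factor` is a hypothetical helper name.

```python
def load_factor(n, m):
    """Load factor alpha = n / m: the average number of keys per slot."""
    return n / m


# Assumption for illustration: the table is always sized so that n = 2 * m.
proportional = [load_factor(2 * m, m) for m in (10, 100, 1000, 10000)]

# Contrast: the table stays at m = 10 slots while the number of keys grows.
fixed_table = [load_factor(n, 10) for n in (20, 200, 2000, 20000)]
```

In the first list the load factor stays at 2.0 however large the table becomes, so the O(1 + α) search cost is constant. In the second list it grows in proportion to n.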
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
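To get a feel for how far apart these three targets are, the sketch below evaluates them for one sample input size. It assumes the bounds read as 1/k, 1/n^k and 1/2^n with k a constant; the values n = 100 and k = 3 are arbitrary, and `failure_bounds` is a hypothetical helper name.

```python
from fractions import Fraction


def failure_bounds(n, k):
    """Return the three failure-probability bounds as exact fractions."""
    return {
        "1/k": Fraction(1, k),          # constant bound
        "1/n^k": Fraction(1, n ** k),   # polynomially small in n
        "1/2^n": Fraction(1, 2 ** n),   # exponentially small in n
    }


bounds = failure_bounds(n=100, k=3)
```

For these values the constant bound is 1/3 and the polynomial bound is one in a million, while the exponential bound is smaller still. That ordering is why the last guarantee is the most desirable and the hardest to prove.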
A Final Word About Randomized Algorithms
Acknowledgements
• Kunal Verma
• Nidhi Aggarwal
• And other students of MSc (CS) batch 2009
END