Hash tables Definition: A data structure that uses a hash function to map keys into index of an...


Page 1: Hash tables Definition: A data structure that uses a hash function to map keys into index of an array element. k1 k2 k3 k4 k5.

Hash Tables

Page 2:

Hash tables

Definition: A data structure that uses a hash function to map each key to the index of an array element.

[Figure: keys k1, k2, k3, k4, k5 mapped to array elements]

Page 3:

Some properties of a hash table

Size of the hash table (an example will be shown.)

Hash function: maps keys to the index of an array element. (To be continued…)

Multiplication Hash

Division Hash

Input to build a hash table: an array of keys to store in the hash table

int[] input = {1,2,3,4,5,6,7,8}

[Figure: hash table built from the input, with the keys 1 to 8 stored in linked lists]

Page 4:

Example

Hash table size is 10

[Figure: hash table of size 10 holding the linked lists 20 -> 110 /, 103 -> 13 -> 53 /, and 69 /]

Page 5:

Division Hash

h1(k) = k mod m

Returns the index of the array

k is the key

m is the size of the hash table. Good values of m: prime numbers smaller than and closest to the size of the input. See Table 1.

Java syntax of mod is %.

input size    m value
500           499
1000          997
2000          1999
4000          3989

Table 1.
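The division hash can be sketched in Java as follows. This is a minimal sketch, not the lab's reference solution: the class name DivisionHash and the method signature are assumptions made for illustration. It uses Math.floorMod instead of the plain % operator so that a negative key still yields a valid index; for non-negative keys the two give the same result.

```java
// Minimal sketch of the division hash h1(k) = k mod m.
// The class and method names are illustrative, not taken from the slides.
class DivisionHash {
    // Returns an index in [0, m). Math.floorMod keeps the result
    // non-negative for negative keys, where Java's % would not.
    static int h1(int k, int m) {
        return Math.floorMod(k, m);
    }

    public static void main(String[] args) {
        int m = 997; // Table 1 value for an input size of 1000
        int[] keys = {1, 2, 3, 1000, 1994};
        for (int k : keys) {
            System.out.println(k + " -> " + h1(k, m));
        }
    }
}
```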

Page 6:

Multiplication hash

h2(k) = floor(m (kA mod 1))

m is the size of the hash table. Good values of m: prime numbers smaller than and closest to the size of the input. See Table 1.

k is the key

A = 0.61803 (from (sqrt(5) - 1)/2)

Hint: It is better to use the decimal value of A in your program; it may reduce bugs.
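The multiplication hash can be sketched the same way. The class name MultiplicationHash and the method signature are assumptions for illustration, and the sketch assumes non-negative keys. The fractional part "kA mod 1" is computed by subtracting the floor of the product from the product.

```java
// Minimal sketch of the multiplication hash h2(k) = floor(m (kA mod 1)).
// Names are illustrative; keys are assumed to be non-negative.
class MultiplicationHash {
    // Decimal value of A given on the slide, from (sqrt(5) - 1) / 2.
    static final double A = 0.61803;

    static int h2(int k, int m) {
        double product = k * A;
        double fraction = product - Math.floor(product); // kA mod 1, in [0, 1)
        return (int) Math.floor(m * fraction);           // index in [0, m)
    }

    public static void main(String[] args) {
        int m = 997; // Table 1 value for an input size of 1000
        for (int k = 1; k <= 5; k++) {
            System.out.println(k + " -> " + h2(k, m));
        }
    }
}
```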

Page 7:

Collisions

When hashing a key, if a collision happens, the new key is stored in the linked list at that location.

Number of collisions of a location = Number of elements in that location - 1

20 -> 110 /           # of collisions = 2 - 1 = 1

103 -> 13 -> 53 /     # of collisions = 3 - 1 = 2
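Chaining and the per-location collision count can be sketched as below. The class ChainedTable and its methods are hypothetical names, and the division hash is used only as an example hash function. One assumption goes beyond the slide: an empty location is counted as 0 collisions rather than -1.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Minimal sketch of a hash table with chaining (illustrative names).
class ChainedTable {
    final List<LinkedList<Integer>> slots = new ArrayList<>();

    ChainedTable(int m) {
        for (int i = 0; i < m; i++) {
            slots.add(new LinkedList<>());
        }
    }

    // Appends the key to the linked list at its hashed location.
    void insert(int key) {
        int index = Math.floorMod(key, slots.size()); // division hash
        slots.get(index).add(key);
    }

    // Number of elements in the location minus 1.
    // Assumption: an empty location counts as 0 collisions.
    int collisionsAt(int index) {
        return Math.max(slots.get(index).size() - 1, 0);
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(10);
        for (int key : new int[] {103, 69, 20, 13, 110, 53}) {
            table.insert(key);
        }
        for (int i = 0; i < 10; i++) {
            System.out.println(i + ": " + table.slots.get(i)
                    + " collisions = " + table.collisionsAt(i));
        }
    }
}
```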

Page 8:

"the 3 metrics"

maxCollisions: Maximum number of collisions of all locations in a hash table

minCollisions: Minimum number of collisions of all locations in a hash table

totalCollisions: Total collisions of all locations in a hash table

Examples on the next slide

Page 9:

maxCollisions = 2

minCollisions = 1

(** Note that minCollisions will be at least 1 if there are collisions in some locations, even if there are locations with 0 collisions. If there are no collisions at all, return 0.)

totalCollisions = 4

20 -> 110 /           # of collisions = 1

103 -> 13 -> 53 /     # of collisions = 2

105 -> 15 /           # of collisions = 1
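The three metrics can be computed from the chain lengths alone. The sketch below is one possible reading of the rules on these slides. The class CollisionMetrics and the factory method of are hypothetical names. Locations with 0 collisions are skipped when computing minCollisions, and all three metrics are 0 when no location has a collision.

```java
// Minimal sketch of the three collision metrics (illustrative names).
class CollisionMetrics {
    final int max;
    final int min;
    final int total;

    CollisionMetrics(int max, int min, int total) {
        this.max = max;
        this.min = min;
        this.total = total;
    }

    // chainLengths[i] is the number of elements stored at location i.
    static CollisionMetrics of(int[] chainLengths) {
        int max = 0, min = 0, total = 0;
        for (int length : chainLengths) {
            int collisions = Math.max(length - 1, 0);
            if (collisions == 0) {
                continue; // locations without collisions do not affect min
            }
            total += collisions;
            max = Math.max(max, collisions);
            min = (min == 0) ? collisions : Math.min(min, collisions);
        }
        return new CollisionMetrics(max, min, total);
    }

    public static void main(String[] args) {
        // Chain lengths 2, 3 and 2 as in the example; the other locations are empty.
        int[] lengths = {2, 0, 0, 3, 0, 2, 0, 0, 0, 0};
        CollisionMetrics m = CollisionMetrics.of(lengths);
        System.out.println("maxCollisions = " + m.max);
        System.out.println("minCollisions = " + m.min);
        System.out.println("totalCollisions = " + m.total);
    }
}
```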

Page 10:

Discussion

Why metrics?

The collision metrics tell us which hash is better.

Why 3 metrics, why not just measure totalCollisions?

Let’s see an example.

Page 11:

[Figure: two hash tables storing keys in linked lists; one of the lists shown is 20 -> 110 -> 103 -> 13 -> 103 /]

Hash table 1: totalCollisions = 4

Hash table 2: totalCollisions = 4

Which hash table is better?

Page 12:

We want not only fewer collisions, but also collisions that are distributed evenly across the hash table. That is why hash table 1 is better than hash table 2.
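The point about even distribution can be illustrated with a small comparison. The chain lengths below are hypothetical and are not read from the figure. The tie-breaking rule (prefer the lower maxCollisions when totalCollisions is equal) is one assumed way to express "evenly distributed", not a rule stated in the slides.

```java
// Minimal sketch comparing two tables by their collision metrics
// (illustrative names and hypothetical data).
class TableComparison {
    static int totalCollisions(int[] chainLengths) {
        int total = 0;
        for (int length : chainLengths) {
            total += Math.max(length - 1, 0);
        }
        return total;
    }

    static int maxCollisions(int[] chainLengths) {
        int max = 0;
        for (int length : chainLengths) {
            max = Math.max(max, Math.max(length - 1, 0));
        }
        return max;
    }

    // Assumed rule: fewer total collisions wins; on a tie, the table
    // whose worst location has fewer collisions wins.
    static boolean isBetter(int[] a, int[] b) {
        int totalA = totalCollisions(a);
        int totalB = totalCollisions(b);
        if (totalA != totalB) {
            return totalA < totalB;
        }
        return maxCollisions(a) < maxCollisions(b);
    }

    public static void main(String[] args) {
        int[] even = {2, 2, 2, 2};   // hypothetical: 4 collisions spread out
        int[] skewed = {5, 1, 1, 1}; // hypothetical: 4 collisions in one chain
        System.out.println("even total = " + totalCollisions(even)
                + ", max = " + maxCollisions(even));
        System.out.println("skewed total = " + totalCollisions(skewed)
                + ", max = " + maxCollisions(skewed));
        System.out.println("even is better: " + isBetter(even, skewed));
    }
}
```

Both tables hold 8 keys and have the same totalCollisions, so only maxCollisions separates them; a lookup in the skewed table may have to walk a chain of five elements.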

The goal of this lab is to implement two hash functions, division and multiplication, and to use the collision metrics to demonstrate which hash is better.