CS 261 – Data Structures Hash Tables Part 1. Open Address Hashing.

Date posted: 19-Dec-2015
CS 261 – Data Structures

Hash Tables

Part 1. Open Address Hashing

Can we do better than O(log n)?

•We have seen how skip lists and AVL trees can reduce the time to perform operations from O(n) to O(log n)

•Can we do better? Can we find a structure that will provide O(1) operations?

•Yes. No. Well, maybe…

Hash Tables

•Hash tables are similar to arrays except:

– Elements can be indexed by values other than integers

– A single position may hold more than one element

•Arbitrary values (hash keys) map to integers by means of a hash function

•Computing a hash function is usually a two-step process:

1. Transform the value (or key) to an integer

2. Map that integer to a valid hash table index

•Example: storing names

– Compute an integer from a name

– Map the integer to an index in a table (i.e., a vector, array, etc.)

Hash Tables

Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John

After applying the hash function:

Index   Names
0       Angie, Robert
1       Linda
2       Joe, Max, John
3
4       Abigail, Mark
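
The slide does not say which hash function produced this grouping. As a minimal sketch of the two-step process, the Python below uses a hypothetical function (sum of character codes, then modulus by the table size), so the indices it computes will generally differ from the ones shown above. The names name_to_int and to_index are illustrative only.

```python
def name_to_int(name: str) -> int:
    """Step 1: transform the key into an integer (here: sum of character codes)."""
    return sum(ord(ch) for ch in name)


def to_index(value: int, table_size: int) -> int:
    """Step 2: map that integer to a valid table index."""
    return value % table_size


table_size = 5
names = ["Angie", "Joe", "Abigail", "Linda", "Mark", "Max", "Robert", "John"]

# Every name lands on some index in 0..4; several names may share an index.
indices = {name: to_index(name_to_int(name), table_size) for name in names}
```

With eight names and only five positions, at least one position must hold more than one name, whatever function is used.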

Hash Function: Transforming to an Integer

•Mapping: map (a part of) the key into an integer

– Example: a letter to its position in the alphabet

•Folding: key partitioned into parts which are then combined using efficient operations (such as add, multiply, shift, XOR, etc.)

– Example: summing the values of each character in a string

•Shifting: get rid of high- or low-order bits that are not random

– Example: if keys are always even, shift off the low-order bit

•Casts: converting a numeric type into an integer

– Example: casting a character to an int to get its ASCII value
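
Each of the four techniques fits in a line or two. The sketch below is one possible rendering in Python; the function names are made up for illustration, and the alphabet position is assumed to start at a = 1.

```python
def letter_position(ch: str) -> int:
    """Mapping: a letter to its position in the alphabet (assumes a = 1)."""
    return ord(ch.lower()) - ord("a") + 1


def fold_sum(key: str) -> int:
    """Folding: split the key into characters and combine them with addition."""
    return sum(ord(ch) for ch in key)


def drop_low_bit(key: int) -> int:
    """Shifting: an even key always ends in a 0 bit, so shift that bit off."""
    return key >> 1


def char_code(ch: str) -> int:
    """Cast: a character to its integer code (the ASCII value for plain letters)."""
    return ord(ch)
```

Note that fold_sum gives the same result for any two strings with the same letters in a different order, because addition is commutative.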

Hash Function: Combinations

•Another use for shifting: in combination with folding when the fold operator is commutative:

Key   Mapped chars   Folded   Shifted and folded
eat   5 + 1 + 20     26       20 + 2 + 20 = 42
ate   1 + 20 + 5     26       4 + 40 + 5 = 49
tea   20 + 5 + 1     26       80 + 10 + 1 = 91
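
The numbers in the last column are consistent with shifting the running total left by one bit before adding each letter, so the first of three letters ends up shifted by two bits and the second by one. The Python sketch below assumes that reading (and a = 1); the function names are illustrative.

```python
def letter_value(ch: str) -> int:
    """Position of a lowercase letter in the alphabet, with a = 1."""
    return ord(ch) - ord("a") + 1


def fold(key: str) -> int:
    """Plain fold: the sum is the same for every ordering of the letters."""
    return sum(letter_value(ch) for ch in key)


def shift_fold(key: str) -> int:
    """Shift the running total left one bit before adding each letter."""
    total = 0
    for ch in key:
        total = (total << 1) + letter_value(ch)
    return total


plain = {word: fold(word) for word in ("eat", "ate", "tea")}
shifted = {word: shift_fold(word) for word in ("eat", "ate", "tea")}
# plain   -> {'eat': 26, 'ate': 26, 'tea': 26}
# shifted -> {'eat': 42, 'ate': 49, 'tea': 91}
```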

Hash Function: Mapping to a Valid Index

•Almost always use modulus operator (%) with table size:

– Example: idx = hash(val) % data.size()

•Must be sure that the final result is non-negative.

– Use only positive arithmetic or take absolute value

– Remember that the smallest negative number has no positive counterpart, so its absolute value overflows; possibly use longs

•To get a good distribution of indices, prime numbers make the best table sizes:

– Example: if you have 1000 elements, a table size of 997 or 1009 is preferable
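
One way to see why a prime table size helps is to hash keys that share a factor with a non-prime size. The Python sketch below uses a made-up key set (multiples of 10) and counts how many distinct indices are used. Python's % already returns a non-negative result for a positive table size, so the sign issue above, which arises in languages such as C and Java, does not show up here.

```python
def distinct_indices(keys, table_size: int) -> int:
    """Count how many different table positions the keys map to."""
    return len({key % table_size for key in keys})


# Hypothetical key set: 1000 keys that are all multiples of 10.
keys = [10 * i for i in range(1000)]

used_1000 = distinct_indices(keys, 1000)  # 100: only every tenth position is ever used
used_997 = distinct_indices(keys, 997)    # 997: every position is used
used_1009 = distinct_indices(keys, 1009)  # 1000: no two keys share a position
```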

Hash Functions: some ideas

•Here are some typical hash functions:

– Character: the char value cast to an int (its ASCII value)

– Date: a value associated with the current time

– Double: a value generated by its bitwise representation

– Integer: the int value itself

– String: a folded sum of the character values

– URL: the hash code of the host name
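
Two of these ideas are less obvious than the rest, so here is a hedged Python sketch of them. The function names and the choice to fold 64 bits down to 32 with XOR are assumptions for illustration, not something the slides specify; the host-name hash reuses a simple folded sum.

```python
import struct
from urllib.parse import urlparse


def double_hash(value: float) -> int:
    """Double: reinterpret the 8 bytes of the value as an unsigned integer,
    then fold the high and low 32-bit halves together with XOR."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (bits ^ (bits >> 32)) & 0xFFFFFFFF


def url_hash(url: str) -> int:
    """URL: hash only the host name (here with a folded sum of character codes)."""
    host = urlparse(url).hostname or ""
    return sum(ord(ch) for ch in host)
```

Hashing only the host name means that all pages on the same host receive the same hash value.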

Hash Tables: Collisions

•Ideally, we want a perfect hash function where each data element hashes to a unique hash index

•However, unless the data is known in advance, this is usually not possible

•A collision is when two or more different keys result in the same hash table index

Example, perfect hashing

•Alfred, Alessia, Amina, Amy, Andy and Anne have a club. Amy needs to store information in a six-element array. Amy discovers that she can convert the 3rd letter of each name to an index:

Name      3rd letter   Index
Alfred    F = 5        5 % 6 = 5
Alessia   E = 4        4 % 6 = 4
Amina     I = 8        8 % 6 = 2
Amy       Y = 24       24 % 6 = 0
Andy      D = 3        3 % 6 = 3
Anne      N = 13       13 % 6 = 1
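
Amy's function can be written in a few lines. The Python sketch below assumes letters are numbered from a = 0, as the values in the example imply; the name third_letter_index is illustrative.

```python
def third_letter_index(name: str, table_size: int) -> int:
    """Amy's hash: position of the third letter (a = 0) modulo the table size."""
    return (ord(name[2].lower()) - ord("a")) % table_size


club = ["Alfred", "Alessia", "Amina", "Amy", "Andy", "Anne"]
slots = {name: third_letter_index(name, 6) for name in club}

# The hash is perfect for this club: six names, six different indices.
is_perfect = len(set(slots.values())) == len(club)
```

The function works only for names with at least three letters; a shorter name raises an IndexError.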

Indexing is faster than searching

•Can convert a name (e.g. Alessia) into a number (e.g. 4) in constant time.

•Even faster than searching.

•Allows for O(1) time operations.

•Of course, things get more complicated when the input values change (Alan wants to join the club, but his third letter ‘a’ = 0 gives the same index as Amy; or worse yet Al, who doesn’t have a third letter!)

Hash Tables: Resolving Collisions

There are several general approaches to resolving collisions:

1. Open address hashing: if a spot is full, probe for next empty spot

2. Chaining (or buckets): keep a Collection at each table entry

3. Caching: save the most recently accessed value, slow search otherwise

Today we will examine Open Address Hashing

Open Address Hashing

•All values are stored in an array.

•Hash value is used to find initial index to try.

•If that position is filled, next position is examined, then next, and so on until an empty position is found

•The process of looking for an empty position is termed probing, specifically linear probing.

•There are other probing algorithms, but we won’t consider them.

Example

•Eight-element table using Amy’s hash function.

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz
2-cks
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow
7-hpx                 Aspen

Now Suppose Anne wants to Join

•The index position (5) is filled by Alfred. So we probe to find the next free location (6).

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz
2-cks
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow                 Anne
7-hpx                 Aspen

Next comes Agnes

•Her index position (5) is filled by Alfred, and the next position (6) is filled by Anne. So we once more probe. When we get to the end of the array, start again at the beginning. Eventually find position 1.

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz                Agnes
2-cks
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow                 Anne
7-hpx                 Aspen

Finally comes Alan

•Lastly Alan wants to join. His location, 0, is filled by Amina. Probe finds the last free location (2). Collection is now completely filled. (More on this later)

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz                Agnes
2-cks                 Alan
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow                 Anne
7-hpx                 Aspen
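
The whole insertion sequence can be replayed with a short linear-probing add. This is a minimal Python sketch, assuming an empty position is represented by None and letters are numbered from a = 0; the function names are illustrative.

```python
def third_letter_index(name: str, table_size: int) -> int:
    """Amy's hash: position of the third letter (a = 0) modulo the table size."""
    return (ord(name[2].lower()) - ord("a")) % table_size


def add(table: list, name: str) -> int:
    """Insert name using linear probing and return the index where it landed."""
    size = len(table)
    start = third_letter_index(name, size)
    for step in range(size):
        idx = (start + step) % size  # wrap around to the start of the array
        if table[idx] is None:
            table[idx] = name
            return idx
    raise RuntimeError("table is full")


table = [None] * 8
order = ["Amina", "Andy", "Alessia", "Alfred", "Aspen", "Anne", "Agnes", "Alan"]
landed = {name: add(table, name) for name in order}
# Anne lands at 6, Agnes wraps around to 1, and Alan takes the last free slot, 2.
```

Bounding the loop by the table size keeps a full table from causing an endless probe.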

Next operation, contains test

•Hash to find initial index, move forward examining each location until value is found, or empty location is found.

•Search for Amina, search for Anne, search for Albert

•Notice that search time is not uniform

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz
2-cks
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow
7-hpx                 Aspen
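
A contains test follows the same probe sequence as add. The Python sketch below runs the three searches against the five-name table on this slide and also reports how many positions were examined, which is an addition for illustration. It assumes None marks an empty position.

```python
def third_letter_index(name: str, table_size: int) -> int:
    """Amy's hash: position of the third letter (a = 0) modulo the table size."""
    return (ord(name[2].lower()) - ord("a")) % table_size


def contains(table: list, name: str):
    """Return (found, probes): whether name is present and how many slots were examined."""
    size = len(table)
    start = third_letter_index(name, size)
    for step in range(size):
        entry = table[(start + step) % size]
        if entry is None:       # an empty slot ends the search
            return False, step + 1
        if entry == name:
            return True, step + 1
    return False, size          # every slot examined, none matched


table = ["Amina", None, None, "Andy", "Alessia", "Alfred", None, "Aspen"]

amina = contains(table, "Amina")    # found at its home position
anne = contains(table, "Anne")      # passes Alfred at 5, stops at the empty slot 6
albert = contains(table, "Albert")  # home position 1 is empty
```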

Final Operation: Remove

•Remove is tricky. Can’t just replace entry with null. What happens if we delete Agnes, then search for Alan? The search starts at position 0, reaches the now-empty position 1, and stops before it gets to Alan at position 2.

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz
2-cks                 Alan
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow                 Anne
7-hpx                 Aspen

How to handle remove

•Simple solution: Just don’t do it. (we will do this one)

•Better: create a tombstone:

– A value that marks a deleted entry

– Can be replaced with new entry

– But doesn’t halt a search

Index (3rd letters)   Name
0-aiqy                Amina
1-bjrz                TOMBSTONE
2-cks                 Alan
3-dlt                 Andy
4-emu                 Alessia
5-fnv                 Alfred
6-gow                 Anne
7-hpx                 Aspen
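
The three tombstone properties can be sketched as follows. This Python version is an illustration only: it assumes None marks an empty position, uses a unique marker object as the tombstone, and its add does not check for duplicates further along the probe sequence.

```python
TOMBSTONE = object()  # unique marker for a deleted entry


def third_letter_index(name: str, table_size: int) -> int:
    """Amy's hash: position of the third letter (a = 0) modulo the table size."""
    return (ord(name[2].lower()) - ord("a")) % table_size


def contains(table: list, name: str) -> bool:
    size = len(table)
    start = third_letter_index(name, size)
    for step in range(size):
        entry = table[(start + step) % size]
        if entry is None:   # truly empty: the name cannot be further along
            return False
        if entry == name:   # a tombstone never equals a name, so the search moves on
            return True
    return False


def remove(table: list, name: str) -> bool:
    size = len(table)
    start = third_letter_index(name, size)
    for step in range(size):
        idx = (start + step) % size
        if table[idx] is None:
            return False
        if table[idx] == name:
            table[idx] = TOMBSTONE  # mark as deleted instead of emptying the slot
            return True
    return False


def add(table: list, name: str) -> int:
    size = len(table)
    start = third_letter_index(name, size)
    for step in range(size):
        idx = (start + step) % size
        if table[idx] is None or table[idx] is TOMBSTONE:  # a tombstone can be reused
            table[idx] = name
            return idx
    raise RuntimeError("table is full")


table = ["Amina", "Agnes", "Alan", "Andy", "Alessia", "Alfred", "Anne", "Aspen"]
remove(table, "Agnes")
alan_found = contains(table, "Alan")  # True: the search passes over the tombstone
```

Had position 1 been set to None instead, the same search for Alan would stop there and report that he is missing.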

Hash Table Size - Load Factor

•Load factor: λ = n / m, where λ is the load factor, n is the number of elements, and m is the size of the table

– So, load factor represents the average number of elements at each table entry

– For open address hashing, load factor is between 0 and 1 (often somewhere between 0.5 and 0.75)

– For chaining, load factor can be greater than 1

•Want the load factor to remain small

What to do with a large load factor

•Common solution: When the load factor becomes too large (say, bigger than 0.75) then reorganize.

•Create a new table with twice the number of positions

•Copy each element, rehashing using the new table size, placing elements in new table

•Then delete the old table

•Exactly like you did with the dynamic array, only this time using hashing.
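
The reorganize step might look like the Python sketch below. The class name OpenHashTable, the third-letter hash, and the choice to check the load factor before each insertion are assumptions for illustration; Python frees the old list automatically, so there is no explicit delete.

```python
def third_letter_index(name: str, table_size: int) -> int:
    """Example hash: position of the third letter (a = 0) modulo the table size."""
    return (ord(name[2].lower()) - ord("a")) % table_size


class OpenHashTable:
    MAX_LOAD = 0.75  # reorganize when an insertion would push the load factor above this

    def __init__(self, size: int = 8) -> None:
        self.slots = [None] * size
        self.count = 0

    def load_factor(self) -> float:
        return self.count / len(self.slots)

    @staticmethod
    def _place(slots: list, name: str) -> None:
        """Linear probing insert into the given array."""
        size = len(slots)
        start = third_letter_index(name, size)
        for step in range(size):
            idx = (start + step) % size
            if slots[idx] is None:
                slots[idx] = name
                return
        raise RuntimeError("table is full")

    def _resize(self) -> None:
        """Create a table twice as large and rehash every element into it."""
        old = self.slots
        self.slots = [None] * (2 * len(old))
        for entry in old:
            if entry is not None:
                self._place(self.slots, entry)  # index recomputed with the new size

    def add(self, name: str) -> None:
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._resize()
        self._place(self.slots, name)
        self.count += 1


club = OpenHashTable(8)
for member in ["Amina", "Andy", "Alessia", "Alfred", "Aspen", "Anne", "Agnes", "Alan"]:
    club.add(member)
```

Elements cannot simply be copied to the same positions, because each index depends on the table size: Amina sits at position 0 in the eight-element table but moves to position 8 once the table has sixteen positions.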

Hash Tables: Algorithmic Complexity

•Assumptions:

– Time to compute hash function is constant

– Worst case analysis: all values hash to same position

– Best case analysis: hash function uniformly distributes the values (all buckets have the same number of objects in them)

•Find element operation:

– Worst case for open addressing: O(n)

– Best case for open addressing: O(1)

Hash Tables: Average Case

•What about average case?

•Turns out, it is 1/(1-λ)

•So keeping load factor small is very important

λ      1/(1-λ)
0.25   1.3
0.5    2.0
0.6    2.5
0.75   4.0
0.85   6.7
0.95   20.0
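
The growth of this expression is easy to check numerically. The short Python sketch below evaluates 1/(1-λ) for the same load factors, rounded to one decimal place; the function name expected_probes is illustrative.

```python
def expected_probes(load_factor: float) -> float:
    """Average-case cost 1/(1 - load factor); requires load factor < 1."""
    return 1 / (1 - load_factor)


rows = [(lf, round(expected_probes(lf), 1)) for lf in (0.25, 0.5, 0.6, 0.75, 0.85, 0.95)]
```

Going from a load factor of 0.75 to 0.95 multiplies the cost by five, which is why the table is reorganized well before it fills up.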

Your turn

•Complete the implementation of the hash table

•Use hashfun(value) to get hash value

•Don’t do remove.

•Do add and contains test first, then do the internal reorganize method