Data Structures CSCI 132, Spring 2014 Lecture 34 Analyzing Hash Tables


Page 1

Data Structures

CSCI 132, Spring 2014
Lecture 34

Analyzing Hash Tables

Page 2

Recall Hash Tables

[Figure: a hash table]

• Hash tables use an index function that maps each key to a location in the table; many possible keys map to the same location.

• If the table is sparse, then most of the time only one key will go to each location.

• If two records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).
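
To make the idea of a collision concrete, here is a minimal sketch. The function toy_hash is a hypothetical index function chosen only for illustration (it is not the hash() used in this course); it adds up the character codes of a string key and reduces the sum modulo the table size.

```cpp
#include <string>

const int hash_size = 997;  // a prime table size, the value used in this lecture

// Hypothetical index function (an assumption for illustration only):
// add the character codes of the key, then reduce modulo the table size.
int toy_hash(const std::string &key) {
    unsigned long sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % hash_size);
}

// Two different keys collide when the index function sends both
// of them to the same table location.
bool collides(const std::string &a, const std::string &b) {
    return a != b && toy_hash(a) == toy_hash(b);
}
```

With this function, "ab" and "ba" collide because they contain the same characters, while "a" and "b" go to different locations. A better index function would make such collisions rarer, but no function can rule them out when there are more possible keys than table positions.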

Page 3

The C++ Hash Table Specification

const int hash_size = 997; // a prime number of appropriate size

class Hash_table {
public:
   Hash_table( );
   void clear( );
   Error_code insert(const Record &new_entry);
   Error_code retrieve(const Key &target, Record &found) const;
private:
   Record table[hash_size];
};

Page 4

Implementation of insert( )

Error_code Hash_table :: insert(const Record &new_entry)
{
   Error_code result = success;
   int probe_count,  // Counter to be sure that table is not full.
       increment,    // Increment used for quadratic probing.
       probe;        // Position currently probed in the hash table.
   Key null;         // Null key for comparison purposes.
   null.make_blank( );

   probe = hash(new_entry);  // Find location to insert new_entry.

   probe_count = 0;
   increment = 1;

Page 5

insert( ) continued

   while (table[probe] != null                   // Is the location empty?
          && table[probe] != new_entry           // Duplicate key?
          && probe_count < (hash_size + 1)/2) {  // Has overflow occurred?
      probe_count++;
      probe = (probe + increment) % hash_size;
      increment += 2;  // Prepare increment for next iteration.
   }
   if (table[probe] == null) table[probe] = new_entry;  // Insert new entry.
   else if (table[probe] == new_entry) result = duplicate_error;
   else result = overflow;  // The table is full.
   return result;
}
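
The specification also declares retrieve( ), which these slides do not show. The sketch below is one way it could follow the same quadratic probe sequence as insert( ). It is a self-contained illustration under several assumptions that are not part of the lecture: keys are non-negative ints, -1 marks an empty location, the struct Int_table and the helper probe_for are hypothetical names, and not_present is an assumed error code.

```cpp
#include <vector>

const int hash_size = 997;
const int blank = -1;  // assumed marker for an empty location

enum Error_code { success, duplicate_error, overflow, not_present };

struct Int_table {
    std::vector<int> table = std::vector<int>(hash_size, blank);

    static int hash(int key) { return key % hash_size; }  // assumes key >= 0

    // Follow the probe sequence h, h+1, h+4, h+9, ... and return the first
    // position that is empty or holds key. Returns -1 once (hash_size + 1)/2
    // positions have been examined without finding either.
    int probe_for(int key) const {
        int probe = hash(key), increment = 1;
        for (int count = 0; count < (hash_size + 1) / 2; count++) {
            if (table[probe] == blank || table[probe] == key) return probe;
            probe = (probe + increment) % hash_size;
            increment += 2;  // successive odd increments give the squares
        }
        return -1;
    }

    Error_code insert(int key) {
        int probe = probe_for(key);
        if (probe < 0) return overflow;
        if (table[probe] == key) return duplicate_error;
        table[probe] = key;
        return success;
    }

    // Retrieval stops at the key (found) or at an empty location (absent).
    Error_code retrieve(int key) const {
        int probe = probe_for(key);
        if (probe >= 0 && table[probe] == key) return success;
        return not_present;
    }
};
```

For example, after inserting 5 and 1002 (both hash to location 5), retrieving 1002 takes two probes: location 5 holds a different key, and location 6 holds the target. Retrieving 1999, which also hashes to 5, reaches an empty location on the third probe and reports not_present.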

Page 6

Likelihood of collisions

• How many people have to be in a room before the probability that two of them have the same birthday reaches 50%?

$P = 1 - \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{365-m+1}{365} > 0.5$ when $m \ge 23$

• The calculation for the probability of a collision in a table is similar.

• The table does not have to be very full for the probability of a collision to reach at least 50%.

• Therefore: Collisions happen! We must handle them efficiently.
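
The birthday calculation carries over directly to a table with t positions. The following sketch evaluates the same product for any t, assuming every key is equally likely to land in any position; the function names are chosen here for illustration.

```cpp
// Probability that at least two of m keys share a location when each key is
// equally likely to land in any of t locations. With t = 365 this is the
// birthday problem.
double collision_probability(int m, int t) {
    double all_distinct = 1.0;
    for (int i = 0; i < m; i++)
        all_distinct *= static_cast<double>(t - i) / t;
    return 1.0 - all_distinct;
}

// Smallest number of keys for which a collision is more likely than not.
int keys_for_even_odds(int t) {
    int m = 1;
    while (collision_probability(m, t) <= 0.5) m++;
    return m;
}
```

Calling keys_for_even_odds(365) reproduces the 23 people of the birthday problem. For a table of 997 positions the answer is fewer than 40 keys, that is, a table that is only about 4% full.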

Page 7

Counting Probes

• We can analyze the running time of hash tables by counting comparisons.

• Comparisons take place when "probing" an entry: looking at an entry and comparing its key to a target.

• The number of probes done depends on how full the table is.

$n$ = number of entries in the table
$t$ = number of total positions in the table (= hash_size)
$\lambda = n/t$ = load factor

$\lambda = 0$ means no entries in the table
$\lambda = 0.5$ means the table is 1/2 full
$\lambda \le 1$ for a contiguous table without chaining (open addressing)
$\lambda$ can be greater than 1 if using chaining

Page 8

Number of comparisons for chaining

Unsuccessful searches:

• If entries are distributed evenly over the table, then the expected number of entries in each chain is $n/t = \lambda$.

• For an unsuccessful search, we must do one probe for each entry in the list, so the average number of probes (or comparisons) is $\lambda$.

Successful searches:

• The average number of comparisons for sequential search of a list with $k$ items is $(k + 1)/2$.

• The node we are looking for is in our list, and the other $n-1$ nodes are distributed evenly over the table, so the average number of nodes in our list will be $k = (n-1)/t + 1 \approx n/t + 1 = \lambda + 1$.

• The average number of comparisons will be $(\lambda + 1 + 1)/2 = \lambda/2 + 1$.
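
The chaining estimates above can be written as small functions of the load factor, here called lambda (the ratio n/t of entries to table positions). This is only a sketch of the approximate formulas from this slide; the function names are chosen for illustration.

```cpp
// Load factor: n entries spread over t table positions.
double load_factor(int n, int t) {
    return static_cast<double>(n) / t;
}

// Unsuccessful search: every entry of the chain is compared,
// and a chain holds about lambda entries on average.
double chaining_unsuccessful(double lambda) {
    return lambda;
}

// Successful search: the chain holds about lambda + 1 entries, and a
// sequential search of k entries takes (k + 1)/2 comparisons on average.
double chaining_successful(double lambda) {
    double k = lambda + 1.0;
    return (k + 1.0) / 2.0;  // equals lambda/2 + 1
}
```

For example, 1994 entries in a table of 997 chains give a load factor of 2, so both an unsuccessful and a successful search cost about 2 comparisons. At a load factor of 0.5 a successful search costs about 1.25 comparisons.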

Page 9

Open addressing (without chaining)

Evenly distributed entries: number of comparisons (approx.)

Random probing:
Successful case: $(1/\lambda) \ln(1/(1-\lambda))$
Unsuccessful case: $1/(1-\lambda)$

Linear probing:
Successful case: $0.5\,(1 + 1/(1-\lambda))$
Unsuccessful case: $0.5\,(1 + 1/(1-\lambda)^2)$
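
To see how quickly these estimates grow as the table fills, the four open addressing formulas can be evaluated directly. The sketch below assumes a load factor lambda (entries divided by table positions) strictly between 0 and 1; the function names are chosen for illustration.

```cpp
#include <cmath>

// All four functions assume 0 < lambda < 1.

double random_probing_successful(double lambda) {
    return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

double random_probing_unsuccessful(double lambda) {
    return 1.0 / (1.0 - lambda);
}

double linear_probing_successful(double lambda) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

double linear_probing_unsuccessful(double lambda) {
    double free_fraction = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (free_fraction * free_fraction));
}
```

At a load factor of 0.5, linear probing needs about 1.5 comparisons for a successful search and 2.5 for an unsuccessful one. At 0.9 the estimates rise to 5.5 and 50.5, which is why open addressing tables are usually kept well below full.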

Page 10

Theoretical and empirical results

Page 11

Hash Tables vs. Other Methods

• Speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries to table size ($\lambda$).

• A table of size 40 with 20 entries has the same performance as a table of size 4000 with 2000 entries.

Sequential search: $\Theta(n)$
Binary search: $\Theta(\lg n)$
Hash table retrieval: $O(1)$ for small $\lambda$.

• Read section 9.8 on choosing a method for storage and retrieval of data.
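
The claim that retrieval cost depends on the load factor rather than on the table size can be checked with a small experiment. The sketch below is an illustration, not part of the lecture: it fills a linear probing table with n pseudo-random keys and counts the probes each insertion needs, which equals the number of probes a later successful search for that key would take. The function name, the use of std::mt19937, and the choice of linear probing are assumptions made for this example.

```cpp
#include <random>
#include <vector>

// Average number of probes for a successful search in a linear probing table
// with t positions holding n random keys, averaged over `trials` random fills.
// Requires n < t so that every insertion finds an empty position.
double average_successful_probes(int t, int n, int trials, unsigned seed) {
    std::mt19937 rng(seed);
    long long total_probes = 0;
    for (int trial = 0; trial < trials; trial++) {
        std::vector<bool> used(t, false);
        for (int i = 0; i < n; i++) {
            int probe = static_cast<int>(rng() % t);  // stand-in for hash(key)
            int probes = 1;
            while (used[probe]) {                     // linear probing
                probe = (probe + 1) % t;
                probes++;
            }
            used[probe] = true;
            total_probes += probes;
        }
    }
    return static_cast<double>(total_probes) /
           (static_cast<double>(n) * trials);
}
```

Comparing average_successful_probes(40, 20, 2000, 1) with average_successful_probes(4000, 2000, 20, 1) should give two values near 1.5, even though the second table holds 100 times as many entries. The formulas are approximations, so the two results will not be exactly equal.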

Page 12

Radix sort

Radix sort creates a table of queues. Each queue corresponds to a letter of the alphabet.

Sort from least significant letter to most significant letter.

Page 13

Implementation of Radix Sort

const int key_size = 5;
const int max_chars = 28;

template <class Record>
void Sortable_list<Record> :: radix_sort( )
{
   Record data;
   Queue queues[max_chars];
   for (int position = key_size - 1; position >= 0; position--) {
      // Loop from the least to the most significant position.
      while (remove(0, data) == success) {
         int queue_number = alphabetic_order(data.key_letter(position));
         queues[queue_number].append(data);  // Queue operation.
      }
      rethread(queues);  // Reassemble the list.
   }
}
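
The version above relies on Sortable_list, Queue, alphabetic_order and rethread, which are not shown on these slides. For experimenting, the same idea can be sketched with the standard library alone. The code below is such a sketch under stated assumptions: keys are std::string values of exactly key_size characters (padded with blanks), and alphabetic_order is an assumed mapping that sends a blank to queue 0, letters to queues 1 through 26, and anything else to queue 27.

```cpp
#include <queue>
#include <string>
#include <vector>

const int key_size = 5;    // every key is assumed to have exactly 5 characters
const int max_chars = 28;  // queue 0: blank, 1-26: letters, 27: anything else

// Assumed mapping from a character to its queue number.
int alphabetic_order(char c) {
    if (c == ' ') return 0;
    if ('a' <= c && c <= 'z') return c - 'a' + 1;
    if ('A' <= c && c <= 'Z') return c - 'A' + 1;
    return 27;
}

void radix_sort(std::vector<std::string> &keys) {
    // Loop from the least to the most significant position.
    for (int position = key_size - 1; position >= 0; position--) {
        std::queue<std::string> queues[max_chars];
        // Distribute the keys into queues by the character at this position.
        for (const std::string &key : keys)
            queues[alphabetic_order(key[position])].push(key);
        // Reassemble the list by emptying the queues in order.
        keys.clear();
        for (int q = 0; q < max_chars; q++) {
            while (!queues[q].empty()) {
                keys.push_back(queues[q].front());
                queues[q].pop();
            }
        }
    }
}
```

Because each queue preserves the order in which keys arrive, every pass is stable, and that stability is what makes sorting from the least significant letter to the most significant letter produce a fully sorted list. Sorting the keys "rat  ", "mop  ", "cat  ", "map  ", "car  " this way yields car, cat, map, mop, rat.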