Algorithms and Data Structures: Hash Tables and Associative Arrays
Introduction
• Hash tables are one implementation of associative arrays, or dictionaries.
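• An associative array is an array with a potentially infinite or very large index set, of which only a small number of indices are actually used.
• The main idea is to map the large index set onto the small set of positions that is actually needed.
• Notation: N is the number of feasible elements, n is the number of elements actually used.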
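As a rough illustration of mapping a huge index set onto a small array, here is a minimal sketch in Python. The function name `slot_for`, the multiplier 31, and the table size 8 are assumptions chosen for the example, not something the slides prescribe.

```python
def slot_for(key: str, m: int) -> int:
    """Map a string key (huge key universe) to one of m table slots."""
    h = 0
    for ch in key:
        # Simple polynomial hash; 31 is an arbitrary multiplier (assumption).
        h = (h * 31 + ord(ch)) % m
    return h

m = 8                      # table size, far smaller than the set of all strings
table = [None] * m
for word in ["apple", "pear", "fig"]:
    table[slot_for(word, m)] = word   # these three keys happen not to collide

print(table)  # [None, None, 'apple', None, 'fig', None, 'pear', None]
```

Only three of the infinitely many possible strings are stored, yet each is found again by recomputing its slot.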
Exercises
• Write a program that reads a text file and outputs the 100 most frequent words in the text.
• Assume you have a large file consisting of triples (transaction, price, customer ID). Explain how to compute the total payment due for each customer. Your program should run in linear time.
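One possible starting point for both exercises, sketched in Python with the built-in hash-based dictionary types. The function names, the word pattern, and the choice to take already-loaded text and triples rather than file paths are assumptions made to keep the sketch short.

```python
import re
from collections import Counter, defaultdict

def most_frequent_words(text, k=100):
    """Return the k most frequent words as (word, count) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(k)

def totals_per_customer(triples):
    """Sum the price per customer ID in one pass over the triples."""
    totals = defaultdict(float)
    for _transaction, price, customer_id in triples:
        totals[customer_id] += price   # expected O(1) per hash-table update
    return dict(totals)

print(most_frequent_words("a b a c a b", k=2))   # [('a', 3), ('b', 2)]
print(totals_per_customer([("t1", 10.0, "c1"), ("t2", 5.0, "c2"), ("t3", 2.5, "c1")]))
# {'c1': 12.5, 'c2': 5.0}
```

Each triple is touched once and each dictionary update takes expected constant time, which is where the linear running time comes from.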
Why hashing?
• Pros
  – Fast
    • Works on a plain array
    • Uses a calculation to find the array location for both search and insert
  – Easy to program
• Cons
  – Limited array size
  – Sensitive to data distribution
  – Elements cannot be visited in any kind of order
  – A good hash function is needed
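The last two drawbacks, data distribution and the need for a good hash function, can be seen in a small experiment. The sketch below assumes integer keys and the simple hash `key % m`; the key set and both table sizes are made up for illustration.

```python
def slot_counts(keys, m):
    """Count how many keys land in each of the m slots under h(k) = k % m."""
    counts = [0] * m
    for k in keys:
        counts[k % m] += 1
    return counts

keys = range(0, 1000, 10)      # 100 keys, all multiples of 10

bad = slot_counts(keys, 10)    # every key lands in slot 0
good = slot_counts(keys, 11)   # a prime table size spreads the same keys out

print(bad)                     # [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(max(good), min(good))    # 10 9
```

The same hash function is harmless or disastrous depending on how the keys are distributed relative to the table size.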
Collisions
• Two or more keys are hashed to the same array element
• Solutions
  – Open addressing: find another, empty array element
    • Linear probing: try the next element in turn; performs poorly once a big cluster has formed
    • Quadratic probing: avoid probing adjacent elements by using squared distances instead of linear distances
    • Double hashing: use a second hash function when a collision happens
  – Chaining: install a linked list at each index
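The three open-addressing strategies differ only in which slot they try on the i-th attempt. The sketch below shows one common formulation of each; the function names, the table size 11, the key 25, and the second hash `1 + key % (m - 1)` are assumptions for illustration.

```python
def linear_probe(h, i, m):
    return (h + i) % m            # next slot, then the one after, ...

def quadratic_probe(h, i, m):
    return (h + i * i) % m        # offsets 0, 1, 4, 9, 16, ...

def double_hash_probe(h, i, m, step):
    return (h + i * step) % m     # step size comes from a second hash

m = 11
key = 25
h = key % m                       # first hash: 3
step = 1 + key % (m - 1)          # second hash, never zero: 6

print([linear_probe(h, i, m) for i in range(5)])             # [3, 4, 5, 6, 7]
print([quadratic_probe(h, i, m) for i in range(5)])          # [3, 4, 7, 1, 8]
print([double_hash_probe(h, i, m, step) for i in range(5)])  # [3, 9, 4, 10, 5]
```

Linear probing walks through adjacent slots, which is what lets clusters grow; the other two schemes jump away from the collision site.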
Hashing with Chaining
• Maintain a linked list for each array element
• Find and remove take expected O(1 + n/m) time with a random hash function, where n is the number of stored elements and m is the table size
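A minimal sketch of hashing with chaining, assuming Python's built-in `hash` as the hash function and using Python lists in place of linked lists. The class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        # One (initially empty) chain per array element.
        self.buckets = [[] for _ in range(m)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None                   # not found

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]         # other entries in the chain are unaffected
                return True
        return False

t = ChainedHashTable(m=4)
for key, value in [(1, "a"), (5, "b"), (9, "c")]:   # all three collide in slot 1
    t.insert(key, value)
t.remove(5)
print(t.find(9), t.find(5))   # c None
```

Find and remove scan only one chain, whose expected length is n/m, which matches the O(1 + n/m) bound.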
Hashing with Linear probing
• Also known as open addressing or open hashing
• Find and insert are trivial
• How to remove?
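Removal is the hard part because simply emptying a slot can cut off elements stored further along the same probe sequence. One widely used workaround, not necessarily the one intended by the slides, is to mark removed slots with a "deleted" marker (a tombstone). The sketch below assumes that approach; all names are hypothetical.

```python
_EMPTY = object()     # slot has never been used
_DELETED = object()   # tombstone: slot was used, element removed

class LinearProbingTable:
    def __init__(self, m=8):
        self.slots = [_EMPTY] * m

    def _probe(self, key):
        m = len(self.slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m     # start, start+1, ... wrapping around

    def insert(self, key):
        first_free = None
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                if first_free is None:
                    first_free = j
                break                 # key cannot be stored beyond an empty slot
            if s is _DELETED:
                if first_free is None:
                    first_free = j    # remember tombstone for reuse
            elif s == key:
                return                # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key

    def find(self, key):
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                return False          # a truly empty slot ends the search
            if s is not _DELETED and s == key:
                return True
        return False

    def remove(self, key):
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                return False
            if s is not _DELETED and s == key:
                self.slots[j] = _DELETED   # keep the probe sequence intact
                return True
        return False

t = LinearProbingTable(m=8)
for key in (1, 9, 17):        # all hash to slot 1, so they occupy slots 1, 2, 3
    t.insert(key)
t.remove(9)
print(t.find(17))             # True: the tombstone in slot 2 does not stop the search
```

An alternative that avoids tombstones is to shift later elements back after a removal so that no gap is left in any probe sequence.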
Figure 4.2
Chaining versus Linear Probing
Chaining
• Referential integrity: an element stays where it is stored, so references to it remain valid
• A linked list does not guarantee contiguous physical memory allocation
• Search time stays small when the number of elements is close to the size of the table
• More overhead
• Harder implementation
Linear probing
• An element's location can change
• Visits contiguous physical memory, thus better performance
• Search time is high when the number of elements is close to the size of the table
• Less overhead
• Easier implementation
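The effect of a nearly full table on linear probing can be checked with a small simulation. This is only a sketch: the function name, the table size of 1000, the two fill levels, and the use of a seeded random number as a stand-in for a random hash function are all assumptions.

```python
import random

def average_probes_linear(n, m, seed=0):
    """Insert n keys with random hash values into a linear-probing table of
    size m (n < m) and return the average number of slots inspected per insert."""
    rng = random.Random(seed)
    slots = [False] * m
    total = 0
    for _ in range(n):
        j = rng.randrange(m)      # stand-in for a random hash value
        probes = 1
        while slots[j]:           # walk to the next free slot
            j = (j + 1) % m
            probes += 1
        slots[j] = True
        total += probes
    return total / n

m = 1000
half_full = average_probes_linear(500, m)
nearly_full = average_probes_linear(950, m)
print(half_full, nearly_full)
```

With chaining, the corresponding cost at the same fill levels stays near 1 + n/m, that is, about 1.5 and 1.95, whereas the linear-probing average grows sharply as the table approaches full.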
References
• Kurt Mehlhorn and Peter Sanders, Algorithms and Data Structures: The Basic Toolbox, Springer, 2008.
• Robert Lafore, Data Structures & Algorithms in Java, Sams, 2002.