Algorithms and Data Structures: Hash Tables and Associative Arrays

Introduction

• Hash tables are one implementation of associative arrays, or dictionaries.

• An associative array is an array with a potentially infinite or very large index set, of which only a small number of indices are actually used.

• The main idea of hashing is to map the large index set onto a small range of array positions.

• N feasible elements, n actually used elements (n ≪ N)
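A minimal sketch of that mapping follows. It assumes integer keys and the simple hash function h(k) = k mod m. Both are illustrative choices rather than anything the slides prescribe. The universe size N and the table size m are hypothetical values.

```python
# Map keys drawn from a huge universe onto a table of m slots.
# h(k) = k mod m is an illustrative choice, not one prescribed by the slides.

N = 10**12   # number of feasible keys (hypothetical universe size)
m = 8        # table size, comparable to the number n of keys actually used

def h(key: int) -> int:
    """Map an integer key in [0, N) to a slot index in [0, m)."""
    if not 0 <= key < N:
        raise ValueError("key outside the feasible universe")
    return key % m

used_keys = [42, 123_456_789, 999_999_999_999, 7]   # the n = 4 keys in use
slots = {k: h(k) for k in used_keys}
print(slots)   # {42: 2, 123456789: 5, 999999999999: 7, 7: 7}
```

The keys 999999999999 and 7 both land in slot 7. Even with far fewer keys than slots, two keys can share a slot, and the table has to handle these collisions.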

Exercises

• Write a program that reads a text file and outputs the 100 most frequent words in the text.

• Assume you have a large file consisting of triples (transaction, price, customer ID). Explain how to compute the total payment due for each customer. Your program should run in linear time.
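One possible approach to both exercises uses Python's built-in `dict` (itself a hash table) as the associative array. The word-splitting regular expression and the comma-separated line format `transaction,price,customer_id` are assumptions made for illustration. The function names are hypothetical.

```python
import re
from collections import Counter

def top_words(text: str, k: int = 100) -> list[tuple[str, int]]:
    """Return the k most frequent words (lower-cased) with their counts."""
    # Assumed word rule: letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(k)

def totals_per_customer(lines) -> dict[str, float]:
    """Sum the prices per customer ID in a single pass over the input.

    Each line is assumed to look like 'transaction,price,customer_id'.
    """
    totals: dict[str, float] = {}
    for line in lines:
        _transaction, price, customer = line.strip().split(",")
        # One expected O(1) dict lookup/update per line, so linear overall.
        totals[customer] = totals.get(customer, 0.0) + float(price)
    return totals

print(top_words("the cat and the hat and the bat", k=2))
# [('the', 3), ('and', 2)]
print(totals_per_customer(["t1,10.0,c1", "t2,5.5,c2", "t3,2.5,c1"]))
# {'c1': 12.5, 'c2': 5.5}
```

To process a real file, pass `open(path, encoding="utf-8").read()` to `top_words`, or the open file object itself to `totals_per_customer`. Here `path` is a placeholder. The linear-time bound for the second exercise holds in expectation, because it relies on hash-table operations taking expected constant time.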

Why hashing?

• Pros
  – Fast
    • Works with an array
    • Uses a calculation to find the array location for both search and insert
  – Easy to program

• Cons
  – Limited (fixed) array size
  – Performance depends on the data distribution
  – Cannot visit the elements in any kind of order (e.g., sorted order)
  – A good hash function is needed

Collisions

• A collision occurs when two or more keys are hashed into the same array element

• Solutions
  – Open addressing: find another empty array element
    • Linear probing: not good when big clusters form
    • Quadratic probing: avoids probing the adjacent element by using a squared distance instead of a linear distance
    • Double hashing: use a second hash function (to compute the probe step) when a collision happens
  – Chaining: install a linked list at each index
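The three open-addressing schemes differ only in which slots they examine after a collision. The sketch below prints the first few probes for one key under each scheme. The hash functions `h1` and `h2` and the prime table size m = 11 are illustrative assumptions.

```python
# Probe sequences of the three open-addressing schemes (illustrative sketch).
# h1, h2 and the prime table size m = 11 are assumptions, not from the slides.
# For key k, the i-th probe (i = 0, 1, 2, ...) examines slot:
#   linear:     (h1(k) + i)       mod m
#   quadratic:  (h1(k) + i*i)     mod m
#   double:     (h1(k) + i*h2(k)) mod m

m = 11  # prime, so every double-hashing step 1..m-1 is coprime to m

def h1(k: int) -> int:
    """Primary hash: the home slot of k."""
    return k % m

def h2(k: int) -> int:
    """Secondary hash for double hashing; always in 1..m-1, never 0."""
    return 1 + (k % (m - 1))

def probes(k: int, scheme: str, count: int = 5) -> list[int]:
    """Return the first `count` slots examined for key k under `scheme`."""
    result = []
    for i in range(count):
        if scheme == "linear":
            offset = i
        elif scheme == "quadratic":
            offset = i * i
        elif scheme == "double":
            offset = i * h2(k)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        result.append((h1(k) + offset) % m)
    return result

for scheme in ("linear", "quadratic", "double"):
    print(scheme, probes(25, scheme))
# linear [3, 4, 5, 6, 7]
# quadratic [3, 4, 7, 1, 8]
# double [3, 9, 4, 10, 5]
```

Linear probing steps through adjacent slots, which is why big clusters hurt it. Quadratic probing and double hashing jump away from the home slot. With double hashing, different keys that share a home slot usually follow different step sizes.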

Hashing with Chaining

• Maintain a list for each array element (slot)

• O(1 + n/m) expected time to find or remove with a random hash function, where m is the table size
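A minimal hashing-with-chaining sketch follows. Python lists stand in for the linked lists on the slide. Python's built-in `hash()` stands in for the random hash function that the O(1 + n/m) bound assumes. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hypothetical hash table that resolves collisions by chaining."""

    def __init__(self, m: int = 8):
        self.m = m                                       # number of slots
        self.slots: list[list[tuple]] = [[] for _ in range(m)]

    def _chain(self, key) -> list:
        """Return the chain (list of (key, value) pairs) for key's slot."""
        return self.slots[hash(key) % self.m]

    def insert(self, key, value) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                  # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def find(self, key):
        """Return the value stored for key, or None if key is absent."""
        for k, v in self._chain(key):     # scan only this slot's chain
            if k == key:
                return v
        return None

    def remove(self, key) -> bool:
        """Delete key; return True if it was present."""
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return True
        return False

t = ChainedHashTable(m=4)
for word in ["apple", "banana", "cherry", "date", "elder"]:
    t.insert(word, len(word))
print(t.find("cherry"))                        # 6
print(t.remove("banana"), t.find("banana"))    # True None
```

Each operation touches only one chain. With n elements spread evenly over m slots, the chain length averages n/m, which is where the O(1 + n/m) cost comes from.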

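Hashing with Linear Probing

• A form of open addressing (also called closed hashing)

• Find and insert are straightforward: probe consecutive slots starting at the hashed position until the key or an empty slot is found

• How to remove? Simply emptying a slot would cut the probe sequence of elements stored further along the same cluster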
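One common answer to the removal question is backward-shift deletion, sketched below. It is used here in place of "deleted" markers (tombstones). After emptying a slot, `remove()` moves later elements of the same cluster back into the hole whenever doing so keeps them reachable. Integer keys, h(k) = k mod m, and the class name are assumptions for illustration.

```python
from typing import Optional

class LinearProbingTable:
    """Hypothetical linear-probing table for integer keys, h(k) = k mod m."""

    def __init__(self, m: int = 8):
        self.m = m
        self.keys: list[Optional[int]] = [None] * m

    def _h(self, key: int) -> int:
        return key % self.m

    def find(self, key: int) -> Optional[int]:
        """Return the slot index holding key, or None if key is absent."""
        i = self._h(key)
        for _ in range(self.m):               # at most m probes
            if self.keys[i] is None:          # empty slot: key is not stored
                return None
            if self.keys[i] == key:
                return i
            i = (i + 1) % self.m
        return None

    def insert(self, key: int) -> None:
        i = self._h(key)
        for _ in range(self.m):
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i] = key
                return
            i = (i + 1) % self.m              # slot taken: try the next one
        raise RuntimeError("table is full")

    def remove(self, key: int) -> None:
        i = self.find(key)
        if i is None:
            return
        self.keys[i] = None                   # open a hole at slot i
        j = i
        while True:
            j = (j + 1) % self.m
            k = self.keys[j]
            if k is None:                     # end of the cluster: done
                return
            home = self._h(k)
            # k may move into the hole only if its home slot does not lie
            # cyclically in (i, j]; otherwise a later find(k) would start
            # after the hole and must not be blocked by it anyway.
            if (j > i and (home <= i or home > j)) or (j < i and i >= home > j):
                self.keys[i], self.keys[j] = k, None
                i = j                         # the hole moves to slot j

t = LinearProbingTable(m=8)
for k in (1, 9, 17, 2):          # 1, 9 and 17 all hash to slot 1
    t.insert(k)
print(t.keys)                    # [None, 1, 9, 17, 2, None, None, None]
t.remove(9)
print(t.keys)                    # [None, 1, 17, 2, None, None, None, None]
print(t.find(17), t.find(2))     # 2 3
```

Removing 9 shifts 17 and 2 back one slot each. Without the shift, a search for 17 would stop at the emptied slot 2 and wrongly report 17 as absent.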

Figure 4.2

Chaining versus Linear Probing

Chaining

• Referential integrity: stored elements never move, so references to them stay valid

• Linked lists do not guarantee contiguous physical memory allocation

• Search time stays small even when the number of elements is close to the size of the table

• More overhead

• Harder implementation

Linear probing

• Element locations can change (e.g., when elements are removed)

• Contiguous physical memory access, thus better cache performance

• Search time is high when the number of elements is close to the size of the table

• Less overhead

• Easier implementation
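The search-time contrast can be checked with a rough experiment. The code below counts the average number of entries examined by a successful search at load factor 0.95, once for chaining and once for linear probing. Random integer keys, h(k) = k mod m, the seed and the table size are all assumptions. The experiment measures only probe counts, not wall-clock time, so it says nothing about the cache advantage of linear probing.

```python
import random

def avg_probes_chaining(keys, m):
    """Average comparisons for a successful search with chaining."""
    chains = [[] for _ in range(m)]
    for k in keys:
        chains[k % m].append(k)
    # Finding the j-th element (0-based) of a chain costs j + 1 comparisons.
    total = sum(j + 1 for chain in chains for j in range(len(chain)))
    return total / len(keys)

def avg_probes_linear(keys, m):
    """Average slots examined for a successful search with linear probing."""
    table = [None] * m
    total = 0
    for k in keys:
        i, probes = k % m, 1
        while table[i] is not None:
            i, probes = (i + 1) % m, probes + 1
        table[i] = k
        # Without deletions, a later search for k retraces exactly these probes.
        total += probes
    return total / len(keys)

random.seed(1)
m = 1000
keys = random.sample(range(10**9), 950)   # n = 950 distinct keys, n/m = 0.95
print(round(avg_probes_chaining(keys, m), 2), round(avg_probes_linear(keys, m), 2))
```

Exact numbers depend on the seed. At this load factor, however, the linear-probing average comes out far above the chaining average, because long clusters form once the table is nearly full.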

References

• Kurt Mehlhorn and Peter Sanders, Algorithms and Data Structures: The Basic Toolbox, Springer, 2008.

• Robert Lafore, Data Structures & Algorithms in Java, Sams, 2002.