Algorithms and Data Structures: Hash Tables and Associative Arrays

Posted on 22-Dec-2015

Introduction

• Hash tables are one implementation of associative arrays, or dictionaries.

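• An associative array is an array whose index set is potentially infinite or very large, but only a small number of indices are actually used.

• The main idea of hashing is to map the large index set onto a small array whose size is close to the number of indices actually used.

• N feasible elements (possible keys), n actually used elements, with n much smaller than N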
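To make the N-versus-n idea concrete, here is a minimal sketch assuming integer keys drawn from all 10-digit numbers (so N = 10^10), a hypothetical table size m = 13, and the deliberately simple hash function h(k) = k mod m. A plain array indexed by the key would need N slots; the hash function folds every key into one of m slots.

```python
# Sketch: folding a huge key universe into a small table.
# Assumptions: keys are non-negative integers, and h(k) = k mod m is a
# deliberately simple illustrative hash function.

N = 10**10   # feasible keys: all 10-digit numbers
m = 13       # hypothetical table size, on the order of n rather than N


def h(key: int) -> int:
    """Map a key from 0..N-1 to a slot in 0..m-1."""
    return key % m


# The n = 4 keys actually used (hypothetical phone numbers).
used_keys = [5551234567, 5559876543, 2125550000, 4155550199]

slots = {k: h(k) for k in used_keys}
for k, s in slots.items():
    print(f"{k} -> slot {s}")
# 5551234567 and 4155550199 both land in slot 8: a collision.
```

A direct-address array would need 10^10 entries to hold these four keys; the hashed table needs only 13, at the price of having to handle collisions such as the one in slot 8.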

Exercises

• Write a program that reads a text file and outputs the 100 most frequent words in the text.

• Assume you have a large file consisting of triples (transaction, price, customer ID). Explain how to compute the total payment due for each customer. Your program should run in linear time.
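Both exercises reduce to a single pass over the input with an associative array. The sketch below is one possible approach, not a model solution from the course. It assumes a "word" is a run of letters compared case-insensitively and that each input line has the form "transaction,price,customer_id"; the function names top_words and totals_per_customer are hypothetical, and reading the file is left to the caller.

```python
# Sketch for both exercises, using Python's built-in dict (itself a hash
# table) as the associative array. Assumptions: a "word" is a run of
# letters compared case-insensitively, and each input line has the form
# "transaction,price,customer_id". Reading the file is left to the caller.
import heapq
import re
from collections import defaultdict
from decimal import Decimal


def top_words(text: str, k: int = 100) -> list[tuple[str, int]]:
    """Return the k most frequent words with their counts."""
    counts: dict[str, int] = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1      # expected O(1) per word
    # Selecting the top k from W distinct words costs O(W log k).
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


def totals_per_customer(lines) -> dict[str, Decimal]:
    """Sum the prices of all transactions for each customer ID."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for line in lines:
        _transaction, price, customer = line.strip().split(",")
        totals[customer] += Decimal(price)          # expected O(1) per triple
    return dict(totals)
```

Because each dictionary update takes expected constant time, both functions run in expected linear time in the input size; with k = 100 fixed, the final selection of frequent words is linear as well. Decimal is used for prices to avoid binary floating-point rounding when summing money.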

Why hashing?

• Pros
  – Fast
    • Works with an array
    • Uses a calculation (the hash function) to find the array location for both search and insert
  – Easy to program
• Cons
  – The array has a limited, fixed size
  – Performance depends on the distribution of the data
  – Elements cannot be visited in any kind of order
  – A good hash function is needed
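As an illustration of "using a calculation to figure out the array location", here is a sketch of a polynomial string hash evaluated with Horner's rule and reduced mod m at every step so intermediate values stay small. The multiplier 31 and the prime table size 101 are illustrative assumptions, and the function name string_hash is hypothetical.

```python
# Sketch: computing the array location of a string key.
# Polynomial hash via Horner's rule, reduced mod m at each step.
# The base 31 and table size 101 are illustrative assumptions.

def string_hash(key: str, m: int, base: int = 31) -> int:
    """Return an array index in 0..m-1 for the string key."""
    h = 0
    for ch in key:
        h = (h * base + ord(ch)) % m
    return h


M = 101                # hypothetical table size (a prime)
table = [None] * M     # each slot holds one word or None

for word in ["apple", "banana", "cherry"]:
    table[string_hash(word, M)] = word   # insert: one computation, one store

print(table[string_hash("banana", M)])  # search: same computation; prints banana
```

Search and insert perform the same calculation, which is why both are fast. This sketch ignores collisions; the three words happen to land in different slots (29, 40 and 32).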

Collisions

• A collision occurs when two or more keys are hashed to the same array element

• Solutions
  – Open addressing: find another empty array element
    • Linear probing: try the adjacent elements one after another; performs poorly once big clusters of occupied elements form
    • Quadratic probing: instead of probing the adjacent element, step by square distances (1, 4, 9, …) rather than linear distances
    • Double hashing: when a collision happens, use a second hash function to determine the step between probes
  – Chaining: install a linked list at each index
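The three open-addressing strategies differ only in how the i-th probe (i = 0, 1, 2, …) is computed from the home slot h1 = h(k). The sketch below assumes h(k) = k mod m, a prime table size m = 11, and, for double hashing, the hypothetical second hash h2(k) = 1 + (k mod (m − 1)), a common textbook choice that is never zero.

```python
# Sketch of the three open-addressing probe sequences for table size m.
# h1 is the key's home slot; h2 is a hypothetical second hash that must
# never be 0 (otherwise double hashing would probe the same slot forever).

def linear_probe(h1: int, i: int, m: int) -> int:
    return (h1 + i) % m            # h, h+1, h+2, ...: adjacent slots


def quadratic_probe(h1: int, i: int, m: int) -> int:
    return (h1 + i * i) % m        # h, h+1, h+4, h+9, ...: square distances


def double_hash_probe(h1: int, step: int, i: int, m: int) -> int:
    return (h1 + i * step) % m     # step size comes from a second hash


def h2(key: int, m: int) -> int:
    return 1 + key % (m - 1)       # always in 1..m-1, so never 0


m, key = 11, 25
home = key % m                     # 25 mod 11 = 3
print([linear_probe(home, i, m) for i in range(5)])                     # [3, 4, 5, 6, 7]
print([quadratic_probe(home, i, m) for i in range(5)])                  # [3, 4, 7, 1, 8]
print([double_hash_probe(home, h2(key, m), i, m) for i in range(5)])    # [3, 9, 4, 10, 5]
```

Linear probing walks into whatever cluster follows the home slot, while the other two jump away from it. With a prime m, double hashing eventually visits every slot; quadratic probing is only guaranteed to find an empty slot while the table is at most half full.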

Hashing with Chaining

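• Maintain a list for each array element (slot); every key hashed to that slot is stored in its list
• With a random hash function, find and remove take O(1 + n/m) expected time, where m is the size of the table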
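A minimal sketch of hashing with chaining follows. Python lists stand in for the linked lists, and Python's built-in hash() stands in for the random hash function that the O(1 + n/m) bound assumes; the class name ChainedHashTable is hypothetical.

```python
# Sketch of hashing with chaining. Python lists stand in for linked
# lists; the built-in hash() stands in for a random hash function.

class ChainedHashTable:
    def __init__(self, m: int = 8):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # one chain per slot

    def _chain(self, key):
        """Return the chain (list of (key, value) pairs) for key's slot."""
        return self.buckets[hash(key) % self.m]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                  # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def find(self, key):
        for k, v in self._chain(key):     # scans one chain only
            if k == key:
                return v
        return None                       # key not present

    def remove(self, key) -> bool:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)
                return True
        return False


t = ChainedHashTable(4)
t.insert("apple", 1)
t.insert("pear", 2)
t.insert("apple", 3)                      # overwrites the first value
print(t.find("apple"), t.find("pear"))    # prints: 3 2
```

Each operation touches only one chain, whose expected length is n/m under the random-hash assumption; this is where the O(1 + n/m) bound comes from.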

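Hashing with Linear Probing

• A form of open addressing (some texts call it open hashing)
• Find and insert are straightforward: start at the hashed slot and scan the following slots (wrapping around) until the key or an empty slot is found
• How to remove? Simply emptying the slot can cut the scan for keys stored further along, so find would then miss them.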
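One possible answer to "how to remove?" is backward-shift deletion: after emptying a slot, walk forward through the cluster and move back any entry whose scan would otherwise be cut by the hole. The sketch below assumes Python's built-in hash() as the hash function; the class name LinearProbingTable and the rule of always keeping one slot empty (instead of resizing) are assumptions for brevity. Another common answer is to leave a "deleted" marker (tombstone) in the slot.

```python
# Sketch of linear probing with backward-shift deletion.
# Assumption: the table never fills completely (one slot stays empty),
# so every scan is guaranteed to stop; a real table would resize.

class LinearProbingTable:
    def __init__(self, m: int = 8):
        self.m = m
        self.slots = [None] * m      # each slot: None or (key, value)
        self.n = 0

    def _home(self, key) -> int:
        return hash(key) % self.m

    def _locate(self, key) -> int:
        """Index holding key, or the empty slot where the scan stopped."""
        i = self._home(key)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.m     # try the next slot, wrapping around
        return i

    def find(self, key):
        entry = self.slots[self._locate(key)]
        return None if entry is None else entry[1]

    def insert(self, key, value):
        i = self._locate(key)
        if self.slots[i] is None:
            if self.n + 1 >= self.m:  # keep at least one empty slot
                raise OverflowError("table full; a real table would resize")
            self.n += 1
        self.slots[i] = (key, value)

    def remove(self, key) -> bool:
        i = self._locate(key)
        if self.slots[i] is None:
            return False
        self.slots[i] = None          # i is now a hole
        self.n -= 1
        j = i
        while True:
            j = (j + 1) % self.m
            if self.slots[j] is None:
                return True           # end of cluster: nothing left to fix
            k = self._home(self.slots[j][0])
            # The entry at j may stay if its home k lies cyclically in (i, j].
            if (i < j and i < k <= j) or (i > j and (k > i or k <= j)):
                continue
            self.slots[i], self.slots[j] = self.slots[j], None  # fill the hole
            i = j                     # the hole moves to j
```

For example, with m = 8 and integer keys 1, 9, 17 (all with home slot 1), removing 9 from slot 2 moves 17 back from slot 3 into slot 2, so a later find(17) still succeeds. This also illustrates the "location could be changed" point in the comparison with chaining.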

Figure 4.2

Chaining versus Linear Probing

Chaining
• Referential integrity: once inserted, an element stays where it is, so references to it remain valid
• Linked lists do not guarantee contiguous physical memory allocation
• Search time stays small even when the number of elements is close to the size of the table
• More overhead (a pointer per element)
• Harder implementation

Linear probing
• An element's location can change
• Memory is visited contiguously, which gives better performance
• Search time becomes high when the number of elements is close to the size of the table
• Less overhead
• Easier implementation
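The remarks about tables that are nearly full can be quantified with the classical average-case estimates (Knuth's analysis for linear probing), stated here as approximations for large tables under the idealized assumption of a uniformly random hash function, with load factor α = n/m (α < 1 for linear probing). The chaining figures are consistent with the O(1 + n/m) bound above.

```latex
\begin{aligned}
&\text{Chaining (keys compared):} && C_{\text{hit}} \approx 1 + \frac{\alpha}{2}, && C_{\text{miss}} \approx \alpha \\
&\text{Linear probing (slots probed):} && C_{\text{hit}} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right), && C_{\text{miss}} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)
\end{aligned}
```

For example, at α = 0.9 chaining compares about 1.45 keys on a successful search, while linear probing needs about 5.5 probes for a hit and about 50.5 for a miss.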

References

• Kurt Mehlhorn and Peter Sanders, Algorithms and Data Structures: The Basic Toolbox, Springer, 2008.

• Robert Lafore, Data Structures & Algorithms in Java, SAMS, 2002.