1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an...

58
1/51 Dictionaries, Tables Hashing TCSS 342

Transcript of 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an...

Page 1: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

1/51

Dictionaries, Tables Hashing

TCSS 342

Page 2: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

2/51

The Dictionary ADT

• a dictionary (table) is an abstract model of a database

• like a priority queue, a dictionary stores key-element pairs

• the main operation supported by a dictionary is searching by key

Page 3: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

3/51

Examples

• Telephone directory

• Library catalogue

• Books in print: key ISBN

• FAT (File Allocation Table)

Page 4: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

4/51

Main Issues

• Size

• Operations: search, insert, delete, ??? Create reports??? List?

• What will be stored in the dictionary?

• How will be items identified?

Page 5: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

5/51

The Dictionary ADT

• simple container methods:– size()

– isEmpty()

– elements()

• query methods:– findElement(k)

– findAllElements(k)

Page 6: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

6/51

The Dictionary ADT

• update methods:– insertItem(k, e)– removeElement(k)– removeAllElements(k)

• special element– NO_SUCH_KEY, returned by an unsuccessful

search

Page 7: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

7/51

Implementing a Dictionary with a Sequence

• unordered sequence– searching and removing takes O(n) time– inserting takes O(1) time– applications to log files (frequent insertions,

rare searches and removals) 34 14 12 22 18

34 14 12 22 18

Page 8: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

8/51

Implementing a Dictionary with a Sequence

• array-based ordered sequence (assumes keys can be ordered)- searching takes O(log n) time (binary search)- inserting and removing takes O(n) time- application to look-up tables (frequent searches, rare insertions and removals)

12 14 18 22 34

Page 9: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

9/51

Binary Search

• narrow down the search range in stages

• “high-low” game

• findElement(22)

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low highmid

14

Page 10: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

10/51

Binary Search

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low highmid

25

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low highmid

19

2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37

low = mid = high

22

Page 11: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

11/51

Pseudocode for Binary SearchAlgorithm

BinarySearch(S, k, low, high)if low > high then return NO_SUCH_KEYelse mid (low+high) / 2if k = key(mid) then return key(mid)else if k < key(mid) then return BinarySearch(S, k, low, mi-1)else return BinarySearch(S, k, mi+1, high)

Page 12: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

12/51

Running Time of Binary Search

• The range of candidate items to be searched is halved after each comparison

Comparison Search Range

0 n

1 n/2

… …

2i n/2i

log 2n 1

Page 13: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

13/51

Running Time of Binary Search

• In the array-based implementation, access by rank takes O(1) time, thus binary search runs in O(log n) time

• Binary Search is applicable only to Random Access structures (Arrays, Vectors…)

Page 14: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

14/51

Implementations

• Sorted? Non Sorted?

• Elementary: Arrays, vectors linked lists– Orgainization: None (log file), Sorted, Hashed

• Advanced: balanced trees

Page 15: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

15/51

Skip Lists

• Simulate Binary Search on a linked list.

• Linked list allows easy insertion and deletion.

• http://www.epaperpress.com/s_man.html

Page 16: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

16/51

A FAT Example

• Directory: Key: file name. Data (time, date, size …) location of first block in the FAT table.

• If first block is in physical location #23 (Disk block number) look up position #23 in the FAT. Either shows end of file or has the block number on disk.

• Example: Directory entry: block # 4• FAT: x x x F 5 6 10 x 23 25

3 The file occupies blocks 4,5,6,10, 3.

Page 17: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

17/51

Hashing

• Place item with key k in position h(k).

• Hope: h(k) is 1-1.

• Requires: unique key (unless multiple items allowed). Key must be protected from change (use abstract class that provides only a constructor).

• Keys must be “comparable”.

Page 18: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

18/51

Key class

• public abstract class KeyID {• Private Comparable searchKey;• Public KeyID(Comparable m) {• searchKey = m;• }//Only one constructor• public Comparable getSearchKey() {• return searchKey;

}

• }

Page 19: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

19/51

Hashing Problem

• RT&T is a large phone company, and they want to provide enhanced caller ID capability:– given a phone number, return the caller’s name– phone numbers are in the range 0 to R = 1010–1– n is the number of phone numbers used– want to do this as efficiently as possible

Page 20: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

20/51

Hashing Problem

• We know two ways to design this dictionary:• abalanced search tree (AVL, red-black) or a skip-

list with the phone number as the key has O(log n) query time and O(n) space --- good space usage and search time, but can we reduce the search time to constant?

• abucket array indexed by the phone number has optimal O(1) query time, but there is a huge amount of wasted space: O(n + R)

Page 21: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

21/51

Bucket Array

• Each cell is thought of as a bucket or a container– Holds key element pairs– In array A of size N, an element e with key k is inserted

in A[k].

(null) (null) Roberto (null)… …

000-000-0000 000-000-0001 401-863-7639 ... 999-999-9999

Page 22: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

22/51

Generalized indexing

• Hash table– Data storage associated with a key– The key need not be an integer

Page 23: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

23/51

Hash Tables

• A data structure• The location of an item is determined

– directly as a function of the item itself

– Not by a sequence of trial and error comparisons

• Commonly used to provide faster searching– O(n) for linear searches– O (logn) for binary search– O(1) for hash table

Page 24: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

24/51

Example:

• A symbol table constructed by a compiler– Stores identifiers and information about

them

Page 25: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

25/51

Another Solution

• A Hash Table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table

• Like an array, but with a function to map the large range of keys into a smaller one – e.g., take the original key, mod the size of the

table, and use that as an index

Page 26: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

26/51

Example

• Insert item (401-863-7639, Roberto) into a table ofsize 5

• 4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table

• A lookup uses the same process: map the key to an index, then check the array cell at that index

401-863-7639Roberto

0 1 2 3 4

Page 27: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

27/51

Collision

• Insert (401-863-9350, Andy)

• And insert (401-863-2234, Devin). We have a collision!

Page 28: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

28/51

Collision Resolution

• How to deal with two keys which map to the same cell of the array?

• Use chaining– Set up lists of items with the same index

Page 29: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

29/51

Chaining

0

1

2

3

4

Page 30: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

30/51

Chaining

• The expected, search/insertion/removal time is O(n/N), provided the indices are uniformly distributed

• The performance of the data structure can be fine-tuned by changing the table size N

Page 31: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

31/51

Hash Function

• Function h defined by h(i) = i– Determines the location of an item i in the

hash table

• Called a hash function.

• To reduce the large size of a hash table use

• h(i) = i mod 25;

Page 32: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

32/51

From Keys to Indices

• The mapping of keys to indices of a hash table is called a hash function

• A hash function is usually the composition of two maps:– hash code map: key integer– compression map: integer [0, N 1]

• An essential requirement of the hash function is tomap equal keys to equal indices

• A “good” hash function minimizes the probability of collisions

Page 33: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

33/51

Java Hash

• Java provides a hashCode() method for the Object class, which typically returns the 32-bit memory address of the object.

• This default hash code would work poorly for Integer and String objects

• The hashCode() method should be suitably redefined by classes.

Page 34: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

34/51

Popular Hash-Code Maps

• Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int

• Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.

Page 35: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

35/51

Popular Hash-Code Maps

• Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) a 0 a 1 ... a n-1 by viewing them as the coefficients of a polynomial: a 0 + a 1 x + ...+ x n-1 a n-1

Page 36: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

36/51

Popular Hash-Code Maps

– The polynomial is computed with Horner’s rule, ignoring overflows, at a fixed value x:a0 + x (a1 + x (a2 + ... x (an-2 + x an-1 ) ... ))

– The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words

• Why is the component-sum hash code bad for strings?

Page 37: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

37/51

Random Hashing

• Random hashing– Uses a simple random number generation

technique– Scatters the items “randomly” throughout

the hash table

Page 38: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

38/51

Popular Compression Maps

• Division: h(k) = |k| mod N– the choice N =2 k is bad because not all the bits are

taken into account– the table size N is usually chosen as a prime

number– certain patterns in the hash codes are propagated

• Multiply, Add, and Divide (MAD):– h(k) = |ak + b| mod N– eliminates patterns provided a mod N 0– same formula used in linear congruential (pseudo)

random number generators

Page 39: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

39/51

More on Collisions

• A key is mapped to an already occupied table location – what to do?!?

• Use a collision handling technique – We’ve seen Chaining– Can also use Open Addressing

• Double Hashing• Linear Probing

Page 40: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

40/51

Linear Probing

• If the current location is used, try the next table location

• linear_probing_insert(K)if (table is full) errorprobe = h(K)while (table[probe] occupied)probe = (probe + 1) mod Mtable[probe] = K

Page 41: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

41/51

Linear Probing

• Lookups walk along table until the key or an empty slot is found

• Uses less memory than chaining– don’t have to store all those links

• Slower than chaining– may have to walk along table for a long way

• Deletion is more complex– either mark the deleted slot– or fill in the slot by shifting some elements down

Page 42: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

42/51

Linear Probing Example

• h(k) = k mod 13

• Insert keys:

• 18 41 22 44 59 32 31 73

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 59 32 22 31 72

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 43: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

43/51

Double Hashing

• Use two hash functions• If M is prime, eventually will examine every

position in the table• double_hash_insert(K)

if(table is full) errorprobe = h1(K)offset = h2(K)while (table[probe] occupied) probe = (probe + offset) mod Mtable[probe] = K

Page 44: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

44/51

Double Hashing

• Many of same (dis)advantages as linear probing

• Distributes keys more uniformly than linear probing does

Page 45: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

45/51

Double Hashing Example

• h1(K) = K mod 13• h2(K) = 8 - K mod 8

– we want h2 to be an offset to add– 18 41 22 44 59 32 31 73

0 1 2 3 4 5 6 7 8 9 10 11 12

44 41 73 18 32 53 31 22

0 1 2 3 4 5 6 7 8 9 10 11 12

Page 46: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

46/51

Hash code

• static int hashCode(long i) {return (int)((i >> 32) + (int) i);}

Page 47: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

47/51

Hash code

• static int hashCode(String s) { int h=0; for (int i=0; i<s.length(); i++) {

h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum

h += (int) s.charAt(i); // add in next character } return h; }

Page 48: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

48/51

Linear Probing Hash Table

public class LinearProbingHashTable implements Dictionary {

/** Marker for deactivated buckets */ private static Item AVAILABLE = new Item(null, null); /** number of items in the dictionary */ private int n = 0; /** capacity of the bucket array */ private int N; /** bucket array */ private Item[] A;

Page 49: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

49/51

Linear Probing Hash Table

/** hash comparator */ private HashComparator h; /** constructor providing the hash comparator */ public LinearProbingHashTable(HashComparator

hc) { h = hc; N = 1023; // default capacity A = new Item[N]; }

Page 50: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

50/51

Linear Probing Hash Table

/** constructor providing the hash comparator and the capacity

* of the bucket array */ public LinearProbingHashTable(HashComparator

hc, int bN) { h = hc; N = bN; A = new Item[N]; }

Page 51: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

51/51

Linear Probing Hash Table

// auxiliary methods private boolean available(int i) { return (A[i] == AVAILABLE); } private boolean empty(int i) { return (A[i] == null); }

Page 52: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

52/51

Linear Probing Hash Table

private Object key(int i) { return A[i].key(); } private Object element(int i) { return A[i].element(); } private void check(Object k) { if (!h.isComparable(k)) throw new

InvalidKeyException("Invalid key."); }

Page 53: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

53/51

Helper search method

/** helper search method */ private int findItem(Object key) throws

InvalidKeyException { check(key); int i = h.hashValue(key) % N;

// division method compression map int j = i;

Page 54: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

54/51

Helper search methoddo {

if (empty(i)) return -1; // item is not found

if (available(i)) i = (i + 1) % N; // bucket is deactivated

else if (h.isEqualTo(key(i), key)) // we have found our item

return i; else // we must keep looking

i = (i + 1) % N; } while (i != j);

return -1; // item is not found }

Page 55: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

55/51

Dictionary

// methods of the dictionary ADT public Object findElement (Object key) throws

InvalidKeyException { int i = findItem(key); // helper method for

finding a key if (i < 0)

return Dictionary.NO_SUCH_KEY; return element(i); }

Page 56: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

56/51

Dictionary

public void insertItem (Object key, Object element) throws InvalidKeyException {

check(key); int i = h.hashValue(key) % N; // division method

compression map int j = i;

Page 57: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

57/51

Dictionary

// remember where we are starting do { if (empty(i) || available(i)) { // this slot is available

A[i] = new Item(key, element); n++;

return; } i = (i + 1) % N; // check next slot

} while (i != j); // repeat until we return to startthrow new HashTableFullException("Hash table is

full."); }

Page 58: 1/51 Dictionaries, Tables Hashing TCSS 342. 2/51 The Dictionary ADT a dictionary (table) is an abstract model of a database like a priority queue, a dictionary.

58/51

Dictionary

public Object removeElement (Object key) throws InvalidKeyException { int i = findItem(key); // find this key first if (i <0)

return Dictionary.NO_SUCH_KEY; // nothing to remove

Object toReturn = element(i); A[i] = AVAILABLE; // mark this slot as deactivated n--; return toReturn;

}