Lecture 11: Why I Like Hash

CSC 213 – Large Scale Programming

Today’s Goal

Consider what will be important when searching:
- Why search in the first place? What is its purpose?
- What should we expect & handle when searching?
- What factors matter to our users (and ourselves)?

(Besides being a source of bad jokes) What is hashing?
- Why is it important for searching? How can it help?
- What are the critical factors of a good hash function?
- A commonly-used hash function example, examined

Key Ideas Behind Map

1. Used to convert a key into its value
2. Values cannot share a key and be in the same Map
3. In searching, failure is normal, not exceptional
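Here is a minimal sketch of these three ideas. It uses java.util.HashMap as a stand-in for the course’s own Map ADT, which is an assumption. The names and phone numbers are made-up sample data.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates the three key ideas with java.util.HashMap (the standard
// library Map, used here in place of the course's own Map ADT).
// The names and phone numbers are hypothetical sample data.
class MapIdeasDemo {
    static Map<String, String> buildDirectory() {
        Map<String, String> directory = new HashMap<>();
        // Idea 1: the key (a name) is converted into its value (a number).
        directory.put("Jay Doe", "025-612-0001");
        // Idea 2: one key, one value; a second put with the same key
        // replaces the old value instead of adding a second Entry.
        directory.put("Jay Doe", "025-612-9999");
        return directory;
    }

    public static void main(String[] args) {
        Map<String, String> directory = buildDirectory();
        System.out.println(directory.get("Jay Doe")); // 025-612-9999
        System.out.println(directory.size());         // 1
        // Idea 3: a failed search is normal, so get returns null
        // rather than throwing an exception.
        System.out.println(directory.get("Nobody"));  // null
    }
}
```

java.util.Map.get signals a failed search by returning null. This matches idea 3: a missing key is an ordinary outcome, not an error.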

Entry ADT

Needs 2 pieces: what we have & what we want
First part is the key: the data used in the search
Item we want is the value: the second part of an Entry
Implementations must define 2 methods: key() & value() return the appropriate item
Usually includes setValue() but NOT setKey()
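One way the Entry ADT might look in Java, as a sketch. The interface name Entry and the class BasicEntry are hypothetical. Having setValue() return the old value is an assumption borrowed from java.util.Map.Entry.

```java
// Hypothetical Entry ADT following the slide: key() and value() accessors,
// setValue() to change the value, and deliberately no setKey().
class EntryDemo {
    public static void main(String[] args) {
        Entry<Long, String> e = new BasicEntry<>(4512290004L, "Jill Roe");
        String old = e.setValue("Jill Doe");
        System.out.println(e.key() + " " + e.value() + " (was " + old + ")");
    }
}

interface Entry<K, V> {
    K key();                 // what we have: the data used in the search
    V value();               // what we want
    V setValue(V newValue);  // returns the replaced value (assumed, as in java.util.Map.Entry)
}

// The key field is final, so once an Entry is built its key can never change.
class BasicEntry<K, V> implements Entry<K, V> {
    private final K key;
    private V value;

    BasicEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override public K key() { return key; }
    @Override public V value() { return value; }

    @Override public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Making the key field final is what enforces “no setKey()”. Changing a key in place could leave an Entry stored somewhere its key no longer belongs.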

SEQUENCE-Based Map

[Figure: three views of a MAP stored in a SEQUENCE]
SEQUENCE’s perspective of the MAP it holds: POSITIONs holding elements
Outside view of the MAP and how it is stored: POSITIONs holding ENTRYs
MAP implementation’s view of data and storage: POSITIONs whose elements are ENTRYs
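Here is a rough sketch of a SEQUENCE-based Map. It assumes java.util.ArrayList in place of the course’s SEQUENCE and POSITION ADTs, and java.util.AbstractMap.SimpleEntry as the Entry type. Under those assumptions, every operation must walk the sequence one element at a time. SequenceMap is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;

// Sketch of a Sequence-based Map. java.util.ArrayList stands in for the
// course's SEQUENCE/POSITION ADTs (an assumption); each element is an entry.
class SequenceMapDemo {
    static class SequenceMap<K, V> {
        private final List<SimpleEntry<K, V>> entries = new ArrayList<>();

        // Linear search over the sequence: O(n) in the worst case.
        // Keys are assumed to be non-null.
        private int indexOf(K key) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i).getKey().equals(key)) {
                    return i;
                }
            }
            return -1; // failure is normal, so just report "not found"
        }

        V get(K key) {
            int i = indexOf(key);
            return (i < 0) ? null : entries.get(i).getValue();
        }

        V put(K key, V value) {
            int i = indexOf(key);
            if (i >= 0) {
                return entries.get(i).setValue(value); // keys are unique: replace
            }
            entries.add(new SimpleEntry<>(key, value));
            return null;
        }

        V remove(K key) {
            int i = indexOf(key);
            return (i < 0) ? null : entries.remove(i).getValue();
        }

        int size() { return entries.size(); }
    }

    public static void main(String[] args) {
        SequenceMap<Long, String> map = new SequenceMap<>();
        map.put(9811010002L, "Bob Doe");
        map.put(2007519998L, "Rhi Smith");
        System.out.println(map.get(2007519998L)); // Rhi Smith
        System.out.println(map.get(1234567890L)); // null
    }
}
```

get, put, and remove all call the linear indexOf, so each takes O(n) time in the worst case.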

Emergency

Please hold while the machine searches 100,000 records for your location

Map Performance

In all seriousness, this can be a matter of life or death
911 operators need addresses immediately
Google’s search performance is in TB/s
O(log n) time is too slow for these uses

Would love to use arrays
Get O(1) time to add, remove, or look up data
But this HUGE array needs a massive RAM purchase
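The “HUGE array” wish can be sketched as a direct-address table, where the key itself is the array index. DirectAddressTable is a hypothetical name, and keys are assumed to be small non-negative ints.

```java
// Sketch of the "HUGE array" idea: the key itself is the array index, so
// add, remove, and lookup are each a single array access, O(1).
// The catch: the array needs one slot for every possible key.
class DirectAddressDemo {
    static class DirectAddressTable {
        private final String[] slots; // null marks "no value for this key"

        DirectAddressTable(int maxKey) {
            slots = new String[maxKey + 1]; // one slot per possible key
        }

        void add(int key, String value) { slots[key] = value; }
        String lookup(int key)          { return slots[key]; }
        void remove(int key)            { slots[key] = null; }
    }

    public static void main(String[] args) {
        // Fine for 4-digit keys: only 10,000 slots are needed.
        DirectAddressTable table = new DirectAddressTable(9999);
        table.add(4, "Jill Roe");
        System.out.println(table.lookup(4)); // Jill Roe
        table.remove(4);
        System.out.println(table.lookup(4)); // null
    }
}
```

The same approach applied to 10-digit phone numbers would need one slot per possible number.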

Monster Amounts of RAM

Java requires using an int as an array index
Limited by the range of int and by the RAM available in a machine
Integer.MAX_VALUE = 2,147,483,647
8,200,000,000 pages in Google’s index (2005)
In the US, possible phone numbers = 10,000,000,000
Must do more to get O(1) array usage time

As with all life’s problems, we turn to hash
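This small sketch shows why these counts defeat a plain array. Java array lengths and indices are ints, so a count above Integer.MAX_VALUE cannot be used, and casting a long down to int silently wraps around. IndexLimitDemo and fitsInArrayIndex are hypothetical names.

```java
// Why phone numbers cannot be used directly as Java array indices.
class IndexLimitDemo {
    static final long US_PHONE_NUMBERS = 10_000_000_000L;
    static final long GOOGLE_PAGES_2005 = 8_200_000_000L;

    // Array lengths are ints, so a table with one slot per possible key is
    // only possible (in principle) when the count fits in an int.
    static boolean fitsInArrayIndex(long count) {
        return count <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);                   // 2147483647
        System.out.println(fitsInArrayIndex(US_PHONE_NUMBERS));  // false
        System.out.println(fitsInArrayIndex(GOOGLE_PAGES_2005)); // false
        // Forcing the count into an int wraps around instead of failing.
        System.out.println((int) US_PHONE_NUMBERS);              // 1410065408
    }
}
```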

Hashing To The Rescue

Hash function turns a key into an int from 0 to N-1
Result is usable as an index for an array
Specific to the key’s type; cannot be reused for other types

Store the Entrys in an array (a “HASH TABLE”) (Great name for a shop in Amsterdam, too)
Begin by computing the key’s hash value
Result is the array index for that Entry

Now it is possible to use an array for O(1) time!
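Here is a minimal sketch of the idea, assuming long keys and a hash of key mod table length. TinyHashTable is a hypothetical name. The slide does not yet say what happens when two keys land on the same index. This sketch simply refuses such an insert, which is a simplifying assumption and not a real collision strategy.

```java
// Minimal hash table sketch: compute the key's hash, use it as the array
// index, and store the entry there. Collisions are NOT resolved: put just
// refuses to overwrite a slot held by a different key (an assumption).
class TinyHashTableDemo {
    static class TinyHashTable {
        private final long[] keys;
        private final String[] values; // null marks an empty slot

        TinyHashTable(int length) {
            keys = new long[length];
            values = new String[length];
        }

        // Hash specific to long keys; floorMod keeps it in 0 .. length-1.
        int hash(long key) {
            return (int) Math.floorMod(key, (long) keys.length);
        }

        boolean put(long key, String value) {
            int i = hash(key);
            if (values[i] != null && keys[i] != key) {
                return false; // collision: slot taken by a different key
            }
            keys[i] = key;
            values[i] = value;
            return true;
        }

        String get(long key) {
            int i = hash(key);
            return (values[i] != null && keys[i] == key) ? values[i] : null;
        }
    }

    public static void main(String[] args) {
        TinyHashTable table = new TinyHashTable(10_000);
        table.put(9811010002L, "Bob Doe");              // lands at index 2
        System.out.println(table.get(9811010002L));     // Bob Doe
        System.out.println(table.put(1230002L, "Ann")); // false: index 2 is taken
        System.out.println(table.get(5550001L));        // null: never stored
    }
}
```

Both get and put do one hash computation and one array access, which is where the O(1) hope comes from.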

Hash Table Example

Example shows a table of Entry<Long,String>

Simple hash function is h(x) = x mod 10,000
x is (or comes from) the Entry’s key
h(x) computes the index to use
Always mod the array length

Not all locations are used
Holes will appear in the array
Empties: set to null -or- use a sentinel value

Hash Table
Index   Entry (key, value)
0       • (empty)
1       0256120001, “Jay Doe”
2       9811010002, “Bob Doe”
3       • (empty)
4       4512290004, “Jill Roe”
⁞       ⁞
9997    • (empty)
9998    2007519998, “Rhi Smith”
9999    • (empty)
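The example table can be rebuilt as a sketch. java.util.AbstractMap.SimpleEntry stands in for the course’s Entry, and null marks empty slots; both are assumptions. Jay Doe’s key is written without its leading zero, because Java reads an integer literal that starts with 0 as octal.

```java
import java.util.AbstractMap.SimpleEntry;

// Rebuilds the slide's example: h(x) = x mod 10,000 on Long keys.
class PhoneTableDemo {
    static final int LENGTH = 10_000;

    static int h(long x) {
        return (int) (x % LENGTH); // keys here are non-negative
    }

    @SuppressWarnings("unchecked")
    static SimpleEntry<Long, String>[] build() {
        SimpleEntry<Long, String>[] table = new SimpleEntry[LENGTH]; // all null = empty
        // 256120001L is 025-612-0001; a literal with a leading 0 would be octal.
        long[] keys = {256120001L, 9811010002L, 4512290004L, 2007519998L};
        String[] names = {"Jay Doe", "Bob Doe", "Jill Roe", "Rhi Smith"};
        for (int i = 0; i < keys.length; i++) {
            table[h(keys[i])] = new SimpleEntry<>(keys[i], names[i]);
        }
        return table;
    }

    public static void main(String[] args) {
        SimpleEntry<Long, String>[] table = build();
        for (int i = 0; i < LENGTH; i++) {
            if (table[i] != null) {
                System.out.println(i + " -> " + table[i].getValue());
            }
        }
        // Prints 1 -> Jay Doe, 2 -> Bob Doe, 4 -> Jill Roe, 9998 -> Rhi Smith
    }
}
```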

Properties of Good Hash

To really be useful, a hash must have these properties:
Reliable
Fast
Use entire table
Make good brownies

Reliability of Hash Function

Implement Map with a hash table
To use an Entry, take its key and easily look up its index
Must always compute the same index for that key
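In Java terms, reliability means that equal keys must produce equal hash codes, so equals() and hashCode() are overridden together. Here is a sketch with a hypothetical PhoneKey class. The choice of Objects.hash is one convenient option, not the only one.

```java
import java.util.Objects;

// Reliability: keys that are equal must always produce the same index.
// PhoneKey is a hypothetical key class used only for illustration.
class ReliabilityDemo {
    static final class PhoneKey {
        private final int area;
        private final long number;

        PhoneKey(int area, long number) {
            this.area = area;
            this.number = number;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof PhoneKey)) return false;
            PhoneKey other = (PhoneKey) o;
            return area == other.area && number == other.number;
        }

        // Uses exactly the fields equals() compares, so equal keys hash alike.
        @Override
        public int hashCode() {
            return Objects.hash(area, number);
        }
    }

    static int index(Object key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    public static void main(String[] args) {
        PhoneKey a = new PhoneKey(716, 5551234L);
        PhoneKey b = new PhoneKey(716, 5551234L); // different object, same key
        System.out.println(index(a, 101) == index(b, 101)); // true
    }
}
```

If hashCode() were left as Object’s default, a and b would usually get different indices, and a lookup with b would miss the Entry stored under a.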

Speed of Hash Function

Hash must be computed on each access
Goal: O(1) efficiency by using an array
The array’s efficiency is wasted if the hash is slow

If the hash function performs an O(1) computation
It is possible to perform get in O(1) time
O(1) time for put & remove could also occur
None of this is guaranteed; many problems can occur

Using Entire Table Is Important

Hashing takes lots of space because an array is used
When creating it, make the array big enough to hold all the data
Can copy to a larger array, but this is not an O(1) operation
Use prime-number lengths, though these quickly get large
Prime lengths spread Entrys out equally across the entire table
The further apart they are spread, the easier it is to find an opening
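This sketch shows why prime lengths help. It counts how many distinct slots a set of keys lands in under key mod length. The keys (multiples of 10) are hypothetical, chosen because they share a factor with a length of 100 but not with the prime 101.

```java
import java.util.HashSet;
import java.util.Set;

// Keys sharing a factor with the table length crowd into a few slots;
// a prime length shares no factor with them and spreads them out.
class PrimeLengthDemo {
    static int slotsUsed(long[] keys, int length) {
        Set<Integer> used = new HashSet<>();
        for (long k : keys) {
            used.add((int) (k % length)); // keys are non-negative
        }
        return used.size();
    }

    static long[] multiplesOfTen(int count) {
        long[] keys = new long[count];
        for (int i = 0; i < count; i++) {
            keys[i] = 10L * i; // 0, 10, 20, ...
        }
        return keys;
    }

    public static void main(String[] args) {
        long[] keys = multiplesOfTen(100);
        System.out.println(slotsUsed(keys, 100)); // 10: only slots 0, 10, ..., 90
        System.out.println(slotsUsed(keys, 101)); // 100: every key gets its own slot
    }
}
```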

Hash Function Analogy

[Figure: analogy image with parts labeled “Hash function” and “Hash table”]

Examples of Bad Hash

h(x) = 0: reliable, fast, little use of table

h(x) = random.nextInt(): unreliable, fast, uses entire table

h(x) = current index -or- free index: reliable, slow, uses entire table

$h(x) = x^{34} + 2x^{33} + 24x^{32} + 10x^{31} + \cdots$: reliable, moderate, too large

Incredibly Bad Hash

Using only part of the key & not the whole thing
No matter what, inevitably, you will guess wrong

[Figure: a key with one region labeled “Part used for hash” and a different region labeled “Part that matters”]
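Here is a sketch of the failure, using ten hypothetical phone numbers that share an area code. Hashing only the area code sends every one of them to the same slot. Hashing the whole key, here via Long.hashCode plus Math.floorMod, spreads them out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongToIntFunction;

// Hashing only part of a key vs. hashing the whole key.
class PartialKeyDemo {
    static final int LENGTH = 10_000;

    // Uses only the area code (first 3 of 10 digits).
    static int areaCodeHash(long phone) {
        return (int) ((phone / 10_000_000L) % LENGTH);
    }

    // Uses the whole key: Long.hashCode folds all 64 bits into an int.
    static int wholeKeyHash(long phone) {
        return Math.floorMod(Long.hashCode(phone), LENGTH);
    }

    static int slotsUsed(long[] phones, LongToIntFunction hash) {
        Set<Integer> used = new HashSet<>();
        for (long p : phones) {
            used.add(hash.applyAsInt(p));
        }
        return used.size();
    }

    static long[] sameAreaCode() {
        long[] phones = new long[10];
        for (int i = 0; i < phones.length; i++) {
            phones[i] = 7165550001L + i; // hypothetical numbers, area code 716
        }
        return phones;
    }

    public static void main(String[] args) {
        long[] phones = sameAreaCode();
        System.out.println(slotsUsed(phones, PartialKeyDemo::areaCodeHash)); // 1
        System.out.println(slotsUsed(phones, PartialKeyDemo::wholeKeyHash)); // 10
    }
}
```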

Good Hash

Hash must first turn the key into an int
Easy for numbers, but rarely that simple in real life
For a String, could add the value of each character
But then “spot”, “pots”, and “stop” would all hash to the same index
Instead, we usually use a polynomial code:

Censored

$= (x_0 \cdot a^{k-1}) + (x_1 \cdot a^{k-2}) + \cdots + (x_{k-2} \cdot a^{1}) + x_{k-1}$

“spot” $= (\text{‘s’} \cdot a^{3}) + (\text{‘p’} \cdot a^{2}) + (\text{‘o’} \cdot a^{1}) + (\text{‘t’} \cdot a^{0})$

“stop” $= (\text{‘s’} \cdot a^{3}) + (\text{‘t’} \cdot a^{2}) + (\text{‘o’} \cdot a^{1}) + (\text{‘p’} \cdot a^{0})$
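This sketch compares the two String hashes from the slide. It assumes Java char values (the same as ASCII for these letters) and a = 33, and uses long arithmetic so these short words do not overflow. The polynomial is evaluated term by term, recomputing each power of a, exactly as the formula is written.

```java
// Sum of characters vs. polynomial code, with a = 33.
class PolynomialCodeDemo {
    // Adding character values: anagrams always collide.
    static long sumHash(String s) {
        long sum = 0;
        for (char c : s.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Direct polynomial: x0*a^(k-1) + x1*a^(k-2) + ... + x(k-1).
    static long polynomialHash(String s, long a) {
        int k = s.length();
        long total = 0;
        for (int i = 0; i < k; i++) {
            long power = 1;
            for (int j = 0; j < k - 1 - i; j++) {
                power *= a; // builds a^(k-1-i) from scratch for every term
            }
            total += s.charAt(i) * power;
        }
        return total;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "stop"}) {
            System.out.println(w + ": sum=" + sumHash(w) + ", poly=" + polynomialHash(w, 33));
        }
        // spot: sum=454, poly=4258502
        // pots: sum=454, poly=4149766
        // stop: sum=454, poly=4262854
    }
}
```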

Good, Fast Hash

Polynomial codes are good, but very slow
Major bummer, since we use hash for its speed
Cause of slowdown: computing $a^{n}$ takes $n$ operations
Horner’s method is better, by piggybacking work

Slow Approach:
“spot” $= (\text{‘s’} \cdot a^{3}) + (\text{‘p’} \cdot a^{2}) + (\text{‘o’} \cdot a^{1}) + (\text{‘t’} \cdot a^{0})$

Horner’s Method:
“stop” $= ((\text{‘s’} \cdot a + \text{‘t’}) \cdot a + \text{‘o’}) \cdot a + \text{‘p’}$
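Here is Horner’s method as a sketch: one multiply and one add per character, with no powers recomputed. The int version with a = 31 follows the recurrence that java.lang.String.hashCode() is documented to compute, s[0]·31^(n-1) + … + s[n-1] in int arithmetic. That documented formula gives a handy check.

```java
// Horner's method: ((x0*a + x1)*a + x2)*a + ... , one pass over the key.
class HornerDemo {
    // long version, no overflow for short words
    static long horner(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    // int version: overflow wraps around, just like String.hashCode()
    static int hornerInt(String s, int a) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(horner("stop", 33)); // 4262854
        String w = "triskaidekaphobia";
        System.out.println(hornerInt(w, 31) == w.hashCode()); // true
    }
}
```

For a word of length k this does k multiplications in total, instead of roughly k²/2 for the term-by-term version.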

Compression

Hash’s only use is computing array indices
Useless if larger than the table’s length: no such index exists!
When a = 33, “spot” hashes to 4,258,502
Some hashes are too large to calculate (like “triskaidekaphobia”)
To compress the result, work like an array-based queue:
hash = (result + length) % length

% returns the modulus (the remainder from division)

Serves the exact same purpose: keeps the index within limits
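This sketch compresses a hash into 0 .. length-1. In Java, % is a remainder that takes the sign of its left operand. That makes (result + length) % length a valid index only when result ≥ -length, and result + length can itself overflow for huge results. The sketch therefore swaps in Math.floorMod (Java 8+), which always lands in 0 .. length-1 for a positive length. The value 4,258,502 is what the polynomial with a = 33 and Java char values gives for “spot”.

```java
// Compressing a hash value into a valid array index.
class CompressionDemo {
    // The slide's queue-style formula; fine for non-negative results.
    static int queueStyle(int result, int length) {
        return (result + length) % length;
    }

    // Always in 0 .. length-1 for length > 0, even for negative results.
    static int compress(int result, int length) {
        return Math.floorMod(result, length);
    }

    public static void main(String[] args) {
        int length = 10_000;
        System.out.println(compress(4_258_502, length)); // 8502
        // An overflowed hash may be negative, but compress still gives a valid index.
        int h = "triskaidekaphobia".hashCode();
        System.out.println(compress(h, length));
        System.out.println(queueStyle(-12_345, 10)); // -5: not a valid index
        System.out.println(compress(-12_345, 10));   // 5
    }
}
```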

Before Next Lecture…

Continue working on the week #4 assignment
Due at the usual time Tuesday, so you may want to get cracking
Start thinking of designs & CRC cards for the project
Due next Friday, as projects are completed in stages
Read sections 9.2.1 & 9.2.5 – 9.2.8 of the book
Consider better ways of handling this situation: