Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College...

25
Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    214
  • download

    0

Transcript of Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College...

Page 1: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hashing

Nelson Padua-Perez

Bill Pugh

Department of Computer Science

University of Maryland, College Park

Page 2: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hashing

ApproachTransform key into number (hash value)

Use hash value to index object in hash table

Use hash function to convert key to number

Page 3: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hashing

Hash TableArray indexed using hash values

Hash Table A with size N

Indices of A range from 0 to N-1

Store in A[ hashValue % N]

Page 4: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Beware of %

The % operator is integer remainderx % y == x - y * (x/y)

It doesn’t work the way mathematicians would think

3/2 == 1 3%2 == 12/2 == 1 2%2 == 01/2 == 0 1%2 == 10/2 == 0 0% 2 == 0

(-1)/2 == 0 (-1)%2 == -1(-2)/2 == -1 (-2)%2 == 0

(-3)/2 == -1 (-3)%2 == -1

Page 5: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Scattering hash values

hashCode is a 32-bit signed intHave to reduce it to 0..N-1

Could use Math.abs(key.hashCode() % N)might not distribute values well, particularly if N is a power of 2

Multiplicative congruency methodProduces good hash values

Hash value = Math.abs((a * key.hashCode()) % N)

Where

N is table size

a, N are large primes

Page 6: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Be careful with Math.abs

We have to useMath.abs( x % N )

Rather than Math.abs(x) % N

why?

Page 7: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

A scary fact about ints

Integer.MIN_VALUE = - 231

Integer.MIN_VALUE == - Integer.MIN_VALUE

Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE

An int value can represent any integer from (- 231) ... (231-1)

An int cannot represent 231

(231-1)+1 == (- 231)

Page 8: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Art and magic of hashCodes

There is no “right” hashCode function

some art and magic to finding a good hashCode function, and to finding a hashCode to hashBucket function

From java.util.HashMap:

static int hashBucket(Object x, int N) {

int h = x.hashCode();

h += ~(h << 9);

h ^= (h >>> 14);

h += (h << 4);

h ^= (h >>> 10);

return Math.abs(h % N);

Page 9: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hash Function

Example

hashCode("apple") = 5hashCode("watermelon") = 3hashCode("grapes") = 8hashCode("kiwi") = 0hashCode("strawberry") = 9hashCode("mango") = 6hashCode("banana") = 2

Perfect hash functionUnique values for each key

kiwi

bananawatermelon

applemango

grapesstrawberry

0

1

2

3

4

5

6

7

8

9

Page 10: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hash Function

Suppose now

hashCode("apple") = 5hashCode("watermelon") = 3hashCode("grapes") = 8hashCode("kiwi") = 0hashCode("strawberry") = 9hashCode("mango") = 6hashCode("banana") = 2

hashCode(“orange") = 3

CollisionSame hash value for multiple keys

kiwi

bananawatermelon

applemango

grapesstrawberry

0

1

2

3

4

5

6

7

8

9

Page 11: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Types of Hash Tables

Open addressingStore objects in each table entry

Chaining (bucket hashing)

Store lists of objects in each table entry

Page 12: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Hashing

ApproachHash table contains objects

Probe examine table entry

Collision

Move K entries past current location

Wrap around table if necessary

Find location for X

Examine entry at A[ bucket(X) ]

If entry = X, found

If entry = empty, X not in hash table

Else increment location by K, repeat

Page 13: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Hashing

ApproachLinear probing

K = 1

May form clusters of contiguous entries

Deletions

Find location for X

If X inside cluster, leave non-empty marker

Insertion

Find location for X

Insert if X not in hash table

Can insert X at first unoccupied location

Page 14: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Example

Hash codesH(A) = 6 H(C) = 6

H(B) = 7 H(D) = 7

Hash tableSize = 8 elements

= empty entry

* = non-empty marker

Linear probingCollision move 1 entry past current location

12345678

Page 15: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Example

OperationsInsert A, Insert B, Insert C, Insert D

12345678

A

12345678

AB

12345678

ABC

12345678

DABC

Page 16: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Example

OperationsFind A, Find B, Find C, Find D

12345678

12345678

12345678

12345678

DABC

DABC

DABC

DABC

Page 17: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Open Addressing Example

OperationsDelete A, Delete C, Find D, Insert C

12345678

12345678

12345678

12345678

DCB*

D*BC

D*B*

D*B*

Page 18: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Efficiency of Open Hashing

Load factor = entries / table size

Hashing is efficient for load factor < 90%

Page 19: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Chaining (Bucket Hashing)

ApproachHash table contains lists of objects

Find location for X

Find hash code key for X

Examine list at table entry A[ key ]

Collision

Multiple entries in list for entry

Page 20: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Chaining Example

Hash codesH(A) = 6 H(C) = 6

H(B) = 7 H(D) = 7

Hash tableSize = 8 elements

= empty entry

12345678

Page 21: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Chaining Example

OperationsInsert A, Insert B, Insert C

12345678

A

12345678

A

B

12345678

C

B

A

Page 22: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Chaining Example

OperationsFind B, Find A

12345678

C

B

A

12345678

C

B

A

Page 23: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Efficiency of Chaining

Load factor = entries / table size

Average caseEvenly scattered entries

Operations = O( load factor )

Worse case Entries mostly have same hash value

Operations = O( entries )

Page 24: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Hashing in Java

CollectionsHashMap & HashSet implement hashing

ObjectsBuilt-in support for hashing

boolean equals(object o)

int hashCode()

Can override with own definitions

Must be careful to support Java contract

Page 25: Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.

Java Contract

hashCode() Must return same value for object in each execution, provided no information used in equals comparisons on the object is modified

equals()if a.equals(b), then a.hashCode() must be the same as b.hashCode()

if a.hashCode() != b.hashCode(), then !a.equals(b)

a.hashCode() == b.hashCode()Does not imply a.equals(b)

Though Java libraries will be more efficient if it is true