hashing - cs.unc.edu

Page 1: hashing - cs.unc.edu

Hashing

Page 2: hashing - cs.unc.edu

Dynamic Dictionaries

Operations:
• create
• insert
• find
• remove
• max/min
• write out in sorted order

Only defined for object classes that are Comparable

Page 3: hashing - cs.unc.edu

Hash tables

Operations:
• create
• insert
• find
• remove

Not supported: max/min, write out in sorted order

Only defined for object classes that have equals defined (they need not be Comparable)

Java-specific: from the Java documentation

Page 5: hashing - cs.unc.edu

Hash tables – implementation

• Have a table (an array) of a fixed tableSize
• A hash function determines where in this table each item should be stored

item → hash(item) [a positive integer] → % tableSize

Example: 2174 % 10 = 4

THE DESIGN QUESTIONS

1. Choosing tableSize
2. Choosing a hash function
3. What to do when a collision occurs
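The mapping from an item to a table position can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name `IndexDemo` and the method `indexFor` are hypothetical, and `Math.floorMod` is used (an assumption beyond the slide) so that a negative `hashCode()` still yields a valid index.

```java
class IndexDemo {
    // Maps an item to a cell in a table of the given size.
    // floorMod keeps the result in [0, tableSize) even if hashCode() is negative.
    static int indexFor(Object item, int tableSize) {
        return Math.floorMod(item.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so this reproduces 2174 % 10.
        System.out.println(indexFor(2174, 10)); // prints 4
    }
}
```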

Page 6: hashing - cs.unc.edu

Hash tables – tableSize

• Should depend on the (maximum) number of values to be stored
• Let λ = [number of values stored] / tableSize
  • λ is the load factor of the hash table
• Restrict λ to be at most 1 (or ½)
• Require tableSize to be a prime number
  • to “randomize” away any patterns that may arise in the hash function values
• The prime should be of the form (4k+3) [for reasons to be detailed later]
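One way to turn these rules into code is to search upward from (maximum number of values) / λ for the first prime of the form 4k+3. The sketch below assumes small table sizes (trial division is used for the primality test); the names `TableSizes`, `isPrime`, and `chooseTableSize` are hypothetical.

```java
class TableSizes {
    // Trial-division primality test; adequate for the small sizes used here.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime p with p % 4 == 3 such that maxItems / p <= maxLoad.
    static int chooseTableSize(int maxItems, double maxLoad) {
        int candidate = (int) Math.ceil(maxItems / maxLoad);
        while (!(candidate % 4 == 3 && isPrime(candidate))) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(chooseTableSize(10, 0.5));  // prints 23
        System.out.println(chooseTableSize(100, 1.0)); // prints 103
    }
}
```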

Page 7: hashing - cs.unc.edu

Hash tables – the hash function

If the objects to be stored have integer keys (e.g., student IDs), hash(k) = k is generally OK, unless the keys have “patterns”

Otherwise, some “randomized” way to obtain an integer

Java-specific

• Every class has a default hashCode() method that returns an integer
• May be (should be) overridden
• Required properties
  - consistent with the class’s equals() method
  - need not be consistent across different runs of the program
  - different objects may return the same value!

From the Java 1.5.0 documentation

http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode%28%29
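The last two properties can be seen with `String.hashCode()`. The short sketch below is only an illustration; the class name `HashCodeDemo` and the helper `sameHashDifferentValue` are hypothetical.

```java
class HashCodeDemo {
    // True when two unequal strings nevertheless share a hash code.
    static boolean sameHashDifferentValue(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Equal objects must report equal hash codes.
        System.out.println(new String("weiss").hashCode() == "weiss".hashCode()); // true
        // Different objects may report the same hash code.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println(sameHashDifferentValue("Aa", "BB")); // true
    }
}
```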

Page 12: hashing - cs.unc.edu

Hash tables – collision resolution

The universe of possible items is usually far greater than tableSize

Collision: when multiple items hash onto the same location (aka cell or bucket)

Collision resolution strategies specify what to do in case of collision

1. Chaining (closed addressing)
2. Probing (open addressing)
   a. Linear probing
   b. Quadratic probing
   c. Double Hashing
   d. Perfect Hashing
   e. Cuckoo Hashing

Page 14: hashing - cs.unc.edu

Hash tables – tableSize

Restrict the load factor λ = [number of values stored] / tableSize to be at most 1 (or ½)

Require tableSize to be a prime number of the form (4k+3)

Page 17: hashing - cs.unc.edu

Hash tables – collision resolution: chaining

Maintain a linked list at each cell/bucket
(The hash table is an array of linked lists)

Insert: at front of list
- if pre-condition is “not already in list,” then faster
- in any case, later-inserted items often accessed more frequently (the LRU principle)

Example: Insert 02, 12, 22, …, 92 into an initially empty hash table with tableSize = 10
[Note: bad choice of tableSize – only to make the example easier!!]

Page 19: hashing - cs.unc.edu

Hash tables – collision resolution: chaining

Find and Remove: obvious implementations (hash to the bucket, then search its list)

Worst-case run-time: Θ(N) per operation (all elements in the same list)

Average case: O(1 + λ) per operation (one hash computation plus a scan of a list of expected length λ)
- The load factor λ: [number of items stored] / tableSize

Design rule: for chaining, keep λ ≤ 1
If λ becomes greater than 1, rehash (later)
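Chaining with insert-at-front can be sketched in Java as below. This is a minimal sketch for integer keys, assuming hash(k) = k; the class `ChainedHashSet` and its `bucket` accessor are hypothetical names introduced only for the illustration, and insert checks for duplicates (so it does not rely on the "not already in list" pre-condition).

```java
import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<Integer>[] table; // one linked list per cell/bucket
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int tableSize) {
        table = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) table[i] = new LinkedList<>();
    }

    private int indexFor(int key) { return Math.floorMod(key, table.length); }

    boolean find(int key) { return table[indexFor(key)].contains(key); }

    // Inserts at the front of the bucket's list; returns false for a duplicate.
    boolean insert(int key) {
        if (find(key)) return false;
        table[indexFor(key)].addFirst(key);
        size++;
        return true;
    }

    boolean remove(int key) {
        boolean removed = table[indexFor(key)].remove(Integer.valueOf(key));
        if (removed) size--;
        return removed;
    }

    double loadFactor() { return (double) size / table.length; }

    LinkedList<Integer> bucket(int i) { return table[i]; }

    public static void main(String[] args) {
        ChainedHashSet t = new ChainedHashSet(10);
        for (int k = 2; k <= 92; k += 10) t.insert(k);
        // All ten keys collide in bucket 2, newest first.
        System.out.println(t.bucket(2)); // [92, 82, 72, 62, 52, 42, 32, 22, 12, 2]
    }
}
```

With tableSize = 10 and the keys 02, 12, …, 92, every key lands in bucket 2, which is the Θ(N) worst case described above.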

Page 21: hashing - cs.unc.edu

Hash tables – collision resolution: linear probing

1. Chaining (closed addressing)
2. Probing (open addressing)
   a. Linear probing
   b. Quadratic probing
   c. Double Hashing
   d. Perfect Hashing
   e. Cuckoo Hashing

In case of collision, try alternative locations until an empty cell is found
• [Open address]

Probe sequence: h0(x), h1(x), h2(x), …, with hi(x) = [hash(x) + f(i)] % tableSize

The function f(i) is different for the different probing methods

Avoids the use of dynamic memory

Linear probing: f(i) is a linear function of i – typically, f(i) = i

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using linear probing
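The linear probing example can be traced with a short Java sketch. It assumes hash(x) = x, f(i) = i, and non-negative keys; the class `LinearProbingDemo` and method `insertAll` are hypothetical names.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Inserts each key at the first empty cell in the sequence
    // (key + i) % tableSize for i = 0, 1, 2, ...
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            boolean placed = false;
            for (int i = 0; i < tableSize && !placed; i++) {
                int pos = (key % tableSize + i) % tableSize;
                if (table[pos] == null) {
                    table[pos] = key;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("table is full");
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] t = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(t));
        // [49, 58, 69, null, null, null, null, null, 18, 89]
    }
}
```

Here 49, 58, and 69 all wrap around past cell 9 and pile up in cells 0, 1, and 2.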

Page 22: hashing - cs.unc.edu

Hash tables - review

Supports the basic dynamic dictionary ops: insert, find, remove

Does not need class to be Comparable

Three design decisions: tableSize, hash function, collision resolution

Table size
- a prime of the form (4k+3), keeping load factor constraints in mind

Hash function
- should “randomize” the items
- Java’s hashCode() method

Collision resolution: chaining

Collision resolution: probing (open addressing) – linear probing
- The clustering problem

Page 23: hashing - cs.unc.edu

Hash tables - clustering

Two causes of clustering:
- multiple keys hash onto the same location (secondary clustering)
- multiple keys hash onto the same cluster (primary clustering)

Secondary clustering caused by hash function; primary, by choice of probe sequence

Number of probes per operation increases with load factor

Page 24: hashing - cs.unc.edu

Hash tables – collision resolution: quadratic probing

f(i) is a quadratic function of i (e.g., f(i) = i²)

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using quadratic probing
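The quadratic probing example can be traced the same way. The sketch assumes hash(x) = x, f(i) = i², and non-negative keys; `QuadraticProbingDemo` and `insertAll` are hypothetical names.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    // Inserts each key at the first empty cell in the sequence
    // (key + i*i) % tableSize for i = 0, 1, 2, ...
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            boolean placed = false;
            for (int i = 0; i < tableSize && !placed; i++) {
                int pos = (key % tableSize + i * i) % tableSize;
                if (table[pos] == null) {
                    table[pos] = key;
                    placed = true;
                }
            }
            // Quadratic probing may miss empty cells, so this can fail
            // even when the table is not full.
            if (!placed) throw new IllegalStateException("no free cell found for " + key);
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] t = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(t));
        // [49, null, 58, 69, null, null, null, null, 18, 89]
    }
}
```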

Page 26: hashing - cs.unc.edu

Hash tables – collision resolution: quadratic probing

Two causes of clustering:
- multiple keys hash onto the same location (secondary clustering)
- multiple keys hash onto the same cluster (primary clustering)

Which one does quadratic probing solve?
- primary clustering

Efficient implementation of i² → (i+1)²: compute (i+1) and (2i+1) in parallel, and then add i² and (2i+1)

Choosing tableSize:
- prime: at least half the table gets probed
- prime of the form (4k+3) and probe sequence is ±i²: entire table gets probed

Remove: lazy delete must be used
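Both the incremental update of i² and the tableSize claims can be checked numerically. The sketch below counts the distinct cells reached from a starting position of 0 (the count does not depend on the start); `QuadraticCoverage` and `cellsProbed` are hypothetical names, and the table sizes 11 and 13 are chosen only as small examples of primes of the form 4k+3 and 4k+1.

```java
import java.util.HashSet;
import java.util.Set;

class QuadraticCoverage {
    // Counts distinct cells visited by offsets i^2 (and optionally -i^2)
    // for i = 0 .. tableSize - 1.
    static int cellsProbed(int tableSize, boolean alternateSigns) {
        Set<Integer> seen = new HashSet<>();
        int square = 0; // holds i^2, starting at i = 0
        for (int i = 0; i < tableSize; i++) {
            seen.add(Math.floorMod(square, tableSize));
            if (alternateSigns) seen.add(Math.floorMod(-square, tableSize));
            square += 2 * i + 1; // (i+1)^2 = i^2 + (2i+1), no multiplication needed
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(cellsProbed(11, false)); // 6: just over half of an 11-cell table
        System.out.println(cellsProbed(11, true));  // 11: 11 = 4*2 + 3, the whole table
        System.out.println(cellsProbed(13, true));  // 7: 13 = 4*3 + 1, not the whole table
    }
}
```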

Page 27: hashing - cs.unc.edu

Hash tables – collision resolution: double hashing

To get rid of secondary clustering

Use two hash functions: hash1(.) and hash2(.)

Probe sequence “step” size is hash2(.), i.e., f(i) = i * hash2(x)
- [Unlikely distinct items agree on both hash1(.) and hash2(.)]

hash2(.) must never evaluate to zero!

A common (good) choice: hash2(x) = R – (x mod R), for R a prime smaller than tableSize

Example: insert 89, 18, 49, 58, and 69 into a table of size 10, using double hashing with hash2(x) = 7 – (x mod 7)
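The double hashing example can be traced with the sketch below, which assumes hash1(x) = x, non-negative keys, and R = 7 as on the slide; `DoubleHashingDemo`, `hash2`, and `insertAll` are hypothetical names.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int R = 7; // a prime smaller than tableSize

    // Step size in 1..R, so it is never zero.
    static int hash2(int x) { return R - (x % R); }

    // Probes (key + i * hash2(key)) % tableSize for i = 0, 1, 2, ...
    static Integer[] insertAll(int[] keys, int tableSize) {
        Integer[] table = new Integer[tableSize];
        for (int key : keys) {
            boolean placed = false;
            for (int i = 0; i < tableSize && !placed; i++) {
                int pos = (key % tableSize + i * hash2(key)) % tableSize;
                if (table[pos] == null) {
                    table[pos] = key;
                    placed = true;
                }
            }
            // With a non-prime tableSize the step may share a factor with it,
            // so not every cell is necessarily reachable.
            if (!placed) throw new IllegalStateException("no free cell found for " + key);
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] t = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(t));
        // [69, null, null, 58, null, null, 49, null, 18, 89]
    }
}
```

The step sizes are hash2(49) = 7, hash2(58) = 5, and hash2(69) = 1, so the three colliding keys land in cells 6, 3, and 0 after one extra probe each.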

Page 30: hashing - cs.unc.edu

Hash tables – collision resolution: Cuckoo hashing

Goal: constant-time O(1) find in the worst case

Example application: network routing tables

[remove also takes O(1) time]

Insert has worst-case Θ(N) run-time

Keep two hash tables, and use two different hash functions

Page 31: hashing - cs.unc.edu

Hash tables – collision resolution: Cuckoo hashing

[Figure: TABLE 1 and TABLE 2, each with cells 0 to 4, shown as items A through F are inserted one at a time]

A: hash1(A) = 0, hash2(A) = 2
B: hash1(B) = 0, hash2(B) = 0
C: hash1(C) = 1, hash2(C) = 4
D: hash1(D) = 1, hash2(D) = 0
E: hash1(E) = 3, hash2(E) = 2
F: hash1(F) = 3, hash2(F) = 4

Page 37: hashing - cs.unc.edu

Hash tables – collision resolution: Cuckoo hashing

Insert
- Insert into Table 1, using hash1
- If cell is already occupied
  - bump item into other table (using appropriate hash function)
  - Repeat
- Rehash after k repetitions

Each table should be more than half empty
- Stronger condition than load factor ≤ ½
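The insert procedure can be sketched in Java using the hash values given for items A through F. This is an illustration under assumptions: the hash values are hard-coded lookup tables, `MAX_BUMPS` is a hypothetical stand-in for k, and the sketch reports failure instead of rehashing; the names `CuckooDemo`, `hash`, `insert`, and `find` are not from the slides.

```java
import java.util.Arrays;

class CuckooDemo {
    // hash1 and hash2 for items "A".."F", indexed by letter.
    static final int[] H1 = {0, 0, 1, 1, 3, 3};
    static final int[] H2 = {2, 0, 4, 0, 2, 4};
    static final int MAX_BUMPS = 10; // hypothetical value of k

    final String[][] tables = new String[2][5]; // tables[0] = Table 1, tables[1] = Table 2

    static int hash(int which, String item) {
        int i = item.charAt(0) - 'A';
        return which == 0 ? H1[i] : H2[i];
    }

    // Worst-case O(1): an item can only be in one of two cells.
    boolean find(String item) {
        return item.equals(tables[0][hash(0, item)])
            || item.equals(tables[1][hash(1, item)]);
    }

    boolean insert(String item) {
        String current = item;
        int which = 0; // always start in Table 1
        for (int bumps = 0; bumps < MAX_BUMPS; bumps++) {
            int pos = hash(which, current);
            String evicted = tables[which][pos];
            tables[which][pos] = current;
            if (evicted == null) return true;
            current = evicted;  // the bumped item moves to the other table
            which = 1 - which;
        }
        // A full implementation would rehash here; in this sketch the item
        // held in `current` is left unplaced.
        return false;
    }

    public static void main(String[] args) {
        CuckooDemo c = new CuckooDemo();
        for (String s : new String[] {"A", "B", "C", "D", "E", "F"}) c.insert(s);
        System.out.println(Arrays.toString(c.tables[0])); // [A, D, null, F, null]
        System.out.println(Arrays.toString(c.tables[1])); // [B, null, E, null, C]
    }
}
```

Inserting F triggers the longest chain in this run: F bumps E, E bumps A, A bumps B, and B finally settles in cell 0 of Table 2.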

Page 38: hashing - cs.unc.edu

Rehashing

When load factor becomes too large…

(Approximately) double tableSize

Scan old table, inserting each non-deleted item into the new table

Worst-case time?
- O(N²)

Average-case: O(N)

Amortized analysis
- Average cost per insert, over a sequence of repeated re-hashings

[Not great for interactive applications…]
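A rehash step might look like the following sketch for a linear probing table of non-negative integer keys. The choice of "next prime at or above double the old size" is one reading of "(approximately) double"; `RehashDemo`, `isPrime`, and `rehash` are hypothetical names, and empty cells are represented by null (a table with lazy-deleted cells would skip those as well).

```java
class RehashDemo {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Builds a table of roughly double the size and re-inserts every key
    // using linear probing with the new tableSize.
    static Integer[] rehash(Integer[] oldTable) {
        int newSize = 2 * oldTable.length;
        while (!isPrime(newSize)) newSize++;
        Integer[] newTable = new Integer[newSize];
        for (Integer boxed : oldTable) {
            if (boxed == null) continue; // skip empty cells
            int key = boxed;
            int pos = Math.floorMod(key, newSize);
            while (newTable[pos] != null) pos = (pos + 1) % newSize;
            newTable[pos] = key;
        }
        return newTable;
    }

    public static void main(String[] args) {
        Integer[] old = {49, 58, 69, null, null, null, null, null, 18, 89};
        Integer[] t = rehash(old);
        System.out.println(t.length); // 23
    }
}
```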

Page 40: hashing - cs.unc.edu

Hash tables - review

Supports the basic dynamic dictionary ops: insert, find, remove

Three design decisions: tableSize, hash function, collision resolution

Table size: a prime of the form (4k+3), keeping load factor constraints in mind

Hash function
- Java’s hashCode() method
- item goes to hash(item) % tableSize

Collision: multiple items at the same location

Collision resolution:
- chaining
- probing (open addressing)
  - Linear probing
  - Quadratic probing
  - Double Hashing
  - Cuckoo Hashing

Page 41: hashing - cs.unc.edu

Java-specific – hashCode() and equals()

public class Employee {
    String name;
    int id;

    public Employee(String n, int i) {
        name = n;
        id = i;
    }

    // Takes an Employee, so this overloads Object.equals(Object) rather than
    // overriding it; hashCode() is not overridden at all.
    public boolean equals(Employee e) {
        // == on Strings compares references, not characters
        return (name == e.name && id == e.id);
    }
}

……

public static void main(String[] args) {
    Employee e1 = new Employee("weiss", 001);
    Employee e2 = new Employee("weiss", 001);
    System.out.println(e1.hashCode() + ", " + e2.hashCode());
    System.out.println(e1 == e2);
    System.out.println(e1.equals(e2));
}

Employee e2 = e1;
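For contrast, a version in which equals() and hashCode() agree might look like the sketch below. It is not from the slides: the class name `EmployeeFixed` is hypothetical, and `Objects.hash` is just one convenient way to combine the fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EmployeeFixed {
    private final String name;
    private final int id;

    EmployeeFixed(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Overrides Object.equals(Object) and compares field values.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof EmployeeFixed)) return false;
        EmployeeFixed e = (EmployeeFixed) other;
        return id == e.id && Objects.equals(name, e.name);
    }

    // Equal employees now produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    public static void main(String[] args) {
        EmployeeFixed e1 = new EmployeeFixed("weiss", 1);
        EmployeeFixed e2 = new EmployeeFixed("weiss", 1);
        Set<EmployeeFixed> staff = new HashSet<>();
        staff.add(e1);
        System.out.println(e1 == e2);           // false: two distinct objects
        System.out.println(e1.equals(e2));      // true
        System.out.println(staff.contains(e2)); // true
    }
}
```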

Page 42: hashing - cs.unc.edu

Hash tables – collision resolution: linear probing

f(i) can be any linear function (a*i + b)

If gcd(a, tableSize) = 1, then linear probing will probe the entire table

Primary clustering: blocks of occupied cells start forming even in a relatively empty table

[Figure: a table containing clusters of occupied cells, with two annotated positions]
- any item hashing here… grows the cluster by one
- any item hashing here… merges the two clusters
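The gcd condition can be checked with a few lines of Java. The sketch counts how many distinct cells f(i) = a*i + b reaches from a starting position of 0; `LinearStepCoverage` and `cellsProbed` are hypothetical names, and the values of a, b, and tableSize are chosen only for illustration.

```java
class LinearStepCoverage {
    // Number of distinct cells hit by (a*i + b) % tableSize for i = 0 .. tableSize - 1.
    static int cellsProbed(int a, int b, int tableSize) {
        boolean[] seen = new boolean[tableSize];
        int count = 0;
        for (int i = 0; i < tableSize; i++) {
            int pos = Math.floorMod(a * i + b, tableSize);
            if (!seen[pos]) {
                seen[pos] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(cellsProbed(3, 0, 10)); // 10: gcd(3, 10) = 1, entire table
        System.out.println(cellsProbed(4, 1, 10)); // 5: gcd(4, 10) = 2, only half the table
    }
}
```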

Page 46: hashing - cs.unc.edu

Hash tables – linear probing: remove

[Figure: a table with cells 0 to 9, showing where A, B, and C are placed]

insert A; hash(A) = 4
insert B; hash(B) = 5
insert C; hash(C) = 4
remove B
find C

Remove must be implemented as lazy delete!!
- Load factor computed including lazy-deleted items
- In inserts, may “reclaim” lazy-deleted cells
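The scenario above can be reproduced with a small Java sketch of linear probing with lazy deletion. The integer keys 4, 5, and 14 stand in for A, B, and C (with hash(k) = k and tableSize = 10 they hash to 4, 5, and 4); the class `LazyDeleteTable`, its `State` enum, and the `cellOf` accessor are hypothetical names.

```java
import java.util.Arrays;

class LazyDeleteTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] state;

    LazyDeleteTable(int tableSize) {
        keys = new int[tableSize];
        state = new State[tableSize];
        Arrays.fill(state, State.EMPTY);
    }

    // Returns the cell holding key, or -1. The search stops only at a truly
    // EMPTY cell; DELETED cells are stepped over.
    int cellOf(int key) {
        int n = keys.length;
        for (int i = 0; i < n; i++) {
            int pos = (Math.floorMod(key, n) + i) % n;
            if (state[pos] == State.EMPTY) return -1;
            if (state[pos] == State.OCCUPIED && keys[pos] == key) return pos;
        }
        return -1;
    }

    boolean find(int key) { return cellOf(key) >= 0; }

    // Lazy delete: mark the cell instead of emptying it.
    boolean remove(int key) {
        int pos = cellOf(key);
        if (pos < 0) return false;
        state[pos] = State.DELETED;
        return true;
    }

    // Takes the first EMPTY or DELETED cell on the probe sequence,
    // so lazy-deleted cells are reclaimed.
    boolean insert(int key) {
        if (find(key)) return false;
        int n = keys.length;
        for (int i = 0; i < n; i++) {
            int pos = (Math.floorMod(key, n) + i) % n;
            if (state[pos] != State.OCCUPIED) {
                keys[pos] = key;
                state[pos] = State.OCCUPIED;
                return true;
            }
        }
        return false; // no free cell
    }

    public static void main(String[] args) {
        LazyDeleteTable t = new LazyDeleteTable(10);
        t.insert(4);   // A lands in cell 4
        t.insert(5);   // B lands in cell 5
        t.insert(14);  // C hashes to 4, probes 5, lands in cell 6
        t.remove(5);   // cell 5 is marked DELETED, not EMPTY
        System.out.println(t.find(14)); // true: the search steps over cell 5
    }
}
```

If remove simply emptied cell 5, the search for C would stop there and wrongly report that C is absent.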