CSE 373: Data Structures and Algorithms · Hash Tables: Review •A data-structure for the...

Instructor: Lilian de Greef Quarter: Summer 2017 CSE 373: Data Structures and Algorithms Lecture 7: Hash Table Collisions


Page 1:

Instructor: Lilian de Greef
Quarter: Summer 2017

CSE 373: Data Structures and Algorithms
Lecture 7: Hash Table Collisions

Page 2:

Today

• Announcements
• Hash Table Collisions
• Collision Resolution Schemes
  • Separate Chaining
  • Open Addressing / Probing
    • Linear Probing
    • Quadratic Probing
    • Double Hashing
• Rehashing

Page 3:

Announcements

• Reminder: homework 2 due tomorrow
• Homework 3: Hash Tables
  • Will be out tomorrow night
  • Pair-programming opportunity! (work with a partner)
  • Ideas for finding a partner: before/after class, section, Piazza
• Pair-programming: write code together
  • 2 people, 1 keyboard
  • One is the “navigator,” the other the “driver”
  • Regularly switch off to spend equal time in both roles
  • Side note: our brains tend to edit out when we make typos
  • Need to be in the same physical space for the entire assignment, so partner and plan accordingly!

Page 4:

Review: Hash Tables & Collisions

Page 5:

Hash Tables: Review

• A data structure for the dictionary ADT
• Average case O(1) find, insert, and delete (when under some often-reasonable assumptions)
• An array storing (key, value) pairs
• Use hash value and table size to calculate array index
• Hash value calculated from key using hash function

find, insert, or delete (key, value)
→ apply hash function: h(key) = hash value
→ index = hash value % table size
→ array[index] = (key, value)
→ if collision, apply collision resolution
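
The index calculation in this flow can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the name `bucket_index` is hypothetical, and Python's built-in `hash` stands in for h(key) (in CPython, small non-negative integers hash to themselves, which keeps the arithmetic easy to follow).

```python
def bucket_index(key, table_size):
    """Map a key to an array index: hash value modulo table size."""
    hash_value = hash(key)  # built-in hash stands in for h(key) (assumption)
    return hash_value % table_size


TABLE_SIZE = 10
table = [None] * TABLE_SIZE

# Store one (key, value) pair at its computed index.
table[bucket_index(12, TABLE_SIZE)] = (12, "twelve")

# 22 maps to the same index as 12, so storing it would need collision resolution.
collides = table[bucket_index(22, TABLE_SIZE)] is not None
```

Here `collides` ends up true because 12 % 10 and 22 % 10 are both 2.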

Page 6:

Hash Table Collisions: Review

• Collision: _____
• We try to avoid them by _____
• Unfortunately, collisions are unavoidable in practice
  • Number of possible keys >> table size
  • No perfect hash function & table-index combo

Page 7:

Collision Resolution Schemes: your ideas

Page 8:

Collision Resolution Schemes: your ideas

Page 9:

Separate Chaining
One of several collision resolution schemes

Page 10:

Separate Chaining

All keys that map to the same table location (aka “bucket”) are kept in a list (“chain”).

Example: insert 10, 22, 107, 12, 42 with TableSize = 10
(for illustrative purposes, we’re inserting hash values)

[Figure: empty table with indices 0 through 9]
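
The example can be worked through with a small separate-chaining table. The sketch below is one possible implementation, not the one used in the course: the class name `ChainedHashTable` is hypothetical, Python lists stand in for the chains, and new keys are appended to the end of a chain (inserting at the front is equally valid and only changes the order within a bucket).

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list ("chain") of (key, value) pairs."""

    def __init__(self, table_size=10):
        self.buckets = [[] for _ in range(table_size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: overwrite its value
                return
        chain.append((key, value))       # assumption: new keys go at the end of the chain

    def find(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None


table = ChainedHashTable(table_size=10)
for key in [10, 22, 107, 12, 42]:
    table.insert(key, str(key))

# Keys per bucket, in chain order.
chains = [[k for k, _ in chain] for chain in table.buckets]
```

With these inserts, bucket 0 holds 10, bucket 2 holds 22, 12 and 42, bucket 7 holds 107, and the other buckets stay empty.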

Page 11:

Separate Chaining: Worst-Case

What’s the worst-case scenario for find?

What’s the worst-case running time for find?

But only with really bad luck or a really bad hash function

Page 12:

Separate Chaining: Further Analysis

• How can find become slow when we have a good hash function?
• How can we reduce its likelihood?

Page 13:

Rigorous Analysis: Load Factor

Definition: The load factor (λ) of a hash table with N elements is

$\lambda = \frac{N}{\text{TableSize}}$

Under separate chaining, the average number of elements per bucket is _____

For a random find, on average
• an unsuccessful find compares against _______ items
• a successful find compares against _______ items
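
The definition can be checked numerically on a small chained table. The snippet below is a sketch under stated assumptions: the `chains` list is hard-coded illustrative data (five keys spread over ten buckets), and `load_factor` is a hypothetical helper name.

```python
def load_factor(num_elements, table_size):
    """Load factor: number of stored elements divided by table size."""
    return num_elements / table_size


# Illustrative bucket contents for a table of size 10 (assumed data).
chains = [[10], [], [22, 12, 42], [], [], [], [], [107], [], []]

num_elements = sum(len(chain) for chain in chains)
lam = load_factor(num_elements, len(chains))

# Average chain length, computed directly from the buckets.
average_chain_length = sum(len(chain) for chain in chains) / len(chains)
```

Both `lam` and `average_chain_length` come out to 0.5 here, since the average is the same N / TableSize ratio computed a different way.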

Page 14:

Rigorous Analysis: Load Factor

Definition: The load factor (λ) of a hash table with N elements is

$\lambda = \frac{N}{\text{TableSize}}$

To choose a good load factor, what are our goals?

So for separate chaining, a good load factor is

Page 15:

Open Addressing / Probing
Another family of collision resolution schemes

Page 16:

Idea: use empty space in the table

• If h(key) is already full,
  • try (h(key) + 1) % TableSize. If full,
  • try (h(key) + 2) % TableSize. If full,
  • try (h(key) + 3) % TableSize. If full…
• Example: insert 38, 19, 8, 109, 10

[Figure: empty table with indices 0 through 9]
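
The example can be traced with a short sketch of this "try the next slot" idea. It assumes TableSize = 10 (matching the indices 0 through 9), stores bare integer keys, and uses the key itself as the hash value; the function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key at the first free slot starting from key % TableSize."""
    size = len(table)
    home = key % size                 # assumption: the key is its own hash value
    for i in range(size):
        index = (home + i) % size     # try home, home + 1, home + 2, ... with wraparound
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("table is full")


table = [None] * 10
for key in [38, 19, 8, 109, 10]:
    linear_probe_insert(table, key)
```

After these inserts, 38 and 19 sit in their home slots 8 and 9, while 8, 109 and 10 wrap around into slots 0, 1 and 2.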

Page 17:

Open Addressing Terminology

Trying the next spot is called _____ (also called _____)

• We just did _____: the ith probe was (h(key) + i) % TableSize
• In general, have some function f and use (h(key) + f(i)) % TableSize
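
The general form (h(key) + f(i)) % TableSize can be written once and reused with different choices of f. The generator below is a sketch; `probe_sequence` and its `max_probes` parameter are hypothetical names introduced for illustration.

```python
def probe_sequence(hash_value, table_size, f, max_probes=None):
    """Yield the indices (hash_value + f(i)) % table_size for i = 0, 1, 2, ..."""
    if max_probes is None:
        max_probes = table_size
    for i in range(max_probes):
        yield (hash_value + f(i)) % table_size


# f(i) = i gives the "try the next slot" sequence.
linear = list(probe_sequence(38, 10, lambda i: i, max_probes=4))

# Swapping in a different f changes the sequence without touching the loop.
squares = list(probe_sequence(38, 10, lambda i: i * i, max_probes=4))
```

For hash value 38 and table size 10, `linear` is `[8, 9, 0, 1]` and `squares` is `[8, 9, 2, 7]`.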

Page 18:

Dictionary Operations with Open Addressing

insert finds an open table position using a probe function

What about find?

What about delete?

• Note: delete with separate chaining is plain-old list-remove
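
The slide leaves these two questions open. One common answer, shown here as an assumption rather than as the lecture's own solution, is for find to follow the same probe sequence as insert until it hits an empty slot, and for delete to leave a marker (often called a tombstone) so that later finds do not stop early. The names `DELETED`, `find_index` and `delete` are hypothetical, and linear probing with the key as its own hash value is assumed.

```python
DELETED = object()  # marker for a slot whose key was removed (lazy deletion, an assumption)


def find_index(table, key):
    """Follow the linear probe sequence; stop at a truly empty slot."""
    size = len(table)
    for i in range(size):
        index = (key % size + i) % size
        slot = table[index]
        if slot is None:
            return None               # empty slot: the key cannot be further along
        if slot is not DELETED and slot == key:
            return index
    return None


def delete(table, key):
    """Replace the key with a marker instead of emptying the slot."""
    index = find_index(table, key)
    if index is None:
        return False
    table[index] = DELETED
    return True


# Table produced by inserting 38, 19, 8, 109, 10 with linear probing.
table = [8, 109, 10, None, None, None, None, None, 38, 19]
delete(table, 8)
found_109 = find_index(table, 109)
```

If slot 0 were simply reset to empty, the search for 109 (home slot 9) would stop at slot 0 and wrongly report the key as missing; with the marker in place it continues and finds 109 at index 1.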

Page 19:

Practice: The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table?

Index  (A)   (B)   (C)   (D)
0
1
2      2     12    12    12, 2
3      23    13    13    13, 3, 23
4                  2
5      15    5     3     5, 15
6                  23
7                  5
8      18    18    18    18
9                  15
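
One way to check an answer to this practice problem is to simulate the insertions. The sketch below follows the problem statement (h(k) = k mod 10, linear probing, table of length 10); the function name `insert_all_linear` is hypothetical.

```python
def insert_all_linear(keys, table_size):
    """Insert keys in order using h(k) = k mod table_size and linear probing."""
    table = [None] * table_size
    for key in keys:
        home = key % table_size
        for i in range(table_size):
            index = (home + i) % table_size
            if table[index] is None:
                table[index] = key
                break
        else:
            # The inner loop ran through every slot without finding a free one.
            raise RuntimeError("table is full")
    return table


result = insert_all_linear([12, 18, 13, 2, 3, 23, 5, 15], 10)
```

The simulation leaves slots 0 and 1 empty and fills slots 2 through 9 with 12, 13, 2, 3, 23, 5, 18, 15, which corresponds to option (C).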

Page 20:

Open Addressing: Linear Probing

• Quick to compute! ☺
• But mostly a bad idea. Why?

Page 21:

(Primary) Clustering

Linear probing tends to produce clusters, which lead to long probing sequences

• Called primary clustering
• Saw this starting in our example

[R. Sedgewick]

Page 22:

Analysis of Linear Probing

• For any λ < 1, linear probing will find an empty slot
  • It is “safe” in this sense: no infinite loop unless table is full
• Non-trivial facts we won’t prove: average # of probes given λ (in the limit as TableSize → ∞)
  • Unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
  • Successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
• This is pretty bad: need to leave sufficient empty space in the table to get decent performance (see chart)
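
Plugging a few load factors into the two linear-probing estimates, ½(1 + 1/(1 − λ)²) for an unsuccessful search and ½(1 + 1/(1 − λ)) for a successful one, shows how quickly the cost grows. The function names below are hypothetical, and the values are the large-table estimates, not measurements.

```python
def expected_probes_unsuccessful(lam):
    """Large-table estimate for an unsuccessful search: 1/2 * (1 + 1/(1 - lam)^2)."""
    return 0.5 * (1 + 1 / (1 - lam) ** 2)


def expected_probes_successful(lam):
    """Large-table estimate for a successful search: 1/2 * (1 + 1/(1 - lam))."""
    return 0.5 * (1 + 1 / (1 - lam))


for lam in (0.5, 0.75, 0.9):
    print(f"load factor {lam}: "
          f"not found ~{expected_probes_unsuccessful(lam):.1f} probes, "
          f"found ~{expected_probes_successful(lam):.1f} probes")
```

At λ = 0.5 the estimates are 2.5 and 1.5 probes; at λ = 0.9 they rise to roughly 50.5 and 5.5.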

Page 23:

Analysis: Linear Probing

• Linear-probing performance degrades rapidly as table gets full (formula assumes “large table” but point remains)
• By comparison, chaining performance is linear in λ and has no trouble with λ > 1

[Figure: two charts titled “Linear Probing”, each plotting Average # of Probes (y-axis) against Load Factor (x-axis, 0.00 to 1.00) for the series “linear probing found” and “linear probing not found”. The first chart’s y-axis runs from 0 to 20, the second’s from 0 to 350.]

Page 24:

Any ideas for alternatives?

Page 25:

Open Addressing: Quadratic Probing

• We can avoid primary clustering by changing the probe function (h(key) + f(i)) % TableSize
• A common technique is quadratic probing: f(i) = i²
• So probe sequence is:
  • 0th probe: h(key) % TableSize
  • 1st probe: (h(key) + 1) % TableSize
  • 2nd probe: (h(key) + 4) % TableSize
  • 3rd probe: (h(key) + 9) % TableSize
  • …
  • ith probe: (h(key) + i²) % TableSize
• Intuition: Probes quickly “leave the neighborhood”

Page 26:

Quadratic Probing Example #1

TableSize = 10
Insert: 89, 18, 49, 58, 79

ith probe: (h(key) + i²) % TableSize

[Figure: empty table with indices 0 through 9]
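
This example can be traced with a short quadratic-probing insert. The sketch assumes the key is its own hash value and gives up after TableSize probes; the function name `quadratic_probe_insert` is hypothetical.

```python
def quadratic_probe_insert(table, key):
    """Insert key using the ith probe (key + i*i) % TableSize; return the slot used."""
    size = len(table)
    for i in range(size):
        index = (key + i * i) % size  # assumption: the key is its own hash value
        if table[index] is None:
            table[index] = key
            return index
    return None                       # no free slot among the first `size` probes


table = [None] * 10
slots = [quadratic_probe_insert(table, key) for key in [89, 18, 49, 58, 79]]
```

Here 89 and 18 land in slots 9 and 8; 49 collides at 9 and moves to 0; 58 collides at 8 and 9 and lands at 2; 79 collides at 9 and 0 and lands at 3.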

Page 27:

Quadratic Probing Example #2

TableSize = 7
Insert:
• 76 (76 % 7 = 6)
• 40 (40 % 7 = 5)
• 48 (48 % 7 = 6)
• 5 (5 % 7 = 5)
• 55 (55 % 7 = 6)
• 47 (47 % 7 = 5)

ith probe: (h(key) + i²) % TableSize

[Figure: empty table with indices 0 through 6]
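
Tracing this second example shows a behavior worth noticing. The sketch below makes the same assumptions as a basic quadratic-probing insert (the key is its own hash value, and the search gives up after TableSize probes); the function name is hypothetical.

```python
def quadratic_probe_insert(table, key):
    """Insert key using the ith probe (key + i*i) % TableSize; return the slot or None."""
    size = len(table)
    for i in range(size):
        index = (key + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    return None  # every probed slot was occupied


table = [None] * 7
slots = [quadratic_probe_insert(table, key) for key in [76, 40, 48, 5, 55]]

# 47 has home slot 5; its probes only ever visit slots 5, 6, 2 and 0.
slot_for_47 = quadratic_probe_insert(table, 47)
free_slots = [i for i, entry in enumerate(table) if entry is None]
```

The first five keys land in slots 6, 5, 0, 2 and 3. The probes for 47 then cycle through slots 5, 6, 2 and 0, all occupied, so the insert fails even though slots 1 and 4 are still free. Because i² % 7 only takes the values 0, 1, 2 and 4, trying more probes would not help.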