CSE 373: Data Structures and Algorithms · Hash Tables: Review •A data-structure for the...
Transcript of CSE 373: Data Structures and Algorithms · Hash Tables: Review •A data-structure for the...
Instructor:LiliandeGreefQuarter:Summer2017
CSE373:DataStructuresandAlgorithmsLecture7:HashTableCollisions
Today
• Announcements• HashTableCollisions• CollisionResolutionSchemes• SeparateChaining• OpenAddressing/Probing
• LinearProbing• QuadraticProbing• DoubleHashing
• Rehashing
Announcements• Reminder:homework2duetomorrow• Homework3:HashTables• Willbeouttomorrownight• Pair-programmingopportunity!(workwithapartner)• Ideasforfindingpartner:before/afterclass,section,Piazza
• Pair-programming:writecodetogether• 2people,1keyboard• Oneisthe“navigator,”theotherthe“driver”• Regularlyswitchofftospendequaltimeinbothroles• Sidenote:ourbrainstendtoeditoutwhenwemaketypos• Needtobeinsamephysicalspaceforentireassignment,sopartnerandplanaccordingly!
Review:HashTables&Collisions
HashTables:Review
• Adata-structureforthedictionaryADT• AveragecaseO(1)find,insert,anddelete
(whenundersomeoften-reasonableassumptions)• Anarraystoring(key,value)pairs• Usehashvalueandtablesizetocalculatearrayindex• Hashvaluecalculatedfromkeyusinghashfunction
find,insert,ordelete(key,value)
applyhashfunctionh(key)=hashvalue
index=hashvalue%tablesize
array[index]=(key,value)
ifcollision,applycollisionresolution
HashTableCollisions:Review
• Collision:
• Wetrytoavoid themby
• Unfortunately,collisionsareunavoidableinpractice• Numberofpossiblekeys>>tablesize• Noperfecthashfunction&table-indexcombo
CollisionResolutionSchemes:yourideas
CollisionResolutionSchemes:yourideas
SeparateChainingOneofseveralcollisionresolutionschemes
SeparateChaining
Allkeysthatmaptothesametablelocation(aka“bucket”)arekeptinalist(“chain”).
Example:insert10,22,107,12,42andTableSize =10(forillustrativepurposes,we’reinsertinghashvalues)
0123456789
SeparateChaining:Worst-Case
What’stheworst-casescenarioforfind?
What’stheworst-caserunningtimeforfind?
Butonlywithreallybadluckorreallybadhashfunction
SeparateChaining:FurtherAnalysis
• Howcanfind becomeslowwhenwehaveagoodhashfunction?
• Howcanwereduceitslikelihood?
RigorousAnalysis:LoadFactor
Definition: Theloadfactor(l) ofahashtablewithN elementsis
𝜆 = #$%&'(*+,(
Underseparatechaining,theaveragenumberofelementsperbucketis_____
Forarandom find,onaverage• anunsuccessfulfind comparesagainst_______items
• asuccessfulfind comparesagainst_______items
RigorousAnalysis:LoadFactor
Definition: Theloadfactor(l) ofahashtablewithN elementsis
𝜆 = #$%&'(*+,(
Tochooseagoodloadfactor,whatareourgoals?
Soforseparatechaining,agoodloadfactoris
OpenAddressing/ProbingAnotherfamilyofcollisionresolutionschemes
Idea:useemptyspaceinthetable
• Ifh(key) isalreadyfull,• try(h(key) + 1) % TableSize.Iffull,• try(h(key) + 2) % TableSize.Iffull,• try(h(key) + 3) % TableSize.Iffull…
• Example:insert38,19,8,109,10
0123456789
OpenAddressingTerminology
Tryingthenextspotiscalled(alsocalled )
• Wejustdidith probewas(h(key) + i) % TableSize
• Ingeneralhavesomef anduse(h(key) + f(i)) % TableSize
DictionaryOperationswithOpenAddressinginsert findsanopentablepositionusingaprobefunction
Whataboutfind?
Whataboutdelete?
• Note:delete withseparatechainingisplain-oldlist-remove
Practice:Thekeys12,18,13,2,3,23,5and15areinsertedintoaninitiallyemptyhashtableoflength10usingopenaddressingwithhashfunctionh(k)=kmod10andlinearprobing.Whatistheresultanthashtable?
012 23 2345 15678 189
012 123 1345 5678 189
012 123 134 25 36 237 58 189 15
012 12, 23 13, 3, 2345 5, 15678 189
(A) (B) (C) (D)
OpenAddressing:LinearProbing
• Quicktocompute!J• Butmostlyabadidea.Why?
(Primary)Clustering
Linearprobingtendstoproduceclusters,whichleadtolongprobingsequences
• Called
• Saw thisstartinginourexample
[R.Sedgewick]
AnalysisofLinearProbing
• Foranyl <1,linearprobingwillfindanemptyslot• Itis“safe”inthissense:noinfiniteloopunlesstableisfull
• Non-trivialfactswewon’tprove:Average#ofprobesgivenl (inthelimitasTableSize→¥ )
• Unsuccessfulsearch:
• Successfulsearch:
• Thisisprettybad:needtoleavesufficientemptyspaceinthetabletogetdecentperformance(seechart)
( ) ÷÷ø
öççè
æ-
+ 2111
21
l
( )÷÷øö
ççè
æ-
+l111
21
Analysis:LinearProbing• Linear-probingperformancedegradesrapidlyastablegetsfull
(Formulaassumes“largetable”butpointremains)
• Bycomparison,chainingperformanceislinearinl andhasnotroublewithl>1
0.002.004.006.008.0010.0012.0014.0016.0018.0020.00
0.00 0.20 0.40 0.60 0.80 1.00
Average#ofProbe
s
LoadFactor
LinearProbing
linearprobingfound
linearprobingnotfound
0.00
50.00
100.00
150.00
200.00
250.00
300.00
350.00
0.00 0.20 0.40 0.60 0.80 1.00
Average#ofProbe
sLoadFactor
LinearProbing
linearprobingfound
linearprobingnotfound
Anyideasforalternatives?
OpenAddressing:QuadraticProbing
• Wecanavoidprimaryclusteringbychangingtheprobefunction(h(key) + f(i)) % TableSize
• Acommontechniqueisquadraticprobing:f(i) = i2• Soprobesequenceis:• 0th probe:h(key) % TableSize• 1st probe:• 2nd probe:• 3rd probe:• …• ith probe:(h(key) + i2) % TableSize
• Intuition:Probesquickly“leavetheneighborhood”
QuadraticProbingExample#1
TableSize =10Insert:
8918495879
ith probe:(h(key) + i2) % TableSize
0123456789
QuadraticProbingExample#2
TableSize =7Insert:
76 (76%7=6)40 (40%7=5)48 (48%7=6)5 (5%7=5)55 (55%7=6)47 (47%7=5)
ith probe:(h(key) + i2) % TableSize
0123456