Lecture 8 - web.stanford.edu
Lecture 8: HASHING!!!!!
Announcements
• HW3 due Friday!
• HW4 posted Friday!
• Q: Where can I see examples of proofs?
  • Lecture notes
  • CLRS
  • HW solutions
• Office hours: lines are long :(
  • Solutions:
    • We will be (more) mindful of throughput.
    • Get more TAs
    • Stop assigning homework
    • Use Piazza!
    • Start early. (There are no lines on Monday!)
Today: hashing
[Figure: a hash table with n = 9 buckets holding keys 13, 22, 43, 9, … in chains; empty buckets hold NIL.]
Outline
• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.
  • like self-balancing binary trees
  • The difference is we can get better performance in expectation by using randomness.
• Hash families are the magic behind hash tables.
• Universal hash families are even more magic.
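Goal: Just like on Monday
• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.
  • INSERT
  • DELETE
  • SEARCH
[Figure: nodes with keys are inserted into and deleted from the data structure; a SEARCH for the node with key "2" answers "HERE IT IS".]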
Today:
• Hash tables:
  • O(1) expected time INSERT/DELETE/SEARCH
  • Worse worst-case performance, but often great in practice.
On Monday:
• Self-balancing trees:
  • O(log(n)) deterministic INSERT/DELETE/SEARCH
#prettysweet
#evensweeterinpractice
e.g., Python's dict, Java's HashSet/HashMap, C++'s unordered_map
Hash tables are used for databases, caching, object representation, …
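One way to get O(1) time
• Say all keys are in the set {1, 2, 3, 4, 5, 6, 7, 8, 9}.
• INSERT:
• DELETE:
• SEARCH:
[Figure: an array with one slot for each possible key 1–9. INSERT 9, 6, 3, 5 puts each key into the slot with its own index; DELETE 6 empties slot 6; SEARCH 3 looks directly in slot 3: "3 is here."]
This is called "direct addressing".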
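A minimal Python sketch of direct addressing, assuming integer keys in {1, …, M}; the class name `DirectAddressTable` and its method names are hypothetical. Each operation touches exactly one array slot, which is why all three run in O(1) time, but the array needs a slot for every key that could possibly show up.

```python
class DirectAddressTable:
    """Direct addressing: one array slot per possible key in {1, ..., M}."""

    def __init__(self, M):
        # Index k holds the item stored under key k (None = empty); index 0 is unused.
        self.slots = [None] * (M + 1)

    def insert(self, key, item):
        self.slots[key] = item  # O(1): jump straight to slot `key`

    def delete(self, key):
        self.slots[key] = None  # O(1)

    def search(self, key):
        return self.slots[key]  # O(1): the item, or None if absent


# The slide's example: all keys are in {1, ..., 9}.
table = DirectAddressTable(9)
for k in [9, 6, 3, 5]:
    table.insert(k, f"node {k}")
table.delete(6)
print(table.search(3))  # node 3
print(table.search(6))  # None
```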
That should look familiar
• Kind of like BUCKETSORT from Lecture 6.
• Same problem: if the keys may come from a universe U = {1, 2, …, 10000000000}….
The solution then was…
• Put things in buckets based on one digit.
INSERT: 21, 345, 13, 101, 50, 234, 1
[Figure: ten buckets labeled 1–9 and 0; each key goes into the bucket of its least significant digit, so 21, 101, and 1 share bucket 1.]
Now SEARCH 21.
It's in this bucket somewhere… go through until we find it.
INSERT: 22, 342, 12, 102, 52, 232, 2
Problem…
[Figure: every one of these keys ends in 2, so all seven land in bucket 2.]
Now SEARCH 22…. this hasn't made our lives easier…
Hash tables
• That was an example of a hash table.
  • not a very good one, though.
• We will be more clever (and less deterministic) about our bucketing.
• This will result in fast (expected time) INSERT/DELETE/SEARCH.
But first! Terminology.
• We have a universe U, of size M.
  • M is really big.
• But only a few (say at most n for today's lecture) elements of U are ever going to show up.
  • M is waaaayyyyyyy bigger than n.
• But we don't know which ones will show up in advance.
[Figure: all of the keys in the universe live in this blob, the universe U. A few elements are special and will actually show up.]
Example: U is the set of all strings of at most 140 ASCII characters (about 128^140 of them).
The only ones which I care about are those which appear as trending hashtags on Twitter. #hashinghashtags
There are way fewer than 128^140 of these.
Examples aside, I'm going to draw elements like I always do, as blue boxes with integers in them…
The previous example with this terminology
• We have a universe U, of size M.
  • at most n of which will show up.
• M is waaaayyyyyy bigger than n.
• We will put items of U into n buckets.
• There is a hash function h: U → {1, …, n} which says what element goes in what bucket.
[Figure: h maps the universe U (the blob holding all of the keys) into n buckets 1, 2, 3, …; here h(x) = least significant digit of x.]
For this lecture, I'm assuming that the number of things is the same as the number of buckets; both are n.
This doesn't have to be the case, although we do want:
#buckets = O(#things which show up)
This is a hash table (with chaining)
• Array of n buckets.
• Each bucket stores a linked list.
  • We can insert into a linked list in time O(1).
  • To find something in the linked list takes time O(length(list)).
• h: U → {1, …, n} can be any function:
  • but for concreteness let's stick with h(x) = least significant digit of x.
[Figure: n buckets (say n = 9). INSERT 13, 22, 43, 9, …: 13 and 43 go into bucket 3, 22 into bucket 2, and 9 into bucket 9.]
For demonstration purposes only! This is a terrible hash function! Don't use this!
SEARCH 43: Scan through all the elements in bucket h(43) = 3.
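A minimal Python sketch of a hash table with chaining, using the slide's demonstration-only hash function. Assumptions: Python lists stand in for the linked lists (appending is amortized O(1)), the class name `ChainedHashTable` is hypothetical, and the table has ten buckets 0–9 because a least significant digit can be any of 0–9. The last lines replay the earlier bad input, where every key ends in 2.

```python
class ChainedHashTable:
    """Hash table with chaining: an array of buckets, each holding a chain of keys."""

    def __init__(self, num_buckets, h):
        self.h = h  # h maps a key to a bucket index in range(num_buckets)
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        # O(1): append to the chain (this sketch does not check for duplicates).
        self.buckets[self.h(key)].append(key)

    def search(self, key):
        # O(length of the chain): scan bucket h(key) until we find the key.
        return key in self.buckets[self.h(key)]

    def delete(self, key):
        chain = self.buckets[self.h(key)]
        if key in chain:
            chain.remove(key)  # also O(length of the chain)


def least_significant_digit(x):
    return x % 10  # a terrible hash function, for demonstration only


table = ChainedHashTable(10, least_significant_digit)
for k in [13, 22, 43, 9]:
    table.insert(k)
print(table.buckets[3])  # [13, 43]
print(table.search(43))  # True

bad = ChainedHashTable(10, least_significant_digit)
for k in [22, 342, 12, 102, 52, 232, 2]:
    bad.insert(k)
print(len(bad.buckets[2]))  # 7: every key is in one chain
```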
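Aside: Hash tables with open addressing
• The previous slide is about hash tables with chaining.
• There's also something called "open addressing".
• Read in CLRS if you are interested!
[Figure: n = 9 buckets, drawn twice. With chaining, 13 and 43 both sit in bucket 3 as a list: this is a "chain". With open addressing, each slot holds at most one item, so 13 and 43 occupy separate slots.]
\end{Aside}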
IPython notebook time
• (Seems to work!)
• (Will this example be a good idea?)
Sometimes this is a good idea. Sometimes this is a bad idea.
• How do we pick that function so that this is a good idea?
  1. We want there to be not many buckets (say, n).
    • This means we don't use too much space.
  2. We want the items to be pretty spread out in the buckets.
    • This means it will be fast to SEARCH/INSERT/DELETE.
[Figure: two tables with n = 9 buckets compared ("vs."): one with the items spread out across the buckets, one with several items piled into the same bucket.]
Worst-case analysis
• Design a function h: U → {1, …, n} so that:
  • No matter what input (at most n items of U) a bad guy chooses, the buckets will be balanced.
  • Here, balanced means O(1) entries per bucket.
• If we had this, then we'd achieve our dream of O(1) INSERT/DELETE/SEARCH.
Can you come up with such a function?
We really can't beat the bad guy here.
[Figure: h maps the universe U into n buckets; these are all the things that hash to the first bucket.]
• The universe U has M items.
• They get hashed into n buckets.
• At least one bucket has at least M/n items hashed to it.
• M is WAAYYYYY bigger than n, so M/n is bigger than n.
• Bad guy chooses n of the items that landed in this very full bucket.
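A small Python sketch of this pigeonhole argument, assuming the universe is {0, …, M−1} and that M ≥ n², so the fullest bucket holds at least M/n ≥ n keys. The function name `adversary` and the particular fixed h are hypothetical; the point is that any h fixed in advance can be beaten this way.

```python
from collections import defaultdict


def adversary(h, M, n):
    """Given a fixed hash function h on {0, ..., M-1} with n buckets,
    return n keys that all land in the same bucket."""
    buckets = defaultdict(list)
    for x in range(M):  # the bad guy can hash the entire universe
        buckets[h(x)].append(x)
    # Pigeonhole: some bucket holds at least M/n keys; take the fullest one.
    fullest = max(buckets.values(), key=len)
    if len(fullest) < n:
        raise ValueError("need M/n >= n, i.e. M >= n * n")
    return fullest[:n]


def h(x):
    return (7 * x + 3) % 10  # some fixed function with n = 10 buckets


keys = adversary(h, M=10_000, n=10)
print(keys)                  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print({h(k) for k in keys})  # {3}: all n keys collide
```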
Solution: Randomness
The game
1. An adversary chooses any n items $u_1, u_2, \ldots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.
2. You, the algorithm, choose a random hash function $h: U \to \{1, \ldots, n\}$.
3. HASH IT OUT
[Figure: the adversary's items 13, 22, 43, 92, 7 are hashed into buckets 1, 2, 3, …, n.]
Plucky the pedantic penguin: What does random mean here? Uniformly random?
INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92
#hashpuns
Example
• Say that h is uniformly random.
• That means that h(1) is a uniformly random number between 1 and n.
• h(2) is also a uniformly random number between 1 and n, independent of h(1).
• h(3) is also a uniformly random number between 1 and n, independent of h(1), h(2).
• …
• h(M) is also a uniformly random number between 1 and n, independent of h(1), h(2), …, h(M−1).
[Figure: h maps the universe U into n buckets.]
Why should that help?
Intuitively: The bad guy can't foil a hash function that he doesn't yet know.
Why not? What if there's some strategy that foils a random function with high probability?
We'll need to do some analysis…
What do we want?
[Figure: items 14, 22, 92, 43, 8, 7, 32, 5, 15, and $u_i$ hashed into buckets 1, 2, 3, …, n.]
It's bad if lots of items land in $u_i$'s bucket.
So we want not that.
More precisely
• We want:
  • For all $u_i$ that the bad guy chose,
  • E[number of items in $u_i$'s bucket] ≤ 2.
• If that were the case,
  • For each operation involving $u_i$:
  • E[time of operation] = O(1)
So, in expectation, it would take O(1) time per INSERT/DELETE/SEARCH operation.
So we want:
• For all i = 1, …, n,
  E[number of items in $u_i$'s bucket] ≤ 2.
Aside: why not:
• For all i = 1, …, n:
  E[number of items in bucket i] ≤ ___?
Suppose that:
[Figure: all of the items (14, 22, 92, 43, 8, …) land together in bucket 1, and this happens with probability 1/n; they all land together in bucket 2, and this also happens with probability 1/n; etc.]
Then E[number of items in bucket i] = 1 for all i.
But P{the buckets get big} = 1.
This slide skipped in class
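An exact Python computation of this aside, assuming (our reading of the figure) that h is one of the n constant functions h_c(x) = c, each picked with probability 1/n; the function name is hypothetical. Every bucket has expected load exactly 1, yet every choice of h puts all n items in one bucket, which is why the goal is stated per item, not per bucket.

```python
from fractions import Fraction


def constant_family_stats(n):
    """The family of n constant functions h_c(x) = c, each chosen with probability 1/n,
    applied to n items. Returns (expected load of each bucket, fullest bucket per choice)."""
    expected_load = [Fraction(0)] * n
    fullest = []
    for c in range(n):            # the random choice h = h_c
        loads = [0] * n
        for _ in range(n):        # hash each of the n items
            loads[c] += 1         # h_c sends every item to bucket c
        for i in range(n):
            expected_load[i] += Fraction(1, n) * loads[i]
        fullest.append(max(loads))
    return expected_load, fullest


expected_load, fullest = constant_family_stats(5)
print(expected_load == [1] * 5)  # True: E[number of items in bucket i] = 1 for all i
print(fullest)                   # [5, 5, 5, 5, 5]: yet every choice puts all items in one bucket
```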
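Expected number of items in $u_i$'s bucket?
[Figure: h maps the universe U into n buckets; when $u_j$ lands in the same bucket as $u_i$, that's a COLLISION!]
• $E = \sum_{j=1}^{n} P\{h(u_i) = h(u_j)\}$   (linearity of expectation: item $u_j$ counts exactly when it collides with $u_i$)
• $= 1 + \sum_{j \neq i} P\{h(u_i) = h(u_j)\}$   (the $j = i$ term is 1)
• $= 1 + \sum_{j \neq i} 1/n$   (h is uniformly random; you will verify this on HW)
• $= 1 + \frac{n-1}{n} \le 2.$
That's what we wanted.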
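A quick Monte Carlo check of this computation in Python, assuming buckets are numbered 0 to n−1 and that only h's values on the n chosen items matter (so only those are drawn). The function name `average_load` is hypothetical, and the estimate is random, so it only lands near 1 + (n−1)/n rather than exactly on it.

```python
import random


def average_load(n, trials, seed=0):
    """Monte Carlo estimate of E[number of items in u_1's bucket] when h sends
    each of the n items to an independent, uniformly random bucket in {0, ..., n-1}."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bucket_of = [rng.randrange(n) for _ in range(n)]  # h(u_1), ..., h(u_n)
        total += bucket_of.count(bucket_of[0])  # u_1 itself plus everything colliding with it
    return total / trials


n = 10
estimate = average_load(n, trials=20_000)
print(estimate, 1 + (n - 1) / n)  # the estimate should be close to 1.9
```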
That's great!
• For all i = 1, …, n,
  • E[number of items in $u_i$'s bucket] ≤ 2
• This implies (as we saw before):
  • For any sequence of INSERT/DELETE/SEARCH operations on any n elements of U, the expected runtime (over the random choice of h) is O(1) per operation.
So, the solution is:
pick a uniformly random hash function.
The elephant in the room
How do we do that?
Let's implement this!
• IPython Notebook for Lecture 8
Let's NOT implement this!
• Suppose U = {all of the possible hashtags}
Issues:
• If we completely choose the random function up front, we have to iterate through all of U.
  • 128^140 possible ASCII strings of length 140.
  • (More than the number of particles in the universe)
• And even ignoring the time considerations,
  • We have to store h(x) for every x.
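Another thought…
• Just remember h on the relevant values.
[Figure: "Algorithm now" vs. "Algorithm later" on items 13, 22, 43, 92, 7: values such as h(13) = 6, h(22) = 3, h(92) = 3 are drawn and recorded as the items arrive, and the stored h(13) = 6 is reused later.]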
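A minimal Python sketch of "just remember h on the relevant values"; the class name `LazyRandomHash` is hypothetical. Each h(x) is drawn uniformly the first time x shows up and remembered afterwards, so only the values actually needed are stored. Note that this sketch leans on Python's dict to do the remembering, and a dict is itself a hash table.

```python
import random


class LazyRandomHash:
    """A uniformly random h: U -> {1, ..., n}, drawn lazily: h(x) is chosen
    uniformly the first time x is seen, then remembered."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.memo = {}  # x -> h(x); note that a Python dict is itself a hash table

    def __call__(self, x):
        if x not in self.memo:
            self.memo[x] = self.rng.randint(1, self.n)  # fresh uniform value in {1, ..., n}
        return self.memo[x]


h = LazyRandomHash(n=9, seed=161)
first = h(13)          # "algorithm now": draw and record h(13)
h(22)
h(92)
print(h(13) == first)  # True: "algorithm later" reuses the stored value
print(len(h.memo))     # 3: only the values actually needed are stored
```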
How much space does it take to store h?
• For each element x of U:
  • store h(x)
  • (which is a random number in {1, …, n}).
• Storing a number in {1, …, n} takes log(n) bits.
• So storing M of them takes M log(n) bits.
• In contrast, direct addressing would require M bits.
Hang on now
• Sure, that way of storing the function h won't work.
• But maybe there's another way?
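Aside: description length
• Say I have a set S with s things in it.
• I get to write down the elements of S however I like.
  • (in binary)
• How many bits do I need?
[Figure: elements of S with names: "I'll call this one "Fido"", "This one is named "Hercules""; or, in binary, 01101011 or 101.]
On board: the answer is log(s) (base 2, rounded up): with b bits there are only 2^b different names, so we need 2^b ≥ s.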
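A small Python illustration of this count, assuming the elements of S can be sorted; `binary_names` is a hypothetical helper, and the two names come from the slide. Giving every element a distinct fixed-length binary name takes ⌈log₂ s⌉ bits (this sketch uses 1 bit even when s = 1).

```python
def binary_names(S):
    """Give each element of a finite set S its own binary name, all of the same
    length: ceil(log2(s)) bits for s = |S| >= 2 (this sketch uses 1 bit if s = 1)."""
    items = sorted(S)
    bits = max(1, (len(items) - 1).bit_length())  # smallest b with 2**b >= s
    return {x: format(i, f"0{bits}b") for i, x in enumerate(items)}


print(binary_names({"Fido", "Hercules"}))  # {'Fido': '0', 'Hercules': '1'}
names = binary_names(set(range(1000)))
print(len(names[0]))  # 10, since 2**9 = 512 < 1000 <= 1024 = 2**10
```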
Space needed to store a random fn h?
• Say that this elephant-shaped blob represents the set of all hash functions.
• It has size $n^M$. (Really big!)
• To write down a random hash function, we need $\log(n^M) = M \log(n)$ bits. :(
Solution
• Pick from a smaller set of functions.
[Figure: H, a cleverly chosen subset of functions. We call such a subset a hash family.]
We need only log|H| bits to store an element of H.
Hash families
• A hash family is a collection of hash functions.
"All of the hash functions" is an example of a hash family.
Example: a smaller hash family
• H = {function which returns the least sig. digit, function which returns the most sig. digit}
• Pick h in H at random.
• Store just one bit to remember which we picked.
This is still a terrible idea! Don't use this example! For pedagogical purposes only!
The game
1. An adversary (who knows H) chooses any n items $u_1, u_2, \ldots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.
2. You, the algorithm, choose a random hash function $h: U \to \{0, \ldots, 9\}$. Choose it randomly from H.
3. HASH IT OUT
$h_0$ = Most_significant_digit
$h_1$ = Least_significant_digit
H = {$h_0$, $h_1$}
I picked $h_1$.
INSERT 19, INSERT 22, INSERT 42, INSERT 92, INSERT 0, SEARCH 42, DELETE 92, SEARCH 0, INSERT 92
[Figure: buckets 0–9 under $h_1$: 0 in bucket 0; 22, 42, and 92 in bucket 2; 19 in bucket 9.]
#hashpuns
The game (same rules and same H = {$h_0$, $h_1$}, with $h_0$ = Most_significant_digit and $h_1$ = Least_significant_digit; I picked $h_1$)
[Figure: the adversary chooses 11, 101, 111, 121, 131, 141; under $h_1$ all of them land in bucket 1.]
#hashpuns
This adversary could have been more adversarial!
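A short Python check of why this family fails, assuming nonnegative integer keys and ten buckets 0–9; `msd`, `lsd`, and `largest_bucket` are hypothetical helper names. Whichever of the two functions the algorithm picks, every one of these items lands in bucket 1, so one random bit is not enough to escape an adversary who knows H.

```python
def msd(x):
    """h_0: the most significant decimal digit of x."""
    return int(str(x)[0])


def lsd(x):
    """h_1: the least significant decimal digit of x."""
    return x % 10


def largest_bucket(h, items):
    """Size of the fullest of the 10 buckets when `items` are hashed with h."""
    loads = [0] * 10
    for x in items:
        loads[h(x)] += 1
    return max(loads)


H = [msd, lsd]
adversary_items = [11, 101, 111, 121, 131, 141]  # chosen by someone who knows H
for h in H:
    print(h.__name__, largest_bucket(h, adversary_items))  # msd 6, then lsd 6
```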
How to pick the hash family?
• Definitely not like in that example.
• Let's go back to that computation from earlier….
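Expected number of items in $u_i$'s bucket?
[Figure: h maps the universe U into n buckets; $u_i$ and $u_j$ land in the same bucket: COLLISION!]
• $E = \sum_{j=1}^{n} P\{h(u_i) = h(u_j)\}$
• $= 1 + \sum_{j \neq i} P\{h(u_i) = h(u_j)\}$
• $= 1 + \sum_{j \neq i} 1/n$   (you will verify this on HW)
• $= 1 + \frac{n-1}{n} \le 2.$
So the expected number of items in $u_i$'s bucket is O(1).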
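How to pick the hash family?
• Let's go back to that computation from earlier….
• $E[\text{number of things in bucket } h(u_i)]$
  • $= \sum_{j=1}^{n} P\{h(u_i) = h(u_j)\}$
  • $= 1 + \sum_{j \neq i} P\{h(u_i) = h(u_j)\}$
  • $\le 1 + \sum_{j \neq i} 1/n$
  • $= 1 + \frac{n-1}{n} \le 2.$
• All we needed was that $P\{h(u_i) = h(u_j)\} \le 1/n$ for every $j \neq i$.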
Strategy
• Pick a small hash family H, so that when I choose h randomly from H,
  for all $u_i, u_j \in U$ with $u_i \neq u_j$,
  $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
  In English: fix any two elements of U. The probability that they collide under a random h in H is small.
• A hash family H that satisfies this is called a universal hash family.
• Then we still get O(1)-sized buckets in expectation.
• But now the space we need is log(|H|) bits.
  • Hopefully pretty small!
So the whole scheme will be
[Figure: choose h randomly from a universal hash family H; h maps the universe U into n buckets.]
We can store h in small space since H is so small.
Probably these buckets will be pretty balanced.
Universal hash family
Let's stare at this definition.
• H is a universal hash family if:
  • When h is chosen uniformly at random from H,
    for all $u_i, u_j \in U$ with $u_i \neq u_j$,
    $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
You actually saw this in your pre-lecture exercise!
Toads = hash fns
Ice cream = items
"Like" and "Dislike" = buckets
Check our understanding…
• H is a universal hash family if:
  • When h is chosen uniformly at random from H,
    for all $u_i, u_j \in U$ with $u_i \neq u_j$,
    $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
• H is [something else] if:
  • When h is chosen uniformly at random from H,
    for all $u \in U$, for all $x \in \{0, \ldots, n-1\}$,
    $P_{h \in H}\{h(u) = x\} \le \frac{1}{n}$.
Are these different?
Slide (probably) skipped in class
Pre-lecture exercise
Universe = {vanilla, chocolate}
Buckets = {like, dislike}
Toads = different possible ways of distributing items
Statement 1: P[random toad likes vanilla] = ½, P[random toad likes chocolate] = ½
  (i.e., P["vanilla" lands in the bucket "like"] = ½)
Statement 2: P[random toad feels the same about chocolate and vanilla] = ½
  (i.e., P[vanilla and chocolate land in the same bucket] = ½)
Slide skipped in class
Pre-lecture exercise (continued)
Seem like they might be the same…?
Slide skipped in class
Pre-lecture exercise (continued)
But no! Statement 1 is true, but Statement 2 is not.
Slide skipped in class
Check our understanding…
• H is a universal hash family if: when h is chosen uniformly at random from H, for all $u_i, u_j \in U$ with $u_i \neq u_j$, $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
• H is [something else] if: when h is chosen uniformly at random from H, for all $u \in U$ and all $x \in \{0, \ldots, n-1\}$, $P_{h \in H}\{h(u) = x\} \le \frac{1}{n}$.
These are different!
Slide skipped in class
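A tiny Python check of "Statement 1 is true but Statement 2 is not", using two hypothetical toads (the exercise's actual toads are not reproduced here): one likes both flavors, the other dislikes both. Each flavor lands in each bucket with probability ½, yet the two flavors always land in the same bucket, so this family has the [something else] property but is not universal.

```python
from fractions import Fraction

# Two hypothetical toads (= hash functions): each maps a flavor (item) to a bucket.
toads = [
    {"vanilla": "like", "chocolate": "like"},
    {"vanilla": "dislike", "chocolate": "dislike"},
]
flavors = ["vanilla", "chocolate"]
buckets = ["like", "dislike"]
n = len(buckets)


def prob(event):
    """Probability of `event` when a toad is chosen uniformly at random."""
    return Fraction(sum(1 for toad in toads if event(toad)), len(toads))


# Statement 1 ([something else]): every item lands in every bucket w.p. 1/n.
statement1 = all(
    prob(lambda toad: toad[f] == b) == Fraction(1, n) for f in flavors for b in buckets
)
# Statement 2 / universality: how likely are the two distinct items to collide?
collide = prob(lambda toad: toad["vanilla"] == toad["chocolate"])

print(statement1)                          # True
print(collide, collide <= Fraction(1, n))  # 1 False: not universal
```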
Example
• Uniformly random hash function h
  • [We just saw this]
  • [Of course, this one has other downsides…]
• Pick a small hash family H, so that when I choose h randomly from H,
  for all $u_i, u_j \in U$ with $u_i \neq u_j$,
  $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
Non-example
• $h_0$ = Most_significant_digit
• $h_1$ = Least_significant_digit
• H = {$h_0$, $h_1$}
• [discussion on board]
• Pick a small hash family H, so that when I choose h randomly from H,
  for all $u_i, u_j \in U$ with $u_i \neq u_j$,
  $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
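A brute-force Python check of the definition for this non-example, assuming the universe {1, …, 199} and ten buckets 0–9; `worst_collision_probability` is a hypothetical helper. Some pairs collide under both functions, so their collision probability is 1, far above 1/n = 1/10.

```python
from fractions import Fraction
from itertools import combinations


def msd(x):
    return int(str(x)[0])  # h_0: most significant decimal digit


def lsd(x):
    return x % 10  # h_1: least significant decimal digit


def worst_collision_probability(H, U):
    """Max over pairs x != y in U of P_{h in H}[h(x) == h(y)], h uniform over H."""
    worst, worst_pair = Fraction(0), None
    for x, y in combinations(U, 2):
        p = Fraction(sum(h(x) == h(y) for h in H), len(H))
        if p > worst:
            worst, worst_pair = p, (x, y)
    return worst, worst_pair


worst, pair = worst_collision_probability([msd, lsd], range(1, 200))
print(worst, pair)               # 1 (1, 11): both functions send 1 and 11 to bucket 1
print(worst <= Fraction(1, 10))  # False, so H = {h_0, h_1} is not universal
```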
A small universal hash family??
• Here's one:
  • Pick a prime $p \ge M$.
  • Define $f_{a,b}(x) = ax + b \bmod p$
    and $h_{a,b}(x) = f_{a,b}(x) \bmod n$.
• Claim:
  $H = \{h_{a,b} : a \in \{1, \ldots, p-1\},\ b \in \{0, \ldots, p-1\}\}$
  is a universal hash family.
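A minimal Python sketch of drawing a function from this family, assuming keys in {0, …, M−1} and taking p to be the smallest prime ≥ M (any prime ≥ M satisfies the claim). The names `is_prime` and `draw_hash` are hypothetical, and buckets are numbered 0 to n−1 here because that is what "mod n" produces.

```python
import random


def is_prime(q):
    """Trial division; fine for the small numbers used in these examples."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True


def draw_hash(M, n, rng):
    """Draw h_{a,b} uniformly at random from
    H = {h_{a,b} : a in {1, ..., p-1}, b in {0, ..., p-1}},
    where p is taken to be the smallest prime >= M."""
    p = M
    while not is_prime(p):
        p += 1
    a = rng.randrange(1, p)  # a in {1, ..., p-1}
    b = rng.randrange(p)     # b in {0, ..., p-1}

    def h(x):
        f = (a * x + b) % p  # f_{a,b}(x) = ax + b mod p
        return f % n         # h_{a,b}(x) = f_{a,b}(x) mod n, a bucket in {0, ..., n-1}

    return h, (a, b, p)


h, (a, b, p) = draw_hash(M=200, n=10, rng=random.Random(161))
print(p)                          # 211, the smallest prime >= 200
print([h(x) for x in range(10)])  # ten bucket indices in {0, ..., 9}
```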
Say what?
• Example: M = p = 5, n = 3
• To draw h from H:
  • Pick a random a in {1, …, 4}, b in {0, …, 4}; say a = 2, b = 1.
• As per the definition:
  • $f_{2,1}(x) = 2x + 1 \bmod 5$
  • $h_{2,1}(x) = f_{2,1}(x) \bmod 3$
[Figure: U = {0, 1, 2, 3, 4} is sent by $f_{2,1}$ to the values $f_{2,1}(0), \ldots, f_{2,1}(4)$, and then "mod 3" sends those values to 3 buckets.]
$f_{2,1}$: This step just scrambles stuff up. No collisions here!
mod 3: This step is the one where two different elements might collide.
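A few lines of Python that trace this example, with buckets numbered 0, 1, 2 (the possible values of something mod 3). The first step permutes U; only the second step creates collisions.

```python
p, n = 5, 3  # M = p = 5, so U = {0, 1, 2, 3, 4}; n = 3 buckets
a, b = 2, 1  # the draw from the slide

f_values = [(a * x + b) % p for x in range(p)]  # f_{2,1}(x) = 2x + 1 mod 5
h_values = [f % n for f in f_values]            # h_{2,1}(x) = f_{2,1}(x) mod 3

print(f_values)  # [1, 3, 0, 2, 4]: a rearrangement of U, no collisions
print(h_values)  # [1, 0, 0, 2, 1]: 1 and 2 collide, and so do 0 and 4
```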
Ignoring why this is a good idea
• Can we store h with small space?
• Just need to store two numbers:
  • a is in {1, …, p−1}
  • b is in {0, …, p−1}
• So about 2 log(p) bits.
• By our choice of p, that's O(log(M)) bits (we can always pick a prime p with M ≤ p < 2M, by Bertrand's postulate).
Compare: direct addressing was M bits!
Twitter example: log(M) = 140 log(128) = 980 vs. M = 128^140
Another way to see this using only the size of H
• We have p−1 choices for a, and p choices for b.
• So |H| = p(p−1) = O(M^2).
• Space needed to store an element h:
  • log(M^2) = O(log(M)).
Uniformly random function: O(M log(n)) bits per function. This family: O(log(M)) bits per function.
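Rough arithmetic for this comparison in Python, assuming the Twitter example's M = 128^140 and a hypothetical n = 1000 buckets. It uses 2 log(M) as a stand-in for 2 log(p), which is off by less than 2 bits when M ≤ p < 2M.

```python
import math

M = 128 ** 140  # Twitter example: about 128^140 strings of 140 ASCII characters
n = 1000        # a hypothetical number of buckets

log_M = math.log2(M)
bits_random_fn = M * math.log2(n)  # M log(n) bits for a uniformly random h
bits_universal = 2 * log_M         # about 2 log(p) bits for (a, b), with p close to M

print(round(log_M))           # 980
print(round(bits_universal))  # 1960
print(f"{bits_random_fn:.2e}")
```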
Why does this work?
• This is actually a little complicated.
  • There are some hidden slides here about why.
  • Also see the lecture notes.
• The thing we have to show is that the collision probability is not very large.
• Intuitively, this is because:
  • for any (fixed, not random) pair x ≠ y in {0, …, p−1},
  • if a and b are random,
  • ax + b and ay + b behave like independent random variables: the pair (ax + b mod p, ay + b mod p) is uniformly random among all pairs of distinct values. (why?)
Why does this work?
• Want to show:
  • for all $u_i, u_j \in U$ with $u_i \neq u_j$, $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$
  • aka, the probability of any two elements colliding is small.
• Let's just fix two elements and see an example.
  • Let's consider $u_i = 0$, $u_j = 1$.
[Figure: $f_{a,b}(x) = ax + b \bmod p$ sends U = {0, 1, 2, 3, 4} to {0, 1, 2, 3, 4}, and then mod 3 sends those values to 3 buckets.]
Convince yourself that it will be the same for any pair!
This slide skipped in class – here for reference!
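The probability that 0 and 1 collide is small
• Want to show:
  • $P_{h \in H}\{h(0) = h(1)\} \le \frac{1}{n}$
• For any $y_0 \neq y_1 \in \{0, 1, 2, 3, 4\}$, how many a, b are there so that $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$?
  • Claim: it's exactly one.
  • Proof: solve the system of equations for a and b:
    $a \cdot 0 + b = y_0 \bmod p$
    $a \cdot 1 + b = y_1 \bmod p$
    The first gives $b = y_0$; the second then gives $a = y_1 - y_0 \bmod p$, which is nonzero (so a valid choice of a) because $y_0 \neq y_1$.
[Figure: as before, $f_{a,b}(x) = ax + b \bmod p$ followed by mod 3; e.g., $y_0 = 3$, $y_1 = 1$, which forces b = 3 and a = 1 − 3 mod 5 = 3.]
This slide skipped in class – here for reference!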
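A small Python check of the claim for p = 5: `solve` (a hypothetical helper) recovers the unique (a, b) from $(y_0, y_1)$ exactly as in the proof, and a brute-force loop confirms that every pair of distinct values is hit by exactly one (a, b).

```python
def solve(y0, y1, p):
    """The unique (a, b) with a*0 + b = y0 and a*1 + b = y1 (mod p)."""
    b = y0 % p
    a = (y1 - y0) % p  # nonzero exactly when y0 != y1 (mod p)
    return a, b


p = 5
print(solve(3, 1, p))  # (3, 3): f_{3,3}(0) = 3 and f_{3,3}(1) = 6 mod 5 = 1

# Brute-force check: each pair y0 != y1 comes from exactly one (a, b).
counts = {}
for a in range(1, p):
    for b in range(p):
        pair = (b % p, (a + b) % p)  # (f_{a,b}(0), f_{a,b}(1))
        counts[pair] = counts.get(pair, 0) + 1
print(len(counts), set(counts.values()))  # 20 {1}: all p(p-1) pairs, once each
```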
The probability that 0 and 1 collide is small
• Want to show:
  • $P_{h \in H}\{h(0) = h(1)\} \le \frac{1}{n}$
• For any $y_0 \neq y_1 \in \{0, 1, 2, 3, 4\}$, exactly one pair a, b has $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$.
• If 0 and 1 collide it's b/c there's some $y_0 \neq y_1$ so that:
  • $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$, and
  • $y_0 \equiv y_1 \pmod{n}$.
[Figure: as before, $f_{a,b}(x) = ax + b \bmod p$ followed by mod 3; e.g., $y_0 = 3$, $y_1 = 1$.]
This slide skipped in class – here for reference!
The probability that 0 and 1 collide is small
• Want to show:
  • $P_{h \in H}\{h(0) = h(1)\} \le \frac{1}{n}$
• The number of a, b so that 0, 1 collide under $h_{a,b}$ is at most the number of $y_0 \neq y_1$ so that $y_0 \equiv y_1 \pmod{n}$.
• How many is that?
  • We have p choices for $y_0$, then at most 1/n of the remaining p − 1 are valid choices for $y_1$…
  • So at most $p \cdot \frac{p-1}{n}$.
[Figure: as before, $f_{a,b}(x) = ax + b \bmod p$ followed by mod 3; e.g., $y_0 = 3$, $y_1 = 1$.]
This slide skipped in class – here for reference!
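The probability that 0 and 1 collide is small
• Want to show:
  • $P_{h \in H}\{h(0) = h(1)\} \le \frac{1}{n}$
• The # of (a, b) so that 0, 1 collide under $h_{a,b}$ is $\le p \cdot \frac{p-1}{n}$.
• The probability (over a, b) that 0, 1 collide under $h_{a,b}$ is:
  • $P_{h \in H}\{h(0) = h(1)\} \le \frac{p \cdot \frac{p-1}{n}}{|H|}$
  • $= \frac{p \cdot \frac{p-1}{n}}{p(p-1)}$
  • $= \frac{1}{n}$.
This slide skipped in class – here for reference!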
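An exhaustive Python count for the running example (p = 5, n = 3); `collision_count` is a hypothetical helper. Exactly 4 of the p(p−1) = 20 functions send 0 and 1 to the same bucket, which is within the bound p(p−1)/n = 20/3 and gives probability 1/5 ≤ 1/3.

```python
from fractions import Fraction


def collision_count(x, y, p, n):
    """Number of (a, b), a in {1..p-1}, b in {0..p-1}, with h_{a,b}(x) == h_{a,b}(y)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % n == ((a * y + b) % p) % n
    )


p, n = 5, 3
count = collision_count(0, 1, p, n)
prob = Fraction(count, p * (p - 1))       # |H| = p(p-1)
print(count, prob)                        # 4 1/5
print(count <= Fraction(p * (p - 1), n))  # True: within the bound p(p-1)/n = 20/3
print(prob <= Fraction(1, n))             # True
```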
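The same argument goes for any pair
(For $u_i \neq u_j$ in U, the system $a u_i + b = y_0$, $a u_j + b = y_1 \pmod p$ still has exactly one solution, because $p \ge M$ makes $u_i - u_j \not\equiv 0 \pmod p$.)
for all $u_i, u_j \in U$ with $u_i \neq u_j$,
$P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$
That's the definition of a universal hash family.
So this family H indeed does the trick.
This slide skipped in class – here for reference!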
But let's check that it does work
• Back to IPython Notebook for Lecture 8…
[Plot: histogram of the empirical probability of collision (out of 100 trials) against the number of pairs (x, y), out of $\binom{200}{2} = 19900$ pairs; M = 200, n = 10.]
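The notebook itself is not reproduced here. As a hedged stand-in, this Python sketch checks the claim exhaustively rather than by sampling: with small parameters of our choosing (M = p = 13, n = 4), it computes the exact collision probability of every pair over all p(p−1) functions in H. Every pair comes out the same, 5/26, which is below 1/n = 1/4.

```python
from fractions import Fraction
from itertools import combinations

p, n = 13, 4  # small hypothetical parameters (M = p = 13) so we can try every h_{a,b}
H = [(a, b) for a in range(1, p) for b in range(p)]


def h(a, b, x):
    return ((a * x + b) % p) % n


def collision_probability(x, y):
    """Exact P_{h in H}[h(x) == h(y)], with h uniform over all |H| = p(p-1) functions."""
    return Fraction(sum(h(a, b, x) == h(a, b, y) for a, b in H), len(H))


probs = {collision_probability(x, y) for x, y in combinations(range(p), 2)}
print(probs)                         # {Fraction(5, 26)}: the same for every pair
print(max(probs) <= Fraction(1, n))  # True: 5/26 <= 1/4
```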
So the whole scheme will be
[Figure: choose a and b at random and form the function $h_{a,b}$, which maps the universe U into n buckets.]
We can store h in space O(log(M)) since we just need to store a and b.
Probably these buckets will be pretty balanced.
Recap
Want O(1) INSERT/DELETE/SEARCH
• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.
  • INSERT
  • DELETE
  • SEARCH
[Figure: nodes with keys are inserted into and deleted from the data structure; a SEARCH answers "HERE IT IS".]
We studied this game
1. An adversary chooses any n items $u_1, u_2, \ldots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.
2. You, the algorithm, choose a random hash function $h: U \to \{1, \ldots, n\}$.
3. HASH IT OUT
[Figure: the adversary's items 13, 22, 43, 92, 7 are hashed into buckets 1, 2, 3, …, n.]
INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92
Uniformly random h was good
• If we choose h uniformly at random, for all $u_i, u_j \in U$ with $u_i \neq u_j$,
  $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
• That was enough to ensure that, in expectation, a bucket isn't too full.
A bit more formally:
For any sequence of INSERT/DELETE/SEARCH operations on any n elements of U, the expected runtime (over the random choice of h) is O(1) per operation.
Uniformly random h was bad
• If we actually want to implement this, we have to store the hash function h.
• That takes a lot of space!
  • We may as well have just initialized a bucket for every single item in U.
• Instead, we chose a function randomly from a smaller set.
We needed a smaller set that still has this property
• If we choose h uniformly at random from H, for all $u_i, u_j \in U$ with $u_i \neq u_j$,
  $P_{h \in H}\{h(u_i) = h(u_j)\} \le \frac{1}{n}$.
This was all we needed to make sure that the buckets were balanced in expectation!
• We call any set with that property a universal hash family.
• We gave an example of a really small one :)
Conclusion:
• We can build a hash table that supports INSERT/DELETE/SEARCH in O(1) expected time,
• if we know that only n items are ever going to show up, where n is waaaayyyyyy less than the size M of the universe.
• The space to implement this hash table is O(n log(M)) bits.
  • O(n) buckets
  • O(n) items with log(M) bits per item
  • O(log(M)) to store the hash fn.
• M is waaayyyyyy bigger than n, but log(M) probably isn't.
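Putting the pieces together, a minimal Python sketch of the whole scheme, assuming integer keys in {0, …, M−1}. The class and helper names are hypothetical, Python lists stand in for linked lists, and p is taken to be the smallest prime ≥ M. Only a, b, and p (O(log M) bits) describe the hash function; the table itself uses n buckets.

```python
import random


def next_prime(m):
    """Smallest prime >= m (trial division; fine for moderate m)."""
    def is_prime(q):
        return q >= 2 and all(q % d for d in range(2, int(q ** 0.5) + 1))
    while not is_prime(m):
        m += 1
    return m


class UniversalHashTable:
    """Chained hash table over keys in {0, ..., M-1} with n buckets, using
    h = h_{a,b}, h_{a,b}(x) = ((a*x + b) mod p) mod n, drawn once at random."""

    def __init__(self, M, n, seed=None):
        rng = random.Random(seed)
        self.p = next_prime(M)                 # a prime p >= M
        self.a = rng.randrange(1, self.p)      # storing h means storing (a, b): O(log M) bits
        self.b = rng.randrange(self.p)
        self.n = n
        self.buckets = [[] for _ in range(n)]  # each list is a chain

    def _h(self, x):
        return ((self.a * x + self.b) % self.p) % self.n

    def insert(self, x):
        chain = self.buckets[self._h(x)]
        if x not in chain:
            chain.append(x)

    def search(self, x):
        return x in self.buckets[self._h(x)]  # cost grows with the chain length

    def delete(self, x):
        chain = self.buckets[self._h(x)]
        if x in chain:
            chain.remove(x)


table = UniversalHashTable(M=10**6, n=8, seed=161)
for key in [13, 22, 43, 92, 7]:
    table.insert(key)
table.delete(92)
print(table.search(43), table.search(92))  # True False
```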
That's it for data structures (for now)
Data Structures: RB Trees and Hash Tables
Now we can use these going forward!
Before Next Time
• Pre-lecture exercise for Lecture 9
  • Intro to graphs
Next Time
• Graph algorithms!