Download - CS 106B, Lecture 27 Advanced Hashing - Stanford University

Transcript
Page 1: CS 106B, Lecture 27 Advanced Hashing - Stanford University

This document is copyright (C) Stanford Computer Science and Marty Stepp, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others.

CS 106B, Lecture 27 Advanced Hashing

This document is copyright (C) Stanford Computer Science and Ashley Taylor, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Marty Stepp, Chris Gregg, Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others.

Page 2: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Plan for Today
• Discuss how HashMaps differ from HashSets
• Another implementation for HashSet/Map: Cuckoo Hashing!
• Discuss qualities of a good hash function.
• Learn about another application for hashing: cryptography.

Page 3: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hash map (15.4)

• A hash map is like a set where the nodes store key/value pairs:

    // key (ID)        value (name)
    map.put(51234562, "Ashley");
    map.put(62756179, "Amy");
    map.put(54727849, "Marty");
    map.put(46281955, "Seth");

  – Must modify the HashNode class to store a key and a value

[Figure: hash table with indices 0-9 whose nodes hold the pairs 62756179/Amy, 46281955/Seth, 51234562/Ashley, and 54727849/Marty]

Page 4: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hash map vs. hash set
  – The hashing is always done on the keys, not the values.
  – The contains function is now containsKey; there and in remove, you search for a node whose key matches a given key.
  – The add method is now put; if the given key is already there, you must replace its old value with the new one.

    map.put(54727849, "Chris");   // replace Marty with Chris

[Figure: the same hash table with indices 0-9; the node for key 54727849 now stores Chris instead of Marty]
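
The differences listed above can be sketched in a few lines of code. This is only an illustration, not the Stanford library's implementation: the names HashNode, SimpleHashMap, indexOf, and find are assumptions, keys are ints, values are strings, and the bucket index is computed as abs(key % capacity).

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical node type: a set node would store only a key;
// a map node stores a key and a value.
struct HashNode {
    int key;
    std::string value;
    HashNode* next;
};

// Hypothetical minimal hash map that resolves collisions by chaining.
class SimpleHashMap {
public:
    explicit SimpleHashMap(int capacity = 10) : buckets(capacity, nullptr) {}

    ~SimpleHashMap() {
        for (HashNode* node : buckets) {
            while (node != nullptr) {
                HashNode* next = node->next;
                delete node;
                node = next;
            }
        }
    }

    SimpleHashMap(const SimpleHashMap&) = delete;
    SimpleHashMap& operator=(const SimpleHashMap&) = delete;

    // Adds the pair, or replaces the old value if the key is already there.
    void put(int key, const std::string& value) {
        HashNode* node = find(key);
        if (node != nullptr) {
            node->value = value;
            return;
        }
        int index = indexOf(key);
        buckets[index] = new HashNode{key, value, buckets[index]};
    }

    bool containsKey(int key) const { return find(key) != nullptr; }

    // Returns the value stored for key, or "" if the key is absent.
    std::string get(int key) const {
        HashNode* node = find(key);
        return node != nullptr ? node->value : "";
    }

private:
    std::vector<HashNode*> buckets;

    // Only the key is hashed; the value plays no part in choosing a bucket.
    int indexOf(int key) const {
        return std::abs(key % static_cast<int>(buckets.size()));
    }

    // Walks one chain looking for a node whose key matches.
    HashNode* find(int key) const {
        for (HashNode* node = buckets[indexOf(key)]; node != nullptr; node = node->next) {
            if (node->key == key) return node;
        }
        return nullptr;
    }
};
```

With this sketch, calling put(54727849, "Marty") and then put(54727849, "Chris") leaves a single node for that key, holding "Chris".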

Page 5: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Another Way to Hash
• Fun (but soon to be relevant) fact: cuckoo birds lay their eggs in other birds’ nests

Source:wikimedia

Page 6: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Cuckoo Hashing
• What if we made contains really fast (look at at most two elements, no matter what)?
• Idea: have two arrays that store elements, where each array has its own hash function
• Try hashing the element into both arrays, and put it in an empty space
• If no space is empty, kick out one of the existing elements and move it to the other array.
• Contains just checks the corresponding spot in both arrays
• Slower add, but faster contains
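
The bullets above can be sketched as follows. This is one possible reading, not the lecture's reference code: the names CuckooSet, kEmpty, kMaxKicks, and grow are assumptions, elements are assumed to be small non-negative ints, the two hash functions are borrowed from the lecture's worked example (3x and 2x + 1, each taken modulo the array size), and doubling both arrays after too many evictions is just one plausible answer to the question of when to rehash.

```cpp
#include <utility>
#include <vector>

constexpr int kEmpty = -1;     // marks an unused slot (elements must be >= 0)
constexpr int kMaxKicks = 8;   // evictions allowed before rehashing (arbitrary)

// Hypothetical cuckoo hash set for small non-negative ints.
class CuckooSet {
public:
    explicit CuckooSet(int capacity = 4)
        : first(capacity, kEmpty), second(capacity, kEmpty) {}

    // Looks at exactly one slot per array, however full the arrays are.
    bool contains(int x) const {
        return first[hash1(x)] == x || second[hash2(x)] == x;
    }

    void add(int x) {
        if (contains(x)) return;
        // Prefer an empty slot in either array.
        if (first[hash1(x)] == kEmpty) { first[hash1(x)] = x; return; }
        if (second[hash2(x)] == kEmpty) { second[hash2(x)] = x; return; }
        // Both slots are taken: kick out the occupant and move it to the
        // other array, repeating until something lands in an empty slot.
        int current = x;
        bool useFirst = true;
        for (int kicks = 0; kicks < kMaxKicks; kicks++) {
            int& slot = useFirst ? first[hash1(current)] : second[hash2(current)];
            if (slot == kEmpty) { slot = current; return; }
            std::swap(current, slot);
            useFirst = !useFirst;
        }
        // Probably stuck in a cycle: enlarge the arrays and retry the
        // element that is still homeless.
        grow();
        add(current);
    }

private:
    std::vector<int> first;
    std::vector<int> second;

    int hash1(int x) const { return (3 * x) % static_cast<int>(first.size()); }
    int hash2(int x) const { return (2 * x + 1) % static_cast<int>(second.size()); }

    // Doubles both arrays and re-inserts every stored element.
    void grow() {
        std::vector<int> elements;
        for (int e : first) if (e != kEmpty) elements.push_back(e);
        for (int e : second) if (e != kEmpty) elements.push_back(e);
        int capacity = 2 * static_cast<int>(first.size());
        first.assign(capacity, kEmpty);
        second.assign(capacity, kEmpty);
        for (int e : elements) add(e);
    }
};
```

The trade-off shows up directly in the code: contains is a fixed two-slot check, while add may loop through several evictions or trigger a full rehash.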

Page 7: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Cuckoo Hashing (worked example, slides 7-16)

Hash functions (one per array): 3x % 4 and (2x + 1) % 4

• Insert: 3
• Insert: 6
• Insert: 5
• Insert: 7
• Search for 7 (look in both arrays)

[Figure: contents of the two arrays after each step]

Page 17: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Cuckoo Hashing
• What are the advantages or disadvantages of cuckoo hashing versus resolving collisions through chaining?
• What do we need to watch out for? When should we rehash?

Page 18: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Announcements
• Calligraphy announcements
  – Should start the 3rd part today or tomorrow at the latest
  – Starter code and Windows: please redownload
  – No late days may be used, no late submissions accepted
• Last class tomorrow: go to poll.ly/#/LdVNgWyo/G6z0awRv
• Final is on Saturday, at 8:30 AM, in Cubberley Auditorium
  – Everything from the course through today is fair game, emphasis is on second half materials (starting with pointers)
  – More information: https://web.stanford.edu/class/cs106b/exams/final.html
  – Practice exam is online (not guaranteed to match in format, etc.)
  – Wednesday and Thursday will be final review
• Please give us feedback! cs198.stanford.edu

Page 19: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing strings

• It is easy to hash an integer i (use index abs(i) % length).
  – How can we hash other types of values (such as strings)?
• If we could convert strings into integers, we could hash them.
  – What kind of integer is appropriate for a given string?
  – Does it matter what integer we choose? What should it be based on?

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 20: CS 106B, Lecture 27 Advanced Hashing - Stanford University

hashCode consistency
• A valid hashCode function must be consistent (must produce same results on each call)

    hashCode(x) == hashCode(x), if x's state doesn't change

Page 21: CS 106B, Lecture 27 Advanced Hashing - Stanford University

hashCode and equality
• A valid hashCode function must be consistent with equality.
  – a == b must imply that hashCode(a) == hashCode(b).

    Vector<int> v1;
    Vector<int> v2;
    v1.add(1);
    v2.add(3);
    v1.add(3);
    v2.insert(0, 1);
    // hashCode(v1) == hashCode(v2)

  – a != b does NOT necessarily imply that hashCode(a) != hashCode(b) (why not?)

Page 22: CS 106B, Lecture 27 Advanced Hashing - Stanford University

hashCode distribution
• A good hashCode function is well-distributed.
  – For a large set of distinct values, they should generally return unique hash codes rather than often colliding into the same hash bucket.
  – This property is desired but not required. Why?

Page 23: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 1
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #1
        return 42;
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 24: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 2
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #2
        return randomInteger(0, 9999999);
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 25: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 3
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #3
        return (int) &s;   // address of s (a pointer)
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 26: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 4
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #4
        return s.length();
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 27: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 5
• Q: Is this a valid hash function? Is it good?

    int hashCode(string s) {   // #5
        if (s.length() > 0) {
            return (int) s[0];   // ascii of 1st char
        } else {
            return 0;
        }
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

Page 28: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Possible hashCode 6
• This function sums the characters' ASCII values.
  – Is it valid? Is it good?
  – What will collide?

    int hashCode(string s) {   // #6
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash += (int) s[i];   // ASCII of char
        }
        return hash;
    }

index       0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'
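
One way to explore the "what will collide?" question is to run hashCode #6 on a few inputs. The sketch below restates it with std::string under the assumed name sumHash. Because addition ignores order, any two strings built from the same characters get the same code, and so do unrelated strings whose character values happen to add up to the same total.

```cpp
#include <string>

// hashCode #6 from the slide, restated with std::string.
// The name sumHash is an assumption for this example.
int sumHash(const std::string& s) {
    int hash = 0;
    for (char c : s) {
        hash += static_cast<unsigned char>(c);   // character code of c
    }
    return hash;
}
```

For example, sumHash("stop") and sumHash("pots") are equal because the two strings are anagrams, and sumHash("ad") equals sumHash("bc") because 97 + 100 = 98 + 99.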

Page 29: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Measuring collisions
• Hash function = sum of characters of string.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:

Page 30: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Idea: Weighted sum

    hash = s[0] + s[1] + s[2] + ... + s[n]

• Instead of adding, let's give each character a weight.
  – Multiply it by increasing powers of some prime number; say, 31.
  – This helps spread the strings' hash codes over the range of int values.

    hash = s[0] + (31 * s[1]) + (31^2 * s[2]) + ... + (31^n * s[n])

Page 31: CS 106B, Lecture 27 Advanced Hashing - Stanford University

hashCode for strings

    int hashCode(string s) {
        int hash = 5381;
        for (int i = 0; i < (int) s.length(); i++) {
            hash = 31 * hash + (int) s[i];
        }
        return hash;
    }

  – FYI: Apart from the starting value (Java starts from 0 rather than 5381), the above is the hash function used for strings in Java.
  – As with any general hashing function, collisions are possible.
    • Example: "Ea" and "FB" have the same hash value.
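
The "Ea"/"FB" collision can be checked directly. The sketch below uses the same recurrence as the slide under the assumed name stringHash; the one deliberate change is unsigned arithmetic, so that overflow on long strings wraps around instead of being undefined behavior for a signed int.

```cpp
#include <string>

// Same recurrence as the slide's hashCode for strings; the name stringHash
// and the switch to unsigned int are choices made for this example.
unsigned int stringHash(const std::string& s) {
    unsigned int hash = 5381;
    for (char c : s) {
        hash = 31 * hash + static_cast<unsigned char>(c);
    }
    return hash;
}
```

The two strings collide because 'F' is one more than 'E', which adds 31 to the total once it is multiplied by the weight, while 'B' (66) is exactly 31 less than 'a' (97), which cancels it. Swapping the characters, as in "aE", gives a different value, so this function does distinguish anagrams.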

Page 32: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Measuring collisions
• Hash function = sum of characters of string, multiplying by 31.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:

Page 33: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing structs/objects
• By default you cannot add your own structs/objects to hash sets.
  – Our libraries don't know how to hash these objects.

    struct Point {
        int x;
        int y;
        ...
    };

    HashSet<Point> hset;
    Point p {17, 35};
    hset.add(p);

    ERROR: no matching function for call to 'hashCode(const Point&)'

Page 34: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing structs/objects
• To make your own types hashable by our libraries:
  – 1) Overload the == operator.
  – 2) Write a hashCode function that takes your type as its parameter.
    • "Add up" the object's state; scale/multiply parts to distribute the results.

    struct Point {
        int x;
        int y;
        ...
    };

    int hashCode(const Point& p) {
        return 1337 * p.y + 31 * p.x;
    }

    bool operator ==(const Point& p1, const Point& p2) {
        return p1.x == p2.x && p1.y == p2.y;
    }
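
The same two ingredients carry over to the C++ standard library, which is a different API from the course's HashSet. As a sketch: std::unordered_set takes a hasher type as its second template argument in place of a free hashCode function. The names PointHash and PointSet are assumptions for this example; the weights 1337 and 31 are the ones from the slide.

```cpp
#include <cstddef>
#include <unordered_set>

struct Point {
    int x;
    int y;
};

// Ingredient 1: equality, so the set can tell points apart within a bucket.
bool operator==(const Point& p1, const Point& p2) {
    return p1.x == p2.x && p1.y == p2.y;
}

// Ingredient 2: a hash that combines the object's state.
// PointHash is a hypothetical name; the formula mirrors the slide's hashCode.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return 1337u * static_cast<std::size_t>(p.y)
             + 31u * static_cast<std::size_t>(p.x);
    }
};

// Hypothetical alias: a hash set of points that uses the hasher above.
using PointSet = std::unordered_set<Point, PointHash>;
```

With these definitions, inserting Point{17, 35} into a PointSet compiles and works. Equal points always hash alike because the hash reads the same two fields that operator== compares.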

Page 35: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing and Passwords
• We want to store a file of user passwords
  – When a user types a password, see if it matches our file
• Problem: anyone who can see our file can get all the passwords

User     Password
Ashley   password123
Shreya   traceComics
Seth     ki88leLuv

Page 36: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing and Passwords
• What if we stored a unique code for each password instead of the string?
  – Hashing!
• Extra requirements for the hash function:
  – Want a large number of possible values (hard to find collisions)
  – Can’t find the password from the hash (one-way)
  – Generally use a different hash function (e.g. SHA-256)
• The need for salting

User     Password
Ashley   17851691385
Marty    63158910316
Amy      90713593110
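
A brief sketch of why salting matters: without a salt, two users who pick the same password end up with the same stored code, so anyone holding the file can spot shared passwords or look codes up in a precomputed table. A salt is a random per-user string that is stored next to the code and hashed together with the password. The example below is a toy under loud assumptions: toyHash is 64-bit FNV-1a, which is NOT a cryptographic hash and only stands in for something like SHA-256, and the names toyHash, saltedHash, and matches are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Toy stand-in for a real password hash (64-bit FNV-1a).
// NOT secure: a real system would call a vetted cryptographic library.
std::uint64_t toyHash(const std::string& text) {
    std::uint64_t hash = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 1099511628211ULL;                   // FNV prime
    }
    return hash;
}

// Hypothetical helper: hash the user's salt followed by the password.
std::uint64_t saltedHash(const std::string& salt, const std::string& password) {
    return toyHash(salt + password);
}

// Login check: recompute the code from the stored salt and the typed
// password, then compare it with the stored code.
bool matches(const std::string& salt, std::uint64_t storedCode,
             const std::string& attempt) {
    return saltedHash(salt, attempt) == storedCode;
}
```

In this sketch the file would hold each user's salt and code, never the password itself. Two users who both choose password123 but have the salts "salt-A" and "salt-B" get different codes, yet each can still log in because their own salt is used when rechecking.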

Page 37: CS 106B, Lecture 27 Advanced Hashing - Stanford University

Hashing and Data Integrity
• A common "attack" in cryptography is man-in-the-middle
• How can you ensure that a hacker didn't interfere with the data?
• Get the hash from a trusted source: since hash functions only rarely have collisions, changes to the data will almost certainly lead to a different hash