Text Similarity - Columbia University

Transcript of Text Similarity - Columbia University

Page 1

Text Similarity

Page 2

Announcements

• Note problems with Midterm multiple choice questions 2 and 3. If you got them wrong, you will get credit. Bring your exam back to TA hours.

• There will be a recitation tomorrow on HW3. Be sure to provide your interest and availability on Piazza.

• You have 2 weeks for HW3. The due date on the assignment is correct. I have updated the website.

Page 3

Time to Reflect

• Why does a neural net work?

• What is it good at?

• Empirical vs. theory: what do we know?

Page 4

Supervised Machine Learning

[Diagram: an input feature vector x and parameters w (the things we're learning) feed into σ, a sigmoid or other nonlinearity, which produces the predicted value ŷ.]

Page 5

Supervised Machine Learning

[Diagram: x and w again feed into σ to produce ŷ, which is compared against the actual value y ("How wrong were we?"); the parameters are then updated.]
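To make the loop on these two slides concrete, here is a minimal sketch (not from the lecture) of one training step for a single sigmoid unit with a squared-error loss; the feature vector x, label y, weights w, and learning rate lr are all illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, w, lr=0.1):
    """Predict, measure how wrong we were, and update the parameters."""
    y_hat = sigmoid(np.dot(w, x))                   # forward pass: y_hat = sigma(w . x)
    loss = 0.5 * (y_hat - y) ** 2                   # how wrong were we?
    grad = (y_hat - y) * y_hat * (1.0 - y_hat) * x  # gradient of the loss w.r.t. w
    return w - lr * grad, loss                      # update parameters

# toy usage: one training example with three features and a binary label
w = np.zeros(3)
w, loss = train_step(np.array([1.0, 0.5, -0.2]), 1.0, w)
```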

Page 6

Highlights of Neural Nets

• Learn a representation, not just to predict

• Critical component is the embedding layer
• Mapping from discrete symbols to continuous vectors in low-dimensional space

• Semantic representation: distributed

• Feed-forward neural networks (multi-layer perceptrons) can be used anywhere a linear classifier is used
• Superior performance often due to non-linearity

• Which parameter values and which neural net (RNN, CNN, LSTM) are best for a task is determined experimentally

Page 7

[Chart: accuracy from 2000 to 2017 for ASR (about 76 rising to ~95) and Image Recognition (starting around 72).]

Huge leap forward in speech recognition and image recognition

Slide credit: Omid Bakhshandeh

Page 8

Trend in NLP Tasks

[Chart: accuracy from 2000 to 2017 for several NLP tasks: Paraphrase Identification (75.6 → 80.4), NER, POS tagging, Dependency Parsing, Parsing, and NP Chunking (94.2 → 94.4), with accuracies for the tagging and parsing tasks in the high 80s to high 90s.]

Slide credit: Omid Bakhshandeh

Page 9

Time to Reflect

• Your reactions to neural nets so far

• Are they still confusing?

• Do you need to see more?

• Are you convinced (yet)?

• Are they intriguing?

• Do you want to see more?

• Success is empirically determined: is empirical vs. theoretical problematic?

Page 10

Page 11

Synonymy and Paraphrase

• A critical piece of text interpretation
• Can be domain-specific
• Word synonymy:
• General domain: "hot" ≈ "sexy", but biology?

Word | General | Biology
hot | warm, sexy, exciting | heated, warm, thermal
treat | address, handle | cure, fight, kill
head | leader, boss, mind | skull, brain, cranium

Examples from Pavlick

Page 12

Sentential Paraphrase

• Paraphrases extracted from different translations of the same novel

Examples from Barzilay

Emma burst into tears and he tried to comfort her, saying things to make her smile.
Emma cried, and he tried to console her, adorning his words with puns.

And finally, dazzlingly white, it shone high above them in the empty sky.
It appeared white and dazzling, in the empty heavens.

People said "The Evening Noise is sounding, the sun is setting."
"The evening bell is ringing," people used to say.

Page 13

Phrasal paraphrases

King's son / Son of the king
In bottles / bottled
Start to talk / Start talking
Suddenly came / Came suddenly
Make appearance / appear

Examples from Barzilay

Page 14

Types of Text Similarity

• Many types of text similarity exist:
• Morphological similarity (e.g., respect - respectful)
• Spelling similarity (e.g., theater - theatre)
• Synonymy (e.g., talkative - chatty)
• Homophony (e.g., raise - raze - rays)
• Semantic similarity (e.g., cat - tabby)
• Sentence similarity (e.g., paraphrases)
• Document similarity (e.g., two news stories on the same event)

Slide from Radev

Page 15

Tasks requiring text similarity

• Information retrieval

• Machine translation

• Summarization

• Inference

Page 16

Using word embeddings to compute similarity

• Cosine similarity

• When vectors have unit length, cosine similarity is the dot product

• Common to normalize the embeddings matrix so that each row has unit length
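As a small illustration (a sketch, not code from the course), once the rows of the embeddings matrix are normalized to unit length, cosine similarity is just a dot product; the random matrix E below is a stand-in for real pre-trained embeddings.

```python
import numpy as np

E = np.random.randn(10000, 300)                    # stand-in embeddings matrix (vocab x dim)
E = E / np.linalg.norm(E, axis=1, keepdims=True)   # normalize so each row has unit length

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# with unit-length rows, the dot product equals the cosine similarity
i, j = 42, 137
assert np.isclose(np.dot(E[i], E[j]), cosine(E[i], E[j]))
```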

Page 17

Similarity Measures (Cont.)

[Figure: two vectors X and Y with the angle between them.]

• Cosine similarity: similarity of two vectors, normalized

\cos(X, Y) = \frac{x_1 y_1 + x_2 y_2 + \ldots + x_n y_n}{\sqrt{x_1^2 + \ldots + x_n^2} \cdot \sqrt{y_1^2 + \ldots + y_n^2}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \cdot \sqrt{\sum_{i=1}^{n} y_i^2}}

Slide from Radev

Page 18

Document Similarity

• Used in information retrieval to determine which document (d1 or d2) is more similar to a given query q.

• Documents and queries are represented in the same space.

• Angle (or cosine) is a proxy for similarity between two vectors.

Slide from Radev

Page 19

Quiz

• Given three documents

D1 = <1, 3>   D2 = <10, 30>   D3 = <3, 1>

• Compute the cosine scores

σ(D1, D2)   σ(D1, D3)

• What do the numbers tell you?

Slide from Radev
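As a quick check (worked here, not on the slide), the two cosine scores come out as follows; the point is that cosine ignores vector length, so D2, which is just D1 scaled by 10, is maximally similar to D1.

```python
import numpy as np

def cos(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

D1, D2, D3 = np.array([1, 3]), np.array([10, 30]), np.array([3, 1])
print(cos(D1, D2))   # 1.0: D2 points in the same direction as D1
print(cos(D1, D3))   # 0.6: (1*3 + 3*1) / (sqrt(10) * sqrt(10))
```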

Page 20

Page 21

Page 22

Quiz

• What is the range of values that the cosine scores can take?

Slide from Radev

Page 23

Page 24

Finding Similar Words

• Finding the k most similar words, where E is an embedding matrix for all words
• w = E[w]
• S = E w

• A vector of similarities
• S[i] = similarity of w to the i-th word
• k-most similar words?

• How can we find the k-most similar words that are also orthographically similar?
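A minimal sketch of these steps (the lookups word2id/id2word and the row-normalized matrix E are assumptions carried over from the earlier snippet):

```python
import numpy as np

def most_similar(word, E, word2id, id2word, k=10):
    w = E[word2id[word]]                        # w = E[w]: the word's embedding row
    S = E @ w                                   # S = E w: similarity of w to every word
    cand = np.argpartition(-S, k + 1)[:k + 1]   # k+1 best candidates, unsorted
    cand = cand[np.argsort(-S[cand])]           # sort candidates by similarity
    return [id2word[i] for i in cand if id2word[i] != word][:k]
```

To additionally require orthographic similarity, one option is to take a larger candidate list from S and re-rank it by edit distance to the query word.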

Page 25

Page 26

Similarity to a group of words

• Given: w_1 … w_k that are semantically similar

• Find w_j such that it is the most semantically similar to the group

• Define similarity as average similarity to the group: \frac{1}{k} \sum_{i=1}^{k} \text{sim}_{\cos}(w, w_i)

• s = E (w_1 + w_2 + \ldots + w_k) / k

• How would we compute the odd word out?
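One way to realize both questions (a sketch under the same assumptions as above, with unit-length rows so dot products are cosines): score the whole vocabulary against the group centroid, and call the member with the lowest average similarity to the others the odd one out.

```python
import numpy as np

def group_similarity(words, E, word2id):
    """s = E (w1 + ... + wk) / k : similarity of every vocabulary word to the group."""
    centroid = np.mean([E[word2id[w]] for w in words], axis=0)
    return E @ centroid

def odd_one_out(words, E, word2id):
    vecs = np.stack([E[word2id[w]] for w in words])
    sims = vecs @ vecs.T                                  # pairwise cosine similarities
    avg = (sims.sum(axis=1) - 1.0) / (len(words) - 1)     # drop each word's self-similarity of 1
    return words[int(np.argmin(avg))]
```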

Page 27

Short Document Similarity

• We can train a model, or we can just use word embeddings

• Suitable for very short texts such as queries, newspaper headlines, or tweets

• Similarity = the sum of the pairwise similarities of all words in the document

Page 28

Computing Document Similarity

• Where D1 = w^1_1 … w^1_m and D2 = w^2_1 … w^2_n:

\text{sim}(D_1, D_2) = \sum_{i=1}^{m} \sum_{j=1}^{n} \cos(w^1_i, w^2_j)

• Equivalent to (with unit-length word vectors):

\text{sim}(D_1, D_2) = \Big(\sum_{i=1}^{m} w^1_i\Big) \cdot \Big(\sum_{j=1}^{n} w^2_j\Big)

• Allows: the document collection D is a matrix where each row i is a document (the sum of its word vectors). Similarity with a new document is then a single matrix-vector product.
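A sketch of this computation (illustrative helper names; word vectors assumed unit length, which is what makes the sum of pairwise cosines equal to the dot product of the two summed document vectors):

```python
import numpy as np

def doc_vector(words, E, word2id):
    """Sum of the (unit-length) word vectors in a short document."""
    return np.sum([E[word2id[w]] for w in words], axis=0)

def doc_similarity(doc1, doc2, E, word2id):
    # sum_{i,j} cos(w1_i, w2_j) == (sum_i w1_i) . (sum_j w2_j)
    return np.dot(doc_vector(doc1, E, word2id), doc_vector(doc2, E, word2id))

def rank_documents(D, new_doc, E, word2id):
    """D has one row per stored document; one matrix-vector product scores them all."""
    return D @ doc_vector(new_doc, E, word2id)
```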

Page 29

Analogy Solving Task

• Solve analogies of the form "man is to woman as king is to ?" by finding the word whose vector is closest to king - man + woman

• Equivalent to (COS-ADD) (Levy and Goldberg 2014)

• "…it is not clear what success on a benchmark of analogy tasks says about the quality of word embeddings beyond their suitability for solving this particular task." (Goldberg 2017)
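For reference, a sketch of the COS-ADD recipe the slide refers to (solving "a is to a* as b is to ?" by maximizing cos(w, b - a + a*)); it reuses the assumed row-normalized E and lookups from the earlier snippets.

```python
import numpy as np

def analogy(a, a_star, b, E, word2id, id2word):
    """COS-ADD: the answer maximizes cos(w, b - a + a_star), e.g. king - man + woman -> queen."""
    target = E[word2id[b]] - E[word2id[a]] + E[word2id[a_star]]
    target = target / np.linalg.norm(target)
    scores = E @ target                          # cosine against every vocabulary word
    for i in np.argsort(-scores):
        if id2word[i] not in (a, a_star, b):     # exclude the three query words
            return id2word[i]
```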

Page 30

Using WordNet and other paraphrase corpora

• (PPDB) Penn Paraphrase Database (Pavlick and Callison-Burch)

• Can we use word pairs that reflect similarity better for the task?
• Pre-trained embeddings E
• Graph G representing similar word pairs
• Search for a new word embedding matrix E' whose rows are close to E but also close to G

• Methods for combining pre-trained word embeddings with smaller, specialized embeddings
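One possible way to realize "rows close to E but also close to G" (a sketch in the spirit of retrofitting; the weights alpha and beta, the iteration count, and the pair format are assumptions, not details from the lecture):

```python
import numpy as np

def refit(E, pairs, word2id, alpha=1.0, beta=1.0, iters=10):
    """Pull each word's row toward its neighbors in the graph G of similar word pairs."""
    E_new = E.copy()
    neighbors = {}
    for u, v in pairs:                            # build adjacency from the word-pair graph
        neighbors.setdefault(word2id[u], []).append(word2id[v])
        neighbors.setdefault(word2id[v], []).append(word2id[u])
    for _ in range(iters):
        for i, nbrs in neighbors.items():
            # stay close to the original row (alpha) and to the current neighbor rows (beta)
            E_new[i] = (alpha * E[i] + beta * E_new[nbrs].sum(axis=0)) / (alpha + beta * len(nbrs))
    return E_new
```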

Page 31

Caveats

• Don't just use off-the-shelf word embeddings blindly

• Experiment with corpus and hyper-parameter settings

• When using off-the-shelf embeddings, use the same tokenization and normalization

Page 32

Resources

• Word embeddings
• https://code.google.com/p/word2vec/
• http://nlp.stanford.edu/projects/glove/images/comparative_superlative.jpg

• Neural net platforms
• Keras: https://keras.io/
• PyTorch: http://pytorch.org/
• TensorFlow: https://www.tensorflow.org/
• Theano: http://deeplearning.net/software/theano/

Page 33

Language is made up of sequences

• So far we have seen embeddings for words

• (and methods for combining them through vector concatenation and arithmetic)

• But how can we account for sequences?
• Words as sequences of letters
• Sentences as sequences of words
• Documents as sequences of sentences

Page 34

Recurrent Neural Networks

• Represent arbitrarily sized sequences in a fixed-size vector

• Good at capturing statistical regularities in sequences (order matters)

• Include simple RNNs, Long Short-Term Memory (LSTMs), Gated Recurrent Units (GRUs)

Page 35

[Thang et al. 2013] Learning word meaning from their morphs

[Bowman et al. 2014] Logical entailment using compositional semantics via RNNs

Page 36

Machine Translation (Sequences)

• Sequence-to-sequence
• Sutskever et al. 2014

Page 37

RNN Abstraction

• An RNN is a function that takes an arbitrary-length sequence as input and returns a single d_out-dimensional vector as output
• Input: x_{1:n} = x_1 x_2 … x_n (x_i ∈ R^{d_in})
• Output: y_n ∈ R^{d_out}

• The output vector y_n is used for further prediction

Page 38

RNN Characteristics

• Can condition on the entire sequence without resorting to the Markov assumption

• Can get very good language models as well as good performance on many other tasks

Page 39

RNNs are defined recursively

• By means of a function R taking as input a state vector h_{i-1} and an input vector x_i

• Returns a new state vector h_i

• The state vector can be mapped to an output vector y_i using a simple deterministic function

• And fed through softmax for classification

Page 40

Recurrent Neural Networks

[Diagram: previous state h_0 and input x_1 feed through σ, with weights W_h and W_x, to produce the new state h_1.]

h_t = \sigma(W_h h_{t-1} + W_x x_t)

Slide from Radev

Page 41

RNN

[Diagram: as before, h_0 and x_1 feed through σ to produce h_1, which now also passes through W_y and a softmax to produce the output y_1.]

h_t = \sigma(W_h h_{t-1} + W_x x_t)
y_t = \mathrm{softmax}(W_y h_t)

Slide from Radev
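Translating the two equations into a minimal numpy sketch (randomly initialized weights and illustrative dimensions; σ is taken to be the sigmoid, per the earlier "sigmoid or other nonlinearity" slide):

```python
import numpy as np

d_in, d_h, d_out = 300, 128, 10
W_x = 0.01 * np.random.randn(d_h, d_in)    # input-to-hidden weights
W_h = 0.01 * np.random.randn(d_h, d_h)     # hidden-to-hidden weights
W_y = 0.01 * np.random.randn(d_out, d_h)   # hidden-to-output weights

def rnn_step(h_prev, x_t):
    """h_t = sigma(W_h h_{t-1} + W_x x_t);  y_t = softmax(W_y h_t)"""
    h_t = 1.0 / (1.0 + np.exp(-(W_h @ h_prev + W_x @ x_t)))
    z = W_y @ h_t
    y_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return h_t, y_t

# unrolling over a three-word input (e.g. "The cat sat"), feeding the state back in
h = np.zeros(d_h)
for x in np.random.randn(3, d_in):
    h, y = rnn_step(h, x)
```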

Page 42

RNN

[Diagram: the network unrolled over the input "The cat sat": x_1, x_2, x_3 feed through σ with shared weights W_x and W_h, producing states h_1, h_2, h_3 from initial state h_0; h_3 passes through W_y and a softmax to produce the output y_3.]

Slide from Radev

Page 43

Updating Parameters of an RNN

[Diagram: the same unrolled network over "The cat sat"; the output y_3 is compared against a cost, and gradients flow back through the unrolled steps.]

Backpropagation through time

Slide from Radev

Page 44

Next Time

• More on RNNs and their use in sentiment analysis