+ Various Improvements in Vector Space Word Representations Manaal Faruqui Sujay Jauhar, Jesse Dodge...
Improving Vector Space Word Representations Using Multilingual Correlation
Various Improvements in Vector Space Word Representations
Manaal Faruqui, Sujay Jauhar, Jesse Dodge, Chris Dyer, Noah Smith
Distributional Semantics
"You shall know a word by the company it keeps" (Harris, 1954; Firth, 1957)
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
flame is the visible portion of a fire
take place whereby fires can sustain their own heat
Translational Semantics
What other information? (Bannard & Callison-Burch, 2005)
That plane can seat more than 300 people
Russian airplanes are huge
Multilingual information! plane ↔ airplane
Outline
Distributional semantics: monolingual context
Translational semantics: multilingual context
Better semantic representations: using distributional + translational semantics
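Word Vector Representations
How to encode such co-occurrences? Count them in a matrix whose rows are words (e.g. winter, the) and whose columns are contexts (e.g. day, night, cold, sleep); the row for winter, for example, is day 3, night 3, cold 5, sleep 0.
Word Vector Representation: Latent Semantic Analysis (Deerwester et al., 1990)
Singular Value Decomposition of the words × context-words matrix.
One of the earliest ways of computing word vectors.
Multilingual Information
English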
German
French
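Spanish
dragon (English), Drache (German), dragon (French), dragón (Spanish)
Problem? Naive solution: append (concatenate) the vectors from all languages.
Multilingual Information
Vector size increases
Idiosyncratic info.
What if the word is OOV (out of vocabulary)?
Disadvantages of vector concatenation?
Languages might be capturing idiosyncratic aspects of the meaning of the word. Instead of adding them together, we want a consensus on what they mean!
Multilingual Information
I will take what is mine with fire and blood
the end battle would be between fire and ice
My dragons are large and can breathe fire now
So, what can we do?
... Das Ende der Schlacht würde zwischen Feuer und Eis ... (... the end of the battle would be between fire and ice ...)
... gesehen ist Feuer eine Oxidationsreaktion mit ... (... seen, fire is an oxidation reaction with ...)
... Das Licht des Feuers ist eine physikalische Erscheinung (... the light of the fire is a physical phenomenon)
Two views: Canonical Correlation Analysis!
We want agreement across languages, not just what one language thinks of another.
Canonical Correlation Analysis (CCA)
Project two sets of vectors (of equal cardinality) into a space where they are maximally correlated.
An optimization problem with an exact, closed-form solution (via an eigenvalue/singular value decomposition)!
Canonical Correlation Analysis (CCA)
Let $\Sigma \in \mathbb{R}^{n_1 \times d_1}$ hold the English word vectors and $\Omega \in \mathbb{R}^{n_2 \times d_2}$ the foreign word vectors, with row-aligned subsets $\Sigma' \subseteq \Sigma$ and $\Omega' \subseteq \Omega$ of equal size.
$W, V = \mathrm{CCA}(\Sigma', \Omega')$, with $W \in \mathbb{R}^{d_1 \times k}$, $V \in \mathbb{R}^{d_2 \times k}$ and $k = \min(\operatorname{rank}(\Sigma'), \operatorname{rank}(\Omega'))$
$X = \Sigma W \in \mathbb{R}^{n_1 \times k}$, $Y = \Omega V \in \mathbb{R}^{n_2 \times k}$
X and Y are now maximally correlated!
Canonical Correlation Analysis (CCA)
Problems addressed?
Vector size increases → doesn't increase
Idiosyncratic information → lets you choose how many projected dimensions to keep!
What if the word is OOV? → projection vectors for everyone!
Canonical Correlation Analysis (CCA)
The vocabularies can't be of equal size! OK, but how do we get equal-cardinality sets $\Sigma'$ and $\Omega'$?
Get word alignments from a parallel corpus.
Preserve only words in the original vocabulary.
For every word in English, select the best foreign word.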
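The pipeline on these slides (LSA vectors per language, one best foreign word per English word from the alignments, then CCA) can be sketched in NumPy as below. This is textbook CCA computed by whitening and an SVD, not the authors' MATLAB tool; `lsa`, `best_translations`, and `cca` are illustrative names, and the vocabularies, random counts, alignment pairs, and the 0.6 ratio of kept dimensions are placeholders.

```python
from collections import Counter, defaultdict

import numpy as np


def lsa(counts, dim):
    """Latent Semantic Analysis: rank-`dim` truncated SVD of a words x contexts matrix."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]  # one dim-dimensional row vector per word


def best_translations(aligned_pairs, en_vocab, foreign_vocab):
    """Map each English word to the foreign word it is most often aligned to,
    keeping only words that are in both monolingual vocabularies."""
    counts = defaultdict(Counter)
    for en, fw in aligned_pairs:
        if en in en_vocab and fw in foreign_vocab:
            counts[en][fw] += 1
    return {en: c.most_common(1)[0][0] for en, c in counts.items()}


def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def cca(x, y, k, reg=1e-8):
    """Classical CCA on row-aligned x (n x d1) and y (n x d2).

    Returns W (d1 x k), V (d2 x k) and the top-k canonical correlations."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])  # tiny ridge for numerical stability
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]


# Toy usage: random counts and a hypothetical alignment w_i <-> v_i.
rng = np.random.default_rng(0)
en_vocab = {f"w{i}": i for i in range(300)}
de_vocab = {f"v{i}": i for i in range(250)}
sigma = lsa(rng.poisson(2.0, size=(300, 120)).astype(float), 20)  # English vectors
omega = lsa(rng.poisson(2.0, size=(250, 100)).astype(float), 20)  # German vectors

aligned = [(f"w{i}", f"v{i}") for i in range(150)] * 2 + [("w0", "v5")]
best = best_translations(aligned, en_vocab, de_vocab)
sigma_sub = sigma[[en_vocab[e] for e in best]]           # aligned English rows
omega_sub = omega[[de_vocab[f] for f in best.values()]]  # matching German rows, same order

ratio = 0.6  # fraction of the smaller rank to keep (assumed)
dims = int(ratio * min(np.linalg.matrix_rank(sigma_sub), np.linalg.matrix_rank(omega_sub)))
W, V, corrs = cca(sigma_sub, omega_sub, dims)
X, Y = sigma @ W, omega @ V  # every word in both vocabularies gets a projected vector
```

Because W and V apply to any row of the full matrices, words outside the aligned subset are projected too, which is what "projection vectors for everyone" refers to.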
Experimental Setup: LSA Word Vector Learning
Monolingual data (news corpora; tokenizer and lowercasing: WMT scripts):
English: WMT-2011, 360,000,000 tokens, 180,000 types
German: WMT-2011, 290,000,000 tokens, 294,000 types
French: WMT 2011-12, 263,000,000 tokens, 137,000 types
Spanish: WMT-2011, 164,000,000 tokens, 145,000 types
Parallel data (News Commentary + Europarl, WMT; word alignment tool: fast_align (Dyer et al., 2013)):
De-En: 128,000,000 tokens, 37,000 word pairs
Fr-En: 138,000,000 tokens, 38,000 word pairs
Es-En: 134,000,000 tokens, 38,000 word pairs
Corpus preprocessing:
...hello hello hello hello hello
Context:
23.45, 21st, 10-20-2014, 0.5e10 → NUM
anchfgugsjh, wekjfbg, bhguyq → UNK
Experimental Setup: Evaluation Benchmarks
Word similarity evaluation: WS-353 (Finkelstein et al., 2001), WS-353-SIM (Agirre et al., 2009), WS-353-REL (Agirre et al., 2009), RG-65 (Rubenstein and Goodenough, 1965), MC-30 (Miller and Charles, 1991), MTurk-287 (Radinsky et al., 2011)
Word relation evaluation: semantic relations (Mikolov et al., 2013), syntactic relations (Mikolov et al., 2013)
Experimental Setup: Multilingual Vector Learning
Monolingual vector length: 80
Multilingual vector length: ?
The length in the projected space can be chosen: k
Choose the best value of k on WS-353, with k ∈ [0.1, 0.2, …, 1.0]
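The word-similarity benchmarks above are scored with Spearman's rank correlation between model cosine similarities and human ratings, and the Mikolov et al. (2013) relation sets are commonly scored with a vector-offset analogy. A minimal NumPy sketch, assuming each similarity dataset is a list of (word1, word2, score) triples and that pairs with an OOV word are skipped; the function names and toy vectors are made up for illustration.

```python
import numpy as np


def rank(a):
    """1-based ranks; tied values share the average of their ranks."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation between the two rank vectors."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity_score(vectors, dataset):
    """dataset: (word1, word2, human_score) triples; pairs with an OOV word are skipped."""
    model, human = [], []
    for w1, w2, score in dataset:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)


def analogy(vectors, a, b, c):
    """a : b :: c : ?  via the offset b - a + c and its nearest cosine neighbor."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors (made up for illustration).
vectors = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.2, 1.0]),
    "woman": np.array([0.2, -1.0]),
    "apple": np.array([-1.0, 0.1]),
}
toy_sim = [("king", "man", 6.0), ("king", "apple", 2.5), ("queen", "apple", 2.0), ("king", "dragon", 3.0)]
print(similarity_score(vectors, toy_sim))        # "dragon" is OOV, so three pairs are scored
print(analogy(vectors, "king", "queen", "man"))  # woman

# Choosing k: score each candidate ratio on a development set and keep the best, e.g.
# best_k = max(ratios, key=lambda k: similarity_score(project(k), ws353))  # hypothetical helpers
```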
[Figure: performance on WS-353, Spearman's correlation vs. number of dimensions; k = 0.6]
[Figure: multilingual vector learning, Spearman's correlation]
[Figure: multilingual vector learning, accuracy]
Experimental Setup
RNNLM (Mikolov et al., 2011)
Predicts the next word given the history
Neural language model
Recurrent hidden-layer connections
Skip-Gram, word2vec (Mikolov et al., 2013)
Predicts the context given the word
Removes the hidden layer
Vocabulary represented with Huffman coding
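Two Skip-Gram points above can be made concrete: "predict context given the word" means training on (center, context) pairs drawn from a window, and the Huffman code over word frequencies is what word2vec's hierarchical softmax uses so that frequent words get short paths. A small Python sketch of both, not word2vec's actual implementation; the window size and the toy sentence are arbitrary choices.

```python
import heapq
from collections import Counter
from itertools import count


def skipgram_pairs(tokens, window):
    """(center, context) pairs: each word is used to predict its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def huffman_codes(freqs):
    """Binary Huffman code for each word: frequent words get shorter codes,
    i.e. shorter root-to-leaf paths in a hierarchical-softmax tree."""
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {word: ""}) for word, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in left.items()}
        merged.update({w: "1" + code for w, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


tokens = "my dragons can breathe fire and fire can melt ice".split()
pairs = skipgram_pairs(tokens, window=2)
codes = huffman_codes(Counter(tokens))
print(pairs[:3])  # [('my', 'dragons'), ('my', 'can'), ('dragons', 'my')]
print(codes["fire"], codes["ice"])
```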
Multilingual Vectors: Neural Networks
[Figure: multilingual vector learning with RNNLM and Skip-Gram vectors]
Experimental Setup: Multilingual Vectors: Scaling
[Figure: Spearman's correlation on WS-353]
Experimental Setup: Multilingual Vectors: Qualitative Analysis
[Figure: antonyms and synonyms of "beautiful" in the monolingual setting, plotted with the t-SNE tool (van der Maaten and Hinton, 2008)]
[Figure: antonyms and synonyms of "beautiful" in the multilingual setting, plotted with the t-SNE tool (van der Maaten and Hinton, 2008)]
Conclusion
CCA: an easy-to-use tool in MATLAB. Take vectors from two languages and improve them.
Multilingual information is important, even if the problems are inherently monolingual.
More effective for distributional vectors: semantics generalizes better than syntax.
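Vectors available at: http://cs.cmu.edu/~mfaruqui
Word Vector Representations
What do they lack? Semantic lexicons!
Encoding Ontological Beliefs: Retrofitting
[Figure: retrofitting on a lexicon graph over the vectors q_canine, q_dog, q_hound, q_mutt, q_pug]
Encoding Ontological Beliefs
Each retrofitted vector q_i should stay close to the vectors q_j of its lexicon neighbors and to its own original vector, with closeness measured by Euclidean distance!
Optimization
Convex objective
Exact optimal solution impractical to compute directly (a linear system over the whole vocabulary)
Iterative updates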
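The slides list the ingredients of retrofitting (retrofitted vectors q_i, lexicon neighbors q_j, Euclidean distance, a convex objective, iterative updates) without a formula. One way to write it, with assumed notation: $\hat q_i$ is word $i$'s original vector, $q_i$ its retrofitted vector, $E$ the set of lexicon edges, and $\alpha_i, \beta_{ij} \ge 0$ are weights.

$$\Psi(Q) = \sum_{i=1}^{n} \Big[ \alpha_i \lVert q_i - \hat q_i \rVert^2 + \sum_{j:(i,j)\in E} \beta_{ij} \lVert q_i - q_j \rVert^2 \Big]$$

Every term is a squared Euclidean distance between affine functions of $Q$, so $\Psi$ is convex. Holding the neighbors fixed and setting the gradient of the terms written for word $i$ to zero gives a closed-form coordinate update:

$$2\alpha_i (q_i - \hat q_i) + 2\sum_{j:(i,j)\in E} \beta_{ij} (q_i - q_j) = 0 \;\Longrightarrow\; q_i = \frac{\alpha_i \hat q_i + \sum_{j:(i,j)\in E} \beta_{ij} q_j}{\alpha_i + \sum_{j:(i,j)\in E} \beta_{ij}}$$

Sweeping this update over the vocabulary a few times gives the iterative scheme. With $\alpha_i = 1$ and $\beta_{ij} = 1/(\text{number of neighbors of } i)$, each update is simply the average of $\hat q_i$ and the mean of $i$'s current neighbor vectors.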
Weights: each neighbor weighted by 1/(number of neighbors); original vectors uniformly weighted: 1
Experimental Setup
Lexical ontologies: WordNet (Miller, 1995), Paraphrase Database (Ganitkevitch et al., 2013), FrameNet (Fillmore et al., 2003)
Word vectors: Latent Semantic Analysis (Deerwester et al., 1990), Global Context Vectors (Huang et al., 2012), Skip-Gram Vectors (Mikolov et al., 2013), Log-Bilinear Vectors (Mnih and Teh, 2012), Multilingual Vectors (Faruqui and Dyer, 2014)
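A minimal NumPy sketch of the iterative retrofitting update, assuming the lexicon is a word → neighbors dictionary (as could be extracted from WordNet, PPDB, or FrameNet) and the weights listed above: 1 for the original vector and 1/(number of neighbors) for each neighbor. The function name and toy vectors are illustrative; this is not the authors' released code.

```python
import numpy as np


def retrofit(vectors, lexicon, iterations=10):
    """Pull each word vector toward its lexicon neighbors.

    vectors: word -> original vector (q_hat); never modified.
    lexicon: word -> list of neighbor words, e.g. synonyms from a lexical ontology.
    Assumed weights: 1 for the original vector, 1/(number of neighbors) per neighbor.
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbors in lexicon.items():
            nbrs = [n for n in neighbors if n in new]  # ignore out-of-vocabulary neighbors
            if word not in new or not nbrs:
                continue  # nothing to update
            beta = 1.0 / len(nbrs)
            numerator = vectors[word] + beta * sum(new[n] for n in nbrs)
            # Updated in place, so words visited later in this sweep see the new vector.
            new[word] = numerator / (1.0 + beta * len(nbrs))
    return new


vectors = {
    "dog": np.array([1.0, 0.0]),
    "hound": np.array([0.0, 1.0]),
    "pug": np.array([0.0, -1.0]),
    "canine": np.array([1.0, 1.0]),
    "fire": np.array([-1.0, 0.0]),
}
lexicon = {
    "dog": ["hound", "pug", "canine", "mutt"],  # "mutt" has no vector here and is skipped
    "hound": ["dog", "canine"],
    "pug": ["dog"],
    "canine": ["dog", "hound"],
}
retrofitted = retrofit(vectors, lexicon)
```

Words absent from the lexicon, such as "fire" here, keep their original vectors.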
Experimental Setup: Evaluation Tasks
Word similarity tasks: WS-353 (Finkelstein et al., 2001), MEN-3000 (Bruni et al., 2012)
Word relation task (Mikolov et al., 2013): king : queen :: man : woman
TOEFL synonym selection (Landauer and Dumais, 1997): rug: sofa, ottoman, carpet, hallway
We only show results for a subset of tasks.
Results: Using PPDB to Enrich Skip-Gram Vectors (Mikolov et al., 2013)
We only show two settings, PPDB + Skip-Gram and WordNet + Global Context, but they all work!
Results: Using PPDB to Enrich Global Context Vectors (Huang et al., 2012)
Results: Using FrameNet to Enrich Skip-Gram Vectors (Mikolov et al., 2013)
Results: Using Global WordNet (de Melo & Weikum, 2009) for Multilingual Vectors
German: RG-65, French: WS-353, Spanish: MC-30
Semantic Lexicons During Learning
Log-bilinear word embeddings (Mnih and Teh, 2012)
Learnt using Noise Contrastive Estimation (Gutmann and Hyvärinen, 2010)
MAP Estimation
Task: Baseline / MAP / Retrofitting
MEN-3k: 58.0 / 61.8 / 63.7
WS-353: 53.6 / 57.0 / 59.1
Word-Relations: 31.5 / 36.3 / 46.2
Sentiment: 72.5 / 74.4 / 73.4
Thanks!