Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces


  • 4. Experimental Settings

    Datasets

    Details of Training
    • Speech2Vec with skip-grams, window = 3
    • Encoder: single-layer bidirectional LSTM
    • Decoder: single-layer unidirectional LSTM
    • SGD with a fixed learning rate of 0.001
    • Word2Vec: fastText implementation
    • Both embedding dimensions = 50
    • Discriminator in adversarial training: 2 layers, 512 neurons, ReLU
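The discriminator described above (two hidden layers of 512 ReLU units over 50-dimensional embeddings) can be sketched as a plain numpy forward pass. The weights and input batch below are random stand-ins, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN = 50, 512  # embedding dimension and hidden width from the poster

# Random stand-ins for trained parameters (small scale keeps the sigmoid stable).
W1, b1 = rng.normal(0, 0.01, (DIM, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(0, 0.01, (HIDDEN, HIDDEN)), np.zeros(HIDDEN)
W3, b3 = rng.normal(0, 0.01, (HIDDEN, 1)), np.zeros(1)

def discriminator(x):
    """Score a batch of embeddings: two ReLU layers, then a sigmoid output."""
    h1 = np.maximum(0.0, x @ W1 + b1)   # hidden layer 1, ReLU
    h2 = np.maximum(0.0, h1 @ W2 + b2)  # hidden layer 2, ReLU
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))  # probability in (0, 1)

p = discriminator(rng.normal(size=(4, DIM)))
print(p.shape)  # (4, 1)
```

Each row of the output is a probability that the corresponding embedding comes from the text space, which is what the adversarial training in Section 3 optimises against.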

    5. Results

    Task I | Spoken Word Recognition
    • Accuracy decreases as the level of supervision decreases
    • The unsupervised alignment approach is almost as effective as its supervised counterpart (A vs. A*)
    • Word segmentation is a critical step
    • Applies to different corpora settings

    Task II | Spoken Word Synonyms Retrieval
    • The output actually contains both synonyms and different lexical forms of the audio segment
    • Also consider synonyms as valid results

    Task III | Spoken Word Translation
    • More supervision yields better performance
    • Translation using the same corpus outperforms translation using different corpora

    2. Learning Embeddings

    Text Embedding Space
    • Train Word2Vec [Mikolov et al., 2013] on the text corpus
    • Unsupervised learning of distributed word representations that model word semantics

    Speech Embedding Space
    • Train Speech2Vec [Chung and Glass, 2018] on the speech corpus
    • The corpus is pre-processed by an off-the-shelf speech segmentation algorithm so that utterances are segmented into audio segments corresponding to spoken words
    • Speech version of Word2Vec: unsupervised semantic representations of audio segments
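Both Word2Vec and Speech2Vec train on skip-gram (center, context) pairs drawn from a fixed window around each token. A minimal pair extractor in pure Python (`skipgram_pairs` is a hypothetical helper name, not from the paper):

```python
def skipgram_pairs(tokens, window=3):
    """Enumerate (center, context) pairs within +/- window of each position."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

For Speech2Vec the "tokens" are the audio segments produced by the segmentation step, which is why segmentation quality matters so much in the results.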

    Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

    1. Overview

    • Goal: To learn a linear mapping between speech & text embedding spaces

    3. Embedding Spaces Alignment

    References
    • Chung and Glass. Speech2Vec: A sequence-to-sequence framework for learning word embeddings from speech. INTERSPEECH 2018.
    • Lample et al. Word translation without parallel data. ICLR 2018.
    • Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.

    • Both embedding spaces are learned from corpora based on the distributional hypothesis (e.g., skip-grams) → approximately isomorphic
    • Construct a synthetic mapping dictionary to learn a linear mapping matrix between the two embedding spaces
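One way to sketch learning a linear mapping from a synthetic dictionary is ordinary least squares over paired embeddings. The dictionary below is a random toy stand-in (a known map plus noise), not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 200

# Toy synthetic dictionary: speech embeddings X paired with text embeddings Y,
# generated here from a known random linear map plus a little noise.
true_W = rng.normal(size=(dim, dim))
X = rng.normal(size=(n_pairs, dim))
Y = X @ true_W.T + 0.01 * rng.normal(size=(n_pairs, dim))

# Least-squares estimate of the mapping matrix: minimise ||X W^T - Y||_F.
A, *_ = np.linalg.lstsq(X, Y, rcond=None)
W = A.T

print(np.allclose(W, true_W, atol=0.1))  # True: close to the generating map
```

With enough pairs and approximately isomorphic spaces, a single linear map aligns the two spaces well; the adversarial and Procrustes steps below find such a map without any real dictionary.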

    Adversarial Training
    • Make the aligned embeddings indistinguishable
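A minimal sketch of the adversarial objectives in the spirit of Lample et al. [2018]: the discriminator learns to separate text embeddings from mapped speech embeddings, while the mapping W is trained with flipped labels to fool it. The linear scorer and random data below are toy assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 50, 32
eps = 1e-9

speech = rng.normal(size=(n, dim))  # toy speech embeddings
text = rng.normal(size=(n, dim))    # toy text embeddings
W = np.eye(dim)                     # current linear mapping (identity init)
v = rng.normal(size=dim)            # toy linear discriminator weights

def disc(z):
    """P(embedding comes from the text space), one score per row."""
    return 1.0 / (1.0 + np.exp(-(z @ v)))

p_text = np.clip(disc(text), eps, 1 - eps)
p_mapped = np.clip(disc(speech @ W.T), eps, 1 - eps)

# Discriminator loss: label text embeddings 1, mapped speech embeddings 0.
L_D = -np.mean(np.log(p_text)) - np.mean(np.log(1.0 - p_mapped))
# Mapping loss (flipped labels): make mapped speech look like text.
L_W = -np.mean(np.log(p_mapped))
print(L_D > 0, L_W > 0)  # True True
```

At convergence the discriminator cannot tell the two sets apart, which is exactly the "indistinguishable" criterion above.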

    Refinement (Orthogonal Procrustes Problem)
    • Use the W learned from the adversarial training step as an initial proxy and build a synthetic parallel dictionary
    • Consider the most frequent words
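The refinement step has a closed-form solution: restricting W to be orthogonal, the Procrustes problem over the synthetic dictionary is solved by an SVD. A toy sketch where the true map `R` is a known synthetic orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 50, 100

# Toy stand-in for the synthetic parallel dictionary: X holds speech
# embeddings, Y the paired text embeddings, related by an orthogonal map R.
X = rng.normal(size=(n, dim))
R, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
Y = X @ R.T

# Orthogonal Procrustes: minimise ||X W^T - Y||_F over orthogonal W.
# Closed form: W = U V^T, where U S V^T is the SVD of Y^T X.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt

print(np.allclose(W, R))  # True: the orthogonal map is recovered
```

Iterating "solve Procrustes on the current dictionary, then rebuild the dictionary from the most frequent words" tightens the alignment that adversarial training initialised.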

    Method Comparison