Recurrent Neural Network Based Language Model

Authors: Tomáš Mikolov et al. (Brno University of Technology, Czech Republic; Johns Hopkins University, USA)

Presented by : Vicky Xuening Wang

ECS 289G, Nov 2015, UC Davis

1

Tomáš Mikolov1,2, Martin Karafiát1, Lukáš Burget1, Jan “Honza” Černocký1, Sanjeev Khudanpur2

2

Language Model Tasks

• Statistical/Probabilistic Language Models

• Goal: compute the probability of a sentence or sequence of words:

• P(W) = P(w1,w2,w3,w4,w5...wn)

• Related task: predict probability of an upcoming word:

• P(wn|w1,w2,w3,w4,...,wn-1)

3

Introduction - Language model

https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf

• Chain rule of probability

• Markov assumption

• N-gram model
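Written out (standard definitions, as in the CS124 slides linked above; the notation here is mine):

    % Chain rule: decompose the joint probability of a sentence
    P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

    % Markov assumption: condition only on the previous N-1 words
    P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \dots, w_{i-1})

    % Bigram model (N = 2)
    P(w_1, \dots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})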

4

Introduction

https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf

• Typical tasks:

• Machine Translation: P(high winds tonite) > P(large winds tonite)

• Spell Correction: P(about fifteen minutes from) > P(about fifteen minuets from)

• Speech Recognition: P(I saw a van) >> P(eyes awe of an)

• Summarization, question-answering, etc.

5

Introduction - LM tasks

https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf

6

Introduction - Bigram model

https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf

Maximum Likelihood Estimation
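A minimal sketch of what maximum likelihood estimation means for a bigram model (illustrative only; the toy corpus and function names are not from the paper):

    from collections import Counter

    # Toy corpus with sentence-boundary markers; real training data would be much larger.
    corpus = [["<s>", "i", "saw", "a", "van", "</s>"],
              ["<s>", "i", "saw", "a", "cat", "</s>"]]

    unigram_counts = Counter(w for sent in corpus for w in sent)
    bigram_counts = Counter((sent[i], sent[i + 1])
                            for sent in corpus for i in range(len(sent) - 1))

    def p_mle(prev, word):
        # MLE for a bigram: count(prev, word) / count(prev)
        return bigram_counts[(prev, word)] / unigram_counts[prev]

    print(p_mle("saw", "a"))   # 1.0  ("a" always follows "saw" in the toy corpus)
    print(p_mle("a", "van"))   # 0.5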

7

Introduction - Perplexity

https://web.stanford.edu/class/cs124/lec/languagemodeling.pdf

Lower is better!
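The standard definition, for a test set W = w_1 … w_N (the usual textbook formula, not something specific to this paper):

    PP(W) = P(w_1, \dots, w_N)^{-1/N}
          = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \ln P(w_i \mid w_1, \dots, w_{i-1}) \right)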

8

Introduction - WER

Lower is better!
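For reference, word error rate is computed from the minimum-edit-distance alignment of the recognizer output against the reference transcript (standard definition):

    \mathrm{WER} = \frac{S + D + I}{N}

where S, D and I are the numbers of substituted, deleted and inserted words, and N is the number of words in the reference.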

• Recurrent Neural Network based language model (RNN-LM) outperforms standard backoff N-gram models

• Words are projected into low dimensional space, similar words are automatically clustered together.

• Smoothing is solved implicitly.

• Backpropagation is used for training.

9

Overview

10

Fixed-length

11

12

• Input layer x

• Hidden/context layer s

• Output layer y
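Concretely, the paper uses a simple recurrent (Elman) network; writing w(t) for the 1-of-N encoding of the current word and s(t-1) for the previous hidden state, the forward pass (following the equations in the paper, with f a sigmoid and g a softmax) is:

    x(t) = [\, w(t);\ s(t-1) \,]

    s_j(t) = f\Big( \sum_i x_i(t)\, u_{ji} \Big), \qquad
    y_k(t) = g\Big( \sum_j s_j(t)\, v_{kj} \Big)

    f(z) = \frac{1}{1 + e^{-z}}, \qquad
    g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}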

13

Model Description - RNN Cont’d

• RNN can be seen as a chain of NNs

• Intimately related to sequences and lists.

• In the last few years, RNNs have been successfully applied to: speech recognition, language modeling, translation, image captioning…
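To make the "chain of NNs" picture concrete, here is a minimal NumPy sketch of unrolling one and the same network over a word sequence (sizes, weights and names are illustrative assumptions, not the paper's code):

    import numpy as np

    V, H = 10, 5                                   # vocabulary size, hidden-layer size
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(H, V))         # input  -> hidden weights
    W = rng.normal(scale=0.1, size=(H, H))         # hidden -> hidden (recurrent) weights
    Vout = rng.normal(scale=0.1, size=(V, H))      # hidden -> output weights

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def forward(word_ids):
        """Apply the same cell once per word: effectively a chain of identical NNs."""
        s = np.zeros(H)                            # initial context vector
        probs = []
        for w in word_ids:
            x = np.zeros(V); x[w] = 1.0            # 1-of-N encoding of the current word
            s = 1.0 / (1.0 + np.exp(-(U @ x + W @ s)))   # sigmoid hidden/context layer
            probs.append(softmax(Vout @ s))        # distribution over the next word
        return probs

    print(forward([1, 4, 2])[-1].round(3))         # P(next word | the three-word history)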

14

RNN vs. FF

• Parameters to tune or select:

• RNN

• Size of hidden layer

• FF

• size of layer that projects words to low dimensional space

• size of hidden layer

• size of context-length

15

RNN vs. FF

• In feedforward networks, history is represented by context of N − 1 words - it is limited in the same way as in N-gram backoff models.

• In recurrent networks, history is represented by neurons with recurrent connections - history length is unlimited.

• Also, recurrent networks can learn to compress the whole history into a low-dimensional space, while feedforward networks compress (project) just a single word.

• Recurrent networks have the possibility to form short-term memory, so they can better deal with position invariance; feedforward networks cannot do that.

16

Comparison of models

Simple experiment on 4M words from the Switchboard corpus; perplexity (PPL, lower is better):

KN 5-gram (baseline)   93.7
FF                     85.1
RNN                    80
4*RNN + KN5            73.5

17

Model setting

• Standard backpropagation algorithm + SGD

• Train in several epochs:

• start with α = 0.1

• if the log-likelihood of the validation data increases, continue

• else set α = 0.5·α and continue

• terminate if there is no significant improvement

• Convergence is usually reached after 10-20 epochs
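A pseudocode-style sketch of this schedule (a rough reading of the slide; train_epoch and validation_log_likelihood are caller-supplied placeholders, and min_gain defines "significant improvement"):

    def train(train_epoch, validation_log_likelihood, alpha=0.1, min_gain=1e-3):
        """Start with alpha = 0.1; once validation stops improving, halve alpha each
        epoch; terminate when validation stalls again."""
        best_ll = float("-inf")
        halving = False
        while True:
            if halving:
                alpha *= 0.5                    # keep halving after the first stall
            train_epoch(alpha)                  # one epoch of SGD + backpropagation
            ll = validation_log_likelihood()
            improved = ll > best_ll + min_gain
            best_ll = max(best_ll, ll)
            if improved:
                continue                        # validation improved: keep training
            if halving:
                break                           # stalled again: terminate
            halving = True                      # first stall: start halving alpha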

18

19

Model setting - Optimization

• Rare token: merge all words occurring less often than a threshold in the training data into a uniformly distributed rare token
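A minimal sketch of this rare-token trick (the threshold value, token name and function names are illustrative; the key point is that the merged class's probability mass is shared uniformly among the merged words):

    from collections import Counter

    THRESHOLD = 5          # illustrative count threshold on the training data
    RARE = "<rare>"

    def build_vocab(tokens):
        counts = Counter(tokens)
        kept = {w for w, c in counts.items() if c >= THRESHOLD}
        n_rare = sum(1 for c in counts.values() if c < THRESHOLD)
        return kept, n_rare

    def map_token(w, kept):
        # Data is rewritten with infrequent words replaced by the rare token.
        return w if w in kept else RARE

    def word_probability(w, kept, n_rare, model_prob):
        # model_prob(token) is the network's output probability for a vocabulary token.
        if w in kept:
            return model_prob(w)
        return model_prob(RARE) / n_rare    # uniform share of the rare-class probability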

20

Experiments

• WSJ (source: read text only)

• training corpus consists of 37M words

• baseline KN5 - modified Kneser-Ney smoothed 5-gram

• RNN LM - trained on 6.4M selected words (300K sentences)

• combined model: 0.75 RNN + 0.25 backoff (see the interpolation formula below)

• NIST RT05 (115 hours of meeting speech + web data)

• more than 1.3G words

• RNN LM trained on 5.4M selected words
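The combination above is a per-word linear interpolation of the two models, with the weights quoted on the slide:

    P(w \mid h) = 0.75 \cdot P_{\mathrm{RNN}}(w \mid h) + 0.25 \cdot P_{\mathrm{KN5}}(w \mid h)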

21

Best perplexity result is 112, for a mixture of static and dynamic RNN LMs with a larger learning rate of 0.3

~50% reduction in perplexity; 18% reduction in WER

22

12% improvement

23

• RNNs are trained only on in-domain data (5.4M words)

• the RT05 and RT09 baseline LMs are trained on more than 1.3G words

24

Summary

• RNN LM is simple and intelligent.

• RNN LMs can be competitive with backoff LMs that are trained on much more data.

• Results show interesting improvements both for ASR and MT.

• Simple toolkit has been developed that can be used to train RNN LMs.

• This work provides a clear connection between machine learning, data compression and language modeling.

25

Future work

• Clustering of vocabulary to speed up training

• Parallel implementation of neural network training algorithm

• Online learning or dynamic model will be the future

• BPTT algorithm for a lot of training data

• Go beyond BPTT? LSTM

• Extension to OCR, data compression, cognitive sciences…

26

–Xuening

Thanks!

27