LSTM: A Search Space Odyssey
![Page 1: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/1.jpg)
LSTM: A Search Space Odyssey
Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
![Page 2: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/2.jpg)
Outline
• Introduction
• Long Short-Term Memory (LSTM) with peephole connections
• Experiment and discussion
• Conclusion
![Page 3: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/3.jpg)
Definition:
• Recurrent Neural Networks
• Importance and applications
• Gradient problem
  • Vanishing gradient
  • Exploding gradient
• What is the LSTM?
Introduction LSTM with peephole connections Results and discussion Conclusion
![Page 4: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/4.jpg)
LSTM History:
• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.
• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.
• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.
• In 2014, Kyunghyun Cho et al. put forward a simplified variant called the Gated Recurrent Unit (GRU).
![Page 5: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/5.jpg)
Simple RNN
![Page 6: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/6.jpg)
Block diagram
• Three gates:
  • Input gate
  • Forget gate
  • Output gate
• Two blocks:
  • Block input
  • Block output
• One cell state
![Page 7: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/7.jpg)
Block Diagram
Block input:
• $W_z$: input weight ($\mathbb{R}^{N \times M}$)
• $R_z$: recurrent weight ($\mathbb{R}^{N \times N}$)
• $b_z$: bias weight ($\mathbb{R}^{N}$)
• $x_t$: input vector at time t
• $y_{t-1}$: output at time t-1
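With the symbols listed on this slide, the block input can be written as a single equation. Here $g$ is the input activation function; taking it to be tanh is the usual choice, though this slide does not state it.

$$
z_t = g\left(W_z x_t + R_z y_{t-1} + b_z\right)
$$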
![Page 8: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/8.jpg)
Block Diagram
Input gate:
• $W_i$: input weight ($\mathbb{R}^{N \times M}$)
• $R_i$: recurrent weight ($\mathbb{R}^{N \times N}$)
• $b_i$: bias weight ($\mathbb{R}^{N}$)
• $p_i$: peephole weight ($\mathbb{R}^{N}$)
• $c_{t-1}$: cell state at time t-1
• $x_t$: input vector at time t
• $y_{t-1}$: output at time t-1
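Put together, these quantities give the input gate activation. In the sketch below, $\sigma$ is assumed to be the logistic sigmoid and $\odot$ denotes element-wise multiplication, so the peephole weight $p_i$ scales the previous cell state component by component.

$$
i_t = \sigma\left(W_i x_t + R_i y_{t-1} + p_i \odot c_{t-1} + b_i\right)
$$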
![Page 9: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/9.jpg)
Block Diagram
Forget gate:
• $W_f$: input weight ($\mathbb{R}^{N \times M}$)
• $R_f$: recurrent weight ($\mathbb{R}^{N \times N}$)
• $b_f$: bias weight ($\mathbb{R}^{N}$)
• $p_f$: peephole weight ($\mathbb{R}^{N}$)
• $c_{t-1}$: cell state at time t-1
• $x_t$: input vector at time t
• $y_{t-1}$: output at time t-1
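The forget gate has the same form as the input gate, with its own weights. Assuming $\sigma$ is the logistic sigmoid, each component of $f_t$ lies between 0 and 1: values near 1 keep the corresponding entry of the previous cell state, and values near 0 discard it.

$$
f_t = \sigma\left(W_f x_t + R_f y_{t-1} + p_f \odot c_{t-1} + b_f\right)
$$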
![Page 10: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/10.jpg)
Block Diagram
Output gate:
• $W_o$: input weight ($\mathbb{R}^{N \times M}$)
• $R_o$: recurrent weight ($\mathbb{R}^{N \times N}$)
• $b_o$: bias weight ($\mathbb{R}^{N}$)
• $p_o$: peephole weight ($\mathbb{R}^{N}$)
• $c_t$: cell state at time t
• $x_t$: input vector at time t
• $y_{t-1}$: output at time t-1
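The output gate differs from the other two gates in one detail: its peephole looks at the current cell state $c_t$ (the label in the diagram), not at $c_{t-1}$, so it is evaluated after the cell state has been updated. As a sketch, with $\sigma$ assumed to be the logistic sigmoid:

$$
o_t = \sigma\left(W_o x_t + R_o y_{t-1} + p_o \odot c_t + b_o\right)
$$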
![Page 11: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/11.jpg)
Block Diagram
Cell state:
• $z_t$: output of the block input at time t
• $i_t$: output of the input gate at time t
• $f_t$: output of the forget gate at time t
• $c_{t-1}$: cell state at time t-1
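The four quantities on this slide combine into the cell state update: the input gate scales the new candidate $z_t$, the forget gate scales the old state, and the two are added ($\odot$ is element-wise multiplication).

$$
c_t = z_t \odot i_t + c_{t-1} \odot f_t
$$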
![Page 12: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/12.jpg)
Block Diagram
Block output:
• $o_t$: output of the output gate at time t
• $c_t$: cell state at time t
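The block output passes the cell state through the output activation function $h$ and then scales it by the output gate. Taking $h$ to be tanh is the common choice and is an assumption here, since this slide does not name it.

$$
y_t = h(c_t) \odot o_t
$$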
![Page 13: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/13.jpg)
LSTM Variants
• NIG: No Input Gate: $i_t = 1$
• NFG: No Forget Gate: $f_t = 1$
• NOG: No Output Gate: $o_t = 1$
• NIAF: No Input Activation Function: $g(x) = x$
• NOAF: No Output Activation Function: $h(x) = x$
• CIFG: Coupled Input and Forget Gate: $f_t = 1 - i_t$
• NP: No Peepholes
• FGR: Full Gate Recurrence
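The variants above can be sketched as switches on a single forward step of a peephole LSTM block. This is a minimal pure-Python illustration, not the authors' implementation: the function names `lstm_step` and `zero_params`, the parameter dictionary layout, and the use of tanh for $g$ and $h$ are all assumptions made for this example. FGR is left out because it adds extra recurrent connections between the gates.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, R, b, x, y_prev):
    """Return W x + R y_prev + b as a list of length N."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(r * yi for r, yi in zip(R_row, y_prev))
        + b_n
        for W_row, R_row, b_n in zip(W, R, b)
    ]

def lstm_step(params, x, y_prev, c_prev, variant="V"):
    """One forward step; `variant` is "V" (vanilla) or one of the
    ablation names NIG, NFG, NOG, NIAF, NOAF, CIFG, NP."""
    N = len(c_prev)
    use_peepholes = variant != "NP"
    g = (lambda a: a) if variant == "NIAF" else math.tanh  # input activation
    h = (lambda a: a) if variant == "NOAF" else math.tanh  # output activation

    def gate(name, c, removed):
        if removed:  # a removed gate is fixed to 1
            return [1.0] * N
        pre = affine(params["W" + name], params["R" + name],
                     params["b" + name], x, y_prev)
        if use_peepholes:
            pre = [a + p * c_n for a, p, c_n in zip(pre, params["p" + name], c)]
        return [sigmoid(a) for a in pre]

    z = [g(a) for a in affine(params["Wz"], params["Rz"], params["bz"], x, y_prev)]
    i = gate("i", c_prev, variant == "NIG")
    if variant == "CIFG":
        f = [1.0 - i_n for i_n in i]  # coupled gates: f_t = 1 - i_t
    else:
        f = gate("f", c_prev, variant == "NFG")
    c = [z_n * i_n + c_n * f_n for z_n, i_n, c_n, f_n in zip(z, i, c_prev, f)]
    o = gate("o", c, variant == "NOG")  # output peephole sees the new c_t
    y = [h(c_n) * o_n for c_n, o_n in zip(c, o)]
    return y, c

def zero_params(N, M):
    """Hypothetical helper: all-zero weights for N blocks and M inputs."""
    params = {}
    for name in "zifo":
        params["W" + name] = [[0.0] * M for _ in range(N)]
        params["R" + name] = [[0.0] * N for _ in range(N)]
        params["b" + name] = [0.0] * N
    for name in "ifo":
        params["p" + name] = [0.0] * N
    return params

params = zero_params(N=2, M=1)
y, c = lstm_step(params, [1.0], [0.0, 0.0], [1.0, -1.0], variant="NFG")
```

With all-zero weights every active gate evaluates to 0.5, so the vanilla block halves the previous cell state while the NFG variant carries it over unchanged, which makes the effect of each switch easy to inspect by hand.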
![Page 14: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/14.jpg)
Experiment setup
Datasets:
• TIMIT speech corpus
• IAM Online Handwriting Database
• JSB Chorales
![Page 15: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/15.jpg)
Experiment setup
Features:
• TIMIT speech corpus:
  • 12 MFCCs + energy, together with their first and second derivatives
• IAM Online Handwriting Database:
  • pen position (x, y), time t, and a flag marking the time of the pen lifting
• JSB Chorales:
  • each MIDI sequence transposed to C major or C minor, with frames sampled every quarter note
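For the TIMIT features, 12 MFCCs plus energy give 13 static coefficients per frame, and appending first and second derivatives yields 13 × 3 = 39 values. The sketch below illustrates only that bookkeeping: the central-difference estimate, the edge handling, and the names `deltas` and `add_derivatives` are assumptions for this example, since the slides do not say how the derivatives were computed.

```python
def deltas(frames):
    """Central-difference time derivative of a list of feature vectors.
    Edge frames reuse their nearest neighbour (an assumption)."""
    T = len(frames)
    out = []
    for t in range(T):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, T - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def add_derivatives(frames):
    """Append first and second derivatives to every frame."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

# Toy input: 5 frames of 13 static coefficients (12 MFCCs + energy).
static = [[float(t)] * 13 for t in range(5)]
features = add_derivatives(static)
print(len(features[0]))  # number of values per frame
```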
![Page 16: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/16.jpg)
Experiment setup
Network Architectures and training:
| Dataset | Type of network | Hidden layers | Output layer | Loss function | Training |
|---|---|---|---|---|---|
| TIMIT | Bidirectional LSTM | Two | Softmax | Cross-entropy error | SGD |
| IAM Online | Bidirectional LSTM | Two | Softmax | CTC loss | SGD |
| JSB Chorales | LSTM | One | Sigmoid | Cross-entropy error | SGD |
![Page 17: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/17.jpg)
Comparison of the Variants
• Test set performance for all 200 trials:
![Page 18: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/18.jpg)
Comparison of the Variants
• Test set performance for the best 10% of trials:
![Page 19: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/19.jpg)
Impact of Hyperparameters
![Page 20: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/20.jpg)
Interaction of Hyperparameters
![Page 21: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/21.jpg)
Total marginal predicted performance
TIMIT:
![Page 22: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/22.jpg)
Total marginal predicted performance
IAM Online:
![Page 23: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/23.jpg)
Total marginal predicted performance
JSB Chorales:
![Page 24: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/24.jpg)
Conclusion
• The most commonly used LSTM architecture performs reasonably well on various datasets.
• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified LSTMs in these experiments without significantly decreasing performance.
• The forget gate and the output activation function are the most critical components of the LSTM block.
• The learning rate is the most crucial hyperparameter, followed by the network size.
• The hyperparameters are virtually independent of one another.
![Page 25: LSTM: A Search Space Odyssey](https://reader030.fdocuments.in/reader030/viewer/2022012015/615a8d9968c2cc71902c51cc/html5/thumbnails/25.jpg)
References:
• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
• https://www.youtube.com/watch?v=lycKqccytfU
• https://www.youtube.com/watch?v=lWkFhVq9-nc
• https://en.wikipedia.org/wiki/Long_short-term_memory