Vector Quantized Neural Networks for Acoustic Unit Discovery · nortje+kamper... · 2021. 2. 15. ·...
Vector Quantized Neural Networks for Acoustic Unit Discovery
Benjamin van Niekerk, Leanne Nortje, Herman Kamper
The Generative Factors of Speech
Content:
● Discrete phonetic units.
● ≅44 phonemes in English.
● Example: HUMOUR = HH / Y / UW / M / ER
Prosody:
● Rhythm
● Intonation
● Stresses
Timbre:
● Quality of a particular voice.
● Characterized by frequency spectrum.
What is Acoustic Unit Discovery?
The goal is to learn discrete representations of speech that separate phonetic content from the other factors.
…all without any labels or annotations!
[Diagram label: Encoder]
Applications
Bootstrap training of low-resource speech systems:
Automatic speech recognition
Text-to-speech
Non-parallel voice conversion
But how do we learn discrete representations using neural networks?
A. van den Oord, O. Vinyals, and K. Kavukcuoglu. “Neural discrete representation learning.” Advances in Neural Information Processing Systems. 2017.
Vector Quantization Layer
[Diagram labels: Encoder, Codebook]
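A minimal sketch of what a vector quantization layer does at inference time, assuming the usual nearest-neighbour lookup described by van den Oord et al. (2017): each encoder output is replaced by its closest codebook entry, and the index of that entry serves as the discrete unit. The function names, the toy codebook values, and the choice of squared Euclidean distance are illustrative assumptions, not details taken from the slides.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each encoder output frame to its nearest codebook entry.

    Returns the discrete code indices and the corresponding codebook vectors.
    """
    indices = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Toy example (hypothetical values): 3 codes in 2 dimensions.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7), (0.2, 0.1)]
indices, quantized = quantize(frames, codebook)
print(indices)  # [0, 1, 2, 0]
```

The sequence of indices is the discrete representation. During training, gradients additionally have to be passed around the non-differentiable lookup, which this sketch does not cover.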
Our contribution: we propose and compare two models for acoustic unit discovery in the ZeroSpeech 2020 Challenge.
1. A Vector-Quantized Variational Autoencoder (VQ-VAE).
Inspired by: J. Chorowski, et al. “Unsupervised speech representation learning using WaveNet autoencoders.” IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2019.
2. A combination of Vector-Quantization and Contrastive Predictive Coding (VQ-CPC).
Inspired by: A. van den Oord, et al. “Representation Learning with Contrastive Predictive Coding.” 2018.
[Diagram: Encoder, VQ layer, Decoder]
Vector-Quantized Variational Autoencoder
[Diagram: Encoder, VQ layer, Decoder]
Diagram annotations:
● Minimize reconstruction error
● Information bottleneck
● Speaker
● Powerful autoregressive model
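The slide states the training goal only as "minimize reconstruction error". For context, the objective from the cited VQ-VAE paper (van den Oord et al., 2017) can be written as below. This is a sketch under assumptions: the symbols $x$ (input), $s$ (speaker), $z_e(x)$ (encoder output), $e$ (selected codebook vector), $\mathrm{sg}[\cdot]$ (stop-gradient) and $\beta$ (commitment weight) are notation introduced here. Conditioning the decoder on $s$ is inferred from the "Speaker" label, and the slides do not say whether the authors use exactly this form.

```latex
\mathcal{L}
  = \underbrace{-\log p\bigl(x \mid z_q(x),\, s\bigr)}_{\text{reconstruction error}}
  + \underbrace{\bigl\lVert \mathrm{sg}[z_e(x)] - e \bigr\rVert_2^2}_{\text{codebook term}}
  + \beta \, \underbrace{\bigl\lVert z_e(x) - \mathrm{sg}[e] \bigr\rVert_2^2}_{\text{commitment term}},
\qquad
z_q(x) = e = \operatorname*{arg\,min}_{e_k} \bigl\lVert z_e(x) - e_k \bigr\rVert_2
```

The first term trains the encoder and decoder, the second pulls the selected codebook vector towards the encoder output, and the third keeps the encoder output close to the codebook vector it was assigned to.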
Vector-Quantized Contrastive Predictive Coding
[Diagram: Input, Encoder, VQ layer, Context model, Predictions]
[Diagram: Context vector, Positive example, Negative examples]
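The contrast between a positive example and negative examples can be illustrated with the InfoNCE loss from the cited CPC paper (van den Oord et al., 2018). The sketch below is hedged: it assumes the prediction has already been computed from the context vector, and that candidates are scored by a plain dot product. The function names and toy vectors are hypothetical and not taken from the slides.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    prediction: Vector, positive: Vector, negatives: Sequence[Vector]
) -> float:
    """Negative log-probability of picking the positive among all candidates.

    `prediction` stands in for the output of the prediction head applied to
    the context vector (an assumption of this sketch).
    """
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - scores[0]


# Toy example (hypothetical values).
prediction = (1.0, 0.0)
good = info_nce_loss(prediction, (2.0, 0.0), [(0.0, 1.0), (-1.0, 0.0)])
bad = info_nce_loss(prediction, (-1.0, 0.0), [(0.0, 1.0), (2.0, 0.0)])
print(good < bad)  # True: the loss is lower when the positive scores highest
```

Minimizing this loss pushes the prediction to score the true future code above the distractors.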
Evaluation - Voice Conversion
[Diagram: Encoder, VQ layer, Decoder]
Evaluation Metrics:
● Speaker similarity (1-5 scale).
● Intelligibility (character error rate).
● Mean opinion score (1-5 scale).
[Slide columns: Source, Converted, Target, Other Conversion]
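Of the voice conversion metrics, character error rate is the one that reduces to a simple computation. A minimal sketch, assuming the common definition (Levenshtein edit distance between a reference transcript and a transcript of the converted speech, divided by the reference length). How the challenge obtains and normalizes the transcripts is not stated on the slides, and the function names are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(edit_distance("beg", "bag"))  # 1
print(character_error_rate("humour", "humor"))  # 1 edit over 6 characters
```

Lower is better. A value above 1.0 is possible when the hypothesis is much longer than the reference.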
Evaluation - ABX Score
Triphone A: beg
Triphone B: bag
Triphone X: beg
[Diagram: A, B, and X each pass through the Encoder]
Evaluation - ABX Score
Triphone A: bug
Triphone B: bag
Triphone X: beg
[Diagram: A, B, and X each pass through the Encoder]
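The ABX test can be sketched as follows: encode A, B and X, then check whether X lies closer to A (its own category) than to B. This is a simplified illustration, not the challenge's evaluation code. It assumes each triphone has been reduced to a single fixed-size vector and compared with Euclidean distance, whereas evaluations of this kind typically align variable-length frame sequences first. The function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[Vector, Vector, Vector]  # (A, B, X), with X from A's category


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abx_error_rate(triples: Sequence[Triple]) -> float:
    """Fraction of triples where X is not closer to A than to B.

    Ties count as half an error (an assumption of this sketch).
    """
    if not triples:
        raise ValueError("at least one triple is required")
    errors = 0.0
    for a, b, x in triples:
        d_ax, d_bx = euclidean(a, x), euclidean(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triples)


# Toy encodings (hypothetical values): A = "beg", B = "bag", X = another "beg".
triples = [
    ((1.0, 0.0), (0.0, 1.0), (0.9, 0.1)),  # X closer to A: correct
    ((1.0, 0.0), (0.0, 1.0), (0.2, 0.8)),  # X closer to B: error
]
print(abx_error_rate(triples))  # 0.5
```

A lower error rate means the representation separates triphones that differ in a single phone more reliably.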
Questions?
Vector Quantized Variational Autoencoder
Encoder:
log-Mel spec (100 Hz)
conv3(768), batchnorm, ReLU
conv3(768), batchnorm, ReLU
conv4 stride2(768), batchnorm, ReLU
conv3(768), batchnorm, ReLU
conv3(768), batchnorm, ReLU
Bottleneck:
linear(64)
VQ(512)
z (50 Hz)
Decoder:
jitter(0.5), embedding
speaker, embedding
concat
upsample
biGRU(128), biGRU(128)
upsample
GRU(896)
linear(256), ReLU
linear(256), ReLU
softmax, sample, mu-law
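The decoder ends in softmax, sample, mu-law, which suggests the waveform is modelled as discrete mu-law classes. A minimal sketch of standard mu-law companding follows, assuming 256 quantization levels. That number is inferred from the linear(256) layer feeding the softmax and is not confirmed by the slide. The function names are illustrative.

```python
import math


def mu_law_encode(sample: float, levels: int = 256) -> int:
    """Compress a sample in [-1, 1] and quantize it to an integer class."""
    mu = levels - 1
    sample = max(-1.0, min(1.0, sample))  # clip to the valid range
    compressed = math.copysign(
        math.log1p(mu * abs(sample)) / math.log1p(mu), sample
    )
    return int(math.floor((compressed + 1.0) / 2.0 * mu + 0.5))


def mu_law_decode(index: int, levels: int = 256) -> float:
    """Map an integer class back to an approximate sample in [-1, 1]."""
    mu = levels - 1
    compressed = 2.0 * index / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


print(mu_law_encode(0.0), mu_law_encode(1.0), mu_law_encode(-1.0))  # 128 255 0
```

Companding spends more of the classes on small amplitudes, so a sample drawn from the softmax can be decoded back to a waveform value with small error for quiet signals.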