Entropy and Dynamism Criteria for Voice Quality Classification Applications
Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis L. Mitrofanov
Belarusian State University, Radiophysics Department, Minsk, Belarus
VOICE QUALITY: FUNCTIONS, ANALYSIS AND SYNTHESIS
GENEVA - AUGUST 27-29, 2003
ISCA Tutorial and Research Workshop, International Speech Communication Association
Voice Quality Classification Applications
Outline: Introduction, System design, Experiment, Conclusion
Introduction
Audio is a large and highly variable class of data.
Sounds range from music of many genres to animal cries to synthesizer samples.
Any of these can occur in combination, and in practice they often do.
Existing Approaches
• Signal processing techniques: spectrum, modulation spectrum, temporal information
• Decision making: Bayesian Information Criterion (BIC), log-likelihood ratio, Hidden Markov Model (HMM)
Block diagram of the proposed system
Input data (wave file) → feature vector extraction → vectors (mel cepstra) → neural network → probabilities of Russian phonemes → entropy & dynamism computation → entropy and dynamism → HMM → segments
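Definitions
Entropy and averaged entropy
Entropy is a measure of the uncertainty, or disorder, of a probability distribution. Here it is computed from the phoneme posteriors P(q_k | x_n) that the neural network outputs for frame x_n, where q_k is the k-th of K phoneme classes: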
$$h_n = -\sum_{k=1}^{K} P(q_k \mid x_n) \log_2 P(q_k \mid x_n)$$
$$H_t = \frac{1}{N} \sum_{n=t-N/2}^{t+N/2} h_n$$
We use N = 40.
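A minimal sketch of the two entropy formulas, assuming the network's posteriors for each frame are available as a list of K probabilities. The function names are hypothetical, and H_t is computed only for frames whose whole window lies inside the signal, since the slide does not say how edges are handled. With the 10 ms frame shift used in this system, N = 40 frames spans roughly 0.4 s.

```python
import math
from typing import List, Sequence


def frame_entropy(posteriors: Sequence[float]) -> float:
    """h_n = -sum_k P(q_k|x_n) * log2 P(q_k|x_n).

    Terms with P = 0 contribute 0 (the limit of p * log p as p -> 0).
    """
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def averaged_entropy(h: Sequence[float], N: int = 40) -> List[float]:
    """H_t = (1/N) * sum of h_n for n = t - N/2 .. t + N/2.

    Assumption (hypothetical edge handling): H_t is returned only for frames t
    whose full window fits inside the signal.
    """
    half = N // 2
    return [sum(h[t - half:t + half + 1]) / N for t in range(half, len(h) - half)]


# Uniform posteriors over 4 classes give the maximum entropy, log2(4) = 2 bits;
# a one-hot posterior (the network is certain) gives 0.
uniform_h = frame_entropy([0.25, 0.25, 0.25, 0.25])
certain_h = frame_entropy([1.0, 0.0, 0.0])
```

Low entropy means the network is confident about which phoneme it hears; the intuition behind the feature is that speech in the training language should tend to produce lower entropy than music or other languages.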
Definitions
Dynamism and averaged dynamism
Dynamism measures the rate of change of the phoneme posteriors between consecutive frames:
$$d_n = \sum_{k=1}^{K} \left[ P(q_k \mid x_n) - P(q_k \mid x_{n+1}) \right]^2$$
$$D_t = \frac{1}{N} \sum_{n=t-N/2}^{t+N/2} d_n$$
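The dynamism formulas can be sketched the same way, assuming the posteriors of a whole file are given as a list of per-frame probability lists. The names are hypothetical; d_n exists only for frames that have a successor, and, as an assumption, D_t is computed only where the full N-frame window fits.

```python
from typing import List, Sequence


def frame_dynamism(p_n: Sequence[float], p_next: Sequence[float]) -> float:
    """d_n = sum_k [P(q_k|x_n) - P(q_k|x_{n+1})]^2."""
    return sum((a - b) ** 2 for a, b in zip(p_n, p_next))


def dynamism_track(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """d_n for every frame that has a successor (len(posteriors) - 1 values)."""
    return [
        frame_dynamism(posteriors[n], posteriors[n + 1])
        for n in range(len(posteriors) - 1)
    ]


def averaged_dynamism(d: Sequence[float], N: int = 40) -> List[float]:
    """D_t = (1/N) * sum of d_n for n = t - N/2 .. t + N/2 (full windows only)."""
    half = N // 2
    return [sum(d[t - half:t + half + 1]) / N for t in range(half, len(d) - half)]


# A jump from one certain phoneme to another gives the largest possible d_n (2.0);
# unchanged posteriors give 0.
jump = frame_dynamism([1.0, 0.0], [0.0, 1.0])
```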
Feature vector extraction
We use 12 mel-cepstral coefficients computed over a 30 ms window with a 10 ms frame shift, on wave files of 4 to 15 min containing Russian speech, non-Russian speech and music.
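The framing step can be sketched as follows, assuming a 16 kHz sampling rate (the slides do not give one). The function name is hypothetical, and the mel-cepstrum computation applied to each frame is not shown. Because frames advance every 10 ms regardless of the sampling rate, a 4-minute file yields roughly 24,000 frames and a 15-minute file roughly 90,000.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000      # assumption: the sampling rate is not stated on the slide
WIN_MS, HOP_MS = 30, 10   # 30 ms analysis window, 10 ms frame shift (from the slide)


def frame_signal(x: Sequence[float], sr: int = SAMPLE_RATE) -> List[List[float]]:
    """Split a signal into overlapping analysis frames (hypothetical helper).

    Each frame would then be turned into 12 mel-cepstral coefficients.
    """
    win = sr * WIN_MS // 1000   # 480 samples at 16 kHz
    hop = sr * HOP_MS // 1000   # 160 samples at 16 kHz
    if len(x) < win:
        return []
    n_frames = 1 + (len(x) - win) // hop
    return [list(x[i * hop:i * hop + win]) for i in range(n_frames)]


# One second of audio at 16 kHz: 1 + (16000 - 480) // 160 = 98 frames of 480 samples.
frames = frame_signal([0.0] * 16_000)
```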
[Figure: HMM with states S0 to S6]
• Define an HMM for the signal, with one state for every segment to be found
• Perform a Viterbi search for the optimal state path, using the probabilities obtained in the previous step
• Place segment boundaries at the moments where the HMM state changes
Hidden Markov Model
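A minimal sketch of these three steps, under stated assumptions: a left-to-right topology (each state either repeats or advances to the next one), and per-frame log-likelihood scores for each state derived from the previous step. The slides give neither the transition probabilities nor how entropy and dynamism become state scores, so the 0.9/0.1 values and the function name are hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_segment(loglik: Sequence[Sequence[float]],
                    log_stay: float = math.log(0.9),
                    log_move: float = math.log(0.1)) -> List[int]:
    """Return the frame indices where the best path enters a new state.

    loglik[t][s]: log-score of frame t under state s (one state per segment).
    Assumed left-to-right topology: start in state 0, end in the last state,
    and at each frame either stay or advance by one. Requires T >= S.
    """
    T, S = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_move if s > 0 else neg_inf
            if move > stay:
                score[t][s], back[t][s] = move + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s
    # Backtrack from the final state to recover the optimal state sequence.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Segment boundaries are the frames where the state changes.
    return [t for t in range(1, T) if path[t] != path[t - 1]]


# Two segments: frames 0-2 fit state 0, frames 3-5 fit state 1.
example = [[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3
boundaries = viterbi_segment(example)
```

Because each state can only be entered once, the number of states fixes the number of segments, which matches the slide's "one HMM state for every segment".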
Neural network for probability generation: rationale
Neural networks can model probability distributions with high accuracy because they can approximate a wide variety of functions.
If training does not get stuck in a local minimum,
the network outputs can be interpreted as class probabilities.
Neural Network
Neural network for probability generation: structure
• Fully connected multilayer perceptron
– Input layer size equals the feature vector size
– Output layer size equals the number of phonemes (one output per phoneme probability)
– The number and sizes of hidden layers vary
– Hyperbolic tangent (tanh) activation for hidden neurons
– Softmax activation for output neurons
Multilayer Perceptron
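A minimal forward-pass sketch of such a perceptron in plain Python, with hypothetical names and small random weights (training is not shown). The layer sizes are illustrative assumptions: 12 inputs match a single frame of 12 mel-cepstral coefficients, while the hidden size of 64 and the 35 phoneme outputs are placeholders, not values from the slides. The softmax makes the outputs non-negative and sum to 1, which is what allows them to be read as class probabilities.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def dense(x: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """Affine layer: W has one row of input weights per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """tanh on every hidden layer, softmax on the output layer."""
    h = list(x)
    for W, b in layers[:-1]:
        h = [math.tanh(v) for v in dense(h, W, b)]
    W_out, b_out = layers[-1]
    return softmax(dense(h, W_out, b_out))


def random_layers(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Hypothetical initializer: small uniform weights, zero biases."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


# Hypothetical sizes: 12 inputs, one hidden layer of 64 units, 35 phoneme outputs.
probs = mlp_forward([0.1] * 12, random_layers([12, 64, 35]))
```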
Results
[Figure: entropy histogram for music]
Results: Russian speech
Results: foreign speech
Results: Russian and foreign speech
Blue is Russian, pink is French.
Results
Two Russian speakers (blue and brown) and music (other colors)
Russian speaker (blue) and music (pink)
Results: native Russian and “Czech” Russian
There is some difference even between native Russian speech and Russian spoken with a Czech accent.
Results
Entropy histograms of “normal” (brown) and “rough” (blue) French speech
Results
Entropy histograms for “normal” (brown), “rough” (blue) and “lips” (lips) French speech
Conclusion
Further research:
• parameter vectors: their size and the number of context frames
• specialized HMM structures for particular types of speech signal
Conclusion: experiments show that entropy and dynamism features can be used successfully for automatic signal segmentation. Further research in this area may lead to better practical results.