
1

Tutorial on

Neural Network Models for Speech and Image Processing

B. Yegnanarayana
Speech & Vision Laboratory

Dept. of Computer Science & Engineering, IIT Madras, Chennai-600036

yegna@cs.iitm.ernet.in

WCCI 2002, Honolulu, Hawaii, USA, May 12, 2002

2

Need for New Models of Computing for Speech & Image Tasks

• Speech & Image processing tasks

• Issues in dealing with these tasks by human beings

• Issues in dealing with the tasks by machine

• Need for new models of computing in dealing with natural signals

• Need for effective (relevant) computing

• Role of Artificial Neural Networks (ANN)


3

Organization of the Tutorial

Part I Feature extraction and classification problems with speech and image data

Part II Basics of ANN

Part III ANN models for feature extraction and classification

Part IV Applications in speech and image processing


4

PART I

Feature Extraction and Classification Problems in Speech and Image

5

Feature Extraction and Classification Problems in Speech and Image

• Distinction between natural and synthetic signals (unknown model vs known model generating the signal)

• Nature of speech and image data (non-repetitive data, but repetitive features)

• Need for feature extraction and classification

• Methods for feature extraction and models for classification

• Need for nonlinear approaches (methods and models)


6

Speech vs Audio

• Audio (audible) signals (noise, music, speech and other signals)

• Categories of audio signals

– Audio signal vs non-signal (noise)

– Signal from speech production mechanism vs other audio signals

– Non-speech vs speech signals (like with natural language)


7

Speech Production Mechanism


8

Different types of sounds


9

Categorization of sound units


10

Nature of Speech Signal

• Digital speech: Sequence of samples or numbers

• Waveform for word “MASK” (Figure)

• Characteristics of speech signal
– Excitation source characteristics

– Vocal tract system characteristics


11

Waveform for the word “mask”


12

Source-System Model of Speech Production

(Figure: a voiced/unvoiced switch selects between an impulse train generator, controlled by the pitch period, and a random noise generator; the excitation u(n), scaled by gain G, drives a time-varying digital filter defined by the vocal tract parameters to produce speech s(n).)

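A minimal NumPy/SciPy sketch of this source-system (source-filter) model follows; the sampling rate, pitch period, gain, and all-pole filter coefficients are illustrative assumptions, not values from the tutorial.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                       # sampling rate in Hz (assumed)
pitch_period = 100              # samples between glottal impulses (voiced case)
n_samples = 2000
G = 1.0                         # gain

voiced = True
if voiced:
    u = np.zeros(n_samples)
    u[::pitch_period] = 1.0     # impulse train controlled by the pitch period
else:
    u = np.random.randn(n_samples)   # random noise generator (unvoiced case)

a = [1.0, -1.3, 0.8]            # illustrative all-pole (vocal tract) coefficients
s = lfilter([G], a, u)          # s(n): synthesized speech-like signal
```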

13

Features from Speech Signal (demo)

• Different components of speech (speech, source and system)

• Different speech sound units (Alphabet in Indian Languages)

• Different emotions

• Different speakers


14

Speech Signal Processing Methods

• To extract source-system features and suprasegmental features

• Production-based features

• DSP-based features

• Perception-based features


15

Models for Matching and Classification

• Dynamic Time Warping (DTW)

• Hidden Markov Models (HMM)

• Gaussian Mixture Models (GMM)

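As an illustration of the first of these, a minimal sketch of DTW between two feature-vector sequences (a generic textbook formulation, not code from the tutorial):

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic Time Warping distance between two sequences of feature vectors
    (one row per frame)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])     # local frame distance
            D[i, j] = cost + min(D[i - 1, j],              # insertion
                                 D[i, j - 1],              # deletion
                                 D[i - 1, j - 1])          # match
    return D[n, m]
```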

16

Applications of Speech Processing

• Speech recognition

• Speaker recognition/verification

• Speech enhancement

• Speech compression

• Audio indexing and retrieval


17

Limitations of Feature Extraction Methods and Classification Models

• Fixed frame analysis

• Variability in the implicit pattern

• Not pattern-based analysis

• Temporal nature of the patterns


18

Need for New Approaches

• To deal with ambiguity and variability in the data for feature extraction

• To combine evidence from multiple sources (classifiers and knowledge sources)


19

Images

• Digital image: Matrix of numbers

• Types of Images

– line sketches, binary, gray level and color

– Still images, video, multimedia


20

Image Analysis

• Feature extraction
• Image segmentation: Gray level, color, texture
• Image classification


21

Processing of Texture-like Images: 2-D Gabor Filter

A typical Gaussian filter with σ = 30; a typical Gabor filter with σ = 30, ω = 3.14 and θ = 45°

f(x, y, \sigma_x, \sigma_y, \omega, \theta) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\!\left[-\frac{1}{2}\left(\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right)\right]\exp\!\left[\,j\omega(x\cos\theta + y\sin\theta)\right]

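A short sketch of the 2-D Gabor filter defined above; the grid size and parameter values are illustrative.

```python
import numpy as np

def gabor_2d(size, sigma_x, sigma_y, omega, theta):
    """2-D Gabor filter: a Gaussian envelope modulated by a complex sinusoid
    oriented at angle theta, following the equation above."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    carrier = np.exp(1j * omega * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)

# Example resembling the slide: sigma = 30, omega = 3.14, theta = 45 degrees.
g = gabor_2d(size=121, sigma_x=30, sigma_y=30, omega=3.14, theta=np.deg2rad(45))
```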

22

Limitations

• Feature extraction
• Matching
• Classification methods/models


23

Need for New Approaches

• Feature extraction: PCA and nonlinear PCA

• Matching: Stereo images

• Smoothing: Using the knowledge of image and not noise

• Edge extraction and classification: Integration of global and local information or combining evidence


24

PART II

Basics of ANN

25

Artificial Neural Networks

• Problem solving: Pattern recognition tasks by human and machine

• Pattern vs data

• Pattern processing vs data processing

• Architectural mismatch

• Need for new models of computing


26

Biological Neural Networks

• Structure and function: Neurons, interconnections, dynamics for learning and recall

• Features: Robustness, fault tolerance, flexibility, ability to deal with variety of data situations, collective computation

• Comparison with computers: Speed, processing, size and complexity, fault tolerance, control mechanism

• Parallel and Distributed Processing (PDP) models


27

Basics of ANN

• ANN terminology: Processing unit (fig), interconnection, operation and update (input, weights, activation value, output function, output value)

• Models of neurons: MP neuron, perceptron and adaline

• Topology (fig)
• Basic learning laws (fig)


28

Model of a Neuron


29

Topology


30

Basic Learning Laws


31

Activation and Synaptic Dynamic Models

• General activation dynamics model

\dot{x}_i(t) = -A_i x_i(t) + \big(B_i - C_i x_i(t)\big)\big[I_i + f_i(x_i(t))\big] - \big(E_i + D_i x_i(t)\big)\Big[J_i + \sum_j f_j(x_j(t))\, w_{ij}\Big]

(passive decay term, excitatory term, inhibitory term)

• Synaptic dynamics model

\dot{w}_{ij}(t) = -w_{ij}(t) + s_i(t)\, s_j(t)

(passive decay term plus correlation term)

• Stability and convergence

32

Functional Units and Pattern Recognition Tasks

• Feedforward ANN
– Pattern association
– Pattern classification
– Pattern mapping/classification

• Feedback ANN
– Autoassociation
– Pattern storage (LTM)
– Pattern environment storage (LTM)

• Feedforward and Feedback (Competitive Learning) ANN
– Pattern storage (STM)
– Pattern clustering
– Feature map


33

Two Layer Feedforward Neural Network (FFNN)


34

PR Tasks by FFNN

• Pattern association
– Architecture: Two layers, linear processing, single set of weights
– Learning: Hebb's (orthogonal) rule, Delta (linearly independent) rule
– Recall: Direct
– Limitation: Linear independence, number of patterns restricted to input dimensionality
– To overcome: Nonlinear processing units, leads to a pattern classification problem

• Pattern classification
– Architecture: Two layers, nonlinear processing units, geometrical interpretation
– Learning: Perceptron learning
– Recall: Direct
– Limitation: Linearly separable functions, cannot handle hard problems
– To overcome: More layers, leads to a hard learning problem

• Pattern mapping/classification
– Architecture: Multilayer (hidden), nonlinear processing units, geometrical interpretation
– Learning: Generalized delta rule (backpropagation)
– Recall: Direct
– Limitation: Slow learning, does not guarantee convergence
– To overcome: More complex architecture


35

Perceptron Network

• Perceptron classification problem
• Perceptron learning law
• Perceptron convergence theorem
• Perceptron representation problem
• Multilayer perceptron


36

Geometric Interpretation of Perceptron Learning


37

Generalized Delta Rule (Backpropagation Learning)

For the output layer:

\Delta w_{kj} = \eta\, \delta_k^{o}\, s_j^{h}, \qquad \delta_k^{o} = (b_k - s_k^{o})\, \dot{f}_k^{o}

For the hidden layer:

\Delta w_{ji} = \eta\, \delta_j^{h}\, a_i, \qquad \delta_j^{h} = \dot{f}_j^{h} \sum_{k=1}^{K} \delta_k^{o}\, w_{kj}

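A minimal NumPy sketch of the generalized delta rule for one hidden layer, matching the two update equations above; network sizes, data, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 4))                  # input patterns
b = (a.sum(axis=1, keepdims=True) > 0) * 1.0       # target outputs

W1 = rng.standard_normal((4, 8)) * 0.1             # input-to-hidden weights
W2 = rng.standard_normal((8, 1)) * 0.1             # hidden-to-output weights
eta = 0.1                                          # learning rate
f = lambda x: 1.0 / (1.0 + np.exp(-x))             # logistic output function

for epoch in range(500):
    sh = f(a @ W1)                                 # hidden-layer outputs s^h
    so = f(sh @ W2)                                # output-layer outputs s^o
    delta_o = (b - so) * so * (1 - so)             # delta^o = (b - s^o) f'
    delta_h = (delta_o @ W2.T) * sh * (1 - sh)     # delta^h backpropagated through W2
    W2 += eta * sh.T @ delta_o                     # delta w = eta * delta * input
    W1 += eta * a.T @ delta_h
```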

38

Issues in Backpropagation Learning

• Description and features of error backpropagation

• Performance of backpropagation learning
• Refinements of backpropagation learning
• Interpretation of results of learning
• Generalization
• Tasks with backpropagation network
• Limitations of backpropagation learning
• Extensions to backpropagation


39

PR Tasks by FBNN

• Autoassociation
– Architecture: Single layer with feedback, linear processing units
– Learning: Hebb (orthogonal inputs), Delta (linearly independent inputs)
– Recall: Activation dynamics until stable states are reached
– Limitation: No accretive behavior
– To overcome: Nonlinear processing units, leads to a pattern storage problem

• Pattern Storage
– Architecture: Feedback neural network, nonlinear processing units, states, Hopfield energy analysis
– Learning: Not important
– Recall: Activation dynamics until stable states are reached
– Limitation: Hard problems, limited number of patterns, false minima
– To overcome: Stochastic update, hidden units

• Pattern Environment Storage
– Architecture: Boltzmann machine, nonlinear processing units, hidden units, stochastic update
– Learning: Boltzmann learning law, simulated annealing
– Recall: Activation dynamics, simulated annealing
– Limitation: Slow learning
– To overcome: Different architecture


40

Hopfield Model

• Model
• Pattern storage condition:

a_{ki} = \operatorname{sgn}\Big(\sum_j w_{ij}\, a_{kj}\Big), \qquad w_{ij} = \frac{1}{N} \sum_{l=1}^{L} a_{li}\, a_{lj}, \quad \text{where } i = 1,\ldots,N \text{ and } k = 1,\ldots,L

• Capacity of Hopfield model: Number of patterns for a given probability of error

• Energy analysis:

V = -\frac{1}{2} \sum_i \sum_j w_{ij}\, s_i\, s_j, \qquad \Delta V \le 0

• Continuous Hopfield model:

f(x) = \frac{1 - e^{-x}}{1 + e^{-x}}
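A minimal sketch of Hebbian storage and asynchronous recall in a discrete Hopfield network, following the weight and update rules above; the patterns and network size are illustrative.

```python
import numpy as np

def store(patterns):
    """Hebbian storage: w_ij = (1/N) * sum_l a_li a_lj, no self-connections."""
    L, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, n_iter=100):
    """Asynchronous recall: update one randomly chosen unit at a time."""
    s = state.copy()
    for _ in range(n_iter):
        i = np.random.randint(len(s))
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

patterns = np.array([[1, -1, 1, -1, 1], [1, 1, -1, -1, 1]], dtype=float)
W = store(patterns)
noisy = np.array([1, -1, 1, -1, -1], dtype=float)   # corrupted version of pattern 0
print(recall(W, noisy))
```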

41

State Transition Diagram


42

Computation of Weights for Pattern Storage

Patterns to be stored (111) and (010).

Results in set of inequalities to be satisfied.


43

Pattern Storage Tasks

• Hard problems: Conflicting requirements on a set of inequalities
• Hidden units: Problem of false minima
• Stochastic update
• Stochastic equilibrium, Boltzmann-Gibbs law:

P(s) = \frac{1}{Z}\, e^{-E(s)/T}
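A sketch of stochastic update with a simple annealing schedule: energy-increasing flips are accepted with probability exp(−ΔE/T), following the Boltzmann-Gibbs form above. The schedule parameters are illustrative assumptions.

```python
import numpy as np

def anneal(W, s, T0=5.0, alpha=0.95, sweeps=500):
    """Stochastic update of a bipolar-state network with a decreasing
    temperature, so the network can escape false minima."""
    T = T0
    for _ in range(sweeps):
        i = np.random.randint(len(s))
        delta_E = 2.0 * s[i] * (W[i] @ s)          # energy change if unit i flips
        if delta_E < 0 or np.random.rand() < np.exp(-delta_E / T):
            s[i] = -s[i]                           # accept the flip
        T *= alpha                                 # lower the temperature gradually
    return s
```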

44

Simulated Annealing


45

Boltzmann Machine

• Pattern environment storage
• Architecture: Visible units, hidden units, stochastic update, simulated annealing
• Boltzmann learning law:

\Delta w_{ij} = \frac{\eta}{T}\,\big(p^{+}_{ij} - p^{-}_{ij}\big)


46

Discussion on Boltzmann Learning

• Expression for Boltzmann learning
– Significance of p^+_{ij} and p^-_{ij}
– Learning and unlearning
– Local property
– Choice of η and initial weights

• Implementation of Boltzmann learning
– Algorithm for learning a pattern environment
– Algorithm for recall of a pattern
– Implementation of simulated annealing
– Annealing schedule

• Pattern recognition tasks by Boltzmann machine
– Pattern completion
– Pattern association
– Recall from noisy or partial input

• Interpretation of Boltzmann learning
– Markov property of simulated annealing
– Clamped-free energy and full energy

• Variations of Boltzmann learning
– Deterministic Boltzmann machine
– Mean-field approximation


47

Competitive Learning Neural Network (CLNN)

Output layer with on-center and off-surround connections

Input layer


48

PR Tasks by CLNN

• Pattern storage (STM)
– Architecture: Two layers (input and competitive), linear processing units
– Learning: No learning in FF stage, fixed weights in FB layer
– Recall: Not relevant
– Limitation: STM, no application, theoretical interest
– To overcome: Nonlinear output function in FB stage, learning in FF stage

• Pattern clustering (grouping)
– Architecture: Two layers (input and competitive), nonlinear processing units in the competitive layer
– Learning: Only in FF stage, competitive learning
– Recall: Direct in FF stage, activation dynamics until stable state is reached in FB layer
– Limitation: Fixed (rigid) grouping of patterns
– To overcome: Train neighbourhood units in competition layer

• Feature map
– Architecture: Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
– Learning: Weights leading to the neighbourhood units in the competitive layer
– Recall: Apply input, determine winner
– Limitation: Only visual features, not quantitative
– To overcome: More complex architecture


49

Learning Algorithms for PCA networks

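One such learning algorithm is Oja's rule, sketched below for a single linear unit whose weight vector converges to the first principal component of the input data; the data and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # anisotropic data
w = rng.standard_normal(2)
eta = 0.01

for x in X:
    y = w @ x
    w += eta * y * (x - y * w)     # Hebbian term with implicit weight normalization

w /= np.linalg.norm(w)             # approximates the first principal direction of X
```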

50

Self Organization Network

(a) Network structure (b) Neighborhood regions at different times in the output layer

Input layer

Output layer

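A minimal sketch of SOM training with a shrinking Gaussian neighbourhood; the grid size and decay schedules are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, eta0=0.5, sigma0=3.0):
    """Train a 2-D SOM: find the winning unit for each input and pull the
    winner and its (shrinking) neighbourhood toward the input."""
    rows, cols = grid
    W = np.random.rand(rows, cols, data.shape[1])
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)           # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)       # shrinking neighbourhood width
        for x in data:
            d = np.linalg.norm(W - x, axis=-1)
            win = np.unravel_index(np.argmin(d), d.shape)       # winning unit
            g = np.exp(-np.sum((coords - np.array(win)) ** 2, axis=-1) / (2 * sigma ** 2))
            W += eta * g[..., None] * (x - W)      # neighbourhood-weighted update
    return W

som = train_som(np.random.rand(200, 3))
```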

51

Illustration of SOM


52

PART III

ANN Models for Feature Extraction and Classification


53

Neural Network Architecture and Models for Feature Extraction

• Multilayer Feedforward Neural Network (MLFFNN)

• Autoassociative Neural Networks (AANN)

• Constraint Satisfaction Models (CSM)
• Self Organization Map (SOM)
• Time Delay Neural Networks (TDNN)
• Hidden Markov Models (HMM)


54

Multilayer FFNN

• Nonlinear feature extraction followed by linearly separable classification problem


55

• Complex decision hypersurfaces for classification

• Asymptotic approximation of a posteriori class probabilities

Multilayer FFNN


56

Radial Basis Function

• Radial Basis Function NN: Clustering followed by classification

(Figure: input vector a → basis functions φ_j(a) → class labels c_1, …, c_N)

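A small sketch of the RBF idea (basis functions followed by a linear classifier); here the centres are simply sampled from the data rather than obtained by clustering, and all values are illustrative.

```python
import numpy as np

def rbf_features(X, centres, sigma=1.0):
    """phi_j(a): Gaussian basis function responses of every input to every centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)          # a nonlinearly separable labelling
centres = X[rng.choice(len(X), 10, replace=False)] # stand-in for clustered centres
Phi = rbf_features(X, centres)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # linear output (class) weights
pred = (Phi @ w > 0.5).astype(float)
```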

57

Autoassociation Neural Network (AANN)

• Architecture
• Nonlinear PCA
• Feature extraction
• Distribution capturing ability


58

Autoassociation Neural Network (AANN)

• Architecture

(Figure: input layer → dimension compression hidden layer → output layer)

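A minimal NumPy sketch of a five-layer AANN trained to reproduce its input through a narrow compression layer (nonlinear PCA); the layer sizes echo the 19L 38N 4N 38N 19L structure used later for speaker verification, but the data, learning rate, and number of epochs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 19))                 # e.g. 19-dimensional feature vectors

sizes = [19, 38, 4, 38, 19]                        # cf. 19L 38N 4N 38N 19L
Ws = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
f = np.tanh
eta = 0.01

for epoch in range(200):
    acts = [X]                                     # forward pass (linear output layer)
    for k, W in enumerate(Ws):
        z = acts[-1] @ W
        acts.append(z if k == len(Ws) - 1 else f(z))
    delta = X - acts[-1]                           # autoassociation: target = input
    for k in range(len(Ws) - 1, -1, -1):           # backward pass (delta rule)
        grad = acts[k].T @ delta / len(X)
        if k > 0:
            delta = (delta @ Ws[k].T) * (1 - acts[k] ** 2)   # tanh derivative
        Ws[k] += eta * grad

bottleneck = f(f(X @ Ws[0]) @ Ws[1])               # 4-dimensional compressed features
```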

59

Distribution Capturing Ability of AANN

• Distribution of feature vector (fig)
• Illustration of distribution in 2D case (fig)
• Comparison with Gaussian Mixture Model (fig)


60

Distribution of feature vector


61

(a) Illustration of distribution in 2D case(b,c) Comparison with Gaussian Mixture Model


62

Feature Extraction by AANN

• Input and output to AANN: Sequence of signal samples (captures dominant 2nd order statistical features)

• Input and output to AANN: Sequence of Residual samples (captures higher order statistical features in the sample sequence)


63

Constraint Satisfaction Model

• Purpose: To satisfy the given (weak) constraints as much as possible

• Structure: Feedback network with units (hypotheses), connections (constraints / knowledge)

• Goodness of fit function: Depends on the output of unit and connection weights

• Relaxation Strategies: Deterministic and Stochastic


64

Application of CS Models

• Combining evidence
• Combining classifier outputs
• Solving optimization problems


65

Self Organization Map (illustrations)

• Organization of 2D input to 1D feature mapping

• Organization of 16 Dimensional LPC vector to obtain phoneme map

• Organization of large document files


66

Time Delay Neural Networks for Temporal Pattern Recognition


67

Stochastic Models for Temporal Pattern Recognition

• Maximum likelihood formulation: Determine the class w, given the observation symbol sequence y, using the criterion

w^{*} = \arg\max_{w} P(y \mid w)

• Markov Models

• Hidden Markov Models

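For a discrete HMM, P(y | w) can be computed with the forward algorithm; the sketch below uses an illustrative 2-state, 3-symbol model, not parameters from the tutorial.

```python
import numpy as np

def forward_likelihood(y, pi, A, B):
    """P(y | w) for a discrete HMM w with initial probabilities pi (N,),
    transition matrix A (N, N) and emission matrix B (N, M)."""
    alpha = pi * B[:, y[0]]
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]
    return alpha.sum()

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_likelihood([0, 1, 2, 2], pi, A, B))  # the class maximizing this wins
```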

68

PART IV

Applications in Speech & Image Processing

69

Applications in Speech and Image Processing

• Edge extraction in texture-like images

• Texture segmentation/classification by CS model

• Road detection from satellite images

• Speech recognition by CS model

• Speaker recognition by AANN model


70

Problem of Edge Extraction in Texture-like Images

• Nature of texture-like images
• Problem of edge extraction
• Preprocessing (1-D) to derive partial evidence
• Combining evidence using CS model


71

• Texture Edges are the locations where there is an abrupt change in texture properties

Problem of Edge Extraction

Image with 4 natural texture regions

Edgemap showing micro edges

Edgemap showing macro edges


72

1-D processing using Gabor Filter and Difference Operator

• 1-D Gabor smoothing filter: Magnitude and Phase

1-D Gabor filter (a Gaussian modulated by a complex sinusoid):

f(x, \sigma, \omega) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) e^{\,j\omega x}

Even component:

f_c(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \cos(\omega x)

Odd component:

f_s(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \sin(\omega x)


73

1-D processing using Gabor filter and Difference operator (contd.)

• Differential operator for edge evidence – first derivative of the 1-D Gaussian function:

c(y) = -\frac{y}{\sqrt{2\pi}\,\sigma^3} \exp\!\left(-\frac{y^2}{2\sigma^2}\right)

• Need for a set of Gabor filters

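A sketch of the 1-D Gabor filter components and the derivative-of-Gaussian operator defined above; the filter length of roughly 4σ is an assumption for illustration.

```python
import numpy as np

def gabor_1d(sigma, omega, half_len=None):
    """Even (cosine) and odd (sine) components of the 1-D Gabor filter."""
    half_len = half_len or int(4 * sigma)
    x = np.arange(-half_len, half_len + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return g * np.cos(omega * x), g * np.sin(omega * x)

def dgauss_1d(sigma, half_len=None):
    """First derivative of the 1-D Gaussian, used as the differential operator."""
    half_len = half_len or int(4 * sigma)
    y = np.arange(-half_len, half_len + 1, dtype=float)
    return -y * np.exp(-y ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma ** 3)
```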

74

Texture Edge Extraction using 1-D Gabor Magnitude and Phase

• Apply a 1-D Gabor filter along each of the parallel lines of the image in one direction (say, horizontal)

• Apply all Gabor filters of the filter bank in a similar way

• For each Gabor-filtered output, partial edge information is extracted by applying the 1-D differential operator in the orthogonal direction (say, vertical)

• The entire process is repeated in the orthogonal (vertical and horizontal) directions to obtain the partial edge evidence in the other direction (see the sketch below)

• The partial edge evidence is combined using a Constraint Satisfaction Neural Network model

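A sketch of the row/column processing described above, reusing the gabor_1d and dgauss_1d helpers from the previous sketch; the use of a Gabor magnitude (even and odd responses combined) and the parameter values are illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve1d

def edge_evidence(image, sigma=4.0, omega=0.5):
    """Partial texture-edge evidence: 1-D Gabor magnitude along one direction,
    then the derivative-of-Gaussian operator in the orthogonal direction;
    repeated with the roles of rows and columns swapped."""
    even, odd = gabor_1d(sigma, omega)
    mag_h = np.sqrt(convolve1d(image, even, axis=1) ** 2 +
                    convolve1d(image, odd, axis=1) ** 2)         # filter along rows
    horiz = np.abs(convolve1d(mag_h, dgauss_1d(sigma), axis=0))  # differentiate down columns
    mag_v = np.sqrt(convolve1d(image, even, axis=0) ** 2 +
                    convolve1d(image, odd, axis=0) ** 2)         # filter along columns
    vert = np.abs(convolve1d(mag_v, dgauss_1d(sigma), axis=1))   # differentiate along rows
    return horiz, vert
```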

75

Texture Edge Extraction using a set of 1-D Gabor Filters

(Figure: input image → bank of 1-D Gabor filters → filtered images → post-processing using a 1-D differential operator and thresholding → edge evidence → combining the edge evidence using a constraint satisfaction neural network model → edge map)


76

Structure of 3-D CSNN Model

(Figure: a 3-D lattice of size I×J×K, with excitatory (+ve) connections among the nodes across the layers for each pixel and inhibitory (−ve) connections from a set of neighboring nodes to each node in the same layer.)

Combining Evidence using CSNN model


77

Combining the Edge Evidence using the Constraint Satisfaction Neural Network (CSNN) Model

• Neural network model contains nodes arranged in a 3-D lattice structure

• Each node corresponds to a pixel in the post-processed Gabor filter output

• Post processed output of a single 1-D Gabor filter is an input to one 2-D layer of nodes

• Different layers of nodes, each corresponding to a particular filter output, are stacked one upon the other to form the 3-D structure

• Each node represents a hypothesis
• Connection between two nodes represents a constraint

• Each node is connected to other nodes with inhibitory and excitatory connections


78

Combining Evidence using CSNN model (contd.)

Let W_{i,j,k,i1,j1,k} represent the weight of the connection from node (i,j,k) to a node (i1,j1,k) within the same layer k, and let W_{i,j,k,i,j,k1} represent the constraint between the nodes in two different layers (k and k1) in the same column. The within-layer weights are defined piecewise over the neighbourhood of (i,j), with magnitudes such as 1/8 and 1/16 depending on the offsets |i − i1| and |j − j1|, and the across-layer weight in the same column is

W_{i,j,k,i,j,k1} = \frac{1}{2(K - 1)}

• Each node is connected to the other nodes in the same column with excitatory connections


79

Combining Evidence using CSNN model (contd.)

• Let v_{i,j,k} \in \{0, 1\} denote the output of node (i,j,k), and the set \{v_{i,j,k}\} the state of the network

• The state of the neural network model is initialized using: v_{i,j,k}(0) = 1 if the pixel has evidence of an edge pixel, and 0 otherwise

• In the deterministic relaxation method, the state of the network is updated iteratively by changing the output of one node at a time

• The net input of each node is obtained using:

U_{i,j,k}(n) = \sum_{i1,j1} W_{i,j,k,i1,j1,k}\, v_{i1,j1,k} + \sum_{k1} W_{i,j,k,i,j,k1}\, v_{i,j,k1} + I_{i,j,k}

where U_{i,j,k}(n) is the net input to node (i,j,k) at the nth iteration, and I_{i,j,k} is the external input given to node (i,j,k)

• The state of the network is updated using: v_{i,j,k}(n+1) = 1 if U_{i,j,k}(n) > \theta, and 0 otherwise, where \theta is the threshold

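A sketch of this deterministic relaxation on a K×I×J lattice; the weight values (−1/8 within a layer, 1/(2(K−1)) within a column) and the threshold are illustrative assumptions consistent with the description above.

```python
import numpy as np

def relax(evidence, n_iter=10, w_layer=-1.0 / 8, w_column=None, theta=0.0):
    """Deterministic relaxation on a K x I x J lattice of binary nodes.
    evidence[k, i, j] is the external input I_{i,j,k} from filter k."""
    K, I, J = evidence.shape
    w_column = w_column if w_column is not None else 1.0 / (2 * (K - 1))
    v = (evidence > 0).astype(float)               # initial state from edge evidence
    for _ in range(n_iter):
        for k in range(K):
            for i in range(1, I - 1):
                for j in range(1, J - 1):
                    neigh = v[k, i - 1:i + 2, j - 1:j + 2].sum() - v[k, i, j]
                    column = v[:, i, j].sum() - v[k, i, j]
                    U = w_layer * neigh + w_column * column + evidence[k, i, j]
                    v[k, i, j] = 1.0 if U > theta else 0.0   # one node at a time
    return v
```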

80

Comparison of Edge Extraction using Gabor Magnitude and Gabor Phase

(Figure: texture images with edge maps obtained using a 2-D Gabor filter, 1-D Gabor magnitude, and 1-D Gabor phase)


81

Texture Segmentation and Classification

• Image analysis (revisited)
• Problem of texture segmentation and classification
• Preprocessing using 2D Gabor filter to derive feature vector
• Combining the partial evidence using CS model


82

CS Model for Texture Classification

• Supervised and unsupervised problem
• Modeling of image constraints
• Formulation of a posteriori probability CS model
• Hopfield neural network model and its energy function
• Deterministic and stochastic relaxation strategies


83

CS Model for Texture Classification - Modeling of Image Constraints

• Feature formation process: Defined by the conditional probability of the feature vector g_s of each pixel s given the model parameters of class k:

P(G_s = g_s \mid L_s = k) = \frac{1}{(2\pi\sigma_k^2)^{M/2}} \exp\!\left(-\frac{\lVert g_s - \mu_k \rVert^2}{2\sigma_k^2}\right)

• Partition process: Defines the probability of the label of a pixel given the labels of the pixels in its pth-order neighborhood N_p(s):

P(L_s \mid L_r,\, r \in N_p(s)) = \frac{1}{Z_p} \exp\!\Big(-\!\!\sum_{r \in N_p(s)}\! V(L_s, L_r)\Big)

• Label competition process: Describes the conditional probability of assigning a new label k to an already labeled pixel with label l:

P(L_s = k \mid L_s = l) = \frac{1}{Z_c} \exp\big(-V_c(k, l)\big)

where V and V_c denote the neighborhood and competition potentials, and Z_p and Z_c are the corresponding normalizing constants.

84

CS Model for Texture Classification - Modeling of Image Constraints (contd.)

• Formulation of the a posteriori probability:

P(L_s = k \mid G_s = g_s,\, L_r,\, r \in N_p(s),\, L_s = l) = \frac{1}{Z}\, e^{-E(L_s = k \mid G_s = g_s,\, L_r,\, L_s = l)}

where

E(L_s = k \mid G_s = g_s,\, L_r,\, L_s = l) = \frac{\lVert g_s - \mu_k \rVert^2}{2\sigma_k^2} + \frac{1}{2}\ln(2\pi\sigma_k^2) + \sum_{r \in N_p(s)} V(L_s, L_r) + V_c(k, l)

and

Z = Z_p\, Z_c\, P(G_s = g_s)\, P(L_s = k)

• Total energy of the system:

E_{total} = \sum_{s,k} E(L_s = k \mid G_s = g_s,\, L_r,\, L_s = l)


85

CS Model for Texture Classification

(Figure: a 3-D lattice of size I×J×K in which the nodes (i,j,1), …, (i,j,k), …, (i,j,K) represent the candidate labels for pixel (i,j); connections among the nodes across the layers for each pixel and connections from a set of neighboring nodes to each node in the same layer carry the +ve/−ve constraints; E denotes the energy of the network state.)


86

Hopfield Neural Network and its Energy Function

E_{Hopfield} = -\frac{1}{2} \sum_i \sum_{i1} W_{i,i1}\, O_i\, O_{i1} - \sum_i B_i\, O_i

For the 3-D lattice of size I×J×K:

E_{Hopfield} = -\frac{1}{2} \sum_{i,j,k} \sum_{i1,j1,k1} W_{i,j,k,i1,j1,k1}\, O_{i,j,k}\, O_{i1,j1,k1} - \sum_{i,j,k} B_{i,j,k}\, O_{i,j,k}

(Figure: Hopfield network with unit outputs o_1, …, o_j, …, o_N and biases B_1, …, B_j, …, B_N.)


87

Results of Texture Classification - Natural Textures

(Figure panels: natural textures, initial classification, final classification)

88

Results of Texture Classification - Remotely Sensed Data

(Figure panels: Band-2 IRS image containing 4 texture classes, initial classification, final classification)

89

Results of Texture Classification - Multispectral Data

(Figure panels: SIR-C/X-SAR image of the Lost City of Ubar; classification using multispectral information; classification using multispectral and textural information)

90

Speech Recognition using CS Model

• Problem of recognition of SCV units (Table)
• Issues in classification of SCVs (Table)
• Representation of an isolated utterance of an SCV unit:
– 60 ms before and 140 ms after the vowel onset point
– 240-dimensional feature vector consisting of weighted cepstral coefficients
• Block diagram of the recognition system for SCV units (Fig)
• CS network for classification of SCV units (Fig)


91

Problem of Recognition of SCV Units


92

Issues in Classification of SCVs

• Importance of SCVs
– High frequency of occurrence: about 45%

• Main issues in classification of SCVs
– Large number of SCV classes
– Similarity among several SCV classes

• Model for classification of SCVs
– Should have good discriminatory capability (artificial neural networks)
– Should be able to handle a large number of classes (neural networks based on a modular approach)


93

Block Diagram of Recognition System for SCV Units


94

CS Network for Classification of SCV Units

External evidence of bias for the node is computed using the

output of the MLFFNN5

External evidence of bias for the node is computed using the

output of the MLFFNN1

External evidence of bias for the node is

computed using the output of the MLFFNN9

Vowel Feedback Subnetwork

MOA Feedback Subnetwork

POA Feedback Subnetwork


95

Classification Performance of CSM and other SCV Recognition Systems on Test Data of 80 SCV Classes

SCV Recognition System            Case 1   Case 2   Case 3   Case 4
HMM based system                   45.5     59.2     65.9     71.4
80-class MLFFNN                    45.3     59.7     66.9     72.2
MOA modular network                29.2     50.2     59.0     65.3
POA modular network                35.1     56.9     69.5     76.6
Vowel modular network              30.1     47.5     58.8     63.6
Combined evidence based system     51.6     63.5     70.7     74.5
Constraint Satisfaction model      65.6     75.0     80.2     82.6

(Decision criteria: Case 1 to Case 4)


96

Speaker Verification using AANN Models and Vocal Tract System Features

• One AANN for each speaker
• Verification by identification (see the sketch below)
• AANN structure: 19L 38N 4N 38N 19L
• Feature: 19 weighted LPCC from 16th-order LPC for each frame of 27.5 ms, with a frame shift of 13.75 ms
• Training: Pattern mode, 100 epochs, 1 min of data
• Testing: Model giving the highest confidence for 10 sec of test data

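A sketch of verification by identification with per-speaker AANN models; `aann_output` is a hypothetical helper standing in for a trained AANN forward pass, and the reconstruction-error-based confidence measure is an illustrative choice.

```python
import numpy as np

def confidence(model, frames):
    """Confidence of one speaker's AANN model on a set of test feature vectors,
    based on reconstruction error. `aann_output` is a hypothetical helper that
    runs a trained AANN forward pass on the frames."""
    err = np.mean(np.sum((frames - aann_output(model, frames)) ** 2, axis=1))
    return np.exp(-err)

def identify(models, frames):
    """Verification by identification: the claim is accepted only if the claimed
    speaker's model gives the highest confidence among all models."""
    scores = {spk: confidence(m, frames) for spk, m in models.items()}
    return max(scores, key=scores.get)
```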

97

Speaker Recognition using Source Features

• One model for each speaker
• Structure of AANN: 40L 48N 12N 48N 40L
• Feature: About 10 sec of data, 60 epochs
• Testing: Select the model giving the highest confidence for 2 sec of test data


98

Other Applications

• Speech enhancement
• Speech compression
• Image compression
• Character recognition
• Stereo image matching


99

Summary and Conclusions

• Speech and image processing: Natural tasks
• Significance of pattern processing
• Limitation of conventional computer architecture
• Need for new models or architectures for pattern processing tasks
• Basics of ANN
• Architecture of ANN for feature extraction and classification
• Potential of ANN for speech and image processing


100

References

1. B. Yegnanarayana, “Artificial Neural Networks”, Prentice-Hall of India, New Delhi, 1999

2. L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, New Jersey, 1993

3. Alan C. Bovik, Handbook of Image and Video Processing, Academic Press, 2001

4. Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, “Spoken Language Processing”, Prentice-Hall, New Jersey, 2001

5. P. P. Raghu, “Artificial Neural Network Models for Texture Analysis”, PhD Thesis, CSE Dept., IIT Madras, 1995

6. C. Chandra Sekhar, “Neural Network Models for Recognition of Stop Consonant Vowel (SCV) Segments in Continuous Speech”, PhD Thesis, CSE Dept., IIT Madras, 1996

7. P. Kiran Kumar, “Texture Edge Extraction using One Dimensional Processing”, MS Thesis, CSE Dept., 2001

8. S. P. Kishore, “Speaker Verification using Autoassociative Neural Network Models”, MS Thesis, CSE Dept., IIT Madras, 2000

9. B. Yegnanarayana, K. Sharath Reddy and S. P. Kishore, “Source and System Features for Speaker Recognition using AANN Models”, ICASSP, May 2001

10. S. P. Kishore, Suryakanth V. Gangashetty and B. Yegnanarayana, “Online Text Independent Speaker Verification System using Autoassociative Neural Network Models”, INNS-IEEE Int. Conf. Neural Networks, July 2001

11. K. Sharat Reddy, “Source and System Features for Speaker Recognition”, MS Thesis, CSE Dept., IIT Madras, September 2001.

12. B. Yegnanarayana and S. P. Kishore, “Autoassociative Neural Networks: An Alternative to GMM for Pattern Recognition”, to appear in Neural Networks, 2002