Acoustic Analysis of Speech

Robert A. Prosek, Ph.D., CSD 301

Acoustic Analysis

•Instrumental acoustical analyses have been used for over 100 years

•Analog techniques dominated the first 60 of these years

•More recently, digital techniques have dominated the field

•We will begin by introducing a few of the important analog methods, then turn to the digital

Oscillograph/Oscillogram

•Any device that can display a waveform is an oscillograph

•The output (display or hardcopy) is an oscillogram

•There is limited information available in a waveform

•silence

•burst

•noise

•periodicity

Filter Bank Analysis

•In this procedure, a filter bank or a single filter is used to divide the signal energy into frequency bands

•The output energy is displayed for each band

•This is a form of spectral analysis

•The output is typically displayed in the form of a histogram

•The technique is very common in audiology and hearing applications
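
The band-energy idea above can be sketched in a few lines. This is only an illustration: it groups the bins of a plain discrete Fourier transform into bands instead of using real filters, and the function name `band_energies`, the test tones, and the band edges are all assumptions made for the example.

```python
import cmath
import math

def band_energies(samples, sample_rate, band_edges):
    """Sum spectral energy in each [low, high) frequency band (edges in Hz)."""
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    # Only bins below half the sampling rate are examined (one-sided spectrum).
    for k in range(n // 2):
        freq = k * sample_rate / n
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += abs(coeff) ** 2
    return energies

sample_rate = 8000
n = 80
# Hypothetical test signal: a strong 500 Hz tone plus a weaker 2500 Hz tone.
signal = [math.sin(2 * math.pi * 500 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 2500 * t / sample_rate)
          for t in range(n)]
edges = [0, 1000, 2000, 3000, 4000]
energies = band_energies(signal, sample_rate, edges)

# Crude text histogram: one '#' per 100 units of energy in the band.
for low, high, energy in zip(edges, edges[1:], energies):
    print(f"{low}-{high} Hz: {'#' * round(energy / 100)}")
```

The longest bar appears in the 0-1000 Hz band and a shorter one in the 2000-3000 Hz band, which is the histogram-style display described above.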

Sound Spectrograph/Spectrogram

•The instrument is called a spectrograph

•The output (usually a hardcopy) is a spectrogram

•This is the most commonly used device in speech research

•The spectrograph can capture the dynamics of speech

•Acoustic signals vary only in frequency, amplitude and time

•The sound spectrograph captures all of these

Sound Spectrogram

•Abscissa is time

•Ordinate is frequency

•Intensity is shown as shades of gray

•Black areas indicate the highest amplitudes

•White areas indicate the noise floor

•Amplitudes between these extremes are shown in varying shades of gray

•the more intense the signal is at a particular frequency and time, the darker the trace
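
The gray-scale convention can be expressed as a simple mapping. The sketch below is only an illustration: the 0 to 255 gray scale, the decibel units, and the name `gray_level` are assumptions, not part of any particular spectrograph.

```python
def gray_level(level_db, floor_db, peak_db):
    """Map a level in dB to a gray value: 0 is black, 255 is white.

    Levels at or below the noise floor come out white, levels at or above
    the peak come out black, and everything in between is a shade of gray.
    """
    clipped = min(max(level_db, floor_db), peak_db)
    fraction = (clipped - floor_db) / (peak_db - floor_db)
    return round(255 * (1.0 - fraction))

# Hypothetical display range: noise floor at 0 dB, strongest component at 60 dB.
for level in (0, 20, 40, 60):
    print(level, "dB ->", gray_level(level, 0, 60))
```

Higher levels always give smaller (darker) values, matching the rule that a more intense signal leaves a darker trace.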

Digital Signal Processing (1)


•In the late 1960s, general-purpose digital computers made it possible to analyze acoustic signals on the computer

•These techniques are necessarily discrete as well as digital

•Once in discrete form, the signal can be stored conveniently and analyzed in many ways that were not possible with analog techniques

Digital Signal Processing (2)

•Presampling or brickwall filtering

•Nyquist Theorem

•In order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency

•The brickwall filter removes all of the energy above the Nyquist frequency

•The clinician/researcher determines the Nyquist frequency

•Some knowledge of speech and of speech and language disorders is required

Digital Signal Processing (3)

•Sampling

•Analog-to-digital conversion

•Signal must be sampled at or above the Nyquist rate

•Sampling decides the times at which the signal will be measured

•Sampling converts the acoustic signal into a series of numbers

•Instead of amplitudes at all instants of time, no matter how small the time interval, amplitudes in the digital world exist only at the sampling interval

•Aliasing
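
A small numerical sketch may help show why the presampling filter matters. The helper names `nyquist_rate` and `alias_frequency` and the 10 kHz sampling rate are assumptions chosen for illustration.

```python
import math

def nyquist_rate(highest_frequency_hz):
    """Minimum sampling rate for a signal with this highest frequency."""
    return 2.0 * highest_frequency_hz

def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears, folded into 0..sample_rate/2."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

sample_rate = 10000.0
print(nyquist_rate(5000.0))                  # 10000.0
print(alias_frequency(3000.0, sample_rate))  # 3000.0 (below 5 kHz, unchanged)
print(alias_frequency(7000.0, sample_rate))  # 3000.0 (above 5 kHz, aliased)

# Sampled at 10 kHz, a 7 kHz tone gives the same numbers as a sign-inverted
# 3 kHz tone, so after sampling the two cannot be told apart.
high = [math.sin(2 * math.pi * 7000 * n / sample_rate) for n in range(50)]
low = [math.sin(2 * math.pi * 3000 * n / sample_rate) for n in range(50)]
largest_gap = max(abs(h + l) for h, l in zip(high, low))
print(largest_gap < 1e-9)
```

Energy above half the sampling rate is not lost; it reappears at a lower frequency, which is why it has to be removed before sampling.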

•Quantization

•Discrete number of amplitude levels

•The more quantizer levels available, the more closely the discrete signal represents the original analog signal

•In our applications, 16-bit quantizers over a 20-volt range are typical

•This yields an amplitude resolution of about 300 μvolts and a signal-to-noise ratio of 96 dB
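
The resolution and signal-to-noise figures follow from the number of bits and the voltage range. A minimal sketch of the arithmetic, with a hypothetical helper name and assuming the levels are evenly spaced over the range:

```python
import math

def quantizer_resolution(bits, voltage_range):
    """Return (number of levels, volts per level, dynamic range in dB)."""
    levels = 2 ** bits
    step_volts = voltage_range / levels
    dynamic_range_db = 20 * math.log10(levels)
    return levels, step_volts, dynamic_range_db

levels, step_volts, dynamic_range_db = quantizer_resolution(16, 20.0)
print(levels)                         # 65536
print(round(step_volts * 1e6, 1))     # 305.2 (microvolts per level)
print(round(dynamic_range_db, 1))     # 96.3 (dB)
```

Each added bit doubles the number of levels and adds about 6 dB, so 16 bits give roughly 96 dB.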

•After A/D conversion

•the signal is stored as a stream of numbers

•time is given by the sample index divided by the sampling rate

•the amplitude is the stored number

•in this form, many operations can be performed
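
As a rough illustration of how time is recovered from the stored numbers, the sketch below converts between sample index and time. The function names, the 10 kHz sampling rate, and the sample values are made up for the example.

```python
def index_to_seconds(index, sample_rate_hz):
    """Time of a sample, counted from the first sample (index 0)."""
    return index / sample_rate_hz

def seconds_to_index(seconds, sample_rate_hz):
    """Index of the stored sample nearest to a given time."""
    return round(seconds * sample_rate_hz)

sample_rate = 10000
samples = [0, 12, 40, 95, 130, 95, 40, 12, 0, -12]  # hypothetical A/D output

for index, amplitude in enumerate(samples[:4]):
    print(index_to_seconds(index, sample_rate), amplitude)

# Duration of a segment marked from sample 100 to sample 2350.
duration = index_to_seconds(2350, sample_rate) - index_to_seconds(100, sample_rate)
print(round(duration, 4))
```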

Waveform Display

•Duration measurements

•speech changes gradually

•some consistent rules need to be adopted

•Signal editing

•again, some consistent rules need to be adopted

•Amplitude measurements

•RMS (root-mean-square) amplitude is the most common

•vocal fundamental frequency
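
For the amplitude measurement, a minimal sketch of the root-mean-square calculation is given here. The test signal is a hypothetical single cycle of a sine wave with a peak amplitude of 1.

```python
import math

def rms(samples):
    """Root-mean-square amplitude: square, average, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One full cycle of a unit-amplitude sine wave, 100 samples long.
sine = [math.sin(2 * math.pi * t / 100) for t in range(100)]
print(round(rms(sine), 4))   # 0.7071, the peak amplitude divided by sqrt(2)
```

Because the result depends on which samples are included, a consistent rule for marking the segment boundaries matters here just as it does for duration.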

Digital Spectrum Analysis

•The Fourier Transform revisited (FFT)

•Periodic waveforms can be thought of as a series of sinusoids

•amplitude and phase

•The Fourier Transform and the Inverse Fourier transform allow powerful analysis-by-synthesis techniques
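
The analysis and resynthesis steps can be sketched with a direct discrete Fourier transform. This is a slow textbook form used only for illustration; the FFT is a faster algorithm that gives the same result. The test waveform and the function names are assumptions.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform: one complex value per sinusoidal component."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Rebuild the (real) waveform from its spectrum."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

n = 64
# Periodic waveform built from two sinusoids: 4 and 8 cycles per window.
wave = [1.0 * math.cos(2 * math.pi * 4 * t / n)
        + 0.5 * math.cos(2 * math.pi * 8 * t / n + math.pi / 2)
        for t in range(n)]

spectrum = dft(wave)
for k in (4, 8):
    amplitude = 2 * abs(spectrum[k]) / n   # amplitude of component k
    phase = cmath.phase(spectrum[k])       # phase of component k in radians
    print(k, round(amplitude, 3), round(phase, 3))

rebuilt = inverse_dft(spectrum)
largest_error = max(abs(a - b) for a, b in zip(wave, rebuilt))
print(largest_error < 1e-9)
```

The amplitude and phase of each component are recovered from the spectrum, and the inverse transform returns the original waveform.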

Digital Spectrograph

•This is a series of spectra based on the FFT or LPC (see below)

•The amplitude is depicted as shades of gray

•PRAAT is an example of a digital spectrograph

•Speech Filing System, Speech Station 2, Wavesurfer, and many other free or commercial spectrographs are available
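
The idea of a spectrogram as a series of spectra can be sketched as follows. This is not how PRAAT or any of the programs named above is implemented: it uses a plain discrete Fourier transform with non-overlapping rectangular windows, and the test signal and names are assumptions.

```python
import cmath
import math

def short_time_spectra(samples, window_length, hop):
    """Return one magnitude spectrum (bins below half the sampling rate) per window."""
    frames = []
    for start in range(0, len(samples) - window_length + 1, hop):
        frame = samples[start:start + window_length]
        magnitudes = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / window_length)
                              for t in range(window_length)))
                      for k in range(window_length // 2)]
        frames.append(magnitudes)
    return frames

sample_rate = 8000
# Hypothetical signal: 500 Hz for the first 20 ms, then 1500 Hz for 20 ms.
signal = ([math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(160)]
          + [math.sin(2 * math.pi * 1500 * t / sample_rate) for t in range(160)])

frames = short_time_spectra(signal, window_length=80, hop=80)
bin_width_hz = sample_rate / 80
peak_hz = [frame.index(max(frame)) * bin_width_hz for frame in frames]
print(peak_hz)   # [500.0, 500.0, 1500.0, 1500.0]
```

Each entry of `frames` is one column of the spectrogram; shading each magnitude as gray would give the familiar display, with the change in frequency visible over time.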

Linear Predictive Coding (1)

•Speech is highly predictable over the short term

•It is not hard to predict the amplitude of the next time sample of the speech waveform from a knowledge of the previous amplitudes

•As few as 10 to 15 previous samples are all that are required

LPC (2)

•From statistics, we know that:

•y = a_0 + a_1 x_{-1} + a_2 x_{-2} + ... + a_n x_{-n}

•where y is the amplitude of the next sample

•and x_{-k} is the amplitude of the sample k steps before it

•This is linear prediction
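
A small sketch of the prediction equation in use is given here. It relies on the fact that a pure sinusoid is predicted exactly from its two previous samples; real speech needs more coefficients, and in practice they are estimated from the signal rather than written down as they are here. The function name and test tone are assumptions.

```python
import math

def predict_next(previous, coefficients, offset=0.0):
    """Weighted sum of earlier samples.

    previous[-1] is the most recent sample and is weighted by coefficients[0],
    previous[-2] is weighted by coefficients[1], and so on. offset plays the
    role of the constant term in the prediction equation.
    """
    return offset + sum(a * previous[-(k + 1)] for k, a in enumerate(coefficients))

sample_rate = 8000
omega = 2 * math.pi * 200 / sample_rate          # 200 Hz tone
tone = [math.sin(omega * t) for t in range(50)]

# For a sinusoid: next = 2*cos(omega)*x[-1] - x[-2]
coefficients = [2 * math.cos(omega), -1.0]
errors = [abs(tone[t] - predict_next(tone[:t], coefficients))
          for t in range(2, len(tone))]
print(max(errors) < 1e-9)
```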

LPC (3)

•Linear Predictive Coding (LPC) is one of the most powerful techniques in speech analysis

•The a’s in the previous equation can be used to estimate the resonances (formants) of the vocal tract

•They can represent sections of the vocal tract
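
To show how the a’s relate to a resonance, the sketch below treats the simplest case: a predictor with only two coefficients, which describes a single resonance. Its frequency and bandwidth come from a root of the polynomial z^2 - a1*z - a2. This is an illustration under that assumption; an analysis with 10 to 15 coefficients would need a general polynomial root finder, and the function name is hypothetical.

```python
import cmath
import math

def resonance_from_two_coefficients(a1, a2, sample_rate_hz):
    """Frequency and bandwidth (Hz) of the resonance implied by a1 and a2."""
    # One root of z**2 - a1*z - a2 = 0; the other is its complex conjugate.
    root = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2
    frequency_hz = abs(cmath.phase(root)) * sample_rate_hz / (2 * math.pi)
    bandwidth_hz = -math.log(abs(root)) * sample_rate_hz / math.pi
    return frequency_hz, bandwidth_hz

# Build coefficients for a known resonance, then recover it from them.
sample_rate = 8000
radius = 0.98                                   # closer to 1 means narrower
angle = 2 * math.pi * 500 / sample_rate         # 500 Hz
a1 = 2 * radius * math.cos(angle)
a2 = -radius ** 2

frequency, bandwidth = resonance_from_two_coefficients(a1, a2, sample_rate)
print(round(frequency, 1), round(bandwidth, 1))
```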

Wideband versus Narrowband Spectrograms

•Wideband (0.005, 0.007, 0.009)

•Short time window

•Good for measuring formant frequencies

•Narrowband (0.1, 0.05)

•Long time window

•Good for showing and measuring harmonics
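
The trade-off can be illustrated with a rough rule of thumb: the analysis bandwidth is about the reciprocal of the window length. The sketch assumes that the numbers in parentheses above are window lengths in seconds, which the slides do not state, and it ignores the effect of window shape. The function names and the 120 Hz fundamental are also assumptions.

```python
def approximate_bandwidth_hz(window_seconds):
    """Rule-of-thumb analysis bandwidth for a given window length."""
    return 1.0 / window_seconds

def resolves_harmonics(window_seconds, f0_hz):
    """True if the bandwidth is narrower than the spacing between harmonics."""
    return approximate_bandwidth_hz(window_seconds) < f0_hz

f0 = 120.0   # hypothetical vocal fundamental frequency in Hz
for window in (0.005, 0.007, 0.009, 0.05, 0.1):
    print(window,
          round(approximate_bandwidth_hz(window)),
          resolves_harmonics(window, f0))
```

With a short window the bandwidth is wider than the harmonic spacing, so harmonics blur together and the broad formant bands stand out. With a long window the individual harmonics are resolved.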
