Meena Ramani 04/12/06 EEL 6586 Automatic Speech Processing.

Page 1:

Meena Ramani

04/12/06

EEL 6586 Automatic Speech Processing

Page 2:

Topics to be covered

Lecture 1: The incredible sense of hearing 1

Anatomy

Perception of Sound

Lecture 2: The incredible sense of hearing 2

Psychoacoustics

Hearing aids and cochlear implants

Page 3:

The incredible sense of hearing-2

“Behind these unprepossessing flaps lie structures of such delicacy that they shame the most skillful craftsman”

-Stevens, S.S. [Professor of Psychophysics, Harvard University]

Page 4:

How do we hear?

Page 5:

Threshold of Hearing

Page 6:

Equal loudness curves

Page 7:

The Bass Loss Problem

Rock music

Too low: no bass

Too high: too much bass

Page 8:

Threshold variation with age

[Figure: Thresholds of hearing for normal & HI listeners. x-axis: Frequency (Hz); y-axis: Threshold of hearing (dB SPL); legend: Normal hearing, Hearing impaired]

Page 9:

The Audiogram

[Figure: Audiogram. x-axis: Frequency, Hz; y-axis: Hearing Level (HL), dB; legend: Left Ear, Right Ear]

Page 10:

The Audiogram (contd.)

Pure tone audiogram

[250 500 1K 2K 4K 6k] Hz

<20 dB HL is Normal Hearing

[Figure: Audiogram. x-axis: Frequency, Hz; y-axis: Hearing Level (HL), dB; legend: Left Ear, Right Ear]

Page 11:

Loudness Growth Curve

[Figure: LGOB loudness growth curve at 250 Hz. x-axis: Input level (dB SPL); y-axis: LGOB loudness rating; legend: Normal hearing, Hearing impaired]

Page 12:

Otoacoustic emissions

• The ear produces some sounds!
– OHC: outer hair cell

• Used to test hearing for infants & check if patient is feigning a loss

Page 13:

Monaural beats

If two tones are presented monaurally with a small frequency difference, a beating pattern can be heard

500 & 502 Hz 500 & 520 Hz

Interaction of the two tones in the same auditory filter

Waveform: 150 Hz + 170 Hz
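The beating pattern follows from the sum-to-product identity: two equal-amplitude tones at f1 and f2 add up to a tone at the mean frequency whose envelope fluctuates |f1 - f2| times per second. A minimal Python sketch using the 150 Hz + 170 Hz pair from the waveform above; the function names and the 8 kHz sample rate are illustrative assumptions, not part of the lecture:

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Rate (Hz) at which the envelope of two summed tones fluctuates."""
    return abs(f1_hz - f2_hz)

def two_tone(f1_hz, f2_hz, duration_s=0.1, fs_hz=8000):
    """Sum of two unit-amplitude sinusoids; fs_hz is an assumed sample rate."""
    n = round(duration_s * fs_hz)
    return [math.sin(2 * math.pi * f1_hz * k / fs_hz)
            + math.sin(2 * math.pi * f2_hz * k / fs_hz) for k in range(n)]

def envelope(f1_hz, f2_hz, t_s):
    """Envelope magnitude, from sin(a) + sin(b) = 2 cos((a-b)/2) sin((a+b)/2)."""
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))

signal = two_tone(150, 170)
print(beat_frequency(150, 170))  # 20 beats per second
print(beat_frequency(500, 502))  # 2 beats per second
```

The summed waveform never exceeds the envelope, which is what the slow 2 Hz flutter of the 500 & 502 Hz pair reflects.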

Page 14:

Binaural beats

Beating can also be heard when the tones are presented to different ears!

Beating arises from neural interaction

Only perceived if the tones are sufficiently close in frequency

500 Hz - left; 520 Hz - right; binaural

Page 15:

The case of the missing fundamental

Telephone BW: 300-3400 Hz

How do we know the pitch?

Primary Auditory cortex

• Pitch sensitive neurons [Bendor and Wang, Nature 2005]

• Neuron responds to fundamental and harmonics

• What are the inputs to these neurons?

How do spikes represent periodic, temporal and spectral information?
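One way to see why the pitch survives the telephone band: for a voice with a 100 Hz fundamental, a 300-3400 Hz channel removes the 100 Hz and 200 Hz components but passes the harmonics at 300, 400, 500 Hz and so on, and their common spacing still points to 100 Hz. The Python sketch below illustrates only that arithmetic, using a greatest common divisor over the surviving harmonics. It is not a model of the pitch-sensitive neurons; the 100 Hz voice, the integer frequencies and the helper names are assumptions made for the example.

```python
from functools import reduce
from math import gcd

TELEPHONE_BAND_HZ = (300, 3400)  # bandwidth quoted on the slide

def harmonics_in_band(f0_hz, band=TELEPHONE_BAND_HZ):
    """Integer harmonics of f0_hz that fall inside the pass band."""
    lo, hi = band
    return [k * f0_hz for k in range(1, hi // f0_hz + 1) if lo <= k * f0_hz <= hi]

def implied_fundamental(harmonics_hz):
    """Largest frequency that divides every harmonic (assumes integer Hz)."""
    return reduce(gcd, harmonics_hz)

passed = harmonics_in_band(100)          # hypothetical 100 Hz voice
print(passed[:3])                        # [300, 400, 500]
print(implied_fundamental(passed))       # 100
```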

Page 16:

Matlab code available

Feed it a wav file

Spits out a PSTH (post-stimulus time histogram)

Auditory-periphery model

(Zhang et al. ~2001)
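The PSTH itself is simple to sketch: pool the spike times from repeated presentations of the same stimulus and count them in fixed-width time bins after stimulus onset. The Python below shows only that binning step, with made-up spike times and a hypothetical `psth` helper; it does not implement the auditory-periphery model or reproduce the Matlab code.

```python
def psth(trials_spike_times_s, bin_width_s, duration_s):
    """Post-stimulus time histogram: mean firing rate (spikes/s) per time bin."""
    n_bins = int(round(duration_s / bin_width_s))
    counts = [0] * n_bins
    for spikes in trials_spike_times_s:
        for t in spikes:
            idx = int(t / bin_width_s)
            if t >= 0 and idx < n_bins:   # ignore spikes outside the window
                counts[idx] += 1
    n_trials = len(trials_spike_times_s)
    # Normalize counts by number of trials and bin width to get a rate.
    return [c / (n_trials * bin_width_s) for c in counts]

# Made-up spike times (seconds after stimulus onset) for three presentations.
trials = [[0.002, 0.011, 0.013], [0.003, 0.012], [0.001, 0.014, 0.027]]
rates = psth(trials, bin_width_s=0.010, duration_s=0.030)
print(rates)
```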

Page 17:

Critical bands

Equally loud, close in frequency

•Same IHCs

•Slightly louder

Equally loud, separated in freq.

•Different IHCs

•Twice as loud

Psychoacoustic experiments

Page 18:

Critical Band (cont.)

• Proposed by Fletcher

• How to measure?
– S/N ratio vs noise BW

• CB ~= 1.5 mm spacing on BM

• 24 such band pass filters

• BW of the filters increases with fc

• Logarithmic relationship – Weber’s law example

• Bark scale

Center freq (Hz)    Critical BW (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200
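A commonly used closed-form approximation to the Bark scale and to critical bandwidth is the one given by Zwicker and Terhardt (1980). The Python sketch below evaluates it at the center frequencies listed above. It is an outside approximation brought in for illustration, not the source of the slide's table: it gives roughly 160 Hz at 1 kHz, and it runs noticeably wider than the tabulated values above a few kHz. It does reach about 24 Bark near 15.5 kHz, in line with the 24 filters mentioned above. The function names are illustrative.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz at center frequency f_hz (same approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

for fc in (100, 200, 500, 1000, 2000, 5000, 10000):
    print(f"{fc:>6} Hz  {hz_to_bark(fc):5.2f} Bark  CB ~ {critical_bandwidth_hz(fc):6.0f} Hz")
```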

Page 19:

Critical bands for HI

[Figure: 4 kHz tuning curve for normal & HI listeners. x-axis: Desired tone frequency (Hz); y-axis: Desired tone threshold (dB SPL); legend: Masker, Normal hearing, Hearing impaired]

Page 21:

Frequency Masking

• Masking occurs when two frequencies lie within a critical band: the higher-amplitude one masks the lower-amplitude signal

• Maskers can be broadband noise, narrowband noise, pure tones or complex tones

• Low-frequency broadband sounds mask the most
– E.g. truck on road, water flowing

• Masking threshold
– Level in dB at which a test tone is just audible in the presence of the noise

Page 22:

Temporal Aspects of Masking

• Simultaneous masking

• Pre-stimulus / backward / premasking
– Test tone first, masker second

• Post-stimulus / forward / postmasking
– Masker first, test tone second

Page 23:

Temporal Aspects of Masking (contd.)

Simultaneous masking
– Duration > 200 ms: constant test tone threshold
– Assume hearing system integrates over a period of 200 ms

Postmasking
– Decay in effect of masker for 100 ms
– More dominant

Premasking
– Takes place 20 ms before masker is on!!
– Each sensation is not instantaneous; it requires build-up time
• Quick build up for loud maskers
• Slower build up for softer maskers
– Less dominant effect
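The timing windows on this slide can be turned into a small lookup. The Python sketch below labels which kind of masking may apply to a brief test tone, given when the masker switches on and off. The 20 ms and 100 ms windows are the figures quoted on the slide; the function name and the hard window edges are simplifying assumptions, since in practice the masking effect fades gradually.

```python
PREMASK_WINDOW_MS = 20    # premasking: up to ~20 ms before masker onset (slide value)
POSTMASK_WINDOW_MS = 100  # postmasking: decays over ~100 ms after offset (slide value)

def masking_type(test_ms, masker_on_ms, masker_off_ms):
    """Classify a brief test tone by its timing relative to the masker."""
    if masker_on_ms <= test_ms <= masker_off_ms:
        return "simultaneous"
    if masker_on_ms - PREMASK_WINDOW_MS <= test_ms < masker_on_ms:
        return "premasking"
    if masker_off_ms < test_ms <= masker_off_ms + POSTMASK_WINDOW_MS:
        return "postmasking"
    return "none"

# Hypothetical masker that is on from 100 ms to 300 ms.
for t in (85, 150, 350, 450):
    print(t, masking_type(t, masker_on_ms=100, masker_off_ms=300))
```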

Page 24:

Temporal masking for HI

[Figure: Temporal resolution at 4 kHz for normal & HI listeners. x-axis: Desired-Masker tone separation (ms); y-axis: Desired tone threshold (dB SPL); legend: Normal hearing, Hearing impaired]

Page 25:

Meena Ramani

04/14/06

EEL 6586 Automatic Speech Processing

Page 26:

What do the hearing impaired hear?

Normal Hearing

Sensorineural Hearing Loss

Mild to Severe Loss

[10 20 30 60 80 90] dB HL

[Figure: spectrogram, "Cell phone speech for normal hearing". x-axis: Time (s); y-axis: Frequency (Hz)]

[Figure: spectrogram, "Cell phone speech for SNHL". x-axis: Time (s); y-axis: Frequency (Hz)]

Page 27:

Facts on Hearing Loss in Adults

• One in every ten (28 million) Americans has hearing loss. 

• The vast majority of Americans (95% or 26 million) with hearing loss can have their hearing loss treated with hearing aids. 

• Only 6 million use HAs

• Millions of Americans with hearing loss could benefit from hearing aids but avoid them because of the stigma.

Page 28:

Types of Hearing aids

Behind the ear

In the ear

In the canal

Completely in the canal

Page 29:

Anatomy of a Hearing Aid

• Microphone

• Tone hook

• Volume control

• On/off switch

• Battery compartment

Page 30:

Ear Mold Measurements

Hearing Aid Fitting

Page 31:

Acclimatization effect

Auditory cortex brain plasticity

Time for the HI to reuse the HF information: Acclimatization effect

How does this affect HA fitting?
– Multiple fitting sessions
– Initial fitting should be the optimum one

Page 32:

So doc, what is the fitting methodology employed by the hearing aid company to compensate for my hearing loss?

Not-so-average Joe

(PhD EE/Speech person)

CONFIDENTIAL?

Page 33:

So, do you want your HA to:

1) Always be comfortably loud
2) Equalize loudness across frequencies
3) Normalize loudness
…?

Which fitting methodology is the best?

Page 34:

Existing HL compensation algorithms

Rationale
• Ad hoc: Half Gain (HG), POGO
• Make speech comfortable: NAL-R
• Loudness normalization: IHAFF, Fig 6
• Loudness equalization: DSL

Hearing aid fitting algorithms
• Threshold-only: NAL-R, POGO, HG
• Suprathreshold: Fig 6, IHAFF, DSL
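As a concrete instance of a threshold-only rule, the half-gain rule is usually stated as prescribing a gain of about half the hearing threshold at each audiometric frequency. The Python sketch below applies it to the mild-to-severe audiogram used in these slides, [10 20 30 60 80 90] dB HL at [250 500 1K 2K 4K 6k] Hz. The function name is hypothetical, and the other rules listed here (POGO, NAL-R, DSL and so on) add their own corrections, which are not modeled.

```python
AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]
THRESHOLDS_DB_HL = [10, 20, 30, 60, 80, 90]  # mild-to-severe loss from the slides

def half_gain(thresholds_db_hl):
    """Half-gain rule: prescribe gain equal to half the threshold at each frequency."""
    return [0.5 * hl for hl in thresholds_db_hl]

gains = half_gain(THRESHOLDS_DB_HL)
for f, hl, g in zip(AUDIOGRAM_FREQS_HZ, THRESHOLDS_DB_HL, gains):
    print(f"{f:>5} Hz: loss {hl:>2} dB HL -> gain {g:.0f} dB")
```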

Page 35:

Sensorineural hearing loss: [10 20 30 60 80 90] dB HL

Speech level = 65 dBA

Spectrograms and sound files

Normal hearing Hearing impaired HI with Linear gain

HI with DSL gain HI with RBC gain

Section Two

Page 36:

Performance metrics

Speech Intelligibility (SI): The degree to which speech can be understood
• Objective measures: AI, STI
• Subjective measures: HINT

Speech Quality: “Does the speech match your expectations?”
• Objective measures: PESQ
• Subjective measures: MOS

Page 37:

Performance metrics (contd.)

• Objective speech quality measure
– Perceptual Evaluation of Speech Quality (PESQ)

• Subjective speech quality measure
– Mean Opinion Score (MOS)

• Subjective speech intelligibility measure
– Hearing In Noise Test (HINT)

Reference signal

Comparison signal

Score

Page 38:

Hearing In Noise Test (HINT)

Page 39:

Subjective listening experiments

Audiograms of the HI patients

[Figure: Left ear audiograms of the HI subjects. x-axis: Frequency (Hz); y-axis: Threshold of hearing (dB HL)]

Location:

Shands speech & hearing clinic

(sound proof booth)

Subjects:

15 HI people
– PTA: 40-70 dB HL

15 normal hearing people

Tools used:

Matlab HINT and MOS GUIs
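The PTA (pure-tone average) used to describe the HI subjects is conventionally the mean of the audiometric thresholds at 500, 1000 and 2000 Hz; whether this study used exactly those three frequencies is an assumption. A Python sketch with a made-up audiogram and hypothetical helpers that check the 40-70 dB HL range:

```python
PTA_FREQS_HZ = (500, 1000, 2000)  # conventional three-frequency average (assumed here)

def pure_tone_average(audiogram_db_hl):
    """Mean threshold (dB HL) over the PTA frequencies; input maps Hz -> dB HL."""
    return sum(audiogram_db_hl[f] for f in PTA_FREQS_HZ) / len(PTA_FREQS_HZ)

def in_study_range(pta_db_hl, lo=40, hi=70):
    """True if the PTA falls inside the 40-70 dB HL range quoted for the subjects."""
    return lo <= pta_db_hl <= hi

# Made-up audiogram for illustration only.
example = {250: 30, 500: 40, 1000: 50, 2000: 60, 4000: 70, 6000: 75}
pta = pure_tone_average(example)
print(pta, in_study_range(pta))  # 50.0 True
```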

Page 40:

Subjective HINT and MOS scores for RBC: hearing impaired, cell phone speech

RBC has a 7 dB improvement in SI when compared to DSL

MOS scores reveal that RBC has a quality rating of ‘Good’

[Figure: Ave. MOSs of 15 HI subjects. x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: MOS, from 1-Bad, 2-Poor, 3-Fair, 4-Good to 5-Excellent]

[Figure: Ave. HINT scores of 15 HI subjects. x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: SNR relative to baseline (dB)]

Page 41:

Subjective HINT and MOS scores for RBC: normal hearing, cell phone speech

RBC has a 12 dB improvement in SI when compared to DSL

MOS scores reveal that RBC has a quality rating of ‘Good’

Page 42:

Cochlear Implants

The first fully functional Brain Machine Interface (BMI)

Definition:

A device that electrically stimulates the auditory nerve of patients with severe-to-profound hearing loss to provide them with sound and speech information

Page 43:

Who is a candidate?

• Severe-to-profound sensorineural hearing loss

• Hearing loss did not reach severe-to-profound level until after acquiring oral speech and language skills

• Limited benefit from hearing aids

Page 44:

CI statistics

• Worldwide:
– Over 100,000 multi-channel implants

• At Univ of Florida:
– Implanted first patient in 1985
– Currently follow over 400 cochlear patients

Page 45:

Technical and Safety Issues

• Magnetic Resonance Imaging

• Surgical issues

Page 46:

How does the Cochlea encode frequencies?

Page 47:
Page 48:
Page 49:

Example: New Freedom

Page 50:
Page 51:

CI characteristics

1. Electrode design – Number of electrodes, electrode configuration

2. Type of stimulation – Analog or pulsatile

3. Transmission link – Transcutaneous or percutaneous

4. Signal processing – Waveform representation or feature extraction

Page 52:

Signal processing

• Compressed Analog (CA)

• Continuous Interleaved Sampling (CIS)

• Multiple Peak (MPEAK)

• Spectral Maxima Sound Processor (SMSP)

• Spectral Peak (SPEAK)

Page 53:

Compressed Analog (CA) approach

Page 54:

CA activation signals

Page 55:

Continuous Interleaved Sampling (CIS)

Page 56:

CIS activation signals

Page 57:

Multiple Peak (MPEAK)

Page 58:

MPEAK activated electrodes

Page 59:

Spectral Maxima Sound Processor (SMSP)

Page 60:

SMSP activated electrodes

Page 61:

Spectral Peak (SPEAK)

Page 62:

SPEAK activated electrodes

Page 63:

Outcomes for Post-lingual Adults

• Wide range of success

• Most score 90-100% on AV sentence materials

• Majority score > 80% on high context

• Performance more varied on single word tests

Page 64:

Auditory Brainstem Implant

• Approved October 20, 2000

• Uses the Nucleus 24 system processors

• Plate array with 21 electrodes

Page 65:

Review-1

Pinna:
– ITDs, IIDs: Horizontal localization
– Reflections: Vertical localization

Ear canal:
– ¼ wave resonance 1-3 kHz

Middle ear:
– Amplification by lever action and by area
– Stapedius reflex

Cochlea:
– IHCs/OHCs: convert mechanical to electrical
– Place theory: frequency analysis
– Missing fundamental

Page 66:

Review-2

Adaptation: AN firing sensitive to changes

Otoacoustic emissions:
– Produced by movement of OHCs

Beats:
– Monaural & binaural

Measurement of hearing:
– Audiogram: threshold of hearing
– Threshold variation with age
– Equal loudness curves

Bass loss problem: discrimination against LFs

Page 67:

Review-3

Critical bands:
– Used for efficient encoding
– Bark scale

Masking:
– Frequency: LFs mask more
– Temporal: simultaneous, pre and post

Hearing impairment:
– Hearing aids: external to cochlea
– Cochlear implants: inside cochlea