To examine the feasibility of using confusion matrices from speech recognition tests to identify...

1
0 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% P ercentage ofTotalP ossible False A larm s P ercentage ofM issing C hannels D etected C ICA HMM D TW CHANCE To examine the feasibility of using confusion matrices from speech recognition tests to identify impaired channels, impairments in this study were simulated by setting the amplitude of one of the acoustic model channels from the acoustic waveform presented to the normal- hearing subjects to zero. • Removing an acoustic model channel represents the worst-case impairment: a channel transmitting zero speech information. This is not a clinically relevant scenario, but an appropriate proof of concept, which is the intended scope of this initial study. METHOD • Unimpaired model (baseline) • Eight analysis / eight presentation filters (p-of-p) • Logarithmically spaced from 150Hz – 6450Hz • Speech processed in 2 millisecond windows • Impaired models • Eight impaired models [imp1, imp2, imp3, imp4, imp5, imp6, imp7, imp8] • Single channel removed from model (envelope set to zero) – other channels unaffected • Nine vowel tokens – [had, hawed, head, heard, heed, hid, hood, hud, who’d] • Fourteen consonant tokens – [b, d, f, g, j, k, m, n, p, s, sh, t, v, z] – in /aCa/ context • Three noise levels: quiet, +8 dB SNR, and +6 dB SNR • Impairment measurably affects vowel recognition, smaller effect on consonant recognition • Inspired by Miller and Nicely information transmission analysis. • Use vowel formant-to-channel mapping to determine the classification matrix assuming that the vowel formant locations are the critical discriminating features. • The classification matrix for CICA can be directly calculated from the distribution of first (F1) and second (F2) vowel formants across the model channels. • Calculate transmission in bits per stimulus as from the grouped confusion matrix constructed using the classification matrix. Impairments to the ideal speech presentation, which can result from various physiological or psychophysical conditions, are one factor affecting how much benefit severely deafened individuals receive from cochlear implants. Studies of the effects of impairments often use acoustic models to examine impairments on all channels; however isolated impairments affecting a small number of electrodes are more common and have been shown to negatively affect speech recognition performance. Assessing cochlear implant device performance on an electrode-by- electrode basis requires extensive, time-consuming tests. In this study, methods for screening poorly performing electrodes are evaluated using vowel confusion data from normal-hearing subjects tested with acoustic models. Confusion patterns for vowel tokens in noise with missing spectral information were measured using an eight channel acoustic model with individual channels removed. The missing channel model represents the most severe case of a poorly performing channel, with no speech information transmitted on the channel under investigation. Preliminary results indicate successful identification of the missing channels using the vowel confusion matrices for each acoustic model. A technique that quickly identifies potential poorly performing channels would allow researchers, and ultimately clinicians, to develop and implement alternate speech presentations that maximize transmission of critical speech information. Identifying Poorly Performing Channels in Cochlear Implants Through Confusion Matrix Analysis Jeremiah J. Remus and Leslie M. Collins Department of Electrical and Computer Engineering, Duke University, Durham, NC INTRODUCTION Methods Used to Identify Poorly Performing Channels 2 Two issues were identified as critical for developing a method to identify impaired channels through speech-based confusion matrix analysis: 1)creating a set of stimuli, speech-based or nonsensical, that thoroughly tests each channel for impairments. 2)developing a measure for classifying the similarities and differences between the tokens with regard to the critical discriminating features transmitted on each channel. Caution must be exercised to avoid creating a stimulus set that is overly simplified. Tokens must contain sufficient complexity to ensure their usefulness in measuring impairments of interest. Token confusions in the listening test can be attributed to one of three causes: 1)phonetic similarities between the tokens; 2)the structure of the additive noise and experiment setup; 3)the effect of the impairments on the critical discriminating features. Additional efforts to remove the influence of the first two causes of confusions from the impaired model confusion matrix, and isolate confusions resulting from impairments to the acoustic model, will improve the identification of the impaired channels. The CICA method was able to identify the missing 3 1 Listening Experiment Performance Identifying Missing Channels 4 Conclusions This research is supported by the National Science Foundation grant NSF-BES-00-85370 Channel Information Confusion Analysis Dynamic Time Warping (DTW) Hidden Markov Models (HMM) 10 20 30 40 50 60 70 10 20 30 40 50 60 Least cost mapping through cost/distance matrix , mn D , 1 mn D 1, 1 m n D 1, m n D 1, 1 1, 1 , 1, , 1 min( , , ) m n m n mn m n mn D d D D D Decision metric: , (M ,Q) L P | ij i j observation • Compare the cepstrum coefficients of an impaired processed token to the cepstrum coefficients of an unimpaired processed token, used as a template. • Difference between cepstrum coefficient vectors calculated using Euclidean distance 2 2 1 (,) N k k k d xy x y Model ( λ ) Parameters M: number of Gaussian mixtures Q: number of model states , (;) log ij i j ij ij n nn Txy n nn Ch.1 Ch.2 Ch.3 Ch.4 Ch.5 Ch.6 Ch.7 Ch.8 had - - - F1 - F2 - - haw ed - - - F1/F2 - - - - head - - F1 - - F2 - - heard - - F1 - F2 - - - heed - F1 - - - F2 - - hid - - F1 - - F2 - - hood - - F1 - F2 - - - hud - - - F1 F2 - - - who'd - F1 - - F2 - - - Distribution of first and second vowel formants across channels CO ST M A TR IX …. …. Processed speech signal: W indowed w ith 50% overlap: Cepstrum coefficientvectors calculated foreach w indow : D istance between tw o vectors . Post-processing on HMM/DTW decision metrics • Average the off- diagonal values in each row of the matrix of decision metrics for use as an estimated value of the confusion matrix diagonal. • Greater average off- diagonal values suggest more similarity between the stimuli and the potential incorrect responses, suggesting a lower rate of confusion. • Calculate correlation coefficient between diagonal of listening test confusion matrix and averaged off- diagonal values. Decision metric: , M,N L , ij i j D xs unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% A verage V ow el S cores M odel P ercent C orrect unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% A verage C onsonantS cores M odel P ercent C orrect Quiet 8 dB 6 dB Quiet 8 dB 6 dB • Performance comparison: percentage of subjects whose missing channels were correctly identified versus the percentage of possible “false alarms” experienced. • All methods perform better than chance with the best performance obtained using the CICA method. Based on the percentage of total possible false alarms to find all missing channels, we would expect to observe (on average) the following number of false alarms en route to identifying the missing channel in a single subject with eight channels: random guessing (chance): 3.5 HMM: 2.5 DTW: 2.3 CICA: 1.6 unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% M odel P ercent Transm itted V ow el Inform ation Transm ission unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% M odel P ercent Transm itted C onsonantInform ation Transm ission D uration F1 F2 Voicing Nasality Affrication D uration Place AVG , L L od ij i mean j i • Find path through cost matrix resulting in lowest cumulative cost D • Token similarities measured quantitatively using the HMM will ideally mirror perceptual token similarities and correlate to the rate of token confusion. • Calculate the log likelihood that the HMM trained on the cepstrum coefficients of an unimpaired processed token will produce as an observation the cepstrum coefficients of an impaired processed token. Hidden Markov Model Structure * Pooled across SNR * *

Transcript of To examine the feasibility of using confusion matrices from speech recognition tests to identify...

Page 1: To examine the feasibility of using confusion matrices from speech recognition tests to identify impaired channels, impairments in this study were simulated.

0 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%0

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Percentage of Total Possible False Alarms

Per

cent

age

of M

issi

ng C

hann

els

Det

ecte

d

CICA

HMMDTW

CHANCE

To examine the feasibility of using confusion matrices from speech recognition tests to identify impaired channels, impairments in this study were simulated by setting the amplitude of one of the acoustic model channels from the acoustic waveform presented to the normal-hearing subjects to zero.

• Removing an acoustic model channel represents the worst-case impairment: a channel transmitting zero speech information.

• This is not a clinically relevant scenario, but an appropriate proof of concept, which is the intended scope of this initial study.

METHOD

• Unimpaired model (baseline)

• Eight analysis / eight presentation filters (p-of-p)

• Logarithmically spaced from 150Hz – 6450Hz

• Speech processed in 2 millisecond windows

• Impaired models

• Eight impaired models [imp1, imp2, imp3, imp4, imp5, imp6, imp7, imp8]

• Single channel removed from model (envelope set to zero) – other channels unaffected

• Nine vowel tokens – [had, hawed, head, heard, heed, hid, hood, hud, who’d]

• Fourteen consonant tokens – [b, d, f, g, j, k, m, n, p, s, sh, t, v, z] – in /aCa/ context

• Three noise levels: quiet, +8 dB SNR, and +6 dB SNR

• Impairment measurably affects vowel recognition, smaller effect on consonant recognition

• The transmission of F1 and F2 is noticeably affected by the impairments. Removing channels 2 and 3 affects F1 transmission, and removing channels 5 through 7 affects F2 transmission.

• Inspired by Miller and Nicely information transmission analysis.

• Use vowel formant-to-channel mapping to determine the classification matrix assuming that the vowel formant locations are the critical discriminating features.

• The classification matrix for CICA can be directly calculated from the distribution of first (F1) and second (F2) vowel formants across the model channels.

• Calculate transmission in bits per stimulus as

from the grouped confusion matrix constructed using the classification matrix.

Impairments to the ideal speech presentation, which can result from various physiological or psychophysical conditions, are one factor affecting how much benefit severely deafened individuals receive from cochlear implants. Studies of the effects of impairments often use acoustic models to examine impairments on all channels; however isolated impairments affecting a small number of electrodes are more common and have been shown to negatively affect speech recognition performance. Assessing cochlear implant device performance on an electrode-by-electrode basis requires extensive, time-consuming tests.

In this study, methods for screening poorly performing electrodes are evaluated using vowel confusion data from normal-hearing subjects tested with acoustic models. Confusion patterns for vowel tokens in noise with missing spectral information were measured using an eight channel acoustic model with individual channels removed. The missing channel model represents the most severe case of a poorly performing channel, with no speech information transmitted on the channel under investigation. Preliminary results indicate successful identification of the missing channels using the vowel confusion matrices for each acoustic model. A technique that quickly identifies potential poorly performing channels would allow researchers, and ultimately clinicians, to develop and implement alternate speech presentations that maximize transmission of critical speech information.

Identifying Poorly Performing Channels in Cochlear Implants Through Confusion Matrix Analysis

Jeremiah J. Remus and Leslie M. CollinsDepartment of Electrical and Computer Engineering, Duke University, Durham, NC

INTRODUCTION Methods Used to Identify Poorly Performing Channels2

Two issues were identified as critical for developing a method to identify impaired channels through speech-based confusion matrix analysis:

1) creating a set of stimuli, speech-based or nonsensical, that thoroughly tests each channel for impairments.

2) developing a measure for classifying the similarities and differences between the tokens with regard to the critical discriminating features transmitted on each channel.

Caution must be exercised to avoid creating a stimulus set that is overly simplified. Tokens must contain sufficient complexity to ensure their usefulness in measuring impairments of interest.

Token confusions in the listening test can be attributed to one of three causes:

1) phonetic similarities between the tokens;2) the structure of the additive noise and experiment

setup;3) the effect of the impairments on the critical

discriminating features.Additional efforts to remove the influence of the first two causes

of confusions from the impaired model confusion matrix, and isolate confusions resulting from impairments to the acoustic model, will improve the identification of the impaired channels.

The CICA method was able to identify the missing channels with half the false alarm rate of random chance. Using the analysis presented in this study on listening test results as a pre-screener to provide some prior information for a full test of channel impairments could significantly reduce the test complexity.

3

1 Listening Experiment

Performance Identifying Missing Channels

4 Conclusions

This research is supported by the National Science Foundation grant NSF-BES-00-85370

Channel Information Confusion Analysis

Dynamic Time Warping (DTW)

Hidden Markov Models (HMM)

10 20 30 40 50 60 70

10

20

30

40

50

60

Least cost mapping through cost/distance matrix

,m nD , 1m nD

1, 1m nD 1,m nD

1, 1 1, 1 , 1, , 1min( , , )m n m n m n m n m nD d D D D

Decision metric: , (M,Q)L P |i j i jobservation

• Compare the cepstrum coefficients of an impaired processed token to the cepstrum coefficients of an unimpaired processed token, used as a template.

• Difference between cepstrum coefficient vectors calculated using Euclidean distance

2

21

( , )N

k kk

d x y x y

Model (λ) ParametersM: number of Gaussian mixturesQ: number of model states

,

( ; ) logij i j

i j ij

n n nT x y

n nn Ch.1 Ch.2 Ch.3 Ch.4 Ch.5 Ch.6 Ch.7 Ch.8

had - - - F1 - F2 - -hawed - - - F1/F2 - - - -head - - F1 - - F2 - -heard - - F1 - F2 - - -heed - F1 - - - F2 - -hid - - F1 - - F2 - -

hood - - F1 - F2 - - -hud - - - F1 F2 - - -

who'd - F1 - - F2 - - -

Distribution of first and second vowel formants across channels

COST MATRIX

….

….

Processed speech signal:

Windowed with 50% overlap:

Cepstrum coefficient vectors calculated for each window:

Distance between two vectors .

Post-processing on HMM/DTW decision metrics

• Average the off-diagonal values in each row of the matrix of decision metrics for use as an estimated value of the confusion matrix diagonal.

• Greater average off-diagonal values suggest more similarity between the stimuli and the potential incorrect responses, suggesting a lower rate of confusion.

• Calculate correlation coefficient between diagonal of listening test confusion matrix and averaged off-diagonal values.

Decision metric: , M,NL ,i j i jD x s

unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp80%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%Average Vowel Scores

Model

Pe

rce

nt C

orr

ect

unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp80%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%Average Consonant Scores

Model

Pe

rce

nt C

orr

ect

Quiet8 dB6 dB

Quiet8 dB6 dB

• Performance comparison: percentage of subjects whose missing channels were correctly identified versus the percentage of possible “false alarms” experienced.

• All methods perform better than chance with the best performance obtained using the CICA method.

Based on the percentage of total possible false alarms to find all missing channels, we would expect to observe (on average) the following number of false alarms en route to identifying the missing channel in a single subject with eight channels:

random guessing (chance): 3.5HMM: 2.5DTW: 2.3CICA: 1.6

unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp80%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Model

Pe

rce

nt T

ran

smitt

ed

Vowel Information Transmission

unimp imp1 imp2 imp3 imp4 imp5 imp6 imp7 imp80%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Model

Pe

rce

nt T

ran

smitt

ed

Consonant Information Transmission

Duration

F1F2

VoicingNasalityAffricationDurationPlace

AVG ,L L odi ji mean j i

• Find path through cost matrix resulting in lowest cumulative cost D

• Token similarities measured quantitatively using the HMM will ideally mirror perceptual token similarities and correlate to the rate of token confusion.

• Calculate the log likelihood that the HMM trained on the cepstrum coefficients of an unimpaired processed token will produce as an observation the cepstrum coefficients of an impaired processed token.

Hidden Markov Model Structure

* Pooled across SNR

* *