
Improving Speech Intelligibility in Fluctuating Background Interference

by Laura A. D’Aquila

S.B., Massachusetts Institute of Technology (2015),

Electrical Engineering and Computer Science, Mathematics

Submitted to the

Department of Electrical Engineering and Computer Science

in Partial Fulfillment of the Requirements for the Degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

Massachusetts Institute of Technology

June 2016

© Massachusetts Institute of Technology 2016. All rights reserved.

Author: _________________________________________________________________

Department of Electrical Engineering and Computer Science

May 20, 2016

Certified by: _____________________________________________________________

Dr. Charlotte M. Reed, Senior Research Scientist

Research Laboratory of Electronics

May 20, 2016

Certified by: _____________________________________________________________

Professor Louis D. Braida, Henry Ellis Warren Professor

Electrical Engineering and Health Sciences and Technology

May 20, 2016

Accepted by: ____________________________________________________________

Dr. Christopher J. Terman, Chairman, Masters of Engineering Thesis Committee


Improving Speech Intelligibility in Fluctuating Background Interference

by Laura A. D’Aquila

Submitted to the Department of Electrical Engineering and Computer Science on May

20, 2016, in partial fulfillment of the requirements for the degree of Master of

Engineering in Electrical Engineering and Computer Science.

ABSTRACT

The masking release (MR; i.e., better speech recognition in fluctuating compared to continuous

noise backgrounds) that is evident for normal-hearing (NH) listeners is generally reduced or

absent in hearing-impaired (HI) listeners. In this study, a signal-processing technique was

developed to improve MR in HI listeners and offer insight into the mechanisms influencing the

size of MR. This technique compares short-term and long-term estimates of energy, increases the

level of short-term segments whose energy is below the average energy, and normalizes the

overall energy of the processed signal to be equivalent to that of the original long-term estimate.

In consonant-identification tests, HI listeners achieved similar scores for processed and

unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior performance

was obtained for the processed speech in some of the fluctuating background noises. Thus, the

energy-normalized signals led to larger values of MR compared to those obtained with

unprocessed signals.


ACKNOWLEDGMENTS

This research was supported by the National Institute on Deafness and Other Communication

Disorders of the National Institutes of Health under Award Number R01 DC000117.

I would like to extend a big thank you to my advisors, Dr. Charlotte M. Reed and Professor

Louis D. Braida. From when I first began doing research in their lab as a sophomore to now as I

wrap up my M.Eng. thesis, they have always made themselves available and offered much

guidance, instruction, and support. Their analysis and ideas for moving forward were crucial to

the success of this project. Their kindness made me look forward to coming into lab every day. I

am also extremely grateful for the RA funding that they provided me with as I worked on the

project. Additionally, I would like to heartily thank Dr. Joseph G. Desloge, the signal processing

mastermind of the project. During the spring of my senior year, his help was critical as I coded

the different components of this project. Despite having since taken a new job on the West Coast,

he still kindly spoke with me weekly on the phone throughout the year to discuss my project and

offer his very valuable insight, ideas, and feedback. I could not have asked for a better group of

mentors than Dr. Reed, Professor Braida, and Dr. Desloge. As part of the Sensory

Communication Group, the three of them performed much of the previous work that led to this

project, and this project would also not have been possible without their continued involvement.

I would lastly like to thank my family, who have provided me with countless opportunities

throughout my life without which I would not be where I am today. I am very grateful for the

love and confidence that they have had in me throughout it all and for their shaping me into the

person I am. It is comforting to know that I can always turn to them no matter what happens.


I. BACKGROUND

Many hearing-impaired (HI) listeners with sensorineural hearing loss who are able to

understand speech in quiet environments without much difficulty encounter more problems in

noisy situations, such as in a cafeteria or at a social gathering. Indeed, it has been shown that

these listeners require a higher speech-to-noise ratio (SNR) to achieve a given level of

performance than do normal-hearing (NH) listeners (Festen and Plomp, 1990). This is the case

regardless of whether the noise is temporally fluctuating, such as interfering voices in the

background, or is steady-state, such as a motor.

Festen and Plomp (1990) measured the SNR required for 50%-correct sentence reception

in different types of background interference. Whereas HI listeners required a similar SNR

regardless of the type of interfering noise, NH listeners performed better (i.e., required a lower

SNR) in temporally fluctuating interference than in steady-state interference. Listeners who

perform better with fluctuating interference are said to experience a release from masking. This

release from masking occurs when listeners are able to perceive audible glimpses of the target

speech during dips in the fluctuating noise (Cooke, 2006) and it aids in the ability to converse

normally in the noisy social situations mentioned above.

One possible explanation of reduced release from masking in HI listeners is based on the

effects of reduced audibility in HI listeners, who are less likely to be able to receive the target

speech in the noise gaps (Desloge et al., 2010). Léger et al. (2015) looked at release from

masking in greater depth, particularly with respect to consonant recognition with different types

of speech processing. The processing allowed for the examination of the roles played by the

signal’s slowly varying component, known as its envelope (ENV), and rapidly varying

component, known as its temporal fine structure (TFS), on release from masking. The consonant


speech stimuli were processed using the Hilbert Transform to convey ENV cues, TFS cues, or

ENV cues recovered from TFS speech. Consonant identification was measured in the presence of

steady-state and 10-Hz square-wave interrupted speech-shaped noise. The percent-correct scores

were used to calculate masking release (MR) in percentage points, defined as the difference in

scores in interrupted noise and in continuous noise at a given SNR. The results showed that HI

listeners generally experienced MR for TFS and recovered-ENV speech but not for unprocessed

or ENV speech. The study concluded that the increase in MR may be related to the way the TFS

processing interacts with the interrupted noise signal, rather than to the presence of TFS itself.

Under certain circumstances, the removal of amplitude-envelope variation in TFS speech may

amplify the higher SNR glimpses of the speech signal during gaps in a fluctuating noise.

Reed et al. (2016) further investigated the conclusions of Léger et al. (2015) regarding the role of

reduced amplitude variation in MR. The study tested an infinite peak-clipped (IPC) speech

condition, which used the sign of each sample point of the input signal to convert positive terms

to +1, convert negative terms to -1, and leave zero terms unchanged. This processing thus also

removed much of the amplitude variation. Speech intelligibility in noise and MR were compared

for TFS, IPC, and unprocessed speech for HI listeners. Outcomes for TFS and IPC speech were

very similar, leading to the conclusion that the removal of amplitude variation can indeed lead to

MR. Because both the TFS and IPC speech contained fine-structure cues, however, it was still

possible that TFS was responsible for the observed MR. Another condition was created in which

both TFS and amplitude-variation cues were eliminated by passing an ENV signal through the

TFS processing stage. Greater MR was observed for this condition than for the original ENV

speech, thus lending support to the hypothesis that reduced amplitude variation can lead to

improved MR in HI listeners. This MR arose as the less-intense portions of the speech stimulus,


which occurred in the noise gaps, became more audible to HI listeners when the amplitude was

“normalized” to remove variation. These studies proved promising in understanding a potential

way to improve MR in HI listeners; however, the improvement in MR was mainly due to a

decreased performance in continuous noise rather than an increased performance in fluctuating

noise.
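The infinite peak-clipping operation described above maps each sample of the input to its sign; in NumPy this is a one-line transformation (the function name here is illustrative):

```python
import numpy as np

def infinite_peak_clip(x):
    """Infinite peak clipping (IPC): positive samples become +1, negative
    samples become -1, and zero samples are left unchanged."""
    return np.sign(x)
```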

To address these issues, Desloge et al. (2016) developed a signal-processing technique

designed to achieve similar reductions in signal amplitude variation without suffering a loss in

intelligibility in continuous background noise. Using non-real-time processing over the

broadband signal, the technique compared short-term and long-term estimates of energy,

increased the level of short-term segments whose energy was below the average energy, and

normalized the overall energy of the processed signal to be equivalent to that of the original

long-term estimate. In consonant-identification tests, HI listeners achieved similar scores for

processed and unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior

performance was obtained for the processed speech in fluctuating background noises. Thus, the

energy-normalized signals led to larger values of MR compared to those obtained with

unprocessed signals. The work described in this paper builds upon Desloge et al. by

implementing and evaluating a real-time and multi-band version of the signal processing

algorithm in a broader range of noises.

II. GOALS

This study investigates a novel signal processing technique, called energy equalization

(EEQ), for the reduction of amplitude variation, which Reed et al. (2016) had concluded could

contribute to MR in HI listeners. EEQ processing “normalizes” the fluctuating short-term signal

energy to be equal to the long-term average signal energy. This technique is thus another way of


removing the rapid amplitude variation that occurs in speech. The goal is for this signal

processing to improve the performance of HI listeners in fluctuating background noise without

leading to a drop in performance in continuous background noise. This change in performance

would thus result in greater MR for HI listeners.

Energy equalization is applicable in the area of hearing aid and cochlear implant

processing, and it could potentially also be used to benefit NH listeners and even machine

listening systems that use automatic speech recognition. Other potential applications of EEQ

processing include cell-phone or teleconferencing systems where an individual is speaking in a

noisy environment and in speech recognition in interfering backgrounds. Thus, wherever speech

reception is needed in noise, energy equalization could be used.

The short-term signal energy for speech varies at a syllabic rate as intervals fluctuate

between being more intense (usually during vowels), less intense (usually during consonants),

and silent. Meanwhile, the long-term signal energy remains relatively constant and reflects the

overall loudness at which a speaker is talking. These overall properties of speech persist even

when background noise is added to the signal. The quiet portions of the speech signal are the

most troublesome ones to HI listeners and lead to reduced speech comprehension. Energy

equalization is a way of combating this difficulty by amplifying the quieter parts of the signal

(that may be present during gaps in the background noise) relative to the louder parts of the

signal (that occur when background noise is fully present). This technique makes speech content

present during the dips in background noise more audible and hence useful for speech

comprehension.


III. SIGNAL PROCESSING ALGORITHM

The EEQ processing seeks to reduce short-term amplitude fluctuations of a speech-plus-

noise (S+N) stimulus while operating blindly and without introducing excessive distortion. The

following is a general description of the steps the EEQ processing performs in real-time on a

S+N signal x(t):

● Form running short-term and long-term moving averages of the signal energy, Eshort(t)

and Elong(t):

Eshort(t) = AVGshort[x²(t)] and Elong(t) = AVGlong[x²(t)],

where AVG is a moving-average operator that utilizes specified short and long time

constants to provide an estimate of the signal’s energy.

● In this implementation, the AVG operators are single-pole infinite impulse

response (IIR) low-pass filters applied to the instantaneous signal energy, x²(t),

with time constants of 5 ms and 200 ms for the short and long averages,

respectively. The magnitude and phase of the square root of the ratio of the

frequency response of AVGlong to the frequency response of AVGshort are shown

in Figure 1, which is useful in understanding the scale factor computed in the next

step of the processing.

● Determine the scale factor, SC(t):

SC(t) = √(Elong(t) / Eshort(t)),

where care is taken to avoid dividing by zero during quiet intervals.

● To prevent over-amplification of the noise floor, SC(t) had an upper limit of 20

dB.


● To prevent attenuation of stronger signal components, SC(t) had a lower limit of 0

dB.

● Apply the scale factor to the original signal:

y(t) = SC(t)x(t).

● Form the output z(t) by normalizing y(t) to have the same energy as x(t):

z(t) = K(t)y(t),

where K(t) is chosen such that AVGlong[z²(t)] = AVGlong[x²(t)].
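The steps above can be sketched in Python. The 5-ms and 200-ms single-pole IIR averagers, the 0–20 dB limits on SC(t), and the final energy normalization follow the description in the text; the function and variable names (and the ε guard against division by zero) are illustrative.

```python
import numpy as np

def iir_avg(energy, fs, tau):
    """AVG operator: single-pole IIR low-pass filter applied to the
    instantaneous signal energy, with time constant tau (seconds)."""
    a = np.exp(-1.0 / (fs * tau))
    out = np.empty_like(energy)
    acc = 0.0
    for n, e in enumerate(energy):
        acc = a * acc + (1.0 - a) * e
        out[n] = acc
    return out

def eeq1(x, fs=32000, tau_short=0.005, tau_long=0.200, max_gain_db=20.0):
    """Broadband energy equalization (EEQ1) of a speech-plus-noise signal."""
    eps = 1e-12                                   # guards the divisions below
    e_short = iir_avg(x * x, fs, tau_short)       # Eshort(t)
    e_long = iir_avg(x * x, fs, tau_long)         # Elong(t)
    # Scale factor, limited to 0 dB (no attenuation of strong components)
    # and +20 dB (no over-amplification of the noise floor).
    sc = np.sqrt(e_long / np.maximum(e_short, eps))
    sc = np.clip(sc, 1.0, 10.0 ** (max_gain_db / 20.0))
    y = sc * x
    # K(t): renormalize so that AVGlong[z²(t)] = AVGlong[x²(t)].
    e_long_y = iir_avg(y * y, fs, tau_long)
    k = np.sqrt(e_long / np.maximum(e_long_y, eps))
    return k * y
```

Applied to a signal whose level drops abruptly, this processing raises the low-level portion relative to the high-level portion while leaving the long-term energy unchanged.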

The processing described above can be applied either to a broadband signal or

independently to bandpass filtered components. The current implementation operated on both the

broadband signal (EEQ1) and a signal divided into four contiguous frequency bands (EEQ4).

These conditions are described in more detail in Section IV-D. Figure 2 depicts block diagrams

of the EEQ1 (Figure 2A) and EEQ4 (Figure 2B) algorithms. The EEQ algorithm that was

implemented follows the outline of the steps described above to process x[n], a sampled version

of the original signal x(t). The original signal is first multiplied by SC[n], as shown in Figure 2A,

and the resulting EEQ signal is then multiplied by K[n] to ensure that the long-term energy of the

EEQ signal is equal to the long-term energy of the original signal at every sample point. SC[n] is

restricted to lie in the range of 0 dB to 20 dB. Appendix I describes a modification to the

computation of the scale factor that could be used without this lower limit in place.

IV. METHODS

The experimental protocol for testing human subjects was approved by the institutional

review board of the Massachusetts Institute of Technology. All testing was conducted in


compliance with regulations and ethical guidelines on experimentation with human subjects. All

listeners provided informed consent and were paid for their participation in the experiments.

A. Participants

Six male and three female HI listeners with bilateral, symmetric, mild-to-severe

sensorineural hearing loss participated in the experiment. They were all native speakers of

American English and ranged in age from 20 to 69 years with an average age of 36.7 years. Six

of the listeners were younger (33 years or less) and three were older (58-69 years). Five of the

listeners had sloping high-frequency losses (HI-1, HI-2, HI-4, HI-5, and HI-7), three had

relatively flat losses (HI-6, HI-8, and HI-9), and one had a “cookie-bite” loss (HI-3). Seven of

the listeners (all but HI-1 and HI-3) were regular users of bilateral hearing aids. The five-

frequency (0.25, 0.5, 1, 2, and 4 kHz) audiometric pure-tone average (PTA) ranged from 27 dB

HL to 75 dB HL across listeners with an average of 45.3 dB HL.

The test ear, age, and five-frequency PTA for each HI listener are listed in Table 1 along

with the speech levels and SNRs employed in the experiment. The pure-tone thresholds of the HI

listeners in dB SPL are shown in Figure 3. The pure-tone threshold measurements were obtained

with Sennheiser HD580 headphones for 500 ms stimuli in a three-alternative forced-choice

adaptive procedure which estimates the threshold level required for 70.7%-correct detection (see

Léger et al., 2015).

Four NH listeners (defined as having pure-tone thresholds of 15 dB HL or better in the

octave frequencies between 250 and 8000 Hz) also participated in the study. They were native

speakers of American English, included three males and one female, and ranged in age from 19

to 54 years, with an average age of 30.0 years. A test ear was selected for each listener (2 left ear


and 2 right ear). The mean adaptive thresholds across test ears of the NH listeners are provided in

the first panel of Figure 3.

B. Speech Stimuli

The speech materials were Vowel-Consonant-Vowel (VCV) stimuli, with C=/p t k b d g f

s ʃ v z dʒ m n r l/ and V=/a/, taken from the corpus of Shannon et al. (1999). The set used for

testing consisted of 64 VCV tokens (one utterance of each of the 16 disyllables by two male and

two female speakers). The mean VCV duration was 945 ms with a range of 688 to 1339 ms

across the 64 VCVs in the test set. The recordings were digitized with 16-bit precision at a

sampling rate of 32 kHz and filtered to a bandwidth of 80-8020 Hz for presentation.

C. Interference Conditions

Noises from two broad categories of maskers were added to the speech stimuli prior to

processing for presentation. Four background interference conditions were derived from speech-

shaped noise but did not come from actual speech samples. Three additional background

interference conditions, referred to as vocoded modulated noises, were derived from actual

speech samples. The RMS level of each of the noises except for the baseline condition was

adjusted to be equal to that of the continuous noise, whose level was set as described in Section

IV-F. The maskers used in the study are shown in Figure 4 and are summarized below.

Maskers derived from randomly generated speech-shaped noise (spectrogram

shown in Figure 5) but not coming from actual speech samples. This paper refers

to these as non-speech-derived noises:

o Baseline Noise (BAS): Continuous speech-shaped noise at 30 dB SPL.

o Continuous Noise (CON): Additional continuous noise added to BAS.


o Square-Wave Interrupted Noise (SQW): 10-Hz square-wave interruption

with 50% duty cycle added to BAS.

o Sinusoidal Amplitude Modulation Noise (SAM): 10-Hz sinusoidal

amplitude modulation noise added to BAS.

Maskers derived from actual speech samples (referred to as vocoded modulated

noise). These maskers were designed to exhibit fluctuations representative of speech

without the informational masking component. This paper refers to these as

speech-derived noises:

o 1-Speaker Vocoded Modulated Noise (VC-1)

o 2-Speaker Vocoded Modulated Noise (VC-2)

o 4-Speaker Vocoded Modulated Noise (VC-4)

Appendix II describes the steps used to generate the vocoded modulated noises.
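As a sketch of the two modulated non-speech-derived maskers, the gating and modulation steps can be written as follows; white Gaussian noise stands in for the speech-shaped carrier, whose spectral shaping is omitted here, and the function name is illustrative.

```python
import numpy as np

def modulated_maskers(fs=32000, dur=1.0, f_mod=10.0, seed=0):
    """Sketch of the SQW and SAM modulators applied to a noise carrier."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    carrier = rng.standard_normal(t.size)
    # 10-Hz square-wave interruption with a 50% duty cycle (on/off gating).
    sqw = carrier * (np.sin(2 * np.pi * f_mod * t) >= 0).astype(float)
    # 10-Hz sinusoidal amplitude modulation.
    sam = carrier * 0.5 * (1.0 + np.sin(2 * np.pi * f_mod * t))
    return sqw, sam
```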

D. Speech Conditions

Listeners were presented with S+N signals with three different kinds of processing

applied:

Unprocessed Condition (UNP): The S+N signals were presented as described above with

no further processing beyond per-subject NAL-RP (Dillon, 2001) amplification.

1-band Energy Equalized Condition (EEQ1): EEQ processing was applied to the

broadband S+N signal over the range of 80-8020 Hz. As described in Section III, the

EEQ processing compared short-term and long-term estimates of S+N signal energy,

increased the level of short-term segments whose energy was below the average signal

energy, and normalized the overall energy of the processed signal to be equivalent to that

of the original long-term estimate (see Figure 2A).


4-band Energy Equalized Condition (EEQ4): The same technique as in the EEQ1

condition was applied independently to 4 logarithmically-equal bands of the S+N signal

in the range of 80-8020 Hz. In doing so, sixth-order (36 dB/octave) bandpass filters divided

the input signals into bands with frequency ranges of 80 – 253 Hz, 253 – 801 Hz, 801 –

2535 Hz, and 2535 – 8020 Hz, respectively, and the EEQ1 processing was applied

independently to each band prior to reconstructing the signal by summing across bands

(see Figure 2B).
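The band-splitting stage can be sketched with SciPy; a Butterworth design is assumed here, since the text specifies only the order and slope. Each band would then be passed independently through the EEQ processing and the bands summed to reconstruct the output.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(x, fs=32000, f_lo=80.0, f_hi=8020.0, n_bands=4):
    """Split a signal into logarithmically-equal bands using sixth-order
    (36 dB/octave) bandpass filters, as in the EEQ4 condition."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)       # logarithmically-equal spacing
    edges = [f_lo * ratio ** k for k in range(n_bands + 1)]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(6, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfilt(sos, x))
    return edges, bands
```

With these parameters the computed band edges reproduce the 80, 253, 801, 2535, and 8020 Hz boundaries given in the text.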

E. Speech and Noise Signals

Figure 6 shows the waveform of one of the VCV tokens used in the experiment, ‘APA’, for

UNP speech in BAS noise. The vowel components, which have more energy than the consonant

component, constitute the two higher-energy sections of the speech that surround the weaker

consonant component in the center. These sections of the speech are annotated at the top of the

figure. Figure 7 shows the waveforms of this token in the different speech and noise conditions

(Figure 7A for the UNP condition, Figure 7B for the EEQ1 condition, and Figure 7C for the

recombined EEQ4 condition). In every type of interference except for BAS, the SNR is set to -4

dB. The left panels show the S+N waveforms, and the right panels show the distribution of the

amplitude of the S+N signal in dB. These amplitude distributions were generated by sampling

points of the S+N signal and, based on their amplitudes in dB, placing them into buckets of

length 1 dB in the range of -10 dB to 85 dB. The RMS value of the signal in dB is shown by the

blue vertical line, and the median amplitude is shown by the green vertical line.
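The amplitude distributions described above can be reproduced with a sketch like the following; the dB reference and the ε floor that avoids log of zero are illustrative.

```python
import numpy as np

def amplitude_distribution(x, lo_db=-10, hi_db=85):
    """Histogram of sample amplitudes in 1-dB bins over [lo_db, hi_db],
    plus the RMS value and median amplitude in dB."""
    eps = 1e-12                                 # avoids log10(0) for silent samples
    amp_db = 20.0 * np.log10(np.abs(x) + eps)
    bins = np.arange(lo_db, hi_db + 1)          # 1-dB buckets
    counts, _ = np.histogram(amp_db, bins=bins)
    rms_db = 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)
    median_db = np.median(amp_db)
    return counts, rms_db, median_db
```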

The gaps of the noise in the plots of the S+N waveforms make evident the reduction in

short-term amplitude fluctuations by the EEQ processing. For example, a comparison of the S+N

waveforms in SQW between UNP and either EEQ1 or EEQ4 shows that the lower-energy speech


components that are present during the gaps in the fluctuating interference are greater in energy

in the EEQ processed signals. The reduction in amplitude is also seen in the amplitude

distributions in the right panels. The low-energy tails of the amplitude distributions in the UNP

condition are reduced or absent in the EEQ1 and EEQ4 conditions. As a result, the median

amplitudes (given by the green vertical lines) in the EEQ1 and EEQ4 conditions are shifted to

the right, despite the RMS values (given by the blue vertical lines) remaining constant between

UNP and EEQ (as a result of the final normalization step in the EEQ processing that sets the

long-term energy of the output equal to the long-term energy of the input at every sample point).

These effects are analyzed in more detail in Section VI-A of the paper.

Figure 8 shows the EEQ4 waveforms and amplitude distributions in the CON (Figure

8A) and SQW (Figure 8B) conditions on a band-by-band basis. As in Figure 7, an SNR of -4 dB

is used. The boundaries of the four bands, which are logarithmically spaced, are 80 – 253 Hz

(Band 1), 253 – 801 Hz (Band 2), 801 – 2535 Hz (Band 3), and 2535 – 8020 Hz (Band 4). Band

2 has the largest RMS value, followed by, in decreasing order, Band 3, Band 4, and Band 1. The

EEQ1 processing was applied independently in each band.

F. Test Procedure

Experiments were controlled by a desktop PC using MATLAB™ software. The digitized

speech-plus-noise stimuli were played through a 24-bit PCI sound card (E-MU 0404 by Creative

Professional) and then passed through a programmable attenuator (Tucker-Davis PA4) and a

headphone buffer (Tucker-Davis HB6) before being presented monaurally to the listener in a

soundproof booth via a pair of headphones (Sennheiser HD580). A monitor, keyboard, and

mouse located within the soundproof booth allowed the listener to interact with the control PC.


Consonant identification was tested using a one-interval, 16-alternative, forced-choice

procedure without correct-answer feedback. On each trial of a 64-trial run, one of the 64 tokens

test set was selected randomly without replacement. Depending on the noise condition, a

randomly selected noise segment equal in duration to that of the speech token was scaled to

achieve the desired SNR and then added to the speech token. The resulting stimulus was either

presented unprocessed (for the UNP conditions) or processed according to EEQ1 or EEQ4

before being presented to the listener for identification. The listener’s task was to identify the

medial consonant of the VCV token that had been presented by selecting a response (using a

computer mouse) from a 4x4 visual array of orthographic representations associated with the

consonant stimuli. No time limit was imposed on the listeners’ responses. Each run typically

lasted 3-5 minutes depending on the listener’s response times. Chance performance was 6.25%-

correct.
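The noise-selection and scaling step can be sketched as follows; the function name is illustrative, and the NAL-RP amplification and attenuation applied downstream are omitted.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db, rng=None):
    """Pick a random noise segment equal in duration to the speech token,
    scale it to achieve the desired speech-to-noise ratio, and add it."""
    if rng is None:
        rng = np.random.default_rng()
    start = rng.integers(0, noise.size - speech.size + 1)
    seg = noise[start:start + speech.size].copy()
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    # SNR in dB is 20*log10(rms(speech)/rms(noise)); solve for the noise gain.
    seg *= (rms(speech) / rms(seg)) * 10.0 ** (-snr_db / 20.0)
    return speech + seg
```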

Experiment 1. NH listeners were tested using a speech level of 60 dB SPL. The SNR was

set to -10 dB (selected to yield roughly 50%-correct performance for UNP speech in CON noise)

for all noise conditions (except for BAS). For the HI listeners, a linear-gain amplification was

applied to the speech-plus-noise stimuli using the NAL-RP formula (Dillon, 2001). Each HI

listener selected a comfortable speech level when listening to UNP speech in the BAS condition.

For these listeners, the SNR was selected to yield roughly 50%-correct performance for UNP

speech in CON noise. The speech levels and SNRs for each HI listener are listed in Table 1. The

noise levels in dB are the differences between the speech levels and the SNRs.

The three speech conditions were tested in the order of UNP first, followed by EEQ1 and

EEQ4 in a random order. The seven noise conditions were tested in order of BAS first, followed

by a randomized order of the remaining six noises (CON, SQW, SAM, VC-1, VC-2, and VC-4).


Five 64-trial runs were presented for each of the 21 conditions (3 speech types x 7 noises). The

first run was considered as practice and discarded. The final four test runs were used to calculate

the percent-correct score in each condition.

Experiment 2. Four of the HI listeners (HI-2, HI-4, HI-5, and HI-7) were tested at two

additional values of SNR after completing Experiment 1. As shown in Table 2, one SNR was 4

dB lower than that employed in Experiment 1 and the other was 4 dB higher. This testing was

conducted with UNP and EEQ1 speech in six types of noise: CON, SQW, SAM, VC-1, VC-2,

and VC-4. The test order for UNP and EEQ1 speech was selected randomly for each listener. For

each speech type, the two additional values of SNR were presented in random order. Within each

SNR, the test order of the six types of noises was selected at random. Five 64-trial runs were

presented at each condition using the tokens from the test set. The first run was discarded as

practice and the final four runs were used to calculate the percent-correct score on each of the 24

additional conditions (2 speech types x 6 noises x 2 SNRs). Other than the SNR, all other

experimental parameters remained the same as in Experiment 1.

G. Data Analysis

For each condition, percent-correct scores were averaged over the final 4 runs (consisting

of 4*64=256 trials). Analysis of Variance (ANOVA) tests were performed on rationalized

arcsine units (RAU; Studebaker, 1985) scores to examine the effects of speech type and noise

condition on these percent-correct scores. MR in percentage points was calculated as the

difference between scores in fluctuating noise and in continuous noise:

MR = Score in Fluctuating Noise − Score in Continuous Noise.
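The rationalized arcsine transform applied before the ANOVA can be sketched as below, using the standard constants from Studebaker (1985); a score at the midpoint of the range maps to roughly 50 RAU.

```python
import math

def rau(correct, total):
    """Rationalized arcsine transform (Studebaker, 1985) of a correct/total
    score, using the standard constants 146/pi and -23."""
    theta = (math.asin(math.sqrt(correct / (total + 1.0)))
             + math.asin(math.sqrt((correct + 1.0) / (total + 1.0))))
    return (146.0 / math.pi) * theta - 23.0
```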


Additionally, as was done by Léger et al., a normalized measure of masking release (NMR) was

calculated as the quotient of MR and the difference between scores in quiet and in continuous

noise:

NMR = (Score in Fluctuating Noise − Score in Continuous Noise) / (Score in Quiet − Score in Continuous Noise).

NMR thus represents the fraction of baseline performance lost in continuous noise that can be

recovered in interrupted noise. Listeners who perform just as well in fluctuating noise as in quiet

have an NMR of 1, and listeners who do not perform any better in fluctuating noise than in

continuous noise have an NMR of 0. The metric is useful for comparing performance among HI

listeners whose scores in quiet are different. By using baseline performance as a reference, NMR

emphasizes the differences in performance with interrupted and continuous noise as opposed to

the differences due to factors such as the severity of the hearing loss of the listener or the

distorting effects of the processing on the speech itself.

The MR and NMR calculations in SQW and SAM noises used CON noise as the

continuous noise, and the MR and NMR calculations in VC-1 and VC-2 noises used VC-4 noise

as the continuous noise. These NMR formulas are listed here:

NMR_SQW = (SQW Score - CON Score) / (BAS Score - CON Score)

NMR_SAM = (SAM Score - CON Score) / (BAS Score - CON Score)

NMR_VC-1 = (VC-1 Score - VC-4 Score) / (BAS Score - VC-4 Score)

NMR_VC-2 = (VC-2 Score - VC-4 Score) / (BAS Score - VC-4 Score)
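These MR and NMR definitions reduce to a few lines of code. The following sketch uses illustrative score values, not data from the experiments:

```python
def masking_release(score_fluct, score_cont):
    """MR in percentage points: fluctuating-noise score minus continuous-noise score."""
    return score_fluct - score_cont

def normalized_masking_release(score_fluct, score_cont, score_quiet):
    """NMR: fraction of the baseline performance lost in continuous noise
    that is recovered in fluctuating noise (1 = full recovery, 0 = none)."""
    return (score_fluct - score_cont) / (score_quiet - score_cont)

# For SQW or SAM noise the continuous reference is CON and the quiet
# reference is BAS; for VC-1 or VC-2 the continuous reference is VC-4.
# Illustrative scores, not experimental data:
mr = masking_release(75.0, 50.0)                    # 25.0 percentage points
nmr = normalized_masking_release(75.0, 50.0, 95.0)  # 25/45
```

Note that NMR is undefined when the quiet and continuous-noise scores coincide, so it is only meaningful when continuous noise actually degrades performance.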


V. RESULTS

A. Experiment 1

The scores from Experiment 1 are reported in Appendix III-A and Appendix III-B and

are summarized in Figure 9, Figure 10, and Figure 11. Appendix III-A provides the scores for

each NH listener in each of the seven noise conditions for UNP, EEQ1, and EEQ4 speech, and

Appendix III-B provides this same information for each HI listener. In Figure 9, the scores are

plotted to highlight the differences in the average scores of the NH and HI listeners across

conditions. In Figure 10 and Figure 11, the scores are plotted to highlight the differences of

speech types within each noise for the NH and HI listeners (Figure 10 for the average NH results

and the average HI results and Figure 11 for the average NH results and the individual HI

results).

First, consider average NH and HI performance, as shown in Figure 9. As expected, the

performance for both groups was greatest in the BAS condition. Performance was lowest in

CON (and was approximately 50%-correct by design of the experiment) and VC-4 (which was

derived from samples of enough speakers to behave similarly to continuous noise). Performance

was intermediate for the remaining noises. Other than in CON noise, scores were greater for NH

than for HI listeners across noise conditions for all three speech types. The differences between

the two groups were relatively small in the BAS condition (where the average differences

between NH and HI listeners were 5.3% in UNP, 7.8% in EEQ1, and 12.2% in EEQ4), showing

that the two groups diverge the most in fluctuating noise conditions where NH listeners were

able to listen in the gaps, unlike HI listeners. In fact, across the five noises other than BAS and

CON, NH scores were on average 17.9, 15.9, and 17.1 percentage points higher than HI scores

for the UNP, EEQ1, and EEQ4 conditions, respectively. HI listeners exhibited slightly more


variability in their results than did NH listeners: the mean standard deviations across listeners

(computed as the average of the standard deviations in each of the seven noises¹) in percentage

points were 3.59 for UNP, 3.23 for EEQ1, and 4.38 in EEQ4 for NH listeners and 4.67 in UNP,

4.86 in EEQ1, and 4.59 in EEQ4 for HI listeners.
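The mean standard deviations above follow the footnoted formula: the square root of the across-listener average of each listener's across-run variance. A minimal sketch (assuming the sample variance of the four runs is intended):

```python
import statistics

def mean_within_listener_sd(runs_per_listener):
    """Square root of the across-listener average of each listener's
    across-run variance (sigma_i squared), per the footnoted formula.
    Each element of runs_per_listener is one listener's run scores."""
    variances = [statistics.variance(runs) for runs in runs_per_listener]
    return (sum(variances) / len(variances)) ** 0.5
```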

Next, consider NH and HI performance across the different speech types, as is shown in

Figures 10 and 11. Both figures show the mean scores for the NH listeners. Figure 10 shows the

mean scores for the HI listeners, whereas Figure 11 shows the scores for the individual HI

listeners. Note that the data depicted here are the same as that shown in Figure 9 and are

replotted to highlight differences in speech types within a given noise. In general, both NH and

HI listeners scored best in UNP followed by EEQ1 followed by EEQ4. Averaged across the

different listeners and noise types, the NH scores were 78.7% in UNP, 75.5% in EEQ1, and

73.4% in EEQ4, and the HI scores were 65.3% in UNP, 63.0% in EEQ1, and 59.1% in EEQ4.

By noise type, the scores of NH listeners generally followed the pattern of CON = VC-4 < VC-2

< VC-1 < SAM < SQW < BAS, and those of HI listeners generally followed the pattern of VC-4

< CON = VC-2 < VC-1 = SAM < SQW < BAS. Averaged across the different listeners and

speech types, the NH scores were 98.3% in BAS, 52.1% in CON, 92.4% in SQW, 86.2% in

SAM, 81.1% in VC-1, 68.6% in VC-2, and 52.4% in VC-4, and the HI scores were 89.8% in

BAS, 51.5% in CON, 72.1% in SQW, 64.5% in SAM, 61.7% in VC-1, 52.1% in VC-2, and

45.7% in VC-4. EEQ1 processing was effective in improving the scores of HI and NH listeners

in SQW noise: the average NH listener and eight of the nine individual HI listeners (all but HI-3)

¹ Note that here, the standard deviation for a given noise and processing condition is calculated as √( (∑ᵢ₌₁ⁿ σᵢ²) / n ), where σᵢ² is the variance of the four recorded runs on listener i and n is the number of listeners.


scored higher with EEQ1 than with UNP in SQW noise. EEQ1 processing also yielded improved

performance for SAM noise in six of the nine HI listeners (all but HI-3, HI-7, and HI-9). For

EEQ4 processing, no improvements over UNP were seen in SQW noise for NH listeners;

however, all but one HI listener (HI-3) showed an improvement. For EEQ4 in SAM noise, there

was no evidence for improvements over UNP for either NH or HI listeners. For all remaining

noise conditions, for both HI and NH, scores were highest with UNP and lowest with EEQ4,

with EEQ1 in between.

The results obtained on each individual NH and HI listener were analyzed using a two-

way ANOVA with main factors of speech type and noise condition. The ANOVAs were

conducted at the significance level of 0.01 on the RAU of the 84 percent-correct scores obtained

on each listener (3 speech x 7 noise x 4 repetitions) and are reported in Table 3. All but one of

the NH listeners (NH-1) and all of the HI listeners had a significant effect of speech type, and all

of the NH and HI listeners had a significant effect of noise type. One of the NH listeners (NH-2)

and all but three of the HI listeners (HI-2, HI-3, and HI-7) had a significant speech by noise

interaction.

Post-hoc Tukey-Kramer comparisons at the significance level of 0.05 were conducted for

cases of significant main factor effects, and the results are listed in Table 4. By speech type, most

listeners had UNP = EEQ1 > EEQ4 (NH-2, HI-2, HI-4, HI-8, HI-9) or UNP > EEQ1 = EEQ4

(NH-3, NH-4, HI-1, HI-5, HI-7). The exceptions were HI-3, who had UNP > EEQ1 > EEQ4, and

HI-6, who had EEQ1 > UNP > EEQ4. By noise type, BAS, SQW, SAM, and VC-1 were greater

than VC-2, VC-4, and CON. Most listeners had BAS > SQW > SAM > VC-1 (NH-2, NH-3, NH-

4, HI-1, and HI-3) or BAS > SQW > SAM = VC-1 (NH-1, HI-2, HI-4, HI-5, HI-6, HI-8, and HI-

9). The exception was HI-7, who had BAS > SQW = SAM = VC-1. All NH listeners had VC-2 >


VC-4 = CON, and the order of VC-2, VC-4, and CON in HI listeners varied with each listener,

with five of the nine HI listeners (HI-2, HI-4, HI-5, HI-7, and HI-8) having no significant

differences among the three conditions. As discussed in the preceding paragraph, the significant

speech by noise interaction present in many of the HI listeners is largely due to improved

performance with EEQ1 processing relative to UNP in the SQW and SAM conditions but not in

the other noises.

The NMR data calculated from the scores of Experiment 1 are reported in Appendix III-C

and Appendix III-D and are summarized in Figure 12. Appendix III-C provides the NMR for

each NH listener in the SQW, SAM, VC-1, and VC-2 noise conditions for UNP, EEQ1, and

EEQ4 speech, and Appendix III-D provides this same information for each HI listener. In Figure

12, the NMR results are plotted for the average NH listener and the individual HI listeners to

highlight the differences of speech types within each noise.

As shown in Figure 12, for the HI listeners, NMR was generally similar in EEQ1 and

EEQ4 speech in the various noises and was greater in EEQ1 and EEQ4 than in UNP speech.

Averaged over the HI listeners and the noise types, these NMR values were 0.266 in UNP, 0.345

in EEQ1, and 0.361 in EEQ4. NMR for HI listeners by noise type was generally greatest in SQW

interference, smallest in VC-2 interference, and between the two and equivalent in SAM and

VC-1 interference. As such, NMR was generally greater in the non-speech derived noises than in

the speech-derived noises. Averaged over the HI listeners and speech types, these NMR values

were 0.525 in SQW, 0.320 in SAM, 0.359 in VC-1, and 0.143 in VC-2. EEQ processing yielded

the largest improvement in NMR for HI listeners in the SQW conditions. This improvement

decreased in the SAM condition and disappeared in the VC-1 and VC-2 conditions. Averaged

across HI listeners, NMR values for UNP, EEQ1, and EEQ4, respectively, were 0.320, 0.639,


and 0.616 in SQW; 0.227, 0.400, and 0.335 in SAM; 0.391, 0.376, and 0.312 in VC-1; and

0.126, 0.125, and 0.180 in VC-2. NH listeners generally achieved greater NMR than did the HI

listeners with little effect of speech type. Averaged across speech type for NH listeners, NMR

decreased in the order of SQW, SAM, VC-1, and VC-2. Averaged across NH listeners, NMR for

UNP, EEQ1, and EEQ4, respectively, were 0.861, 0.907, and 0.846 in SQW; 0.792, 0.735, and

0.694 in SAM; 0.673, 0.600, and 0.607 in VC-1; and 0.356, 0.351, and 0.355 in VC-2. Both

within and across listeners, HI listeners exhibited greater variability in their results than did NH

listeners.

B. Experiment 2

The scores of Experiment 2 are reported in Appendix IV-A and are summarized in Figure

13. Appendix IV-A provides the scores for each HI listener in the non-BAS noise conditions for

UNP and EEQ1 speech at each of the three SNRs that were tested. Figure 13A plots the results in

non-speech derived noises (except BAS) as a function of SNR and fits sigmoidal functions to the

data, and Figure 13B does the same for the speech-derived noises. The sigmoidal fits to the

psychometric functions in Figure 13 assumed a lower bound corresponding to chance

performance on the consonant-identification task (6.25%-correct) and an upper bound

corresponding to a given listener’s score on the BAS condition for UNP or EEQ. The fitting

process found the slope and midpoint values of a logistic function that minimized the error

between the fit and the data points. The results of the fits are summarized in Table 5 in terms of

their midpoints (SNR in dB yielding a 50%-correct score) and slopes around the midpoint (in

percentage points per dB). For the CON noise conditions, the midpoints and slopes were similar

for UNP and EEQ1 signals for each of the HI listeners. In CON, averaged across listeners,

midpoints were -3.9 dB and -2.8 dB for UNP and EEQ1, respectively, and slopes were 5.2


percent per dB and 4.3 percent per dB, respectively. In the two non-speech derived fluctuating

background noises, the midpoints were lower for EEQ1 than for UNP for each of the HI

listeners. Averaged across listeners and for UNP and EEQ1, respectively, midpoints were -13.6

dB and -98.5 dB in SQW (a difference of 84.9 dB) and -7.3 dB and -15.9 dB in SAM (a

difference of 8.6 dB).² For the speech-derived noise conditions, the midpoints and slopes were

similar for UNP and EEQ1 signals for each of the HI listeners. Averaged across listeners and for

UNP and EEQ1, respectively, midpoints were -9.0 dB and -11.0 dB in VC-1 (a difference of 2.0

dB), -5.8 dB and -4.5 dB in VC-2 (a difference of -1.3 dB), and -3.4 dB and -2.4 dB in VC-4 (a

difference of -1.0 dB). Slopes were similar for both types of processing, where they were ordered

as SQW < SAM < CON for the non-speech derived noises and VC-1 < VC-2 < VC-4 for the

speech-derived noises.
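The logistic fits described above can be sketched as follows. This is an illustrative brute-force fit, not the procedure actually used in the thesis; the floor is fixed at chance (6.25%), the ceiling at the listener's BAS score, and the search grid is an assumption:

```python
import math

def logistic(snr, midpoint, slope, floor=6.25, ceiling=100.0):
    """Percent correct predicted by a logistic rising from floor to ceiling."""
    return floor + (ceiling - floor) / (1.0 + math.exp(-slope * (snr - midpoint)))

def fit_logistic(snrs, scores, floor=6.25, ceiling=100.0):
    """Least-squares fit of midpoint and slope over a coarse grid.

    The grid ranges below are illustrative; the thesis presumably used a
    standard optimizer and a wider search range.
    """
    best = None
    for mid10 in range(-200, 201):          # midpoints -20.0..+20.0 dB in 0.1 dB steps
        for slope100 in range(5, 151, 5):   # slopes 0.05..1.50 per dB
            mid, slope = mid10 / 10.0, slope100 / 100.0
            err = sum((logistic(s, mid, slope, floor, ceiling) - y) ** 2
                      for s, y in zip(snrs, scores))
            if best is None or err < best[0]:
                best = (err, mid, slope)
    return best[1], best[2]
```

The reported slope in percentage points per dB at the midpoint is related to the logistic slope parameter by slope · (ceiling - floor) / 4.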

In Figure 14, MR in percentage points is plotted as a function of SNR for SQW, SAM,

VC-1, and VC-2. Here, MR was computed as the difference in sigmoid fits between fluctuating

and continuous noises. Note that this metric differs from NMR in that it is not normalized by the

difference between scores in quiet and in continuous noise (i.e., MR is the numerator in the

NMR quotient). Similarly to what was done in the NMR calculations, MR was computed with

CON as the continuous noise when SQW and SAM were the fluctuating noises and with VC-4 as

the continuous noise when VC-1 and VC-2 were the fluctuating noises. In SQW interference,

these plots indicate greater MR with EEQ1 than with UNP for all subjects across the SNRs. For

SAM interference, MR with EEQ1 generally exceeded that with UNP, although the increase was

generally smaller than in SQW interference. The trend was similar with VC-1 interference,

² It should be noted that the midpoint of HI-5 (-300.1 dB) was highly deviant relative to the remaining three HI listeners (whose midpoints ranged from -24.4 to -37.3 dB). The SQW midpoint average falls to -31.3 dB if HI-5 is eliminated, leading to a difference of 17.6 dB with UNP.


although the increase in MR with EEQ1 was smaller than that observed for SAM interference.

With VC-2, there was no clear trend showing greater MR for either EEQ1 or UNP. These

observations were generally consistent with the NMR findings discussed in the next paragraph.

NMR was calculated from the scores of Experiment 2 and is reported in Appendix IV-B

and summarized in Figure 15. Appendix IV-B provides the calculated NMR data for each HI

listener in each of the seven noise conditions for UNP and EEQ1 speech at each of the three

SNRs that were tested. In Figure 15A, NMR for EEQ1 is plotted as a function of NMR for UNP

for the individual HI listeners in SQW and SAM noise at the various SNRs, and in Figure 15B,

this same information is plotted for VC-1 and VC-2 noise. In Figure 15A, every NMR data point

lies above the 45-degree reference line, showing a strong tendency in HI listeners for larger

NMR with EEQ1 processing in non-speech derived noises at all SNRs tested. Additionally,

NMR was greater with SQW interference than with SAM interference. In SQW noise, NMR

averaged across subjects at the low, mid, and high SNRs was 0.431, 0.314, and -0.100,

respectively, for UNP and 0.765, 0.657, and 0.564, respectively, for EEQ1. These same numbers

in SAM noise were 0.284, 0.210, and 0.136, respectively, for UNP and 0.505, 0.501, and 0.467,

respectively, for EEQ1.

As shown in Figure 15B, there was less of a difference in NMR for UNP and EEQ1 for

the speech-derived noises than seen in Figure 15A for the non-speech derived noises. However,

more data points were above the reference line with VC-1 than with VC-2. Additionally, NMR

with both UNP and EEQ1 was greater with VC-1 interference than with VC-2 interference. In

VC-1 noise, NMR averaged across subjects for UNP was 0.366 at the low SNR, 0.386 at the mid

SNR, and 0.231 at the high SNR. These numbers for EEQ1 were 0.488 at the low SNR, 0.369 at


the mid SNR, and 0.515 at the high SNR. These same numbers in VC-2 noise were 0.223, 0.117,

and 0.019, respectively, for UNP and 0.177, 0.119, and 0.247, respectively, for EEQ1.

VI. DISCUSSION

This section discusses the results of the experiments in greater detail and analyzes

potential explanations for the outcomes. Section VI-A begins by examining the effects that the

EEQ processing has on the amplitude variability of the waveforms. In Section VI-B, the EEQ

effect on NMR is explored. Models are introduced in Section VI-C that attempt to predict the

performance benefit gained with the EEQ processing. In Section VI-D, EEQ1 is compared to

EEQ4 in an attempt to understand differences in performance. Finally, in Section VI-E, different

types of background interference are subjected to a glimpse analysis to explain the different

effects of the EEQ processing with the speech-derived versus non-speech-derived noises.

A. Effect of EEQ on Signal Amplitude

The waveform and amplitude distribution plots of the various S+N signals in Figures 7A,

7B, and 7C are now examined in more detail to assess how EEQ achieves its goal of equalizing

the energy of an S+N signal. The amplitude distribution plots depict RMS values with blue

vertical lines and amplitude medians with green vertical lines. Median amplitudes were plotted

because of their resilience to outliers as compared to the means. As shown in the figures, the

RMS values are the same for UNP, EEQ1, and EEQ4 within each type

of interference. This is because the final step of the EEQ processing normalizes the output signal

at every sample point to have a long-term energy equal to that of the input signal. Note also that

the RMS value, which is determined by the levels of the speech and the noise, is equal in all

types of interference except BAS. This is because, in the figure, the SNR is -4 dB in all non-BAS


conditions. However, despite the RMS values being the same within a type of interference, the

median amplitudes are not the same. The median amplitude is greater with EEQ1 and EEQ4 than

with UNP. For the VCV token depicted in the figure, the differences in median amplitudes in dB

between EEQ1 and UNP are 2.10 for BAS, 0.42 for CON, 1.78 for SQW, 2.05 for SAM, 1.36 for

VC-1, 1.11 for VC-2, and 0.80 for VC-4. Note that except in CON and VC-4, these values are

smaller than the differences in mean amplitudes between EEQ1 and UNP, which are 4.98 for

BAS, 0.39 for CON, 4.25 for SQW, 2.82 for SAM, 2.97 for VC-1, 1.79 for VC-2, and 0.65 for

VC-4. The fact that the differences in mean amplitudes are greater than the differences in median

amplitudes highlights the fact that the UNP histograms contain tails of low-energy components

that are not present with EEQ. The rightwards shift of the amplitude distribution with the EEQ

processing occurs because the lower energy speech components which are present during the

gaps in the noise are amplified with the processing. The movement of the tail of the amplitude

distribution towards the center of the histogram corresponds to the reduction in amplitude

variation in the processed speech. The waveform and amplitude distributions of EEQ1 and EEQ4

look approximately the same when examining the broadband signals. In dB, the absolute values

of the differences in mean amplitudes between EEQ1 and EEQ4 are 0.21 for BAS, 0.27 for

CON, 0.32 for SQW, 0.33 for SAM, 0.18 for VC-1, 0.73 for VC-2, and 0.57 for VC-4. Further

discussion of the differences between EEQ1 and EEQ4 is found in Section VI-D below.
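The RMS, mean, and median amplitude comparisons in this section can be computed directly from a waveform. A minimal sketch (dB re a unit amplitude; nonzero samples are assumed so the dB values are defined):

```python
import math
import statistics

def amplitude_stats_db(x):
    """RMS, mean, and median of |x|, each in dB re a unit amplitude.

    The median is more robust than the mean to the low-amplitude tail of
    the distribution, which is why the text tracks both statistics."""
    mags = [abs(v) for v in x]
    rms = math.sqrt(sum(m * m for m in mags) / len(mags))
    to_db = lambda v: 20.0 * math.log10(v)
    return to_db(rms), to_db(sum(mags) / len(mags)), to_db(statistics.median(mags))
```

For a signal with a long low-amplitude tail, the median in dB sits well below the mean, and both sit below the RMS value; EEQ's amplification of the tail pulls the median up toward the (unchanged) RMS value.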

B. Effect of EEQ on NMR

It was stated that the goals of this research are to increase NMR in HI listeners by

increasing performance in fluctuating interference while maintaining performance in baseline

and continuous noise conditions. For HI listeners, EEQ1 processing yielded improved

performance in SQW and SAM noises (average scores increased by 7.2 and 1.6 percentage


points, respectively) but not for the speech-derived noises. For HI listeners, EEQ1 processing

resulted on average in 2.4 and 6.3 percentage point reductions in performance for BAS and CON

noises, respectively. As such, for HI listeners, NMR was greater in SQW and SAM noises with

EEQ1 compared to UNP. This was brought about both by an increase in performance in

fluctuating noise and a greater decrease in performance in CON noise than in BAS. For HI

listeners with EEQ4 processing, NMR also increased, but performance dropped relative to UNP in all noise conditions except SQW, where there was a slight increase.

Meanwhile, for NH listeners, EEQ1 processing yielded a slight improvement in performance in

SQW noise (average score increased by 1.4 percentage points) but not in the remaining noises.

The overall effect on NMR was minimal both in EEQ1 and EEQ4.

The benefits of EEQ processing for HI listeners in SQW interference are evident through

the NMR results, which are shown in Figure 12 for Experiment 1 and in Figure 15 for

Experiment 2. Figure 16 re-plots the results from Figure 12 to show NMR as a function of the 5-

frequency PTA hearing loss of each of the nine HI listeners. In the figure, NMR decreases as a

function of PTA with UNP speech, which demonstrates the increasing effects of reduced

audibility in the SQW noise gaps with severity of hearing loss. However, with EEQ1 and EEQ4

processing, NMR is much more constant relative to PTA, which highlights the benefits provided

to HI listeners by making the speech component of the signal more audible in the SQW noise

gaps. Additionally, as shown in Figure 15, the increase in NMR with EEQ1 relative to UNP for

SQW and SAM holds at various SNRs: with UNP in these types of interference, NMR becomes

close to zero or even negative at the high SNRs, whereas with EEQ1, NMR is always positive.


C. Modelling Psychometric Functions

Two analyses were performed to explore the percent-correct performance shown in

Figure 13 for each speech type and noise as a function of SNR and to attempt to account for the

changes in performance, especially the performance boost in SQW noise.

1) Local Changes in SNR

The first analysis investigated whether the performance improvement in fluctuating

noises with EEQ processing can be explained solely by changes to the SNR. Specifically, for

low-to-moderate SNRs and fluctuating noise, EEQ tends to amplify the higher-SNR stimulus segments (in the gaps, where the noise energy is low) relative to the lower-SNR segments (where the noise energy is high). By doing this, EEQ changes the effective SNR of the

stimulus, and so it is possible that the observed increase in NMR, which depends on the

observer, might be explained simply by an increase in SNR, which is independent of the

observer. The first analysis addressed this question by estimating the change in SNR as a result

of EEQ processing and looking at scores as a function of this changed SNR.

Although EEQ processing is nonlinear, the scale factor is applied linearly to the speech

and to the noise. Thus, it is possible to determine its effect on the speech and noise components

of the signal separately for a particular stimulus at a particular input SNR, thus allowing

computation of the post-processing SNR for that input. The output SNR for a particular input

sample (consisting of specific speech and noise samples s(t) and n(t) and a known input SNR,

SNRUNP) may be calculated as follows:

(1) Compute the EEQ scale factor SC(t) applied to an input of x(t) = s(t) + n(t).

(2) The EEQ output signal is given as y(t) = x(t) * SC(t) = ys(t) + yn(t), where:

ys(t) = s(t) * SC(t) and


yn(t) = n(t) * SC(t) .

(3) The post-processed SNR (in dB) for this combination of s(t), n(t), and SNRUNP is

SNREEQ = 10 log10( mean[ys²(t)] / mean[yn²(t)] ),

where mean[ys²(t)] and mean[yn²(t)] are the time averages of ys²(t) and yn²(t), respectively.
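Steps (2) and (3) can be sketched directly; the scale factor SC(t) from step (1) is assumed to have been computed already by the EEQ algorithm and is passed in as a sequence:

```python
import math

def post_processing_snr(s, n, sc):
    """Steps (2)-(3): apply the EEQ scale factor sc(t) (assumed to have
    been computed in step (1) from x(t) = s(t) + n(t)) to the speech and
    noise separately, then form the output SNR in dB from the
    mean-squared components."""
    ys = [si * ci for si, ci in zip(s, sc)]
    yn = [ni * ci for ni, ci in zip(n, sc)]
    mean_sq = lambda v: sum(u * u for u in v) / len(v)
    return 10.0 * math.log10(mean_sq(ys) / mean_sq(yn))
```

Because the same scale factor multiplies both components at each sample, a constant sc leaves the SNR unchanged; only time-varying gain alters the effective SNR.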

Each of the 64 speech tokens used in the experiments was examined with six noise types

(CON, SQW, SAM, VC-1, VC-2, and VC-4) and values of SNRUNP (ranging from -40 to +40

dB). For every combination of speech token, noise type, and SNRUNP, 10 noise samples n(t) of

length equal to s(t) were randomly generated. The above procedure was used to calculate

SNREEQ1 as a function of SNRUNP and noise type, averaged across the 10 noise samples combined with each of the 64 speech tokens. These averages were used to formulate a pre-to-

post-processing SNR mapping function SNREEQ1 = F(SNRUNP, noise type), shown in Figure 17,

where a diagonal reference line is included to show the case of SNRUNP = SNREEQ1. When

SNRUNP is negative, EEQ1 processing provides an SNR boost by raising the level of the signal

present in the dips in the noise. Interestingly, when SNRUNP is positive, EEQ1 processing actually

lowers the SNR because fluctuations in the signal, as opposed to the noise, drive the

equalization. The CON, SQW, SAM, VC-1, VC-2, and VC-4 curves cross the reference line at

SNRUNP values of -5.8, 4.8, 3.1, 2.5, -0.6, and -2.3 dB, respectively.

Using the pre-to-post-processing SNR mapping function, the psychometric functions in

Figure 13 were replotted. Scores for UNP were plotted versus SNRUNP and scores for EEQ1 were

plotted versus SNREEQ1. These plots can be seen in Figure 18A for the non-speech-derived noises

and in Figure 18B for the speech-derived noises. If the performance boost with EEQ1 processing could be explained solely by the change in SNR, the scores for UNP and EEQ1 in a given noise type would be the same at a given SNR. For the non-speech-derived noises, this


prediction fits well for the data of HI-2 and HI-7, especially in the SQW and SAM conditions,

and for HI-5 in the SAM condition. For the speech-derived noises, this prediction fits well for the

data of HI-2 and HI-7 in the VC-1, VC-2, and VC-4 conditions, for HI-4 in the VC-1 condition,

and for HI-5 in the VC-2 condition. Other than these cases, the modelling of performance based

on the SNR mapping function is less effective.

It should be noted that the local SNR analysis was computed using an entire VCV token,

which is dominated by the vowel components in both duration and level. It is assumed that for

the consonant portion alone, the SNREEQ1 vs SNRUNP curves cross the diagonal reference line at

more positive SNRs than are shown in Figure 17 for the whole VCV token. This is because the

low-energy consonant component is often the beneficiary of the short-term amplification

provided by the EEQ processing algorithm. As such, the EEQ processing does not have a

negative impact on local consonant SNR until more positive SNRs, at which point the speech is

dominant enough that a slight decrease in SNR would not hurt performance.

2) Crest Factor

The second analysis explored whether the performance improvements with the EEQ

processing can be explained by the changes in amplitude variation of an S+N signal. The crest

factor, defined as the peak amplitude of a waveform x divided by its RMS value, gives a sense of

the amplitude variation of the signal. In dB, the crest factor is given as:

Crest Factor = 20 log10( |x|peak / xrms ).
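The crest-factor computation is a one-liner per signal. A minimal sketch:

```python
import math

def crest_factor_db(x):
    """Crest factor in dB: peak absolute amplitude over the RMS value."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)

# A constant (or square-wave) signal has a crest factor of 0 dB;
# a full-scale sine is about 3 dB.
```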

Because EEQ processing reduces amplitude variation, it is expected that the processing

would reduce the crest factor as the maximum value of the signal moves closer to the RMS value

of the signal. In a manner similar to what was done for the SNR analysis above, each of the 64

speech tokens used in the experiments was examined with various noise types (CON, SQW,


SAM, VC-1, VC-2, and VC-4), processing conditions (UNP and EEQ1), and values of SNRUNP

(ranging from -40 to +40 dB). For every combination of speech token, processing, and SNRUNP,

10 noise samples n(t) of length equal to s(t) were randomly generated. The average maximum

value and the average RMS value across these 10 S+N samples were then recorded. Using these

two average values, an average crest factor in dB was calculated for each of the 64 test syllables

at each noise type, processing type, and value of SNRUNP. Finally, the 64 crest factors (in dB)

calculated for each condition were averaged to formulate a function of Crest Factor = F(SNRUNP,

noise type, processing type). This function is shown in Figure 19A for the non-speech-derived

noises and in Figure 19B for the speech-derived noises, where it can be seen that the EEQ1 crest-

factor curves lie below the corresponding UNP curves of the same noise type. This effect is

consistent with the reduced amplitude variability (and therefore decrease in the ratio of its

maximum value to its RMS value) in the EEQ1 processed signal. Note that the crest factor for

speech in the speech-derived noises is more variable than that in the non-speech-derived noises,

as shown by the jagged curves across the SNRs in Figure 19B as compared to Figure 19A. This

behavior comes from the greater variability in the speech-derived noises in general.

In a manner similar to what was done with the SNR analysis described above, the

psychometric functions in Figure 13 were plotted on a crest-factor scale. Scores for UNP were

plotted versus the pre-processing crest factor and scores for EEQ1 were plotted versus the post-

processing crest factor. These plots can be seen in Figure 20. The percent-correct curves still do

not lie on top of each other, indicating that crest factor by itself also cannot be used to explain the

performance benefits with the EEQ1 processing. In fact, because pure noise has a lower crest

factor than pure speech (as seen by the crest factor curves for CON being lower at negative

SNRs than at positive SNRs), one would expect processing whose performance benefits can be


explained solely by crest factor changes to result in signals that have higher crest factors than

UNP signals. However, as stated above, EEQ1 processing lowers the crest factor and thus cannot

be used to explain the psychometric data. This analysis was also performed by using different

percentiles of signal amplitude in the numerator of the crest factor formula (for example, by

using the 95th or 99th percentile rather than the maximum value), but this mapping did not fit the

data well either.

D. EEQ1 vs EEQ4 Processing

EEQ1 proved to be more effective than EEQ4 processing for HI listeners; with the

average HI data, the mean difference between EEQ1 and EEQ4 scores across the seven noises

was 4.0 percentage points. It had been hypothesized that processing frequency bands

independently could provide benefit particularly with speech-derived noises for HI listeners with

non-uniform losses. However, by applying different scale factors to different frequency bands,

such independent-band processing may have interfered with the spectral shape, resulting in

decreased effectiveness. To see if this might be the case, outputs of the three processing schemes

were examined in each of the four bands used for EEQ4.

Figure 21 compares RMS values and median amplitudes for UNP, EEQ1, and EEQ4

within each of the four bands used in the EEQ4 processing as a function of SNR. As seen in

Figures 21A, 21B, and 21C, the RMS values for the three different kinds of processing do not

differ much on a band-by-band basis. This is because the EEQ processing normalizes the RMS

mean of the output signal to be equal to that of the input signal. However, an obvious difference

can be seen between the median values of the UNP and both EEQ processing schemes, as shown

in Figures 21D, 21E, and 21F. For UNP, the median amplitudes have a generally linear decrease

with an increase in SNR, whereas the slopes of the EEQ1 and EEQ4 functions level off at around


0 dB. This is consistent with the EEQ processing amplifying the low energy speech components.

The shapes of these functions are generally similar for EEQ1 and EEQ4. However, at the lower

SNRs, bands 1 and 4, and to a lesser extent bands 2 and 3, show greater overlap with EEQ4 than

with EEQ1. At the higher SNRs, bands 1 and 4 and bands 2 and 3 show greater separation for

EEQ4 compared to EEQ1. It is possible that these differences in spectral shape led to the overall

4.0 percentage point decrease in performance with EEQ4 relative to EEQ1. It is possible that

other metrics besides RMS values and median amplitudes might reveal a larger difference in

spectral shape between the two processing schemes. It is also possible that the additional

processing involved in the multi-band scheme introduced additional distortions to the signal,

which led to the observed decreases in performance with EEQ4 compared to EEQ1.

E. Glimpse Analysis of Vocoded and Non-Vocoded Noises

The EEQ processing scheme proved to be more effective, both in terms of improving

scores and NMR, with the non-speech-derived noises compared to the speech-derived noises. An

analysis was conducted on the differences in occurrences of noise glimpses between these two

categories of noises to explore why this may be the case. This analysis was similar to one done

by Cooke (2006), who looked at glimpse percentages and counts for a number of competing

background speakers. Cooke’s analysis defined a glimpse to be a connected region of the

spectrotemporal excitation pattern in which the energy of the speech token exceeded that of the

background by at least 3 dB in each time-frequency element. Unlike Cooke’s analysis, the

current analysis looked at noise alone and examined its envelope as opposed to its

spectrotemporal pattern. The analysis used here defines a noise glimpse to be a section of the

noise where the envelope drops more than 3 dB below the RMS value of the noise for at least 10

ms. Gaps present at the immediate start or end of the noise were not counted because these


durations might be truncated from their actual duration. The threshold of 3 dB below the RMS

value was chosen to prevent steady-state noise, which has many small fluctuations in the vicinity

of its RMS value, from being classified as having a significant portion of the signal spent in a

glimpse. The minimum duration of 10 ms was chosen based on a study by He et al. (1999),

which showed that the threshold for detecting a gap in a longer duration noise (400 ms) is

roughly 5 ms, independent of the location of the gap within the noise or whether the gap location

is randomly selected on each trial. Therefore, the analysis described here used a minimum

duration of 10 ms, twice the threshold at which gaps can be reliably

detected.
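The glimpse definition above can be sketched in code. The sampling rate, function name, and envelope input below are illustrative assumptions, not details taken from the thesis:

```python
import math

FS = 16000                      # assumed sampling rate (Hz)
MIN_GLIMPSE_S = 0.010           # minimum glimpse duration: 10 ms
THRESH_DB = -3.0                # envelope must fall 3 dB below RMS

def find_glimpses(envelope, fs=FS):
    """Return (start, end) sample indices of noise glimpses.

    A glimpse is a run where the envelope stays more than 3 dB below
    the RMS value for at least 10 ms. Runs touching the first or last
    sample are discarded, since their true duration may be truncated.
    """
    rms = math.sqrt(sum(v * v for v in envelope) / len(envelope))
    thresh = rms * 10 ** (THRESH_DB / 20.0)   # 3 dB below RMS
    min_len = int(MIN_GLIMPSE_S * fs)

    glimpses, start = [], None
    for i, v in enumerate(envelope):
        if v < thresh and start is None:
            start = i
        elif v >= thresh and start is not None:
            if i - start >= min_len and start > 0:
                glimpses.append((start, i))
            start = None
    # A run still open at the last sample touches the signal edge: drop it.
    return glimpses
```

The 3-dB margin keeps the small fluctuations of steady-state noise around its RMS value from registering as glimpses, as described above.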

Figure 22 depicts the waveforms and envelopes of VC-1 (Figure 22A), VC-2 (Figure

22B), and VC-4 (Figure 22C) noises. The envelope was computed by taking the absolute value

of the signal's Hilbert transform in each of 16 logarithmically spaced bands spanning 80 Hz to

8020 Hz and low-pass filtering the result with a 64-Hz cutoff. The red lines represent the RMS values, and the

green lines represent 3 dB below the RMS values. An interval is considered to be a noise gap if

the envelope (shown in blue) drops below the green threshold line for at least 10 ms. As shown

in the figures, as the number of speakers increases in the speech-derived noises, the envelope

hovers closer to the RMS value.
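An envelope of this general kind can be sketched as follows. The thesis derives the envelope from the magnitude of the Hilbert transform followed by 64-Hz low-pass filtering; this stdlib-only sketch substitutes full-wave rectification for the Hilbert magnitude (an assumption, chosen because both track the slow amplitude fluctuations that define noise gaps), and the sampling rate is also assumed:

```python
import math

def envelope(x, fs=16000, cutoff=64.0):
    """Crude envelope follower: full-wave rectification followed by a
    one-pole low-pass filter with a ~64-Hz cutoff (a simplified stand-in
    for the Hilbert-magnitude envelope used in the thesis)."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)  # one-pole smoothing coefficient
    env, state = [], 0.0
    for v in x:
        state = a * state + (1.0 - a) * abs(v)
        env.append(state)
    return env
```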

The analysis considered six of the noises used in the experiment (eliminating only the

BAS noise). Additionally, it examined speech-vocoded modulated noises derived from more

than 4 speaker samples: VC-8, VC-16, VC-32, VC-64, VC-128, VC-256, and VC-512.

Five hundred samples of each noise type were generated with a duration equal to that of an

arbitrarily chosen VCV token (1.29 seconds). Note that these additional noise types

were generated from multiple samples of the same eight speakers as were used to generate VC-1,


VC-2, and VC-4. Half of these samples came from combinations of female speakers and half

from combinations of male speakers. For each noise sample, the occurrences of the glimpses

using the above definition were determined. This information was then used to calculate the

fraction of the overall noise duration occupied by glimpses, the number of glimpses per

second, and the average glimpse duration. For the first two quantities, the averages over the

500 noise samples generated are shown in Figures 23 and 24: Figure 23 depicts the percentage of

glimpses information, and Figure 24 depicts the measured glimpses-per-second information.

Cooke’s paper contains plots of these same quantities, which are similar in shape to the results

obtained with this study’s slightly different definition of a noise glimpse. Figure 25 shows a

histogram of the final quantity, the average glimpse duration in each of the 500 noise samples.
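Given a list of (start, end) sample-index pairs for the detected glimpses in one noise sample, the three quantities described above can be computed as follows (function and parameter names are illustrative, and the 16-kHz rate is an assumption):

```python
def glimpse_stats(glimpses, n_samples, fs=16000):
    """Compute the fraction of the noise duration spent in glimpses,
    the number of glimpses per second, and the mean glimpse duration
    in seconds for one noise sample."""
    durations = [(end - start) / fs for start, end in glimpses]
    total_s = n_samples / fs
    fraction = sum(durations) / total_s
    per_second = len(glimpses) / total_s
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    return fraction, per_second, mean_duration
```

Averaging the first two quantities over the 500 generated samples of a noise type gives the values plotted in Figures 23 and 24; binning the third in 10-ms steps gives the histograms of Figure 25.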

As shown in Figure 23, the average fraction of time spent in a glimpse decreased as the

number of speakers in the vocoded modulated noise increased. As the number of speakers

increased, this fraction approached zero, the value in CON noise. SQW and SAM had values of

0.439 and 0.419, respectively. Note that had the opening and closing gaps been counted and had

the RMS value been used as a threshold instead of 3 dB below the RMS value, these numbers

would have been closer to 0.5 by nature of the symmetry of the noises. VC-1, VC-2, and VC-4,

the three speech-derived noises used in the experiment, had fractions of 0.423, 0.336, and 0.257,

respectively. VC-1 was therefore very similar to SQW and SAM in terms of fraction of the time

spent in a gap, whereas VC-4 had more gaps than CON using the current metric. The variability

of this measure was greater for the speech-derived noises than the non-speech-derived noises and

decreased as the number of speakers in the speech-derived noises increased. Standard deviations

within the 500 samples were 0.120 for VC-1, 0.098 for VC-2, and 0.081 for VC-4, whereas these

values were 0.000 for CON, 0.015 for SQW, and 0.014 for SAM.


As shown in Figure 24, the average number of glimpses per second increased from 1

speaker to 2 speakers and then decreased from there on as the number of speakers in the vocoded

noises increased. This quantity approached zero, the value in CON noise. SQW and SAM had

glimpse-per-second values of 9.20 and 9.22, respectively, and VC-1, VC-2, and VC-4 had

glimpse-per-second values of 2.57, 3.46, and 3.43, respectively. Thus, all of the speech-derived

noises had fewer glimpses per second than did the non-speech-derived noises. The

variability in the number of glimpses per second was greater for the speech-derived noises than

for the non-speech-derived noises, and the variability first increased and then decreased as the

number of speakers in the speech-derived noises increased. VC-1, VC-2, and VC-4 had standard

deviations of 1.14, 1.33, and 1.34, respectively, between the 500 samples, whereas these values

were 0.000, 0.318, and 0.319 for CON, SQW, and SAM, respectively.

To generate Figure 25, the 500 average glimpse durations (corresponding to the average

glimpse duration in each of the 500 noise samples generated for a given noise) were placed into

bins of width 10 ms. As shown in the figure, the average glimpse duration between samples

of the same type of speech-derived noise varies considerably, especially for those

tested in the experiments (VC-1, VC-2, and VC-4). The histograms for these noises span many

bins. Meanwhile, for the non-speech-derived noises (CON, SQW, and SAM), there is very

little variability in the average glimpse duration between noise samples. The histograms for these

noises occupy a single bin: for CON, the bin from 0 to 0.01 seconds, and for SQW and SAM,

the bin from 0.04 to 0.05 seconds. Almost every sample for VC-1, VC-2, and VC-4 falls

into a bin of greater duration than that for SQW and SAM.

Figures 23, 24, and 25 offer insight into why the EEQ processing performed better with

the non-speech-derived noises than with the speech-derived noises. Although VC-1, SQW, and


SAM have similar amounts of total time spent in glimpses, these times are distributed over a

greater number of glimpses in SQW and SAM. With VC-2 and VC-4, both the total time spent

in glimpses and the total number of glimpses are smaller than with SQW and SAM. Additionally,

the variability is much greater in the speech-derived noises than the non-speech-derived noises.

The EEQ processing performs best with short, frequent glimpses, as this gives it the best

opportunity to amplify speech exposed during the gaps in the noise by normalizing with the ratio

of the long- and short-term energies. With VC-1, there are fewer, longer glimpses. Therefore, the

listener hears only a few extended samples of the speech stimulus rather than brief exposures throughout. During

the longer glimpses, the running long-term average would be reduced, leading to smaller changes

in gain in these sections. With fewer and longer glimpses (and therefore fewer and longer non-

glimpses as well), it is also possible that the entirety of the low-energy consonant portion of the

speech stimuli would be covered by noise. Thus, the EEQ processing may have less of an

opportunity to operate effectively on the portion of the speech where HI listeners require the

most amplification and could instead end up amplifying noise during these parts. Finally, the

predictability of the non-speech-derived noises (as evidenced by the low standard deviation in

percentage glimpses and number of glimpses) may make it easier for HI listeners to benefit from

EEQ processing with those noises. With the speech-derived noises, the standard deviations are

high, and each sample of noise is quite different from the others. This unpredictable pattern

makes it harder for HI listeners to benefit from EEQ processing in the speech-derived noises.

To examine the role of glimpsing on performance in the different types of noise, Figure

26 plots the mean NH and HI scores for UNP and EEQ1 as a function of the average fraction of

the noise spent in a glimpse. For both speech types and groups of listeners, scores increased with

an increase in the fraction of glimpses once this measure exceeded approximately 0.25. Below


this fraction, scores were roughly constant at the level observed in CON. For both NH and HI

listeners, the UNP curve lies above the EEQ1 curve for the smaller fractions of glimpses.

However, as the fraction of glimpses increases, the difference between the curves gets smaller

and even reverses at the highest fractions of glimpses. These trends are consistent with the

concept that the EEQ processing is most effective when there are many glimpses available

throughout the duration of the noise signal.

Another explanation for why EEQ processing results in less of an NMR improvement in

speech-derived noises for HI listeners is that many HI listeners begin with a greater

NMR in VC-1 and VC-2 than in SQW and SAM. As shown in Figure 12, the HI listeners

with the most severe hearing losses (HI-6, HI-7, HI-8, and HI-9) have almost no NMR in the

UNP condition in SQW but do have a non-zero NMR in VC-1. In fact, in the UNP condition, the

NMR is much more constant in VC-1 interference as the listener’s PTA increases than is the case

in SQW. This implies that with UNP speech, HI listeners recovered more of the performance

lost in VC-4 noise when listening in VC-1 than they recovered in SQW relative to

CON. Thus, there is less room for NMR improvement with EEQ processing in the speech-

derived noises, and it is less surprising that there is not as much of an increase compared to the

non-speech-derived noises.

VII. CONCLUSIONS

- EEQ processing was effective in improving NMR for HI listeners in SQW and SAM

interference. The EEQ effect on NMR was less apparent in VC-1 and VC-2 interference.

These observations held across various SNRs.


- NMR improvements for EEQ resulted primarily from increased performance in

fluctuating noise, especially in SQW interference, although there was also a smaller

decrease in performance in BAS and CON for EEQ.

- EEQ processing is more effective with regular and frequent gaps in the fluctuating noises,

as is seen in SQW and SAM. VC-1 and VC-2 have gaps that are more variable in length

and therefore limit the effectiveness of the EEQ processing in using the short and long

window to normalize energy.

- EEQ1 processing was more effective than EEQ4 processing. EEQ4 may have interfered

with the spectral shape, resulting in decreased effectiveness.

- NMR decreased with increasing hearing loss for UNP speech but was roughly

independent of degree of loss for EEQ speech. This resulted in a large increase in NMR

for HI listeners with the most severe hearing losses.

- Although EEQ processing increases the local SNR and decreases the amplitude variation

of a noisy speech signal, neither of these effects provided a complete explanation of

behavioral performance with EEQ signals over a wide range of SNR.

VIII. FUTURE WORK

This study arose out of the desire to study and understand the factors that influence NMR

in HI listeners and to explore a signal processing technique to improve NMR. Future work

relates to these long-term goals. The work reported here evaluated the effectiveness of the EEQ

processing scheme in a consonant identification task. Future studies will explore how the EEQ

processing scheme fares in other speech types, specifically vowels and sentences. A model to

predict the differences in performance exhibited by HI listeners with UNP and EEQ processing


in the different types of background noise, building on the SNR and crest factor analyses

described in this thesis, will be further investigated. Also, the EEQ processing scheme will

continue to be analyzed for the potential for improvements in a broader range of fluctuating

noises, most notably in noises with irregular gap lengths. Additionally, the factors causing NMR

to be greater in the speech-derived noises than in the non-speech derived noises with UNP will

be investigated. Ways to prevent the EEQ4 processing from producing as much spectral

alteration will also be explored, potentially by constraining the scale factors applied to adjacent

bands to be within a fixed range of each other. Additional signal processing techniques will also

be examined for the improvement of NMR in HI listeners. These techniques may make

use of what was learned from the EEQ processing, and together they could lead to an increased

understanding of the mechanisms contributing to masking release in both NH and HI listeners.


References

Cooke, M. (2006). “A glimpsing model of speech perception in noise,” J. Acoust. Soc. Am. 119,

1562-1573.

Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and D’Aquila, L. A. (2016). “Technique

to improve speech intelligibility in fluctuating background noise by normalizing signal

energy over time.” Manuscript in preparation.

Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and Delhorne, L. A. (2010). “Speech

reception by listeners with real and simulated hearing impairment: Effects of continuous

and interrupted noise,” J. Acoust. Soc. Am. 128, 342-359.

Dillon, H. (2001). Hearing Aids (Thieme, New York), pp. 239-247.

Festen, J. M., and Plomp, R. (1990). “Effects of fluctuating noise and interfering speech on the

speech reception threshold for impaired and normal hearing,” J. Acoust. Soc. Am. 88,

1725-1736.

He, N., Horwitz, A. R., Dubno, J. R., and Mills, J. H. (1999). “Psychometric functions for gap

detection in noise measured from young and aged subjects,” J. Acoust. Soc. Am. 106, 966-

978.

Léger, A. C., Reed, C. M., Desloge, J. G., Swaminathan, J., and Braida, L. D. (2015). “Consonant

identification in noise using Hilbert-transform temporal fine-structure speech and

recovered-envelope speech for listeners with normal and impaired hearing,” J. Acoust. Soc.

Am. 136, 859-866.

Phatak, S., and Grant, K. W. (2014). “Phoneme recognition in vocoded maskers by normal-

hearing and aided hearing-impaired listeners,” J. Acoust. Soc. Am. 136, 859-866.


Reed, C. M., Desloge, J. G., Braida, L. D., Léger, A. C., and Perez, Z. D. (2016). “Level

variations in speech: Effect on masking release in hearing-impaired listeners.” Under

review for J. Acoust. Soc. Am.

Shannon, R. V., Jensvold, A., Padilla, M., Robert, M. E., and Wang, X. (1999). “Consonant

recordings for speech testing,” J. Acoust. Soc. Am. 106, L71-74.

Studebaker, G. A. (1985). “A ‘Rationalized’ Arcsine Transform,” J. Speech Lang. Hear. Res. 28,

455-462.


Figure 1: The magnitude and phase of the square root of the ratio of the frequency response of

AVGlong to the frequency response of AVGshort. AVGshort and AVGlong are the moving average

operators used by the EEQ processing in the computation of the running short- and long-term

energies, respectively, of the signal. They are single-pole IIR low pass filters applied to the

instantaneous signal energy with time constants of 5 ms and 200 ms for the short and long

averages, respectively.


Figure 2: Block diagrams of the EEQ processing algorithm used in this implementation. Figure

2A outlines the steps of the EEQ1 processing. Eshort[n] and Elong[n] are computed with single-pole

IIR filters applied to the instantaneous signal energy with time constants of 5 ms and 200 ms,

respectively, and the scale factor SC[n] is restricted to lie in the range of 0 dB to 20 dB. Figure

2B shows the EEQ1 processing applied independently in each of the four frequency bands to

yield EEQ4.

Figure 2A:

Figure 2B:
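The processing described in the caption above can be sketched as follows. This is an illustrative reconstruction from the caption alone; the initialization, the epsilon guards, and the function name are assumptions, not details from the thesis:

```python
import math

def eeq1(x, fs=16000, tau_short=0.005, tau_long=0.200, max_gain_db=20.0):
    """Sketch of single-band EEQ processing per Figure 2A.

    Running short- and long-term energies are tracked with single-pole
    IIR low-pass filters (5-ms and 200-ms time constants) applied to the
    instantaneous energy x[n]^2; each sample is scaled by
    sqrt(E_long / E_short), with the scale factor clipped to 0-20 dB,
    and the output RMS is then renormalized to match the input RMS."""
    a_s = math.exp(-1.0 / (tau_short * fs))   # short-term smoothing pole
    a_l = math.exp(-1.0 / (tau_long * fs))    # long-term smoothing pole
    e_s = e_l = x[0] * x[0] + 1e-12           # assumed initialization
    y = []
    for v in x:
        e_s = a_s * e_s + (1 - a_s) * v * v
        e_l = a_l * e_l + (1 - a_l) * v * v
        sc = math.sqrt(e_l / max(e_s, 1e-12))
        sc = min(max(sc, 1.0), 10 ** (max_gain_db / 20.0))  # 0 to 20 dB
        y.append(sc * v)
    # Normalize output RMS to equal the input RMS.
    rin = math.sqrt(sum(v * v for v in x) / len(x))
    rout = math.sqrt(sum(v * v for v in y) / len(y)) or 1.0
    return [v * rin / rout for v in y]
```

Because the scale factor cannot fall below 0 dB, the scheme only amplifies low-energy stretches (where E_short drops below E_long); the final normalization keeps the overall level unchanged.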


Table 1: Test ear, age, and 5-frequency pure-tone average (PTA) for each HI listener. The final

two columns provide the comfortable speech presentation levels chosen by each listener with

NAL amplification and the SNR used in testing all speech conditions. The SNR was chosen to

yield 50%-correct in continuous noise.

Listener Test Ear Age 5-Frequency

PTA (dB HL)

Speech Level

(dB SPL)

SNR (dB)

HI-1 R 33 27 68 -8

HI-2 R 58 28 65 -2

HI-3 L 21 30 65 -6

HI-4 L 23 36 65 -2

HI-5 L 20 45 65 -4

HI-6 L 69 53 70 0

HI-7 L 59 56 68 0

HI-8 R 26 58 70 -2

HI-9 L 21 75 71 -2


Figure 3: Pure-tone detection thresholds in dB SPL measured for 500 ms tones in a three-

alternative forced-choice adaptive procedure. A green line representing the average thresholds of

the test ears of the NH listeners is shown in the upper left box, and the thresholds for the HI

listeners are shown in the remaining boxes. For the HI listeners, thresholds are shown for the

right ear (red circles) and left ear (blue x’s), with the points of the test ear connected using a solid

line and the points of the non-test ear connected using a dashed line.


Figure 4: Waveforms of the 7 background interference conditions used in testing. To make it

easier to see the effects of the modulation, the same underlying noise sample was used to

generate all four non-speech derived noises in this figure.


Figure 5: The spectrogram of the randomly generated speech-shaped noise used to create the

BAS, CON, SQW, and SAM interference conditions. The speech-shaped noise had a total

duration of 30 seconds, and a random segment of the speech-shaped noise (of the desired

interference duration) was chosen every time a sample of BAS, CON, SQW, or SAM was

generated.


Figure 6: Waveform of the VCV token ‘APA’ for UNP speech in the BAS noise condition. The

high-energy vowel components and the low-energy consonant component are annotated at the

top of the figure.


Figure 7: Waveforms and amplitude distribution plots for the VCV token ‘APA’ presented with

the seven different kinds of background interference (BAS, CON, SQW, SAM, VC-1, VC-2, and

VC-4) with UNP (Figure 7A), EEQ1 (Figure 7B), and EEQ4 (Figure 7C) processing. The speech

is presented at a level of 65 dB SPL, and the noise (other than BAS) is presented at a level of 69

dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot represents

the RMS value. The green line is the median of the amplitude distribution.


Figure 7A


Figure 7B


Figure 7C


Figure 8: Waveforms and amplitude distribution plots for the VCV token ‘APA’ presented with

CON (Figure 8A) and SQW (Figure 8B) interference with EEQ4 processing. Each of the four

rows in the figure corresponds to a different logarithmically equal band in the range of 80 Hz to

8020 Hz. The speech is presented at a level of 65 dB SPL, and the noise is presented at a level of

69 dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot

represents the RMS value. The green line is the median of the amplitude distribution.

Figure 8A:


Figure 8B:


Table 2: The SNRs employed in Experiment 2. The Mid SNR was equivalent to the one used in

Experiment 1. The Low SNR was 4 dB lower than the Mid SNR, and the High SNR was 4 dB

higher than the Mid SNR.

Listener Low SNR Mid SNR High SNR

HI-2 -6 -2 2

HI-4 -6 -2 2

HI-5 -8 -4 0

HI-7 -4 0 4


Figure 9: The mean percent correct scores achieved by the four NH (green bars) and nine HI

listeners (gold bars) in Experiment 1. The scores were measured with UNP (top panel), EEQ1

(middle panel), and EEQ4 (bottom panel) processing in BAS, CON, SQW, SAM, VC-1, VC-2,

and VC-4 background interference conditions. The error bars associated with each bar are +/- 1

standard deviation.


Figure 10: The mean percent correct scores achieved by the four NH (upper panel) and nine HI

listeners (lower panel) in Experiment 1. The scores were measured with UNP (purple bars),

EEQ1 (orange bars), and EEQ4 (green bars) processing in BAS, CON, SQW, SAM, VC-1, VC-

2, and VC-4 background interference conditions.


Figure 11: The mean percent correct scores achieved by the four NH listeners (upper left panel)

and the individual percent correct scores achieved by the nine HI listeners (other nine panels) in

Experiment 1. The scores were measured with UNP (purple bars), EEQ1 (orange bars), and

EEQ4 (green bars) processing in BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4 background

interference conditions. The error bars associated with each bar are +/- 1 standard deviation.


Table 3: Analysis of Variance results conducted on the rationalized arcsine units for the percent

correct scores of each of the four NH and nine HI listeners. The F-statistic (along with the

degrees of freedom) and probability are shown for each listener by speech type, noise type, and

speech by noise interaction. The probabilities below the 0.01 significance level are bolded and

annotated with an asterisk.

Speech Type Noise Type Speech x Noise

F(2, 63) p F(6, 63) p F(12, 63) p

NH-1 2.54 0.0866 228.22 < 0.0001* 1.24 0.2745

NH-2 26.95 < 0.0001* 269.21 < 0.0001* 2.87 0.0033*

NH-3 17.72 < 0.0001* 281.37 < 0.0001* 1.32 0.2311

NH-4 12.08 < 0.0001* 273.64 < 0.0001* 1.85 0.0592

Speech Type Noise Type Speech x Noise

F(2, 63) p F(6, 63) p F(12, 63) p

HI-1 7.92 0.0009* 237.95 < 0.0001* 2.68 0.0056*

HI-2 7.20 0.0015* 139.51 < 0.0001* 1.57 0.1244

HI-3 114.04 < 0.0001* 325.82 < 0.0001* 0.89 0.565

HI-4 15.71 < 0.0001* 93.95 < 0.0001* 5.22 < 0.0001*

HI-5 21.16 < 0.0001* 162.28 < 0.0001* 2.65 0.0061*

HI-6 30.59 < 0.0001* 163.58 < 0.0001* 3.17 0.0014*

HI-7 7.97 0.0008* 64.67 < 0.0001* 2.26 0.0188

HI-8 17.84 < 0.0001* 134.79 < 0.0001* 4.16 < 0.0001*

HI-9 5.69 0.0054* 94.82 < 0.0001* 3.44 0.0007*
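The rationalized arcsine transform referenced in the Table 3 caption converts a count of X correct responses out of N trials into rationalized arcsine units (RAU). The sketch below uses the standard published form and constants from Studebaker (1985); it is assumed, not confirmed by this excerpt, that the thesis used exactly this form:

```python
import math

def rau(x, n):
    """Rationalized arcsine units for x correct out of n trials.

    The arcsine transform t stabilizes the variance of proportions;
    the linear rescaling makes the result numerically comparable to
    percent correct over most of its range (about -23 to 123)."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return 46.47324337 * t - 23.0
```

For example, 32 correct out of 64 trials maps to approximately 50 RAU.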


Table 4: Tukey-Kramer post-hoc multiple comparison results among the four NH and nine HI

listeners using a significance level of 0.05. The ordering is shown for each listener by speech

type (1 for UNP, 2 for EEQ1, and 3 for EEQ4), noise type (1 for BAS, 2 for CON, 3 for SQW, 4

for SAM, 5 for VC-1, 6 for VC-2, and 7 for VC-4), and speech by noise interaction. When two

conditions are not significantly different, they are listed in decreasing order of mean value

observed. Note that there were some cases where conditions X and Y and conditions Y and Z

were not significantly different, but conditions X and Z were significantly different. In these

cases, Y was listed in the table as not significantly different from whichever of X and Z it was

closer to in mean value.

Speech Type Noise Type

NH-1 1 = 2 = 3 1 > 3 > 4 = 5 > 6 > 7 = 2

NH-2 1 = 2 > 3 1 > 3 > 4 > 5 > 6 > 7 = 2

NH-3 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 7 = 2

NH-4 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 7 = 2

Speech Type Noise Type

HI-1 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 2 = 7

HI-2 1 = 2 > 3 1 > 3 > 4 = 5 > 6 = 7 = 2

HI-3 1 > 2 > 3 1 > 3 > 4 > 5 > 2 > 6 > 7

HI-4 1 = 2 > 3 1 > 3 > 4 = 5 > 6 = 2 = 7

HI-5 1 > 2 = 3 1 > 3 > 4 = 5 > 6 = 2 = 7

HI-6 2 > 1 > 3 1 > 3 > 4 = 5 = 2 > 6 = 7

HI-7 1 > 2 = 3 1 > 3 = 4 = 5 > 6 = 2 = 7

HI-8 1 = 2 > 3 1 > 3 > 4 = 5 > 2 = 6 = 7

HI-9 1 = 2 > 3 1 > 3 > 4 = 2 = 5 > 6 = 7


Figure 12: The mean NMR achieved by the NH listeners (first group of bars) and the individual

NMR for each of the HI listeners (remaining nine groups of bars) with UNP (purple bars), EEQ1

(orange bars), and EEQ4 (green bars) processing. The NMR for the SQW (upper left panel) and

SAM (lower left panel) noises was calculated relative to the CON condition, whereas the NMR

for the VC-1 (upper right panel) and VC-2 (lower right panel) noises was calculated relative to

the VC-4 noise condition.


Figure 13: Percent-correct scores plotted as a function of SNR in dB for UNP (circles) and

EEQ1 (asterisks) speech. The data for the non-speech derived noises (SQW noise in blue, SAM

noise in green, and CON noise in red) are plotted in Figure 13A, and the data for the speech-

derived noises (VC-1 noise in blue, VC-2 noise in green, and VC-4 noise in red) are plotted in

Figure 13B. Sigmoidal fits to each of these functions are shown with data points connected by

continuous lines for UNP conditions and dashed lines for EEQ1 conditions.

Figure 13A:


Figure 13B:


Table 5: Midpoints (M) of SNR in dB for 50%-correct identification and slopes (S) around the

midpoints in percentage points per dB of sigmoidal fits to the data of the four individual HI

listeners shown in Figure 13. M and S are given for UNP and EEQ1 speech in six noise

backgrounds: CON, SQW, SAM, VC-1, VC-2, and VC-4. Means across listeners are provided in

the final row.

UNP SPEECH EEQ1 SPEECH

CON SQW SAM CON SQW SAM

M S M S M S M S M S M S

HI-2 -3.8 5.1 -18.7 1.7 -7.8 3.5 -2.2 5.0 -32.1 1.1 -8.6 3.5

HI-4 -5.3 4.2 -13.9 2.1 -8.7 3.3 -3.9 4.4 -37.3 1.9 -24.4 1.7

HI-5 -4.3 7.2 -15.8 1.5 -8.2 3.7 -2.7 4.7 -300.1 0.2 -24.2 1.0

HI-7 -2.1 4.1 -6.2 2.1 -4.3 3.4 -2.4 3.3 -24.4 1.2 -6.2 3.5

Means -3.9 5.2 -13.6 1.9 -7.3 3.5 -2.8 4.3 -98.5 1.1 -15.9 2.4

UNP SPEECH EEQ1 SPEECH

VC-1 VC-2 VC-4 VC-1 VC-2 VC-4

M S M S M S M S M S M S

HI-2 -8.0 3.4 -6.3 2.8 -3.4 5.2 -8.9 3.0 -3.9 3.8 -2.1 4.0

HI-4 -11.8 2.7 -7.7 3.3 -5.8 3.2 -13.1 2.6 -6.9 3.1 -3.3 4.2

HI-5 -11.1 3.1 -5.6 3.7 -3.6 5.7 -14.5 1.3 -5.0 3.7 -2.3 4.9

HI-7 -5.1 3.2 -3.7 2.7 -0.9 5.0 -7.4 3.1 -2.4 3.9 -1.8 3.9

Means -9.0 3.1 -5.8 3.1 -3.4 4.8 -11.0 2.5 -4.5 3.6 -2.4 4.3
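The midpoint M and slope S reported in Table 5 can be related through a logistic psychometric function. The exact fitting form is not stated in this excerpt, so the parameterization below is an assumption; it is constructed so the curve passes through 50% at SNR = M with slope S percentage points per dB there:

```python
import math

def sigmoid_pc(snr, midpoint, slope):
    """Percent correct vs. SNR: logistic curve reaching 50% at `midpoint`
    (dB) with `slope` percentage points per dB at that point.

    For a logistic 100 / (1 + exp(-k (snr - M))), the derivative at M
    is 25 k, so k = slope / 25 yields the desired midpoint slope."""
    k = slope / 25.0
    return 100.0 / (1.0 + math.exp(-k * (snr - midpoint)))
```

Under this parameterization, the shallow SQW slopes for EEQ1 speech in Table 5 correspond to nearly flat functions of SNR, consistent with the extreme fitted midpoints for that condition.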


Figure 14: MR in percentage-points for UNP (solid lines) and EEQ1 (dotted lines) speech

plotted as a function of SNR in dB. MR is computed as a subtraction of the fluctuating and

continuous sigmoid fit functions in Figure 13. MR for the non-speech derived fluctuating noises

(SQW noise in blue and SAM noise in green) is computed with CON as the continuous noise and

is plotted in Figure 14A, and MR for the speech-derived fluctuating noises (VC-1 noise in blue

and VC-2 noise in green) is computed with VC-4 as the continuous noise and is plotted in Figure

14B.

Figure 14A:


Figure 14B:


Figure 15: Normalized masking release (NMR) for EEQ1 plotted as a function of NMR for

UNP for the four HI listeners. The data for the non-speech derived noises (SQW with filled

symbols and SAM with unfilled symbols) are plotted in Figure 15A, and the data for the speech-

derived noises (VC-1 with filled symbols and VC-2 with unfilled symbols) are plotted in Figure

15B. NMR is shown for three values of SNR: circles represent the lowest SNR, diamonds the

middle SNR, and squares the highest SNR tested for each of the listeners. The markers are pink

for HI-2, blue for HI-4, red for HI-5, and green for HI-7.

Figure 15A:


Figure 15B:


Figure 16: The NMR for Experiment 1 in the SQW condition attained by each of the HI

listeners with UNP (purple lines), EEQ1 (orange lines), and EEQ4 (green lines) processing

plotted as a function of PTA in dB HL.


Figure 17: Effective SNR after EEQ1 processing (SNREEQ1) plotted as a function of the SNR

before EEQ1 processing (SNRUNP) for 10 samples of noise in each of the 64 test syllables in

CON (red line), SQW (blue line), SAM (green line), VC-1 (yellow line), VC-2 (pink line), and

VC-4 (cyan line) noises. A reference line of SNREEQ1 = SNRUNP (dashed black line) is shown to

make apparent at which points the EEQ processing lowers or raises the effective SNR.


Figure 18: Scores plotted as a function of SNR. UNP scores (solid lines with circular markers)

are plotted as a function of SNRUNP, and EEQ1 scores (dashed lines with asterisk markers) are

plotted as a function of SNREEQ1 (obtained from the function in Figure 17). The data for the non-

speech derived noises (CON with red markers, SQW with blue markers, and SAM with green

markers) are plotted in Figure 18A, and the data for the speech-derived noises (VC-1 with blue

markers, VC-2 with green markers, and VC-4 with red markers) are plotted in Figure 18B.

Figure 18A:


Figure 18B:


Figure 19: Crest factor plotted as a function of SNR with UNP (solid lines) and EEQ1 (dashed

lines) for 10 samples of noise in each of the 64 test syllables. The functions in the non-speech-

derived noises (CON in red, SQW in blue, and SAM in green) are shown in Figure 19A, and the

functions in the speech-derived noises (VC-1 in blue, VC-2 in green, and VC-4 in red) are shown

in Figure 19B.
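The crest factor plotted in Figure 19 is, by the usual definition, the ratio of a waveform's peak amplitude to its RMS value, expressed in dB; a minimal sketch, assuming this standard definition is the measure being plotted:

```python
import numpy as np

def crest_factor_db(x):
    """Crest factor of a waveform: peak amplitude over RMS, in dB.

    Standard definition; assumed (not confirmed by the text) to be the
    measure plotted in Figure 19.
    """
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(peak / rms)

# A pure sine has a crest factor of sqrt(2), i.e. about 3.01 dB.
t = np.linspace(0, 1, 8000, endpoint=False)
print(round(crest_factor_db(np.sin(2 * np.pi * 10 * t)), 2))  # -> 3.01
```

A fluctuating noise such as SQW has a higher crest factor than continuous noise of the same RMS level, which is why crest factor serves here as a summary of masker fluctuation.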

Figure 19A:

Figure 19B:

Figure 20: Scores plotted as a function of crest factor (obtained from Figure 19). UNP scores are

solid lines with circular markers, and EEQ1 scores are dashed lines with asterisk markers. The

data for the non-speech derived noises (CON with red markers, SQW with blue markers, and

SAM with green markers) are plotted in Figure 20A, and the data for the speech-derived noises

(VC-1 with blue markers, VC-2 with green markers, and VC-4 with red markers) are plotted in

Figure 20B.

Figure 20A:

Figure 20B:

Figure 21: The RMS value (Figures 21A, 21B, and 21C) and median amplitude (Figures 21D,

21E, and 21F) of the syllable ‘APA’ presented at 70 dB SPL in SQW interference in the UNP,

EEQ1, and EEQ4 speech conditions. The values in four logarithmically spaced frequency bands

in the range of 80-8020 Hz are plotted with SNRs ranging from -20 dB to 20 dB. The data are

plotted with red squares for Band 1, green circles for Band 2, blue asterisks for Band 3, and

black x’s for Band 4.
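The four logarithmically spaced bands of Figure 21 can be obtained by placing five edge frequencies log-uniformly between 80 and 8020 Hz; a short sketch (the exact edge frequencies used in the analysis are an assumption):

```python
import numpy as np

# Five log-uniformly spaced edges give four logarithmically spaced bands
# spanning 80-8020 Hz (assumed construction; edges not listed in the text).
edges = np.logspace(np.log10(80.0), np.log10(8020.0), num=5)
for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:7.1f} - {hi:7.1f} Hz")
```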

Figure 21A:

Figure 21B:

Figure 21C:

Figure 21D:

Figure 21E:

Figure 21F:

Figure 22: The waveforms and envelopes of VC-1, VC-2, and VC-4 noises. Shown with the

envelope (in dB) are the RMS value (red line) and 3 dB less than the RMS value (green line). The

black horizontal lines correspond to the noise glimpses.

Figure 22A:

Figure 22B:

Figure 22C:

Figure 23: The fraction of the overall noise duration of 1.29 seconds (fixed to equal the duration of an arbitrarily chosen VCV token) that the glimpses constitute, plotted as a function of the number of speakers in the vocoded modulated noises (powers of 2 between 1 and 512). Also shown for reference are the values in CON, SQW, and SAM noise; the CON point is included in the connected curve for the vocoded modulated noises. For each noise type, the percent glimpses was computed by averaging over 500 samples. The error bars associated with each data point are +/- 1 standard deviation.

Figure 24: The average number of glimpses per second (computed using noise samples of duration 1.29 seconds, the length of an arbitrarily chosen VCV token) plotted as a function of the number of speakers in the vocoded modulated noises (powers of 2 between 1 and 512). Also shown for reference are the values in CON, SQW, and SAM noise; the CON point is included in the connected curve for the vocoded modulated noises. For each noise type, the number of glimpses was computed by averaging over 500 samples. The error bars associated with each data point are +/- 1 standard deviation.

Figure 25: Histograms of the average glimpse duration of the different noise types: speaker-vocoded modulated noises (with the number of speakers a power of 2 between 1 and 512), CON, SQW, and SAM. For every noise type, 500 samples were generated, and the average glimpse duration was calculated for each sample. In generating these histograms, these 500 average glimpse durations were placed into bins of width 10 ms.
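Taking the glimpse criterion suggested by Figure 22 (envelope more than 3 dB below the overall RMS level), the statistics reported in Figures 23-25 can be sketched for a single envelope as follows; the detection details are assumptions, not the analysis code used in the thesis:

```python
import numpy as np

def glimpse_stats(env_db, fs, threshold_offset_db=3.0):
    """Glimpse statistics for a noise envelope given in dB.

    A glimpse is assumed (per Figure 22) to be a run of samples where the
    envelope falls more than `threshold_offset_db` below the RMS level.
    Returns (fraction of duration, glimpses per second, mean duration in s).
    """
    # RMS level of the envelope, computed in the power domain.
    rms_db = 10.0 * np.log10(np.mean(10.0 ** (env_db / 10.0)))
    below = env_db < (rms_db - threshold_offset_db)
    # Locate the start and end of each run of below-threshold samples.
    d = np.diff(below.astype(int))
    starts = np.flatnonzero(d == 1) + 1
    ends = np.flatnonzero(d == -1) + 1
    if below[0]:
        starts = np.r_[0, starts]
    if below[-1]:
        ends = np.r_[ends, below.size]
    durations = (ends - starts) / fs
    frac = below.mean()
    rate = len(durations) / (below.size / fs)
    mean_dur = durations.mean() if len(durations) else 0.0
    return frac, rate, mean_dur
```

Applied to a constant envelope (as for CON noise), this yields a glimpse fraction of 0, consistent with the 0.000 reported for CON in Figure 26.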

Figure 26: The scores averaged across the NH (red lines and markers) and HI listeners (blue

lines and markers) for UNP (circular markers connected by solid lines) and EEQ1 (asterisk

markers connected by dotted lines) plotted as a function of the fraction glimpses in CON (0.000),

SQW (0.436), SAM (0.418), VC-1 (0.423), VC-2 (0.364), and VC-4 (0.236).

APPENDIX

Appendix I: Fast-Attack, Slow-Release Scheme for EEQ Processing

When its scale factor does not have a lower limit of 0 dB, an implementation of EEQ

processing may choose to use a fast-attack, slow-release scheme when computing Eslow(t). In

such a scheme, SC(t) is computed as follows:

SC(t) = √(Eslow(t) / Efast(t)).

Here, Eslow(t) = MAX(Eshort(t), Elong(t)) and Efast(t) = Eshort(t). Computing Eslow(t) in such a manner

ensures that it responds quickly to increases in energy but reacts more slowly to decreases in

energy. As a result, the audibility of the quiet portions of the signal is maximized by quickly

amplifying them and slowly attenuating them. Note that because the current implementation of

EEQ processing sets a lower limit of 0 dB on SC(t), the fast-attack, slow-release scheme does not

have an additional effect beyond what is already provided by the scale factor limit. However,

should an implementation not have this lower limit on the scale factor in place, the fast-attack,

slow-release scheme would provide an additional benefit in terms of maximizing the audibility

of the weaker signal components.
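A minimal sketch of this scale-factor computation, vectorized over a common time grid for the two energy envelopes:

```python
import numpy as np

def scale_factor_fasr(e_short, e_long):
    """Fast-attack, slow-release scale factor SC(t) from Appendix I.

    e_short, e_long: the short- and long-term energy envelopes E_short(t)
    and E_long(t) sampled on a common time grid. E_slow(t) takes the larger
    of the two, so it follows energy increases quickly (via E_short) but
    decays at the slower E_long rate; E_fast(t) is simply E_short(t).
    """
    e_slow = np.maximum(e_short, e_long)
    e_fast = e_short
    return np.sqrt(e_slow / e_fast)

# Where the short-term energy dips below the long-term energy, SC(t) > 1,
# amplifying the momentarily weak signal; elsewhere SC(t) = 1.
print(scale_factor_fasr(np.array([4.0, 1.0, 4.0]),
                        np.array([4.0, 4.0, 4.0])))  # -> [1. 2. 1.]
```

Clipping the result at a lower limit of 0 dB (a factor of 1), as the current EEQ implementation does, makes the max() step redundant, which is the point made in the paragraph above.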

Appendix II: Speaker Vocoded Modulated Noise Generation

The 1-, 2-, and 4-speaker vocoded modulated noises were generated according to a

method described in Phatak and Grant (2014). The method takes randomly generated continuous

flat noise and a speech signal as input and produces the speaker vocoded modulated noise as

follows:

1) Apply a Butterworth filter (slope 72 dB/oct) to the noise and the speech signals

separately in 6 logarithmically-equal bands in the range 80-8020 Hz.

2) Apply a low-pass filter (cutoff 64 Hz) to the envelope of the filtered speech signal

(from step 1) in each band.

3) Modulate the filtered noise (from step 1) by the low-pass filtered speech envelope

(from step 2) in each band and scale to the original level of the filtered speech

signal (from step 1).

4) Sum the modulated bands and time-reverse the output to obtain VC-1.

5) Repeat steps 1-4 as many times as needed to obtain additional VC-1 noise samples. Add two VC-1 samples together to create a VC-2 sample, or add four VC-1 samples together to create a VC-4 sample.

6) Scale the VC-1, VC-2, or VC-4 sample to the desired level.
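Steps 1-4 above can be sketched as follows; the filter orders, the Hilbert-envelope extraction, and all function names are implementation choices for illustration, not details specified by Phatak and Grant (2014):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocoded_noise(speech, noise, fs, n_bands=6, f_lo=80.0, f_hi=8020.0):
    """Sketch of steps 1-4 of the speaker-vocoded noise generation.

    Band-filters the noise and speech signals, modulates each noise band by
    the low-pass-filtered speech-band envelope, rescales to the speech band
    level, sums the bands, and time-reverses the result. Requires
    fs > 2 * f_hi.
    """
    # Logarithmically equal band edges spanning f_lo to f_hi.
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    env_lp = butter(4, 64.0, btype='low', fs=fs, output='sos')
    out = np.zeros_like(noise)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Step 1: bandpass both signals; order chosen to give steep
        # (~72 dB/oct) skirts, as in the described Butterworth filter.
        bp = butter(12, [lo, hi], btype='bandpass', fs=fs, output='sos')
        sp_band = sosfiltfilt(bp, speech)
        nz_band = sosfiltfilt(bp, noise)
        # Step 2: low-pass filter the speech-band envelope at 64 Hz
        # (Hilbert envelope is an assumed choice of envelope extractor).
        env = sosfiltfilt(env_lp, np.abs(hilbert(sp_band)))
        # Step 3: modulate the noise band, then restore the speech band's
        # RMS level.
        mod = nz_band * env
        mod *= np.sqrt(np.mean(sp_band ** 2) / np.mean(mod ** 2))
        out += mod
    # Step 4: time-reverse the summed bands to obtain one VC-1 sample.
    return out[::-1]
```

Per steps 5 and 6, summing two or four such outputs (from different speakers) and rescaling yields VC-2 or VC-4 samples.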

The speech signal used as input comes from one of eight speakers, four of whom are

female and four of whom are male. The speech samples for a given speaker consist of sentences

that are scaled to the same level and concatenated together with a smoothing window applied to

the onset and offset of each sentence. Every sample of a speaker vocoded modulated noise comes

from a randomly chosen combination of speakers of a randomly chosen gender. For example, if a

sample of VC-2 is selected to consist of female speakers, the concatenated speech samples from

two of the four female speakers will be selected. A random start index is chosen for each selected

concatenated speech sample, and the end index is computed based on the desired duration of the

noise. The concatenated speech samples trimmed to these start and end indices are used as the

input to the above algorithm for generating the speaker vocoded modulated noise.

Appendix III: Results, Experiment 1

Appendix III-A: Experiment 1 consonant identification scores in percent-correct for the four

normal hearing (NH) listeners. Scores are listed for UNP, EEQ1, and EEQ4 speech for seven

noise conditions: BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4. The NH listeners were tested

at a speech level of 65 dB SPL and an SNR of -10 dB. Means across listeners are provided in the

final row.

UNP EEQ1 EEQ4

BAS CON SQW SAM BAS CON SQW SAM BAS CON SQW SAM

NH-1 98.8 52.7 91.0 86.3 98.4 45.3 92.2 78.5 97.3 52.0 91.4 82.8

NH-2 98.4 57.4 93.4 92.6 99.2 56.6 95.3 89.1 99.2 48.0 90.6 80.1

NH-3 98.4 55.1 94.5 90.6 98.4 47.7 93.8 88.7 97.3 45.7 93.4 86.7

NH-4 98.8 55.9 91.4 88.7 97.7 53.1 94.5 86.3 97.7 55.9 87.5 84.0

Mean 98.6 55.3 92.6 89.6 98.4 50.7 93.9 85.6 97.9 50.4 90.7 83.4

UNP EEQ1 EEQ4

VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4

NH-1 84.4 69.5 57.8 82.8 71.1 56.6 80.1 71.9 53.9

NH-2 81.6 76.2 59.4 81.6 69.5 53.1 74.2 62.9 41.4

NH-3 89.1 75.0 58.2 77.7 64.8 47.7 81.6 65.6 48.4

NH-4 85.9 67.6 54.7 77.0 67.6 50.4 76.6 61.7 47.7

Mean 85.3 72.1 57.5 79.8 68.3 52.0 78.1 65.5 47.9

Appendix III-B: Experiment 1 consonant identification scores in percent-correct for the nine

hearing impaired (HI) listeners. Scores are listed for UNP, EEQ1, and EEQ4 speech for seven

noise conditions: BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4. The speech levels and SNRs

used in testing each of the HI listeners are provided in Table 1. Means across listeners are provided

in the final row.

UNP EEQ1 EEQ4

BAS CON SQW SAM BAS CON SQW SAM BAS CON SQW SAM

HI-1 100.0 59.0 91.4 84.0 99.2 53.9 94.9 84.4 97.7 55.1 93.8 86.7

HI-2 94.5 57.4 74.2 66.4 95.3 49.2 77.3 71.1 91.0 47.3 75.4 63.3

HI-3 96.9 58.2 77.3 74.6 94.5 50.8 71.9 64.5 87.1 41.0 62.1 52.7

HI-4 91.4 58.6 68.8 63.7 85.5 54.7 80.9 77.7 82.8 48.0 75.8 60.2

HI-5 95.3 50.4 66.8 62.5 91.4 42.2 85.5 64.1 87.5 43.4 68.4 62.5

HI-6 89.5 56.3 57.8 55.9 91.0 52.3 68.4 62.5 80.1 48.0 62.1 50.4

HI-7 89.8 56.3 60.5 62.1 82.8 46.9 67.2 59.0 78.1 45.3 66.0 62.5

HI-8 87.9 53.9 63.3 55.1 86.7 42.2 75.4 62.5 82.4 43.8 65.2 53.1

HI-9 92.6 59.8 60.2 64.5 89.5 60.5 77.3 57.4 84.4 56.6 71.9 57.4

Mean 93.1 56.6 68.9 65.4 90.7 50.3 77.6 67.0 85.7 47.6 71.2 61.0

UNP EEQ1 EEQ4

VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4

HI-1 81.3 75.4 59.8 77.3 59.0 50.0 75.4 64.8 50.4

HI-2 69.5 58.6 54.3 70.3 55.1 52.0 59.4 56.6 52.3

HI-3 66.0 46.1 41.0 54.3 42.2 30.9 46.9 30.9 27.3

HI-4 69.5 62.9 57.4 66.8 55.5 49.6 61.3 55.9 45.7

HI-5 68.4 51.6 47.7 54.7 53.1 36.7 60.9 48.8 39.1

HI-6 57.8 47.7 48.0 63.3 48.8 46.9 48.8 44.9 43.4

HI-7 64.8 54.7 50.0 57.4 47.3 49.2 52.7 52.3 44.1

HI-8 64.1 51.6 46.9 59.4 43.4 39.8 46.9 44.9 35.5

HI-9 63.7 52.3 48.8 53.5 52.3 44.9 52.0 48.8 41.0

Mean 67.2 55.6 50.4 61.9 50.7 44.4 56.0 49.8 42.1

Appendix III-C: Experiment 1 NMR Results for normal hearing listeners. The NMR for SQW

and SAM noises was calculated relative to the CON noise condition, whereas the NMR for the

VC-1 and VC-2 noises was calculated relative to the VC-4 noise condition. Means across

listeners are provided in the final row.

UNP EEQ1 EEQ4

SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2

NH-1 0.83 0.73 0.65 0.29 0.88 0.62 0.63 0.35 0.87 0.68 0.60 0.41

NH-2 0.88 0.86 0.57 0.43 0.91 0.76 0.62 0.36 0.83 0.63 0.57 0.37

NH-3 0.91 0.82 0.77 0.42 0.91 0.81 0.59 0.34 0.92 0.80 0.68 0.35

NH-4 0.83 0.76 0.71 0.29 0.93 0.75 0.56 0.36 0.76 0.67 0.58 0.28

Mean 0.86 0.79 0.67 0.36 0.91 0.73 0.60 0.35 0.85 0.69 0.61 0.35
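These NMR values are consistent with defining NMR as the score gain from the steady to the fluctuating masker, normalized by the headroom between the steady-masker score and the quiet (BAS) score; this formula is inferred from the tabulated values rather than quoted from the text:

```python
def nmr(score_fluct, score_steady, score_quiet):
    """Normalized masking release.

    The improvement from the steady-masker score to the fluctuating-masker
    score, as a fraction of the available headroom up to the quiet (BAS)
    score. Formula inferred from the tabulated values, not quoted from the
    text.
    """
    return (score_fluct - score_steady) / (score_quiet - score_steady)

# NH-1, UNP: SQW score vs. the CON reference, with BAS as the ceiling.
print(round(nmr(91.0, 52.7, 98.8), 2))  # -> 0.83
```

The same computation with the VC-4 score in place of the CON score reproduces the VC-1 and VC-2 columns.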

Appendix III-D: Experiment 1 NMR Results for hearing impaired listeners. The NMR for SQW

and SAM noises was calculated relative to the CON noise condition, whereas the NMR for the

VC-1 and VC-2 noises was calculated relative to the VC-4 noise condition. Means across

listeners are provided in the final row.

UNP EEQ1 EEQ4

SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2

HI-1 0.79 0.61 0.53 0.39 0.91 0.67 0.56 0.18 0.91 0.74 0.53 0.31

HI-2 0.45 0.24 0.38 0.11 0.61 0.47 0.42 0.07 0.64 0.37 0.18 0.11

HI-3 0.49 0.42 0.45 0.09 0.48 0.31 0.37 0.18 0.46 0.25 0.33 0.06

HI-4 0.31 0.15 0.36 0.16 0.85 0.75 0.48 0.16 0.80 0.35 0.42 0.27

HI-5 0.37 0.27 0.43 0.08 0.88 0.44 0.33 0.30 0.57 0.43 0.45 0.20

HI-6 0.05 -0.01 0.24 -0.01 0.41 0.26 0.37 0.04 0.44 0.07 0.15 0.04

HI-7 0.13 0.17 0.37 0.12 0.57 0.34 0.24 -0.06 0.63 0.52 0.25 0.24

HI-8 0.28 0.03 0.42 0.11 0.75 0.46 0.42 0.08 0.56 0.24 0.24 0.20

HI-9 0.01 0.14 0.34 0.08 0.58 -0.11 0.19 0.17 0.55 0.03 0.25 0.18

Mean 0.32 0.23 0.39 0.13 0.67 0.40 0.38 0.12 0.62 0.33 0.31 0.18

Appendix IV: Results, Experiment 2

Appendix IV-A: Experiment 2 consonant identification scores in percent-correct for UNP and

EEQ1 speech for six noise conditions: CON, SQW, SAM, VC-1, VC-2, and VC-4. The SNRs

used in testing each of the four HI listeners are provided in Table 2, and all other experimental

parameters are the same as in Table 1.

Low SNR Mid SNR High SNR

CON SQW SAM CON SQW SAM CON SQW SAM

HI-2 UNP 40.6 70.7 57.8 57.4 74.2 66.4 78.1 80.1 80.9

EEQ1 34.4 76.6 60.2 49.2 77.3 71.1 71.9 82.0 82.0

HI-4 UNP 47.7 65.2 59.8 58.6 68.8 63.7 77.7 77.7 81.6

EEQ1 36.3 82.4 70.3 54.7 80.9 77.7 68.0 85.2 75.8

HI-5 UNP 28.9 62.9 52.7 50.4 66.8 62.5 79.7 73.4 78.9

EEQ1 27.0 80.5 65.6 42.2 85.5 64.1 61.3 77.0 72.3

HI-7 UNP 40.2 52.7 49.2 56.3 60.5 62.1 70.3 68.0 73.0

EEQ1 41.8 66.4 54.3 46.9 67.2 59.0 67.2 72.3 77.0

Low SNR Mid SNR High SNR

VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4

HI-2 UNP 57.0 52.7 39.1 69.5 58.6 54.3 78.9 73.4 77.3

EEQ1 59.0 44.1 35.5 70.3 55.1 52.0 78.1 73.0 66.0

HI-4 UNP 64.5 55.9 49.6 69.5 62.9 57.4 80.1 78.5 73.0

EEQ1 64.5 50.8 35.5 66.8 55.5 49.6 78.5 73.0 67.2

HI-5 UNP 61.3 44.5 28.9 68.4 51.6 47.7 81.3 73.4 70.7

EEQ1 60.9 37.5 27.0 54.7 53.1 36.7 71.1 66.0 61.7

HI-7 UNP 50.8 48.8 34.4 64.8 54.7 50.0 71.9 69.5 71.5

EEQ1 57.8 41.4 37.1 57.4 47.3 49.2 78.1 71.1 66.4

Appendix IV-B: Experiment 2 NMR results for HI listeners. The NMR for SQW and SAM

noises was calculated relative to the CON noise condition, whereas the NMR for the VC-1 and

VC-2 noises was calculated relative to the VC-4 noise condition. The SNRs used in testing each

of the four HI listeners are provided in Table 2, and all other experimental parameters are the

same as in Table 1.

Low SNR Mid SNR High SNR

SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2

HI-2 UNP 0.56 0.32 0.32 0.25 0.45 0.24 0.38 0.11 0.12 0.17 0.09 -0.23

EEQ1 0.69 0.42 0.39 0.14 0.61 0.47 0.42 0.07 0.43 0.43 0.41 0.24

HI-4 UNP 0.40 0.28 0.36 0.15 0.31 0.15 0.36 0.16 0.00 0.29 0.38 0.30

EEQ1 0.94 0.69 0.58 0.30 0.85 0.75 0.48 0.16 0.98 0.44 0.62 0.32

HI-5 UNP 0.51 0.36 0.49 0.24 0.37 0.27 0.43 0.08 -0.40 -0.05 0.43 0.11

EEQ1 0.83 0.60 0.53 0.16 0.88 0.44 0.33 0.30 0.52 0.36 0.32 0.14

HI-7 UNP 0.25 0.18 0.30 0.26 0.13 0.17 0.37 0.12 -0.12 0.14 0.02 -0.11

EEQ1 0.60 0.30 0.45 0.09 0.57 0.34 0.24 -0.06 0.33 0.63 0.71 0.29