Auditory Prosthesis

8/10/2019 Auditory Prosthesis

http://slidepdf.com/reader/full/auditory-prosthesis 1/3

An auditory prosthesis is a device that substitutes for or enhances the ability to hear. It is more commonly called a hearing aid.

The goal is to significantly improve speech-in-noise intelligibility.

Figure 1: Hearing Prosthetic System

Current speech enhancement algorithms improve speech quality, but not necessarily intelligibility. While hearing-impaired listeners do benefit from improved speech quality, communication problems still exist if intelligibility is not improved. The ideal binary mask is one algorithm specifically shown to improve speech intelligibility. In [9], the speech intelligibility scores of normal-hearing listeners increased from 12% to 100% after speech embedded in four-talker babble was processed by the ideal binary mask. Similarly, the ideal binary mask improved speech intelligibility from nearly 0% to 100% in the study described in [10].

BINARY MASK ALGORITHM

Speech is sparse in the time-frequency domain. If we assume that noise is also sparse in this domain, then it very likely does not overlap with the speech. So, we can remove the noisy regions of the time-frequency plane (by applying the appropriate “binary mask”), which will leave us with intact, noise-free speech [5]. The algorithm is effective even if the noise is not sparse in the time-frequency domain; the overall signal-to-noise ratio (SNR) of the speech can be greatly improved by discarding those regions of the time-frequency plane whose SNR fails to exceed a specified threshold.
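The SNR argument above can be checked with a small numerical sketch. The per-unit energies below are made-up values, the 0 dB threshold is an arbitrary choice, and the function names are hypothetical; the sketch also assumes that speech and noise energies simply add within each time-frequency unit.

```python
import math

def overall_snr_db(speech_energy, noise_energy):
    """Overall SNR in dB from per-unit speech and noise energies."""
    return 10 * math.log10(sum(speech_energy) / sum(noise_energy))

def binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Return 1 for units whose local SNR exceeds the threshold, else 0."""
    mask = []
    for s, n in zip(speech_energy, noise_energy):
        local_snr_db = 10 * math.log10(s / n)  # assumes s > 0 and n > 0
        mask.append(1 if local_snr_db > threshold_db else 0)
    return mask

# Illustrative (made-up) energies for six time-frequency units.
speech = [9.0, 0.1, 4.0, 0.2, 0.1, 6.0]
noise = [1.0, 3.0, 1.0, 5.0, 4.0, 2.0]

mask = binary_mask(speech, noise)

# Discard the units that fail the threshold, then recompute the overall SNR.
kept_speech = [s * m for s, m in zip(speech, mask)]
kept_noise = [n * m for n, m in zip(noise, mask)]

snr_before = overall_snr_db(speech, noise)
snr_after = overall_snr_db(kept_speech, kept_noise)
print(mask, round(snr_before, 2), round(snr_after, 2))
```

With these numbers the overall SNR rises from about 0.8 dB to about 6.8 dB, because the discarded units carry most of the noise energy and very little of the speech energy.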

Figure 2: Binary Mask Algorithm

A practical implementation of the algorithm generally has three stages: spectral analysis, classification, and synthesis, as shown in Fig. 2. The spectral analysis stage uses the fast Fourier transform (FFT) or a filter bank to map the original, noisy signal from the time domain to the time-frequency (TF) domain. In the classification stage, each TF unit is identified as belonging either to class '1' (clean speech, a.k.a. “target”) or to class '0' (noise). This classification creates a binary mask. In the synthesis stage, the TF-domain version of the original, noisy signal is multiplied by the binary mask, effectively removing all of the noise-containing portions of the signal. After the binary mask is applied, the TF units are recombined to form a speech signal that is clean (or at least of higher SNR than before).
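The three stages can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the implementation behind Fig. 2: it uses a naive DFT on non-overlapping rectangular frames (practical systems typically use an FFT with overlapping windows or a filter bank), and the classification stage is an oracle that sees the speech and noise separately, which is only possible in an experiment. All function names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a demo)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; the real part is the signal for a conjugate-symmetric spectrum."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * i / n)
                 for k, c in enumerate(spectrum)) / n).real
            for i in range(n)]

def analyze(signal, frame_len):
    """Stage 1, spectral analysis: non-overlapping frames, one DFT per frame."""
    return [dft(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]

def classify(speech_tf, noise_tf, threshold_db=0.0):
    """Stage 2, classification (oracle): 1 where local SNR exceeds the threshold."""
    mask = []
    for s_frame, n_frame in zip(speech_tf, noise_tf):
        row = []
        for s, n in zip(s_frame, n_frame):
            # A small floor avoids division by zero in empty TF units.
            snr_db = 10 * math.log10((abs(s) ** 2 + 1e-12) / (abs(n) ** 2 + 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def synthesize(noisy_tf, mask):
    """Stage 3, synthesis: multiply by the mask and return to the time domain."""
    out = []
    for frame, row in zip(noisy_tf, mask):
        out.extend(idft([c * m for c, m in zip(frame, row)]))
    return out

# Toy signals: "speech" sits in DFT bin 1, "noise" in bin 3, so they do not overlap.
FRAME = 8
speech = [math.cos(2 * math.pi * 1 * i / FRAME) for i in range(16)]
noise = [2.0 * math.cos(2 * math.pi * 3 * i / FRAME) for i in range(16)]
noisy = [s + n for s, n in zip(speech, noise)]

mask = classify(analyze(speech, FRAME), analyze(noise, FRAME))
cleaned = synthesize(analyze(noisy, FRAME), mask)
max_error = max(abs(c - s) for c, s in zip(cleaned, speech))
print(max_error)
```

Because the toy speech and noise occupy different frequency bins, the mask removes the noise almost exactly and max_error is at the level of floating-point rounding. With real signals the units overlap, and the result is only a higher-SNR approximation of the speech.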

Generalization of supervised learning for binary mask estimation

May, T. ; Centre for Appl. Hearing Res., Tech. Univ. of Denmark, Lyngby, Denmark ; Gerkmann, T.

This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second relies solely on frame-based speech presence probability (SPP) estimation and therefore does not depend on the acoustic conditions seen during training. We investigate the influence of mismatches between the acoustic conditions used for training and testing on the IBM estimation performance and discuss the advantages of both approaches.

A new mask-based objective measure for predicting the intelligibility of binary masked speech

Chengzhu Yu ; Wojcicki, K.K. ; Loizou, P.C. ; Hansen, J.H.L.

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE

Mask-based objective speech-intelligibility measures have been successfully proposed for evaluating the performance of binary masking algorithms. These objective measures were computed directly by comparing the estimated binary mask against the ground truth ideal binary mask (IdBM). Most of these objective measures, however, assign equal weight to all time-frequency (T-F) units. In this study, we propose to improve the existing mask-based objective measures by weighting each T-F unit according to its target or masker loudness. The proposed objective measure shows significantly better performance than two other existing mask-based objective measures.
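The effect of weighting T-F units can be illustrated with a toy comparison between an estimated mask and the ideal mask. This is not the measure proposed in the paper: the scoring rule (weighted fraction of agreeing units), the function name, and the loudness values are all assumptions chosen only to show how unequal weights change the score.

```python
def weighted_mask_agreement(estimated, ideal, weights):
    """Weighted fraction of T-F units where the estimated mask matches the ideal mask."""
    total = sum(weights)
    agreeing = sum(w for e, i, w in zip(estimated, ideal, weights) if e == i)
    return agreeing / total

# Flattened masks for six T-F units (made-up values).
ideal = [1, 1, 0, 0, 1, 0]
estimated = [1, 0, 0, 1, 1, 0]

# Equal weights reproduce a plain agreement score.
uniform_weights = [1.0] * 6
# Hypothetical loudness-based weights: errors fall on the two quietest units.
loudness_weights = [5.0, 0.5, 1.0, 0.5, 3.0, 1.0]

uniform_score = weighted_mask_agreement(estimated, ideal, uniform_weights)
weighted_score = weighted_mask_agreement(estimated, ideal, loudness_weights)
print(round(uniform_score, 3), round(weighted_score, 3))
```

Here the two misclassified units carry little weight, so the weighted score (10/11) is higher than the unweighted score (4/6). The same estimated mask would score lower if its errors fell on the loud units instead.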

An algorithm combined with spectral subtraction and binary masking for monaural speech segregation

Monaural speech segregation from complex concurrent noise is an extremely challenging problem. Binary masking is one method for solving it; however, its performance is limited by the noise that remains in the result. In this paper, an algorithm that integrates spectral subtraction and binary masking for speech separation and enhancement is proposed. It follows the framework of computational auditory scene analysis (CASA). The energy of each time-frequency (T-F) unit is used as the cue to generate the binary mask. The spectral subtraction algorithm is then used to eliminate noise energy in the original speech, yielding an interim speech signal; applying the binary mask to this interim speech gives the target speech. Systematic evaluation shows that the combined algorithm can stably improve the SNR and voice quality of noisy speech. It performs better than existing binary masking systems in most situations, especially when the noise and the speech have similar power spectra.
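The order of operations described in this abstract (energy-based mask, spectral subtraction, then masking of the interim result) can be sketched on magnitude spectra. The abstract does not give the mask criterion or the noise estimator, so the energy threshold factor, the fixed per-bin noise estimate, and all function names below are assumptions made for illustration only.

```python
def energy_mask(noisy_mag, noise_mag, factor=2.0):
    """1 where a unit's energy exceeds factor times the estimated noise energy."""
    return [[1 if x * x > factor * n * n else 0 for x, n in zip(frame, noise_mag)]
            for frame in noisy_mag]

def spectral_subtraction(noisy_mag, noise_mag):
    """Subtract the noise magnitude estimate per bin, clamping at zero."""
    return [[max(x - n, 0.0) for x, n in zip(frame, noise_mag)]
            for frame in noisy_mag]

def subtract_then_mask(noisy_mag, noise_mag, factor=2.0):
    """Build the mask from unit energy, subtract the noise, then mask the interim result."""
    mask = energy_mask(noisy_mag, noise_mag, factor)
    interim = spectral_subtraction(noisy_mag, noise_mag)
    return [[value * m for value, m in zip(frame, row)]
            for frame, row in zip(interim, mask)]

# Two frames of three frequency bins each (made-up magnitudes).
noise_estimate = [1.0, 1.0, 1.0]
noisy = [[4.0, 1.2, 0.8],
         [1.1, 3.0, 1.0]]

target = subtract_then_mask(noisy, noise_estimate)
print(target)
```

In this example spectral subtraction alone leaves small residuals in the noise-dominated bins (for instance 1.2 becomes roughly 0.2), and the mask then zeroes them, which is the kind of remaining noise the abstract says limits either method when used alone.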