
MEE-2010-2012

Speech Enhancement in Hands-Free Device

(Hearing Aid) with emphasis on Elko’s

Beamformer

Master’s Thesis

TELAGAREDDI S N U V RAMESH

This thesis is presented as a part of Degree of Master of Science in Electrical

Engineering with Emphasis on Signal Processing

Blekinge Institute of Technology

April, 2012

Blekinge Institute of Technology

School of Engineering

Department of Electrical Engineering

Supervisor: Dr. Benny Sällberg

Examiner: Dr. Nedelko Grbic

Blekinge Tekniska Högskola

SE 371 Karlskrona


This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in

partial fulfillment of the requirements for the degree of Master of Science in Electrical

Engineering with Emphasis on Signal Processing.

Contact Information:

Author:

Telagareddi S N U V Ramesh

E-mail: [email protected]

Supervisor:

Dr. Nedelko Grbic

Department of Electrical Engineering

School of Engineering

Blekinge Institute of Technology, Sweden

E-mail: [email protected]

Phone: +46 455 38 57 27

Examiner:

Dr. Benny Sällberg

Department of Electrical Engineering

School of Engineering

Blekinge Institute of Technology, Sweden

E-mail: [email protected]

Phone: +46 455 38 55 87


ABSTRACT

In general, an uncontrolled environment may contain degradation components such as background noise and speech from other speakers along with the desired speech components. It is very difficult for normal listeners as well as for hearing impaired persons to concentrate only on the speech signal in the presence of background noise. The hearing organ is substantially sensitive to interfering noise. This interfering noise decreases speech quality and speech intelligibility, which in turn makes speech communication troublesome. In many applications, improved speech enhancement is achieved with a beamformer using multiple microphones (a microphone array). The main function of any beamformer is to create a beam in the direction of the target and to place a spatial null in the direction of the jammer. The aim of this thesis work is to find a beamforming technique that suits a hearing aid and also makes the hearing aid free from the howling effect. The work investigates the operation of different beamforming techniques, namely Elko’s, Wiener, Maximum SNIR and Delay and Sum beamformers. The performance of all these beamforming techniques for the hearing aid is evaluated under various noises such as interference, babble, wind, restaurant and white noise. The total thesis work is a collaboration of four members; my area of interest is Elko’s beamformer, and also the reduction of the howling effect with the NLMS algorithm.

All the beamformers are implemented in MATLAB and validated with different measures: Signal to Noise Ratio Improvement (SNRI), Speech Distortion (SD), Noise Distortion (ND) and Perceptual Evaluation of Speech Quality (PESQ). The feedback canceller, which is also implemented in MATLAB, is validated with Perceptual Evaluation of Speech Quality (PESQ) and Echo Return Loss Enhancement (ERLE).

Keywords: Speech enhancement, Speech intelligibility, Speech communication, Beamforming.


ACKNOWLEDGEMENT

I would like to express my sincere gratitude to my thesis supervisor Dr. Nedelko Grbic for giving me a wonderful opportunity to do thesis research work in the signal processing field under his supervision. His guidance and useful comments at every stage of the thesis work greatly contributed to my work; without them it would have been very difficult to complete this research work successfully.

I am also thankful to my thesis partners Santhurenu Vuppala, Harish Midathala and Aditya Sriteja Palanki for their continuous discussions and valuable suggestions throughout my work.

I would like to thank my parents for their support and encouragement towards the completion of the thesis. I am also thankful to my friends who supported me during this thesis work.


TABLE OF CONTENTS

Abstract iii

Acknowledgements iv

List of figures vii

List of tables x

List of acronyms and abbreviations xi

1 Introduction 1

1.1 Hands-free communication 1

1.1.1 Hands-free communication applications 1

1.1.2 Problems for Hands-free communication 3

1.2 Objective of the work and research question 4

1.3 Organization of report 4

2 Insights of Human Sound System 5

2.1 The Anatomy of Human Hearing 5

2.2 Hearing Impairments 6

2.3 Hearing Aids 7

2.4 Different types of Hearing Aids 8

3 Background Theories 11

3.1 Time delay filtering 11

3.1.1 Ideal fraction delay 12

3.1.2 Thiran Allpass Filter 14

3.2 Acoustic Room Modelling 15

3.2.1 Image model 17

3.2.2 Image Source Method 17

4 Beamforming techniques 20

4.1 Optimal beamformer 21

4.1.1 Maximum Signal to Noise-plus-Interference Beamformer 22

4.1.2 Wiener Beamformer 23

4.2 Delay and Sum (DSB) Beamformer 23

4.3 Elko’s Beamformer 24

4.3.1 Derivation of adaptive first-order array 25


4.3.2 Optimum β 27

4.3.3 Least Mean Square version for β 28

5 Acoustic Feedback Cancellation 30

5.1 System Overview 31

5.1.1 Doubletalk detector 32

5.1.2 Adaptive Filter 32

5.1.3 Nonlinear Processor (NLP) 32

5.2 Adaptive filter algorithms 32

5.2.1 Normalized Least Mean Square (NLMS) Algorithm 33

6 Implementation and Results 34

6.1 Implementation 34

6.1.1 Beamformer 34

6.1.2 Feedback Canceller 35

6.1.3 Test Data 35

6.1.4 Objective Measures 37

6.1.4.1 Signal to Noise Ratio Improvement (SNRI) 37

6.1.4.2 Perceptual Evaluation of Speech Quality (PESQ) 37

6.1.4.3 Speech and Noise Distortions 38

6.1.4.4 Echo Return Loss Enhancement (ERLE) 38

6.2 Results 38

6.2.1 Elko’s Beamformer 39

6.2.2 Wiener Beamformer 53

6.2.3 Max-SNIR Beamformer 53

6.2.4 Delay and Sum Beamformer 54

6.2.5 Comparison 54

6.2.6 Echo cancellation with NLMS algorithm 55

7 Conclusion and Future work 58

7.1 Conclusion 58

7.2 Future work 59

Bibliography 60


LIST OF FIGURES

1.1 Typical hands-free communication 3

2.1 Anatomy of the human ear 5

2.2 A simplified model of an analogue hearing aid 7

2.3 Block diagram of digital hearing aid 8

2.4 Overview of different types of hearing aids 8

2.5 Ear worn hearing aids. From left to right, the types are: BTE, ITE, ITC and CIC 10

3.1 Microphone array with small spacing between the microphones 11

3.2 (a) continuous time signal , (b) delayed signal , (c) sampled signal

and (d) delayed and sampled signal 13

3.3 Continuous-time and impulse response of the ideal fractional delay filter, when

the delay is samples and samples 14

3.4 The group delay response of Thiran allpass filter with N=40 15

3.5 Different room acoustic models 16

3.6 Path involving one reflection obtained using one image source 17

3.7 (a) Rectangular room having source and receiver in it, (b) the first six images of

the source. The dark circle is the receiver location 18

3.8 Image source model of a rectangular room. The dark cell is the original room 18

4.1 An I channel finite impulse response beamformer 21

4.2 Basic model of Delay and Sum beamformer with m microphones 24

4.3 Diagram of a microphone array composed of two omnidirectional microphones

and delay circuit 25

4.4 Various directivity patterns for a first-order differential array at (a) ,

(b) ⁄ , and (c)

⁄ 26

4.5 Schematic implementation of an adaptive first-order differential microphone using

the combination of a forward and backward facing cardioids 26

4.6 Directional responses of the array in Fig. 4.3 at (a) , (b)

and (c) 27


4.7 Directional response of the forward facing cardioid, backward facing cardioid 28

4.8 Measured directional responses for the differential array for and

chosen to give the nulls in approximately increments 29

5.1 Public Address (PA) system with acoustic feedback path 30

5.2 Acoustic feedback path in hearing aid inside the human ear 31

5.3 Block diagram of Acoustic Echo Cancellation 31

5.4 Model of Adaptive filter in AEC 32

6.1 Structure of any general beamformer 34

6.2 Power Spectral Density (PSD) plots of female, male and interference signals 36

6.3 Power Spectral Density (PSD) plots of Babble, wind, restaurant and white noise 37

6.4 SNRI for female speaker at angle of 30° and Noise/Interference at angle of 270° 45

6.5 SNRI for male speaker at angle of 30° and Noise/Interference at angle of 270° 45

6.6 SNRI for female speaker at angle of 60° and Noise/Interference at angle of 320° 46

6.7 SNRI for male speaker at angle of 60° and Noise/Interference at angle of 320° 46

6.8 Output PESQ for female speaker at angle of 30° and Noise/Interference at angle of 270° 47

6.9 Output PESQ for male speaker at angle of 30° and Noise/Interference at angle of 270° 47

6.10 Output PESQ for female speaker at angle of 60° and Noise/Interference at angle of 320° 48

6.11 Output PESQ for male speaker at angle of 60° and Noise/Interference at angle of 320° 48

6.12 Speech Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270° 49

6.13 Speech Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270° 49

6.14 Speech Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320° 50

6.15 Speech Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320° 50

6.16 Noise Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270° 51


6.17 Noise Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270° 51

6.18 Noise Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320° 52

6.19 Noise Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320° 52

6.20 Average SNRI of different beamformers at various situations 55

6.21 ERLE plot for NLMS algorithm 56

6.22 Plot the needed signals (NLMS algorithm) in turn are: desired signal, output

signal and error signal 57


LIST OF TABLES

6.1 Represents the SNRI, PESQI, SD and ND for speech (female) as source

and interference (male) as noise 40

6.2 Represents the SNRI, PESQI, SD and ND for speech (male) as source

and interference (male) as noise 40

6.3 Represents the SNRI, PESQI, SD and ND for speech (female) as source

and babble noise as noise 41

6.4 Represents the SNRI, PESQI, SD and ND for speech (male) as source

and babble noise as noise 41

6.5 Represents the SNRI, PESQI, SD and ND for speech (female) as source

and wind noise as noise 42

6.6 Represents the SNRI, PESQI, SD and ND for speech (male) as source

and wind noise as noise 42

6.7 Represents the SNRI, PESQI, SD and ND for speech (female) as source

and restaurant noise as noise 43

6.8 Represents the SNRI, PESQI, SD and ND for speech (male) as source

and restaurant noise as noise 43

6.9 Represents the SNRI, PESQI, SD and ND for speech (female) as source

and white noise as noise 44

6.10 Represents the SNRI, PESQI, SD and ND for speech (male) as source

and white noise as noise 44

6.11 Represents the SNRI, PESQI, SD and ND for wiener beamformer

(2-microphone case) 53

6.12 Represents the SNRI, PESQI, SD and ND for Max-SNIR beamformer

(2-microphone case) 54

6.13 Represents the SNRI, PESQI, SD and ND for Delay and Sum beamformer

(2-microphone case) 54

6.14 ERLE values for different filter orders 56


LIST OF ACRONYMS AND ABBREVIATIONS

SNR Signal-to-Noise Ratio

SNRI Signal-to-Noise Ratio Improvement

PESQ Perceptual Evaluation of Speech Quality

NLMS Normalized Least Mean Square

SD Speech Distortion

ND Noise Distortion

dB decibels

ERLE Echo Return Loss Enhancement

DTD Double Talk Detector

FD Fractional Delay

PA Public Address

AEC Acoustic Echo Cancellation

NLP Non-Linear Processor

LMS Least Mean Square

RLS Recursive Least Square

APA Affine Projection Algorithm

GSC Generalized Sidelobe Canceller

Max-SNIR Maximum Signal-to-Noise plus Interference Ratio

DSB Delay and Sum

BTE Behind the Ear

ITE In the Ear

ITC In the Channel

CIC Completely in the Channel


Chapter 1

Introduction

1.1 Hands-free communication

In today’s technology, conference calling stands out as one of the most effective ways of conducting high level communication in all types of companies, since audio conferencing is low-cost and convenient. Also, most personal computers and mobile phones are now voice enabled. This in turn increases the demand for hands-free communication. In most applications, flexibility, safety and comfort are provided through hands-free communication. On the other hand, hand-held telephony in cars is prohibited in most countries in order to avoid accidents, and such use in cars may also interfere with other electronic devices like navigation equipment, etc. [1, 2, 5].

However, the receiver/microphone in hands-free communication is at a distance from the speaker, whereas in hand-held telephony the microphone is close to the speaker. Hence the effects of surrounding noise, poor sound quality and acoustic feedback from the far end side are drawbacks of hands-free devices compared with hand-held devices. Instead of using a single microphone in hands-free telephony, improved speech enhancement performance is achieved with an array of microphones [3]. Such a microphone array is able to perform tasks like speech enhancement, reverberation suppression and echo cancellation in an effective manner.

1.1.1 Hands-free communication applications

Because of its flexibility, safety and convenience, hands-free communication has many applications. Some of the most important among them are the following [4].

Audio conferencing

Hands-free communication in cars

Hearing aids and Hearing protection head sets

In the following section, advantages, requirements and challenges for each application are discussed.

Audio-Conferencing

In Audio-conferencing, the calling party wishes to have more than one called party listen in to the

audio portion of the call. The evolution in wireless broadband high-speed internet connections has

been exploited to develop audio and video communication systems for desktop computers, laptops and

mobile phones. Due to its convenient and cost effective nature, audio conferencing has become popular in all types of industries.


Consider a conference room with low background noise levels, in which a speech acquisition device is positioned at the center of the room. Before the source localization algorithm is used, the distance between speaker and microphone, the movement of the speaker and the room dimensions are taken into account. Based on those values the algorithm continuously determines the direction of the speaker.

Combined with video technology, these techniques can allow the system to concentrate on the speaker,

thus providing a combined video and audio capability.

Hands-free communication in cars

Hand-held telephony in cars while driving is prohibited in many countries. Car manufacturers also prohibit such communication since it may interfere with electronic equipment inside the car. Nowadays, different solutions are available for hands-free telephony in cars, such as Bluetooth devices and the speaker mode of the mobile phone, and some car manufacturers provide an audio system to which the mobile phone can be connected. An array of microphones is mounted at an optimal position, such as the dashboard or

the ceiling of the car for the driver’s speech acquisition. The signal captured by the array of

microphones has background noise like engine noise, wind noise, tire friction and traffic noise along

with desired driver’s speech. This captured signal is processed and then transmitted back to the far end

speaker.

Hearing aids and Hearing protection head sets

Hearing loss can be partially compensated through the use of a hearing aid. It is an electroacoustic

device designed to amplify and modulate sound. Earlier hearing aids were based on analogue technology and had largely fixed frequency responses, allowing only emphasis of high or low frequencies, whose spectrum cannot always match the hearing loss. To overcome the deficiencies of analogue technology, digital signal processing (DSP) devices have come to offer the best solution for hearing aids.

In many countries, workers with a noise exposure above the 85 dB(A) limit are required to wear hearing protectors [4]. In some environments such as aircraft, helicopters and other industrial work places, workers need hearing protectors with speech enhancement capabilities in order to communicate with other persons while protecting their hearing. A detailed description of hearing aids and hearing protectors is given in chapter 2. In this report the hearing aid is considered as the example of a hands-free communication device.


1.1.2 Problems for Hands-free communication

Placing the microphone at a large distance from the source/speaker causes a number of problems such as background noise, room reverberation, other interferences and also acoustic coupling. Fig. 1.1 shows a typical hands-free communication situation.

Background noise is random noise; in a car environment it is mostly generated by the engine, tire friction and air flow. In public places like restaurants and parks, background noise consists of babble noise, audio equipment, music, etc. It is mostly uncorrelated with the speech signal.

Sound produced in closed environments causes a large number of echoes to build up and then slowly decay as the sound is absorbed by the walls. All these reflected signals are added at the microphone with different gains and phase shifts, so the final signal at the microphone is reverberated. This reverberation mainly depends on the room dimensions and the reflection coefficients of the walls.

Interference is noise from neighboring speakers competing with the desired speech. Unlike background noise, these signals are produced by spatially constrained sound sources. This interference is also referred to as “cocktail party noise”.

In some situations, the far-end signals from the loudspeaker are captured by the microphone in the same way as interfering signals. In that case the far-end speaker hears his/her own voice echoed. This is due to acoustic feedback.

Fig. 1.1 Typical hands-free communication [4]


1.2 Objective of the work and research question

The main objective of the thesis is to attenuate noise/interference and also enhance the source speech

signal in any hands-free communication device (in this report-hearing aid) under various noisy

environments. In this report, the speech enhancement is obtained with Elko’s beamformer. The thesis compares the different beamforming techniques on the basis of the parameters Signal-to-Noise Ratio Improvement (SNRI), PESQ, Speech Distortion and Noise Distortion. The thesis also addresses acoustic feedback cancellation in the hands-free device.

The research questions are:

How does Elko’s beamforming technique provide speech enhancement in a hands-free device (hearing aid)?

How can the hearing aid be made free from the howling effect?

1.3 Organization of report

The thesis report is divided into seven chapters and is organized as follows. A brief description of human sound perception is given in chapter 2. In chapter 3, background theories like the Fractional Delay (FD) filter and the Room Impulse Response (RIR) are discussed. Chapter 4 describes the beamforming techniques. Acoustic feedback cancellation is discussed in chapter 5. Chapter 6 provides both the implementation and the results of all beamforming techniques. Finally, chapter 7 provides the conclusion and future work of the thesis.


Chapter 2

Insights of Human Sound System

2.1 The Anatomy of Human Hearing

Human hearing is one of the most complex of our bodily functions. The ear is the vertebrate sense organ that detects and receives sound, while it is the brain that hears it. The main aim of the ear is to

change the sound pressure waves from outside world into a signal of nerve impulses and send them to

the brain. The three main parts of the ear are outer ear, middle ear and inner ear. Fig 2.1 describes an

illustration of anatomy of the human ear [6].

Fig. 2.1 Anatomy of the human ear

The outer ear is external portion of the ear, it includes the pinna, the ear canal and external auditory

meatus. The pinna is the visible part, composed of a thin elastic cartilage covered with integument,

and connected to the surrounding parts by ligaments and muscles. The pinna helps direct sound

through the ear canal to the tympanic membrane. The external auditory meatus is slightly a curved

tube, extending from the pinna and ending at eardrum or tympanic membrane. The main aim of the

outer ear is to collect sound pressure waves and guide those waves to eardrum.

The middle ear is placed in between eardrum and oval window. It contains the three ossicles or

ossicular chain, which connects the eardrum to the inner ear. The three ossicles are malleus, incus and

stapes. The malleus is attached to the mobile portion of tympanic membrane. The incus is the

connecting part between malleus and stapes. The stapes is the smallest bone in the body. The


movement in the eardrum causes movement of the total ossicular chain. When the stapes footplate

pushes on the oval window, it causes the movement of fluid within the cochlea. The hollow space of the middle ear is called the tympanic cavity and the tube that connects the tympanic cavity with the nasal cavity is called the eustachian tube. The main function of the middle ear is to transfer acoustic energy from compression waves in air to fluid membrane waves within the cochlea in an efficient manner.

The inner ear is the innermost portion of the vertebrate ear; it includes both the hearing organ (the cochlea) and the organs of balance. The gate to the inner ear is the oval window, and the inner ear consists of the three semicircular canals, the vestibule and the coiled cochlea. The main function of the cochlea is to convert sound pressure impulses from the outer ear into electrical impulses which are passed on to the brain via the auditory nerve. The other two organs are involved in balance. The inner ear is encased in the hardest bone of the body and is innervated by the eighth cranial nerve in all vertebrates.

2.2 Hearing Impairments

Among human disabilities, deafness could be considered a serious handicap, which affects an important part of the population. When deafness occurs accidentally during life, people report that they suffer a lot from this handicap since they were accustomed to the faculty of hearing [7, 8].

In humans, the term hearing impairment is used for people who have relative insensitivity to sound in

the speech frequencies. The severity of a hearing loss can be categorized according to the increase in

volume that must be made above the usual level before the listener can detect it. The term hearing

impairment is rejected and the terms like deaf and hard of hearing are preferred by the majority of the

deaf people around the world.

There are two different types of hearing impairments, conductive hearing impairment and

sensorineural hearing impairment. A combination of both hearing impairments is the third type.

Hearing impairments are categorized by their severity and by the age of onset.

A conductive hearing loss is present when the sound pressure waves are not reaching the inner ear.

This can be caused by a damaged tympanic membrane or eardrum, by the destruction of the external

auditory meatus or by malfunction of the bones of the middle ear. Sensorineural hearing loss is related

to propagation of neural impulses [6]. The majority of human sensorineural hearing loss is due to

abnormalities in the hair cells of the organ of Corti in the cochlea. This loss can be mild, moderate or

severe.

In the developed countries, around 10 % of people suffer from hearing impairment [9, 10]. The major

groups are 45 % elderly people over the age of 65, 42 % in the age between 25 to 45 years and a small


amount of children in the age between 3 to 10 years. Another study in [11] shows that out of 1000

newborn children, 2 to 3 are suffering from hearing impairment.

2.3 Hearing aids

Hearing aid is an electronic device, which amplifies sound to help hearing impaired persons to hear.

Until the past two decades, commercial hearing aid technology had developed little beyond simple

linear amplification with peak clipping. Today, the most sophisticated aids are available to cover

nonlinear processing architectures for hearing loss compensation and some include simple noise

reduction processing. Demand on hearing aid is varying widely based on the degree and type of

hearing loss. A complete correction of hearing loss is not possible, only a partial restoration is possible

with today’s technology [12].

The first analogue technology based hearing aid was simple and placed behind the pinna. This type

may provide massive amount of amplification (8-12 dB) in certain frequency bands. With these

devices, the gain could be increased up to approximately 25-30 dB in the frequency band between 500

Hz and 1500 Hz. In this method high frequencies will be attenuated and also it has very limited control

over the resulting insertion gain. A simplified model of an analogue hearing aid is shown in Fig. 2.2.

Fig 2.2 A simplified model of an analogue hearing aid

At the microphone, we have the input speech signal to be amplified and also the feedback signal. Together these make up the total input signal, which is amplified with a gain. The resulting signal is transmitted

to the loudspeaker. The acoustic signal from the hearing aid travels to the tympanic membrane via the

external auditory meatus.

In order to improve quality of an analogue technology based hearing aids, some processing of the

signal information will be necessary to overcome the deficiencies. This can be achieved with Digital


signal processing (DSP) devices. DSP devices offer the best platform to design programmable and

adaptive digital hearing aids, which can process information in real time. The programmable digital

hearing aid allows a more precise auditory fitting that matches the needs of the client. A digital

hearing aid processes sound waves by encoding them as a series of numbers that measure pitch and

volume at any instant in time. This method of processing the sound wave, bit by bit, is more precise

and allows for filtering of background noise without affecting the overall sound quality. Fig. 2.3 shows

the complete process of digital hearing aid. The working principle of a digital hearing aid is to convert

a band limited analogue signal from the microphone into discrete time samples. A digital signal

processor can either process the samples directly in the time domain or manipulate them in the

frequency domain through spectral transformation. The final output is transmitted to eardrum or

tympanic membrane.

Fig. 2.3 Block diagram of digital hearing aid

2.4 Different types of Hearing Aids

There are many types of hearing aids, which vary in power, circuitry and size. Hearing aids are mainly

divided into two groups: Implanted hearing aids and external hearing aids [6]. An overview of

different types of hearing aids is shown in Fig. 2.4

Fig 2.4 Overview of different types of hearing aids


External hearing aids are subdivided into two subgroups: Body worn Instruments and Ear worn

Instruments.

Body worn hearing aids This was the first type of hearing aid; it consists of a case, an earmold and an attachment wire. The case contains the amplifier section, controls and battery. It is about the size of a pack of playing cards and is carried on the body or in a pocket. The earmold contains a miniature loudspeaker. Owing to its larger size, the body worn hearing aid provides large amplification and long battery life, and it is available at lower prices in the market compared to other aids.

Ear worn hearing aid is the most common hearing aid, used by the majority of hearing patients. Four

types of ear worn hearing aids can be identified. All ear worn hearing aids are shown in Fig. 2.5

Behind the Ear (BTE) This type of aid consists of a case behind the pinna, an earmold and a connection

between them. The case contains the controls, battery, electronic equipment, microphones and the

loudspeaker. Sound is directed from the hearing aid, through the tubing, and through the earmold to

the eardrum. The sound from the aid can be routed either acoustically or electrically to the ear. If the

sound is routed acoustically, a plastic tube is used to deliver the sound from the loudspeaker to

earmold, while if the sound is routed electrically then the loudspeaker is placed in the earmold.

In the Ear (ITE) This type of aid is smaller than BTE and perfectly fits in the outer ear bowl. The

hearing aid case is made out of hard plastic. Due to its size, the ITE hearing aid allows for optional manual

features such as a volume control, program button, or telephone switch. Feedback is possible in ITE

due to closeness of microphone and the receiver. Earwax and moisture are the problems for this type

of hearing aids.

In the Channel (ITC) This type of aid fills only the bottom half of the external ear. It is smaller than

the ITE hearing aid but slightly larger than completely in the channel (CIC) hearing aid. It is more

discrete than the ITE hearing aid and more suitable for mild to moderately severe hearing loss due to

its size. Like the ITE, earwax and moisture are problems for this type as well.

Completely in the Channel (CIC) This type of hearing aid is the smallest of custom hearing aid and

it is practically invisible to an observer. CIC hearing aid fits deep inside the ear canal. CIC hearing aid

is available in analogue and digital technology. Due to the small size, there is no option for directional

microphones and volume controllers. These hearing aids are for the people with mild to moderate

hearing loss.


Fig 2.5 Ear worn hearing aids. From left to right, the types are: BTE, ITE, ITC and CIC[6]

Implanted hearing aids are in turn sub divided into two sub groups: Destructive and non-destructive. In

destructive hearing aids, electrodes are placed inside the cochlea of the patient surgically. Sounds are

transmitted to these electrodes across the skin, bone and cartilage by an FM radio signal. This type of

treatment is suitable for patients with severe sensorineural hearing loss [12]. The surgical procedures

are irreversible in destructive hearing aid.

In non-destructive implanted hearing aids, the instruments rely on conventional bone conduction or direct bone conduction. A conventional bone conduction hearing aid works by conducting, or carrying, sound through the temporal bone. The person hears sound when the vibrations of the sound are transmitted directly from the vibrating part of the bone conduction hearing aid through the temporal bone to the cochlea, bypassing the outer and middle ears. Such an arrangement may

cause pain, headache, skin irritation and eczema. Bone Anchored Hearing Aid (BAHA) is the

developed version of bone conducted hearing aids. In this, skin-penetrating titanium screws are

implanted behind the ear and the bone conductor is attached to this titanium screw. User comfort and

the fidelity are increased with this BAHA. The surgical procedures are reversible in this type. The

other type of non-destructive implanted hearing aid is the middle ear implant. This type of hearing aid

converts sound waves into mechanical vibrations. The middle ear implant excites the ossicular chain

directly via a small exciter.

Implanted hearing aids have some advantages compared to the external hearing aids. Those are as

follows.

No need of ear molds

No occlusion effects

Suitable for patients suffering from chronic otitis

Negligible feedback effect


Chapter 3

Background Theories

In the digital signal processing area, microphone array techniques are rapidly gaining ground over single microphone techniques. Beamforming techniques use these microphone array concepts to enhance the speech signals. Fig. 3.1 shows the geometrical microphone array setup with two microphones.

Fig. 3.1 Microphone array with small spacing between the microphones

In Fig. 3.1, θ is the angle of the incoming speech/noise signal, d is the spacing between the microphones, s(t) is the speech/noise signal and τ is the time delay between the two microphone observations,

τ = (d/c)·sin(θ),   (3.1)

where c is the speed of sound.

In this report, the signals used are considered to be far-field signals, since the spacing between the microphones is very small for a hearing aid (hands-free device). In order to implement different beamforming techniques on hearing aids, time delay filtering is needed, which is described in section 3.1. The acoustic room model used for reverberant environments is described in section 3.2.

3.1 Time delay filtering

Digital signal processing techniques have several advantages over traditional analog techniques. One fundamental advantage is the easy implementation of a constant delay. However, this works perfectly only as long as the desired delay is a multiple of the sampling interval. In some applications


sampling locations must be changed or accurate time delays are needed instead of constant delays.

Fractional delay (FD) filters are useful in such situations.

Fractional delay filters are designed for bandlimited interpolation. Bandlimited interpolation is a basic

tool having massive application in digital signal processing. The problem is to compute the signal

values at arbitrary continuous times from a set of discrete time samples of the signal amplitude. In

other words, one must be able to interpolate the signal between samples. Since the original signal is always assumed to be bandlimited to half the sampling rate (f_s/2), Shannon's sampling theorem tells us that the signal can be exactly regenerated from its samples by bandlimited interpolation.

Fractional-delay filters are widely useful in areas like music synthesis, synchronization of digital

modems, speech coding and synthesis [13, 14]. In a digital communication system, decisions on the received bits or symbol values are made by taking samples from the incoming continuous time pulse sequence. To minimize the probability of an erroneous decision, the sampling instants should lie exactly at the middle of each pulse, which requires a synchronized sampling frequency and accurately placed sampling instants. In the modeling of musical instruments, it is important to calculate propagation delays

accurately so that the instruments do not sound out of tune. Delays from tubes, strings and other resonators are generally not multiples of the sampling interval used. The theory and design of fractional delay filters are described in the next section.

3.1.1 Ideal fractional delay

The ideal fractional delay is the digital version of a continuous-time delay line. The delay system should be rendered bandlimited using an ideal low pass filter, where the delay only shifts the impulse response in the time domain [15]. Consider the continuous time signal x(t) shown in Fig 3.2(a), which is delayed by a continuous time delay operator; the delayed signal x(t − τ) is shown in Fig 3.2(b). On the other hand, consider the sampled signal x(n) shown in Fig 3.2(c). The delayed discrete time signal y(n) = x(n − D) is obtained by sampling the delayed continuous time signal, as shown in Fig 3.2(d). The symbol D denotes the amount, in samples, by which the signal is delayed. In traditional DSP theory, D can only be an integer, but in many applications the delay should be a fractional value rather than a rounded integer value,

y(n) = x(n − D).   (3.2)

The transfer function of the ideal delay element can be obtained by taking the Z-transform of Eq. 3.2.


Fig 3.2 (a) continuous time signal x(t), (b) delayed signal x(t − τ), (c) sampled signal x(n) and (d) delayed and sampled signal y(n) = x(n − D)

Y(z) = z^{−D} X(z),   H_id(z) = Y(z)/X(z) = z^{−D}.   (3.3)

The main assumption in Eq. (3.3) is that D is an integer; if it is not, the transform will have to be expressed as a series expansion. Consider D as a positive real number defined as the sum of its integer part ⌊D⌋ and its fractional part d,

D = ⌊D⌋ + d.   (3.4)

The ideal fractional delay filter can be described in the frequency domain as

H_id(e^{jω}) = e^{−jωD}.   (3.5)

The phase response of the ideal delay element is linear with slope −D, while the magnitude response is unity for all frequencies. Such a system can be called an allpass system with linear phase response,

|H_id(e^{jω})| = 1,   (3.6)

arg{H_id(e^{jω})} = −ωD.   (3.7)

From Shannon's sampling theorem, a sinc interpolator can be used to calculate the exact signal value at any point in time, based on the upper frequency limit of f_s/2. This can be done by convolving the discrete time signal x(n) with a sinc function to give the signal sample at any arbitrary continuous time t,

x(t) = Σ_{n=−∞}^{∞} x(n) sinc(t − n).   (3.8)

The delayed sinc function can be regarded as an ideal fractional delay interpolator,

h_id(n) = sinc(n − D) = sin(π(n − D)) / (π(n − D)).   (3.9)

Given a desired fractional delay value, the fractional delay filter coefficients can be obtained from this infinite length delayed sinc function. Due to this infinite length, it is evident that an FIR fractional delay


filter will always be an approximation to the ideal case. For example, the ideal FD filter unit impulse responses for two example delay values are shown in Fig 3.3.

Fig 3.3. Continuous-time (solid line) and sampled (dots) impulse responses of the ideal fractional delay filter for two different delay values (above and below) [15].
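As an illustration of Eq. 3.9, the following MATLAB sketch builds a finite, causal approximation of the ideal fractional delay interpolator by truncating and windowing the shifted sinc function; the filter order, delay value and Hamming window are illustrative assumptions rather than choices taken from this thesis.

```matlab
% Windowed-sinc FIR fractional delay filter (approximation of Eq. 3.9).
N = 30;                       % FIR filter order (example value)
D = N/2 + 0.4;                % total delay in samples, here 15.4 (example)
n = 0:N;                      % tap indices
h = sinc(n - D);              % shifted, truncated ideal interpolator
h = h .* hamming(N+1).';      % window to reduce truncation ripple

% Apply the fractional delay to a test signal
fs = 8000;
t  = (0:fs-1)/fs;
x  = sin(2*pi*200*t);         % 200 Hz test tone
y  = filter(h, 1, x);         % y is approximately x delayed by D samples
```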

Many design methods have been proposed for fractional delay filters of both FIR and IIR type. Within the class of FIR filters, Lagrange interpolation has been a popular choice since it satisfies all the desired properties of an FD filter. Within IIR filters, digital allpass filters are considered the most popular choice since their magnitude response is exactly flat and the design can concentrate entirely on the phase response. The design of an allpass fractional delay filter is based on solving a set of linear equations or on an iterative optimization algorithm [13, 14, 16]. Although these methods provide nearly optimal designs, their usefulness is limited to some extent when high order filters are required or when coefficient values must be calculated in real time.

3.1.2 Thiran Allpass Filter

Thiran, in 1971, proposed an analytic closed-form design method for all-pole filters having a maximally flat group delay [17]. The transfer function of the discrete time allpass filter is

H(z) = z^{−N} A(z^{−1}) / A(z) = (a_N + a_{N−1} z^{−1} + ... + a_1 z^{−(N−1)} + z^{−N}) / (1 + a_1 z^{−1} + ... + a_N z^{−N}),   (3.10)

where N is the filter order and a_k (k = 1, 2, ..., N) are the filter coefficients. The Thiran design formula for a fractional delay allpass filter can be written as follows [13, 18].


a_k = (−1)^k C(N, k) ∏_{n=0}^{N} (D − N + n) / (D − N + k + n),   k = 0, 1, ..., N,   (3.11)

where D is the real valued delay parameter and C(N, k) denotes the binomial coefficient. The first coefficient a_0 is always 1, so there is no need to normalize the coefficient vector [19]. Since the group delay of an allpass filter is twice that of the corresponding all-pole filter, the delay parameter of Thiran's original all-pole design is adjusted accordingly to obtain the desired delay of the allpass filter in Eq. 3.11.

Nth order rational polynomials of the delay D can be computed from Eq. 3.11. For instance, when N = 2, the filter coefficients are a_1 = −2(D − 2)/(D + 1) and a_2 = (D − 1)(D − 2) / ((D + 1)(D + 2)). Here D stands for the group delay in samples.

Thiran also showed that if D > N − 1, the roots of the denominator polynomial (the poles) are within the unit circle in the complex plane, which means that the filter is stable; the filter is thus stable even for delay values somewhat smaller than the filter order. The numerator is just the mirrored version of the denominator, and since the poles are inside the unit circle the zeros are outside the unit circle. The radii of the poles and zeros are inverses of each other, whereas the angles are the same. The group delay response of the Thiran allpass filter with the order number N = 40 is shown in Fig. 3.4.

Fig 3.4 The group delay response of the Thiran allpass filter with N = 40
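A minimal MATLAB sketch of the Thiran design formula in Eq. 3.11 is given below; the order N and delay D used are only example values.

```matlab
% Thiran allpass fractional delay filter (Eq. 3.10 and Eq. 3.11).
N = 3;                                % filter order (example)
D = N + 0.3;                          % desired group delay in samples (example)

a = zeros(1, N+1);                    % denominator A(z); a(1) holds a_0 = 1
a(1) = 1;
for k = 1:N
    ak = (-1)^k * nchoosek(N, k);
    for n = 0:N
        ak = ak * (D - N + n) / (D - N + k + n);
    end
    a(k+1) = ak;
end
b = fliplr(a);                        % numerator is the mirrored denominator

% H(z) = B(z)/A(z) has flat magnitude and approximately D samples group delay
[gd, w] = grpdelay(b, a, 512);        % inspect the group delay response
```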

3.2 Acoustic Room Modelling

Room acoustics is one of the major concepts in the field of acoustic signal processing. Over the last few years, the main attention of many researchers in the acoustic field has been on the reduction of room reverberation. In this thesis work, acoustic room modelling is used to simulate the propagation of speech signals in a typical room. This is achieved by convolving the speech signals with simulated room impulse responses for particular positions of the speaker and microphone. The impulse response from a source to a microphone can be obtained by solving the wave equation given below.


∇²p(r, t) = (1/c²) ∂²p(r, t)/∂t²,   (3.12)

where c is the speed of propagation (340 m/s) and p(r, t) is a function representing the sound pressure at a time instant t for a point r = [x, y, z] in space with Cartesian coordinates. There are three main methods of modelling: wave-based, ray-based and statistical [20]. All these different room acoustic models are shown in Fig 3.5.

The wave-based methods are Finite Element Method (FEM), Boundary Element Method (BEM) [21,

22]. The most accurate results can be obtained by these methods. The only difference between these

two methods is in the element structure. In BEM, only the boundaries of the space are divided into surface elements, whereas in FEM the whole space is divided into volume elements. These methods are well suited

for low frequencies and also for small enclosures. At high frequencies, the number of elements

required becomes very high, resulting in a large computational complexity. The most complex part in

these methods is to define boundary conditions and geometrical description of the objects.

Fig 3.5 Different room acoustic models

The ray-based methods are ray-tracing method and image source method [23, 24]. These methods are

based on geometrical room acoustics. The main difference between these methods is the way

reflection paths are calculated [20]. The ray-tracing method can be applied to geometries formed by

arbitrary surfaces, whereas the image method is limited to geometries formed by planner surfaces. In

ray-tracing method, the power emitted by a sound source is obtained from finite number of rays. The

rays are reflected after every collision with the room boundaries, their energy decreases as a

consequence of the sound absorption of the air and of the walls involved in the propagation path. After

all the rays reached the receiver, energy calculation is performed. When all rays are processed the

impulse response of the room is derived.


The statistical modelling method named Statistical Energy Analysis has been widely used in the shipbuilding, automotive and aerospace industries. Since this method does not model the temporal behavior of the sound field, it is not suitable for auralization purposes.

3.2.1 Image model

The model can be used to simulate the reverberation in a specified room based on the locations of the microphone and the source. Consider a sound source placed near a reflecting wall and a receiver placed somewhere in the room. Fig 3.6 shows the path involving one reflection obtained with one image source. In the figure the image source is located behind the wall, at a distance from the wall equal to the distance of the source from the wall. At the receiver two signals arrive, one from the direct path and the other from the reflection. The triangle formed by the source, its image and the reflection point on the wall is isosceles, so the length of the reflected path from the source via the wall to the receiver is the same as the distance from the image source to the receiver. In order to compute the path length of the reflected signal, one can therefore construct an image of the source and calculate the distance between the receiver and the image source. The number of reflections involved in the path is equal to the level of images that was used to calculate the path.

Fig 3.6 Path involving one reflection obtained using one image source.

3.2.2 Image Source Method

Consider a rectangular room with dimensions L_x, L_y and L_z as length, width and height respectively. The location of the sound source is represented by the vector r_s = [x_s, y_s, z_s] and the location of the receiver/microphone by the vector r_m = [x_m, y_m, z_m]. Fig. 3.7(a) shows the rectangular room with the source and receiver positions. These two vectors are given with respect to the origin, which is placed at one of the corners of the room.


Fig 3.7 (a) Rectangular room having source and receiver in it, (b) the first six images of the

source. The dark circle is the receiver location.

The positions of the images obtained from the walls at x = 0, y = 0 and z = 0, measured with respect to the receiver position, can be written as

R_p = [(1 − 2p_x)x_s − x_m, (1 − 2p_y)y_s − y_m, (1 − 2p_z)z_s − z_m].   (3.13)

Every element in p = (p_x, p_y, p_z) can take the value 0 or 1, resulting in eight different combinations. When the value of p is 1 in any dimension, an image of the source in that direction is considered. The rectangular pattern of image rooms is repeated as shown in Fig. 3.8. In order to consider all the image sources, the vector R_r is added to R_p, where

R_r = [2m_x L_x, 2m_y L_y, 2m_z L_z],   (3.14)

where m_x, m_y and m_z are integer values. Every element in m = (m_x, m_y, m_z) can take values from −∞ to +∞.

Fig. 3.8 Image source model of a rectangular room. The dark cell is the original room.


The order of reflection related to an image at the position R_p + R_r is given by

|2m_x − p_x| + |2m_y − p_y| + |2m_z − p_z|.   (3.15)

The distance between the microphone and any image source is given by

d = ‖R_p + R_r‖.   (3.16)

The impulse response for any sound source and microphone can be written as

h(t) = Σ_{p∈P} Σ_{m∈M} β_{x,1}^{|m_x−p_x|} β_{x,2}^{|m_x|} β_{y,1}^{|m_y−p_y|} β_{y,2}^{|m_y|} β_{z,1}^{|m_z−p_z|} β_{z,2}^{|m_z|} · δ(t − τ_{p,m}) / (4π‖R_p + R_r‖),   (3.17)

where τ_{p,m} = ‖R_p + R_r‖/c is the time delay of arrival of the reflected sound ray corresponding to this image source, P denotes the set which contains all desired triples p and similarly M denotes the set that contains all triples m. The other quantities, β_{x,1}, β_{x,2}, β_{y,1}, β_{y,2}, β_{z,1} and β_{z,2}, are the reflection coefficients of the six walls. The ideal discrete version of Eq. 3.17 is given by

h(n) = Σ_{p∈P} Σ_{m∈M} β_{x,1}^{|m_x−p_x|} β_{x,2}^{|m_x|} β_{y,1}^{|m_y−p_y|} β_{y,2}^{|m_y|} β_{z,1}^{|m_z−p_z|} β_{z,2}^{|m_z|} · δ(n − round{τ_{p,m} f_s}) / (4π‖R_p + R_r‖).   (3.18)

The source signal can be convolved with the room impulse response computed from the above Eq.

3.17 in order to simulate the signal picked by the microphone.
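To make the image source method of Eqs. 3.13 to 3.18 concrete, the following MATLAB sketch computes a simple room impulse response; the room size, source and microphone positions, reflection coefficient and image grid truncation are arbitrary example values, and a single reflection coefficient beta is assumed for all six walls for brevity.

```matlab
% Simplified image source method (Eqs. 3.13-3.18), common wall coefficient.
c    = 340;                    % speed of sound [m/s]
fs   = 8000;                   % sampling frequency [Hz]
L    = [5 4 3];                % room dimensions Lx, Ly, Lz [m] (example)
rs   = [2.0 3.0 1.5];          % source position [m] (example)
rm   = [1.0 1.5 1.2];          % microphone position [m] (example)
beta = 0.8;                    % common reflection coefficient for all walls
M    = 4;                      % truncation of the image grid, |m| <= M

h = zeros(1, round(0.2*fs));   % impulse response buffer (0.2 s)
for px = 0:1, for py = 0:1, for pz = 0:1
  for mx = -M:M, for my = -M:M, for mz = -M:M
    Rp = [(1-2*px)*rs(1)-rm(1), (1-2*py)*rs(2)-rm(2), (1-2*pz)*rs(3)-rm(3)];
    Rr = 2*[mx my mz].*L;
    d  = norm(Rp + Rr);                          % distance image -> microphone
    order = abs(2*mx-px) + abs(2*my-py) + abs(2*mz-pz);   % Eq. 3.15
    amp   = beta^order / (4*pi*d);               % attenuation of this image
    idx   = round(d/c*fs) + 1;                   % sample index of arrival
    if idx <= numel(h)
        h(idx) = h(idx) + amp;
    end
  end, end, end
end, end, end

% x_mic = filter(h, 1, speech);  % convolve a speech signal with the RIR
```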


Chapter 4

Beamforming techniques

In many applications, the improved speech enhancement is achieved with multiple microphones

(microphone array) instead of using single microphone. The ability of the microphone array is to

exploit the spatial correlation of the multiple received signals. Also with the microphone array, spatial

and temporal domains are well utilized for the received signals. The signals propagating spatially

encounter the existence of both interfering and noise signals. Temporal filtering cannot be utilized to

separate the desired signal from the interfering signal, when both signals occupy the same temporal

frequency band. However in general both desired and interfering signals originate from various spatial

locations. This spatial separation can be exploited to separate the desired source signals from the

interference using a beamformer [25]. The beamformer is defined for a specified region corresponding

to desired source location. The main function of beamformer is to create a beam in the direction of the

target and place a spatial null in the direction towards jammer. The beamforming system can be

designed to provide a beam pattern with required characteristics.

All the available beamforming techniques are classified as either data independent (fixed) or

statistically optimum (adaptive). This classification depends upon how the weights are chosen. In data

independent beamformer the weights are chosen to present a specified response for all signal and

interference scenarios and do not depend on the array data. Also the weights are taken in such a

way that the beamformer response approximates a desired response. Delay-Sum and the Filter-Sum

beamformers are quite simple solutions of this type, but they are limited by the number of

microphones and incapable of reducing highly directive noise sources. In statistically optimum

beamformer, the weights are based on the statistics of the array data. These array statistics are usually

unknown and also changed with time, so adaptive algorithms are used to determine the weights.

Because of weight adaptability, beamformer response converges to a statistically optimum solution.

Generalized Sidelobe Canceller (GSC) and Frost beamformers are examples of this type. These

beamformers have high capability of interference cancellation but they are much more sensitive to

steering error, suffer from signal leakage and degradation.

Consider a signal model where the source is at a fixed position and the noises come from different positions. Both fixed point sources and interfering or noise sources can be modeled as a mixture of coherent and


incoherent noise fields [1, 26]. The output of each sensor consists of speech signal , mixture of

coherent and incoherent noise sources and also sum of fixed point noise sources

∑ (4.1)

where, and are the :th microphone observations. Fig. 4.1 shows the structure of

linear finite impulse response beamformer. The output of the beamformer is given by

∑ ∑

(4.2)

where, is the order of the filter and , are the filter taps for the channel .

Fig. 4.1 An $I$-channel finite impulse response beamformer [1]
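A minimal MATLAB sketch of the filter-and-sum operation in Eq. 4.2 is given below; the matrices X (microphone data) and W (filter taps) are placeholder names and data assumed only for this example.

% MATLAB sketch of Eq. 4.2 (illustrative): the beamformer output is the sum over the
% channels of each microphone signal filtered by its own FIR filter.
N = 16000;  I = 2;  L = 2;
X = randn(N, I);                          % two placeholder microphone channels
W = repmat([0.5; 0.25], 1, I);            % L taps per channel (placeholder weights)
y = zeros(N, 1);
for i = 1:I
    y = y + filter(W(:, i), 1, X(:, i));  % channel-wise FIR filtering, then summation
end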

By inserting the signal model of Eq. 4.1 into Eq. 4.2, a time domain optimization objective can be formulated as

$$\mathbf{w}_{\mathrm{opt}} = \arg\min_{\mathbf{w}}\; E\left\{ \left\| \sum_{i=1}^{I} \sum_{l=0}^{L-1} w_{i,l} \left[ s_i(n-l) + v_i(n-l) + \sum_{k=1}^{K} u_{k,i}(n-l) \right] - s_r(n) \right\|^{2} \right\} \qquad (4.3)$$

where $s_r(n)$ denotes the desired (reference) speech signal.

In the next section some of the beamforming techniques are discussed.

4.1 Optimal beamformer

Optimum beamformers are based on power criteria of the observed microphone signals and are also known as maximum array gain beamformers [1, 27]. Optimal beamformers are subdivided into the Wiener beamformer and the Maximum Signal-to-Noise-plus-Interference (Max-SNIR) beamformer, based on the optimal weights used. The power of the beamformer output when only the speech signal is active is given by the auto-correlation function,

$$P_s = E\left\{ y_s[n]\, y_s^{*}[n] \right\} = \sum_{i=1}^{I} \sum_{j=1}^{I} \sum_{l=0}^{L-1} \sum_{m=0}^{L-1} w_{i,l}\, w_{j,m}^{*}\, r_{ss}^{(i,j)}[l-m] \qquad (4.4)$$

where $r_{ss}^{(i,j)}[l-m]$ denotes the cross-correlation function between microphone observations $i$ and $j$ when only the speech $s[n]$ is active, and $^{*}$ denotes complex conjugation. Eq. 4.4 can be rewritten in matrix notation as


$$P_s = \mathbf{w}^{H} \mathbf{R}_{ss} \mathbf{w} \qquad (4.5)$$

where $^{H}$ denotes the Hermitian transpose and $\mathbf{R}_{ss}$ is defined as

$$\mathbf{R}_{ss} = \begin{bmatrix} \mathbf{R}_{ss}^{(1,1)} & \cdots & \mathbf{R}_{ss}^{(1,I)} \\ \vdots & \ddots & \vdots \\ \mathbf{R}_{ss}^{(I,1)} & \cdots & \mathbf{R}_{ss}^{(I,I)} \end{bmatrix} \qquad (4.6)$$

where each block is built from the cross-correlation sequences,

$$\mathbf{R}_{ss}^{(i,j)} = \begin{bmatrix} r_{ss}^{(i,j)}[0] & \cdots & r_{ss}^{(i,j)}[L-1] \\ \vdots & \ddots & \vdots \\ r_{ss}^{(i,j)}[-(L-1)] & \cdots & r_{ss}^{(i,j)}[0] \end{bmatrix} \qquad (4.7)$$

and the filters $\mathbf{w}_i$, $i = 1, \ldots, I$, are arranged in the following way

$$\mathbf{w} = \begin{bmatrix} \mathbf{w}_1^{T} & \mathbf{w}_2^{T} & \cdots & \mathbf{w}_I^{T} \end{bmatrix}^{T} \qquad (4.8)$$

where

$$\mathbf{w}_i = \begin{bmatrix} w_i[0] & w_i[1] & \cdots & w_i[L-1] \end{bmatrix}^{T} \qquad (4.9)$$

In a similar manner, one can write an expression for the noise-plus-interference power $P_n$, when the speech is inactive and all other noise sources are active,

$$P_n = \mathbf{w}^{H} \mathbf{R}_{nn} \mathbf{w} \qquad (4.10)$$

where $\mathbf{R}_{nn}$ is defined as

$$\mathbf{R}_{nn} = \begin{bmatrix} \mathbf{R}_{nn}^{(1,1)} & \cdots & \mathbf{R}_{nn}^{(1,I)} \\ \vdots & \ddots & \vdots \\ \mathbf{R}_{nn}^{(I,1)} & \cdots & \mathbf{R}_{nn}^{(I,I)} \end{bmatrix} \qquad (4.11)$$

and

$$\mathbf{R}_{nn}^{(i,j)} = \begin{bmatrix} r_{nn}^{(i,j)}[0] & \cdots & r_{nn}^{(i,j)}[L-1] \\ \vdots & \ddots & \vdots \\ r_{nn}^{(i,j)}[-(L-1)] & \cdots & r_{nn}^{(i,j)}[0] \end{bmatrix} \qquad (4.12)$$

where $r_{nn}^{(i,j)}$ is the cross-correlation between microphones $i$ and $j$ when all interference sources and noises are active.

4.1.1. Maximum Signal to Noise-plus-Interference (Max-SNIR) Beamformer

The output signal-to-noise-plus-interference power ratio (SNIR) is defined as

$$Q = \frac{P_s}{P_n} = \frac{\mathbf{w}^{H} \mathbf{R}_{ss} \mathbf{w}}{\mathbf{w}^{H} \mathbf{R}_{nn} \mathbf{w}} \qquad (4.13)$$

The Max-SNIR beamformer maximizes the value of $Q$. The optimal weights are obtained by maximizing a ratio between two quadratic forms,

$$\mathbf{w}_{\mathrm{opt}} = \arg\max_{\mathbf{w}} \left\{ \frac{\mathbf{w}^{H} \mathbf{R}_{ss} \mathbf{w}}{\mathbf{w}^{H} \mathbf{R}_{nn} \mathbf{w}} \right\} \qquad (4.14)$$

Eq. 4.14 is referred to as a generalized eigenvector problem. By introducing a linear variable transformation, Eq. 4.14 can be rewritten as


$$\mathbf{v} = \mathbf{R}_{nn}^{1/2} \mathbf{w} \qquad (4.15)$$

From Eq. 4.14 and Eq. 4.15,

$$\mathbf{v}_{\mathrm{opt}} = \arg\max_{\mathbf{v}} \left\{ \frac{\mathbf{v}^{H} \mathbf{R}_{nn}^{-H/2} \mathbf{R}_{ss} \mathbf{R}_{nn}^{-1/2} \mathbf{v}}{\mathbf{v}^{H} \mathbf{v}} \right\} \qquad (4.16)$$

where $\mathbf{v}_{\mathrm{opt}}$ is the eigenvector having the maximum eigenvalue $\lambda_{\max}$ of the matrix

$$\mathbf{R}_{nn}^{-H/2} \mathbf{R}_{ss} \mathbf{R}_{nn}^{-1/2} \qquad (4.17)$$

The final optimal weights can then be written using the inverse of the linear variable transformation,

$$\mathbf{w}_{\mathrm{opt}} = \mathbf{R}_{nn}^{-1/2} \mathbf{v}_{\mathrm{opt}} \qquad (4.18)$$
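A minimal MATLAB sketch of this weight computation is given below; the correlation matrices are placeholder data assumed only for this example, and in practice they would be estimated from speech-only and noise-only segments.

% MATLAB sketch of Eqs. 4.14-4.18 (illustrative): the Max-SNIR weights are the generalized
% eigenvector of the pair (Rss, Rnn) associated with the largest eigenvalue.
Rss = cov(randn(1000, 4));                % placeholder speech-only correlation matrix
Rnn = cov(randn(1000, 4)) + 0.1*eye(4);   % placeholder noise-plus-interference matrix
[V, D] = eig(Rss, Rnn);                   % generalized eigenvalue problem Rss*v = lambda*Rnn*v
[~, k] = max(real(diag(D)));              % index of the largest eigenvalue
w_snir = V(:, k);                         % Max-SNIR weight vector (up to a scaling factor)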

4.1.2 Wiener Beamformer

In this beamformer, the weights minimize the mean square difference between the beamformer output, when all sources are present, and a single sensor observation when only the signal of interest is present [1]. The optimal weights can be written as

$$\mathbf{w}_{\mathrm{opt}} = \arg\min_{\mathbf{w}} E\left\{ \left| y[n] - x_r[n] \right|^{2} \right\} \qquad (4.19)$$

where $y[n]$ is the beamformer output and $x_r[n]$ is the reference sensor observation. The optimal weights which minimize the squared difference between the output and the reference signal can be rewritten as [1, 28]

$$\mathbf{w}_{\mathrm{opt}} = \mathbf{R}^{-1} \mathbf{r} \qquad (4.20)$$

The cross-correlation vector can be defined as

$$\mathbf{r} = E\left\{ \mathbf{x}[n]\, x_r^{*}[n] \right\} \qquad (4.21)$$

with

$$\mathbf{x}[n] = \begin{bmatrix} \mathbf{x}_1^{T}[n] & \mathbf{x}_2^{T}[n] & \cdots & \mathbf{x}_I^{T}[n] \end{bmatrix}^{T} \qquad (4.22)$$

with each element as

$$r_i[l] = E\left\{ x_i[n-l]\, x_r^{*}[n] \right\} = r_{x_i x_r}[l] \qquad (4.23)$$

The cross-correlation vector is one column of the correlation matrix if the reference sensor is taken as one of the microphone observations; which column is used depends on the chosen reference microphone.
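A minimal MATLAB sketch of the Wiener weight computation in Eq. 4.20 is given below; the correlation matrix and cross-correlation vector are placeholders assumed only for this example.

% MATLAB sketch of Eq. 4.20 (illustrative): the Wiener weights solve R*w = r, where R is the
% correlation matrix of the stacked microphone data and r is the cross-correlation vector
% towards the reference microphone.
R = cov(randn(1000, 4)) + 0.01*eye(4);    % placeholder data correlation matrix
r = randn(4, 1);                          % placeholder cross-correlation vector
w_wiener = R \ r;                         % solve R*w = r (better conditioned than inv(R)*r)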

4.2 Delay and Sum (DSB) Beamformer

The basic idea behind Delay-and-Sum beamforming is that when a sound signal impinges upon the microphone array, the microphone outputs are added together with an appropriate amount of delay per channel. The delays are based on the physical spacing between the microphones, and the geometrical arrangement also affects the array characteristics. Fig. 4.2 shows the basic model of the Delay-and-Sum beamformer.


Fig. 4.2 Basic model of the Delay and Sum beamformer with multiple microphones

In delay and sum beamforming, delays are introduced after each microphone to compensate for the arrival-time difference of the speech at each microphone. The delayed signals are then summed together. This reinforces the desired speech signal, while the noise or interference signals are combined in an unpredictable manner. The total signal-to-noise ratio (SNR) of the summed signal is therefore greater than or equal to that of any individual microphone signal. This arrangement makes the pattern more sensitive to sources from a particular desired direction.

The main drawback of delay and sum beamforming is the number of microphones required in order to improve the SNR: each doubling of the number of microphones provides at most an additional 3 dB of SNR improvement. A further disadvantage of delay and sum beamforming is that no nulls are placed in the jammer direction.
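A minimal MATLAB sketch of a two-microphone delay-and-sum beamformer is given below. For simplicity the steering delay is an integer number of samples; the thesis uses fractional-delay filters (chapter 3) instead, and all signals here are placeholders.

% MATLAB sketch (illustrative): align two microphone channels and average them.
fs    = 16000;
x1    = randn(fs, 1);                         % placeholder signal at microphone 1
delay = 3;                                    % arrival-time difference in samples
x2    = [zeros(delay, 1); x1(1:end-delay)];   % microphone 2 receives the wave "delay" samples later
y = (x1(1:end-delay) + x2(1+delay:end)) / 2;  % time-align the channels and sum (average) them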

4.3 ELKO’s Beamformer

Directional microphones are better suited for noise reduction than omnidirectional microphones, and Elko has proposed an effective solution based on them. In some acoustic noise fields, Elko's algorithm improves the signal-to-noise ratio (SNR) by attenuating sound sources from one direction. A simple Elko system is shown in Fig. 4.3; it contains two closely spaced omnidirectional microphones. The self-optimization is based on minimizing the microphone output under the constraint that a single null is placed in the rear half plane [29, 30, 31, 32]. The constraint is realized by subtracting time-delayed outputs of the omnidirectional microphones. The proposed solution does not maximize the SNR, but it can considerably improve the SNR in some acoustic fields. The system is very easy to implement and has a low computational cost.


4.3.1 Derivation of adaptive first-order array

A plane sound wave with spectrum $S(\omega)$ reaches one microphone in Fig. 4.3 before the other. The additional time taken to reach the second microphone is denoted $\tau$ and depends on the distance $d$ between the microphones and on the angle of incidence $\theta$ of the incoming sound wave,

Fig 4.3 Diagram of a microphone array composed of two omnidirectional microphones and delay circuit

$$\tau = \frac{d}{c}\cos\theta \qquad (4.24)$$

where $c$ is the speed of sound propagation. The delay element $T$ delays one of the microphone outputs. By combining the undelayed and the delayed signal in Fig. 4.3, it is possible to steer the null with the time delay $T$. Using Eq. 4.24, the output signal can be written as

$$y(t) = s(t) - s\!\left(t - T - \frac{d}{c}\cos\theta\right) \qquad (4.25)$$

Transforming Eq. 4.25 into the frequency domain, the output becomes

$$Y(\omega) = S(\omega)\left[ 1 - e^{-j\omega\left(T + \frac{d}{c}\cos\theta\right)} \right] \qquad (4.26)$$

In Fig. 4.4, the magnitude response of Eq. 4.26 is plotted for three different values of $T$. For values of $T$ between $0$ and $d/c$, it is possible to steer the null between $90^{\circ}$ and $180^{\circ}$. Taking the magnitude of Eq. 4.26 yields

$$|Y(\omega)| = 2\,|S(\omega)|\left| \sin\!\left[ \frac{\omega}{2}\left(T + \frac{d}{c}\cos\theta\right) \right] \right| \qquad (4.27)$$

Assuming small spacing and small delay, $\omega T \ll \pi$ and $kd \ll \pi$ where $k = \omega/c$,

$$|Y(\omega)| \approx |S(\omega)|\,\omega\left[ T + \frac{d}{c}\cos\theta \right] \qquad (4.28)$$

Eq. 4.28 consists of a monopole term $T$ and a dipole term $(d/c)\cos\theta$. The amplitude response of the first-order differentiator rises linearly with frequency. This frequency dependency can easily be compensated in practice by applying a first-order lowpass filter at the final array output. Fig. 4.4 shows the


directional responses of the array in Fig. 4.3 for different values of the delay $T$. However, for an arbitrary time delay between the two microphones this solution is not attractive: the computational requirements needed to realize the adaptive algorithm with a general delay are unattractive for a real-time implementation [29].

Fig 4.4 Various directivity patterns for a first-order differential array for three different values of the delay $T$

One effective approach to implementing a general first-order differential microphone is a simple scalar combination of two back-to-back cardioid microphones. This approach avoids the need to generate the delay directly. Fig. 4.5 shows the back-to-back cardioid arrangement.

Fig. 4.5 Schematic implementation of an adaptive first-order differential microphone using the combination of a forward and a backward facing cardioid [29]

By choosing $T = d/c$, the back-to-back cardioids can be formed directly by subtracting the delayed microphone signals,

$$c_F(t) = x_1(t) - x_2(t - T), \qquad c_B(t) = x_2(t) - x_1(t - T)$$

The lowpass filter in Fig. 4.5 is used to compensate the differentiator response of the differential microphone. Substituting Eq. 4.24 into the above expressions gives


$$c_F(t) = s(t) - s\!\left(t - T - \frac{d}{c}\cos\theta\right) \qquad (4.29)$$

$$c_B(t) = s\!\left(t - \frac{d}{c}\cos\theta\right) - s(t - T) \qquad (4.30)$$

and the final output becomes

$$y(t) = c_F(t) - \beta\, c_B(t) \qquad (4.31)$$

Transforming Eq. 4.31 into the frequency domain, the output becomes

$$Y(\omega) = S(\omega)\left[ \left(1 - e^{-j\omega\left(T + \frac{d}{c}\cos\theta\right)}\right) - \beta\left(e^{-j\omega\frac{d}{c}\cos\theta} - e^{-j\omega T}\right) \right] \qquad (4.32)$$

In this case the time delay is $T = d/c$, fixed to one sample. By changing the amount $\beta$ of the backward facing cardioid in the output $y(t)$, it is possible to steer the null. Fig. 4.6 shows the directional response of the back-to-back cardioid arrangement for three different values of $\beta$, and it clearly shows that changing the value of $\beta$ between 0 and 1 steers the null between $90^{\circ}$ and $180^{\circ}$.

Fig. 4.6 Directional responses of the array in Fig. 4.3 at (a) , (b) and (c)

Another approach is to place the spatial origin at the array center; the expressions for $C_F(\omega)$ and $C_B(\omega)$ then become

$$C_F(\omega) = 2j\,S(\omega)\sin\!\left[\frac{\omega}{2}\left(T + \frac{d}{c}\cos\theta\right)\right] \qquad (4.33)$$

and

$$C_B(\omega) = 2j\,S(\omega)\sin\!\left[\frac{\omega}{2}\left(T - \frac{d}{c}\cos\theta\right)\right] \qquad (4.34)$$

Normalizing the output signal by the input spectrum gives

$$\left|\frac{Y(\omega)}{S(\omega)}\right| = 2\left|\, \sin\!\left[\frac{\omega}{2}\left(T + \frac{d}{c}\cos\theta\right)\right] - \beta\,\sin\!\left[\frac{\omega}{2}\left(T - \frac{d}{c}\cos\theta\right)\right] \right| \qquad (4.35)$$

4.3.2 Optimum $\beta$

The value of $\beta$ which minimizes the mean square value of the output is the optimum. Squaring Eq. 4.31 and taking the expectation on both sides,

$$E\left[y^{2}(n)\right] = R_{c_F c_F} - 2\beta\, R_{c_F c_B} + \beta^{2} R_{c_B c_B} \qquad (4.36)$$

where $R_{c_F c_F}$ and $R_{c_B c_B}$ are the power spectra of the front cardioid and back cardioid signals and $R_{c_F c_B}$ is the cross-power spectrum between the front and back cardioid signals [29, 30]. The value of $\beta$ that minimizes Eq. 4.36 is obtained by taking the derivative with respect to $\beta$ and setting it to zero. Then,

$$\beta = \frac{R_{c_F c_B}}{R_{c_B c_B}} \qquad (4.40)$$

Since the second derivative is positive, the value of $\beta$ in Eq. 4.40 is a minimum. In real-time DSP implementations, estimates of the power and cross-power spectra are used, since the acoustic fields in which the adaptive microphone is intended to operate are generally not stationary [29].

Fig. 4.7 Directional response of the forward facing cardioid (solid line) and backward facing cardioid (dotted line)

4.3.3 Least Mean Square version for $\beta$

In order to make the system adaptive, the LMS algorithm is used to update the value of $\beta$. Squaring Eq. 4.31 yields

$$y^{2}(n) = \left[ c_F(n) - \beta\, c_B(n) \right]^{2} \qquad (4.41)$$

The minimum of the error surface $E\left[y^{2}(n)\right]$ can be obtained by the steepest descent algorithm, which finds $\beta$ by stepping in the direction opposite to the gradient of the error surface with respect to $\beta$. The steepest descent update equation is

$$\beta(n+1) = \beta(n) - \mu\, \frac{\partial\, E\left[y^{2}(n)\right]}{\partial \beta} \qquad (4.42)$$

where $\mu$ is the step-size. Performing the differentiation with respect to $\beta$ on Eq. 4.41 yields

Page 40: Speech Enhancement in Hands-Free Device (Hearing Aid) with ...830715/FULLTEXT01.pdf · Speech Enhancement in Hands-Free Device (Hearing Aid) with emphasis on Elko’s Beamformer Master’s

Beamforming techniques

Blekinge Institute of Technology 29 Beamforming techniques

$$\frac{\partial\, y^{2}(n)}{\partial \beta} = -2\, y(n)\, c_B(n) \qquad (4.43)$$

Since the LMS algorithm uses an instantaneous estimate of the gradient, the expectation in Eq. 4.42 is not applied; the instantaneous estimate is used instead. The update equation for $\beta$ then becomes

$$\beta(n+1) = \beta(n) + 2\mu\, y(n)\, c_B(n) \qquad (4.44)$$

The main drawback of the LMS algorithm is that it is sensitive to the scaling of its input, which makes it very hard to choose a step-size that guarantees stability of the algorithm. The Normalized Least Mean Square (NLMS) algorithm is a variant of the LMS algorithm that solves this problem by normalizing the step-size with the input power. The NLMS update equation for $\beta$ is therefore

$$\beta(n+1) = \beta(n) + 2\mu\, \frac{y(n)\, c_B(n)}{\left\langle c_B^{2}(n) \right\rangle} \qquad (4.45)$$

where the brackets in Eq. 4.45 indicate a time (or block) average. Fig. 4.8 shows directivity plots for values of $\beta$ which resulted in nulls being placed at approximately equal increments.

Fig. 4.8 Measured directional responses for the differential array, with $\beta$ chosen to give nulls at approximately equal increments
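A minimal MATLAB sketch of the back-to-back cardioid beamformer with NLMS adaptation of $\beta$ (Eqs. 4.31 and 4.45) is given below, written as a sample-by-sample loop. The inter-element delay $T$ is fixed to one sample, as in the text; the microphone signals and constants are placeholder assumptions, not the thesis code.

% MATLAB sketch (illustrative): adaptive first-order differential microphone.
fs   = 16000;
x1   = randn(fs, 1);   x2 = randn(fs, 1); % placeholder omnidirectional microphone signals
mu   = 0.01;  beta = 0;  eps0 = 1e-6;
y    = zeros(fs, 1);
for n = 2:fs
    cF   = x1(n) - x2(n-1);               % forward facing cardioid (one-sample delay)
    cB   = x2(n) - x1(n-1);               % backward facing cardioid
    y(n) = cF - beta*cB;                  % beamformer output, Eq. 4.31
    beta = beta + mu*y(n)*cB/(cB^2 + eps0);   % normalized LMS update, Eq. 4.45
    beta = min(max(beta, 0), 1);          % keep the null in the rear half plane
end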


Chapter 5

Acoustic Feedback Cancellation

People amplify their voices in various situations by using public address (PA) systems. In most of these situations an acoustic path exists between the loudspeaker and the microphone of the person addressing the audience. Fig. 5.1 shows the acoustic feedback path in a PA system. Acoustic feedback is a considerably serious problem in sound amplification systems and is often referred to as howling, whistling, screeching or squealing. Acoustic feedback arises whenever an acoustical or electrical coupling exists between a microphone and a loudspeaker, and it becomes audible when the signal in the feedback loop grows without bound. Since the squealing is usually very loud, it is unpleasant.

Fig. 5.1 Public Address (PA) system with acoustic feedback path (dotted line)

For example, acoustic feedback is a very common problem in hearing aids, because the loudspeaker and the microphone are positioned very close to each other. A portion of the sound coming out of the loudspeaker is picked up by the microphone, amplified and delivered to the loudspeaker again. This process continues until the hearing aid goes into audible feedback oscillation. Because of this feedback oscillation, the maximum amplification in the hearing aid is limited, which is a problem for the hearing aid user, who typically needs to maximize the audibility and gain of the hearing aid. In order for howling to occur, the open loop gain, i.e. the product of the internal hearing aid gain and the feedback gain of the system, must be greater than unity, and the phase response of the loop must be an integer multiple of $2\pi$ at some frequency [6]. Fig. 5.2 shows the acoustic feedback path in a hearing aid inside the human ear.


Fig. 5.2 Acoustic feedback path in hearing aid inside the human ear [33]

The maximum insertion gain of the hearing aid can be increased by suppressing the acoustic feedback. The ability to reach the target insertion gain leads to better utilization of the speech bandwidth, which in turn improves the speech intelligibility for the hearing impaired person [34]. Since hearing aids are worn by humans, the properties of the different acoustic channels involved are non-stationary. Mandibular movements such as chewing or yawning inevitably alter the feedback channel properties, and depending on the acoustic environment the feedback path transfer function can vary significantly. Hence, acoustic feedback cancellers should be adaptive [33].

5.1 System Overview

The block diagram of Acoustic Echo Cancellation (AEC) is shown in Fig. 5.3. The AEC system

consists of three important blocks, namely

Doubletalk detector

Adaptive filter

Nonlinear processor

Fig. 5.3 Block diagram of Acoustic Echo Cancellation


5.1.1 Doubletalk detector

In the presence of the far-end signal, it is very important to know whether the near-end speech signal exists or not. It is also important to determine when the adaptation of the filter should stop. The situation where both the far-end signal and the near-end signal are present is called double-talk. In a double-talk situation, the error signal contains both the near-end signal and the echo estimation error; if the filter coefficients are updated with this error signal, the adaptation tends to diverge. A double-talk detector is the solution to this problem. There are several DTD methods, such as the Geigel, Benesty and Normalized Cross-Correlation methods. In this thesis, the Normalized Cross-Correlation method [43] is used to detect the presence of double talk. This algorithm computes a decision statistic based on the relation between the microphone signal and the error signal.

5.1.2 Adaptive Filter

It is the most important block and plays a vital role in acoustic echo cancellation. It estimates the echo path in order to obtain a replica of the echo signal.

5.1.3 Nonlinear Processor (NLP)

It partly or completely cancels the residual signal in the absence of the near-end speech signal; removing the residual signal also cancels any remaining acoustic echo. The non-linear processor is a device with a defined suppression threshold level, in which signals having a level detected:

Below the threshold are suppressed.

Above the threshold are passed (although the signal can be distorted).

The non-linear processor functions only during single-talk situations and attenuates the residual echo that could not be cancelled by the adaptive filter. It gradually cancels the residual signal and inserts a form of comfort noise to keep a natural impression at the far-end. The NLP, as well as the adaptive filter, needs an accurate decision from the DTD to operate efficiently.

5.2 Adaptive filter algorithms

The performance of an adaptive echo canceller is mainly determined by the adaptive filter algorithm. The adaptive filter characteristics are changed in order to achieve the optimum desired output: the adaptive algorithm minimizes the error signal. Fig. 5.4 shows the model of the adaptive filter used in AEC.


Fig. 5.4 Model of Adaptive filter in AEC

The notations in Fig. 5.4 are as follows: $x(n)$ is the far-end signal, $v(n)$ is the near-end signal, $h$ is the true echo path, $y(n)$ is the echo signal, $d(n)$ is the microphone signal, $\hat{h}$ is the estimated echo path, $\hat{y}(n)$ is the estimated echo signal and $e(n)$ is the error signal. The adaptive filter drives the residual echo $(y(n) - \hat{y}(n))$ towards zero in order to obtain only the near-end signal $v(n)$ in the ideal situation. In AEC, the adaptive filter plays the key role in overcoming the echo problem through adaptation of the filter weights. Different algorithms have been proposed for this purpose, such as Least Mean Square (LMS), Normalized Least Mean Square (NLMS), Recursive Least Squares (RLS) and the Affine Projection Algorithm (APA). Of all these adaptive algorithms, the NLMS algorithm is the most popular one for echo cancellation: it is simple to implement and also guarantees convergence.

5.2.1 Normalized Least Mean Square (NLMS) Algorithm

The Normalized Least Mean Square (NLMS) algorithm is derived from the Least Mean Square (LMS) algorithm. The motivation is that the input signal power changes over time, which affects the convergence rate of the LMS algorithm: weak signals slow down the convergence and loud signals speed it up. To overcome this, the step-size parameter of LMS is normalized. The step-size used for computing the weight vector update is

$$\mu(n) = \frac{\alpha}{c + \left\| \mathbf{x}(n) \right\|^{2}} \qquad (5.1)$$

where $\mu(n)$ is the step-size parameter at sample $n$, $\alpha$ is the normalized step-size ($0 < \alpha < 2$) and $c$ is a small positive constant. The weight vector update equation then becomes

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\alpha}{c + \left\| \mathbf{x}(n) \right\|^{2}}\, e(n)\, \mathbf{x}(n) \qquad (5.2)$$
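A minimal MATLAB sketch of an NLMS echo canceller following Eqs. 5.1-5.2 is given below; the far-end signal, the echo path and the filter order are placeholder assumptions, not the thesis implementation.

% MATLAB sketch (illustrative): NLMS adaptive echo cancellation.
fs = 16000;  L = 64;  alpha = 0.5;  c = 1e-6;
x  = randn(fs, 1);                        % far-end (loudspeaker) signal
h  = [0.6; zeros(20, 1); -0.3];           % placeholder true echo path
d  = filter(h, 1, x) + 0.01*randn(fs, 1); % microphone signal: echo plus near-end noise
w  = zeros(L, 1);                         % adaptive filter weights
e  = zeros(fs, 1);                        % error (echo-cancelled) signal
for n = L:fs
    xv   = x(n:-1:n-L+1);                 % most recent L far-end samples
    e(n) = d(n) - w' * xv;                % cancel the estimated echo
    w    = w + (alpha/(c + xv'*xv)) * e(n) * xv;   % NLMS weight update, Eq. 5.2
end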


Chapter 6

Implementation and Results

In this chapter, the implementation and analysis of the four beamformers, Elko's, Wiener, Max-SNIR and Delay-and-Sum, are presented. Acoustic feedback cancellation with the NLMS algorithm is also evaluated. Section 6.1 describes the implementation and experimental setup of all the beamformers, and Section 6.2 presents the experimental results.

6.1 Implementation

6.1.1. Beamformer

The implementation and performance evaluation of each beamformer are carried out in MATLAB. A general block diagram valid for any of the beamformers is shown in Fig. 6.1. The speech signal arrives from one direction/angle while the interference/noise arrives from other directions. Considering far-field speech/noise signals and one of the microphones as the reference, the signals take some extra time to reach the other microphones compared with the reference microphone. This extra time depends on the direction of arrival and the spacing between the microphones and is referred to as the time delay; different angles give different time delays. Fractional Delay (FD) filters are used to produce such time delays for arbitrary source/noise angles. The theory of FD filters is discussed in chapter 3.

Fig. 6.1 Structure of any general beamformer


In Fig. 6.1, the number of elements in the microphone array may vary from beamformer to beamformer. The Elko beamformer is designed for the two-microphone case, whereas for the other beamformers the number of microphones can vary; with more microphones the signal-to-noise ratio (SNR) and the speech intelligibility of the output also change. In this thesis, however, all algorithms are evaluated for the two-microphone case, since the miniature structure of a hearing aid limits the number of microphones that can be inserted into the device. The theoretical description of each beamformer is given in chapter 4. For different angles of speech and interference/noise, the beamformer output is recorded. The SNR of the output is calculated and compared with the input SNR, and the difference between these two SNRs is taken as the performance of the beamformer. Perceptual Evaluation of Speech Quality (PESQ), Speech Distortion (SD) and Noise Distortion (ND) are also calculated for the beamformer output.

6.1.2 Feedback Canceller

The implementation of the acoustic feedback canceller is also carried out in MATLAB. The block diagram of the general acoustic feedback canceller is shown in Fig. 5.3 in chapter 5. It is important to detect whether near-end signals are present together with the far-end signal; this operation is performed by the Double-Talk Detector (DTD). Once double talk is detected, the adaptation of the echo canceller is stopped. The adaptive echo canceller adapts to the echo path impulse response and synthesizes a replica of the echo, and the non-linear processor removes the residual echo. In this thesis, one far-end signal and one near-end signal are given to the feedback canceller system. The Normalized Least Mean Square (NLMS) algorithm is used to cancel the echo signal and produce the desired output signal. PESQ and Echo Return Loss Enhancement (ERLE) are calculated for the output; based on these values the performance of the feedback cancellation can be estimated.

6.1.3 Test Data

Speech Signals

The speech signals used in this thesis are sampled at 16 kHz and each speech signal spans 6-7 seconds. Two male voices and one female voice are used for the tests. Of the male voices, one is used as the main speech signal and the other is used as interference. The power spectral densities of all the speech signals are shown in Fig. 6.2.


Fig. 6.2 Power Spectral Density (PSD) plots of female, male and interference signals

Noise signals

Different noises sampled at 16 kHz are used in this thesis: babble noise, wind noise, restaurant noise and white noise [36]. Fig. 6.3 shows the power spectral density plots of all noises used. All results are obtained with one speech signal as the source and one noise signal as the disturbing noise. The input SNR is scaled to different values, 0, 5, 10, 15 and 20 dB, based on the formula

$$g = \sqrt{ \frac{\sigma_s^{2}}{\sigma_n^{2}\, 10^{\mathrm{SNR}/10}} } \qquad (6.1)$$

where $\sigma_s^{2}$ is the variance of the speech signal and $\sigma_n^{2}$ is the variance of the noise signal. SNR in Eq. 6.1 may be 0, 5, 10, 15 or 20 dB. For example, to make the input SNR 0 dB, one sets SNR = 0 in Eq. 6.1 and multiplies the resulting value of $g$ with the noise signal.
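A minimal MATLAB sketch of this scaling is given below; the speech and noise signals are placeholders assumed only for this example.

% MATLAB sketch of Eq. 6.1 (illustrative): scale the noise so that the input SNR equals
% the chosen target value in dB.
s      = randn(16000, 1);                 % placeholder speech signal
noise  = randn(16000, 1);                 % placeholder noise signal
SNR_dB = 10;                              % target input SNR in dB
g      = sqrt(var(s) / (var(noise) * 10^(SNR_dB/10)));
x      = s + g*noise;                     % noisy input at the desired SNR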


Fig. 6.3 Power Spectral Density (PSD) plots of Babble, wind, restaurant and white noise signals

6.1.4 Objective Measures

The following measures are used for calculating the performance of different beamforming

techniques.

6.1.4.1 Signal to Noise Ratio Improvement (SNRI)

It is calculated by subtracting the input SNR value from the output SNR value,

$$\mathrm{SNRI\,(dB)} = 10\log_{10}\!\left( \frac{\sigma_{s,\mathrm{out}}^{2}}{\sigma_{n,\mathrm{out}}^{2}} \right) - 10\log_{10}\!\left( \frac{\sigma_{s,\mathrm{in}}^{2}}{\sigma_{n,\mathrm{in}}^{2}} \right) \qquad (6.2)$$

where $\sigma_{s,\mathrm{out}}^{2}$ is the variance of the output speech signal, $\sigma_{n,\mathrm{out}}^{2}$ is the variance of the output noise signal, $\sigma_{s,\mathrm{in}}^{2}$ is the variance of the input speech signal and $\sigma_{n,\mathrm{in}}^{2}$ is the variance of the input noise signal.
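A minimal MATLAB sketch of this measure is given below; the four signal components are placeholders assumed only for this example.

% MATLAB sketch of Eq. 6.2 (illustrative): output SNR minus input SNR from signal variances.
s_in  = randn(16000, 1);   n_in  = 0.5*randn(16000, 1);   % input speech and noise components
s_out = s_in;              n_out = 0.1*randn(16000, 1);   % output speech and noise components
SNRI  = 10*log10(var(s_out)/var(n_out)) - 10*log10(var(s_in)/var(n_in));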

6.1.4.2 Perceptual Evaluation of Speech Quality (PESQ)

It is a worldwide applied industry standard for objective voice quality testing, also known as an intrusive objective speech quality assessment method. It is used by telecom operators, network equipment vendors and phone manufacturers, and it is standardized as ITU-T recommendation P.862 (02/01) [37]. The PESQ value lies in the range between -0.5 and +4.5, where -0.5 indicates poor quality and +4.5 indicates the best quality of the speech signal.


6.1.4.3 Speech and Noise Distortions

Speech distortion (SD) is given as follows

$$\mathrm{SD\,(dB)} = 10\log_{10} \int \left| P_{s,\mathrm{in}}(f) - \gamma_s\, P_{s,\mathrm{out}}(f) \right|^{2} df \qquad (6.3)$$

where $P_{s,\mathrm{in}}(f)$ is the input speech signal power, $P_{s,\mathrm{out}}(f)$ is the output speech signal power and $\gamma_s$ is a normalizing factor given by

$$\gamma_s = \frac{\int P_{s,\mathrm{in}}(f)\, df}{\int P_{s,\mathrm{out}}(f)\, df} \qquad (6.4)$$

Noise Distortion (ND) is calculated in the same way and is given as follows

$$\mathrm{ND\,(dB)} = 10\log_{10} \int \left| P_{n,\mathrm{in}}(f) - \gamma_n\, P_{n,\mathrm{out}}(f) \right|^{2} df \qquad (6.5)$$

where $P_{n,\mathrm{in}}(f)$ is the input noise signal power, $P_{n,\mathrm{out}}(f)$ is the output noise signal power and $\gamma_n$ is a normalizing factor given by

$$\gamma_n = \frac{\int P_{n,\mathrm{in}}(f)\, df}{\int P_{n,\mathrm{out}}(f)\, df} \qquad (6.6)$$

6.1.4.4 Echo Return Loss Enhancement (ERLE)

The performance of an echo cancellation system can be measured by the ERLE. This quantity measures how much echo attenuation the echo canceller achieves on the microphone signal. It is the ratio of the expected value of the squared microphone signal, $E[d^{2}(n)]$, to the expected value of the squared error signal, $E[e^{2}(n)]$. Expressed in dB it is given by

$$\mathrm{ERLE\,(dB)} = 10\log_{10} \frac{E\left[d^{2}(n)\right]}{E\left[e^{2}(n)\right]} \qquad (6.7)$$

The expected value is estimated as

$$E\left[x^{2}(n)\right] \approx \frac{1}{N} \sum_{n=1}^{N} x^{2}(n) \qquad (6.8)$$

The ERLE depends on the algorithm used for the adaptive filter. Two quantities considered together with the ERLE are the near-end attenuation and the convergence time.
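A minimal MATLAB sketch of the ERLE computation is given below; the microphone and error signals are placeholders assumed only for this example.

% MATLAB sketch of Eqs. 6.7-6.8 (illustrative): expectations replaced by sample averages.
d    = randn(16000, 1);                   % placeholder microphone signal
e    = 0.1*randn(16000, 1);               % placeholder residual signal after echo cancellation
ERLE = 10*log10(mean(d.^2) / mean(e.^2)); % echo return loss enhancement in dB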

6.2 Results

For every beamforming technique, one speech signal and one noise/interference signal are presented from different directions. All results in this report are calculated for a two-microphone array setup. The distance between the microphones varies from technique to technique: for the Elko algorithm the distance depends on the sampling frequency used, whereas for the other techniques the distance is user defined. All signals used are considered to be far-field signals, so the direction of arrival is the same for all microphones. The results for the various beamforming techniques under different environments are presented below.


6.2.1 Elko’s Beamformer

For the evaluation, one clean female/male speech signal sampled at 16 kHz is used as the source speech signal, and a noise/interference signal, also sampled at 16 kHz, is used as the disturbing surrounding signal. Elko's beamforming is evaluated for different input SNRs in two different speech and noise situations, defined as

Situation 1: Source at 30° and Noise/Interference at 270°

Situation 2: Source at 60° and Noise/Interference at 320°

The distance between the two microphones is 0.021375 meters, and this distance is kept the same throughout this beamforming technique. Tables 6.1 to 6.10 give the values of output SNR, SNRI, PESQI, SD and ND for input SNR values of 0, 5, 10, 15 and 20 dB (from Eq. 6.1). Tables 6.1, 6.3, 6.5, 6.7 and 6.9 are for the female speech signal as source with the respective noise/interference as disturbing noise in situations 1 and 2, while Tables 6.2, 6.4, 6.6, 6.8 and 6.10 are for the male speech signal as source with the respective noise/interference as disturbing noise in situations 1 and 2.

Figs. 6.4 and 6.6 show the SNRI for the female speaker as source with noise/interference as disturbing noise in situations 1 and 2, and Figs. 6.5 and 6.7 show the SNRI for the male speaker in the same situations. Figs. 6.8 and 6.10 show the output PESQ for the female speaker in situations 1 and 2, and Figs. 6.9 and 6.11 show the output PESQ for the male speaker. Figs. 6.12 and 6.14 show the speech distortion for the female speaker in situations 1 and 2, and Figs. 6.13 and 6.15 show the speech distortion for the male speaker. Figs. 6.16 and 6.18 show the noise distortion for the female speaker in situations 1 and 2, and Figs. 6.17 and 6.19 show the noise distortion for the male speaker.


TABLE 6.1
SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND INTERFERENCE (MALE) AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Interference at 270°) | 0 | 10.9357 | 1.196 | 2.299 | 1.103 | -36.0781 | -34.1912
 | 5 | 10.8593 | 1.422 | 2.561 | 1.139 | -37.5348 | -34.1972
 | 10 | 10.8183 | 1.631 | 2.821 | 1.190 | -38.1222 | -34.0506
 | 15 | 10.7968 | 1.798 | 3.070 | 1.272 | -38.2939 | -33.9967
 | 20 | 10.7905 | 1.906 | 3.295 | 1.389 | -38.3716 | -33.9756
Situation 2 (Source at 60°, Interference at 320°) | 0 | 8.3446 | 1.196 | 1.828 | 0.632 | -33.7497 | -34.6609
 | 5 | 8.3470 | 1.422 | 2.173 | 0.751 | -35.9319 | -34.7117
 | 10 | 8.3474 | 1.631 | 2.453 | 0.822 | -37.4173 | -34.3155
 | 15 | 8.3466 | 1.798 | 2.699 | 0.901 | -38.0270 | -34.0798
 | 20 | 8.3452 | 1.906 | 2.955 | 1.049 | -38.2551 | -34.0084

TABLE 6.2
SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND INTERFERENCE (MALE) AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Interference at 270°) | 0 | 10.9947 | 1.259 | 1.776 | 0.517 | -36.4657 | -33.9304
 | 5 | 10.9443 | 1.444 | 2.100 | 0.656 | -35.6488 | -33.1790
 | 10 | 10.9024 | 1.620 | 2.422 | 0.802 | -35.3350 | -32.8902
 | 15 | 10.8874 | 1.782 | 2.763 | 0.981 | -35.2148 | -32.8117
 | 20 | 10.8836 | 1.907 | 3.047 | 1.148 | -35.1675 | -32.7840
Situation 2 (Source at 60°, Interference at 320°) | 0 | 8.4134 | 1.259 | 1.497 | 0.238 | -36.33227 | -36.7942
 | 5 | 8.4165 | 1.444 | 1.766 | 0.322 | -37.4096 | -34.5703
 | 10 | 8.4198 | 1.620 | 2.048 | 0.428 | -35.9640 | -33.3407
 | 15 | 8.4195 | 1.782 | 2.351 | 0.569 | -35.4794 | -32.9687
 | 20 | 8.4202 | 1.907 | 2.696 | 0.789 | -35.2654 | -32.8597


TABLE 6.3
SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND BABBLE NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 12.4497 | 0.871 | 1.862 | 0.991 | -38.2897 | -33.6469
 | 5 | 12.3040 | 1.185 | 2.207 | 1.022 | -38.4095 | -33.6315
 | 10 | 12.2313 | 1.468 | 2.485 | 1.017 | -38.4024 | -33.6571
 | 15 | 12.1930 | 1.677 | 2.737 | 1.060 | -38.4170 | -33.6471
 | 20 | 12.1863 | 1.825 | 2.961 | 1.135 | -38.3810 | -33.6350
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.2555 | 0.871 | 1.373 | 0.502 | -36.9718 | -33.8073
 | 5 | 8.2595 | 1.185 | 1.704 | 0.519 | -37.7770 | -33.8383
 | 10 | 8.2611 | 1.468 | 2.053 | 0.585 | -38.1010 | -33.7275
 | 15 | 8.2623 | 1.677 | 2.360 | 0.683 | -38.2107 | -33.6768
 | 20 | 8.2632 | 1.825 | 2.617 | 0.792 | -38.2638 | -33.6635

TABLE 6.4
SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND BABBLE NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 12.5834 | 1.068 | 1.305 | 0.237 | -35.1829 | -36.5838
 | 5 | 12.4977 | 1.297 | 1.381 | 0.084 | -34.9857 | -36.0664
 | 10 | 12.4150 | 1.537 | 1.769 | 0.232 | -34.9617 | -36.0119
 | 15 | 12.3656 | 1.717 | 2.303 | 0.586 | -35.0247 | -36.0926
 | 20 | 12.3384 | 1.817 | 2.699 | 0.882 | -35.0702 | -36.1500
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.3180 | 1.068 | 1.095 | 0.027 | -36.3369 | -38.3613
 | 5 | 8.3175 | 1.297 | 1.336 | 0.039 | -35.7522 | -37.2788
 | 10 | 8.3163 | 1.537 | 1.718 | 0.181 | -35.3216 | -36.5124
 | 15 | 8.3167 | 1.717 | 2.173 | 0.456 | -35.1768 | -36.2936
 | 20 | 8.3160 | 1.817 | 2.525 | 0.708 | -35.1548 | -36.2599


TABLE 6.5
SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND WIND NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 14.4061 | 1.030 | 1.947 | 0.917 | -38.6039 | -33.7848
 | 5 | 14.3444 | 1.171 | 2.224 | 1.053 | -38.4634 | -33.6941
 | 10 | 14.2944 | 1.444 | 2.484 | 1.040 | -38.3882 | -33.6597
 | 15 | 14.2574 | 1.658 | 2.736 | 1.078 | -38.4077 | -33.6552
 | 20 | 14.2468 | 1.804 | 3.018 | 1.214 | -38.3748 | -33.6532
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.3013 | 1.030 | 1.456 | 0.426 | -37.9234 | -34.4846
 | 5 | 8.3065 | 1.171 | 1.762 | 0.591 | -38.4399 | -34.0236
 | 10 | 8.3077 | 1.444 | 2.066 | 0.622 | -38.4049 | -33.7931
 | 15 | 8.3117 | 1.658 | 2.350 | 0.692 | -38.4192 | -33.7094
 | 20 | 8.3133 | 1.804 | 2.584 | 0.780 | -38.3200 | -33.6661

TABLE 6.6
SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND WIND NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 14.6607 | 1.059 | 1.644 | 0.585 | -35.3041 | -34.4374
 | 5 | 14.5418 | 1.389 | 1.608 | 0.219 | -35.1665 | -34.2777
 | 10 | 14.4687 | 1.584 | 1.932 | 0.348 | -35.0613 | -34.2062
 | 15 | 14.4161 | 1.739 | 2.455 | 0.716 | -35.0989 | -34.1991
 | 20 | 14.3858 | 1.831 | 2.883 | 1.052 | -35.1424 | -34.2110
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.3655 | 1.059 | 1.304 | 0.245 | -37.7057 | -35.6552
 | 5 | 8.3676 | 1.389 | 1.601 | 0.212 | -36.3031 | -34.7767
 | 10 | 8.3683 | 1.584 | 1.926 | 0.342 | -35.4315 | -34.3773
 | 15 | 8.3692 | 1.739 | 2.282 | 0.543 | -35.2292 | -34.2645
 | 20 | 8.3704 | 1.831 | 2.550 | 0.719 | -35.1376 | -34.2174


TABLE 6.7
SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND RESTAURANT NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 8.0629 | 0.806 | 1.472 | 0.666 | -36.4251 | -32.2789
 | 5 | 7.8475 | 1.037 | 1.895 | 0.858 | -37.9220 | -32.8433
 | 10 | 7.7459 | 1.288 | 2.188 | 0.900 | -38.2570 | -32.6254
 | 15 | 7.6841 | 1.537 | 2.500 | 0.963 | -38.4097 | -32.5546
 | 20 | 7.6530 | 1.729 | 2.733 | 1.004 | -38.3679 | -32.4935
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.2423 | 0.806 | 1.204 | 0.398 | -33.0848 | -35.3596
 | 5 | 8.2432 | 1.037 | 1.466 | 0.429 | -35.5155 | -33.8034
 | 10 | 8.2448 | 1.288 | 1.822 | 0.534 | -37.2457 | -32.9380
 | 15 | 8.2433 | 1.537 | 2.312 | 0.775 | -38.0376 | -32.6448
 | 20 | 8.2444 | 1.729 | 2.595 | 0.886 | -38.2522 | -32.5364

TABLE 6.8
SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND RESTAURANT NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 8.2085 | 0.978 | 1.038 | 0.060 | -36.0993 | -30.9708
 | 5 | 8.0532 | 1.123 | 1.361 | 0.238 | -35.2329 | -30.2809
 | 10 | 7.9271 | 1.316 | 1.754 | 0.438 | -34.9608 | -30.0922
 | 15 | 7.8722 | 1.551 | 2.277 | 0.726 | -34.9657 | -30.0722
 | 20 | 7.8055 | 1.738 | 2.655 | 0.917 | -35.0692 | -30.1035
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.3153 | 0.978 | 1.295 | 0.317 | -35.3767 | -33.1312
 | 5 | 8.3140 | 1.123 | 1.659 | 0.536 | -37.2125 | -31.2182
 | 10 | 8.3122 | 1.316 | 2.025 | 0.709 | -35.8836 | -30.4452
 | 15 | 8.3097 | 1.551 | 2.353 | 0.802 | -35.4144 | -30.2245
 | 20 | 8.3083 | 1.738 | 2.660 | 0.922 | -35.2035 | -30.1434


TABLE 6.9
SNRI, PESQI, SD AND ND FOR SPEECH (FEMALE) AS SOURCE AND WHITE NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 5.6917 | 0.986 | 1.565 | 0.579 | -34.4164 | -31.4655
 | 5 | 5.4389 | 1.198 | 1.995 | 0.797 | -36.7326 | -31.1596
 | 10 | 5.3033 | 1.445 | 2.287 | 0.842 | -37.7692 | -30.9225
 | 15 | 5.1931 | 1.669 | 2.575 | 0.906 | -38.1597 | -30.7898
 | 20 | 5.1653 | 1.824 | 2.808 | 0.984 | -38.3356 | -30.7571
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.2102 | 0.986 | 1.282 | 0.296 | -33.0843 | -31.7454
 | 5 | 8.2061 | 1.198 | 1.587 | 0.389 | -35.4668 | -31.4975
 | 10 | 8.2015 | 1.445 | 1.918 | 0.473 | -37.2276 | -30.9951
 | 15 | 8.2006 | 1.669 | 2.223 | 0.554 | -37.9338 | -30.8252
 | 20 | 8.2000 | 1.824 | 2.519 | 0.695 | -38.2335 | -30.7644

TABLE 6.10
SNRI, PESQI, SD AND ND FOR SPEECH (MALE) AS SOURCE AND WHITE NOISE AS NOISE

Situation | Input SNR (dB) | SNRI (dB) | Input PESQ | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Situation 1 (Source at 30°, Noise at 270°) | 0 | 5.8056 | 1.013 | 1.450 | 0.437 | -34.4533 | -32.3513
 | 5 | 5.6054 | 1.108 | 1.534 | 0.426 | -34.8061 | -31.8982
 | 10 | 5.4555 | 1.321 | 1.821 | 0.500 | -35.0399 | -31.4742
 | 15 | 5.3857 | 1.579 | 2.230 | 0.651 | -35.0218 | -31.3314
 | 20 | 5.3244 | 1.755 | 2.595 | 0.840 | -35.0673 | -31.2945
Situation 2 (Source at 60°, Noise at 320°) | 0 | 8.2843 | 1.013 | 1.102 | 0.089 | -33.4520 | -33.2271
 | 5 | 8.2769 | 1.108 | 1.421 | 0.313 | -34.9358 | -32.2822
 | 10 | 8.2687 | 1.321 | 1.816 | 0.495 | -35.2965 | -31.6069
 | 15 | 8.2665 | 1.579 | 2.208 | 0.629 | -35.2293 | -31.3804
 | 20 | 8.2641 | 1.755 | 2.629 | 0.874 | -35.1453 | -31.3094


Fig. 6.4 SNRI for female speaker at angle of 30° and Noise/Interference at angle of 270°

Fig. 6.5 SNRI for male speaker at angle of 30° and Noise/Interference at angle of 270°


Fig. 6.6 SNRI for female speaker at angle of 60° and Noise/Interference at angle of 320°

Fig. 6.7 SNRI for male speaker at angle of 60° and Noise/Interference at angle of 320°


Fig. 6.8 Output PESQ for female speaker at angle of 30° and Noise/Interference at angle of 270°

Fig. 6.9 Output PESQ for male speaker at angle of 30° and Noise/Interference at angle of 270°


Fig. 6.10 Output PESQ for female speaker at angle of 60° and Noise/Interference at angle of 320°

Fig. 6.11 Output PESQ for male speaker at angle of 60° and Noise/Interference at angle of 320°


Fig. 6.12 Speech Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270°

Fig. 6.13 Speech Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270°


Fig. 6.14 Speech Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320°

Fig. 6.15 Speech Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320°


Fig. 6.16 Noise Distortion for female speaker at angle of 30° and Noise/Interference at angle of 270°

Fig. 6.17 Noise Distortion for male speaker at angle of 30° and Noise/Interference at angle of 270°


Fig. 6.18 Noise Distortion for female speaker at angle of 60° and Noise/Interference at angle of 320°

Fig. 6.19 Noise Distortion for male speaker at angle of 60° and Noise/Interference at angle of 320°


From Tables 6.1 to 6.10, situation 1 gives better results than situation 2, except for the white noise. In situation 1, the Elko beamforming system gives about 10.5 dB SNR improvement for female and male speech when the interfering talker is the disturbing noise. Similarly, the technique provides an SNR improvement of about 12.5 dB for babble noise, 14.4 dB for wind noise, 8 dB for restaurant noise and 5.5 dB for white noise. In situation 2, the Elko beamforming technique provides an SNR improvement of nearly 8.2 dB for all noises.

6.2.2 Wiener Beamformer

The Wiener beamformer is evaluated for a source and interference/noise coming from different directions. In this report, a female/male speech signal sampled at 16 kHz is used as the source signal, and the interference/noise from the noise data is also sampled at 16 kHz. Table 6.11 shows the average SNRI, SD and ND over the different noises for the Wiener beamformer (two-microphone setup) when the female speech is the source signal. The SNRI of the Wiener beamformer depends on the number of microphones used: the larger the number of microphones, the higher the SNRI. In this report, however, the Wiener beamformer is designed for the two-microphone case. The detailed results of the Wiener beamformer are found in [38].

TABLE 6.11
SNRI, PESQI, SD AND ND FOR THE WIENER BEAMFORMER (2-MICROPHONE CASE) [38]

Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 6.50 | 2.653 | 0.535 | -34.9631 | -29.5718
Babble Noise | 5.54 | 2.420 | 0.731 | -42.1800 | -28.0190
Wind Noise | 5.43 | 2.660 | 0.483 | -44.1200 | -28.0360
Restaurant Noise | 2.91 | 2.990 | 0.647 | -42.4360 | -27.7320
White Noise | 4.34 | 3.120 | 1.444 | -42.0900 | -26.9800

6.2.3 Max-SNIR Beamformer

The working of the Max-SNIR beamformer is the same as that of the Wiener beamformer; the only difference is the weight update equation. The Max-SNIR beamformer is also evaluated for a source and interference/noise coming from different directions. As for the Wiener beamformer, the same signals are used and all results are obtained for the two-microphone setup. Table 6.12 shows the average SNRI, SD and ND over the different noises for the Max-SNIR beamformer (two-microphone setup) when the female speech is the source signal. The detailed results of the Max-SNIR beamformer are found in [39].


TABLE 6.12
SNRI, PESQI, SD AND ND FOR THE MAX-SNIR BEAMFORMER (2-MICROPHONE CASE) [39]

Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 262.95 | 2.567 | 1.254 | -26.5195 | -26.5226
Babble Noise | 258.74 | 4.326 | 1.412 | -25.9525 | -25.9389
Wind Noise | 199.36 | 4.354 | 1.566 | -25.9253 | -25.8956
Restaurant Noise | 183.07 | 4.374 | 2.236 | -25.9171 | -24.3174
White Noise | 272.95 | 4.369 | 1.284 | -26.8910 | -26.0279

6.2.4 Delay and Sum Beamformer

The implementation of the DSB is much simpler than that of the other beamformers. The DSB is also evaluated for the two-microphone setup. A female/male speech signal sampled at 16 kHz is used as the source, and noise/interference from the noise data, also sampled at 16 kHz, is used as the disturbing noise. The speech and noise/interference signals arrive at the arrangement from different directions. Table 6.13 shows the average SNRI, SD and ND over the different noise sources for the DSB arrangement when the female speech is the source signal. The detailed results of the DSB are found in [41].

TABLE 6.13
SNRI, PESQ, SD AND ND FOR THE DELAY AND SUM BEAMFORMER (2-MICROPHONE CASE) [41]

Noise/Interference | Average SNRI (dB) | Output PESQ | PESQI | Speech Distortion (dB) | Noise Distortion (dB)
Interference | 0.3326 | 1.515 | 0.071 | -36.3709 | -36.7602
Babble Noise | 0.2612 | 0.700 | 0.074 | -36.1267 | -41.3865
Wind Noise | 0.4047 | 0.590 | 0.054 | -35.0690 | -44.2561
Restaurant Noise | 0.7514 | 0.690 | 0.083 | -33.6485 | -36.7051
White Noise | 0.8245 | 0.779 | 0.110 | -31.8565 | -35.7871

From Table 6.13 it can be observed that the SNRI values are very low for the DSB. This is because of the small number of microphones used for the beamforming. When the number of microphones is increased, the SNRI also changes: each doubling of the number of microphones provides at most an additional 3 dB increase in SNR.

6.2.5 Comparison

Since all the beamformers in this report are designed for a hearing aid system, the number of microphones and the spacing between the microphones are limited. The comparison between the beamformers is based on SNRI, PESQ, SD and ND under different environments. In all noise environments Elko's beamformer performs better than the Wiener and the Delay-and-Sum beamformers, but


the Max-SNIR beamformer provides better results than the other three beamformers. The optimal Max-SNIR beamformer is designed to produce the maximum SNR improvement; since it maximizes the SNIR, the quality of the output speech (PESQ) is not necessarily better than that of, for example, the Wiener beamformer. The SNRI values of the Max-SNIR beamformer are, however, very high compared to the other beamformers, and since its average SNRI is very high it is considered to be the best beamformer. Fig. 6.20 shows the average SNRI of the different beamformers in the various situations.

Fig. 6.20 Average SNRI of different beamformers at various situations

6.2.6 Echo cancellation with NLMS algorithm

In this thesis, acoustic echo cancellation is also implemented in MATLAB. A simple NLMS algorithm is used for the AEC. One speech signal and one echoed version of the same speech signal are used in the AEC system. The echo signal is generated from a room impulse response [40] and random noise is added to it; the combined echo-plus-noise signal is used as the input signal for the AEC system. To keep the computations simple, the NLMS operation is performed for five filter orders. For every filter order the error signal, the adaptive filter output and the ERLE are noted. Table 6.14 shows the ERLE values for the different filter orders.


TABLE 6.14
ERLE VALUES FOR DIFFERENT FILTER ORDERS

Filter Order | ERLE (dB)
10 | 18.0716
15 | 18.1286
20 | 18.0143
25 | 18.0479
30 | 17.9513

The average ERLE over all the orders is around 18.05 dB [42]. The NLMS algorithm gives a very small estimation error and a large average ERLE value, so it is one of the adaptive algorithms best recommended for acoustic echo cancellation. In general, the ERLE of NLMS is larger than that of LMS. Fig. 6.21 shows the ERLE plot for the NLMS algorithm.

Fig. 6.21 ERLE plot for NLMS algorithm

The NLMS function used in the AEC provides the estimation error, the adaptive filter output and the filter weights. In order to obtain the estimation error as the echoed speech signal, the desired signal is taken as random noise. The adaptive filter output is obtained after the filter coefficients are applied to the input signal, so the filter output is close to the desired signal. Fig. 6.22 shows the signals in the NLMS algorithm: the desired signal, the output signal and the error signal. By giving random noise as the desired signal and the echo with random noise as the input signal to the AEC, the error becomes the echo signal. It can therefore be concluded that the NLMS algorithm provides a good result for echo cancellation.


Fig. 6.22 Signals in the NLMS algorithm, in turn: desired signal, output signal and error signal


Chapter 7

Conclusion and Future work

7.1 Conclusion

This thesis focuses on the enhancement of a speech signal from a noisy speech signal with the help of different beamformers. Four beamformers, Elko's, Wiener, Max-SNIR and Delay-and-Sum, were implemented successfully. The performance of each beamformer was measured in five noisy environments: interfering talker, babble, wind, restaurant and white noise. The quality of the output signal was evaluated with objective metrics such as SNRI, PESQI, SD and ND. From all the obtained results it can be concluded that all beamformers increase the SNR of the output signal; for any particular noisy environment, this SNR improvement varies from beamformer to beamformer. All results were calculated for the two-microphone case. Acoustic echo cancellation was also implemented successfully; its performance was measured with the objective metric ERLE, and from the AEC results it can be concluded that the system provides satisfactory results.

This report concentrates mainly on the performance of Elko's beamformer. From chapter 6, in situation 1 Elko's beamformer provides an SNR improvement of 10.5 dB for the interference noise signal with female/male speech as the source, together with a PESQ of 3.2 to 3.7 for the same interference. For babble noise, Elko's beamformer provides an SNRI of 12.5 dB and a PESQ value of 2.8 to 3.1. In the same way, it provides 14.24 dB SNRI and 2.9 PESQ for wind noise, 7.64 dB SNRI and 2.8 PESQ for restaurant noise, and 5.14 dB SNRI and 2.9 PESQ for white noise. When compared with the other beamformers, Elko's beamformer provides a better PESQ value for the interference noise source. It also provides very good SNRI, SD, ND and PESQ results for all noise sources when compared with the Wiener beamformer and the DSB. The designed system provides an SD of -38 dB and an ND of -33 to -28 dB for all noises. In situation 2 the system provides an SNRI of 8.2 dB for all noises with acceptable SD and ND. From the results, it can be concluded that a better SNRI comes together with better speech quality, SD and PESQ.

The other beamformers, namely the Wiener, Max-SNIR and Delay and Sum beamformers, are also implemented successfully in MATLAB offline mode [38, 39, 41]. All beamformers are compared with objective metrics such as SNRI, PESQI, SD and ND under all noise situations. Of all the beamformers, the Max-SNIR beamformer provides the best result and the Delay and Sum beamformer the poorest. Since the average SNRI is very high for the Max-SNIR beamformer, it is considered to be the best beamformer, and in all types of noise environments the best speech intelligibility is achieved with it. Echo cancellation with the NLMS algorithm is also implemented successfully in MATLAB offline mode. The NLMS algorithm provides an ERLE of 18.05 dB, which is a large value compared with the LMS adaptive algorithm, while keeping the computational complexity low. For a better overview, the results are shown in tables and graphs.

7.2 Future work

In this thesis, Elko's beamformer is implemented in the time domain under an anechoic environment, and only in MATLAB offline mode. In future work, Elko's beamformer should be implemented in real time under an echoic (reverberant) environment. It could also be implemented in the frequency domain in order to obtain more accurate results. In addition, acoustic echo cancellation should be evaluated with other adaptive algorithms in order to find the best performing one.


BIBLIOGRAPHY

[1] N. Grbic, "Optimal and Adaptive Subband Beamforming, Principles and Applications," Doctoral Dissertation Series No. 2001:01, ISSN: 1650-2159, Blekinge Institute of Technology, 2001.

[2] S. Nordebo, S. Nordholm, B. Bengtsson, I. Claesson, "Noise Reduction Using an Adaptive Microphone Array in a Car - A Speech Recognition Evaluation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, Oct. 1993.

[3] M. Brandstein, D. Ward, "Microphone Array Signal Processing Techniques and Applications," Ed. New York: Springer, 2010.

[4] Z. Yermeche, "Soft-Constrained Subband Beamforming for Speech Enhancement," Doctoral Dissertation Series No. 2007:14, ISSN: 1653-2090, Blekinge Institute of Technology, 2007.

[5] N. Grbic, S. Nordholm, "Soft Constrained Subband Beamforming for Hands-Free Speech Enhancement," in IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, vol. 1, pp. 885-888, May 2002.

[6] N. Westerlund, N. Grbic, M. Dahl, "Subband Adaptive Feedback Control in Hearing Aids with Increased User Comfort," Research Report No. 2006:01, ISSN: 1103-1581, Blekinge Institute of Technology, 2006.

[7] G. M. Clark, "University of Melbourne - Nucleus Multi-Electrode Cochlear Implant," Karger, New York, USA, 1987.

[8] A. B. Hamida, "Implication of New Technologies in Deafness Healthcare: Deafness Rehabilitation Using Prospective Design of Hearing Aid Systems," in IEEE International Symposium on Technology and Society, pp. 85-90, 2000.

[9] U. Suat, F. G. Zeng, B. J. Sheu, "Hearing with Bionic Ears, Speech Processing Strategies for Cochlear Implant Devices," in IEEE International Conference on Circuits and Devices, May 1997.

[10] R. Naik, A. Stojcevski, V. Vibhute, J. Singh, "Implementation of Magnitude Estimation Algorithm for Hearing Aid," in IEEE International Workshop on Biomedical Circuits and Systems, 2004.

[11] S. Arlinger, A. Leijon, "Hearing Aids for Adults - Benefits and Costs," The Swedish Council on Technology Assessment in Health Care, May 2003.

[12] A. Vonlanthen, "Hearing Instrument Technology for the Hearing Healthcare Professional," Singular Publishing Group, 2000, ISBN 0-7693-0072-3.

[13] T. I. Laakso, V. Välimäki, M. Karjalainen, U. K. Laine, "Splitting the unit delay - tools for fractional delay filter design," IEEE Signal Processing Mag., vol. 13, no. 1, pp. 30-60, Jan. 1996.

[14] V. Välimäki, T. I. Laakso, "Fractional delay filters - design and applications," in Theory and Applications of Non-uniform Sampling, F. Marvasti (ed.), New York: Plenum/Kluwer, 2000.

[15] V. Välimäki, T. I. Laakso, "Principles of Fractional Delay Filters," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), Istanbul, Turkey, June 2000.

[16] M. Lang, T. I. Laakso, "Simple and robust method for the design of allpass filters using least-squares phase error criterion," IEEE Trans. Circ. Syst. - Part II, vol. 41, no. 1, pp. 40-48, Jan. 1994.

[17] J.-P. Thiran, "Recursive digital filters with maximally flat group delay," IEEE Trans. Circ. Theory, vol. 18, no. 6, pp. 659-664, 1971.

[18] A. Fettweis, "A simple design of maximally flat delay digital filters," IEEE Trans. Audio and Electroacoust., vol. 20, no. 2, pp. 112-114, 1972.

[19] V. Välimäki, "Simple design of fractional delay allpass filters," Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, 1994. Available at http://www.acoustics.hut.fi/~vpv/[email protected]

[20] L. Savioja, J. Huopaniemi, T. Lokki, R. Väänänen, "Creating interactive virtual acoustic environments," Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, 1999.

[21] M. Kleiner, B. Dalenbäck, P. Svensson, "Auralization - an overview," Journal of the Audio Engineering Society, vol. 41, no. 11, pp. 861-875, Nov. 1993.

[22] A. Pietrzyk, "Computer modeling of the sound field in small rooms," in Proc. of the 15th AES Int. Conf. on Audio, Acoustics and Small Spaces, vol. 2, Copenhagen, Denmark, Oct. 1998, pp. 24-31.

[23] J. Allen, D. Berkley, "Image Method for Efficiently Simulating Small Room Acoustics," Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.

[24] A. Kulowski, "Algorithmic representation of the ray tracing technique," Applied Acoustics, vol. 18, no. 6, pp. 449-469, 1985.

[25] Basics of Beamformer [Online]. Available: www.umiacs.umd.edu/~vikas/projects/enee624.doc

[26] J. E. Hudson, "Adaptive Array Principles," Peter Peregrinus Ltd., 1991, ISBN 0-86341-247-5.

[27] R. A. Monzingo, T. W. Miller, "Introduction to Adaptive Arrays," John Wiley and Sons, New York, 1980.

[28] S. Haykin, "Adaptive Filter Theory," Prentice Hall Int. Inc., 1996, ISBN 0-13-397985-7.

[29] G. W. Elko, "A Simple Adaptive First-Order Differential Microphone," Acoust. and Speech Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.

[30] G. W. Elko, "Superdirectional Microphone Arrays," in Acoustic Signal Processing for Telecommunication, J. Benesty and S. L. Gay (eds.), pp. 181-236, Kluwer Academic Publishers, 2000.

[31] G. W. Elko, H. Teutsch, "An Adaptive Close-Talking Microphone Array," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, USA, 2001.

[32] G. W. Elko, H. Teutsch, "First and Second Order Adaptive Differential Microphone Arrays," Acoust. and Speech Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.

[33] M. G. Siqueira, A. Alwan, "Steady-State Analysis of Continuous Adaptation in Acoustic Feedback Reduction Systems for Hearing Aids," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 4, July 2000.

[34] M. G. Siqueira, A. Alwan, R. Speece, E. Petsalis, "Subband Adaptive Filtering Applied to Acoustic Feedback Reduction in Hearing Aids," in IEEE 30th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 788-792, 1996.

[35] Basics of Beamformer [Online]. Available: http://en.wikipedia.org/wiki/Beamforming

[36] Noisex-92 database, Signal Processing Information Base [Online]. Available: http://spib.rice.edu/spib/select_noise.html

[37] P. Stefan, T. Uhl, "Quantifying the Suitability of Reference Signals for the PESQ Algorithm," in Third Int. Conf. on Commun. Theory, Rel. and Quality of Service, June 2010, pp. 110-115.

[38] V. Santhurenu, "Performance Analysis of Speech Enhancement Methods in Hands-Free Communication with Emphasis on Wiener Beamformer," M.S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.

[39] M. Harish, "Speech Enhancement in Hands-Free Speech Communication with Emphasis on Max-SNR Beamformer," M.S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.

[40] A. Palanki, "Simulation of Microphone Inaccuracies and Multi-channel Speech Enhancement using Beamformers in Reverberant Environment," M.S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.

[41] L. K. Gudipudi, "Enhancement of Speech Intelligibility using Beamforming Techniques," M.S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.

[42] K. S. Patel, "Performance Analysis of Adaptive Algorithms based on Different Parameters Implemented for Acoustic Echo Cancellation in Speech Signals," M.S. Thesis, Dept. of Signal Processing, Blekinge Institute of Technology (BTH), Blekinge, Sweden, 2012.

[43] H. N. Nguyen, M. Dowlatnia, A. Sarfraz, "Implementation of the LMS and NLMS Algorithms for Acoustic Echo Cancellation in Teleconference System using MATLAB," M.S. Thesis Report: 09087, ISSN 1650-2647, Växjö University, 2009.