Copyright © 2012, Ram Kuber Singh
A Real-time Implementation of the Primary Auditory Neuron Activities
by
Ram Kuber Singh
A thesis submitted in fulfilment of the
requirements for the degree of
MASTER OF ENGINEERING (HONOURS)
Supervisor: Professor André van Schaik
Co-Supervisors: Professor Jonathan Tapson
Dr. Ranjith Liyanapathirana
Bioelectronics and Neuroscience, The MARCS Institute
University of Western Sydney
Sydney, Australia
December 2012
Statement of Authentication
The work presented in this thesis is, to the best of my knowledge and belief, original except as
acknowledged in the text. I hereby declare that I have not submitted this material, either in
full or in part, for a degree at this or any other institution.
-----------------------------------------------------------------------------
(Signature)
Acknowledgements
I would like to take this opportunity to thank my principal supervisor, Professor André
van Schaik, for his invaluable advice. His guidance through every step of this interdisciplinary
project has allowed me to expand my understanding of the project and of the field of
neuroscience. Throughout the project, he showed much patience as I worked to
progressively generate results. I would also like to thank the member of my supervisory
panel, Professor Jonathan Tapson, for his advice and for asking questions that helped me
think more deeply about my work. I also wish to thank Dr Ranjith Liyanapathirana, also a
member of my supervisory panel, who arranged my first meeting with Professor André. He
constantly encouraged me to remain dedicated to my work, offered advice on administrative
affairs, and helped me obtain a teaching position at the university. It has truly been an
honour to work with all my supervisors.
I also wish to thank Professor Ray Meddis, the original author of the MAP model, for
keeping me updated with the latest version of the model he is working on. He went out of his
way to provide me with elaborate technical instructions for attaining the desired results with
the model. I wish to express my gratitude to James Wright for his technical support, advice
on optimisation and thesis corrections. I also thank my colleagues in the BENS group at the
MARCS Institute, Mark Wang and Gregory Cohen, for their companionship.
I would like to give special thanks to my sister, Lajwanti, her husband, Joseph, and
their family for welcoming me into their family, for their care and support, and for tolerating
my antics. A very special thank you goes to my mother, who, besides offering care and
being a pillar of support, went the extra mile to listen to my concerns, encouraged me
throughout the project, and ensured I was well fed. Finally, I wish to offer my transcendental
Hare Krishna thanks to all at Sri Krishna Mandir, who brought out the good qualities in me,
however little and insignificant they may be, which were amplified during my undertaking of
this project.
Abstract
Computational models of the auditory pathway simulate the different stages of the
auditory periphery: the outer, middle and inner ear. Studies of the levels of the auditory
pathway beyond the inner ear require data on the cochlear response. If a computational
model of the auditory pathway simulates the cochlear responses slowly, the responses of
the higher levels of the auditory pathway will also be slow. Hence, a real-time computational
model of the auditory pathway enables the study of higher levels of sound perception in
computational neuroscience by providing on-the-fly, or immediate, responses of the cochlea
within the auditory pathway.
In this thesis, the development of a real-time computational model of the auditory
pathway is discussed. A review of five auditory pathway computer models is presented and
one model is selected for implementation as a real-time computer model. The transition from
the original model to a real-time implementation includes a translation to the C language
before integration with JUCE, a C++ graphical user interface library. The input signals to the
real-time model are either generated by a software sine tone generator or acquired from a
microphone channel on a computer. As part of the cochlear simulation, the algorithms are
divided to generate responses in separate frequency channels; a larger number of channels
yields a finer spectral resolution of the cochlear response. To achieve the optimum number
of channels in real time, POSIX threads are used to exploit computing parallelism. To load
more channels, mathematical optimisation is studied and applied in the real-time model.
It will be demonstrated in this thesis that the RMS errors of the responses of the
developed real-time computer model of the auditory pathway, relative to the original model,
measure below 1%, and that its maximum load depends on the computer it runs on. On a
laptop with a dual-core CPU, the real-time model is able to simulate 85 channels of basilar
membrane displacement, whereas a desktop with a faster dual-core CPU accommodates
twice as many channels. With mathematical optimisation enabled, there is a 13% and 8%
increase in the number of channels computed for the laptop and desktop respectively.
However, the RMS error between the real-time model with mathematical optimisation
enabled and the original model increases to 8% due to approximation errors.
Table of Contents
Acknowledgements ............................................................................................................... iv
Abstract.................................................................................................................................. i
Table of Contents ...................................................................................................................ii
Abbreviations ........................................................................................................................ v
List of Figures ...................................................................................................................... vii
List of Tables ...................................................................................................................... xiv
Code Listings ....................................................................................................................... xv
Chapter 1: Introduction ...................................................................................................... 1
1.1 Motivation ............................................................................................................... 1
1.2 Statement of the Problem ....................................................................................... 2
1.3 Objective of the Research ....................................................................................... 2
1.4 Thesis Outline ......................................................................................................... 3
Chapter 2: Literature Review ............................................................................................. 4
2.1 Auditory Pathway Models........................................................................................ 4
2.1.1 Cooke Periphery Auditory Model...................................................................... 4
2.1.2 Auditory Image Model (AIM2006) ..................................................................... 6
2.1.3 Multiple-Bandpass-Nonlinear (MBPNL) Filterbank ........................................... 9
2.1.4 The Model of Carney and Colleagues ............................................................ 10
2.1.5 Matlab Auditory Periphery (MAP) Model ........................................................ 12
2.1.6 Model Selection ............................................................................................. 13
2.2 The MAP Model of the Human Auditory Pathway ................................................. 15
2.2.1 Outer and Middle Ear ..................................................................................... 16
2.2.2 Basilar Membrane .......................................................................................... 20
2.2.3 Inner Hair Cell ................................................................................................ 26
2.2.4 Neurotransmitter Release .............................................................................. 31
2.2.5 Auditory Nerve ............................................................................................... 35
2.4 Summary .............................................................................................................. 36
Chapter 3: C Representation of MAP .............................................................................. 37
3.1 Buffer Management and Program Structure .......................................................... 38
3.1.1 Buffer Structure .............................................................................................. 38
3.1.2 Algorithm Structure ........................................................................................ 39
3.1.3 Program Structure ......................................................................................... 40
3.2 Parameters Setup ................................................................................................. 41
3.3 IIR Filter ................................................................................................................ 42
3.3.1 Background ................................................................................................... 42
3.3.2 Implementation .............................................................................................. 43
3.4 Outer and Middle Ear ............................................................................................ 46
3.5 Basilar Membrane ................................................................................................. 50
3.6 Inner Hair Cell Receptor Potential ......................................................................... 56
3.7 Neurotransmitter Release Rate ............................................................................. 57
3.8 Auditory Nerve Spiking Probability ........................................................................ 60
3.9 Characteristic Responses for Various Input Settings ............................................. 65
3.10 Summary .............................................................................................................. 69
Chapter 4: Real-time Auditory Periphery (RTAP) ............................................................ 70
4.1 User Interface (UI) ................................................................................................ 71
4.2 Process Priority ..................................................................................................... 75
4.3 Structure and Settings .......................................................................................... 75
4.3.1 Class Structure .............................................................................................. 75
4.3.2 Input Settings ................................................................................................. 77
4.4 Sine Tone Generator ............................................................................................ 78
4.5 Threading ............................................................................................................. 79
4.5.1 Background ................................................................................................... 79
4.5.2 Implementation .............................................................................................. 81
4.5.3 Results........................................................................................................... 85
4.6 Response Plots ..................................................................................................... 87
4.7 Recording Feature ................................................................................................ 89
4.7.1 File Write Command Selection ....................................................................... 89
4.7.2 Binary File Format ......................................................................................... 90
4.7.3 File Writer Thread .......................................................................................... 92
4.7.4 Binary File Recording ..................................................................................... 93
4.7.5 Offline Formatting and Text File Generation .................................................. 95
4.7.6 Results........................................................................................................... 97
4.8 Summary .............................................................................................................. 98
Chapter 5: Signals Display in RTAP .................................................................................. 99
5.1 Static Plot Display ................................................................................................. 99
5.1.1 Line Drawing .................................................................................................. 99
5.1.2 Resource Management ................................................................................ 101
5.1.3 Pixels Render and Image Display Threads .................................................. 102
5.1.4 ERB Scaled Plots ........................................................................................ 103
5.1.5 Spectrogram Plots ....................................................................................... 105
5.2 Scrolling Plot Display .......................................................................................... 112
5.2.1 Background ................................................................................................. 112
5.2.2 Implementation ............................................................................................ 113
5.2.3 Results......................................................................................................... 116
5.3 Summary ............................................................................................................ 117
Chapter 6: Optimisation and Load Profile .......................................................................... 120
6.1 Mathematical Optimisation .................................................................................. 120
6.1.1 Background ................................................................................................. 120
6.1.2 Implementation ............................................................................................ 123
6.1.3 Optimised RTAP Responses ....................................................................... 125
6.1.4. MAP and Optimised RTAP Responses Comparisons .................................. 132
6.2 Load Profile......................................................................................................... 134
6.2.1 Maximum Load ............................................................................................ 134
6.2.2 Thread Profile .............................................................................................. 141
6.3 Summary ............................................................................................................ 144
Chapter 7: Summary, Recommendations and Conclusion ................................................ 146
7.1 Summary ............................................................................................................ 146
7.2 Recommendations .............................................................................................. 147
7.3 Conclusion .......................................................................................................... 148
Bibliography ...................................................................................................................... 149
Appendix A ....................................................................................................................... 154
Abbreviations
AN Auditory Nerve
ANSP Auditory Nerve Spiking Probability
API Application Programming Interface
BM Basilar Membrane
BF Best Frequency
BPNL Bandpass-Nonlinear
CF Characteristic Frequency
CI Cochlear Implant
CPU Central Processing Unit
CV Condition Variable
dB Decibel
DP Double Precision
DRNL Dual Resonance Nonlinear
DSP Digital Signal Processor
ERB Equivalent Rectangular Bandwidth
ERBS Equivalent Rectangular Bandwidth Scale
GPOS General Purpose Operating System
GUI Graphical User Interface
HSR High Spontaneous Rate
Hz Hertz
IHC Inner Hair Cell
IIR Infinite Impulse Response
IO Input and Output
JUCE Jules’ Utility Class Extension
KHz Kilo-Hertz
LSR Low Spontaneous Rate
MAP Matlab Auditory Periphery
MBPNL Multiple-Bandpass-Nonlinear
MKL Math Kernel Library
MVS Microsoft Visual Studio
OME Outer and Middle Ear
OS Operating System
PZFC Pole-Zero Filter Cascade
RTAP Real-time Auditory Periphery
RTOS Real-time Operating System
SP Single Precision
SPL Sound Pressure Level
SPM State Partition Model
List of Figures
Figure 2.1: Cooke periphery auditory model .......................................................................... 4
Figure 2.3: AIM2006 model ................................................................................................... 6
Figure 2.4: Analogue electrical circuit model of the middle ear .............................................. 7
Figure 2.5: Transmission line filterbank model. ..................................................................... 8
Figure 2.6: Neurotransmitter flow in IHC ............................................................................... 8
Figure 2.7: Multiple-bandpass-nonlinear (MBPNL) filter ........................................................ 9
Figure 2.8: Time-domain (left) and iso-intensity frequency spectra (right) projection of click
response of the MBPNL filter at a BF site of 9 KHz ............................................................. 10
Figure 2.9: The model of Carney and colleagues ................................................................ 11
Figure 2.10: Meddis MAP model ......................................................................................... 12
Figure 2.11: MAP model structure ....................................................................................... 16
Figure 2.12: Outer ear frequency response with peak auditory sensitivity range from 1 KHz
to 4 KHz .............................................................................................................................. 17
Figure 2.13: Acoustic energy transmittance from outer ear to the basilar membrane (BM) in
the (uncoiled) cochlea via the three bones in the middle ear ............................................... 18
Figure 2.14: Outer and middle ear model structure in MAP. ................................................ 19
Figure 2.15: Travelling wave of the basilar membrane from its basal to the apical end ....... 20
Figure 2.16: Spatial response of the BM ............................................................................. 21
Figure 2.17: Single BF point in BM modelled by a dual-resonance nonlinear (DRNL) filter
implemented in MAP. .......................................................................................................... 22
Figure 2.18: Gammatone waveforms of (a) gamma distribution, (b) sinusoidal tone and (c)
the resulting waveform of the product of (a) and (b) ............................................................ 23
Figure 2.19: Gammatone filterbank frequency response with 10 filters ............................... 24
Figure 2.20: DRNL summed output response with (a) medium level input and (b) high level
input .................................................................................................................................... 26
Figure 2.21: IHC stereocilia motion effects on its electrical potential ................................... 27
Figure 2.22: Input-output relationship between auditory input stimulus and hair cell receptor
potential .............................................................................................................................. 28
Figure 2.23: Fluid-stereocilia coupling ................................................................................. 28
Figure 2.24: BM deflection causes changes in conductance in ion channels ....................... 29
Figure 2.25: Inner hair cells (IHC) membrane passive electrical circuit model. .................... 31
Figure 2.26: Neurotransmitter release from the IHC to the auditory nerve fibre ................... 32
Figure 2.27: Neurotransmitter discharge and retrieval flow ................................................. 34
Figure 3.1: Direct form type 2 IIR filter implemented by Matlab filter command. .................. 43
Figure 3.2: Induction of numerator and denominator coefficients at the initial phase of input
data sample streamed into IIR filter algorithm. .................................................................... 45
Figure 3.3: OME processing of multiple window frames. ..................................................... 49
Figure 3.4: Stapes displacement response in MAP and RTAP-numerical. .......................... 50
Figure 3.5: 3rd-order gammatone filter implementation in RTAP-numerical. ......................... 52
Figure 3.6: DRNL filter processing for 1 BF channel in multiple window frames. ................. 53
Figure 3.7: BM and IHC cilia displacement responses generated by MAP and RTAP-
numerical. ........................................................................................................................... 55
Figure 3.8: IHCRP algorithm processing for 1 BF channel in multiple window frames. ........ 56
Figure 3.9: IHC receptor potential response generated by MAP and RTAP-numerical. ....... 57
Figure 3.10: NRR algorithm processing for 1 AN channel in multiple window frames. ......... 58
Figure 3.11: LSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-
numerical. ........................................................................................................................... 59
Figure 3.12: HSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-
numerical. ........................................................................................................................... 60
Figure 3.13: ANSP processing for 1 AN channel in multiple window frames. ...................... 62
Figure 3.14: Probability of AN spiking in LSR fibre for MAP and RTAP-numerical. .............. 65
Figure 3.15: Probability of AN spiking in HSR fibre for MAP and RTAP-numerical. ............. 65
Figure 3.16: Normalised RMS errors between MAP and RTAP-numerical for a 500 Hz sine
tone input observed from a 250 Hz BF channel. .................................................................. 66
Figure 3.17: Normalised RMS errors between MAP and RTAP-numerical for a 1000 Hz sine
tone input observed from a 1039 Hz BF channel. ................................................................ 67
Figure 3.18: Normalised RMS errors between MAP and RTAP-numerical for a 3000 Hz sine
tone input observed from a 3109 Hz BF channel. ................................................................ 67
Figure 3.19: Normalised RMS errors between MAP and RTAP-numerical for a 5000 Hz sine
tone input observed from a 5377 Hz BF channel. ................................................................ 68
Figure 4.1: RTAP main user interface. ................................................................................ 73
Figure 4.2: RTAP user interface for setting parameters. ...................................................... 74
Figure 4.3: RTAP object oriented class layout. .................................................................... 76
Figure 4.4: Sequential execution in RTAP. .......................................................................... 80
Figure 4.5: Thread synchronisation pseudocode in RTAP. .................................................. 82
Figure 4.6: Thread utilisation structure in RTAP. ................................................................. 83
Figure 4.7: Thread synchronisation in RTAP. ...................................................................... 84
Figure 4.8: Intel thread checker analysis of RTAP usage of threads. .................................. 86
Figure 4.9: (a) MAP and (b) RTAP DRNL response for 30 BF channels.............................. 88
Figure 4.10: RTAP binary file format generated when the ‘Record’ or ‘Play+Record’ button is
clicked. ................................................................................................................................ 91
Figure 4.11: File write thread operation. .............................................................................. 93
Figure 4.12: Continuity between adjacent window frames for RTAP generated DRNL
response. ............................................................................................................................ 97
Figure 5.1: Line draw test. ................................................................................................. 100
Figure 5.2: ERBS representation of the first window frame of DRNL response in RTAP
based on 85 BF channels. ................................................................................................ 104
Figure 5.4: Spectrogram representation of the first window frame of the dual
resonance nonlinear (DRNL) filterbank response in RTAP for 180 BF channels. .............. 108
Figure 5.5: Spectrogram representation of the first window frame of the inner hair cell
receptor potential (IHCRP) response in RTAP for 123 BF channels. ................................. 109
Figure 5.6: Spectrogram representation of the first window frame of the neurotransmitter
release rate (NRR) response for AN LSR fibres in RTAP for 96 BF channels. .................. 110
Figure 5.7: Spectrogram representation of the first window frame of the neurotransmitter
release rate (NRR) response for AN HSR fibres in RTAP for 81 BF channels. .................. 110
Figure 5.8: Spectrogram representation of the first window frame of the auditory nerve
spiking probability (ANSP) response for LSR fibres in RTAP for 85 BF channels. ............. 111
Figure 5.9: Spectrogram representation of the first window frame of the auditory nerve
spiking probability (ANSP) response for HSR fibres in RTAP for 65 BF channels. ............ 111
Figure 5.10: Image buffer clipping and projection of the display window. .......................... 114
Figure 5.11: ANSP response in LSR fibres of real-time speech illustrated in RTAP. ........ 118
Figure 5.12: ANSP response in HSR fibres of real-time speech illustrated in RTAP. ......... 119
Figure 6.1: 64-bit floating point format divided into two halves for fast exponentiation. ...... 123
Figure 6.2: Dual resonance nonlinear (DRNL) response generated in RTAP based on
optimised exponential function for 192 BF channels. ........................................................ 126
Figure 6.3: Inner hair cell receptor potential (IHCRP) response generated in RTAP based on
optimised exponential function for 155 BF channels. ........................................................ 127
Figure 6.4: Unstable HSR ANSP response after refractory period upon the start. ............. 128
Figure 6.5: IHCRP response displayed in RTAP based on optimised exponential function for
155 BF channels. .............................................................................................................. 129
Figure 6.6: Neurotransmitter release rate (NRR) response for low spontaneous rate (LSR)
fibre displayed in RTAP based on optimised exponential function for 123 BF channels. ... 130
Figure 6.7: Neurotransmitter release rate (NRR) response for high spontaneous rate (HSR)
displayed in RTAP based on optimised exponential function for 104 BF channels. ........... 130
Figure 6.8: Auditory nerve spiking probability (ANSP) response for low spontaneous rate
(LSR) displayed in RTAP based on optimised exponential function for 107 BF channels. . 131
Figure 6.9: Auditory nerve spiking probability (ANSP) response for high spontaneous rate
(HSR) fibre displayed in RTAP based on optimised exponential function for 79 BF channels.
......................................................................................................................................... 131
Figure 6.10: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 500 Hz sine tone input observed from a 250 Hz BF channel. ............... 133
Figure 6.11: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 1000 Hz sine tone input observed from a 1039 Hz BF channel. ........... 133
Figure 6.12: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 3000 Hz sine tone input observed from a 3109 Hz BF channel. ........... 134
Figure 6.13: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 5000 Hz sine tone input observed from a 5377 Hz BF channel. ........... 134
Figure 6.14: Maximum load profile for non-optimised single precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 137
Figure 6.15: Maximum load profile for optimised single precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 138
Figure 6.16: Maximum load profile for non-optimised double precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 139
Figure 6.17: Maximum load profile for optimised double precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 140
Figure 6.18: Pixel render thread profile of RTAP on machines 1 and 2. ............................ 142
Figure 6.19: Record thread profile of RTAP on machines 1 and 2. .................................... 143
Figure 6.20: Onscreen signal display profile for maximum load in RTAP for machines 1 and
2........................................................................................................................................ 144
Figure A.1: MAP and RTAP inner hair cell receptor potential (IHCRP) response for 30 BF
channels. .......................................................................................................................... 157
Figure A.2: MAP and RTAP low spontaneous rate (LSR) fibre neurotransmitter release rate
(NRR) response for 30 BF channels. ................................................................................ 158
Figure A.3: MAP and RTAP high spontaneous rate (HSR) fibre neurotransmitter release rate
(NRR) response for 30 BF channels. ................................................................................ 159
Figure A.4: MAP and RTAP low spontaneous rate (LSR) fibre auditory nerve spiking
probability (ANSP) response for 30 BF channels. ............................................................. 160
Figure A.5: MAP and RTAP high spontaneous rate (HSR) fibre auditory nerve spiking
probability (ANSP) response for 30 BF channels. ............................................................. 161
Figure A.6: Continuity between adjacent window frames for RTAP generated inner hair cell
receptor potential (IHCRP) response. ............................................................................... 162
Figure A.7: Continuity between adjacent window frames for RTAP generated
neurotransmitter release rate (NRR) in low spontaneous rate (LSR) fibres. ...................... 162
Figure A.8: Continuity between adjacent window frames for RTAP generated
neurotransmitter release rate (NRR) in high spontaneous rate (HSR) fibres. .................... 163
Figure A.9: Continuity between adjacent window frames for RTAP generated auditory nerve
spiking probability (ANSP) in low spontaneous rate (LSR) fibres. ............................... 163
Figure A.10: Continuity between adjacent window frames for RTAP generated auditory nerve
spiking probability (ANSP) in high spontaneous rate (HSR) fibres..................................... 164
Figure A.11: ERBS representation of the first window frame of inner hair cell receptor
potential (IHCRP) response in RTAP based on 65 BF channels. ...................................... 165
Figure A.12: ERBS representation of the first window frame of neurotransmitter release rate
(NRR) for low spontaneous rate (LSR) fibre response in RTAP based on 45 BF channels.
......................................................................................................................................... 165
Figure A.13: ERBS representation of the first window frame of neurotransmitter release rate
(NRR) for high spontaneous rate (HSR) fibre response in RTAP based on 38 BF channels.
......................................................................................................................................... 166
Figure A.14: ERBS representation of the first window frame of the auditory nerve spiking
probability (ANSP) for low spontaneous rate (LSR) fibre response in RTAP based on 38 BF
channels. .......................................................................................................................... 166
Figure A.15: ERBS representation of the first window frame of the auditory nerve spiking
probability (ANSP) for high spontaneous rate (HSR) fibre response in RTAP based on 30 BF
channels. .......................................................................................................................... 167
List of Tables
Table 2.1: Review of AP model selection for real-time implementation. .............................. 15
Table 3.1: Memory allocation for IO parameters and algorithm coefficients. ........................ 39
Table 3.2: Algorithm functions in RTAP-numerical. ............................................................. 40
Table 3.3: Input settings of MAP and RTAP-numerical. ....................................................... 42
Table 4.1: Computing system platform used for RTAP development and testing. ............... 71
Table 4.2: RTAP settings for acquiring various responses. ................................................. 78
Table 4.3: Threading API comparison. ................................................................................ 81
Table 4.4: C/C++ file write profile. ....................................................................................... 90
Table 5.1: Spectrogram colour hue significance to the various stages of RTAP. ............... 108
Table 6.1: Non-optimised mathematical functions utilised in RTAP. .................................. 122
Table 6.2: Performance comparison of exponential function in MVS and MKL math libraries
and Schraudolph algorithm on machine 1. ........................................................................ 125
Code Listings
Listing 3.1: RTAP-numerical program structure. .................................................................. 41
Listing 3.2: IIR filter. ............................................................................................................ 46
Listing 3.3: Input and output parameters save and load feature in the IIR filter.................... 48
Listing 3.4: DRNL computation. ........................................................................................... 51
Listing 3.5: AN spiking and non-spiking probabilities for LSR and HSR AN fibre types. ....... 64
Listing 4.1: Sine tone generator. ......................................................................................... 79
Listing 4.2: Data writes to binary file in file writer thread. ..................................................... 95
Listing 4.3: RTAP offline processing. ................................................................................... 96
Listing 5.1: Acquisition of maximum and minimum values. ................................................ 106
Listing 5.2: Static spectrogram display. ............................................................................. 107
Listing 5.3: Subsampling processed data in all algorithm functions. ................................... 113
Listing 5.4: ERBS and spectrogram plot scrolling. ............................................................. 115
Listing 6.1: Fast exponential computation. ........................................................................ 124
Listing 6.2: Code for comparing MKL & Schraudolph exponential function. ....................... 125
Chapter 1: Introduction
1.1 Motivation
Recent findings in the anatomy and physiology of humans and animals have provided
much information about the auditory pathway (AP), which includes the outer, middle and
inner ears. To integrate these research findings, an AP computer model is a useful analytical
tool for probing the intricacies of AP function. Such modelling is an emerging area of
computational neuroscience that simulates the known physiological and psychophysical
attributes of the AP [1].
A real-time implementation of the AP computational model processes a stream of
auditory stimuli ‘on-the-fly’ and outputs the corresponding stream of processed data for
analysis. AP computer models that simulate empirical data possess nonlinear
subcomponents: the basilar membrane (BM) response, inner hair cell (IHC) receptor
potential and auditory nerve (AN) spiking all change disproportionately with the input
stimulus [1]. Nonlinear algorithms are computationally more expensive than linear ones, so
code optimisation and speed-enhancing techniques are required in a real-time
implementation adds a new dimension for studying sound perception. One derivative of
sound perception studies is speech processing analysis based on auditory nerve spiking
events [2] [3] [4]. A digitised auditory stimulus streamed from a live audio source via a
microphone can be analysed for common speech signatures continuously and illustrated
graphically.
Another use of a real-time AP computational model is for feasibility studies of
algorithm portability on to embedded systems incorporating FPGA, DSP or ARM based
processors. Algorithms capable of portraying the characteristics of a subcomponent of the
AP in real-time on a computer gives a strong indication that it can be processed on an
embedded system where performance is either equivalent or superior due to hardware
processing acceleration features of an embedded processor. Embedded system
implementation of the real-time AP model includes speech processor of cochlear implants [5]
and enhanced automatic speech recognition devices utilised with conventional signal
processing algorithms [6]. Another utilisation is in telecommunication engineering where a
real-time AP model algorithm simulating the cochlea within the inner ear is implemented in
mobile phone devices to perform noise cancellation in real-time [7].
1.2 Statement of the Problem
An AP model comprises algorithms that simulate the responses of the various
stages of the AP, and a mathematical scripting environment such as Matlab is one platform
used for such characterisation. Consequently, changes to system parameters that alter the
responses of the various stages are typically made directly in the code. For a large code
base this is problematic, particularly if the user is unfamiliar with the code or not proficient in
programming. Furthermore, analysing real-world audio requires a recording, which is then
fed to the AP model to generate the corresponding responses. If the recording is long, the
simulation will run for a considerable time before producing interpretable output. In other
words, there is a significant delay between the audio signal becoming available for analysis
and the acquisition of the AP responses.
This is especially inefficient if the AP model is connected to a hybrid network
integrating other models that simulate higher echelons of the auditory pathway. These
upstream models, which depend on the auditory nerve (AN) firing response of the AP model,
must wait until the entire input stimulus has been processed and the response data are
available before further processing can be carried out. As a result of this bottleneck, the
response of the entire network is also delayed. A more efficient solution is a real-time AP
model integrated with a graphical user interface (GUI) that allows system parameters to be
modified. Computations are performed on-the-fly and the responses are projected visually
for analysis. In a hybrid network, such a real-time AP model generates immediate responses
that allow the upstream models to operate with negligible delay.
1.3 Objective of the Research
The main objective of this research is to implement a real-time computational model
of the AP that includes the outer and middle ear, the BM, IHC and AN spiking characterised
by a working non-real-time model. Given an identical input auditory stimulus, the real-time
implementation should replicate the characteristics of the original AP model as closely as
possible.
The primary objectives can be further broken down into the following essential aims:
1. Review AP computer models and select one for real-time implementation.
2. Translate the available code for the selected AP model into a transitory C language
interface consisting only of the AP algorithms.
3. Test and verify that the C-implemented AP algorithms closely match the behaviour of
the original AP model.
4. Develop C++ wrapper classes around the C-implemented algorithms of the AP model
to suit windows based GUI and audio input streaming features.
5. Test and verify that the results of the C++-implemented model closely match the
original AP model.
6. Optimise the performance of the real-time model so that more channels can be
simulated.
7. Profile performance of the implementation and determine the limits of the model in
terms of number of channels and stages.
1.4 Thesis Outline
The following chapter reviews five AP computational models. The criteria for
selecting a model for real-time implementation are then presented: recency of research
findings, AP completeness (defined as modelling the outer, middle and inner ear up to the
point of AN spiking), accessibility of existing simulation code, and ease of running that code.
The latter part of chapter 2 covers the detailed characteristics of the AP model selected for
real-time implementation.
Chapter 3 describes the transitory C implementation of the AP model from the
basilar membrane (BM) to the auditory nerve (AN) spiking stage. The development and
results of the C implementation are illustrated and compared with the responses of the
original AP model.
Chapter 4 describes the real-time C++ graphical user interface (GUI) based
implementation of the C-platform translation of the AP model. Descriptions of wrapper class
structures, thread implementation and recording features are given.
Chapter 5 covers the graphical displays of the real-time C++ GUI program, including
static and scrolling plots on an Equivalent Rectangular Bandwidth (ERB) scale as well as
spectrogram graphs. Projections of real-time auditory nerve spiking probabilities are also
presented.
Chapter 6 describes the mathematical optimisations used to speed up computation
in the real-time AP model, and presents its load profiles. Chapter 7 concludes the thesis with
recommendations for future work.
Chapter 2: Literature Review
2.1 Auditory Pathway Models
Five auditory pathway (AP) time domain computing models are reviewed in this
section. One model is selected for real-time implementation, as covered in section 2.1.6.
2.1.1 Cooke Periphery Auditory Model
The method adopted by Cooke [2] uses human perceptual models that account
for psychophysical scaling. A filterbank of parallel filters at various frequency intervals
represents the physiological response of the auditory pathway. Figure 2.1 illustrates Cooke's
computer model. The outer and middle ear (OME) is modelled using a 1st-order difference
equation that boosts higher frequencies, although this is not shown in figure 2.1. The
components of the inner ear are modelled by independent parallel channels differentiated by
spatial displacement along the cochlea. The basilar membrane (BM) is modelled on the
BandPass-NonLinear (BPNL) model proposed by Pfeiffer [8], which is composed of two
bandpass filters with a static compressive nonlinear component between them.
Figure 2.1: Cooke periphery auditory model [2].
The first block in figure 2.1 represents an asymmetric 4th-order bandpass filter with
poles corresponding to a Butterworth design and zeros embedded strategically within the
Butterworth difference equation so as to achieve a sharp high-frequency cut-off. The
nonlinear compressive component is a square-root function, and the subsequent block is a
symmetric 4th-order Butterworth bandpass filter. The filters are implemented in recursive
form, which speeds up computation. Figure 2.2 shows the responses of the parallel BPNL
filters, stacked vertically with
respect to the alignment of their centre frequencies along the spatial extent of the basilar
membrane. All centre frequencies in this plot have been converted to the Bark scale, which
maps the responses at the various spatial points along the BM to linear spacing over the
critical bands of human hearing [9].
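For reference, a common approximation for the Hz-to-Bark conversion is Traunmüller's formula. The thesis does not state which formula was used for figure 2.2, so this is indicative of the axis conversion only.

```c
/* Frequency (Hz) to critical-band rate (Bark), Traunmueller's approximation.
 * Around 1 kHz this gives roughly 8.5 Bark. */
static double hz_to_bark(double f_hz) {
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53;
}
```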
Figure 2.2: BPNL filterbank response indicating travelling waveforms in a BM [2].
The output of the envelope detection after the BM module is used as the input to the
State Partition Model (SPM), which simulates the behaviour of the inner hair cells (IHC) as
well as auditory nerve (AN) spiking based on neurotransmitter release. AN spike generation
is described by three distinct states, governed by a sensitivity threshold for neurotransmitter
release:
• A stimulus level above the sensitivity threshold triggers the depletion of
neurotransmitter from the IHC and, in turn, its replacement.
• A stimulus level below the sensitivity threshold triggers no neurotransmitter
release; a cell that has recently been depleted replenishes its neurotransmitter.
• The IHC remains inactive, with neither neurotransmitter depletion nor
replenishment occurring.
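The three-state logic above can be sketched as follows. The variable names, threshold semantics and replenishment quantum are assumptions for illustration, not Cooke's actual parameters.

```c
/* Minimal sketch of the three-state SPM dynamics described above. */
typedef struct {
    double transmitter;   /* neurotransmitter available in the cell */
    double threshold;     /* sensitivity threshold */
} SpmCell;

typedef enum { SPM_RELEASE, SPM_REPLENISH, SPM_INACTIVE } SpmState;

static SpmState spm_step(SpmCell *cell, double stimulus, double full_level) {
    if (stimulus > cell->threshold && cell->transmitter > 0.0) {
        cell->transmitter -= 1.0;          /* state 1: release depletes the cell */
        return SPM_RELEASE;
    }
    if (cell->transmitter < full_level) {
        cell->transmitter += 0.5;          /* state 2: recently depleted, stock up */
        if (cell->transmitter > full_level) cell->transmitter = full_level;
        return SPM_REPLENISH;
    }
    return SPM_INACTIVE;                   /* state 3: no depletion, no replenishment */
}
```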
The Cooke model is able to process 61 discrete channels that simulate the
behaviour of the cochlea within the range of 100 Hz to 5000 Hz, a frequency range well
suited to speech analysis [2]. Furthermore, the neurotransmitter release from the SPM block
simulates auditory nerve spiking, which also enables plausible speech data analysis.
2.1.2 Auditory Image Model (AIM2006)
AIM2006 was written in Matlab and developed by Patterson [3]. It supports two
different algorithms for the auditory pathway model, categorised as functional and
physiological. AIM2006 simulates 75 channels covering the speech spectral range of
100 Hz to 6000 Hz. Figure 2.3 illustrates the AIM2006 model.
Figure 2.3: AIM2006 model [3].
In both the functional and physiological models, the middle ear filtering is based on
the model developed by Lutman [10] using analogue electronic impedances. Figure 2.4
illustrates the electrical circuit modelling the middle ear, with the impedance of each
subcomponent of the middle ear bounded by a dotted box. The voltage input to the circuit
represents the acoustic pressure at the eardrum, and the current in the ‘stapes+cochlea’
branch denotes the stapes vibration velocity.
Figure 2.4: Analogue electrical circuit model of the middle ear [10].
The BM in the functional model is designed using a linear gammatone filterbank with
parameters derived by Patterson [11]. Stapes velocity from the middle ear is converted to
displacement and fed into the gammatone filterbank, whose output is a multi-channel
representation of the BM displacement. The nonlinear cochlear compression is added in the
neural encoding stage. The physiological model of the BM is based on a nonlinear
transmission line filterbank with feedback parameters simulating the outer hair cell (OHC)
mechanical amplification of BM motion [12]. The transmission line filterbank is based on
electroacoustic properties and comprises analogue electronic components, as illustrated in
figure 2.5.
Figure 2.5: Transmission line filterbank model. Branches 1 and n denote the basal and apical sites of the BM respectively [12].
The subsequent phase of the AIM2006 model is the transduction of the BM
displacement, which generates the neural activity pattern (NAP). In the functional model, the
NAP is computed using a two-dimensional adaptive threshold filterbank. The BM
displacement is rectified and compressed before adaptation in the time domain and
suppression in the frequency domain, which aids the sharpening of vowel formants. The
physiological model generates the NAP using the Meddis IHC neurotransmitter flow model
[13], illustrated in figure 2.6. The Meddis model characterises the release and replenishment
of neurotransmitter in a single IHC based on the corresponding BM displacement level.
Every BM site simulated by the transmission line filters is cascaded with an IHC
neurotransmitter model that corresponds to a single afferent nerve fibre.
Figure 2.6: Neurotransmitter flow in IHC [13].
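The transmitter flow in figure 2.6 corresponds to the reservoir equations of the Meddis model, which can be integrated with a simple Euler step: q is the free transmitter pool, c the cleft content and w the reprocessing store. The rate constants below are representative values in the spirit of the published model and should be treated as illustrative; k is the stimulus-dependent release permeability.

```c
/* One Euler step of the Meddis transmitter-flow equations. */
typedef struct { double q, c, w; } MeddisState;

static void meddis_step(MeddisState *s, double k, double dt) {
    const double y = 5.05;      /* factory replenishment rate (assumed) */
    const double l = 2500.0;    /* loss rate from the cleft (assumed) */
    const double r = 6580.0;    /* reuptake rate from the cleft (assumed) */
    const double x = 66.3;      /* reprocessing rate (assumed) */
    double dq = y * (1.0 - s->q) + x * s->w - k * s->q; /* factory + reprocessed - released */
    double dc = k * s->q - l * s->c - r * s->c;         /* released - lost - reuptaken */
    double dw = r * s->c - x * s->w;                    /* reuptaken - reprocessed */
    s->q += dt * dq;
    s->c += dt * dc;
    s->w += dt * dw;
}
```

Given the fast cleft rates, a small step (e.g. dt = 1e-5 s) is needed for stable integration.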
The auditory image display is the final phase of the model and provides a visual
representation of sound, obtained by applying temporal integration to the NAP data. This
phase is not essential here, as its scope lies beyond auditory nerve spiking, and it will not be
deliberated upon.
2.1.3 Multiple-Bandpass-Nonlinear (MBPNL) Filterbank
The MBPNL filterbank, developed by Goldstein [14], is a quantitative tool that models
the nonlinear behaviour of the BM subcomponent of the AP. Other subcomponents of the
AP are not addressed. The model is nevertheless important because other AP models
follow a similar structure for BM displacement computation. Each MBPNL filter is
implemented in parallel for discrete spatial points along the BM, known as best frequency
(BF) points, to form a filterbank. The filter consists of dual signal-processing paths with a
recursive implementation. Figure 2.7 depicts the block diagram of the MBPNL filter: H1, H2
and H3 represent linear bandpass filters. The lower branch, H1 to H2, denotes the
nonlinear, narrowband and compressive tuning of the BM; H3 to H2 denotes the linear
branch. The compression is generally achieved using either a square or cube root function.
Figure 2.7: Multiple-bandpass-nonlinear (MBPNL) filter [14].
Goldstein [14] hypothesises that the compressive behaviour of the MBPNL filter is
due to OHC transduction in empirical measurements of BM tuning curves. The expander
subcomponent of the filter is postulated to be an expansive excitatory and suppressive
feedback response to stimulus tones below BF; this expansive feature cannot be directly
observed in measured BM tuning curves. Lin and Goldstein [15] implemented a version of
the MBPNL model in C on the Linux operating system (OS) to simulate healthy and
damaged cochleae. The sampling frequency was set to 20 kHz and a 4,096
point window was chosen to represent the signals. A single BF point of 9 kHz was selected
for analysis, with click stimuli at intensities varying from 26 dB to 86 dB in steps of 10 dB.
Figure 2.8 illustrates the time domain and iso-intensity plots of the click response.
Figure 2.8: Time-domain (left) and iso-intensity frequency spectra (right) projection of click response of the MBPNL filter at a BF site of 9 KHz [15].
2.1.4 The Model of Carney and Colleagues
The model of Carney and Zhang [4] is a tool for studying the nonlinearities of auditory
nerve (AN) response encoding based on simple and complex sounds. The input to the
model is auditory stimuli pressure in Pascal. The outer and middle ear effects are not
modelled. Instead, the stimulus is fed directly to two paths: signal and control paths. The first
cascade in the signal path is a nonlinear 3rd-order gammatone filter. The nonlinearity is
achieved by a dynamic tuning of the filter time constant that affects the gain and bandwidth
as well as the input stimulus levels. The nonlinearity introduces dc component in the output
of the filter which is biophysically inappropriate. As a result, a 1st-order linear gammatone
filter is introduced to eliminate the dc component.
Figure 2.9: The model of Carney and colleagues [4].
The control path comprises a time-varying 3rd-order wideband gammatone filter. The
bandwidth of this filter is larger than that of the signal path in order to accommodate
two-tone suppression: the reduction of the auditory nerve firing rate when a second tone is
added to the original tone. The larger bandwidth of the control path allows a second tone
outside the signal path bandwidth to reduce the signal path gain. The second block of the
control path is a nonlinear function that dynamically compresses the control signal. The next
nonlinear function, coupled with a low-pass filter, regulates the range and compression
dynamics, and a final nonlinear function fine-tunes the total strength of compression. The
resulting output parameter is a time constant that varies with the level of the input stimulus.
The output of the IHC is a receptor potential, computed in two stages: a nonlinear
logarithmic compressive function followed by a 7th-order low-pass filter. The advantage of
using a signal processing model at the IHC stage instead of a biophysical model is that it is
easy to implement and fast to compute. The synapse model simulates the release rate of
the neurotransmitter. First, the immediate neurotransmitter permeability is computed from a
logarithmic function of the IHC receptor potential and a BF threshold. The neurotransmitter
discharge rate is then computed as the product of the permeability and the quantity of
neurotransmitter available for discharge from the IHC. The auditory nerve (AN) spike rate is
finally computed as the product of the neurotransmitter rate and a Poisson process that
accounts for the history of spike discharges.
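A Poisson spike generator of the kind described above is commonly approximated per time step as a Bernoulli trial, with the probability scaled down shortly after a spike to mimic refractoriness. The names and the linear recovery law below are assumptions for illustration, not the Carney model's exact expression.

```c
/* Returns 1 if a spike occurs in this time step. The random draw
 * (uniform in [0,1)) is passed in so the function stays deterministic
 * and testable. */
static int an_spike(double rate_hz, double dt, double t_since_spike,
                    double t_refractory, double uniform01) {
    /* Linear recovery: probability ramps back up over the refractory window. */
    double recovery = (t_since_spike < t_refractory)
                    ? t_since_spike / t_refractory : 1.0;
    return uniform01 < rate_hz * dt * recovery;
}
```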
2.1.5 Matlab Auditory Periphery (MAP) Model
The MAP model [6] is an auditory pathway model that simulates the pathway from
the outer ear to the cochlear nucleus and brainstem. Figure 2.10 illustrates the Meddis MAP
model. The outer ear is modelled by parallel 1st-order band-pass filters that enhance the
amplitude of the speech spectral range. The output is sound pressure in Pascal, which is
fed to a 1st-order low-pass filter to yield the tympanic membrane (TM, or eardrum)
displacement in metres. The TM displacement is passed through a 1st-order high-pass filter
to generate stapes displacement.
Figure 2.10: Meddis MAP model [16].
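The outer and middle ear chain described above (band-pass, then low-pass to TM displacement, then high-pass to stapes displacement) can be sketched with 1st-order IIR sections. The coefficients and the particular difference-equation form below are placeholders, not the MAP parameter values.

```c
/* First-order IIR section: y[n] = b*(x[n] +/- x[n-1]) + a*y[n-1].
 * The sign on the previous input selects low-pass or high-pass behaviour. */
typedef struct { double a, b; double x1, y1; } FirstOrder;

static double lp_step(FirstOrder *f, double x) {   /* e.g. TM displacement */
    double y = f->b * (x + f->x1) + f->a * f->y1;
    f->x1 = x; f->y1 = y;
    return y;
}

static double hp_step(FirstOrder *f, double x) {   /* e.g. stapes displacement */
    double y = f->b * (x - f->x1) + f->a * f->y1;
    f->x1 = x; f->y1 = y;
    return y;
}
```

A sample would then flow through the chain as `hp_step(&stapes, lp_step(&tm, bandpassed_input))`.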
The BM is modelled using a filterbank of dual resonance nonlinear (DRNL) filters
[17], [18]. The DRNL filter originates from the MBPNL filter, although its pathways are
modelled differently. The stapes displacement is fed to two parallel paths: a linear branch
and a nonlinear branch. The linear path contains a broadly tuned gammatone filter. The
nonlinear path comprises a cascade of a narrowly tuned gammatone filter, a memoryless
compressive function and another narrowly tuned gammatone filter. The two paths are
summed at the end, resulting in BM displacement in metres. To achieve a level-dependent
response from the DRNL filter, the gains of the two filter paths are set relative to each other.
Level-dependent BF shifts are also accounted for by setting different centre frequencies for
the broadband and narrowband filters. Two gammatone filters are used in the nonlinear
path to accommodate combination tones, including two-tone suppression.
A high-pass filter converts the BM displacement to stereocilia displacement. The
MAP model uses a biophysical model of the inner hair cell (IHC) that translates stereocilia
displacement into an intracellular potential, which indirectly invokes neurotransmitter
release. Stereocilia displacement is first converted to a basolateral conductance within the
respective IHC. From the conductance, the IHC voltage is derived using a Hodgkin-Huxley
model that incorporates a Boltzmann function. The IHC receptor potential influences the
flow of calcium into the IHC, and the quantity of calcium ions packed at the synapse of the
IHC dictates the probability of neurotransmitter vesicle release [19]. The computation of AN
spiking is a stochastic process: the release probability is calculated from the product of the
calcium ion concentration and the quantity of neurotransmitter available for release. A more
computationally intensive spike event computation is also available in the MAP model,
required for the subsequent brainstem neuron computing stage [13], though this lies outside
the scope of the present objectives.
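The stochastic release just described can be sketched as a per-step Bernoulli draw whose probability is proportional to the product of calcium concentration and available transmitter. The cube on the calcium term and the rate constant z are assumptions in the spirit of such synapse models, not MAP's exact expression.

```c
/* Returns 1 if a vesicle release (spike-triggering) event occurs this step.
 * ca_conc: presynaptic calcium concentration; transmitter: available quantity;
 * z: rate constant (assumed); uniform01: random draw in [0,1). */
static int vesicle_release(double ca_conc, double transmitter,
                           double z, double dt, double uniform01) {
    double p = z * ca_conc * ca_conc * ca_conc * transmitter * dt;
    if (p > 1.0) p = 1.0;    /* clamp to a valid probability */
    return uniform01 < p;
}
```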
2.1.6 Model Selection
The MAP model is selected for real-time implementation. Firstly, the code is readily
accessible and ready-to-run Matlab script files demonstrate the responses of the
subcomponents of the MAP model. Furthermore, as MAP version 14 is the latest version at
the time of writing, technical support is available from Ray Meddis. This is essential because
the real-time implementation should closely match the operational characteristics of the
original model, and any unexpected behaviour in the original model can be resolved in close
consultation with the author.
Of all the models reviewed here, the MAP model is the most recently updated, with
development still ongoing. A recent alternative is the AIM model. Although a real-time
implementation titled AIM-C has recently been developed in C++ [20], most modules within
AIM-C follow the functional branch of the AIM model structure in figure 2.3, with the
exception of a pole-zero filter cascade (PZFC) [21]. The PZFC is a more realistic and
complex auditory filter than the default gammatone filter in the AIM model and is one of the
selectable BM motion algorithms. Because AIM-C implements the functional model rather
than the more biophysically accurate physiological model, it can process a filterbank of up
to 200 channels and display the auditory nerve response in real-time on a dual-core CPU
running above 2 GHz [22]. However, the real-time responses of AIM-C slow in proportion to
the number of channels beyond 200. Auditory nerve spiking events computed in AIM-C use
a simplified account of neurotransmitter release, as opposed to the more biophysically
sophisticated approach of the AIM2006 and MAP models; AIM-C was developed to
complement AIM2006. Consequently, the MAP model was chosen over AIM-C, as the
former offers more sophisticated and biophysically faithful algorithms. In comparison with
AIM2006, the MAP model provides an alternative platform for studying the sophisticated
biophysical aspects of the various stages of the auditory pathway across species.
The AN spiking in the MAP model depends on a biophysical simulation of the IHC.
Cooke uses a discrete three-state model to generate AN spikes, whereas the MBPNL model
only simulates up to the basilar membrane. The Carney model uses a purely
signal-processing design for the IHC to drive AN spiking. Although the Carney model is fast
and easy to implement with few parameters, it is unable to reflect low-frequency sounds
from the IHC onwards to AN spiking. Furthermore, the IHC subcomponent of the Carney
model does not account for the physiological parameters influencing AN spiking for brief
and intense sounds [16]. The biophysically based IHC subcomponent of the MAP model
addresses the problems of the Carney model and is thus able to simulate AN spiking for a
larger variety of auditory stimuli. AIM2006 is the only other model that simulates the IHC
response using a biophysical model; however, as a real-time version (AIM-C) had already
been developed, AIM2006 was not considered. Table 2.1 summarises the AP model
selection.
Cooke PAM
• Advantages: simple design; fast on modern computers.
• Disadvantages: no biophysical attributes; not updated regularly; oldest of the five models.

AIM2006
• Advantages: possesses biophysical traits; updates available; technical support available.
• Disadvantages: real-time version already available.

MBPNL
• Advantages: biophysical simulation of the BM.
• Disadvantages: does not simulate from the IHC stage onwards.

Carney
• Advantages: easily implementable; fast processing capabilities.
• Disadvantages: no biophysical attributes from the IHC onwards; not updated regularly.

MAP
• Advantages: possesses biophysical traits; most recently updated of the models compared as of this writing; technical support available.
• Disadvantages: complex algorithm.
Table 2.1: Review of AP model selection for real-time implementation.
2.2 The MAP Model of the Human Auditory Pathway
The MAP model is coded entirely in Matlab and is modelled with the perspective of
providing an accurate simulation of the auditory pathway (AP) that allows a user to modify
and change the parameter settings for a diverse range of human and animal auditory
analysis. The model characterises the AP from the outer and middle ear, where the auditory
stimulus enters the model, up to the auditory nerve, where the corresponding action
potential spikes are generated. The MAP model can be broken
down into five cascaded segments. Figure 2.11 displays the block diagram of the MAP
model from the outer ear to the auditory nerve.
Figure 2.11: MAP model structure [16]. Shapes coloured in red denote segments that are
omitted from the real-time software application.
2.2.1 Outer and Middle Ear
Sound propagated via the medium of air reaches the outer ear, the primary
interface of the ear with the outside world. The outer ear forms a passage along which sound waves
are channelled until they reach the tympanic membrane (TM),
also known as the eardrum. The mechanical vibration of the TM is measured as mechanical
pressure in Pascal. In MAP14, the auditory stimulus, i.e. the input to the model, is
specified in dB SPL. To acquire the reading in Pascal, the auditory stimulus is multiplied
by a scalar, which is calculated as follows:
s = 28e-6 × 10^(L/20)    (Eqn. 2.1)
where L is the stimulus level in dB SPL and s is the resulting peak pressure scalar in Pascal.
MAP14 has introduced several new features in the outer and middle ear calculation
with respect to earlier versions. One feature is the implementation of external ear resonance
computation. The ear canal, or outer ear, is most responsive between 1 kHz and 4 kHz of
the auditory frequency range, with a peak near 3 kHz, a range that coincides with human
speech [23]. The outer ear behaves like a set of parallel band-pass filters that amplify the
auditory spectral range between 1 kHz and 4 kHz. Hence, the MAP model extracts the
auditory stimulus content in the range of 1 kHz to 4 kHz using a 1st-order Butterworth band-
pass filter and applies a 10 dB gain before summing it with the original auditory stimulus.
[Figure 2.11 block labels: Stimuli → Outer & Middle Ear → Basilar Membrane → Inner Hair
Cell → Auditory Nerve → Cochlear Nucleus → Brainstem Level 2, with Acoustic Reflex and
Medial Olivo-Cochlear effects as feedback paths.]
This parallel resonance enhances the amplitude of the ‘speech’ range of the sensitivity plot
as indicated in figure 2.12 [23].
Figure 2.12: Outer ear frequency response with peak auditory sensitivity range from 1 kHz to 4 kHz [23].
Figure 2.12 is a generalised outer ear frequency spectrum that does not account for
the direction of incidence of the auditory stimulus. Typically, the eardrum vibrates as a result of
the directional arrival of an acoustical stimulus [24]. Moreover, the magnitude of eardrum
vibration differs for every person. Ideally, the outer ear is modelled with a directional filter.
MAP14 disregards the effects of directional auditory stimuli upon the outer ear as its main
goal is to deliver a computational tool to establish a general theory of hearing [1]. However,
an outer ear filter model whose parameters account for the direction of the auditory stimulus can be
added to the external ear resonance module of the MAP model in future developments.
Past the TM is the middle ear consisting of three tiny bones, namely malleus, incus
and stapes. The malleus is attached to the internal portion of the TM and is connected to the
incus, which in turn is connected to the stapes. The stapes bone connects to an oval window
in the cochlea. The role of the middle ear is to relay the acoustic energy from the TM to the
cochlea via the mechanical vibration of the three bones. For stimuli up to approximately 2
kHz, the acoustic energy is translated by a one-dimensional, piston-like motion of the stapes
[25]. Hence, the middle ear is considered an acoustic impedance transformer. The outer
ear has low impedance properties while the oval window of the cochlea has high impedance;
due to this mismatch, acoustic energy arriving without the middle ear's transformation would be largely reflected.
The middle ear, therefore, attenuates the reflection and allows the acoustic energy to be
channelled to the oval window [26]. Figure 2.13 illustrates acoustic energy transfer from the
outer, to the middle ear and subsequently to the inner ear.
Figure 2.13: Acoustic energy transmittance from outer ear to the basilar membrane (BM) in
the (uncoiled) cochlea via the three bones in the middle ear [27].
MAP14 simulates stapes movement in live human subjects as described by Huber
[28] as opposed to cadaver measurements for earlier versions of MAP. This is because
stapes measurements in cadavers and live human patients do not match [29]. Hence, the
simulation of the stapes motion in MAP14 is adjusted to match the more applicable live
human recordings. Moreover, earlier versions of MAP simulated the TM and stapes output
as velocities. In order to accommodate live human data in MAP14, displacements instead of
velocities are used.
MAP14 models the middle ear as a linear system. The input to the filter is the time-
varying sound wave pressure from the outer ear module and its output is a time-varying stapes
displacement. In order to obtain stapes displacement, the relationship between the pressure
and TM velocity is sought and is given as follows:
p = c·v    (Eqn. 2.2)
where p is the sound wave pressure in Pascal, v is the TM velocity, and c a constant. An
alternative expression for the TM velocity is given by
v = 2πf·x    (Eqn. 2.3)
where f is the frequency of the auditory stimulus and x is the TM displacement. Substituting
equation 2.3 into 2.2 results in
x = p / (2πf·c)    (Eqn. 2.4)
It is observed from equation 2.4 that if the auditory stimulus frequency increases, TM
displacement decreases and vice-versa. This behaviour is implemented in MAP14 through
the use of a 1st-order Butterworth low-pass filter with a cut-off frequency at 50 Hz. Modelling
the behaviour in this way matches human data above 2 kHz [28]: the output of the
low-pass filter is a good fit to TM and stapes displacement data for frequencies beyond
2 kHz. However, a roll-off is also needed at very low frequencies. Hence, a 1st-order
high-pass filter with a 1 kHz cut-off is cascaded with the output of the low-pass filter, and its
output yields the stapes displacement. Figure 2.14 depicts the outer and middle ear
model structure.
Figure 2.14: Outer and middle ear model structure in MAP. [Block labels: auditory stimuli →
1st-order band-pass Butterworth filter (f_low-cutoff = 1 kHz, f_high-cutoff = 4 kHz) → sound
pressure in Pascal → 1st-order low-pass Butterworth filter (f_cutoff = 50 Hz) → unfiltered
stapes displacement in metres → 1st-order high-pass Butterworth filter (f_cutoff = 1 kHz) →
filtered stapes displacement in metres, fed to the basilar membrane.]
2.2.2 Basilar Membrane
The basilar membrane (BM) resides in the fluid filled area within the cochlea and it
responds mechanically to the sound stimuli diffused from the stapes. In the presence of a
stimulus, its effect is propagated from the outer ear to the stapes in the middle ear and
subsequently to the oval window of the cochlea, which in turn vibrates the fluid within.
Fluid motion influences the wave displacement on the BM from the basal to the apical end.
Because the BM near the base of the cochlea is much narrower and more rigid than at the
apex, the travelling wave starts off at high speed and its amplitude and phase gradually
increase as the wave propagates towards the apical end. However, at a specific point along
the way, the travelling wave starts slowing down and its amplitude is rapidly reduced,
though its phase continues to increase [30]. Hence, due to its rigid characteristic near the
basal end, the BM is more responsive to higher frequencies at this end than to lower
frequencies. Figure 2.15 shows a representation of
the travelling wave in the BM transmitted from the oval window to the base and finally to the
apex.
Figure 2.15: Travelling wave of the basilar membrane from its basal to the apical end [31].
Conversely, at the apical end, the BM response to lower frequencies is much stronger
than to higher frequencies. The BM responds tonotopically to auditory stimuli: there are
numerous points along the BM where the response peaks [27]. In other words,
for a given stimulus of a fixed frequency, there is a point along the BM that generates a peak
response. This point translates to a specific frequency called the best frequency (BF).
The BF for stimuli at threshold level is known as the characteristic frequency (CF). The
response of the BM gradually decays at sites moving away in either direction from the BF
point. Hence, the BM can be modelled using a filterbank that comprises multiple
overlapping filters with various peak BF responses [6]. Figure 2.16 illustrates the BM
response at three discrete BF points.
Figure 2.16: Spatial response of the BM [27].
BM filters are nonlinear and asymmetric. The asymmetry arises because, as the auditory
stimulus frequency shifts, the BM magnitude response decays faster for frequencies above a
BF point than for frequencies below it. The nonlinearity is attributed to the level-dependent
gain: at low stimulus intensities the gain is higher than at high intensities, where the
gain is compressed. This, however, applies to basal points; for apical points below 1 kHz, the
compressive gain applies over a larger range of stimulus intensities. Stimulus intensity also
shifts BF points and alters the bandwidths of CF points, which is another attribute leading to
the nonlinearity [6]. Figure 2.20(b) illustrates the BF shift and bandwidth increase for a high
intensity stimulus.
In MAP14, the lowest and highest BF parameters are specified by the user and the
BF points in between these boundaries are spaced equally on a log scale. Each CF is
modelled by a filter called the dual-resonance nonlinear (DRNL) filter [17]. The input to the
DRNL filter is the stapes displacement from the outer and middle ear module. The DRNL
filter consists of two parallel paths, one linear and the other nonlinear, whose results
are summed at the end of the filter. Figure 2.17
shows the different sub-filters that form the DRNL filter.
Figure 2.17: Single BF point in BM modelled by a dual-resonance nonlinear (DRNL) filter implemented in MAP. The top parallel branch is linear while the bottom branch is nonlinear.
The common sub-filter that makes up the DRNL is the gammatone filter, which
primarily performs spectral analysis on the stapes displacement and outputs BM
displacement in the time domain [11]. It is characterised by the product of a gamma
distribution and a sinusoidal tone. The equation and its resultant waveform are given
as follows:
g(t) = A·t^(N−1)·e^(−bt)·cos(ω_r·t + φ)    (Eqn. 2.5)
where
A: amplitude of the sinusoidal tone;
N: filter order;
ω_r: ringing frequency in rad/s;
φ: initial phase in rad;
b: one-sided pole bandwidth in rad/s.
Figure 2.18: Gammatone waveforms of (a) gamma distribution, (b) sinusoidal tone and (c) the resulting waveform of the product of (a) and (b) [32].
The filter parameter b determines the duration of the impulse response and hence
the bandwidth of the gammatone filter. The parameter N denotes the filter order and sets the
slopes of the filter response skirts. A gammatone filter order in the range of 3 to 5 shapes
its magnitude response to closely match the mechanics of the human cochlea [11]. To simulate the
entire BM, a bank of gammatone filters would suffice. However, as the gammatone filter
response is linear, the BM response would be linear as well, thereby deviating from empirical
physiological data. Figure 2.19 illustrates the frequency response of a gammatone filterbank
with ten filters for linear BM characterisation.
Figure 2.19: Gammatone filterbank frequency response with 10 filters [33].
In MAP14, the nonlinear branch of the DRNL filter consists of three identical 1st-order
gammatone filters, a broken-stick compression function, followed by another three identical
1st-order gammatone filters. Its centre frequency corresponds directly with the BF points and its
bandwidth is given by the empirical formula:
BW_nonlin = p × BF + q    (Eqn. 2.6)
where p and q are constants set at 0.2895 and 250 respectively [16].
Compression is applied in the nonlinear branch of the DRNL filter if the input stimulus
is above a specific threshold. This compression threshold is specified in decibels and
converted to a threshold displacement by multiplying the decibel-converted compression
threshold parameter with a reference value of 10e-9 m. Scaling the compression threshold
with this reference value ensures that the BM displacement will be within the boundary of
normal hearing [16]. The compression threshold displacement is determined by
CtBM = 10e-9 × 10^(CtdB/20)    (Eqn. 2.7)
where CtdB is the compression threshold in dB. The nonlinear function applied to input
stimuli larger than the compression threshold is as follows:
h(t) = sign(x(t)) · CtBM · exp(c · log(a·|x(t)| / CtBM))    (Eqn. 2.8)
where c is the exponent, set at 0.2 for the best fit to the plot in figure 2.18, and
a is a scalar whose default value is 50,000 [16]. For input stimulus levels below the
compression threshold, the characteristic of the nonlinear path after the first series of
cascaded gammatone filters is linear, with the following formula:
h(t) = a · x(t)    (Eqn. 2.9)
The linear pathway of the DRNL filter comprises an adjustable linear gain with a
default value of 50 and a cascade of three identical gammatone filters. Its CF depends
on the BF points, similar to the nonlinear pathway, except that the BF shifts with
a rise in the stimulus level. It is characterised by the following empirical formula:
CF_lin = minCF_lin + coeffCF_lin × BF    (Eqn. 2.10)
where the constants minCF_lin and coeffCF_lin are set at 153.13 and 0.7341 respectively [16].
Similarly, the bandwidth of the linear path also depends on the BF points and is given by
BW_lin = minBW_lin + coeffBW_lin × BF    (Eqn. 2.11)
where minBW_lin and coeffBW_lin are constants set at 100 and 0.6351 respectively [16].
For medium intensity audio input, the nonlinear parallel path with the compressive
function dominates the summed DRNL filter output. For very low and very high intensity
input, the summed output of the parallel paths is linear. These effects can be observed in
figure 2.20(a) for an intermediate input level of 30 dB SPL and in figure 2.20(b) for a high
input level of 85 dB SPL. Furthermore, the peak of the summed DRNL response in figure
2.20(b) is wider than that of figure 2.20(a), and the BF position for the higher level input
has shifted to a lower frequency [17] [18]. It should be noted that the plots in figure 2.20
are based on earlier MAP versions; hence the unit of measurement is metres per second
instead of metres.
Figure 2.20: DRNL summed output response with (a) medium level input and (b) high level input [18].
2.2.3 Inner Hair Cell
The IHCs are mechanoreceptors for vibration embedded on top of the BM. They
convert the mechanical energy of the acoustic vibration transmitted through the BM into
electrical energy. A bundle of hair-like structures called stereocilia resides on top of each
IHC. The stereocilia have varying lengths that increase from one side of the hair cell to the
opposite end. Deflection of the stereocilia bundle towards its longest strand causes the IHC
to increase its electrical potential from its resting state, a process known as depolarisation
[27]. This causes the IHC to release glutamate-filled vesicles of neurotransmitter [34]. When
the stereocilia bundle is deflected towards the shortest cilia strand, the IHC potential
decreases. This electrical action is known as hyperpolarisation and, as a result, fewer
neurotransmitters are released from the IHC [27]. Figure 2.21 illustrates the direction of
movement of the hair cell along with the corresponding receptor potential of the IHC and the
action potential spikes of the respective auditory neuron.
Figure 2.21: IHC stereocilia motion effects on its electrical potential [27]. (A) Stereocilia deflection towards its longest strand causes depolarisation. (B) Stereocilia deflection away
from its longest strand causes hyperpolarisation.
To elaborate further on the receptor potential of the inner hair cell, figure 2.22 shows
an input-output function that relates the input stimulus peak pressure in Pascal to the inner
and outer hair cell receptor potentials. The depolarising changes of the hair cell potential,
at positive potentials, are larger than the hyperpolarising changes at negative potentials,
although depolarisation develops at a slower rate than hyperpolarisation. The depolarisation
and hyperpolarisation of the IHC induce either excitation or inhibition of action potentials in
the AN fibres. For a sine tone input stimulus, the spikes in each AN fibre occur at a specific
point of the sinusoidal cycle. These spikes occur consistently at the same phase, appearing
with a phase lag, and although a spike may not occur in every cycle, subsequent spikes fall
at integer multiples of the stimulus period. This phase locking between input stimulus and
AN spikes is present for low frequencies up to 2 kHz, diminishes for frequencies beyond
2 kHz, and is essentially absent above 5 kHz [35].
Figure 2.22: Input-output relationship between auditory input stimulus and hair cell receptor potential [36].
The output of the BM model is BM displacement, which is the input parameter to the
IHC model. In MAP14, the IHC model can be broken down into three phases [19]. The first
phase includes the calculation of the IHC stereocilia displacement. As illustrated in figure
2.23, the rigid swaying motion of the IHC stereocilia is initiated by the fluid in the scala media
as the BM is deflected [19]. This fluid-stereocilia coupling is characterised by equation 2.12.
Figure 2.23: Fluid-stereocilia coupling [37].
τ_c · du(t)/dt + u(t) = τ_c · C_cilia · dv(t)/dt    (Eqn. 2.12)
where
C_cilia is the IHC cilia gain factor;
τ_c is the time constant;
v(t) is the BM displacement in metres;
u(t) is the IHC stereocilia displacement in metres.
Equation 2.12 has the characteristics of a high-pass filter. At high frequencies, the IHC
cilia movement is in phase with the BM displacement, while at low frequencies the stereocilia
movement is in phase with the BM velocity. This fluid-stereocilia coupling relationship is also
independent of the position along the BM, which allows the equation to be used at
any BF location along the BM.
The stereocilia displacement causes the opening and closing of ion channels at its
tip, as depicted in figure 2.24. When the stereocilia bundle deflects towards its longest strand,
potassium channels at the tips open, allowing potassium ions to flow in. As potassium ions
are positively charged, the influx of potassium ions increases the
intracellular potential of the IHC. Similarly, when the stereocilia bundle is deflected towards its
shortest strand, the potassium channels close and, as the potassium ions within the cell
disperse through a channel in the basolateral membrane [16], the IHC intracellular potential
drops. Another contributor to the intracellular potential is the capacitive effect of the IHC [6].
Figure 2.24: BM deflection causes changes in conductance in ion channels [16].
The magnitude of the IHC stereocilia displacement from equation 2.12 determines its
apical conductance. The degree to which the potassium channels open is modelled using a
three-state Boltzmann function. Hence, the relationship between the stereocilia displacement
and the IHC apical conductance is mathematically defined as follows:
G(u) = G_cilia^max · [(1 + exp(−(u(t) − u0)/s0)) × (1 + exp(−(u(t) − u1)/s1))]⁻¹ + G_a    (Eqn. 2.13)
where
G_cilia^max is the transduction conductance with all the ion channels open, in Siemens;
u(t) is the IHC stereocilia displacement in metres;
s0 and s1 are sensitivity constants that define the precise nonlinearity profile;
u0 and u1 are IHC displacement constants;
G_a is the passive conductance in the apical membrane, which is given by
G_a = G_0 − G_cilia^max · [(1 + exp(u0/s0)) × (1 + exp(u1/s1))]⁻¹    (Eqn. 2.14)
where G_0 is the resting conductance.
An electrical circuit model of the IHC membrane is introduced in [37];
figure 2.25 illustrates the IHC passive circuit. Using the aforementioned IHC apical
conductance and applying Kirchhoff's current law, which states that the sum of all currents at
the node of V_m is zero, the intracellular potential of the IHC, V_m, can be derived:
C_m · dV_m(t)/dt + G(u)·(V_m(t) − E_t) + G_k·(V_m(t) − E_k') = 0    (Eqn. 2.15)
where
V_m(t) is the intracellular IHC potential;
G_k is the potassium conductance, set as a constant of 20 nS;
C_m is the capacitance of the cell, set as a constant of 4 pF;
E_t is the endocochlear potential, set as a constant of 0.1 V;
E_k' is the reversal potential of the basal current for potassium ions, given by
E_k' = E_k + E_t · R_p / (R_t + R_p)    (Eqn. 2.16)
where R_p is the epithelium resistance and R_t is the endocochlear resistance in Ohms.
Figure 2.25: Inner hair cells (IHC) membrane passive electrical circuit model.
2.2.4 Neurotransmitter Release
Apart from the IHC receptor potential, the rate of neurotransmitter release is also
determined by the availability of neurotransmitter in the presynaptic area within the IHC.
A synapse is defined as the medium that connects two cells and is categorised as either
electrical or chemical [27]. Between the IHC and the auditory nerve fibres there is a small
gap of extracellular space called the synaptic cleft. The link between these two cells is made
by glutamate-filled vesicles of neurotransmitter. Figure 2.26 displays the
neurotransmitter release from the IHC.
Figure 2.26: Neurotransmitter release from the IHC to the auditory nerve fibre [13].
The release of the neurotransmitters requires the presence of calcium ions within the
presynaptic region of the IHC. In the presence of an auditory stimulus, the IHC membrane
potential is depolarised, which triggers the calcium channels to open and allows calcium ions to
flow into the IHC. When a sufficient quantity of calcium ions is present, the neurotransmitters are
released from the IHC to the auditory nerve fibres via the synaptic cleft.
With respect to the flow of calcium ions, the neurotransmitter release from the IHC is
divided into three phases. The first phase involves the opening of the calcium ion channels,
which is dependent on the IHC intracellular potential, also known as the receptor potential. The
calcium current is essential to neurotransmitter release and is derived from the IHC
receptor potential, calculated as follows:
I_Ca(t) = G_Ca^max · m³_ICa(t) · (V_m(t) − E_Ca)    (Eqn. 2.17)
where
E_Ca is the Nernst equilibrium potential for calcium in Volts;
V_m(t) is the IHC receptor potential in Volts;
G_Ca^max is the calcium conductance around the synaptic region of the IHC with all channels
open, in Siemens;
m_ICa(t) represents the fraction of the calcium channels that are open.
When a significant IHC receptor potential is present, the calcium channels on the
IHC open after a time delay [38]. This lag in the response of the calcium channel opening
with respect to the IHC receptor potential is modelled using a 1st-order differential equation
(low-pass filter) as follows:
τ_Ca · dm_ICa(t)/dt + m_ICa(t) = m_ICa,∞    (Eqn. 2.18)
where
τ_Ca is a time constant;
m_ICa(t) represents the fraction of the calcium channels that are open;
m_ICa,∞ is the steady-state value of m_ICa(t) when the rate of change of the calcium channel
opening, dm_ICa(t)/dt, is zero. It is defined by a Boltzmann function of the IHC
receptor potential:
m_ICa,∞ = [1 + β_Ca⁻¹ · exp(−γ_Ca · V_m(t))]⁻¹    (Eqn. 2.19)
where β_Ca and γ_Ca are constants fitted to calcium currents from published observations.
The second phase involves the entry of the calcium ions through the calcium
channels and the brief accumulation of these ions in the synaptic region of the IHC. The
effect of calcium is brief due to its rapid removal from the synaptic site, either through
dissipation or active chemical buffering [16]. As there is a synaptic delay involved with
chemical synapses, the calcium concentration in the synapse is modelled as a 1st-order low-
pass filter of the calcium current [38]:
τ_[Ca] · d[Ca²⁺](t)/dt + [Ca²⁺](t) = I_Ca(t)    (Eqn. 2.20)
where τ_[Ca] is a time constant. The final phase is the evaluation of the neurotransmitter
vesicle release rate, which is a stochastic process based on the calcium ion concentration:
k(t) = max( z·([Ca²⁺]³(t) − [Ca²⁺]³_thr), 0 )    (Eqn. 2.21)
where
k(t) is the neurotransmitter release rate;
z is a scalar for converting calcium concentration levels into a release rate;
[Ca²⁺]_thr is the threshold calcium ion concentration that determines the probability of
neurotransmitter release.
Neurotransmitter vesicle release is dependent on its availability in the IHC immediate
store. If there are sufficient neurotransmitters in the immediate store, they are released to
the synaptic cleft. The released neurotransmitters are expected to dock at the postsynaptic
afferent fibre, where they cause the auditory nerve to fire, before returning to a reprocessing
store within the IHC to be repackaged into vesicles. However, some neurotransmitters in
synaptic cleft are lost. To ensure sufficient neurotransmitters are present in the immediate
store, the IHC manufactures new neurotransmitters in a neurotransmitter factory [13]. Figure
2.27 illustrates neurotransmitter flow between IHC and AN fibre.
Figure 2.27: Neurotransmitter discharge and retrieval flow [16].
In the MAP model, a single neurotransmitter release is sufficient to trigger a voltage
spike in the AN fibre [39]. When there are insufficient neurotransmitter vesicles in the
immediate store for release, synaptic adaptation occurs where the AN fibres are unable to
fire as a result [6]. There are three equations that characterise the quantal process of
neurotransmitter release and replenishment in the IHC and AN [13], [19] and [16]. The first of
three equations is as follows:
dq(t)/dt = x·w(t) + y·(M − q(t)) − k(t)·q(t)    (Eqn. 2.22)
where
q(t) is the time-varying quantity of neurotransmitter in the immediate store;
x is the neurotransmitter transfer rate from the reprocessing store to the immediate store;
w(t) is the quantity of neurotransmitter in the reprocessing store;
y·(M − q(t)) is the transfer rate of new neurotransmitter from the neurotransmitter factory;
M is the maximum number of neurotransmitters;
k(t) is the neurotransmitter release rate as derived from equation 2.21.
Equation 2.22 accounts for the neurotransmitters in the immediate store that are
ready to be released from the IHC to the synapse at a rate of k(t). The equation also
accounts for newly manufactured neurotransmitters as well as those that are returned from
the reprocessing store. The second of the three equations is defined as
dc(t)/dt = k(t)·q(t) − l·c(t) − r·c(t)    (Eqn. 2.23)
where
c(t) is the time-varying quantity of neurotransmitter in the synaptic cleft;
l is the rate at which neurotransmitter is lost from the synaptic cleft;
r is the reuptake rate at which neurotransmitter returns from the synaptic cleft to the
reprocessing store.
Equation 2.23 defines the quantity of neurotransmitter in the cleft, accounting for
neurotransmitter released from the IHC into the synapse as well as neurotransmitter that is
lost or returned to the IHC. The final equation represents the quantity of neurotransmitter in
the reprocessing store within the IHC, which factors in the recycled neurotransmitter from
the synaptic cleft as well as the neurotransmitter that departs for the immediate store:
dw(t)/dt = r·c(t) − x·w(t)    (Eqn. 2.24)
2.2.5 Auditory Nerve
The neurotransmitter release events are used to determine AN spike events. MAP14
models this in two ways, using either a probability model or a quantal model. The quantal model is
a computationally intensive process in which every spike is computed in detail. This is essential
for upstream auditory periphery analysis of the cochlear nucleus and brain stem activities.
However, in order to meet the objective of this project, which is to implement the auditory
pathway up to the auditory nerve spiking, the quantal AN spiking and the brain stem
activity computations are excluded from this thesis. AN spiking based on probability is
discussed in this section. The AN firing probability is proportional to the quantity of neurotransmitter
residing in the synaptic cleft [16] and is given by
p_t = c(t)·dt    (Eqn. 2.25)
where c(t) is the quantity of neurotransmitter in the cleft and dt is the sampling interval.
With an absolute refractory period of 0.75 ms, a new spike is only permissible
0.75 ms after the previous spike occurrence. Equation 2.26 defines the probability that a
spike has occurred within this absolute refractory period:
p_fired = 1 − ∏_{i = t−t_refractory}^{t−1} (1 − p_i)    (Eqn. 2.26)
where ∏_{i = t−t_refractory}^{t−1} (1 − p_i) is the product of the probabilities of the AN not firing in the
absolute refractory period. The probability of firing at the present time t is moderated in
proportion to the probability that no firing occurred during the refractory period
[16]:
p'_t = p_t × (1 − p_fired)    (Eqn. 2.27)
2.4 Summary
Five auditory pathway models have been presented in this chapter. The MAP model
has been selected for real-time implementation primarily because of its capability to simulate
stages in the auditory pathway that adhere more closely to physiological findings than the
other models discussed. The stages of the model to be implemented in real-time and
their nonlinear characteristics have been described in detail. These stages range from the
outer and middle ear to the auditory nerve spiking in the cochlea.
Chapter 3: C Representation of MAP
A real-time system is defined as a system that satisfies response time constraints.
Real-time systems can be divided into three categories: hard, firm and soft. A
hard real-time system must satisfy the response time constraints, as failure to meet these
constraints results in absolute system failure. A firm real-time system allows a few
deadline misses, but anything more will result in complete system
failure. A soft real-time system allows deadline misses, although accumulated
deadline misses lead to functional degradation of the system [40].
The Microsoft Windows operating system (OS) is not designed as a real-time operating
system [41] [42]. A real-time program requires predictable responses from dependent
peripherals such as input and output (I/O) channels, and Windows does not respond in a
rigidly predictable manner. Hence, hard real-time systems are not feasible on
Windows, and the success of firm real-time systems running on Windows depends on the
deadline requirements, which vary for every system. However, Windows is able to
accommodate soft real-time systems, as intermittent deadline misses are tolerable [42].
Therefore, a real-time implementation of the Matlab Auditory Periphery (MAP) on the
Windows OS qualifies as a soft real-time system.
A real-time version of the MAP model must process auditory input signals of varying
durations. Since the end time of the input signal is not known in advance, computing a
single window frame of discrete audio samples spanning the entire duration of the input
signal is impractical. While the computation time for small window frames is insignificant,
computing a single window frame spanning a long input signal results in a delayed
response from the auditory periphery (AP) computer model. This delay varies with the type
of computer platform used and the length of the window frame.
A more practical way of reducing the response delay of the AP model is to process the
discrete audio samples in smaller window frames. Each window frame of audio samples is
delivered to the algorithm for computation in a synchronous, regularly time-triggered event.
Upon the availability of a window frame of data, the computer starts processing it while the
subsequent window frame of audio samples is streamed and stored in a separate buffer
segment, to be released once the computation of the current window frame is completed.
Through multi-tasking of this nature, real-time responses of the AP computer model are
achievable.
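The buffering scheme described above can be sketched as a ping-pong (double) buffer in C. This is an illustrative sketch only, not the RTAP code: the names on_audio_frame and process_window_frame are hypothetical, and in the real system the processing would run concurrently with the driver filling the other buffer.

```c
#include <string.h>

#define FRAME_SIZE 1280   /* samples per window frame (the default used in this chapter) */

static float buffer[2][FRAME_SIZE];   /* ping-pong pair: one filled, one processed */

static void process_window_frame(const float *frame)
{
    /* Placeholder for the AP model stages (OME -> BM -> IHC -> NRR -> ANSP). */
    (void)frame;
}

/* Hypothetical driver callback, invoked once per window frame when
 * FRAME_SIZE new samples are available. */
static void on_audio_frame(const float *samples, int *fill_idx)
{
    memcpy(buffer[*fill_idx], samples, sizeof(float) * FRAME_SIZE);
    /* In the real-time system this processing overlaps with the driver
     * filling the other buffer; here the two steps are shown sequentially. */
    process_window_frame(buffer[*fill_idx]);
    *fill_idx = 1 - *fill_idx;   /* swap the roles of the two buffers */
}
```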
At the start of this project, it was hypothesised that the MAP algorithm had to be
ported to a low-level programming language such as C, as C incurs less computing
overhead and can therefore compute identical responses to the MAP model in a shorter
time. We called our version of the MAP model on a C platform that runs from the Microsoft
Windows command prompt RTAP-numerical. It is a transitional software program that was
developed to accommodate and to ensure the operational integrity of the MAP algorithms
after translation to the C programming language.
RTAP-numerical is required to process a large block of input data samples by
breaking it down into multiple window frames, similar to the MAP model, so as to facilitate
the real-time computation described in chapter 4. The MAP model processes a 10 ms
window frame, whereas RTAP-numerical handles a 58 ms window frame (1280 samples at
a sampling frequency of 22.05 kHz). Though MAP version 1-14 handles multiple window
frames, its memory management in the Matlab environment is an obscure process [43].
Hence, this chapter presents the results of an investigation of memory utilisation based on
multiple window frames implemented in RTAP-numerical. Another essential task covered in
this chapter is the porting of the Matlab command functions to RTAP-numerical.
3.1 Buffer Management and Program Structure
3.1.1 Buffer Structure
Memory allocation in RTAP-numerical is dependent on the specific stage of interest
in the AP model that the response is set for. In every stage, the common memory allocation
structure caters for input and output (IO) parameters as well as coefficients. Table 3.1
summarises the utilisation of buffers in RTAP-numerical. A window frame size refers to the
audio block size acquired from an audio library in a cyclical period and similarly, it also refers
to the audio data block size that is processed by algorithms in the AP model. In order to
synchronise the aforementioned incoming and outgoing data to and from the audio buffer,
feedback and control parameters are required as part of buffer critical section protection.
Should the outgoing data window frame size for computation vary, redundancy increases as
additional variables are required to coordinate data transfer to and from the audio buffer.
This redundancy is eradicated by allowing the size of the outgoing window frame to be
identical to the incoming window frame size. Therefore, a constant of 1280 samples is
selected as the default size of one window frame for both incoming and outgoing data
blocks. It is based fundamentally on the incoming data block size at the maximum sampling
frequency of 22.05 kHz supported by the audio library in the real-time model covered in
chapter 4.
The number of auditory nerve (AN) channels is the product of the number of best
frequency (BF) channels and the number of AN fibre types used. At the start of RTAP-
numerical, the three aforementioned parameters are set and the buffers are allocated and
subsequently, the algorithms are computed. During the runtime of RTAP-numerical, these
buffers are reused at the instance of computation in every window frame regardless of tasks
execution in the AP model.
Tasks                                          Buffer Size
Outer and middle ear (OME)                     1280
Basilar membrane (BM)                          Number of BF channels * 1280
Inner hair cell (IHC)                          Number of BF channels * 1280
Neurotransmitter release rate (NRR)            Number of AN channels * 1280
Auditory Nerve Spiking Probability (ANSP)      Number of AN channels * 1280
Table 3.1: Memory allocation for IO parameters and algorithm coefficients.
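The allocations in table 3.1 can be sketched as follows. This is an illustrative sketch, not the actual RTAP-numerical code: the structure and function names (APBuffers, ap_buffers_alloc) are hypothetical, and double is assumed as the sample type.

```c
#include <stdlib.h>

#define FRAME_SIZE 1280   /* default window frame size in samples */

/* Hypothetical container for the stage buffers of table 3.1. */
typedef struct {
    double *ome;    /* OME:  FRAME_SIZE                 */
    double *bm;     /* BM:   numBF * FRAME_SIZE         */
    double *ihc;    /* IHC:  numBF * FRAME_SIZE         */
    double *nrr;    /* NRR:  numAN * FRAME_SIZE         */
    double *ansp;   /* ANSP: numAN * FRAME_SIZE         */
} APBuffers;

/* Allocate the buffers once at start-up; they are reused every window frame. */
int ap_buffers_alloc(APBuffers *b, int numBF, int numFibreTypes)
{
    /* AN channels = BF channels x AN fibre types, as described in the text. */
    int numAN = numBF * numFibreTypes;
    b->ome  = malloc(sizeof(double) * FRAME_SIZE);
    b->bm   = malloc(sizeof(double) * (size_t)numBF * FRAME_SIZE);
    b->ihc  = malloc(sizeof(double) * (size_t)numBF * FRAME_SIZE);
    b->nrr  = malloc(sizeof(double) * (size_t)numAN * FRAME_SIZE);
    b->ansp = malloc(sizeof(double) * (size_t)numAN * FRAME_SIZE);
    return (b->ome && b->bm && b->ihc && b->nrr && b->ansp) ? 0 : -1;
}
```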
3.1.2 Algorithm Structure
The structure of the functions in RTAP-numerical is designed to make as few function
calls as possible from the BM to ANSP stages. As an illustration, let each task in table 3.1
represent a function call in C, and assume that these function calls are made from a main
function within RTAP-numerical. Upon the availability of one window frame of sampled
audio data, the OME function is called once. The number of invocations of the BM
algorithm function, however, depends on the number of BF channels. With the IHC
algorithm involved, the number of function calls is twice the number of BF channels, which
includes the BM algorithm and excludes the OME invocation. The NRR and ANSP
algorithms introduce a maximum of four and six times the number of BF channel
invocations respectively, based on a maximum of two AN fibre types. A large number of
BF channels would therefore add to the processing time in terms of the number of function
calls should this setup be used. In order to alleviate this inefficiency, the functions are
instead implemented categorically, based on a specific response of interest. Table 3.2
summarises the implementation of functions in RTAP-numerical.
Tasks      DRNL            DRNL-to-IHCRP    DRNL-to-NRR               DRNL-to-ANSP
BM         ✓               ✓                ✓                         ✓
IHC                        ✓                ✓                         ✓
NRR                                         ✓                         ✓
ANSP                                                                  ✓
Function   BM              IHC receptor     Neurotransmitter          AN spiking
Output     displacement    potential        vesicle release rate      probability
Table 3.2: Algorithm functions in RTAP-numerical.
Each of the four functions in table 3.2, when invoked, computes the respective
algorithms assigned to it, denoted by a tick symbol. Each function needs to be invoked
only once per window frame and computes its responses for all BF channels. It outputs
only one type of response, as indicated by the 'Function Output' row of table 3.2. By
restricting data availability to one buffer type per function, code complexity in terms of
shared memory protection is minimised. This function structure also adds a dimension of
algorithm selectivity, which is advantageous in a graphical user interface (GUI) based
real-time implementation, as covered in chapter 4.
3.1.3 Program Structure
The pseudocode in listing 3.1 defines the general flow of RTAP-numerical. The
upcoming sections describe the capability of RTAP-numerical to manage blocks of
segmented windows from the OME to the ANSP stage. The flowcharts used in describing
these algorithms accommodate only one BF channel. This is done to ensure that the
responses of these stages comprehensively match the responses of the MAP model. For
multiple BF channels, these flowcharts can be implemented and diagrammatically projected
in parallel, though they are not presented in this thesis. Multiple BF channel algorithms,
which are implemented with for loops in C and C++, will have their responses
demonstrated in the next chapter.
// Generate the 500 Hz sine tone sample-by-sample
for (t = StartSampleData; t < (WindowFrameSizePerBF + StartSampleData); t++)
{
    x[t] = sin(2 * PI * ToneFreq[i] * t / fs) * PeakAmplitude;
}
// Offset to the next segment within the buffer
StartSampleData += WindowFrameSizePerBF;

// OME functions
ExternalEarResonances ( pEar );
TMdisplacement ( pEar );
StapesInertia ( pEar );

// Use pre-processor directives to select an algorithm function to run
#if MEASURE_DRNL_HIRESTIMING
    DRNL ( pEar );
#elif MEASURE_DRNL2IHCRP_HIRESTIMING
    DRNL_to_IHC_RP ( pEar );
#elif MEASURE_DRNL2IHCPRESYNAPSE_HIRESTIMING
    DRNL_to_NRR ( pEar );
#elif MEASURE_DRNL2AN_HIRESTIMING
    DRNL_to_ANSP ( pEar );
#endif
Listing 3.1: RTAP-numerical program structure.
3.2 Parameters Setup
The setup of MAP and RTAP-numerical for data acquisition is presented in table 3.3
while the settings of parameters used for the algorithms are projected in table A.1. For the
purpose of comparing responses between MAP and RTAP-numerical, a 500 Hz pure sine
tone input signal is used. One contrast in the settings of MAP and RTAP-numerical is the
number of window frames hosting the input stimulus. In MAP, the sine tone input is streamed
from a single window whereas RTAP-numerical relies on five equal sized segmented
windows. The five windows in RTAP-numerical are streamed into the algorithm functions
one after the other and serve as an indication of sampled audio data availability of unfixed
time duration. All results obtained from MAP and RTAP-numerical are stored in a numerical
format in a text file. Graphical illustrations of these results are done offline in a spread sheet
and are included within this chapter in the different sections below.
Settings                      MAP                RTAP-numerical
Stimulus frequency            500 Hz sine tone   500 Hz sine tone
Stimulus level                50 dB SPL          50 dB SPL
Number of window frames       1                  5
Size of window frame          220                44
Sampling rate                 22050 Hz           22050 Hz
Duration of signal acquired   10 ms              10 ms
Response signal BF            250 Hz             250 Hz
Number of BF channels         1                  1
Number of AN channels         2                  2
Table 3.3: Input settings of MAP and RTAP-numerical.
3.3 IIR Filter
3.3.1 Background
MAP uses a built-in filter function command in the Matlab environment. This
command function is implemented as an infinite impulse response (IIR) filter and is utilised in
the computations of external ear response, tympanic membrane and stapes displacements
of the OME stage as well as gammatone filter and IHC cilia displacement that are part of BM
displacement and IHCRP responses respectively. In the Matlab environment, the filter
command function is implemented in the form of a direct form 2 transposed filter type as
illustrated in figure 3.1 [44]. Equation 3.1 describes the mathematical characteristic of the
filter command function.
y[m] = ( b[0]∗x[m] + b[1]∗x[m-1] + b[2]∗x[m-2] + ... + b[nb]∗x[m-nb]
         - a[1]∗y[m-1] - a[2]∗y[m-2] - ... - a[na]∗y[m-na] ) / a[0] (Eqn. 3.1)
where nb is the feedforward (numerator) filter order;
na is the feedback (denominator) filter order.
The filter possesses several delay nodes, denoted by z-1, that can be treated as
boundaries when the various stages of the filter are analysed. Input data, x[m], is streamed
into all the coefficient nodes denoted by b[0], b[1], b[2] up to b[nb]. Assuming that all delay
registers are initialised to zero, the initial output, y[0], depends only on the first data
sample, x[0], scaled by b[0]. At the arrival of the second input data sample, x[1], the b[1]
and -a[1] coefficients along with the b[0] coefficient are required to generate y[1]. Hence,
the output y[1] is the result of the delayed manipulation of x[0] at the b[1] branch and y[0]
at the -a[1] branch, added to x[1] scaled by b[0]. In this manner, the mth output sample,
y[m], depends on the inputs x[m] to x[m-nb] and the outputs y[m-1] to y[m-na], scaled by
the coefficients b[0] to b[nb] and -a[1] to -a[na] respectively.
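The difference equation above can be written compactly in C. This is a minimal reference sketch of equation 3.1, not the RTAP-numerical implementation (which is developed in listing 3.2); the function name and signature are illustrative only.

```c
/* Minimal direct-form IIR filter implementing equation 3.1.
 * b[0..nb] are the feedforward coefficients, a[0..na] the feedback
 * coefficients, x the input and y the output, each of length len. */
void iir_filter(const double *b, int nb,
                const double *a, int na,
                const double *x, double *y, int len)
{
    for (int m = 0; m < len; m++) {
        double acc = 0.0;
        for (int j = 0; j <= nb && j <= m; j++)   /* sum of b[j] * x[m-j]   */
            acc += b[j] * x[m - j];
        for (int j = 1; j <= na && j <= m; j++)   /* minus a[j] * y[m-j]    */
            acc -= a[j] * y[m - j];
        y[m] = acc / a[0];                        /* scale by a[0]          */
    }
}
```

With an impulse input and a single feedback coefficient a[1] = -0.5, the output decays geometrically (1, 0.5, 0.25, ...), matching the behaviour described in the text.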
Figure 3.1: Direct form type 2 IIR filter implemented by Matlab filter command.
3.3.2 Implementation
The characteristic of the IIR filter is defined by a difference equation, and the
coefficients b and a form the numerator and denominator of the equation. As such, the
computation can be segregated into two phases based on the numerator and denominator
orders, and further broken down into four parts. The first part covers the initial phase,
where the number of input data samples, x, streamed into the filter is smaller than the
larger of the numerator and denominator orders. The second part covers the case where
the number of samples streamed into the filter is equal to or greater than the larger of the
two orders. The third and fourth parts have exactly the same properties as the first two
parts, with the exception that the input data, x, is replaced by the output data, y. The
breakdown of the IIR filter computation into four parts offers a prudent manner of
debugging the code to attain a response matching that of the filter command function in
the Matlab environment.
Figure 3.2 elaborates further on the expected IIR filter implementation in
RTAP-numerical based on the four parts described in the previous paragraph. In part (a) of
figure 3.2, the initial output value, y[0], is computed with only the b[0] and a[0] coefficients.
The subsequent output, y[1], in figure 3.2 (b) is calculated with the coefficients b[1] and
a[1] included in the computation along with b[0] and a[0], after a delay following the
initial computation of y[0]. Similarly, in figure 3.2 (c) the coefficients b[2] and a[2] are
included along with b[0], a[0], b[1] and a[1] to attain y[2]. When all the coefficients are
used in the computation of the output, as in figure 3.2 (d), the filter is in the second phase
of computation. The pseudocode depicting the general IIR filter characteristic is shown in
listing 3.2.
Figure 3.2: Induction of numerator and denominator coefficients at the initial phase of input
data samples streamed into the IIR filter algorithm (panels (a) to (d) show the computations
at m = 0, 1, 2 and the general mth sample respectively).
// (I) Initial input data stream computation for first time frame block
// 1) Calculate output, y[0]. The for loop below will take care of y[1] onwards
y[0] = (b[0] * x[0] / a[0]) + zi[0];
for (i = 1; i < FirstLoopCap; i++)
{
    y[i] = 0.0; // initialise y[i] to 0 first
    // 2) Numerator order increases for every iteration of the for loop above
    if (i < NumeratorOrder) // use the increment, i, to set the for loop boundary based on b coefficients
        temp = i + 1;
    else // if i has increased to length(b) or more, cap the upcoming for loop
         // boundary to the numerator order
        temp = NumeratorOrder;
    // 3) Output, y[i], computation based only on the numerator coefficients, b
    for (j = 0; j < temp; j++)
        y[i] = y[i] + (b[j] * x[i-j]);
    // 4) Denominator order increases for every iteration of the for loop above
    if (i < DenominatorOrder)
        temp = i;
    else
        temp = DenominatorOrder - 1;
    // 5) Follow up on the y[i] computation based only on the denominator coefficients, a
    for (j = 0; j < temp; j++)
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
    // 6) Account for the final denominator coefficient, a[0], before final output
    y[i] = y[i] / a[0];
}
// (II) End of initial part; the second phase handles the rest of the input stream
for (i = FirstLoopCap; i < WindowFrameSizePerBF; i++)
{
    y[i] = 0.0f;
    // 7) Compute y[i] with respect to all the numerator coefficients
    for (j = 0; j < NumeratorOrder; j++)
        y[i] = y[i] + (b[j] * x[i-j]);
    // 8) Compute y[i] with respect to all the denominator coefficients
    for (j = 0; j < (DenominatorOrder-1); j++)
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
    // 9) Ensure y[i] is scaled by a[0] at the end of the computation
    y[i] = y[i] / a[0];
}
Listing 3.2: IIR filter.
3.4 Outer and Middle Ear
The IIR filter discussed in the previous section is able to process a window frame of
data for one BF channel. In real-time processing, multiple window frames of data must be
streamed into the IIR filter sequentially. As the computation of a typical output response
with an IIR filter requires past inputs and outputs, RTAP-numerical must be able to handle
the transition between subsequent window frames. This means that RTAP-numerical must
store the most recent input and output data just before the end of a window frame and
load these parameters when computing the output response of the subsequent window
frame. In this way, though the signals are broken down and segmented, the processed
data from multiple window frames, when integrated, generate a continuous output signal.
The setup of the IIR filter in section 3.3 is insufficient to compute the transition
between window frames of data. Hence, the IIR filter is retrofitted with additional code to
save the input and output parameters at the conclusion of a window frame and load these
parameters at the start of the subsequent frame. The pseudocode for the load and save
features within the IIR filter, along with explanatory comments, is shown in listing 3.3. The
external ear resonance (EER), tympanic membrane (TM) and stapes displacement
computations in the outer and middle ear (OME) stage, as well as the inner hair cell (IHC)
cilia displacement computation, utilise the parameter save and load feature in listing 3.3 to
achieve continuity in responses between adjacent window frames.
// (I) Parameter save feature in IIR filter within preceding time frame.
// 1) Save (NumeratorOrder - 1) input parameters, x, towards the end of the time frame.
for (i = 0; i < (NumeratorOrder-1); i++)
{
    // 1a) Save in a global array, prevX[], for use later.
    prevX[i] = x[WindowFrameSizePerBF - 1 - i];
}
// 2) Save (DenominatorOrder - 1) output parameters, y, towards the end of the time frame.
for (i = 0; i < (DenominatorOrder-1); i++)
{
    // 2a) Save in a global array, prevY[], for use later.
    prevY[i] = y[WindowFrameSizePerBF - 1 - i];
}
// -----------------------------------------------------------------------------------
// (II) Parameter load feature in IIR filter within subsequent time frame.
// 3) This segment is similar to part (I) in listing 3.2 except that an additional if
//    condition is added to switch between these 2 segments.
for (i = 0; i < FirstLoopCap; i++)
{
    y[i] = 0.0f;
    // 4) Compute y[i] with numerator coefficients and past input parameters from the
    //    current time frame.
    for (j = 0; j < NumeratorCap1; j++)
    {
        y[i] = y[i] + (b[j] * x[i-j]);
        CarryForward = j;
    }
    // 5) Increase the cap for the number of coefficients and past input parameters
    if (NumeratorCap1 < NumeratorOrder)
        NumeratorCap1++;
    // 6) Compute y[i] by accounting for past input parameters from the preceding time frame.
    for (j = 0; j < NumeratorCap2; j++)
    {
        // Part (6a)
        y[i] = y[i] + (b[CarryForward+j+1] * prevX[j]);
        // Reduce the dependency on the past input parameters from the preceding time frame.
        NumeratorCap2--;
    }
    // 7) Compute y[i] with denominator coefficients and past output parameters from the
    //    current time frame.
    for (j = 0; j < DenominatorCap1; j++)
    {
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
        CarryForward = j + 1;
    }
    // 8) Increase the cap for the number of coefficients and past output parameters
    if (DenominatorCap1 < DenominatorOrder-1)
        DenominatorCap1++;
    // 9) Compute y[i] by accounting for past output parameters from the preceding time frame.
    for (j = 0; j < DenominatorCap2; j++)
    {
        // Part (9a)
        y[i] = y[i] - (a[j+CarryForward+1] * prevY[j]);
        // Reduce the dependency on the past output parameters from the preceding time frame.
        DenominatorCap2--;
    }
}
Listing 3.3: Input and output parameters save and load feature in the IIR filter.
The OME stage consists of a serial cascade of three 1st-order filters. In
RTAP-numerical, each of these three filters invokes the IIR filter function, DFT2filter,
described in section 3.3, along with the window continuity features from listing 3.3, to
process a stream of auditory stimulus. The response of the third filter is then scaled to
produce the stapes displacement in metres. This scaling is performed in a for loop. Within
the same loop, the dual resonance nonlinear (DRNL) algorithm buffers for the linear and
nonlinear pathways are assigned an initial value in preparation for invoking the DRNL
function for the basilar membrane (BM) displacement computations. The initialisation of
parameters in the DRNL buffers is done within the OME function so as to reduce the
processing time incurred in the DRNL function. Since a loop is required to initialise the
buffers as well as compute the stapes displacement over the audio data block of a window
frame, it is computationally beneficial to place these tasks in one loop instead of two
separate loops in OME and DRNL respectively. Figure 3.3 illustrates the OME structure
for processing sampled audio data from three window frames.
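The serial cascade above can be sketched as three 1st-order sections applied back to back, each carrying its own prevX/prevY state across window frames. This is an illustrative sketch only: the structure Biquad1 and the functions run_stage and ome_cascade are hypothetical stand-ins, the coefficients are placeholders, and a small frame size is used for clarity.

```c
#define FRAME 8   /* small frame for illustration; RTAP-numerical uses 1280 */

/* Hypothetical 1st-order IIR section with carried state (prevX, prevY),
 * standing in for the BPF/LPF/HPF stages of the OME cascade. */
typedef struct { double b0, b1, a1; double prevX, prevY; } Biquad1;

static void run_stage(Biquad1 *f, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++) {
        y[i] = f->b0 * x[i] + f->b1 * f->prevX - f->a1 * f->prevY;
        f->prevX = x[i];   /* state carried into the next call (next frame) */
        f->prevY = y[i];
    }
}

/* Cascade of three 1st-order stages, as in the OME, followed by a final
 * scaling of the third response to stapes displacement in metres. */
static void ome_cascade(Biquad1 s[3], const double *in, double *out,
                        double *tmp, double scalar, int n)
{
    run_stage(&s[0], in, tmp, n);     /* external ear resonance  */
    run_stage(&s[1], tmp, out, n);    /* tympanic membrane       */
    run_stage(&s[2], out, tmp, n);    /* stapes inertia          */
    for (int i = 0; i < n; i++)
        out[i] = tmp[i] * scalar;     /* stapes displacement (m) */
}
```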
Figure 3.3: OME processing of multiple window frames.
The blue rings on the top and bottom ends of the filter blocks in window frames 2 and
3 represent the reliance on past inputs and outputs from preceding window frames. Hence,
the computation of the initial outputs in window frame 2 relies on the inputs and outputs of
window frame 1, and window frame 3 subsequently relies on the saved parameters of
window frame 2. Though the illustration ends at window frame 3, in the actual real-time
implementation the computation continues in the same way, storing the past input and
output parameters of the preceding window frame and loading them into the subsequent
window frame for the output response computation. Figure 3.4 displays the stapes
displacement generated in MAP as well as in RTAP-numerical with the auditory stimulus
settings given in table 3.3. The normalised root mean squared (RMS) error between the
MAP and RTAP-numerical generated stapes displacements stands at 4.8e-7, as computed
from equation 3.9. Section 3.9
provides a detailed description of the computation of the normalised RMS error for all stages
of the AP model.
Figure 3.4: Stapes displacement response in MAP and RTAP-numerical.
3.5 Basilar Membrane
The best frequency (BF) sites in MAP are calculated using the logspace command
function in the Matlab environment. Along with the minimum BF, BFmin, and the maximum
BF, BFmax, the number of BF channels must be specified when invoking logspace in
Matlab. In order to compute the logarithmic spacing of all of the BF components within the
range of BFmin and BFmax, it is essential to translate the logspace command function from
MAP to RTAP-numerical. The calculation of the BF components by the logspace command
function is given by equation 3.2 [45].
BFi = 10^( log10(BFmin) + (i-1) ∗ step ) (Eqn. 3.2)
where BFi is the logarithmically spaced BF component in the range of BFmin to BFmax;
i is the index of frequencies ranging from 1 to the number of BF channels;
step is a logarithmic increment from BFmin to BFmax defined by equation 3.3.
step = ( log10(BFmax) - log10(BFmin) ) / (n - 1) (Eqn. 3.3)
where n is the number of BF channels required in the computation. The BF components
are computed using the logspace function before the DRNL algorithm is processed. Upon
the invocation of the DRNL function in RTAP-numerical, the DRNL response computation
is done for every BF channel within a loop that spans the number of BF channels. The
pseudocode listing for the DRNL computation is as follows:
// 1) Logspace computation for BF components at the start, before DRNL() is called
RTAPlogspace ( BFlist, MinBF, MaxBF, NumBFchannels );
...
// 2) Compute DRNL response for the parallel number of BF channels
for (i = 0; i < NumBFchannels; i++)
{
    // 3) Compute gammatone response for the linear pathway
    Gammatone ( &Linear_Gammatone[i], Linear_Gammatone_Order );
    // 4) Compute the first pass of the gammatone filter in the nonlinear pathway
    Gammatone ( &Nonlinear_Gammatone[i], Nonlinear_Gammatone_Order );
    // 5) Compute the nonlinear compression based on the stimulus input level (dB SPL)
    DRNL_brokenstick_nl ( DRNL_nonlin_Input[i], DRNL_nonlin_Input[i] );
    // 6) Compute the second pass of the gammatone filter in the nonlinear pathway
    Gammatone ( &Nonlinear_Gammatone[i], Nonlinear_Gammatone_Order );
    // 7) Compute the DRNL response by summing the linear and nonlinear pathway responses
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        DRNL_response[i][j] = DRNL_linear_Output[i][j] + DRNL_nonlinear_Output[i][j];
    }
}
Listing 3.4: DRNL computation.
The linear path of the dual resonance nonlinear (DRNL) filter consists of one
3rd-order gammatone filter, while the nonlinear path contains two 3rd-order gammatone
filters with a nonlinear function cascaded in the middle. A 3rd-order gammatone filter
implemented in MAP involves three passes of the filter command function in the Matlab
environment. A similar arrangement is necessary to attain a 3rd-order gammatone filter in
RTAP-numerical. The major difference between MAP and RTAP-numerical is the
utilisation of buffers. In RTAP-numerical, all buffers are explicitly allocated and, to achieve
a 3rd-order IIR filter effect, the input and output buffers of the gammatone filter are
swapped at every repetition of a loop defined by the filter order. Figure 3.5 illustrates the
input and output buffer swaps within the gammatone filter function in C.
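The buffer-swapping scheme can be sketched as a pointer exchange between passes. This is illustrative only: first_order_pass is a placeholder for one pass of the 1st-order IIR section, and the function names are not from RTAP-numerical.

```c
/* Placeholder for one 1st-order IIR filter pass (here a simple halving,
 * so the effect of repeated passes is easy to follow). */
static void first_order_pass(const double *in, double *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 0.5 * in[i];
}

/* Sketch of figure 3.5: a higher-order gammatone filter realised as
 * repeated passes of a 1st-order section, exchanging the roles of the
 * two buffers between passes instead of allocating one per pass. */
static void gammatone(double *bufA, double *bufB, int n, int order)
{
    double *in = bufA, *out = bufB;
    for (int pass = 0; pass < order; pass++) {
        first_order_pass(in, out, n);
        double *t = in; in = out; out = t;   /* swap input/output buffers */
    }
    /* After an odd number of passes the result sits in bufB,
     * after an even number in bufA. */
}
```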
Figure 3.5: 3rd-order gammatone filter implementation in RTAP-numerical.
Unlike the 1st-order filters within the OME and IHC stages, where only one set of
input and output parameters requires buffering, the DRNL in real time requires multiple
sub-stage buffers, due largely to the number of filter passes required to generate the
basilar membrane (BM) displacement response. The setup of figure 3.5 ensures that only
a bare minimum of two buffer types is required to facilitate high filter orders, without
relying on additional buffers to store intermediate filter responses for each filter algorithm
pass. Figure 3.6 illustrates the structure of the DRNL filter in computing the BM response
of the first three window frames. The computation of any window frame beyond the third
follows the same setup as either window frame 2 or 3.
Figure 3.6: DRNL filter processing for 1 BF channel in multiple window frames.
Nodes ‘G1’, ‘G2’ and ‘G3’ in Figure 3.6 represent a 3rd-order gammatone filter in both
the linear and nonlinear pathway. Nodes ‘G4’, ‘G5’ and ‘G6’ represent a 3rd-order
gammatone filter in only the nonlinear pathway. Node ‘NL’ characterises the nonlinear
compressive function in the DRNL, which is computed based on the auditory stimulus level
setting. The use of additional buffers to store past inputs and outputs from the preceding
window frame is evident from the illustration. In the case of the gammatone filters denoted
by 'G1', 'G2' and 'G3' in both pathways, the past input and output buffers are differentiated
based on the linear and nonlinear pathways. Hence, two sets of identical buffers, from
prevS up to prevG3Y, exist for the linear and nonlinear pathways. Only one instance of
each buffer is allocated from prevG4X onwards in the nonlinear pathway, due to the
absence of a second 3rd-order gammatone filter in the linear pathway. Since the nonlinear
compressive algorithm in MAP is a memoryless implementation, it does not depend on any
past parameters; hence no buffers are allocated for it.
Figure 3.7 presents the DRNL linear, nonlinear and summed responses as well as
the IHC cilia displacement in MAP and RTAP-numerical, based on the stimulus settings
given in table 3.3. The linear and nonlinear DRNL responses are shown as references to
ensure that the DRNL summed response is computed without flaws. The normalised RMS
error of the BM displacement, represented by the DRNL summed responses of MAP and
RTAP-numerical, is 5.4e-6. The IHC cilia displacement is included in this section because
its amplitude is measured in the same units as the BM displacement, and it also provides
an indication of the integrity of the input signal to the IHC receptor potential algorithm.
Figure 3.7: BM and IHC cilia displacement responses generated by MAP and RTAP-numerical.
3.6 Inner Hair Cell Receptor Potential
The inner hair cell (IHC) stage starts with the computation of the IHC cilia
displacement using a 1st-order high-pass filter. The IIR filter is utilised, and its input is the
BM displacement response discussed in section 3.5, computed for the various BF
channels using the parallel DRNL structure. After the IHC cilia displacement is computed,
a loop is entered in which the apical conductance, G(u), and the IHC receptor potential
(IHCRP) are computed using equations 2.13 and 2.15. Figure 3.8 displays the computation
of the IHCRP response for the first three time frames upon the start of RTAP-numerical.
Figure 3.8: IHCRP algorithm processing for 1 BF channel in multiple window frames.
The computation of the very first IHCRP in the first window frame requires an initial
IHCRP value. This initial value is essentially the resting potential of the IHC within the BF
channel, defined by equation 3.4.
Vinitial = ( Gk ∗ Ek′ + G(u)0 ∗ Et ) / ( Gk + G(u)0 ) (Eqn. 3.4)
where Ek′ and G(u)0 are defined in equations 2.16 and 2.13 respectively. All computations
apart from the first IHCRP computation in the first window frame use the immediate past
IHCRP value. This is denoted by the feedback of the IHCRP to the IHCRP computation
node. At the end of a window frame, the last IHCRP value computed is fed forward to the
subsequent window frame as its initial value, after a 1-sample delay signified by a z-1
node. A loop ensures that the IHCRP computations are performed over the size of a
window frame for every BF channel before exiting. Figure 3.9 displays the IHCRP for
MAP and RTAP-numerical for the input stimulus settings given in table 3.3. The
normalised RMS error between the two signals in figure 3.9 is 5.1e-6.
Figure 3.9: IHC receptor potential response generated by MAP and RTAP-numerical.
3.7 Neurotransmitter Release Rate
The neurotransmitter release rate (NRR) stage introduces the low spontaneous rate
(LSR) and high spontaneous rate (HSR) AN fibres. An HSR AN fibre is more responsive
during the linear phase of the BM response than an LSR AN fibre [46]. The LSR AN fibre,
in contrast, is responsive during the compressive phase of the BM response [47]. This means that a change in
the auditory stimulus level results in subdued BM vibrational motion due to the
compressive effects, which in turn reduces the firing rate response of the LSR AN
fibre [48]. The LSR and HSR fibres make up two AN channels for each unique BF channel.
Figure 3.10: NRR algorithm processing for 1 AN channel in multiple window frames.
The NRR algorithm is fed directly with the data output from the IHCRP stage within the
same recursive loop described in section 3.6. The recursive loop takes the LSR
and HSR AN fibre types into account as well. In the first window frame, the first steady-state
fraction of open calcium channels, mICa,inf, is computed using equation 2.19 with Vinitial
calculated from equation 3.4. From mICa,inf and the calcium current, ICa0, the initial calcium
concentration, [Ca2+]0, is computed. During the runtime of RTAP-numerical, mICa and the
Ca2+ concentration are updated using their respective immediate past parameters, as denoted
by the feedback pathways in figure 3.10. The z-1 delay nodes represent the carry-over of the
same feedback parameters from one window frame to the next. Figures 3.11 and 3.12 demonstrate
the responses from MAP and RTAP-numerical for the LSR and HSR AN fibre types computed
based on the input parameters in table 3.1. The normalised RMS errors for the LSR and HSR
plots in figures 3.11 and 3.12 are 8.0e-6 and 7.8e-6 respectively.
Figure 3.11: LSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-numerical.
Figure 3.12: HSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-numerical.
3.8 Auditory Nerve Spiking Probability
The auditory nerve spiking probability (ANSP) computation is done based on the
number of auditory nerve (AN) channels, which is a product of the number of BF channels
and the number of AN fibre types. The algorithm can be segmented into four phases. The
first phase starts by entering a recursive for loop to compute the neurotransmitter vesicle
replenishment rate from the factory, the ejection rate from the IHC, and the reprocessing,
reuptake and loss rates defined by equations 2.22, 2.23 and 2.24. These parameters are
grouped together because they do not depend on their respective past parameters. The
second computing phase requires the use of immediate past parameters and includes the
computation of the quantity of neurotransmitter vesicles in the immediate and reprocessing
stores within the IHC, as well as in the synaptic cleft, provided by equations 2.22 to 2.24. As
the second phase requires immediate past values, an initial value is needed to compute the
first output data of the first window frame for the HSR and LSR fibre types. This initial
quantity of neurotransmitter vesicles is computed as follows:
"UC�&>t = #0�V�(�+})+#0� (Eqn. 3.5)
"F$<AC<�C�t = "UC�&>0(�+})#0 (Eqn. 3.6)
"}�"E?a�DDt = "UC�&>0}� (Eqn. 3.7)
where pCleft0 is the initial probability of neurotransmitter vesicles in the synaptic cleft;
pAvailable0 is the initial probability of neurotransmitter vesicles in the immediate store within
the IHC;
pReprocess0 is the initial probability of neurotransmitter vesicles in the reprocessing store
within the IHC;
Y is the depleted neurotransmitter vesicle replacement rate in the factory, set as a constant
of 6;
M is the maximum number of neurotransmitter vesicles at the synapse, set as a constant of 12;
L is the neurotransmitter vesicle loss rate in the synaptic cleft, set as a constant of 250;
R is the neurotransmitter vesicle reuptake rate from the synaptic cleft into the IHC, set as a
constant of 500;
X is the neurotransmitter vesicle replenishment rate from the reprocessing store, set as a
constant of 60;
k0 is the initial neurotransmitter vesicle release rate computed with equation 2.21.
Subsequent computation of the second phase variables after the initial computation involves
the use of the immediate past variables, pCleftn-1, pAvailablen-1 and pReprocessn-1.
Figure 3.13: ANSP processing for 1 AN channel in multiple window frames.
The third and fourth phases of the ANSP algorithm compute the probabilities of
AN spiking and of AN not spiking respectively. Because the recurrence of the AN spike rate
depends on the refractory period, the parameters from both the third and fourth phases
require buffering to hold past parameters from preceding window frames. To model the
refractory effect, whereby the likelihood of AN spiking diminishes immediately after a spike,
the number of iterations within the recursive for loop that are overlooked is computed as
follows:
irefractory = trefractory / ts (Eqn. 3.8)
where irefractory is the index accounted for in the for loop that defines the refractory period;
trefractory is the refractory period, defined as a constant of 0.75 ms;
ts is the sampling period of the auditory stimulus, set according to the default sampling
rate of 22.05 kHz. Listing 3.5 characterises the pseudocode for computing the probabilities of
AN spiking and not spiking for multiple window frames. Figures 3.14 and 3.15 illustrate
the probabilities of AN spiking for the LSR and HSR AN fibres respectively, for an input
stimulus defined by the settings in table 3.1. The normalised RMS errors between the MAP
and RTAP-numerical generated responses for the LSR and HSR plots in figures 3.14 and
3.15 stand at 3.1e-5 and 5.6e-6 respectively.
for (i = 0; i < NumBFchannels; i++)
{
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        for (k = 0; k < Number_of_AN_fibre_types; k++)
        {
            // 1) Compute the 1st and 2nd phases of the ANSP algorithm.
            ...

            // 2) Calculate the probability of AN spiking and apply the refractory effect.
            // Condition 2a: initial AN spiking probability computation in the 1st window frame.
            if ((!j) && (First_Window[i][k]))
            {
                // The probability of AN spiking depends on the past probability of
                // AN not spiking; as this is the very first computation, the
                // immediate past parameter is taken as 1.
                pANspiking[k][j] = prob_Ejected[k] / ts;
            }
            // Condition 2b: initial AN spiking probability computation from the
            // 2nd window frame onwards.
            else if ((!j) && (!First_Window[i][k]))
            {
                // The past probability of AN not spiking is held over from the
                // preceding window frame, so it is read from the buffer.
                pANspiking[k][j] = prob_Ejected[k] * Prev_pANnotFiring[i][k] / ts;
            }
            // Condition 2c: all other sample computations within any window frame.
            else
            {
                // Probability of AN spiking based on the neurotransmitter release
                // reflected in the prob_Ejected variable.
                pANspiking[k][j] = prob_Ejected[k] * pANnotFiring[k][j-1] / ts;
            }

            // 3) Update the probability of AN not firing. Recent probabilities are
            // added and distant ones removed so as to reduce the fraction of
            // spiking events during the refractory period.
            // Condition 3a: input index has moved past the refractory period
            // (all window frames).
            if (j > Refractory_index)
            {
                pANnotFiring[k][j] = pANnotFiring[k][j-1] * (1 - pANspiking[k][j] * ts)
                    / (1 - pANspiking[k][j - Refractory_index - 1] * ts);
            }
            // Condition 3b: 1st sample computation of any window frame apart from
            // the 1st window.
            else if ((!First_Window[i][k]) && (!j))
            {
                // Relies on past parameters held over from the preceding window frame.
                pANnotFiring[k][j] = Prev_pANnotFiring[i][k] * (1 - pANspiking[k][j] * ts)
                    / (1 - Prev_pANspiking[i][k][j] * ts);
            }
            // Condition 3c: computation within the refractory period for any window
            // frame apart from the 1st window.
            else if ((!First_Window[i][k]) && (j > 0) && (j <= Refractory_index))
            {
                pANnotFiring[k][j] = pANnotFiring[k][j-1] * (1 - pANspiking[k][j] * ts)
                    / (1 - Prev_pANspiking[i][k][j] * ts);
            }

            // 4) Save past parameters for use in subsequent window frames.
            if (j >= (WindowFrameSizePerBF - Refractory_index - 1))
                Prev_pANspiking[i][k][j - (WindowFrameSizePerBF - Refractory_index - 1)]
                    = pANspiking[k][j];
            if (j == (WindowFrameSizePerBF - 1))
                Prev_pANnotFiring[i][k] = pANnotFiring[k][j];
        }
    }
}
Listing 3.5: AN spiking and non-spiking probabilities for LSR and HSR AN fibre types.
Figure 3.14: Probability of AN spiking in LSR fibre for MAP and RTAP-numerical.
Figure 3.15: Probability of AN spiking in HSR fibre for MAP and RTAP-numerical.
3.9 Characteristic Responses for Various Input Settings
The responses for the various stages of MAP and RTAP-numerical have so far relied on
only one input stimulus setting. However, before integrating the algorithms in RTAP-
numerical into a real-time implementation, it had to be verified that RTAP-numerical is
able to generate the responses for various input settings. These input settings described
here refer largely to the sound pressure level (SPL) of the input stimulus measured in
decibels (dB). Responses of all stages within RTAP-numerical were recorded, from the OME to
the ANSP stages, for SPL sweeps of a sine tone input stimulus ranging from 10 dB SPL to 90 dB
SPL in increments of 10 dB SPL. The input sine tone frequency was also varied over four
discrete levels: 500 Hz, 1000 Hz, 3000 Hz and 5000 Hz. Only for the 500 Hz sine tone was the
BF set to 250 Hz, based on the default settings of MAP. For all other sine tones, the
respective BFs were selected to be close to the sine tone frequencies, calculated from 30
logarithmic intervals between 250 Hz and 6000 Hz. Hence, the BFs for the 1000 Hz, 3000 Hz
and 5000 Hz sine tone inputs are 1039 Hz, 3109 Hz and 5377 Hz respectively. The selection
of the BFs from logarithmic intervals over 30 BF channels was done to ensure that the
translated logspace function in RTAP-numerical portrayed the same behaviour as the
Matlab logspace command.
Figure 3.16: Normalised RMS errors between MAP and RTAP-numerical for a 500 Hz sine tone input observed from a 250 Hz BF channel.
Figure 3.17: Normalised RMS errors between MAP and RTAP-numerical for a 1000 Hz sine tone input observed from a 1039 Hz BF channel.
Figure 3.18: Normalised RMS errors between MAP and RTAP-numerical for a 3000 Hz sine tone input observed from a 3109 Hz BF channel.
Figure 3.19: Normalised RMS errors between MAP and RTAP-numerical for a 5000 Hz sine tone input observed from a 5377 Hz BF channel.
Figures 3.16 to 3.19 display the normalised RMS errors computed using
equation 3.9. As the stages of the MAP model generate responses of varying orders of
magnitude, a normalised form of the RMS error is presented for all stages of the auditory
pathway algorithms so that all errors can be projected within a single graph.
enorm RMSE = sqrt( (1/N) · Σk=1..N (yk - ŷk)² ) / (ymax - ymin) (Eqn. 3.9)
where enorm RMSE is the normalised RMS error;
yk is the MAP generated data point indexed by k within the algorithm defining a particular
stage of the auditory model;
ŷk is the response data point generated by RTAP-numerical, indexed by k, for the same
algorithm stage within the auditory pathway;
ymax is the maximum output parameter within the MAP generated data set;
ymin is the minimum output parameter within the MAP generated data set;
N is the total number of data points analysed.
All normalised errors are significantly below 1%, which projects the capability of the
translated algorithms in RTAP-numerical to match the responses of the MAP model very
closely. The highest errors are observed for the ANSP LSR, followed by the ANSP HSR
algorithms. However, these larger errors are symptomatic of the propagation of smaller
errors that are scaled at every stage from the OME stage onwards. The use of
single precision computation for all the stages of the AP model led to the
truncation of floating point values, especially in the OME and DRNL stages, which accounts
for the small errors present in these stages. Furthermore, while the magnitudes of the DRNL
responses are in the order of 10^-8, those of the IHCRP are more significant, in the vicinity
of 10^-1. These errors are amplified further at the NRR stages and finally at the ANSP stage,
where the responses lie in the range of 10^-1 to 10^2, giving rise to the highest errors among
all stages as observed in figures 3.16 to 3.19.
3.10 Summary
The algorithms used in MAP from the outer and middle ear (OME) stage to the auditory
nerve spiking probability (ANSP) stage are implemented in C in a program called
RTAP-numerical. This program is able to simulate any of the aforementioned stages of MAP
without running the Matlab scripts. For a sinusoidal input stimulus within the range of 10 dB
SPL to 90 dB SPL, and for frequencies from 500 Hz to 5000 Hz, the responses of
RTAP-numerical match the responses of MAP from the OME stage up to the ANSP stage for
the LSR and HSR fibres. As a result, the algorithms in RTAP-numerical can be integrated
satisfactorily into a graphical user interface with real-time processing capabilities.
Chapter 4: Real-time Auditory Periphery (RTAP)
A program that runs on a computer is defined as a process and a process has at
least one executing thread [49]. A thread is an object within a process whose execution can
be scheduled to run at a discrete time [49] and it also defines the basic unit of CPU
utilisation [50]. Microsoft Windows supports thread priority scheduling in which the attributes
of a process can be prioritised. This allows a thread with a ‘HIGH’ priority to be executed by
the CPU more often than a thread with ‘NORMAL’ priority. The highest priority in Windows is
‘REALTIME’, which indicates that a thread gets the undivided attention of the CPU [42].
Buffered I/O that stores the status of the keyboard and mouse relies on Windows kernel
threads. A CPU intensive real-time thread, which has a higher priority than a kernel thread,
consumes much of the CPU time and prevents buffered I/O from being engaged. Only when the
real-time thread execution is complete is the buffered I/O serviced. Disk I/O threads that
write bulk data to a file while running concurrently with a real-time thread have their
operations interleaved. This indicates that Windows supports time-sharing execution of the
disk I/O thread and the real-time thread [42].
A real-time application in Windows must not fully exploit the CPU for its own intensive
operation. Instead, I/O and disk access must also be accounted for so as to ensure
responsiveness of the mouse, the keyboard and the file saving features. To this end, it is
recommended that a heartbeat timer be implemented in a real-time application with a process
priority lower than 'REALTIME'. The heartbeat timer performs periodic execution of CPU
intensive real-time processing as well as I/O and disk operations and any applicable data
acquisition.
The advantage of using a heartbeat timer is that it allows a real-time thread to predictably
relinquish the CPU for other peripheral-based thread execution [42]. The heartbeat timer is
the central mechanism that synchronises threads of varying functions in real-time auditory
periphery (RTAP), which is a real-time Windows program that wraps a graphical user
interface (GUI) and an audio hybrid library called JUCE around the C translated MAP
algorithms described in chapter 3.
RTAP is able to accommodate multiple best frequency (BF) and auditory nerve (AN)
channels and is developed with single and double precision builds. Only results from single
precision execution are displayed in this thesis, as the double precision responses are
identical. The question remains as to the number of channels RTAP can accommodate when
generating the responses of the different stages within the computer model. This quantity
will vary with the hardware and software platform of the computing system. For this project,
development and testing of the RTAP code occurred on machine 1, though for acquiring the
load profiles of RTAP, machines 1 and 2 were used. The specifications of machines 1 and 2
are displayed in table 4.1.
Machine 1: Asus U45JC notebook; Intel Core i5-460M, 2.53 GHz; 4 GB RAM; 500 GB HDD;
nVidia GeForce 310 with 1 GB VRAM; Microsoft Windows 7 Home Premium 64-bit.
Machine 2: customised desktop; Intel Pentium Dual-Core E6500, 2.93 GHz; 4 GB RAM;
500 GB HDD; nVidia GeForce 8400GS with 512 MB VRAM; Microsoft Windows 7 Home
Premium 64-bit.
Table 4.1: Computing system platforms used for RTAP development and testing.
This chapter aims to describe most of the features of RTAP, excluding processed
data display, mathematical optimisation and load profiles, which will be covered in chapters 5
and 6. The user interfaces of RTAP are introduced in the next section, followed by the
process priority scheme used by RTAP. The general structure of RTAP is covered thereafter,
along with the sine tone generator feature. The utilisation of threading application
programming interfaces (APIs) is then elaborated, followed by a description of the plotting of
multiple signals on a single graph, which is essential when multiple best frequency (BF)
channels are introduced. Finally, data acquisition of processed data is presented with a
focus on window frame continuity of the recorded data as well as offline processing and
formatting.
4.1 User Interface (UI)
The graphical user interface (GUI) design of RTAP was carried out in an informal
manner, largely because the main emphasis of this project was to implement a real-time
model of the AP focusing on algorithm processing, parallel computing through
thread utilisation and graphical representation of the response of the AP model. Although
the GUI is necessary to hold control and feedback parameters and to display plots, its
design took a lower precedence than the abovementioned key features. It is nevertheless
briefly discussed in this section.
RTAP was designed to be launched directly from the Microsoft Windows OS by double
clicking the RTAP executable icon. This provides a rapid manner of starting up the real-
time simulator without launching any other complementary software. The main element of
RTAP is the display window in figure 4.1, where visual representations of the auditory
pathway (AP) model response are projected regardless of graph type and display type.
The majority of the space is dedicated to the plot display because RTAP is expected
to display graphs of AP model responses represented by numerous best frequency (BF)
channels. The height of the graph is proportional to the number of BF channels; hence, a
large number of BF channels implies a vertically expanded graph, which in turn takes up a
large plot display area.
Besides the plot display, adjustments of essential parameters are necessary so that
various degrees of auditory perception can be experimented with. The essential parameters
include the segment of the AP model selected for analysis, the auditory stimulus source, the
AP model response display type and the usage of math optimisation in algorithm processing.
These parameters, along with the statuses of threads and the processing times of
algorithms, provide an overview and basic control of the real-time AP model simulation. Due
to insufficient space on the main user interface (UI) page, additional essential parameters,
such as the sampling frequency at which the AP model operates, the number of BF and AN
channels it accommodates and the sine tone generator control parameters, are spilled over
to the secondary UI page in figure 4.2. Hence, RTAP has two UI pages that host the
majority of its main controls and feedback. Other parameters are held as constants based
on the original settings of the MAP model.
Figure 4.1: RTAP main user interface.
The four clickable buttons in figure 4.1a perform the main controls of the program.
The play button with the triangle icon starts and pauses algorithm processing. The 'Set'
button allocates memory for algorithm processing based on the number of BF channels and
AN fibre types, as well as buffers for recording and image display. It also initiates the
pre-computation of all coefficients and constants required by the algorithms and sets the
runtime process priority of RTAP to a predefined level. The advantage of these
functionalities is to provide user driven memory allocation and pre-computation of
coefficients; in other words, the algorithms in the auditory pathway (AP) model include
parameters that do not depend on the input data, and these parameters can be
pre-computed before the algorithms are executed. The 'Play+Record' button starts
algorithm processing and binary recording of processed data concurrently, while the 'Record'
button allows the recording of processed data into a binary file only once algorithm
processing has been initiated with the play button.
The two combo boxes in figure 4.1b select the algorithm type to run and record while
the one in figure 4.1c selects the priority that the algorithm runs on. Input source selection of
the audio stream, between a hardware microphone channel on board the computer and
RTAP's built-in software sine tone generator, is made using the radio button group in figure
4.1d. The radio button group in figure 4.1e selects either a static or scrolling response plot
display, while that in figure 4.1f selects the response plot display type, which is either
spectrogram or ERB based. The radio button in figure 4.1g toggles the use of the fast
exponential function in the algorithms. Once the 'Set' button is clicked, the priority that
RTAP runs on is projected in the space in figure 4.1g. The status texts in figure 4.1h display
the timings of the relevant functions executed during runtime and, finally, figure 4.1i feeds
back the thread execution status during algorithm runtime.
Figure 4.2: RTAP user interface for setting parameters.
The secondary UI page, reached via the 'Set RTAP Parameters' tab, opens an
interface that allows the user to configure the prime parameters of the algorithms. This UI is
illustrated in figure 4.2. In the group in figure 4.2a, the combo box and sliders control the
sampling rate and the input parameters for sine tone generation. The control inputs in figure
4.2b vary the number of BF channels and the dual resonance nonlinear (DRNL) filter
parameters. The combo box in figure 4.2c sets the number of AN fibre types. Other
parameters within the IHC, NRR and ANSP stages are set to their default settings, identical
to the MAP model, for the purpose of response verification.
4.2 Process Priority
As indicated in the introduction to this chapter, all programs on an OS run at a given
priority. Process priority defines the degree of CPU attention gained by a program. RTAP
adopts four different process priorities when running on the Windows OS. In descending
order, these are 'REALTIME', 'HIGH', 'ABOVE NORMAL' and 'NORMAL'. Though there are
further process categories below 'NORMAL' priority, these are omitted in the development of
RTAP as they lower the CPU attention that RTAP receives, thereby reducing its real-time
execution capabilities. By default, RTAP launches at 'NORMAL' priority, but as soon as the
'Set' button is clicked, the priority changes to one of the four aforementioned levels based
on the user setting.
In the case where RTAP runs at 'ABOVE NORMAL' priority, the CPU will execute its
algorithms more often than other 'NORMAL' priority processes in the CPU execution queue.
RTAP at 'HIGH' priority takes precedence over 'NORMAL' and 'ABOVE NORMAL'
processes, and the CPU services the algorithms exclusively, treating them as a time-critical
task. RTAP threads that run under 'HIGH' priority will pre-empt threads from other
processes running at either 'IDLE' or 'NORMAL' priority. Running at 'REALTIME' priority,
RTAP threads will pre-empt all other non-'REALTIME' threads, including scheduling threads
in the Windows OS [51]. In light of the various priority levels that RTAP is capable of running
at, it is hypothesised that the number of BF channels that can be loaded will increase with
each discrete escalation of priority. The results of optimisation through the alteration of
process priorities are presented in section 6.2.
4.3 Structure and Settings
4.3.1 Class Structure
The graphical user interface (GUI) of RTAP is implemented with JUCE, an open
source C++ GUI library. Besides the GUI library, the audio and graphics libraries within
JUCE are utilised to stream sampled audio data from the microphone channel and to display
the processed data response in RTAP respectively. Figure 4.3 demonstrates the layout of
RTAP through its object oriented classes. RTAP starts by drawing the GUI window in
MainWindow before invoking AuditoryPeripheryMaster to launch the main and parameter set
UI tabs. The control buttons and feedback statuses are painted through the
AuditoryPeripheryJUCEmain class. Within the same class, the LiveAudioInput class is invoked
to prepare the audio streaming from the microphone channel on board the computer.
Similarly, from AuditoryPeripheryJUCEmain, AuditoryPeripheryJUCEdisplay is invoked to
render the plot display window on to the RTAP GUI.
Figure 4.3: RTAP object oriented class layout.
The algorithms, along with the memory allocation for coefficients and input and output
(IO) buffers described in chapter 3, are incorporated in the AuditoryPeripheryCompute class.
This class can be invoked from AuditoryPeripheryJUCEmain, but a problem lies in the
buffer size allocation. Since the maximum number of BF channels that can be processed by
RTAP was unknown at the point of development, buffer allocation based on the maximum
number of BF channels could not be accounted for. Though an arbitrary maximum number of
BF channels could have been defined and then adjusted through trial and error, this approach
was deemed impractical, as RTAP would have to be re-compiled for every variation of the
maximum BF channels parameter. Instead, a more viable solution is to vary the number of BF
channels based on the user selection and then allocate the buffers after the 'Set' button is
clicked.
Additionally, in the initial design of RTAP, algorithm execution had to be
initiated by timer callback functions in the AuditoryPeripheryJUCEdisplay class in order to
synchronise algorithm execution with the plot display. Similarly, the
AuditoryPeripheryJUCEmain class is also required to invoke the functions within
AuditoryPeripheryCompute to deallocate and allocate buffers of specific sizes. As the two
aforementioned classes required access to the functions in AuditoryPeripheryCompute,
creating the AuditoryPeripheryCompute class from either of the two former classes would
have prevented the other from invoking functions from AuditoryPeripheryCompute. To
resolve this conflict, AuditoryPeripheryCompute is made the base class of
AuditoryPeripheryJUCEdisplay, so that when AuditoryPeripheryJUCEdisplay is invoked in
AuditoryPeripheryJUCEmain, both classes are able to access the algorithm functions of
AuditoryPeripheryCompute. Though the incorporation of threads in the later designs
of RTAP discarded the dependence on AuditoryPeripheryJUCEdisplay, the inheritance from
AuditoryPeripheryCompute remains in the implementation of RTAP.
In the JUCE library, the sampling rate of the audio data streamed from the
microphone channel is set at a default of 44.1 kHz. Up to 2560 samples of audio data are
acquired from the underlying Microsoft DirectSound base library and made available to the
audioDeviceIOCallback function via the LiveAudioInput class every 58 ms. As the primary
auditory range of importance lies within the spectral range of speech, the sampling rate was
reduced to 22.05 kHz through decimation, where only the even-numbered samples are
retained. Therefore, instead of 2560, 1280 samples of audio data are acquired every 58 ms.
On another note, the timing benchmark for executing multiple BF channels in RTAP is
58 ms, based on the highest sampling rate setting of 22.05 kHz; algorithm processing times
up to 58 ms are acceptable, but anything beyond results in probable degradation of the
output response. This 58 ms benchmark is computed by multiplying the quantity of sampled
audio data, 1280, by the inverse of the highest sampling rate, 22.05 kHz.
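The 2:1 decimation step described above can be sketched as follows; the function name is illustrative and, as in the text, the samples are simply dropped rather than low-pass filtered first (a production decimator would filter to avoid aliasing):

```cpp
#include <vector>
#include <cstddef>

// Sketch of the 2:1 decimation described above: a 44.1 kHz block of 2560
// samples is reduced to 22.05 kHz by retaining every even-numbered sample.
std::vector<float> decimateByTwo(const std::vector<float>& in)
{
    std::vector<float> out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) // keep even-numbered samples
        out.push_back(in[i]);
    return out;
}
```

For a 2560-sample input this yields 1280 samples, and 1280 x (1 / 22050 Hz) ≈ 58 ms, which is the timing benchmark stated above.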
4.3.2 Input Settings
Table 4.2 lists the RTAP settings used to acquire real-time results for two
stimulus types. The results for the sine tone stimulus response are presented throughout all
sections within chapters 4, 5 and 6 as well as the Appendices, with the exception of section
5.2, for which the input stimulus is streamed from the microphone channel. The number of
BF channels for the response plots in some sections may also vary accordingly to illustrate
the effects of maximum load in RTAP for machines 1 and 2. System parameter settings are
given in table A.1 of appendix A.
Settings                 Sine Tone      Microphone channel
Stimulus frequency       500 Hz         -
Stimulus level           50 dB SPL      50 dB SPL
Size of window frame     58 ms          58 ms
Sampling frequency       22.05 kHz      22.05 kHz
Minimum BF               250 Hz         250 Hz
Maximum BF               6000 Hz        6000 Hz
Number of BF channels    30             30

Table 4.2: RTAP settings for acquiring various responses.
4.4 Sine Tone Generator
A window frame contains up to 1280 sampled audio data points at the default
sampling rate of 22.05 kHz when audio is streamed from the microphone channel.
However, a window frame of a fixed size of 1280 samples may not hold only complete sine
wave cycles. A complete sine wave cycle is defined as one that consists of the positive and
negative halves of a cycle and ends, after traversing the negative half cycle, at a point just
before the start of the next sine wave. The last sine wave within a window frame may
therefore be truncated, depending on the sine tone frequency. As a result, when RTAP
processes two adjacent window frames filled with sine tone samples, it will encounter an
abrupt end to the sine wave cycle when shifting from one window frame to the next. Hence,
the size of the window frame must be altered to ensure a smooth transition of sine wave
cycles between adjacent frames.

The window frame size is altered based on the sine tone frequency that is set, and its size
is typically equal to or less than 1280 samples. The truncated sine wave cycle just before
the end of the 1280-sample window frame described in the preceding paragraph is thus
eliminated. This leaves behind a window frame that holds only complete sine wave cycles
and ensures a smooth transition between adjacent window frames. In RTAP, the sine tone
is computed for only one cycle when the ‘Set’ button is clicked. The sampled data for this
one cycle is replicated over the rest of the audio buffer, based on the number of sine wave
cycles needed to fill the entire window frame. Since the y-axis start and end points of the
sine wave within a window frame are one interval apart, a single window frame of a sine
tone signal can be used as a continuous input stream throughout the runtime of the
algorithms in RTAP. The pseudocode for the sine tone generator is given in listing 4.1.
// AudioGain is based on input stimulus SPL gain in dB SPL
AudioGain = 28e-6 * (10 ^ (dB_SPL / 20));

// pre-calculate a sine tone cycle here
// Default_window_size = 1280 for a sampling rate of 22.05 kHz
NumSampPerCycle    = SamplingRate / SineFrequency;
NumSampOmit        = Default_window_size % NumSampPerCycle;
NumCyclesPerWindow = Default_window_size / NumSampPerCycle;

for (i = 0; i < NumSampPerCycle; i++)
{
    Audio_buffer[i] = AudioGain * Amplitude *
                      sin(2 * PI * SineFrequency * i / SamplingRate);

    // this is where a sine tone cycle is repeated over to form a continuous sine wave
    for (j = 1; j < NumCyclesPerWindow; j++)
    {
        // copy the 1st cycle data over to the next cycle
        Audio_buffer[(j * NumSampPerCycle) + i] = Audio_buffer[i];
    }
}

// update the processed data per BF channel if the sine tone option is chosen
if (Audio_Source == AUDIO_SINE)
{
    WindowFrameSizePerBF = Default_window_size - NumSampOmit;
}

Listing 4.1: Sine tone generator.
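A compilable C++ rendering of listing 4.1 is sketched below, under the assumption that the Amplitude factor is unity; variable names mirror the pseudocode, but the function itself and the SineWindow struct are illustrative, not RTAP's:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// A compilable sketch of listing 4.1. The 28e-6 reference and 1280-sample
// default window come from the text; the Amplitude factor is assumed to be 1.
struct SineWindow {
    std::vector<float> samples; // whole sine cycles only
    int frameSize;              // Default_window_size minus truncated samples
};

SineWindow makeSineWindow(float sineFreq, float samplingRate,
                          float dB_SPL, int defaultWindowSize = 1280)
{
    const float PI = 3.14159265f;
    // AudioGain is based on the input stimulus SPL gain in dB SPL
    const float audioGain = 28e-6f * std::pow(10.0f, dB_SPL / 20.0f);

    const int numSampPerCycle    = static_cast<int>(samplingRate / sineFreq);
    const int numSampOmit        = defaultWindowSize % numSampPerCycle;
    const int numCyclesPerWindow = defaultWindowSize / numSampPerCycle;

    SineWindow w;
    w.frameSize = defaultWindowSize - numSampOmit; // drop the truncated cycle
    w.samples.resize(static_cast<std::size_t>(w.frameSize));

    // compute one cycle and replicate it across the window frame
    for (int i = 0; i < numSampPerCycle; ++i) {
        const float s = audioGain *
            std::sin(2.0f * PI * sineFreq * i / samplingRate);
        for (int j = 0; j < numCyclesPerWindow; ++j)
            w.samples[static_cast<std::size_t>(j * numSampPerCycle + i)] = s;
    }
    return w;
}
```

For 500 Hz at 22.05 kHz the cycle length truncates to 44 samples, so the frame shrinks from 1280 to 1276 samples (29 whole cycles).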
4.5 Threading
4.5.1 Background
Besides the primary function of RTAP, which is to compute MAP-based algorithms in
real-time, the application is expected to record as well as display real-time processed data.
Sampled audio data streamed from the microphone channel at a sampling rate of 44.1 kHz
are downsampled to 22.05 kHz in RTAP. The audio data is made available to RTAP
regularly, at 58 ms intervals, in blocks of 1280 samples. On the availability of a block of
data, RTAP is required to compute the respective response of the AP model stage. Within
the same window frame, and depending on the user settings, RTAP might also be expected
to record the processed data samples to a file as well as display them on its UI display
window.
In a conventional C++ class without any thread usage, the aforementioned three
tasks are executed in a sequential manner, one task after the other. However, the RTAP
computational load varies with the number of BF channels, which is a user-defined setting.
Setting this parameter to a low number might not pose a problem, as there are few
computationally intensive algorithms to execute. At larger numbers of BF channels,
however, failure to complete all tasks is imminent. Figure 4.4 illustrates sequential
execution of tasks for low and high numbers of BF channels. With a higher number of BF
channels loaded in sequential execution, there will not be sufficient time to execute the
display function within RTAP, leading to possibly incomplete display of the processed data.
At the point of arrival of the subsequent data block, the behaviour of RTAP may become
unpredictable, and thus sequential execution is not an ideal method of attaining optimum
performance.
Figure 4.4: Sequential execution in RTAP.
To overcome the constraints encountered in sequential execution, RTAP adopts
parallel execution techniques through the use of a threading application programming
interface (API). Three threading APIs were considered for implementation; table 4.3 offers
a summary. Since RTAP has been developed on Windows OS, ideally the best parallel
computing option would be the Windows threading API. However, in light of future
expansion to multiple OS platforms, a more beneficial option is to adopt a cross-platform
threading API. Between OpenMP and POSIX threads, OpenMP offers better performance
than POSIX [52] and hides thread management behind an abstraction layer. More
specifically, OpenMP offers parallelisation of sequential code segments with automatic
thread management, where thread creation, synchronisation and deletion are automatically
taken care of by the API.
[Figure 4.4 content: for a low number of BF channels, Algorithm(), Record() and Display()
all complete sequentially within the 58 ms between blocks of 1280 audio samples; for a
high number of BF channels, Display() cannot complete before the next block arrives,
resulting in incomplete function execution.]
In the case of RTAP, although runtime performance is an essential feature, freedom
in the management of threads is essential to achieving controlled execution of parallelised
tasks. For example, in RTAP, the record thread does not need to run until the ‘Record’
button event is registered. If the record thread were only created at the point when the
‘Record’ button is clicked, significant redundant processing time would be added. With
direct thread management, a record thread that initiates the creation of a binary file is
instead created when the ‘Set’ button is clicked and thereafter placed in a dormant state,
where it waits. Upon the clicking of the ‘Record’ button, the contents of the record thread
are processed almost instantly, as there is no thread creation overhead involved. Hence,
with respect to functionality, a larger degree of thread management control is necessary,
and the POSIX thread API was chosen for implementation in RTAP.
Threading API   Advantages                                    Disadvantages
Windows         Native applications run directly on the       Cannot be implemented across
                processor, unlike cross-platform threading    other OS platforms (can only be
                APIs that incur overhead [50].                used exclusively on Windows OS)
                                                              [50].
OpenMP          Cross-OS platform [50].                       Insufficient degree of direct
                Good performance [52].                        control of threading resources
                                                              [50].
POSIX           Cross-OS platform [50].                       Performance not as good as
                Larger degree of direct control of            OpenMP [52].
                threading resources [50].

Table 4.3: Threading API comparison.
4.5.2 Implementation
Three POSIX threads, known as pthreads, are allocated for the algorithm, file record
and pixel rendering functions. The three threads are created at the launch of RTAP, in the
constructors of the AuditoryPeripheryCompute and AuditoryPeripheryJUCE classes
respectively. Each of the three threads is then placed in a Ready-to-Run state as it waits
on a unique condition variable (CV) within a while loop that runs for the duration of the
RTAP runtime. These CVs are part of the POSIX thread synchronisation set and are used
for simple inter-thread communication. In RTAP, CVs are used to indicate to the respective
threads the availability of data for further processing. As a complement to CV usage in
POSIX, mutex locks must also be used. A mutex allows only a single thread at any time to
exclusively access a block of code [50]. To use CV signalling in POSIX, a mutex must be
acquired first and then released after the work on the CV is completed. Figure 4.5 shows
the usage of POSIX thread synchronisation in RTAP.
Figure 4.5: Thread synchronisation pseudocode in RTAP.
With respect to figure 4.5, two threads are created. As soon as thread 2 is created, it
enters a while loop and acquires a lock on a mutex. It then encounters a command to wait
for a CV; before it enters wait mode, thread 2 releases the mutex lock. Meanwhile, thread 1,
upon creation, starts to compute and stores the corresponding results in shared memory.
The thread then acquires the mutex lock, which grants it exclusive access to signal the CV.
After signalling, the mutex lock is released. Upon reception of the CV, thread 2 runs only
after the mutex lock is released by thread 1, and then continues to process the shared
memory contents. In this way, thread synchronisation is achieved. Figure 4.6 illustrates the
thread structure utilised in RTAP.
// Thread 2
while (RTAP_is_running)
{
    // allow only thread 2 to run
    pthread_mutex_lock( &Mutex );
    // wait for condition variable to be signalled
    pthread_cond_wait ( &Condition_variable, &Mutex );
    // allow other threads to run
    pthread_mutex_unlock( &Mutex );
    // carry out some task on data in shared memory
    ...
}

// Thread 1
// do some work on shared memory
...
// allow only thread 1 to run
pthread_mutex_lock( &Mutex );
// signal condition variable
pthread_cond_signal ( &Condition_variable );
// allow other threads to run
pthread_mutex_unlock( &Mutex );
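A minimal, compilable version of this handshake is sketched below (link with -pthread). One detail added here that the figure's pseudocode omits is a data_ready predicate re-checked in a while loop around pthread_cond_wait, which is the standard guard against spurious wakeups and against the signal arriving before the waiter blocks; all names and the value 42 are illustrative:

```cpp
#include <pthread.h>

// Minimal sketch of the figure 4.5 handshake with a predicate added.
static pthread_mutex_t Mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  Condition_variable = PTHREAD_COND_INITIALIZER;
static int  shared_result = 0;    // "shared memory"
static bool data_ready = false;   // predicate guarding the CV

static void* thread1(void*)       // compute, then signal
{
    pthread_mutex_lock(&Mutex);
    shared_result = 42;           // do some work on shared memory
    data_ready = true;
    pthread_cond_signal(&Condition_variable);
    pthread_mutex_unlock(&Mutex);
    return nullptr;
}

static int thread2_wait()         // wait, then use the result
{
    pthread_mutex_lock(&Mutex);
    while (!data_ready)           // predicate loop, not a bare wait
        pthread_cond_wait(&Condition_variable, &Mutex); // releases Mutex while blocked
    int r = shared_result;
    pthread_mutex_unlock(&Mutex);
    return r;
}
```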
Figure 4.6: Thread utilisation structure in RTAP.
In RTAP, upon the availability of audio data samples, either through the audio input
stream connected to the microphone channel or from the sine tone generator, the algorithm
thread is invoked to compute the response of a particular stage of the auditory periphery
model. Towards the end of the algorithm thread, after the computation, CVs are signalled
to prepare the pixel render and file record threads to run. As the processed data are cloned
to two other memory segments for display and file recording, the two threads are able to
run concurrently past the boundary of a window frame. A comprehensive thread
synchronisation network diagram as implemented in RTAP is depicted with pseudocode in
figure 4.7.
[Figure 4.6 content: with threads, Algorithm() (thread 1), Record() (thread 2) and Display()
(thread 3) run concurrently within the 58 ms between blocks of 1280 audio samples, for
both low and high numbers of BF channels.]
Figure 4.7: Thread synchronisation in RTAP.
// 58 ms timer
AuditoryPeripheryJUCEdisplay::paint()
{
    if (Play_button_clicked) {
        Signal_cvWaitOnPlayButton;
    }
    if (Audio_source != MIC_IN) {
        Signal_cvComputeAlgo;
    }
}

// Algorithm initiating thread
AuditoryPeripheryCompute::ProcessFunction()
{
    while (RTAP_is_running) {
        Wait_for_cvWaitOnPlayButton;
        if (Audio_Packets_to_be_processed) {
            if (Record_button_clicked) {
                Signal_cvFileWriteWaitonRecBtn;
            }
            Wait_for_cvComputeAlgo;
            // Process algorithm function
            AlgorithmFunction();
        }
    }
}

// File record thread
AuditoryPeripheryCompute::ProcDataRecordThread()
{
    ...
    Wait_for_cvFileWriteWaitonRecBtn;
    while (RTAP_is_running) {
        Wait_for_cvFileWriteWaitonSignal;
        ...
    }
}

AuditoryPeripheryCompute::AlgorithmFunction()
{
    // Process algorithms start
    ...
    // Process algorithms end
    Signal_cvDrawPlot;
    if (Record_button_clicked)
        Signal_cvFileWriteWaitonSignal;
}

// Audio input callback
LiveAudioInput::audioDeviceIOCallback()
{
    // Acquire data from Mic In stream
    ...
    Signal_cvComputeAlgo;
}

// Pixel render thread
AuditoryPeripheryJUCEdisplay::DrawPlotPixels()
{
    while (RTAP_is_running) {
        Wait_for_cvDrawPlot;
        // Draw image on display window
        ...
    }
}

At the clicking of either the ‘Play’ or ‘Play+Record’ button in RTAP, a timer callback
function that runs every 58 ms detects a change in a global Boolean variable that tracks the
‘Play’ button’s click status. A CV, cvWaitOnPlayButton, is transmitted that triggers the
algorithm initiating thread to prepare to call on the respective algorithm function to service
the audio data. If either the ‘Play+Record’ or ‘Record’ button is clicked, another global
Boolean variable is set and the algorithm initiating thread transmits a CV,
cvFileWriteWaitonRecBtn, which allows the file record thread to write a file header into a
newly created binary file. The file record thread then suspends its operation pending
availability of processed data from the algorithm function. The algorithm initiating thread,
thereafter, waits for the availability of audio data samples either from the live audio input
stream connected to the microphone channel or the sine tone generator. Once a block of
data is saved in a shared memory location, a CV, cvComputeAlgo, is received by the
algorithm initiating thread. The respective algorithm function is then called to process the
contents of the shared memory. Towards the end of the algorithm function, where the
computation is completed, a CV called cvDrawPlot is sent out to render an image buffer
based on the recently computed processed data. The algorithm function concludes by
sending a final CV, cvFileWriteWaitonSignal, out to the file record thread that saves the
processed data to the binary file.
4.5.3 Results
The Intel thread checker, part of the Intel VTune Amplifier XE 2011 software
application, is able to measure the performance of the thread utilisation in RTAP. One
requirement of this utility is that the program to be analysed must have a short and
deterministic duration. However, the RTAP runtime duration is non-deterministic, as it
depends on the intention of the user. Therefore, a separate, shorter abstraction of RTAP
was developed that runs for two time frames, in which the algorithm, file record and pixel
render code are replaced with time delays of 58 ms, 15 ms and 15 ms respectively. The
goal of this test program is to verify that thread execution follows the expected behaviour
observed in figure 4.6. Figure 4.8 illustrates the output response of the Intel thread checker.
Figure 4.8: Intel thread checker analysis of RTAP usage of threads.
The lower half of the image in figure 4.8 represents a magnified segment of a thread
execution transition in the RTAP thread simulator. In part (1) of figure 4.8, the algorithm
thread, after 58 ms, invokes the record thread through a CV specific to the record thread.
As a mutex is already locked when the CV is signalled by the algorithm thread in (1), the
record thread is unable to lock the mutex and is thus placed in Ready-to-Run mode. In part
(2), the algorithm thread unlocks the mutex, which then allows the record thread to lock the
mutex and start to run as a ‘Critical Section’ in which it services the CV. In the ‘Critical
Section’ domain, the algorithm thread momentarily relinquishes CPU control. After the
record thread has serviced the CV, it releases the mutex lock and the algorithm thread
resumes execution by immediately locking another mutex for the pixel render thread in part
(3). The algorithm thread immediately signals the pixel render thread with a CV, which puts
the latter thread in Ready-to-Run mode, waiting on the release of the pixel render mutex.
In part (4), the algorithm thread finally releases the pixel render mutex, which allows the
pixel render thread to service the CV and subsequently release the mutex. One essential
feature of the thread simulation test is the existence of concurrency from part (3) onwards.
Thread synchronisation through CV signalling is the only serialised dependency between
threads, and the processing time of CV signalling is negligible. The record and pixel render
threads therefore run concurrently, indicating that parallel execution is achieved in RTAP.
4.6 Response Plots
The responses of the various stages of RTAP result in the generation of multiple
signals, the number of which is directly proportional to the number of BF channels selected.
A 30 BF channel DRNL response, for instance, results in the generation of 30
logarithmically spaced BM displacement signals. Displaying 30 signals on 30 different
graphs does not make an ideal visual representation of an auditory model response.
Hence, it is advantageous to group the various signals into a single graph.
Equivalent rectangular bandwidth (ERB) is a method to plot signals from a multiple
channel auditory perceptual model [22]. Equation 4.1 defines the relationship between a
signal of a specific frequency and the ERB scale.
ERBS(f) = 21.4 log10(0.00437 f + 1)    (Eqn. 4.1)

where ERBS(f) is the translated signal of a specific frequency, offset from the point of origin
on the y-axis, and f is the frequency of the signal in Hz. Equation 4.1 is a modification of
the Greenwood function
that describes the variation of a critical bandwidth with a centre frequency. It has the same
form as the Greenwood function except that the coefficient constants vary. Therefore, ERB
plotting is able to associate a centre frequency within a critical bandwidth to a position along
the basilar membrane similar to the Greenwood function [53].
RTAP processed data are time domain representations of the various BF components.
To represent time domain data on the ERB scale, equation 4.1 has to be modified. This
modification depends on the magnitude of the processed data for every stage of the MAP
model. For example, the processed data from the various stages of RTAP are of the order
of 1e-1 for the IHCRP, NRR and ANSP stages and in the range of 1e-7 to 1e-14 for the
stapes and BM displacement stages. To cope with this large dynamic range of amplitudes,
equation 4.1 is retrofitted with an exponential term as follows:
y_ERB(t) = 21.4 log10(0.00437 f + 1) + x_BF(t) * 2^S    (Eqn. 4.2)

where y_ERB is the time domain processed sample data scaled and offset on the ERB
scale; x_BF is the time domain representation of the processed sample data generated
with respect to BF; f is the BF in Hertz; and S is a scaling factor of the time domain
processed signal, x_BF. In order to
normalise a large range of amplitudes for the various stages of the MAP model, this
exponent can be tuned with an integer value to amplify the signals to an ideal visual level of
representation. The base value of 2 is selected for speedup in computing [54]. Figure 4.9
illustrates the DRNL response for 30 BF channels represented in an ERB scale.
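Equations 4.1 and 4.2 can be sketched in C++ as follows; the function names are illustrative, and std::ldexp is used to exploit the base-2 scaling mentioned above:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Sketch of equations 4.1 and 4.2 as used for plotting: each BF channel's
// time domain signal x_BF is scaled by 2^S and offset by the ERB number of
// its BF so that all channels stack on a single graph.
double erbNumber(double bfHz)                     // Eqn. 4.1
{
    return 21.4 * std::log10(0.00437 * bfHz + 1.0);
}

std::vector<double> erbScaledTrace(const std::vector<double>& xBF,
                                   double bfHz, int S)
{
    // std::ldexp(v, S) computes v * 2^S cheaply, matching the base-2 choice
    const double offset = erbNumber(bfHz);
    std::vector<double> y(xBF.size());
    for (std::size_t n = 0; n < xBF.size(); ++n)
        y[n] = offset + std::ldexp(xBF[n], S);    // Eqn. 4.2
    return y;
}
```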
Figure 4.9: (a) MAP and (b) RTAP DRNL response for 30 BF channels.
The DRNL responses of MAP and RTAP for the first window frame upon the start of
the two models are illustrated in figure 4.9 and are based on the input parameters from
table 4.2. The scaling exponent from equation 4.2 used for sizing up the DRNL response is
set to 22.

[Figure 4.9 content: two panels titled ‘MAP DRNL Response (30 BFs)’ and ‘RTAP DRNL
Response (30 BFs)’; y-axis: BF sites along BM for BM displacement (Hz), with ticks at
250 Hz, 388 Hz, 670 Hz, 1159 Hz, 2005 Hz, 3469 Hz and 6000 Hz; x-axis: Time (seconds),
0 to 0.02.]

The signals with resonances of significant amplitudes are clearly present in the lower
frequencies, especially around the region of 500 Hz, which is the frequency of the input
stimulus. The increase in phase of the traveling wave from the base to the apical end is
also clearly depicted in the figure. The topmost signal in figure 4.9 is the highest BF,
situated at the basal end of the BM. From this rigid part of the BM, the wave propagates
along the BM, with its amplitude and phase increasing. The wave reaches maximum
amplitude at the 500 Hz site of the BM before decreasing rapidly as it approaches the
apical end, while its phase continues to increase until the apical end is reached. Hence, the
ERB scaled plot is able to present multiple signals in a condensed and well-structured
manner that describes the amplitude, phase and frequency responses within a single graph.
Furthermore, the plot in figure 4.9 (b) demonstrates the capability of RTAP to regenerate, in
real-time, a DRNL response identical to that of MAP in figure 4.9 (a), given identical input
parameters.
The ERBS representations of the IHCRP, NRR LSR, NRR HSR, ANSP LSR and
ANSP HSR responses for MAP and RTAP are depicted in figures A-1 to A-5 in appendix A,
with the scaling factors tuned to 5, -5, -9, -9 and -12 respectively. Every one of these
responses shares similar traits with the DRNL response in terms of amplitude response
and phase deviation across the entire BF range. It should be noted that, in all ERB scaled
plots in this thesis, frequency components are used interchangeably with ERB scaled
points to denote y-axis coordinates.
4.7 Recording Feature
4.7.1 File Write Command Selection
A vital feature required in RTAP is the storage of the processed data output from the
various stages of the model in a file, so that the processed data generated can be verified
against the MAP model response. For this purpose, the recording feature of RTAP is
required to store only a short segment of processed data rather than a log of continuously
acquired data. Though logging the entirety of the processed data would be beneficial,
especially in long-term usage of RTAP, this option was not pursued due to its complexity
and the surplus implementation time required. Instead, priority was given to ensuring that
an integrated real-time program of the MAP model existed at the end of this project. Hence,
the record feature of RTAP serves as a stepping stone towards sophisticated real-time
logging of processed data in future editions of RTAP. For the first version of RTAP, two
window frames of processed data are required to be stored, mainly to verify that the
continuity between subsequent window frames is intact for all algorithms, in accordance
with the MAP model response. The data stored in the file should therefore be in a
numerical format representing the exact data output from RTAP, regardless of the input
stimulus, and it should be organised in a structure based on the number of BF channels as
well as the AN channel fibre types for either the NRR or ANSP response. Several C/C++
file write commands were examined; they were tested by writing 1280 floating point
variables into text and binary files. Table 4.4 summarises the file write commands profiled
on machine 1.
File Write Command                  Generated Output   OS Platform          Average Processing Time
C++ ‘ostream_iterator’              Text file          Multiple platforms   18.95 ms
C++ ofstream ‘<<’ in a for loop     Text file          Multiple platforms   32.11 ms
C ‘sprintf’ and ‘write’             Text file          Multiple platforms   6.31 ms
C/C++ ‘WriteFile’                   Binary file        Windows              0.13 ms
C/C++ ‘fwrite’                      Binary file        Multiple platforms   0.14 ms

Table 4.4: C/C++ file write profile.
From table 4.4, it can be seen that the ofstream-based write command takes the
longest to execute. The text formatting and file writing commands sprintf and write, which
generate a text file with floating point numerals, seem an ideal choice for implementation.
However, WriteFile and fwrite, which are used for binary file generation, are approximately
45 times faster than the combination of the sprintf and write commands. WriteFile is a
Windows-based file writer and is only usable with C and C++ compilers on Windows OS,
whereas fwrite is a cross-platform command that can be used with C and C++ compilers
on other OSs as well. Due to its cross-platform versatility and relatively short processing
time, fwrite was implemented in RTAP for saving processed data into a binary file. To
interpret the contents of the binary file, formatting is required. This is achieved using an
offline program, separate from RTAP, that converts the contents of the binary file to
alphanumeric characters and stores them in a text file.
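A minimal sketch of the chosen approach is shown below: one bulk fwrite of raw 32-bit floats, paired with the read-back that an offline formatter would perform. The file path and helper names are illustrative, not RTAP's:

```cpp
#include <cstdio>
#include <vector>
#include <cstddef>

// One bulk write of raw 32-bit floats; no per-sample text formatting.
bool writeFrame(const char* path, const std::vector<float>& frame)
{
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written =
        std::fwrite(frame.data(), sizeof(float), frame.size(), f);
    std::fclose(f);
    return written == frame.size();
}

// Read-back step an offline formatter would perform before text conversion.
std::vector<float> readFrame(const char* path, std::size_t count)
{
    std::vector<float> frame(count);
    FILE* f = std::fopen(path, "rb");
    if (!f) return std::vector<float>();
    std::size_t got = std::fread(frame.data(), sizeof(float), count, f);
    std::fclose(f);
    frame.resize(got);
    return frame;
}
```

Writing raw bytes this way keeps the round trip bit-exact, which is why the offline formatter, rather than the real-time thread, pays the cost of text conversion.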
4.7.2 Binary File Format
The RTAP recording feature is divided into two primary stages. The first stage
involves data writes to a binary file that omit floating point (FP) formatting. The second
stage consists of offline processing to restore the FP formatting and store the formatted
numbers in a text file. The sole reason for dividing the data recording into two stages is to
avoid compromising the computational performance of the concurrent algorithm
computation and recording in RTAP. For the offline program to function, it needs the
properties of the recorded data. These properties are included in the header of the binary
file, which has a constant size of 50 bytes. Figure 4.10 depicts the structure of the binary
file format used to save raw processed data from RTAP.
Figure 4.10: RTAP binary file format generated when the ‘Record’ or ‘Play+Record’ button is clicked.

Byte offset   Field
0             ‘R’ ‘T’ ‘A’ ‘P’ (file identifier)
4             Number of BF
8             Number of AN fibre
12            Number of processed data per time frame
16            Minimum BF
20            Maximum BF
24            Number of AN
28            Algorithm function ID
32            Sine input frequency*
36            Unused (reserved for future expansion)
50            32-bit floating point processed data (based on IEEE 754-2008 [62])

*For non-sinusoidal input, this parameter is set to an all-high state, i.e. 0xFFFFFFFF.
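The 50-byte header of figure 4.10 could be modelled with a packed C structure along the following lines; the field names and types are a reconstruction from the figure, not RTAP's actual declaration:

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Hypothetical packed struct mirroring the 50-byte header of figure 4.10.
#pragma pack(push, 1)
struct RTAPFileHeader {
    char     magic[4];        // offset 0:  'R','T','A','P'
    uint32_t numBF;           // offset 4:  number of BF channels
    uint32_t numANFibre;      // offset 8:  number of AN fibre types
    uint32_t numProcPerFrame; // offset 12: processed data per time frame
    uint32_t minBF;           // offset 16
    uint32_t maxBF;           // offset 20
    uint32_t numAN;           // offset 24
    uint32_t algorithmID;     // offset 28: algorithm function ID
    uint32_t sineFreq;        // offset 32: 0xFFFFFFFF for non-sinusoidal input
    uint8_t  reserved[14];    // offsets 36-49: unused, reserved
};                            // 32-bit FP processed data follows at offset 50
#pragma pack(pop)
static_assert(sizeof(RTAPFileHeader) == 50, "header must be exactly 50 bytes");
```

A struct of this shape could be written with a single `fwrite(&header, 1, 50, f)`, matching the SIZEOFHEADER write described in the next section.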
4.7.3 File Writer Thread
The RTAP recording feature is designed to cause minimal impact on the operation of
the algorithm computation function during runtime. This is achieved through the use of a
POSIX thread that performs the write operation in parallel with the execution of the
algorithms. The thread is created in the Reinitialise function of the
AuditoryPeripheryCompute class when the ‘Set’ button in the RTAP UI is clicked. As soon
as the thread starts, it changes from a running state to a Ready-to-Run state; that is, the
thread is temporarily halted when it first starts. This is achieved through a POSIX command
that allows the CPU to continue executing the thread once a unique condition variable (CV)
is received. This CV, cvFileWriteWaitonRecBtn, is broadcast from the timer-triggered
ProcessFunction when either the ‘Record’ or ‘Play+Record’ button is clicked.

As soon as either the ‘Play+Record’ or ‘Record’ button is clicked, the file writer thread
continues its processing, allowing the binary file header containing the metadata of the
processed data to be written into the binary file. This operation is done as follows:
fwrite ( FileHeader, 1, SIZEOFHEADER, pRTAPfile );
The C structure FileHeader that contains the metadata is computed and defined in
Reinitialise() after the ‘Set’ button is clicked. After the file header is written in the file writer
thread, the thread enters an endless while loop that persists for the duration of the RTAP
runtime. On first entry into the while loop, the thread is temporarily halted by another
CV-based POSIX command. This CV, cvFileWriteWaitonSignal, is signalled by the
algorithm thread that processes the response of the AP model stage of interest, once it has
finished processing one window frame of data. The operation of the recording feature in
RTAP is illustrated in figure 4.11.
RTAP File Writer Thread:

// 1) create & open a binary file for writing

// 2) wait for 'Record' or 'Play+Record' button to be clicked
pthread_mutex_lock( &CallWriteFuncMutex );
pthread_cond_wait ( &cvFileWriteWaitonRecBtn, &CallWriteFuncMutex );
pthread_mutex_unlock( &CallWriteFuncMutex );

// 3) write binary file header into binary file first
fwrite ( FileHeader, 1, SIZEOFHEADER, pRTAPfile );

while (bRTAPrunning)
{
    // 4) wait for algorithm function to signal that it has processed
    //    1 time frame of data packets
    pthread_mutex_lock ( &Write2FileMutex[0] );
    pthread_cond_wait ( &cvFileWriteWaitonSignal[0], &Write2FileMutex[0] );
    pthread_mutex_unlock ( &Write2FileMutex[0] );

    // 5) write 1 time frame worth of processed data into binary file
}

// 6) close binary file

The CVs above are signalled from two places: ProcessFunction() broadcasts
cvFileWriteWaitonRecBtn when the ‘Record’ or ‘Play+Record’ button is clicked, and the
algorithm function, i.e. DRNL() / DRNL-IHCRP() / DRNL-NRR() / DRNL-ANSP(), signals
cvFileWriteWaitonSignal[0] after each processed window frame.

Figure 4.11: File write thread operation.

4.7.4 Binary File Recording

The processed data that is saved into the binary file depends on two parameters: the
algorithm function to run and its corresponding processed data response to be saved. Both
of these options can be set in the RTAP UI, which provides two record buttons. The
‘Record’ button functions only once RTAP has started processing the algorithms, in other
words, after the ‘Play’ button is clicked; it can be clicked at any time thereafter to record two
window frames of processed data. The ‘Play+Record’ button starts both the RTAP
algorithm computation and the data recording concurrently. This button was implemented
for the purpose of analysing the processed data from the first two window frames, since
saving the first two window frames using the ‘Play’ and ‘Record’ buttons separately may
not be achievable due to the delay between clicking the two buttons.
Upon the clicking of either the ‘Play’ or ‘Play+Record’ button, a Boolean flag termed
bRecordProcData is set. Depending on the processed data intended to be recorded, the
respective function then clones the processed data from the algorithm buffer to a record
buffer. The record buffer is able to accommodate two window frames of data for a
maximum of 300 BF channels and 2 AN fibre types. Although the record and algorithm
buffers hold the same data, buffer segregation of this form is required to prevent the buffer
usage conflicts that could arise from asynchronous buffer access, owing to the parallel
execution of the two threads, if they shared the same buffer space. Furthermore, with the
abundance of memory available on modern computers, redundant buffer utilisation of this
degree is tolerable.
As soon as the first window frame of processed data is written into the algorithm and record buffers, the algorithm thread continues processing the subsequent window frame of audio data. When the second window frame of data becomes available, the algorithm thread stores the processed data into the record buffer only if the record thread is not reading the record buffer, which would signify that a file write is in progress. This is accomplished using another Boolean flag, bRecordThreadEngaged. Hence, the file writer thread must not be running when the data is copied. If the file writer thread were writing into a file while the ‘Record’ button was clicked again, processed data would overwrite the existing data in the record buffer and the data being saved into the file would be corrupted. As a means of prevention, bRecordThreadEngaged signals the running status of the file writer thread and ensures that buffer accesses by the algorithm and file writer threads are coordinated.
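A minimal sketch of this flag-guarded copy is shown below, assuming an atomic flag stands in for the thesis's Boolean; the names mirror the text, but the code is illustrative rather than RTAP source.

```cpp
#include <atomic>
#include <array>
#include <cstddef>

// Sketch of the record-buffer guard described above. kFrameSize,
// cloneFrameToRecordBuffer and the std::array buffers are illustrative.
constexpr std::size_t kFrameSize = 1280;

std::atomic<bool> bRecordThreadEngaged{false};

std::array<float, kFrameSize> algorithmBuffer{};
std::array<float, kFrameSize> recordBuffer{};

// Algorithm thread: clone the frame only when the file writer is idle.
// Returns true when the copy was performed.
bool cloneFrameToRecordBuffer()
{
    if (bRecordThreadEngaged.load(std::memory_order_acquire))
        return false;               // writer busy: skip, keep old contents
    recordBuffer = algorithmBuffer; // safe: writer is not reading
    return true;
}
```

The file writer side would set the flag before reading the record buffer and clear it afterwards, giving the coordination the paragraph above describes.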
As far as the ‘Play+Record’ button is concerned, processed data from the first two
window frames are certain to be recorded regardless of the number of BF channels.
However, this may not be the case for the ‘Record’ button as it may be clicked at any point in
time. When the algorithms are not being processed, clicking the ‘Record’ button has no
effect. However, while the algorithms are executing over multiple BF channels, the algorithm thread may already have begun computing beyond the first BF channel. Hence, at the end of the first window frame recorded after the ‘Record’ button is clicked, the record buffer may not be completely filled with processed data from all the BF channels. To prevent this, a Boolean flag, bRecordStatus, is set to signal the file writer thread to proceed with the window frame recording only when the ‘Record’ button has been clicked and the algorithm computation is starting from the first BF channel. With this strategy in place, if the ‘Record’ button is clicked midway through the algorithm thread's computation, bRecordStatus is only set once the algorithm thread starts to process the first BF channel of the subsequent window frame. This ensures that only full window frames of processed data are recorded in the binary file.
RTAP saves two types of data according to the option set in the variable uiFunction2Rec, which is linked to the ‘Function to record’ combo box on the RTAP UI. The first type is the input stimulus, which is stored in a separate two-window-frame record buffer and included in the binary file as a reference for the processed data. The second type is the processed data itself: the output of the algorithm function stage, i.e. the response of one of the auditory pathway stages DRNL, IHCRP, NRR or ANSP. After being signalled by the algorithm thread, the file writer thread first stores the contents of the auditory stimulus record buffer. Two nested for loops then transfer the contents of the processed data record buffer into the binary file. The outer loop covers the full range of BF channels and the inner loop covers the number of AN fibre types. When either DRNL or IHCRP data are to be stored, the number of AN fibre types is set to one, as these responses are independent of the AN fibre type. Listing 4.2 offers pseudocode as an insight into binary file recording.
// 1) set flag to indicate file writer thread running
Record_Thread_Engaged_flag = true;
// 2) save input stimulus
if (Function_To_Rec != AUDIO_IN)
fwrite ( Audio_Data, 4, Number_Samples_to_Record, RTAP_file );
// 3) save processed data for every BF channel and AN fibre type if applicable
for (i = 0; i < NumBFchannels; i++)
{
    for (j = 0; j < Number_of_AN_fibre_types; j++)
    {
        fwrite ( ProcessedData[i][j], 4, Number_Samples_to_Record, RTAP_file );
    }
}
// 4) reset flag to indicate file writer thread temporary halt
Record_Thread_Engaged_flag = false;
Listing 4.2: Data writes to binary file in file writer thread.
4.7.5 Offline Formatting and Text File Generation
RTAP-offline is a separate offline program that formats the binary file generated by RTAP and produces a text file containing the formatted information. Running RTAP-offline starts by reading and translating the first 50 bytes of the RTAP binary file, which contain the header. The metadata in this segment, which describes the processed data, is formatted into human-readable strings and stored as a header in a new text file. An iterative for loop is then executed in which every cycle reads one window frame worth of data into a temporary buffer. The loop ends when the end of the file is reached. After each window frame of data is acquired from the binary file, a second for loop is executed. This loop extends over the length of one window frame, which is equivalent to 1280 samples. Within each cycle of this loop, 4 consecutive bytes of binary data are transferred from the temporary buffer into a single floating point variable, which is also 4 bytes in size. This transfer reformats 4 individual bytes of binary data into a floating point value. Invoking the fprintf function thereafter writes the floating point value as a numeral into the text file. Listing 4.3 presents the pseudocode that performs the offline processing.
// 1) Open RTAP binary file ...
// 2) Create and open text file ...
// 3) Get RTAP binary file length ...
// 4) Read header (first 50 bytes), format it and store in text file
fread ( temp_Header_Buffer, 1, 50, RTAP_file );
FormatRTAPheader ( temp_Header_Buffer, Header_Buffer );
WriteHeader2TxtFile ( Header_Buffer );
for (i = 0; i < (BinaryFileLength - 50); i += (WindowFrameSizePerBF * 4))
{
    // 5) Read processed data from binary file
    fread ( Temp_Buffer, sizeof(float), WindowFrameSizePerBF, RTAP_file );
    // 6) Format every consecutive 4 bytes into a FP number
    Bytes_Offset = 0;
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        memcpy ( &FP_Data[j], &Temp_Buffer[Bytes_Offset], 4 );
        Bytes_Offset += 4;
    }
    // 7) Write FP numbers to text file
    WriteFPdata2TxtFile ( FP_Data );
}
// 8) Close binary file
// 9) Close text file
Listing 4.3: RTAP offline processing.
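The 4-byte-to-float reinterpretation in step 6 can be written safely with memcpy. The standalone sketch below (bytesToFloat is an illustrative name) shows the conversion; memcpy avoids the aliasing pitfalls of a pointer cast.

```cpp
#include <cstring>

// Reassemble one 32-bit float from 4 consecutive bytes, as the offline
// formatter does for each sample read out of the binary file.
float bytesToFloat(const unsigned char bytes[4])
{
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

The round trip is exact: writing a float's bytes and reading them back reproduces the original value on the same machine.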
The RTAP recording feature was initially designed and developed with an emphasis on real-time data logging, in which processed data are saved continuously as soon as they become available. This was attempted in the early stages of development, but the implementation remained incomplete. Real-time logging of 1 BF channel was attempted, and it was observed that the contents of a window frame saved in the binary file were overwritten by contents from the subsequent time frame. Firstly, this indicated that the algorithm function executes faster than the file write for one window frame. Secondly, record buffer protection was absent: the record buffer allocated for 300 BF channels and 2 AN fibre types was only a multiple of 1280 FP values in size. In the real-time logging attempt, the algorithm thread, upon invocation, began overwriting the record buffer with new data at the same time as the record buffer was being read by the file writer thread. Hence, the binary file contained a mixture of data from the two window frames. Evidence of this was the abrupt change in the recorded data pattern observed through offline analysis.
One solution for real-time data logging is to enlarge the record buffer beyond its original size of 1280 FP values. However, the question remains as to the limit of the buffer size and the type of real-time logging strategy to adopt. For the first edition of RTAP, this remains an open question, and real-time logging was therefore omitted from RTAP. In its place, the current two-frame recording is implemented and record buffer protection is achieved with a series of Boolean flags.
4.7.6 Results
Figure 4.12: Continuity between adjacent window frames for RTAP generated DRNL response.
Figure 4.12 displays the continuity of the DRNL response signals between the first and second window frames upon the start of RTAP. A black vertical dashed line on the figure separates the first window frame on the left from the second window frame on the right. Because a sine tone stimulus is used, the signals generated in all window frames are deterministic. Thus, the continuity of the DRNL response can be assessed by analysing the signals in the region of 500 Hz, where the signals possess significant amplitude gain relative to the input stimulus frequency. The transition of the signals from the end of the first window frame to the start of the second window frame is smooth, without any
unwanted jitters or spikes. The window frame continuity is demonstrated for the IHCRP, NRR LSR, NRR HSR, ANSP LSR and ANSP HSR responses in figures A.6 to A.10 in appendix A. These plots indicate that the algorithms translated into RTAP match the responses of the MAP model. They also indicate that the strategies implemented in chapter 3 for window frame continuity are attainable in real time.
4.8 Summary
RTAP, a C++ GUI-based real-time implementation of the MAP model, is described in this chapter. Along with the algorithm buffering for window frame continuity detailed in chapter 3, POSIX threads are used to attain real-time performance of the MAP model in a Windows OS environment. Responses of RTAP in numerical form for multiple BF channels can be acquired through the on-board recording features, followed by offline conversion of the binary file to an alphanumeric text file. The recorded data typically consist of two window frames of processed data. Also covered in this chapter is the ERB scaled plotting, which accommodates multiple signals in a single graph.
Chapter 5: Signals Display in RTAP
Visual representation of the processed signal responses of the algorithms is a necessary indicator of the real-time capability of RTAP. RTAP can display processed data in two formats: ERB scaled and spectrogram. Both graphing schemes are capable of displaying multiple BF channels as well as the intensities of the processed data. Processed data are displayed either as a static or a scrolling image. Although the static display projects only the first window frame of processed data on the screen, it was developed to verify the integrity of the processed data and the display mechanism. It also serves as the platform for implementing the scrolling image that visually represents processed data for every window frame. The scrolling display is therefore a derivative of the static window frame display.
This chapter is separated into two halves. The first half describes the approach taken
to implement static plot displays. The development of the display mechanism is illustrated
and responses from the various stages of the AP model are displayed in ERB scaled and
spectrogram plotting modes. Thereafter, the second half of this chapter details the horizontal
scrolling implementation of ERB scaled plots.
5.1 Static Plot Display
5.1.1 Line Drawing
The built-in software renderer within the JUCE library is used to visually represent the processed data in RTAP. The initial intention was to represent processed data on the ERB scaled graphs covered in section 4.6 using line rendering. One method of drawing lines on the RTAP UI is the built-in JUCE library function drawLine, which takes five arguments:
drawLine ( float startX, float startY, float endX, float endY, float lineThickness)
The five arguments are fixed as constants to display a horizontal line from left to right of the RTAP UI display window. Measuring the time taken to render a horizontal line onto the RTAP UI is a good indicator of RTAP's capability to project a number of lines proportional to the number of BF channels. Figure 5.1 illustrates five horizontal lines displayed on the RTAP UI along with the line rendering time.
Figure 5.1: Line draw test. Five horizontal lines displayed on screen with five drawLine
functions and the subsequent line rendering time duration tagged in a green rectangle box.
From five separate readings similar to figure 5.1, the average time to project five lines on the RTAP UI is 107.4 ms. This is much larger than the 58 ms cap, which must also include the algorithm computation within one window frame of sampled data acquisition. As this design was done in the initial development phase of RTAP, the threading API was not considered until a later stage. Hence, based on the profile, the maximum number of lines that can be accommodated alongside the algorithm computation in a serial computing format is two, which can be rendered in approximately 43 ms. If RTAP were loaded with more BF channels, the algorithm time would increase and the number of lines would have to be reduced so as not to breach the 58 ms time cap. It can be concluded that line drawing on the RTAP UI with the drawLine function cannot be implemented efficiently without compromising the 58 ms processing time benchmark in a serialised computing format.
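A timing harness of the kind used for these profiles can be sketched with std::chrono; measureMs and the callable are illustrative, not the actual profiling code used in RTAP.

```cpp
#include <chrono>

// Time a single render (or any other) call in milliseconds.
// The callable stands in for e.g. a batch of drawLine invocations.
template <typename F>
double measureMs(F&& renderCall)
{
    auto t0 = std::chrono::steady_clock::now();
    renderCall();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

steady_clock is preferred over system_clock here because it is monotonic, so the measured interval cannot go negative if the wall clock is adjusted.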
Additionally, the drawLine function is a high-level software rendering abstraction that renders pixels on the screen using low-level software rendering code. This low-level code is not available within the JUCE library for customisation; hence, there is no control over issues such as image buffer utilisation, pixel rendering into the image buffer and its subsequent projection onto the screen. The opaque implementation of the drawLine function adds considerable overhead that increases the processing time significantly. Low-level management of resources, namely image buffer allocation, its utilisation in terms of pixel rendering, and the projection of the image buffer to the screen, is crucial to reducing processing times. These three implementable tasks are discussed in detail over the coming sections.
5.1.2 Resource Management
The first task is to implement an image buffer and an auxiliary image buffer. Although both buffers hold processed data to be displayed, they differ in size and storage format. The auxiliary image buffer is a two-dimensional matrix with a dimension of 32768 by 300 FP values; its main role is to buffer processed data in numerical FP format from every BF channel, generated directly from the algorithm function class, AuditoryPeripheryCompute. The image buffer is also a two-dimensional matrix, but it stores pixel colour values translated from the contents of the auxiliary image buffer, representing an image extending in the direction of the positive x- and y-axes. The contents of the image buffer are mapped directly onto the RTAP UI display window.
The JUCE library contains an image class that allows an image buffer allocation. The image buffer allocation is invoked in the constructor of the AuditoryPeripheryJUCEdisplay class as follows:
Plot2Display = Image (Image::ARGB, IMGWIDTH, IMGHEIGHT, true);
The width and height of the image are set as constants of 65536 and 600 pixels respectively. The length is pre-set to accommodate as many pixels as possible, including off-screen pixels that are required for image plot scrolling. The height of the image buffer is fixed by the maximum number of BF channels to be projected on screen, which is capped at 300 channels. Each FP sampled response value generated from the algorithms in RTAP is represented by a 2-by-2 combination of pixels. Hence, the image buffer is twice the length and height of the auxiliary image buffer.
The second task involves setting pixels in the image buffer to a particular colour. This is defined as off-screen rendering, where pixels are set in the image buffer before being projected onto the display window. The JUCE library has a built-in function that renders a pixel in the image buffer as follows:
Plot2Display.setPixelAt(x, y, c[m]);
The setPixelAt function call sets the pixel denoted by the Cartesian coordinates x and y in the image buffer, Plot2Display, to a colour, c[m]. The last task requires the image buffer to be projected onto the RTAP UI display window, which is achieved through the following JUCE library function:
g.drawImage ( Plot2Display, destX, destY, destWidth, destHeight, sourceX, sourceY,
sourceWidth, sourceHeight );
The drawImage function which is a member of the graphics class, g, directly translates a
segment of the image buffer on to the display window using screen start coordinates as well
as the size of the segment to be displayed.
The major advantage of breaking down the tasks in this way is that each of the three tasks can be profiled individually. The memory allocation for the image buffers takes place at the start of RTAP, before the algorithms are computed, whereas the pixel setting and the projection of the image buffer are done during the runtime of the algorithms, typically after the processing of one window frame. As a result, memory allocation for the image buffer during the runtime of RTAP is not required, thereby eliminating redundant processing time. The task of projecting the image buffer onto the display window is presented as part of the maximum load profile in section 6.2.
5.1.3 Pixels Render and Image Display Threads
In the initial design of RTAP, the algorithm function was invoked from the timer-driven callback function paint of the AuditoryPeripheryJUCEdisplay class, which is invoked every 58 ms. This is the same function that renders the plot onto the RTAP UI display window. Hence, the algorithm and rendering tasks needed to be executed in series within a 58 ms time frame. This is problematic, as the algorithms are required to wait for the rendering task to conclude before executing the next window frame, which would have forced the algorithms to process fewer BF channels. To accommodate more BF channels, the algorithms and the graphics management had to be segregated into separate threads.
The graphics tasks of off-screen image buffer rendering and display are divided into two separate threads. The image buffer projection is an integral part of the UI display window implemented by paint(), which is part of the main RTAP thread. Hence, only the pixel render thread has to be created explicitly, and this is done in the constructor of the AuditoryPeripheryJUCE class. Once the pixel render thread is created, it enters a while loop that lasts for the duration of the RTAP application. In the while loop, the thread transits from an active to a ready-to-run state as it encounters a POSIX thread based instruction directing the thread to proceed only upon the arrival of a condition variable (CV) signal.
A CV signal is issued by the algorithm thread from the AuditoryPeripheryCompute class to indicate that processed data are available in the auxiliary image buffer for plotting. While the algorithm thread continues executing the subsequent window frame, the pixel render thread, upon receiving the CV signal, transits from the ready-to-run state to the active state. It then services the contents of the auxiliary image buffer, translating processed data into pixels that are rendered into the image buffer. Once the rendering operation is complete, the thread returns to the ready-to-run state within the while loop, where it waits for the arrival of new processed data from subsequent window frames via CV signalling.
5.1.4 ERB Scaled Plots
Plotting processed data on the ERB scale (ERBS) in RTAP uses equation 4.1 as the fundamental equation to convert frequency to ERBS. Equation 4.2, derived from equation 4.1 to define the characteristics of the recorded processed data for illustration, cannot be directly implemented in RTAP without modifying the scaling exponent, S. The ERBS plots in section 4.6 were obtained by adjusting the scaling exponent as a constant through trial and error. This was done to ensure that the output responses from the various stages of the AP model represented on the ERBS plots were visibly distinguishable from one another, with their peaks and troughs clearly observable while not so large as to overlap adjacent signals. For the DRNL responses, the scaling exponent is set to 22 due to the small amplitude range of the BM displacement, while for the ANSP LSR response the scaling exponent is set to -9.
In RTAP, this form of normalisation is achieved using the maximum and minimum
parameters in one window frame of processed data regardless of the number of BF
channels used. The formula used in RTAP is as follows:
y_ERB(t) = Y_start - 21.4*log10(0.00437*f + 1) + [(x(t) - x_min(t)) / (x_max(t) - x_min(t))] * 2*Y_spacing   (Eqn. 5.1)
where y_ERB(t) is the y-coordinate representation of the response signal to be displayed on the ERB scale; x(t) is the processed data sample, with x_min(t) and x_max(t) the minimum and maximum values within the window frame; Y_start is the offset from the point of origin of the RTAP UI display window; and Y_spacing is the distance between two adjacent BF processed signals, defined by equation 5.2.
Y_spacing = (Y_end - Y_start) / (2*N_BF)   (Eqn. 5.2)
where Y_end is the final vertical point on the RTAP UI display window where the pixels are plotted, set as a constant of 490; Y_start is the beginning vertical point on the RTAP display window where the pixels are plotted, set as a constant of 200; and N_BF is the number of BF channels. The interval between two adjacent signals represented in ERBS is needed to calculate the intermediate start points, Y_int-start, for all signals between the highest and lowest BF signals:
Y_int-start(k) = Y_end - (2k + 1)*Y_spacing   (Eqn. 5.3)
where k is an incremental index running from 0 to N_BF - 1. Equation 5.1 multiplies Y_spacing by 2 so as to scale the normalised processed sample data to within the boundaries of the positive and negative half cycles of a BF signal. Segmentation in this manner ensures that the boundaries between two adjacent BF signals are clearly defined and that signals do not overlap when projected on screen.
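For reference, the frequency-to-ERBS conversion of equation 4.1 that underlies these plots can be written directly; erbScale below is an illustrative name, not an RTAP identifier.

```cpp
#include <cmath>

// Equation 4.1: convert a frequency in Hz to its ERB-scale (ERBS) value.
double erbScale(double fHz)
{
    return 21.4 * std::log10(0.00437 * fHz + 1.0);
}
```

The function is monotonically increasing, so the ordering of BF sites along the basilar membrane is preserved on the ERBS axis.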
Figure 5.2: ERBS representation of the first window frame of DRNL response in RTAP based on 85 BF channels.
RTAP is able to generate and project 85 evenly spaced DRNL signals for the first window frame, based on the settings of table 4.2, on machine 1, as illustrated in figure 5.2. Amplitude and phase lag increase from higher to lower frequencies, although the amplitude of the signals below 500 Hz decreases again. As the number of BF channels increases, more signals are condensed into the UI display window. If the interval between adjacent signals becomes small, the spaces between individual signals on the ERBS graph become insignificant to the point that the results are incomprehensible. A better option is to harness the colour display capabilities of a spectrogram to differentiate signal intensities. Furthermore, the number of BF channels that can be displayed in the UI display window increases with a spectrogram implementation, given that a window frame of data for 1 BF channel can be represented by a single row of pixels, in contrast to the ERBS plot. Figures A.11 to
A.15 in appendix A contain the rest of the RTAP generated responses of the IHCRP, NRR
and ANSP stages.
5.1.5 Spectrogram Plots
An alternative plotting method for assessing the response of the auditory model is the spectrogram. A spectrogram displays the variation of the spectral density of a signal with respect to time [55]. The intensity of a frequency component, f, at a given time, t, of an input signal in a spectrogram, S, is depicted by the shade or colour of the resultant point S(f, t). Hence, in a black and white spectrogram, a signal with a larger amplitude variation within a discrete frequency is represented by pixels in a darker shade, while a smaller amplitude variation generates pixels in a lighter shade. Similarly, a multi-coloured spectrogram depicts the intensities of a signal in various colours. The colour contrast used here is unique to RTAP and is described in the paragraphs below.
In RTAP, the colours of a spectrogram are rendered using the setPixelAt JUCE function. For the ERBS plotting in RTAP, the first two parameters of setPixelAt were used to define the x and y coordinates of the pixel to be set on the RTAP UI display, and the third parameter was left constant. For spectrogram rendering, all three parameters are varied to represent the intensities with different colours. In particular, the third parameter of setPixelAt defines a specific colour through a C++ colour class. As a colour is represented in the JUCE library as a C++ class, memory resources would have to be allocated for the colour objects every time setPixelAt is called, and such memory allocation during runtime adds significant processing time. To remedy this, a finite number of colours are pre-allocated in the constructor of the AuditoryPeripheryJUCE class before being used in the pixel render thread. The number of discrete colour levels is fixed at thirty to coincide with the 30 BF channels that were the minimum number of parallel BF channels required to operate in real time as part of the goal of this project.
In the pixel render thread, the thirty pre-allocated colour classes are linearly distributed over the range between the minimum and maximum values of the processed data within a window frame. In other words, the two colours at the ends of the colour spectrum are defined by the minimum and maximum values found in the window frame of processed data, and all other colours in the spectrum represent equal-sized intervals. These intervals, summed over the length of the colour spectrum, equal the difference between the maximum and minimum parameters of the window frame. Equation 5.4 illustrates this relationship:
c_m = c_0 + m*(x_max(t) - x_min(t)) / N   (Eqn. 5.4)
where c_m is the discretised colour boundary level for comparison with the processed data;
x_min(t) is the minimum processed data value obtained in 1 window frame;
x_max(t) is the maximum processed data value obtained in 1 window frame;
c_0 is the lowest boundary level of the colour range, set at x_min(t);
N is the pre-allocated number of colour classes, set at 30 for RTAP.
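Equation 5.4 amounts to a uniform quantisation of the sample range into N bands; the sketch below (colourBandIndex and kNumColourBands are illustrative names) maps a sample directly to its band index rather than scanning all boundary levels.

```cpp
// Map one processed sample to a colour band index in [0, N-1],
// given the window-frame extrema, per the uniform intervals of Eqn. 5.4.
constexpr int kNumColourBands = 30;

int colourBandIndex(float x, float xMin, float xMax)
{
    if (xMax <= xMin)
        return 0;                                   // degenerate frame
    int m = static_cast<int>((x - xMin) * kNumColourBands / (xMax - xMin));
    if (m >= kNumColourBands)
        m = kNumColourBands - 1;                    // clamp x == xMax
    return m;
}
```

Computing the index arithmetically is O(1) per sample, whereas scanning the thirty boundary levels, as in the pseudocode that follows, is O(N); both yield the same band.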
The maximum and minimum values of the processed data in one window frame are retrieved in the algorithm thread, within the same loop that computes the processed data, as shown in the pseudocode listing below:
for (j = 0; j < WindowFrameSizePerBF; j++)
{
    // perform either DRNL, IHCRP, NRR or ANSP here ...
    // find local maxima & minima for 1st window frame
    if (dNumAlgoInvk < 1)
    {
        if (fMaxPixDisplayVal < ProcessedData[j])
            fMaxPixDisplayVal = ProcessedData[j];
        if (fMinPixDisplayVal > ProcessedData[j])
            fMinPixDisplayVal = ProcessedData[j];
    }
}
Listing 5.1: Acquisition of maximum and minimum values.
The maximum and minimum parameters, fMaxPixDisplayVal and fMinPixDisplayVal, are initialised to -65536 and 65535 respectively, so that the first processed data sample immediately replaces both. Within every cycle of the loop, each processed data sample is compared with the values stored in fMaxPixDisplayVal and fMinPixDisplayVal. If the sample is larger than the value in fMaxPixDisplayVal, it is stored in fMaxPixDisplayVal as the new maximum. Similarly, if the sample is below the value in fMinPixDisplayVal, it is stored in fMinPixDisplayVal as the new minimum. Once the loop exits, the maximum and minimum values for the window frame have been found. Thereafter, a POSIX based condition variable (CV) is signalled to enable the pixel render thread to run.
In the pixel render thread, every processed data sample in a window frame is compared with the thirty discrete levels defined by c_m in equation 5.4 within the range x_min(t) to x_max(t). Should a processed data sample fall between the floating point values representing two adjacent discrete colour levels, the setPixelAt function is called to set the pixels in the image buffer to the colour from the pre-allocated colour class indexed by m. The pseudocode for pixel rendering in the RTAP spectrogram is listed as follows:
// 1) Draw pixels from left to right of RTAP display window
for (x = 0; x < x_Display_Window_Width; x += PIXEL_WIDTH)
{
    // 2) loop to increment vertically from bottom to top of RTAP
    for (j = 0; j < NumBFchannels; j++)
    {
        // 3) Compute y-axis offset
        y = PIXEL_Y_OFFSET - (j * PIXEL_WIDTH);
        // 4) Scroll through all 30 colour bands and find the appropriate
        //    match for the processed data
        for (m = 1; m <= NUMBER_OF_COLOUR_BANDS; m++)
        {
            // 5) Check whether the processed data falls within the band
            if ((ProcessedData[j][x] >= Discrete_Colour_Level[m-1]) &&
                (ProcessedData[j][x] <  Discrete_Colour_Level[m]))
            {
                // 1:4 pixel rendering - 1 pixel rendered on the image buffer:
                Plot2Display.setPixelAt(x, y, c[m-1]);
                // 3 surrounding pixels also rendered by
                // replicating Plot2Display.setPixelAt 3 times
            }
        }
    }
}
Listing 5.2: Static spectrogram display.
Figure 5.3: Colour representation of signal intensity in spectrogram.
In figure 5.3, the dark blue colour at the leftmost end represents the highest signal intensity in the negative half of a signal, whereas the black colour at the right end defines the highest signal intensity in the positive half of a signal. The green block in the centre of figure 5.3 marks zero at the centre of the signal amplitude range. Table 5.1 summarises the significance of the colour contrast for the different responses within RTAP. Figure 5.4 demonstrates the DRNL response displayed in RTAP for 180 BF channels with the input settings of table 4.2. This DRNL response is the same signal output as its ERB scaled counterpart in figure 4.9, except that the spectrogram plot represents more BF channels than the ERBS plot.
RTAP stage | Dark Blue | Black
DRNL  | Lowest BM displacement | Highest BM displacement
IHCRP | Lowest voltage | Highest voltage
NRR   | Lowest release rate | Highest release rate
ANSP  | Lowest probability of spiking | Highest probability of spiking
Table 5.1: Spectrogram colour hue significance to the various stages of RTAP.
Figure 5.4: Spectrogram representation of the first frame window frame of the dual resonance nonlinear (DRNL) filterbank response in RTAP for 180 BF channels.
For the DRNL response in figure 5.4, the signal is most intense in the lower BF region, corresponding to the input stimulus frequency of 500 Hz. The travelling wave of the positive half of the first sinusoidal cycle begins at the high BF of 6 kHz, represented by a faint, thin yellow strip starting from the top left of the spectrogram and moving downwards in figure 5.4. As the signal amplitude and phase lag increase, the yellow strip thickens and starts leaning slightly to the right as it approaches the 500 Hz BF site and beyond. The negative half of the first sinusoidal cycle follows the same path and has the same phase lag as the travelling wave of the positive cycle, except that it is blue in colour and right-shifted from the yellow strip. From the second sinusoidal cycle onwards, the colour of the positive half of the travelling wave gradually changes from yellow to red when approaching the 500 Hz BF site, and then to black in the 500 Hz BF site region, signifying the
highest amplitude gain. Beyond the 500 Hz BF region, the signal returns to red and finally to
yellow, indicating the decay in amplitude of the travelling wave. The negative half of the
second sinusoidal wave has similar characteristics; at the least intense BF region it is
represented in dark blue before returning to a lighter shade of blue for dying amplitudes.
Figure 5.5: Spectrogram representation of the first window frame of the inner hair cell receptor potential (IHCRP) response in RTAP for 123 BF channels.
The colour hue of the RTAP-generated spectrogram of the IHCRP response is quite
different from that of the DRNL response. From figure 5.5, it can be concluded that the
IHCRP response possesses more distinguishable sinusoidal signals than the DRNL
response, though it shares the DRNL response's amplitude gain and phase
characteristics. The NRR and ANSP
responses in figures 5.6 to 5.9 indicate that the negative half of the signals from IHCRP
response have been filtered leaving the positive half to produce vesicle release and AN
spiking probabilities in both LSR and HSR fibres. The AN spiking probability in the HSR
fibres in figure 5.9 drops considerably as time progresses, which can be observed from the
transition of the colour hue from red to yellow to green at the 500 Hz BF site. This response
is consistent with a typical tone burst AN HSR response, in which there is a sharp reduction
in the likelihood of spiking in the first 10 ms to 20 ms after the onset of the stimulus, followed
by a slower rate of AN firing thereafter [56].
Figure 5.6: Spectrogram representation of the first window frame of the neurotransmitter release rate (NRR) response for AN LSR fibres in RTAP for 96 BF channels.
Figure 5.7: Spectrogram representation of the first window frame of the neurotransmitter release rate (NRR) response for AN HSR fibres in RTAP for 81 BF channels.
Figure 5.8: Spectrogram representation of the first window frame of the auditory nerve spiking probability (ANSP) response for LSR fibres in RTAP for 85 BF channels.
Figure 5.9: Spectrogram representation of the first window frame of the auditory nerve spiking probability (ANSP) response for HSR fibres in RTAP for 65 BF channels.
5.2 Scrolling Plot Display
5.2.1 Background
In RTAP, data is streamed to the CPU for computation in contiguous blocks at fixed
time intervals, regardless of the audio input source. As the end time in a real-time system is
non-deterministic, processed data has to be either logged or visually projected for analysis.
In the case of a real-time computational model, a feasible method of analysing processed
data is a continuous visual projection on the computer display screen. Accordingly, RTAP is
designed to display real-time processed data through a dynamic image buffer that is
rendered as a scrolling image on its UI display window. The image is scrolled from right to
left of the screen, with the latest processed data projected first on the right of the display
window.
The UI display window width for RTAP depends on the screen resolution settings of
the computer that RTAP runs on. The width of the RTAP UI display window on machine 1 is
1350 pixels, slightly larger than the 1280-pixel window frame size under the largest sampling
frequency setting of 22.05 KHz. Hence, machine 1 is capable of displaying one full window
frame of processed data and a small fraction of the subsequent window frame. However, if
the processed data from every window frame is to be displayed in its full extent, the display
window needs to be refreshed with a new image every 58 ms. A static display of complete
images refreshed at a rate of 17 Hz changes too quickly to be interpretable, so this option is
clearly not feasible for implementation in RTAP.
A scrolling image traversing from right to left at a reasonable scroll speed that renders
all the processed data as pixels also poses a potential problem of auxiliary image buffer
overrun, because the auxiliary image buffer is utilised as a circular buffer. As an example,
consider the scenario where audio data is streamed from the microphone channel and
processed by the algorithms. Pixel rendering on the image buffer is broken down into
segments of equal length based on the image scroll speed. While the image buffer is filled
with pixels from the segmented sub-frame of the first window frame, the auxiliary image
buffer is filled with processed data from subsequent window frames. Because the quantity of
processed data written to the auxiliary image buffer is larger than the quantity written to the
image buffer, unread data in the auxiliary image buffer will eventually be overwritten before it
can be rendered as pixels into the image buffer.
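The overrun condition can be quantified: if each window frame deposits more samples into the auxiliary buffer than are drained into the image buffer, a circular buffer of a given capacity is overwritten after roughly capacity / (fill − drain) frames. The sketch below is a hypothetical illustration; the function name and all numbers are invented for this discussion.

```cpp
#include <cassert>

// Number of whole window frames before unread data in a circular
// auxiliary buffer is overwritten, given the per-frame fill quantity
// and the per-frame drain quantity into the image buffer.
int framesUntilOverrun(int capacity, int frameSize, int pixelWrite)
{
    int netFillPerFrame = frameSize - pixelWrite;
    if (netFillPerFrame <= 0)
        return -1;                 // drained as fast as filled: no overrun
    return capacity / netFillPerFrame;
}
```

For example, a 12 800-sample buffer filled with 1280 samples per frame but drained at only 4 samples per frame survives just ten frames before overrun.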
5.2.2 Implementation
To avoid auxiliary image buffer overrun without compromising the scroll speed, the
processed data write size, i.e. the quantity of processed data transferred to the auxiliary
image buffer, has to be condensed. This write size has to be the same as the pixel write size
to the image buffer, which in turn is identical to the number of new pixels displayed on the
right of the UI display window. One way to reduce the quantity of processed data transferred
to the auxiliary image buffer is to average sub-frames of processed data within a window
frame. However, this causes the loss of vital information such as the amplitude of the
processed data signal. The alternative method that is adopted is subsampling, which
involves extracting one processed data sample per subsampling period. Listing 5.3
describes the pseudocode for subsampling.
for (i = 0; i < NumBFchannels; i++)
{
    // 1) Algorithm segment ...
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        // 2) Algorithm segment ...
        // 3) Subsampling for scrolling plot
        if (Scroll_Plot_Option_Selected)
        {
            // 4a) Store processed data for the scroll display once the
            //     subsample counter has been decremented to 0
            if (SubSample == 0)
            {
                // 5) Store the processed data
                Image_Buffer[i][ImgBuff_Write_Track[i]] = ProcessedData[i][j];
                // 6) Reset the subsample counter to acquire the next
                //    available processed data sample
                SubSample = 100;
                // 7) Advance the image buffer tracker to point to the next
                //    image buffer segment
                ImgBuff_Write_Track[i] = (ImgBuff_Write_Track[i] + 1) % IMGTEMPSTORESIZE;
            }
            else // 4b) Decrement the subsample counter
                SubSample--;
        }
        ... // Continue with other tasks
    }
}
Listing 5.3 Subsampling processed data in all algorithm functions.
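A minimal, runnable version of the subsampling idea in listing 5.3 is sketched below; the counter behaviour and the period of 100 mirror the pseudocode, but the function itself is an illustration rather than RTAP's actual thread code.

```cpp
#include <vector>
#include <cassert>

// Extract one sample per subsampling period from a window frame of
// processed data, mimicking the decrement-and-reset counter in listing 5.3.
std::vector<double> subsample(const std::vector<double>& frame, int period)
{
    std::vector<double> out;
    int counter = 0;                  // 0 means "take this sample"
    for (double sample : frame) {
        if (counter == 0) {
            out.push_back(sample);    // store the processed data
            counter = period;         // skip the next 'period' samples
        } else {
            --counter;
        }
    }
    return out;
}
```

With a 1280-sample window frame and a period of 100, samples are kept at indices 0, 101, 202, and so on, yielding 13 subsamples per frame.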
Immediately after the subsampling stage in the algorithm function, the pixel render
thread is invoked via the POSIX condition variable (CV), cvDrawPlot. In the render pixel
thread, the subsamples from the auxiliary image buffer are translated on to the image buffer.
Though the size of the image buffer is larger than the UI display window in RTAP, the image
buffer is clipped according to the size of the UI display window. The scrolling effect in RTAP
is produced by the discrete shifting of the clipping region towards the right of the image
buffer at every invocation of the timer callback function. This effect is illustrated in figure
5.10. The auxiliary image and image buffers are implemented as circular buffers, which
means that once the end of a buffer is reached, processing immediately continues at its
start. This implementation gives a continuous scroll effect regardless of the algorithm
processing runtime.
Figure 5.10: Image buffer clipping and projection of the display window.
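The clipping-and-wrap behaviour sketched in figure 5.10 can be illustrated in a few lines of code. This is a hypothetical one-dimensional simplification (a single pixel row, invented names), not RTAP's implementation.

```cpp
#include <vector>
#include <cassert>

// Produce the visible window by clipping a circular image buffer:
// 'start' is the left edge of the clipping region, which shifts right
// by the scroll speed at every timer callback and wraps at the end,
// yielding a continuous scroll.
std::vector<int> clipWindow(const std::vector<int>& imageBuffer,
                            int start, int windowWidth)
{
    std::vector<int> window(windowWidth);
    for (int x = 0; x < windowWidth; ++x)
        window[x] = imageBuffer[(start + x) % imageBuffer.size()];
    return window;
}
```

When the clipping region reaches the end of the buffer, the modulo operation wraps the read position to the start, so the displayed window never shows a discontinuity.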
Since the plot is scrolled from right to left of the UI display window, the pixels are
required to be rendered on the right of the display window just beyond the clipping region. As
stated earlier, the size of the image buffer segment hosting the new pixels for display on the
right of the UI display window has to be identical to the processed data write size to the
image and auxiliary image buffers so as to project an instant response
from the image scroll. To meet this requirement, the subsampled processed data from the
auxiliary image buffer has to be translated to the right of the UI display window off-screen
and scrolled into the display window at the next display window refresh. This technique
generates a rapid and continuous image scroll effect. The pseudocode listing for this
technique is laid out below.
// 1) Ensure that the algorithm thread is not processing the same audio data packet
if (Pixel_Render_count_semaphore != Algorithm_Processing_count_semaphore)
{
    // 2a) Check if the scrolling image has reached the left end of the display window
    if (ImgBuff_Display_Start_X_Position > 0)
    {
        // 2b) Update the variable if the image has not reached the left end
        ImgBuff_Display_Start_X_Position -= Scroll_Speed;
    }
    else
    {
        // 2c) Maintain image display from left to right of the display window
        //     once the image has scrolled to the far left of the display window
        ImgBuff_Display_Start_X_Position = 0;
    }
    // 3) Prepare to render pixels along the x-axis
    for (x = 0; x <= Num_of_ProcessedData_to_display; x++)
    {
        // 4) Update the global image buffer read offset
        ImgBuff_Read_Track = (x + ImgBuff_Write_Track[0]) % sizeof(ImgBuff);
        // 5) Update the x-axis position for pixel rendering
        ImgBuff_X = (ImgBuff_Render_Offset + x) % sizeof(Aux_ImgBuff);
        // 6) Erase the column of the image buffer where pixels are to be drawn
        // 7) Skim through all BF channels
        for (j = 0; j < NumBFchannels; j++)
        {
            // 8) Compute the y-axis offset for either the ERBS or spectrogram plot
            // 9) Render pixels for either the ERBS or spectrogram plot
            Plot2Display.setPixelAt(ImgBuff_X, y, Pixel_colour);
        }
    }
    // 10) Update the global image buffer x-axis offset to point to the next
    //     location for rendering pixels
    ImgBuff_Render_Offset = (ImgBuff_Render_Offset + Scroll_Speed) % IMGBUFFSIZE;
    // 11) Reset the variable to indicate no more pixel rendering for the current frame
    Num_of_ProcessedData_to_display = 0;
    ... // Continue with other tasks
}
Listing 5.4: ERBS and spectrogram plot scrolling.
The scrolling image has to account for two events that are unaccounted for in a
static display. In static image display, the image to be drawn is written to a fixed location
within the image buffer. Scrolling image display, however, has to draw pixels constantly at
adjacent locations on the image buffer as the buffer is scrolled horizontally. The image buffer
is arranged as a circular buffer; in other words, the end of the image buffer is immediately
followed by the start of the same buffer. Hence, once the drawing of pixels reaches the end
of the image buffer, subsequent pixels are drawn back at its start. Before pixels are painted
on to the image buffer, the image buffer segment where the pixels are to be drawn needs to
be erased. This ensures that signals are not overlapped on top of one another as the pixel
render thread moves from the end of the image buffer back to the start. The erasure is
carried out in a loop where black pixels are drawn one pixel at a time along the entire height
of the image buffer for a width of two pixels. Hence, pixel erasure and rendering are carried
out on image buffer segments measuring 2-by-600 pixels.
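The column-erasure step described above can be sketched as follows; the names and the small test image are illustrative, with only the 2-pixel strip width taken from the text.

```cpp
#include <vector>
#include <cassert>

// Erase a 2-pixel-wide column strip of an image buffer by painting it
// black along its entire height before new pixels are drawn there.
const int BLACK = 0;

void eraseStrip(std::vector<std::vector<int>>& image, int x, int stripWidth = 2)
{
    int width = static_cast<int>(image[0].size());
    for (std::vector<int>& row : image)        // every row = one y position
        for (int dx = 0; dx < stripWidth; ++dx)
            row[(x + dx) % width] = BLACK;     // wrap at the buffer's end
}
```

The modulo on the x position mirrors the circular buffer: a strip starting at the last column wraps around and also erases the first column.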
5.2.3 Results
Subsampling preserves amplitude at regular intervals and hence allows a
distinguishable projection of the processed data for a given auditory stimulus. Its effects are
observable in figures 5.11 and 5.12, which depict the ANSP responses of LSR and HSR
fibres for speech. The input stimulus is the spoken sentence 'Door with no lock to lock'.
Three voices are generated from a text-to-speech website [57] and played at maximum
volume through the built-in speakers on machine 1, and the audio is acquired by RTAP
through the microphone input channel. The minimum and maximum BFs are set to 250 Hz
and 6 KHz respectively, with a sampling rate of 22.05 KHz and an input stimulus scale of
50 dB SPL. The number of BF channels is set to 30, though only 28 channels are displayed
in the figures below as no activity takes place beyond 5 KHz.
As the drawing of the ERBS plots on screen utilises pixel rather than line rendering,
the results for speech are presented in the form of a real-time scatter plot. In figure 5.11, the
concentration of rendered pixels for the male voice occurs at lower frequencies up to
1.8 KHz, and for the female voices up to 2.8 KHz, which translates to the likelihood of firings
within LSR fibres of the auditory nerve in those spectral ranges. The AN HSR fibres, in
contrast, generate a more significant AN firing probability. In the plot of the HSR fibre
responses in figure 5.12, ANSP is active at all frequencies up to 4 KHz, though significant
firings take place at 1.5 KHz and 1.8 KHz for the male and female voices respectively. This
can be observed from the larger concentration of pixel clusters in the range from 250 Hz to
the BF sites in the proximity of 1.5 KHz and 1.8 KHz.
5.3 Summary
It has been demonstrated that RTAP is capable of displaying static responses of
processed data in either ERB scaled or spectrogram graphs. The equations characterising
the ERB scaled plots covered in chapter 4 are modified so that RTAP automatically
generates evenly spaced signals that do not overlap when placed in a single graph. For
spectrogram plots, thirty linearly distributed colours are utilised to define the intensities of
processed data in a single window frame. RTAP is also capable of projecting a scrolling
display of ERB scaled plots. This was illustrated in the last section of this chapter, where
ANSP responses based on audio streamed from a laptop's built-in microphone channel were
presented for speech in different voice settings.
Figure 5.11: ANSP response in LSR fibres of real-time speech illustrated in RTAP.
[Figure 5.11 axes: BF sites from 250 Hz to 4819 Hz against time (sec). Legend: Joey – USA (male), Salli – USA (female), Nicole – Australia (female).]
Figure 5.12: ANSP response in HSR fibres of real-time speech illustrated in RTAP.
Chapter 6: Optimisation and Load Profile
One essential feature of a real-time auditory pathway model is that it has to be
capable of accommodating a large number of BF channels. A simulation of a stage within a
real-time auditory pathway model with a large range of BF channels is able to project a wide
spectral range of cochlear responses. This allows the study of the perception of a wide
variety of auditory stimuli. One method of achieving large loads in terms of the number of BF
channels is to study the algorithm in the model and identify the segment that takes the
longest time to process. Following this, investigations are carried out into replacing that
segment with faster code that generates output close to the original. By reducing the
processing time in this way, there is ample room to increase the number of BF channels in
the real-time auditory pathway model.
This chapter describes mathematical optimisation as a means of reducing the
algorithm processing runtime of RTAP and increasing the number of BF channels to
represent more discrete frequency points. It will be shown that, though the number of
exponential functions used in RTAP is small, they nonetheless take up significant processing
time. Hence, a faster version of the exponential function will be implemented in RTAP and
the responses of every stage within the auditory pathway model will be exhibited. The final
topic of this chapter deals with the maximum load profiles of RTAP executed on two
computers under single and double precision execution. Within the same section, thread
profiles of RTAP are covered.
6.1 Mathematical Optimisation
6.1.1 Background
The Microsoft Visual Studio (MVS) compiler is used as the platform for building
RTAP and it offers build optimisations to speed up the program. However, the optimisation
facilities of the MVS compiler are unused, because compiler-level optimisation potentially
brings about non-deterministic mathematical responses that are impractical to debug given
the large algorithm base used in RTAP. Build optimisations are therefore forgone in RTAP,
and an alternative form of optimisation is required.
Mathematical operators and functions are used extensively in RTAP. Mathematical
operators consist of basic operations such as addition, subtraction, multiplication and
division. With the exception of multiplication and division, these operators are mapped by
the C++ library directly to assembly language instructions that use a single CPU clock cycle
when executed. Multiplication and division generally use several CPU cycles and depend
both on the available hardware resources and on the software library implementation.
However, on a CPU with a clock rate in the gigahertz range, the clock cycles used up during
the execution of these operators are insignificant due to the small tick size, which ensures
that these operators are computed rapidly [58].
Mathematical functions in RTAP generally comprise exponential and logarithmic
functions as well as trigonometric entities such as the sine and cosine functions. These are
special functions that rely on a software math library providing an abstraction layer of code
that uses basic mathematical operators to bring about the desired outcome. One example is
the exponential function, which can be computed with a Maclaurin series using basic math
operators [59]. Hence, an exponential function requires more execution time than a single
mathematical operator in a program compiled with any C++ library. In terms of algorithm
computation, RTAP is broken into two parts. The first part contains the initialisation of
constants and the computation of coefficients in the non-real-time segment of RTAP when
the 'Set' button is clicked. As this segment is not time critical, the Visual Studio C++ math
library is used for computing all the parameters within it. In the real-time segment, though, it
is essential that the executed code takes as little time as possible so as to compute the
maximum number of tasks. Hence, code optimisation, especially in the real-time segment, is
essential.
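The Maclaurin-series evaluation of the exponential mentioned above can be sketched with basic operators only; `expMaclaurin` is an illustrative name, not code from RTAP, and the incremental term update avoids computing factorials explicitly.

```cpp
#include <cmath>
#include <cassert>

// Evaluate e^x from its Maclaurin series using only basic operators:
// e^x = 1 + x + x^2/2! + x^3/3! + ...
double expMaclaurin(double x, int terms)
{
    double sum = 1.0, term = 1.0;
    for (int n = 1; n < terms; ++n) {
        term *= x / n;   // builds x^n / n! incrementally
        sum += term;
    }
    return sum;
}
```

The series converges quickly for moderate |x|, which is why each added term costs only one multiplication, one division and one addition.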
Optimisation of the basic mathematical operators in the real-time segment is forgone
and only the special mathematical functions are considered for optimisation. Exponential
and natural logarithmic functions are the only two mathematical functions generally used
throughout the real-time segment of RTAP. Of these two, the exponential function is more
widely used than the natural logarithmic function. Table 6.1 lists the mathematical functions
used in the real-time segment of RTAP and the maximum number of invocations under the
highest load profile for each stage within RTAP. It is observed that the exponential function is
invoked as many as three times more often than the natural logarithmic function when the
IHCRP response is computed, regardless of the number of BF channels. This ratio increases
to five for the NRR and ANSP responses with high and low spontaneous rate AN fibre types
utilised during runtime.
RTAP functions running in real-time | exp() in code | log() in code | Max. BF / AN channels (machine 1) | exp() calls at max. load | log() calls at max. load
DRNL          | 1 | 1 | 85      | 85  | 85
DRNL-to-IHCRP | 3 | 1 | 56      | 168 | 56
DRNL-to-NRR*  | 5 | 1 | 36 / 72 | 180 | 36
DRNL-to-ANSP* | 5 | 1 | 30 / 60 | 150 | 30
* Based on two AN fibre type channels.
Table 6.1: Non-optimised mathematical functions utilised in RTAP.
It can be deduced from table 6.1 that the mathematical functions used in the real-
time segment of RTAP are significant for a large number of BF channels. Various forms of
optimisation need to be explored to reduce the impact of these mathematical functions. One
form of mathematical optimisation is the utilisation of a different math library that offers faster
math processing while producing the same deterministic mathematical response. MVS has
its own math library that is invoked during the compilation of RTAP. An alternative to the
MVS math library is the Intel Math Kernel Library (MKL). The MKL can be used within MVS
to substitute only the math library while the MVS compiler is still used for building RTAP.
RTAP compiled with MKL can be built with three separate settings. The parallel MKL setting
optimises algorithms in threaded applications, while the sequential setting optimises
algorithms in serialised applications. Finally, the cluster setting is used for cluster computing,
where various computers are connected to work as a single computing system [60]. The
profiles of the utilisation of these libraries are presented in the next section.
Another form of optimisation is to provide an alternative method for computing the
mathematical functions other than those provided by the available math libraries. One such
method is through the manipulation of bits and the manner of reading an 8-byte floating
point (FP) variable, as proposed by Schraudolph [61]. FP variable storage in MVS is based
on the IEEE-754 standard, which is available in either 4-byte or 8-byte format [62].
Schraudolph's exponential macro function code relies on the 8-byte FP format. The 8-byte
FP has three segments, namely a 1-bit sign, an 11-bit exponent and a 52-bit mantissa.
Equation 6.1 is used to reconstruct the 8-byte binary data into a FP value.

FP value = (−1)^s (1 + m) 2^(x − x0) (Eqn. 6.1)
where s is the sign bit; m is the mantissa; x is the exponent, which is shifted by a constant
bias, x0. The FP format is illustrated in figure 6.1. For the purpose of explaining the fast
exponential function, the 8-byte FP format in figure 6.1 is further divided into two halves
represented by i and j. The principle of the fast exponentiation of an input, x, is to exploit the
exponent term of equation 6.1 after dividing x by the natural logarithm of 2, since
2^(x/ln 2) = e^x. In other words, x is manipulated as an integer via the integer i, by adding
the bias, x0, and then left shifting it. Reading back the integer representation, i, in FP format
automatically produces the exponentiation effect. This algorithm is described in greater
detail in the following paragraph.
Figure 6.1: 64-bit floating point format divided into two halves for fast exponentiation.
Extracted from Schraudolph [61].
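Equation 6.1 can be verified directly by unpacking the bit fields of a double. The sketch below assumes IEEE-754 storage and uses `memcpy` for well-defined access; the names are illustrative, not from RTAP.

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>
#include <cassert>

// Reconstruct a (normal) double from its IEEE-754 fields per equation 6.1:
// value = (-1)^s * (1 + m) * 2^(x - 1023)
double reconstruct(double v)
{
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    int s = static_cast<int>((bits >> 63) & 1);        // 1-bit sign
    int x = static_cast<int>((bits >> 52) & 0x7FF);    // 11-bit biased exponent
    // 52-bit mantissa as a fraction in [0, 1): divide by 2^52
    double m = (bits & 0xFFFFFFFFFFFFFULL) / 4503599627370496.0;
    return (s ? -1.0 : 1.0) * (1.0 + m) * std::ldexp(1.0, x - 1023);
}
```

For any normal double the reconstruction is exact, confirming the field widths and the bias of 1023.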
The first step in Schraudolph's exponentiation of a variable x is to add the bias,
x0, which is the constant 1023, to x. Subsequently, the result is shifted to the left by 20 bits
through multiplication by the constant 2^20 so that the high order bits of x reside in the
exponent segment. If x is a floating point variable, the fractional part of its biased, 20-bit-
shifted value spills over into the highest order bits of the mantissa. This outcome effectively
provides a linear interpolation between adjacent integer exponents, behaving like a lookup
table of 2^11 linearly interpolated entries. Scaling the variable x by the 20-bit shift constant,
2^20, and dividing it by the natural logarithm of 2 before adding the 20-bit shifted bias results
in the exponentiation of x. The fast exponential function can be characterised by the
following equation.
i := a·x + (b − c) (Eqn. 6.2)

where i is the integer representation of e^x;
x is the input parameter to be exponentiated;
a is the left bit-shifted scalar, given by 2^20/ln(2);
b is the bias shifted left by 20 bits, given by 1023 · 2^20;
c is a control parameter that adjusts the approximation of the fast exponential function.
6.1.2 Implementation
The fast exponential function is defined as macro code and is shown in listing 6.1.
A C/C++ union declaration contains an 8-byte FP number as well as two 4-byte integers. For
a single precision computation with a 4-byte FP, the FP is type cast to an 8-byte FP
variable. In other words, the exponent and mantissa of the 4-byte FP are padded with an
extra 32 bits of data before being stored in an 8-byte memory location. A union declaration
allocates memory for its largest member, which in this case is 8 bytes due to the 8-byte
double variable. The two integer variables are laid out in little endian format and store the
8-byte FP variable in two halves; this is possible as the two integer variables, with a total
size of 8 bytes, share the same memory space as the double variable. Hence, the macro
EXP(x) first computes the integer equivalent of the exponentiation of the variable x by
manipulating the integer representation of the upper 4 bytes of the FP variable in the integer
i1. The lower 4 bytes in the integer j1 are ignored. The two integers are then read back as a
FP variable, which yields the approximation of the exponential of x.
static union {
    double d;
    struct {
#ifdef LITTLE_ENDIAN
        int j1, i1;
#else
        int i1, j1;
#endif
    } n;
} eco;

#define LN2   0.69314718056          // natural log of 2
#define EXP_A (1048576/LN2)          // 2^20 / ln(2)
#define EXP_B 1072693248             // 1023 * 2^20
#define EXP_C 60801                  // based on lowest RMS relative error
#define EXP(y) (eco.n.i1 = EXP_A*(y) + (EXP_B - EXP_C), eco.d)
Listing 6.1: Fast exponential computation.
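The macro in listing 6.1 can be exercised against the standard library exponential as a sanity check. The sketch below assumes a little-endian machine (hence the invented `LITTLE_ENDIAN_MACHINE` define) and uses a loose 5% relative error bound as an illustrative tolerance; it is a test harness written for this discussion, not part of RTAP.

```cpp
#include <cmath>
#include <cassert>

#define LITTLE_ENDIAN_MACHINE 1  // assumption: x86-style byte order

static union {
    double d;
    struct {
#if LITTLE_ENDIAN_MACHINE
        int j1, i1;   // low word first; the exponent half lands in i1
#else
        int i1, j1;
#endif
    } n;
} eco;

#define LN2   0.69314718056
#define EXP_A (1048576 / LN2)      // 2^20 / ln(2)
#define EXP_B 1072693248           // 1023 * 2^20
#define EXP_C 60801                // tuned for lowest RMS relative error
#define EXP(y) (eco.n.i1 = EXP_A * (y) + (EXP_B - EXP_C), eco.d)

// Relative error of the fast approximation against std::exp.
double relError(double x)
{
    return std::fabs(EXP(x) - std::exp(x)) / std::exp(x);
}
```

On a little-endian machine the approximation stays within a few percent of the true exponential over the moderate argument range used in RTAP.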
The execution time profiles for the exponential function based on the MVS and MKL
math libraries, as well as Schraudolph's fast implementation, measured on machine 1, are
shown in table 6.2. The timings are based on the exponential function used in an iterative for
loop that invokes the respective exponential function 1280 times. The number of invocations
is identical to the number of sampled audio data points acquired in one window frame at a
sampling frequency of 22.05 KHz. The code listing for the profile test is as follows:
x = -5; // constant exponent value

// Start time for math library based exponential function
for (i = 0; i < 1280; i++)
{
    y = exp(x); // MKL library based exponential function
}
// End time for math library based exponential function

// Start time for Schraudolph fast exponential function
for (i = 0; i < 1280; i++)
{
    y = EXP(x); // Schraudolph implementation of fast exponential function
}
// End time for Schraudolph fast exponential function
Listing 6.2: Code for comparing MKL & Schraudolph exponential function.
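One way to realise the "start time"/"end time" comments in listing 6.2 is with `std::chrono`; the helper below is a generic sketch (the name `timeInvocations` is invented), not the thesis's actual measurement code.

```cpp
#include <chrono>
#include <cmath>
#include <cassert>

// Time 'count' invocations of a callable, matching the 1280 samples of
// one window frame at 22.05 KHz, and return the elapsed milliseconds.
template <typename F>
double timeInvocations(F f, int count = 1280)
{
    auto start = std::chrono::steady_clock::now();
    volatile double sink = 0.0;   // keeps the calls from being optimised away
    for (int i = 0; i < count; ++i)
        sink = f(-5.0);
    (void)sink;
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

`steady_clock` is used rather than `system_clock` because it is monotonic, which matters when timing short loops.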
From table 6.2 it is observed that the performance of the exponential function from
the MVS and MKL math libraries is relatively similar. However, the Schraudolph exponential
implementation is more than two times faster than either C++ math library. Hence, due to
this significant performance enhancement, the macro code of Schraudolph's fast exponential
function is implemented in a global header file within the structure of RTAP.
Compiler Options | C++ math lib exp() (ms) | Schraudolph EXP() (ms) | Speedup with Schraudolph exponential function
Release Mode / No Intel IPP / No Intel MKL (MVS math library used) | 0.0609923 | 0.02724686 | 2.238507483
Release Mode / Intel IPP on / No Intel MKL (MVS math library used) | 0.0602093 | 0.02732512 | 2.203441376
Release Mode / Intel IPP on / Intel MKL on (Parallel) | 0.06052258 | 0.0271686 | 2.227666497
Release Mode / Intel IPP on / Intel MKL on (Sequential) | 0.06067916 | 0.02724688 | 2.227013148
Release Mode / Intel IPP on / Intel MKL on (Cluster) | 0.06130546 | 0.02732514 | 2.243555202
Table 6.2: Performance comparison of exponential function in MVS and MKL math libraries
and Schraudolph algorithm on machine 1.
6.1.3 Optimised RTAP Responses
The results exhibited in this subsection are acquired from RTAP with the settings of
table 4.2 and with Schraudolph fast exponential function enabled. Additionally, the version of
RTAP used is compiled on MVS with Intel MKL parallel settings enabled as well. The results
of all the stages within the AP model with fast exponential function enabled were acquired
126
from machine 2 with maximum number of BF channels used and algorithm processing time
approximately close to 58 ms. The results of machine 1 is identical to that of machine 2
except the maximum number of BF channels used in machine 1 is lower than machine 2.
The larger of the two maximum BF load between machines 1 and 2 is selected for projection
in this subsection.
The processing time for generating the responses from all stages within RTAP has
been lowered by the use of the fast exponential function. Hence, RTAP is able to load more
BF channels. This additional loading will be discussed in detail in the following section.
Figure 6.2 illustrates the effects of exponential function optimisation on the DRNL response
obtained from machine 2. The result in figure 6.2 is identical to the non-optimised result in
figure 5.4, with the exception that the results in figure 6.2 were generated with a larger
number of BF channels. With the fast exponential function disabled, the number of BF
channels that can be loaded on to RTAP running on machine 2 is 178. With the fast
exponential function utilised in the compressive function in the nonlinear branch of the DRNL
computation, RTAP on machine 2 is able to accommodate 192 BF channels to compute the
DRNL response.
Figure 6.2: Dual resonance nonlinear (DRNL) response generated in RTAP based on optimised exponential function for 192 BF channels.
The IHCRP response computed with the fast exponential function in RTAP differs
from the non-optimised original plot in figure 5.5. The positive half cycles of the signals
around the 500 Hz BF site at the stimulus frequency in the IHCRP response in figure 6.3 are
maintained, while the magnitudes of the negative half cycles of the signals are made
more negative than the original response in figure 5.5. This is evident from the appearance
of the IHCRP response in figure 6.3, which contains more blue coloured pixels than the plot
in figure 5.5. The cause of this is the quantisation error introduced in the approximation of
the two fast exponential functions used in equation 2.13. Thereafter, through equation 2.15,
the cumulative effect of the quantisation error residing in G(u) is amplified and the magnitude
of the IHCRP response evaluates to less than the non-optimised IHCRP response. In other
words, the reduced magnitude resulting from the use of the fast exponential function
increases the negativity of the overall response from the IHCRP stage.
Figure 6.3: Inner hair cell receptor potential (IHCRP) response generated in RTAP based on optimised exponential function for 155 BF channels.
The deviation in the apical conductance, G(u), caused by the multiplication of the two fast
exponential functions in equation 2.13 propagates adversely to all subsequent stages of the
auditory pathway. The negative DC offset of the negative half cycles in the IHCRP suppresses
the negative cycles of the signals in the neurotransmitter release rate (NRR) and auditory
nerve spiking probability (ANSP) stages. Both the low (LSR) and high spontaneous rate (HSR)
fibre responses are affected. The quantisation errors are especially detrimental to the
computation of the ANSP HSR fibre, destabilising the algorithm response after only 50 data
sets within the first BF channel. The instability introduced by the accumulated quantisation
errors, especially from the multiplication of the fast exponential functions in equation
2.13, scales with the amplitudes of the HSR ANSP fibres, which linger in the range of 10^1. Due
to the larger range of amplitudes generated by the HSR fibre algorithm compared with the
other stages and the LSR fibre, unstable behaviour sets in at the start of the ANSP HSR
computation. Figure 6.4 shows the unstable HSR ANSP response of the 250 Hz BF point computed
with the Schraudolph exponential function, which diverges indefinitely toward negative infinity.
Figure 6.4: Unstable HSR ANSP response after the refractory period at the start of computation.
To eradicate this computational inconsistency in the response from the IHCRP stage,
equation 2.13 had to be reviewed. Since revamping the equation was not an option, a basic
restructuring of the computation was required. Equation 2.13 can therefore be expanded as
follows:
G(u) = G_{cilia}^{max}\left\{1 + \exp\left(-\frac{u(t)}{s_0}\right)\exp\left(\frac{u_0}{s_0}\right)\left[1 + \exp\left(-\frac{u(t)}{s_1}\right)\exp\left(\frac{u_1}{s_1}\right)\right]\right\}^{-1} + G_a (Eqn. 6.3)
One observation that follows is that the parameters u0, s0, u1 and s1 are constants, so the
two exponential functions involving only these constants can be pre-computed. Moreover,
because there is no time constraint on pre-computation, the exponential function from the
Intel MKL can be used to compute them. The two other exponential functions in equation 6.3,
which handle u(t), the IHC cilia displacement varying over time, are implemented with the
Schraudolph fast exponential function in real time. As a result, instead of relying solely
on the fast exponential functions to compute the apical conductance, G(u), the computational
load is shared between the exponential functions from the Intel MKL and the Schraudolph fast
exponential approximation.
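The restructured computation can be sketched as below. Parameter names follow equation 6.3; the use of std::exp both for the pre-computation (Intel MKL in RTAP) and for the real-time part (Schraudolph's approximation in RTAP) is an assumption made to keep the sketch self-contained.

```cpp
#include <cmath>

// Sketch of the restructured apical conductance computation of equation 6.3.
// The constant-argument exponentials are pre-computed once with an accurate
// library exp; only the u(t)-dependent exponentials run in real time, where
// RTAP substitutes the Schraudolph approximation.
struct ApicalConductance {
    double Gmax, Ga;        // maximum mechanically gated and passive conductance
    double u0, s0, u1, s1;  // Boltzmann offsets and sensitivities
    double k0, k1;          // pre-computed exp(u0/s0) and exp(u1/s1)

    void precompute() {
        k0 = std::exp(u0 / s0);  // no real-time constraint here
        k1 = std::exp(u1 / s1);
    }

    double evaluate(double u) const {
        // In RTAP these two calls use the fast exponential approximation.
        const double e0 = std::exp(-u / s0);
        const double e1 = std::exp(-u / s1);
        return Gmax / (1.0 + k0 * e0 * (1.0 + k1 * e1)) + Ga;
    }
};
```

Only two of the four exponentials now carry the approximation error at runtime, which is what tames the accumulated quantisation error described above.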
Figure 6.5: IHCRP response displayed in RTAP based on optimised exponential function for 155 BF channels.
There is a vast difference between the IHCRP response in figure 6.5 and that in figure 6.3.
The blue pixels that dominated the spectrogram in figure 6.3, a result of the negative DC
offset in the negative cycles from the fast exponential function, have diminished in the
spectrogram of figure 6.5. In fact, the spectrogram in figure 6.5 matches that in figure
5.5, except that it accommodates more BF channels generated from machine 2, resulting in a
higher resolution spectrogram. The BF loading of the non-optimised RTAP running on machine 2
simulating the IHCRP stage is 125 BF channels, while it is 155 BF channels with the fast
exponential enabled.
The change in the computing structure of G(u) has a profound effect on all stages from
IHCRP to ANSP, as observed from the spectrograms in figures 6.6 to 6.9 generated with the
fast exponential function. Firstly, all of the plots resemble the non-optimised responses in
figures 5.5 to 5.9 generated with machine 1. Secondly, the modified G(u) computing approach
of equation 6.3 has eradicated the instability in the ANSP HSR fibre response recorded with
the fast exponential functions. For the NRR LSR, NRR HSR, ANSP LSR and ANSP HSR responses of
figures 6.6 to 6.9, machine 2 with the fast exponential function disabled sustains BF loads
of 100, 78, 89 and 65 respectively. With the fast exponential function enabled in RTAP on
machine 2, the BF loads for NRR LSR, NRR HSR, ANSP LSR and ANSP HSR stand at 123, 104, 107
and 79 respectively.
Figure 6.6: Neurotransmitter release rate (NRR) response for low spontaneous rate (LSR) fibre displayed in RTAP based on optimised exponential function for 123 BF channels.
Figure 6.7: Neurotransmitter release rate (NRR) response for high spontaneous rate (HSR) displayed in RTAP based on optimised exponential function for 104 BF channels.
Figure 6.8: Auditory nerve spiking probability (ANSP) response for low spontaneous rate (LSR) displayed in RTAP based on optimised exponential function for 107 BF channels.
Figure 6.9: Auditory nerve spiking probability (ANSP) response for high spontaneous rate (HSR) fibre displayed in RTAP based on optimised exponential function for 79 BF channels.
6.1.4. MAP and Optimised RTAP Responses Comparisons
The comparison of responses between MAP and the optimised RTAP using the Schraudolph
exponential function is discussed in this section. The responses are acquired from all
stages, from stapes displacement in the middle ear to auditory nerve spiking probability
(ANSP). As RTAP-numerical has an identical response to RTAP, the results in this section are
generated from RTAP-numerical. With settings identical to those in section 3.9, input sine
tones of 500 Hz, 1000 Hz, 3000 Hz and 5000 Hz were injected into the MAP and RTAP-numerical
models, and the BFs were selected as 250 Hz, 1039 Hz, 3109 Hz and 5377 Hz respectively. The
input sine tones were also varied in the range of 10 dB SPL to 90 dB SPL for every input
sine tone frequency and BF.
Figures 6.10 to 6.13 show that the lowest deviation between MAP and RTAP occurs in the
computation of stapes displacement, where the Schraudolph exponential function is not in
use. This error remains identical to the RMS error measured for stapes displacement in
figures 3.16 to 3.19. The RMS error increases for the BM displacement stage because each
pass of the DRNL filter comprises a Schraudolph fast exponential function as part of the
nonlinear computation. Two additional Schraudolph fast exponential functions implemented
in the IHCRP stage cause a further increase in the RMS errors. The largest deviations
between MAP and RTAP occur in the computation of responses for LSR and HSR fibres at
the NRR and ANSP stages where the errors due to Schraudolph fast exponential function
from all stages accumulate.
RMS errors between MAP and the optimised RTAP-numerical are observed in figures 6.10, 6.11
and 6.12 to be under 5% for responses corresponding to the 500 Hz, 1000 Hz and 3000 Hz sine
tone inputs, whereas the errors for the 5000 Hz sine tone input are higher, just under 8%,
in figure 6.13. In comparison, the RMS errors observed in figures 3.16 to 3.19 are below 1%
with the Schraudolph fast exponential function disabled. The approximations used in the
Schraudolph fast exponential introduce larger output quantisation errors than the
exponential computations in the MKL and MVS math libraries. Hence, the Schraudolph function
enhances the computing speed of an exponential function while reducing its accuracy.
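The error measure used in these comparisons can be computed as below; normalising by the RMS of the MAP reference is an assumption about the exact definition used here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalised RMS error between a reference response (e.g. MAP) and a test
// response (e.g. optimised RTAP-numerical) of equal length.
double normalised_rms_error(const std::vector<double>& ref,
                            const std::vector<double>& test) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = ref[i] - test[i];
        num += diff * diff;     // accumulate squared deviations
        den += ref[i] * ref[i]; // accumulate reference energy
    }
    return std::sqrt(num / den); // 0.05 corresponds to the 5% figure quoted above
}
```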
Figure 6.10: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 500 Hz sine tone input observed from a 250 Hz BF channel.
Figure 6.11: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 1000 Hz sine tone input observed from a 1039 Hz BF channel.
Figure 6.12: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 3000 Hz sine tone input observed from a 3109 Hz BF channel.
Figure 6.13: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 5000 Hz sine tone input observed from a 5377 Hz BF channel.
6.2 Load Profile
6.2.1 Maximum Load
Load profile refers to the maximum number of BF channels that can be
accommodated by RTAP during its runtime where the algorithms are being processed with
the input signal set based on table 4.2. The algorithms comprise the processing of the OME
stage and any one of the four functions described in table 3.2. More specifically, the
number of BF channels is set to an arbitrary value the first time RTAP is executed on a
computer. As long as the processing time of the algorithms is less than 58 ms, the parameter
controlling the number of BF channels in the RTAP UI is increased. The maximum number of BF
channels is reached when the processing time of the algorithms approaches, but does not
exceed, 58 ms. The maximum load of RTAP depends on the hardware and software of the computer
it runs on and thus varies between computers.
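The search just described can be sketched as follows; measure_ms is a hypothetical stand-in for timing one processing window with a given number of BF channels.

```cpp
#include <functional>

// Sketch of the maximum-load search: grow the channel count while one
// processing window still completes within the 58 ms budget, then report the
// largest count that stayed under it.
int find_max_bf_channels(const std::function<double(int)>& measure_ms,
                         double budget_ms = 58.0) {
    int n = 1;
    while (measure_ms(n + 1) < budget_ms)
        ++n; // still under budget with one more channel, so keep growing
    return n; // largest channel count processed within the budget
}
```

In practice the starting point is the previously found maximum for that machine, so only a few trial measurements are needed per stage.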
Figures 6.14 and 6.15 distinguish the load profiles of the math optimised and non-optimised
single precision (SP) executions of RTAP. Similarly, figures 6.16 and 6.17 present load
profiles for the optimised and non-optimised double precision (DP) executions of RTAP. In
each figure, the load profile is further broken down by the computer RTAP is executed on as
well as the process priority of RTAP. Math optimised execution is defined as running the
algorithms in RTAP with the Schraudolph fast exponential functions enabled. Single precision
execution refers to the computation and projection of data in a 4-byte (32-bit) format,
while double precision execution refers to the use of 8-byte (64-bit) variables at the
runtime of the algorithms.
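The SP and DP executions can share one code path templated on the sample type, a sketch of the dual-precision build described above; the compressive broken-stick nonlinearity shown is illustrative rather than RTAP's exact DRNL code.

```cpp
#include <cmath>

// One templated code path serves both precisions: Real = float for the SP
// build, Real = double for the DP build. The function is a generic signed
// broken-stick compression, sign(x) * min(a*|x|, b*|x|^c).
template <typename Real>
Real broken_stick(Real x, Real a, Real b, Real c) {
    const Real lin = a * std::fabs(x);               // linear branch
    const Real cmp = b * std::pow(std::fabs(x), c);  // compressive branch
    const Real mag = lin < cmp ? lin : cmp;          // take the smaller response
    return x < Real(0) ? -mag : mag;                 // restore the sign
}
```

Because the two instantiations differ only in rounding, the SP and DP responses coincide to within single precision, consistent with the near-identical loads reported below.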
One clear distinction is that RTAP can accommodate more than 175% as many BF channels on
machine 2 as on machine 1 for both non-optimised and optimised algorithm processing, and for
both SP and DP execution. Hence, the additional 400 MHz of clock speed on the dual core CPU
in machine 2 provides a significant boost to the computations. It must be noted that
although the i5 processor in machine 1 is a dual core CPU, it has a different architecture
from the CPU in machine 2. Likewise, differences between machines 1 and 2 in terms of
motherboard layout and the access times of the random access memory (RAM), hard disk drive
(HDD) and graphics display contribute to the large load increase in RTAP on machine 2.
Another key observation is that the maximum loads on both machines depend on the priority
at which RTAP runs. RTAP running under real-time priority on machine 2 can run the most BF
channels compared with any other priority. Real-time priority, however, is unavailable on
machine 1 due to the manufacturer's setup of the machine. Nevertheless, increasing the
priority level on machines 1 and 2 from the lowest to the highest for the same stage within
RTAP leads to a gradual increase in the number of BF channels. The increase is most
significant when RTAP is executed at real-time priority on machine 2, for all stages,
because the CPU gives RTAP its undivided attention by processing it above all other
processes, including those of the OS. This is observed on machine 2 as lag in the mouse and
graphics responses on the display screen, indicating that the CPU is prioritising the
execution of RTAP over
the interleaved tasks of the mouse and graphics devices scheduled by the Windows OS.
On a stage-by-stage comparison, the maximum load for the DRNL stage on both machines far
exceeds that of any other stage owing to its computationally light algorithm, or more
specifically its use of a conventional signal processing algorithm, the IIR filter. The BF
load falls from the IHCRP computation onwards, largely because the processing times of the
algorithms in the preceding stages of the auditory pathway accumulate. At the ANSP stage,
the computation of the two AN fibre types produces the lowest loading on RTAP because all
the algorithms in the AP model are in full operation, resulting in an accumulated,
computationally intensive workload.
The maximum loads for the math optimised computations of all stages of RTAP on machines 1
and 2 exceed those of the non-optimised computations by ratios of approximately 115% and
120% respectively. This shows that the exponential functions from the C++ math library
contribute significantly to the processing time. The effect is most observable for a large
number of BF channels and when analysing the responses at the NRR and ANSP stages, where the
number of exponential calls accumulates from all earlier stages and scales with the number
of BF channels as well as the number of AN fibre types. The SP and DP versions of RTAP
produce identical responses for all stages in the AP model, and the difference in their
maximum loads fluctuates at approximately 0.3% across all stages.
Figure 6.14: Maximum load profile for non-optimised single precision execution of RTAP on machines 1 and 2.
Figure 6.15: Maximum load profile for optimised single precision execution of RTAP on machines 1 and 2.
Figure 6.16: Maximum load profile for non-optimised double precision execution of RTAP on machines 1 and 2.
Figure 6.17: Maximum load profile for optimised double precision execution of RTAP on machines 1 and 2.
6.2.2 Thread Profile
In section 4.5, the utilisation of threads was discussed. The main motivation for using
threads is to achieve parallelism in RTAP. Figures 6.18, 6.19 and 6.20 depict the profiles
of three threads, excluding the algorithm threads used in RTAP. One general observation from
these profiles is that the processing times of the threads do not depend on the process
priority at which RTAP runs. This characteristic is attributed to the non-deterministic task
scheduler of the Microsoft Windows OS kernel, as explained in section 2.3. It is especially
evident at the highest priority settings on machines 1 and 2, where the CPU attends more
closely to RTAP and yet the processing times of all thread profiles still fluctuate.
The pixel rendering thread, invoked by the algorithm thread just before its conclusion, has
the least impact on processing time. This is because it contains no computationally
intensive algorithms and spends the majority of its time reading from and writing to image
buffers. The draw image function wrapped around the main RTAP thread, by contrast, is
invoked every 58 ms to project the pixels in the image buffer onto the screen. This thread
has the highest average processing time of the three non-algorithm threads. Its significant
processing time is attributed to the tasks in the drawImage function that transfer the
contents of the image buffer from memory to the display hardware, which eventually draws the
pixels onto the computer screen. The record thread profile is based on the average of two
recording instances, triggered by the ‘Play+Record’ and ‘Record’ buttons respectively. Its
processing time is the second most significant after the main thread that draws pixels onto
the screen. This can be attributed to accesses of non-volatile memory such as the hard disk
drive (HDD) being slower than accesses of volatile memory such as the random access memory
(RAM).
On each machine, the sum of the average processing times of all threads, including the
algorithm thread, exceeds the 58 ms limit imposed by the 44.1 kHz sampling rate of the
underlying audio library, DirectSound, in RTAP. Even with an inactive record thread, the
processing times of the remaining threads still exceed the 58 ms benchmark. Without threads,
a single main thread would have to perform all the tasks in RTAP, namely algorithm
processing, processed data recording and processed data display, and would exceed 58 ms,
making complete task execution impossible. Therefore, the use of threads bears testament to
the task parallelism and operational integrity achieved in RTAP.
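The parallelism discussed above can be sketched as follows; the per-channel processing function is a hypothetical placeholder, and RTAP's own threads (render, record, display) divide the work differently, but the principle of keeping each thread's share within the 58 ms window is the same.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counter used to verify that every channel was processed exactly once.
std::atomic<int> channels_done{0};

// Placeholder for one BF channel's algorithm pass.
void process_channel(int /*channel*/) {
    channels_done.fetch_add(1, std::memory_order_relaxed);
}

// Divide the BF channels among worker threads so that no single thread has
// to fit all of the work into the 58 ms window.
void process_all_channels(int num_channels, int num_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([=] {
            // Each worker processes a strided subset of the channels.
            for (int i = t; i < num_channels; i += num_threads)
                process_channel(i);
        });
    }
    for (auto& w : workers) w.join(); // wait for the whole window's work
}
```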
Figure 6.18: Pixel render thread profile of RTAP on machines 1 and 2.
Figure 6.19: Record thread profile of RTAP on machines 1 and 2.
Figure 6.20: Onscreen signal display profile for maximum load in RTAP for machines 1 and 2.
6.3 Summary
A fast exponential function has been implemented in RTAP. Although it increases the load
profile of RTAP, it causes instability in the response of the ANSP HSR stage and renders
that stage unworkable. Segmenting the computing structure of the apical
conductance equation in the IHCRP into pre-computed and real-time segments allowed an even
utilisation of the exponential functions from the Intel MKL and the Schraudolph optimised
implementation. This eradicated the instability in the ANSP HSR fibre response computation
and enabled higher BF channel loading, at the expense of lower accuracy in the response of
RTAP regardless of the AP model stage. Load profiles of RTAP running on a desktop and a
laptop are provided. The desktop, with a faster-clocked CPU, can accommodate more BF
channels than the laptop. Running RTAP at higher process priorities results in an increased
BF load. Single and double precision executions of RTAP bear similar BF load results.
Chapter 7: Summary, Recommendations and Conclusion
7.1 Summary
Five auditory pathway (AP) computer models have been reviewed and the MAP model has been
selected for real-time implementation. The algorithms selected for real-time implementation
include basilar membrane (BM) displacement, inner hair cell receptor potential (IHCRP),
neurotransmitter release rate (NRR) and auditory nerve spiking probability (ANSP). A
transition program, RTAP-numerical, was developed in C to ensure that the algorithm
responses ported from MAP to C for real-time implementation match the MAP model. The RMS
errors between MAP and RTAP-numerical for all stages in the auditory model, from stapes
displacement to auditory nerve spiking probability (ANSP) responses, are below 1%. As a
result of the insignificant differences between the MAP and RTAP-numerical responses, the
algorithms were then incorporated into a C++ GUI library called JUCE, which is used for the
implementation of a real-time GUI based program on the Windows operating system. To achieve
real-time operation, the algorithms from RTAP-numerical are wrapped in a C++ class and
integrated with POSIX threading APIs and timer callback functions.
The real-time auditory pathway simulator, RTAP, is able to process built-in generated sine
tones as well as real-world audio data acquired from the microphone channel of the computer
it is running on. Static displays of real-time responses are available in ERB-scaled and
spectrogram formats that allow signals generated from multiple BF channels to be displayed
on a single graph. The mathematical optimisation implemented to approximate exponential
functions increases the number of discrete BF points, which differs with the computing
process priority of the real-time model and the stage of the auditory pathway being
simulated. However, owing to the quantisation errors inherent in its approximation, the
optimised exponential function reduces the accuracy of the responses generated in RTAP by as
much as 8% in exchange for the increased BF channel loading.
Load profiles indicate that the maximum number of BF channels is achieved at the BM
displacement stage and that this number reduces at every subsequent stage of the auditory
model. The optimised exponential function used in RTAP increases the load of the simulator
by approximately 10% on the laptop and the desktop with the dual core CPUs specified in
table 4.1, although these load profiles for non-optimised and optimised execution of RTAP
vary on different computers.
7.2 Recommendations
While this document presents the implementation of a real-time auditory pathway (AP) model
that resulted in the emergence of RTAP, more functionality could be accomplished by adopting
the following operational features in future editions of the model:
1) The current record feature in RTAP records only two window frames of data, due to the
overheads involved in disk access that increase processing time. A real-time logger
implemented in RTAP would be able to record multiple window frames of data indefinitely
for the duration of the algorithms' runtime.
2) Besides sine tones and the audio signal from the microphone channel of a computer,
various other input stimuli can be implemented, such as step, impulse and sawtooth
signals. An alternative input stimulus source is audio files of various formats such as
wav, mp3 and ogg.
3) Upstream stages of the MAP model excluded from this version of RTAP, including AN
spiking, cochlear nucleus and brainstem-level computations, can be introduced in future
versions. The effects of the acoustic reflex and medial olivocochlear (MOC) feedback
modules can also be added to the OME and BM stages respectively.
4) Directional audio filters can be added to the outer ear stage to tune the magnitude of
the input stimulus based on the bearing of the stimulus source.
5) Algorithms utilised in RTAP can be parallelised across separate threads for further
optimisation. Feasible stages for such an implementation are from the BM displacement
computation onwards, where the BF channels can be segmented into smaller groups and a
thread allocated to each group for computational speedup.
6) RTAP runs only on Microsoft Windows. Its versatility would be enhanced if it could run
across other operating system (OS) platforms. RTAP could also be run on real-time
operating systems (RTOS) such as QNX, VxWorks and RTLinux, and the resulting profiles
compared with the performance profiles of running RTAP on a general purpose operating
system (GPOS) such as Windows or Linux.
7) Faster graphics rendering libraries in future implementations of RTAP could accelerate
image drawing from buffers to the screen. As an alternative to the JUCE library, OpenGL,
a low-level graphics library, could be used to speed up the projection of processed data
onto the computer screen, though modifications to the RTAP UI may be required.
7.3 Conclusion
A real-time computational model of the auditory pathway has been developed for the
Microsoft Windows operating system (OS) using C++. It is based on the Matlab Auditory
Periphery (MAP) model and is able to simulate cochlear functions such as basilar membrane
displacement and auditory nerve firing on the fly, using either sine tones or streamed
real-world audio. The root-mean-squared (RMS) errors between the responses of the real-time
simulator and the MAP model are less than 1%. The output of the real-time simulator is
presented in logarithmically scaled channels on either an equivalent rectangular bandwidth
or a spectrogram graph. Through the use of POSIX threads, computing parallelism is achieved
that complements the real-time processing. An increase in channel loading is achieved
through mathematical optimisation, although the accuracy of the responses with respect to
the MAP model drops to an RMS error of 8%. Running the real-time auditory pathway simulator
on a faster multi-core processor will increase channel loading, though this has not been
verified on OS platforms other than Microsoft Windows.
A real-time auditory pathway simulator is an essential tool for research in the fields of
neuroscience and engineering. In neuroscience, a real-time AP model can interface with
models simulating upstream functions of the auditory pathway and the brain, providing a
large simulation model for studying the complexities of the separate regions of the brain
[63]. In engineering, research tools of this kind have led to real-time noise reduction in
cell phones through algorithms developed from the study and modelling of the cochlea [7].
Cochlear implant designers can assess the feasibility of porting algorithms of an AP model
by studying the behaviour of a real-time AP model [5]. Enhanced real-time speech and music
processing software and hardware tools can be developed with the aid of the real-time AP
model used as a perceptual model. Therefore, the real-time computer auditory model developed
in this research project has the potential to be utilised in a wide variety of applications.
Bibliography
[1] R. Meddis and E. A. Lopez-Poveda, “Overview,” in Computational Models of the Auditory System, no. 1954, R. Meddis, E. A. Lopez-Poveda, R. R. Fay, and A. N. Popper, Eds. New York Dordrecht Heidelberg London: Springer, 2010, pp. 1–6.
[2] M. P. Cooke, “A Computer Model of Peripheral Auditory Processing Incorporating Phase-Locking, Suppression and Adaptation Effects,” Speech Communication, vol. 5, pp. 261–281, 1986.
[3] R. D. Patterson, M. H. Allerhand, and G. Christian, “Time-domain Modelling of Peripheral Auditory Processing: A Modular Architecture and a Software Platform,” The Journal of the Acoustical Society of America, vol. 98, no. 4, pp. 1890–1894, 1995.
[4] X. Zhang, M. G. Heinz, I. C. Bruce, and L. H. Carney, “A Phenomenological Model for the Responses of Auditory-Nerve Fibers: I. Nonlinear Tuning,” The Journal of the Acoustical Society of America, vol. 109, no. 2, pp. 648–670, 2001.
[5] B. S. Wilson, E. A. Lopez-Poveda, and R. Schatzer, “Use of Auditory Models in Developing Coding Strategies for Cochlear Implants,” in Computational Models of the Auditory System, 2010, pp. 237–260.
[6] R. Meddis and E. A. Lopez-Poveda, “Auditory Periphery: From Pinna to Auditory Nerve,” in Computational Models of the Auditory System, R. Meddis, E. A. Lopez-Poveda, R. R. Fay, and A. N. Popper, Eds. New York Dordrecht Heidelberg London: Springer, 2010, pp. 7–38.
[7] L. Watts, “Real-time, High-Resolution Simulation of the Auditory Pathway, with Application to Cell-Phone Noise Reduction,” ISCAS, pp. 3821–3824, 2010.
[8] R. R. Pfeiffer, “A Model for Two-tone Inhibition of Single Cochlear Nerve Fibres,” The Journal of the Acoustical Society of America, vol. 48, no. 6B, pp. 1373 – 1378, 1970.
[9] J. O. Smith and J. S. Abel, “Bark and ERB Bilinear Transforms,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 697–708, 1999.
[10] M. E. Lutman and A. M. Martin, “Development of an Electroacoustic Analogue Model of the Middle Ear and Acoustic Reflex,” Journal of Sound And Vibration, vol. 64, no. 1, pp. 133–157, 1979.
[11] R. D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang, and M. Allerhand, “Complex Sounds And Auditory Images,” Auditory Physiology and Perception, Proc. 9th International Symposium on Hearing, no. 1992, 1992.
[12] C. Giguere and P. C. Woodland, “A Computational Model of the Auditory Periphery for Speech and Hearing Research. I. Ascending Path,” The Journal of the Acoustical Society of America, vol. 95, no. 1, pp. 331 – 342, 1994.
[13] R. Meddis, “Simulation of Mechanical to Neural Transduction in the Auditory Receptor,” The Journal of the Acoustical Society of America, vol. 79, no. 3, pp. 702–711, 1986.
[14] J. L. Goldstein, “Modeling Rapid Waveform Compression on the Basilar Membrane as Multiple-Bandpass-Nonlinearity Filtering,” Hearing Research, vol. 49, pp. 39–60, 1990.
[15] T. Lin and J. L. Goldstein, “Implementation of the MBPNL Nonlinear Cochlear I/O Model in the C Programming Language, and Applications for Modeling Impaired Auditory Function,” in Modeling Sensorineural Hearing Loss, W. Jesteadt, Ed. New Jersey: Lawrence Erlbaum Associates, Inc., 1997, pp. 67 – 78.
[16] R. Meddis, “Matlab Auditory Periphery (MAP) Model Technical Description.” Essex, pp. 1 –32, 2011.
[17] R. Meddis, L. P. O’Mard, and E. A. Lopez-Poveda, “A Computational Algorithm for Computing Nonlinear Auditory Frequency Selectivity,” The Journal of the Acoustical Society of America, vol. 109, no. 6, pp. 2852 – 2861, 2001.
[18] E. A. Lopez-Poveda and R. Meddis, “A Human Nonlinear Cochlear Filterbank,” The Journal of the Acoustical Society of America, vol. 110, no. 6, pp. 3107 – 3118, 2001.
[19] C. J. Sumner, E. A. Lopez-Poveda, L. P. O’Mard, and R. Meddis, “A Revised Model of the Inner-Hair Cell and Auditory-Nerve Complex,” The Journal of the Acoustical Society of America, vol. 111, no. 5, pp. 2178 – 2188, 2002.
[20] R. Patterson and T. Walter, “AIM-C,” 2009. [Online]. Available: http://code.soundsoftware.ac.uk/projects/aimc.
[21] R. F. Lyon, M. Rehn, S. Bengio, T. C. Walters, and G. Chechik, “Sound Retrieval and Ranking Using Sparse Auditory Representations,” Neural Computation, vol. 22, no. 9, pp. 2390–2416, Sep. 2010.
[22] T. C. Walters, “Auditory-Based Processing of Communication Sounds,” University of Cambridge, 2011.
[23] J. Whittaker, “The Physics of the Ear.” Colorado, pp. 1 – 71, 2006.
[24] A. Michelsen and O. N. Larsen, “Pressure Difference Receiving Ears,” Bioinspiration & Biomimetics, vol. 011001, no. 3, pp. 1 – 18, 2008.
[25] S. E. Voss, J. J. Rosowski, S. N. Merchant, and W. T. Peake, “Acoustic Responses of the Human Middle Ear,” Hearing Research, vol. 150, pp. 43–69, 2000.
[26] J. Pickles, “The Outer and Middle Ears,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 11 – 24.
[27] G. G. Matthews, “Hearing and Other Vibration Senses,” in Neurobiology Molecules, Cells and Systems, Blackwell Science, 1998, p. 25.
[28] A. Huber, M. Ferrazzini, S. Stoeckli, T. Linder, N. Dillier, S. Schmid, and U. Fisch, “Intraoperative Assessment of Stapes Movement,” The Annals of Otology, Rhinology & Laryngology, vol. 110, no. 1, pp. 31 – 35, 2001.
[29] M. A. Ruggero and A. N. Temchin, “The Roles of the External , Middle , and Inner Ears in Determining the Bandwidth of Hearing,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 20, pp. 13206 – 13210, 2002.
[30] J. Pickles, “The Cochlea,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 25 – 72.
[31] A. van Schaik, “Analogue VLSI Building Blocks for an Electronic Auditory Pathway,” École Polytechnique Fédérale de Lausanne, 1997.
[32] A. G. Katsiamis, E. M. Drakakis, and R. F. Lyon, “Practical Gammatone-Like Filters for Auditory Processing,” EURASIP Journal on Audio, Speech and Music Processing, vol. 2007, pp. 1 – 25, 2007.
[33] N. Ma, “An Efficient Implementation of Gammatone Filters.” [Online]. Available: http://www.dcs.shef.ac.uk/~ning/resources/gammatone/.
[34] C. Michel, R. Nouvian, C. Azevedo-Coste, J. L. Puel, and J. Bourien, “A Computational Model of the Primary Auditory Neuron Activity,” in Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2010, pp. 722–725.
[35] J. Pickles, “The Auditory Nerve,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 73 – 101.
[36] P. Dallos, “Neurobiology of Cochlear Inner and Outer Hair Cells: Intracellular Recordings.,” Hearing Research, vol. 22, pp. 185 – 198, Jan. 1986.
[37] S. A. Shamma, R. S. Chadwick, W. J. Wilbur, K. A. Morrish, and J. Rinzel, “A Biophysical Model of Cochlear Processing: Intensity Dependence of Pure Tone Responses,” The Journal of the Acoustical Society of America, vol. 80, no. 1, pp. 133–145, 1986.
[38] R. C. Kidd and T. F. Weiss, “Mechanisms that Degrade Timing Information in the Cochlea,” Hearing Research, vol. 49, pp. 181 – 208, 1990.
[39] R. Meddis, “Auditory-nerve First-spike Latency and Auditory Absolute Threshold: A Computer Model,” The Journal of the Acoustical Society of America, vol. 119, no. 1, pp. 406 – 417, 2006.
[40] P. A. Laplante, Real-time Systems Design and Analysis, 3rd ed. IEEE Press,Wiley-Interscience, 2004, pp. 1 – 505.
[41] Microsoft, “Operating System Versioning,” MSDN, 2012. [Online]. Available: http://msdn.microsoft.com/en-gb/library/dd371754(VS.85).aspx.
[42] K. Ramamritham, C. Shen, O. Gonzalez, S. Sen, and S. B. Shirgurkar, “Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations,” IEEE Real-Time Technology And Applications Symposium, pp. 1 – 13, 1998.
[43] Essex Hearing Research Laboratory, “Auditory Modelling at Essex University,” 2012. [Online]. Available: http://www.essex.ac.uk/psychology/department/HearingLab/modelling.html. [Accessed: 01-Nov-2011].
[44] Mathworks, “Filter - 1D Digital Filter,” 2012. [Online]. Available: http://www.mathworks.com.au/help/techdoc/ref/filter.html. [Accessed: 01-Jul-2011].
[45] Mathworks, “Logspace,” 2012. [Online]. Available: http://www.mathworks.com.au/help/techdoc/ref/logspace.html. [Accessed: 05-Jul-2011].
[46] M. J. Hewitt and R. Meddis, “An Evaluation of Eight Computer Models of Mammalian Inner Hair-cell Function,” The Journal of the Acoustical Society of America, vol. 90, no. 2, pp. 904–917, 1991.
[47] R. Meddis, M. J. Hewitt, and T. M. Shackleton, “Implementation Details of a Computational Model of the Inner Hair-cell/Auditory-nerve Synapse,” The Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1813–1816, 1990.
[48] C. J. Plack, The Sense of Hearing. Lawrence Erlbaum Associates, Inc., 2005.
[49] Microsoft, “Processes and Threads,” Windows Dev Centre - Desktop, 2012. [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/ms684841(v=vs.85).aspx. [Accessed: 15-Mar-2012].
[50] S. Akhter and J. Roberts, Multi-Core Programming: Increasing Performance through Software Multi-threading, 1st ed. Hillsboro: Intel Press, 2006.
[51] Microsoft, “SetPriorityClass Function,” MSDN, 2012. [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686219(v=vs.85).aspx. [Accessed: 01-Jun-2012].
[52] B. Kuhn, P. Petersen, and E. O’Toole, “OpenMP versus Threading in C/C++.”
[53] B. R. Glasberg and B. C. Moore, “Derivation of Auditory Filter Shapes from Notched-noise Data,” Hearing Research, vol. 47, no. 1–2, pp. 103–138, Aug. 1990.
[54] Intel, “Intel Architecture Software Developer’s Manual,” vol. 2. Intel, 1999.
[55] R. R. Mergu and S. K. Dixit, “Multi-Resolution Speech Spectrogram,” International Journal of Computer Applications, vol. 15, no. 4, pp. 28–32, Feb. 2011.
[56] J. Pickles, “The Auditory Nerve,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 73 – 102.
[57] IVONA Software, “Ivona Text-to-Speech.” [Online]. Available: http://www.ivona.com/us/. [Accessed: 10-Jul-2012].
[58] A. Fog, “Instruction Tables,” Copenhagen, 2011.
[59] E. W. Weisstein, “Exponential Function,” Mathworld - A Wolfram Web Resource, 2012. [Online]. Available: http://mathworld.wolfram.com/ExponentialFunction.html. [Accessed: 20-Jun-2012].
[60] Intel, “Intel® Math Kernel Library for Windows* OS.” Intel, pp. 1–114.
[61] N. N. Schraudolph, “A Fast, Compact Approximation of the Exponential Function,” Neural Computation, no. 11, pp. 853–862, 1999.
[62] Microprocessor Standards Committee and Floating Point Working Group, IEEE Std 754-2008 (Revision of IEEE Std 754-1985), IEEE Standard for Floating-Point Arithmetic. IEEE, Aug. 2008.
[63] N. V. Thakor, “In the Spotlight: Neuroengineering,” IEEE Reviews in Biomedical Engineering, vol. 3, pp. 19 – 22, Jan. 2010.
Appendix A
Outer and Middle Ear

External Ear Resonance (EER) filter
    Numerator, order 3: b[0] = 0.3195615, b[1] = 0.0, b[2] = -0.31295615
    Denominator, order 3: a[0] = 1.0, a[1] = -1.142727, a[2] = 0.37408769

Tympanic Membrane (TM) filter
    Numerator, order 1: b[0] = 0.014247596
    Denominator, order 2: a[0] = 1.0, a[1] = -0.957524

Stapes Inertia (SI) filter
    Numerator, order 2: b[0] = 0.87454802, b[1] = -0.87454802
    Denominator, order 2: a[0] = 1.0, a[1] = -0.74909604

Basilar Membrane

Linear gammatone filter
    Numerator order 2, denominator order 3; filter coefficients vary with the number of BF channels used.
    minLinCF = 153.13 Hz, coeffLinCF = 0.7341: coefficients for calculating minimum linear characteristic frequencies.
    minLinBW = 100, coeffLinBW = 0.6531: coefficients for calculating minimum linear bandwidths.

Nonlinear gammatone filter
    Numerator order 2, denominator order 3; filter coefficients vary with the number of BF channels used.
    p = 0.2895, q = 250: coefficients for calculating bandwidths in the nonlinear pathway based on human auditory pathway settings [16].

Memoryless compression threshold
    compThreshdB = 10, a = 50,000, c = 0.2: coefficients for calculating compressive effects in the nonlinear pathway.

Inner Hair Cell

IHC cilia displacement filter
    TC = 0.00012: filter time constant.
    Numerator, order 2: b[0] = 1.0, b[1] = -0.66207103
    Denominator, order 1: a[0] = 0.37792897
    C = 0.08: scaling coefficient.

Receptor potential
    u0 = 5e-9, u1 = 1e-9: cilia displacement constants set based on the nonlinear characteristic.
    s0 = 1e-9, s1 = 1e-9: dimensionless longitudinal constants based on the nonlinear characteristic scaled by BM length.
    Gmax = 6e-9, Ga = 0.8e-9, Gk = 2e-8: maximum IHC, apical and potassium conductances in siemens.
    Et = 0.1, Ek = -0.08: endocochlear and potassium equilibrium potentials in volts.
    RPC = 0.04: combined resistances in ohms.
    Cap = 4e-12: IHC capacitance in farads.

Inner Hair Cell Presynaptic Region

Neurotransmitter release rate
    Gmax-Ca = 14e-9: maximum calcium conductance in siemens.
    ECa = 0.066: calcium equilibrium potential in volts.
    β = 400, γ = 100: constants determining calcium channel opening.
    τM = 5e-5, τCa[0] = 30e-6, τCa[1] = 80e-6: membrane and calcium low and high spontaneous rate time constants in seconds.
    Z = 2e42: vesicle release rate scalar.

Auditory Nerve

AN spiking probability
    trefractory = 0.75e-3: refractory period in seconds.
    M = 12: maximum number of neurotransmitter vesicles at the synapse.
    Y = 6: depleted neurotransmitter vesicle replacement rate.
    X = 60: replenishment rate from the re-uptake store.
    L = 250: neurotransmitter vesicle loss rate from the cleft.
    R = 500: neurotransmitter re-uptake rate from the cleft into the IHC.

Table A.1: Algorithm parameter settings
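The outer and middle-ear stages in Table A.1 are standard IIR difference equations, where the b coefficients weight past inputs and the a coefficients weight past outputs. The following sketch shows how such tabulated coefficients would be applied sample by sample; it is a plain direct-form illustration in Python, and the TM-then-SI cascade order and the filter routine itself are assumptions for demonstration, not the RTAP source code.

```python
# Sketch: applying the Table A.1 middle-ear IIR coefficients sample by sample.
# Difference equation (direct form I):
#   a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
# Coefficient values are from Table A.1; the cascade order is an assumption.

def iir_filter(b, a, x):
    """Filter sequence x with numerator b and denominator a (a[0] normalises)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

# Tympanic Membrane (TM) filter coefficients from Table A.1
TM_B, TM_A = [0.014247596], [1.0, -0.957524]
# Stapes Inertia (SI) filter coefficients from Table A.1
SI_B, SI_A = [0.87454802, -0.87454802], [1.0, -0.74909604]

def middle_ear(x):
    """Cascade the TM and SI stages (assumed order, for illustration only)."""
    return iir_filter(SI_B, SI_A, iir_filter(TM_B, TM_A, x))

# Example: impulse response of the assumed two-stage cascade
impulse_response = middle_ear([1.0] + [0.0] * 7)
```

With an impulse input, the first TM output sample is simply b[0], which makes the coefficient roles easy to verify against the table.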
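The auditory-nerve rates in Table A.1 (M, Y, X, L and R) parameterise the transmitter-pool equations of Meddis [13], [47]: transmitter released from the free pool enters the synaptic cleft, is lost or taken back up, and is recycled through a re-uptake store. The sketch below is a minimal forward-Euler integration of those pool equations using the tabulated rates; the constant permeability drive k and the step size are illustrative assumptions, not values from this thesis.

```python
# Minimal Euler integration of the Meddis [13] transmitter-pool equations
# using the auditory-nerve rates listed in Table A.1. The constant drive k
# (release permeability) and the step size dt are illustrative assumptions.

M, Y, X, L, R = 12.0, 6.0, 60.0, 250.0, 500.0  # rates from Table A.1

def simulate_pools(k, steps, dt=1e-4):
    """Integrate free (q), cleft (c) and re-uptake (w) pools under drive k."""
    q, c, w = M, 0.0, 0.0  # start with a full free-transmitter pool
    for _ in range(steps):
        dq = Y * (M - q) + X * w - k * q  # replenishment + recycling - release
        dc = k * q - (L + R) * c          # release into cleft - loss - re-uptake
        dw = R * c - X * w                # re-uptake store filling and emptying
        q += dq * dt
        c += dc * dt
        w += dw * dt
    return q, c, w

# Run to steady state under a constant (assumed) drive
q, c, w = simulate_pools(k=50.0, steps=20000)
```

With a constant drive the three pools settle to a steady state in which release into the cleft balances loss and re-uptake, which is the quantity the AN spiking probability stage samples.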
Figure A.1: MAP and RTAP inner hair cell receptor potential (IHCRP) response for 30 BF channels.
[Panels: “MAP IHC RP Response (30 BFs)” and “RTAP IHC RP Response (30 BFs)”. Y-axis: BF sites along BM for IHC receptor potential (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.2: MAP and RTAP low spontaneous rate (LSR) fibre neurotransmitter release rate (NRR) response for 30 BF channels.
[Panels: “MAP NRR LSR Response (30 BFs)” and “RTAP NRR LSR Response (30 BFs)”. Y-axis: BF sites along BM for NRR to LSR AN fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.3: MAP and RTAP high spontaneous rate (HSR) fibre neurotransmitter release rate (NRR) response for 30 BF channels.
[Panels: “MAP NRR HSR Response (30 BFs)” and “RTAP NRR HSR Response (30 BFs)”. Y-axis: BF sites along BM for NRR to AN HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.4: MAP and RTAP low spontaneous rate (LSR) fibre auditory nerve spiking probability (ANSP) response for 30 BF channels.
[Panels: “MAP ANSP LSR Response (30 BFs)” and “RTAP ANSP LSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on LSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.5: MAP and RTAP high spontaneous rate (HSR) fibre auditory nerve spiking probability (ANSP) response for 30 BF channels.
[Panels: “MAP ANSP HSR Response (30 BFs)” and “RTAP ANSP HSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.6: Continuity between adjacent window frames for RTAP generated inner hair cell receptor potential (IHCRP) response.
Figure A.7: Continuity between adjacent window frames for RTAP generated neurotransmitter release rate (NRR) in low spontaneous rate (LSR) fibres.
[Panels: “RTAP IHC RP Response (30 BFs)” (Figure A.6) and “RTAP NRR LSR Response (30 BFs)” (Figure A.7). Y-axes: BF sites along BM (Hz), 250 Hz to 6000 Hz; x-axes: time (seconds), 0.052 to 0.068.]
Figure A.8: Continuity between adjacent window frames for RTAP generated neurotransmitter release rate (NRR) in high spontaneous rate (HSR) fibres.
Figure A.9: Continuity between adjacent window frames for RTAP generated auditory nerve spiking probability (ANSP) in low spontaneous rate (LSR) fibres.
[Panels: “RTAP NRR HSR Response (30 BFs)” (Figure A.8) and “RTAP ANSP LSR Response (30 BFs)” (Figure A.9). Y-axes: BF sites along BM (Hz), 250 Hz to 6000 Hz; x-axes: time (seconds), 0.052 to 0.068.]
Figure A.10: Continuity between adjacent window frames for RTAP generated auditory nerve spiking probability (ANSP) in high spontaneous rate (HSR) fibres.
[Panel: “RTAP ANSP HSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0.052 to 0.068.]
Figure A.11: ERBS representation of the first window frame of inner hair cell receptor potential (IHCRP) response in RTAP based on 65 BF channels.
Figure A.12: ERBS representation of the first window frame of neurotransmitter release rate (NRR) for low spontaneous rate (LSR) fibre response in RTAP based on 45 BF channels.
[Panels: “Maximum Load on Machine 1 (IHCRP stage)” (Figure A.11) and “Maximum Load on Machine 1 (NRR LSR stage)” (Figure A.12). Y-axes: BF sites (Hz), 250 Hz to 6000 Hz; x-axes: time, 0 to 0.06.]
Figure A.13: ERBS representation of the first window frame of neurotransmitter release rate (NRR) for high spontaneous rate (HSR) fibre response in RTAP based on 38 BF channels.
Figure A.14: ERBS representation of the first window frame of the auditory nerve spiking probability (ANSP) for low spontaneous rate (LSR) fibre response in RTAP based on 38 BF channels.
[Panels: “Maximum Load on Machine 1 (NRR HSR stage)” (Figure A.13) and “Maximum Load on Machine 1 (ANSP LSR stage)” (Figure A.14). Y-axes: BF sites (Hz), 250 Hz to 6000 Hz; x-axes: time, 0 to 0.06.]
Figure A.15: ERBS representation of the first window frame of the auditory nerve spiking probability (ANSP) for high spontaneous rate (HSR) fibre response in RTAP based on 30 BF channels.
[Panel: “Maximum Load on Machine 1 (ANSP HSR stage)”. Y-axis: BF sites (Hz), 250 Hz to 6000 Hz; x-axis: time, 0 to 0.06.]