Copyright © 2012, Ram Kuber Singh
A Real-time Implementation of the Primary Auditory Neuron Activities
by
Ram Kuber Singh
A thesis submitted in fulfilment of the
requirements for the degree of
MASTER OF ENGINEERING (HONOURS)
Supervisor: Professor André van Schaik
Co-Supervisors: Professor Jonathan Tapson
Dr. Ranjith Liyanapathirana
Bioelectronics and Neuroscience, The MARCS Institute
University of Western Sydney
Sydney, Australia
December 2012
Statement of Authentication
The work presented in this thesis is, to the best of my knowledge and belief, original except as
acknowledged in the text. I hereby declare that I have not submitted this material, either in
full or in part, for a degree at this or any other institution.
-----------------------------------------------------------------------------
(Signature)
Acknowledgements
I would like to take this opportunity to thank my principal supervisor, Professor André
van Schaik, for his invaluable advice. His guidance through every step of this interdisciplinary
project has allowed me to expand my understanding of the project and of the field of
neuroscience. Throughout the project, he showed much patience as I worked to
progressively generate results. I would also like to thank the member of my supervisory
panel, Professor Jonathan Tapson, for his advice and for asking questions that helped me
think more deeply about my work. I also wish to thank Dr Ranjith Liyanapathirana, also a
member of my supervisory panel, who arranged my first meeting with Professor André. He
constantly encouraged me to remain dedicated to my work, offered advice on administrative
affairs, and helped me obtain a teaching position at the university. It has truly been an
honour to work with all my supervisors.
I also wish to thank Professor Ray Meddis, the original author of the MAP model, for
keeping me updated with the latest version of the model he is working on. He went out of his
way to provide me with elaborate technical instructions for attaining the desired results with
the model. I wish to express my gratitude to James Wright for his technical support, advice
on optimisation and thesis corrections. I also thank my colleagues in the BENS group at the
MARCS Institute, Mark Wang and Gregory Cohen, for their companionship.
I would like to give special thanks to my sister, Lajwanti, her husband, Joseph, and
their family for welcoming me into their family, for their care and support, and for tolerating
my antics. A very special thank you goes to my mother, who, besides offering care and
being a pillar of support, went the extra mile to listen to my concerns, encouraged me
throughout the project, and ensured I was well fed. Finally, I wish to offer my transcendental
Hare Krishna thanks to all at Sri Krishna Mandir, who brought out the good qualities in me,
however little and insignificant they may be, which were amplified during my undertaking of
this project.
Abstract
Computational models of the auditory pathway simulate the different stages of the
auditory periphery: the outer, middle and inner ear. Studies of the levels of the auditory
pathway beyond the inner ear require data on the cochlear response. If a computational
model of the auditory pathway simulates the cochlear responses slowly, the responses of
the higher levels of the auditory pathway will also be slow. Hence, a real-time computational
model of the auditory pathway enables the study of higher levels of sound perception in
computational neuroscience by providing on-the-fly, or immediate, responses of the cochlea
within the auditory pathway.
In this thesis, the development of a real-time computational model of the auditory
pathway is discussed. A review of five auditory pathway computer models is presented and
one model is selected for implementation as a real-time computer model. The transition from
the original model to a real-time implementation includes a translation to the C language
before integration with JUCE, a C++ graphical user interface library. The input signals to the
real-time model are either generated by a software sine tone generator or acquired from a
microphone channel on a computer. As part of the cochlear simulation, the algorithms are
divided to generate responses in separate frequency channels; a larger number of channels
yields a finer spectral resolution of the cochlear response. To achieve the optimum number
of channels in real time, POSIX threads are used to exploit computing parallelism. To load
more channels, mathematical optimisation is studied and applied in the real-time model.
It will be demonstrated in this thesis that the RMS errors of the responses of the
developed real-time computer model of the auditory pathway, relative to the original model,
measure below 1%, and that its maximum load depends on the computer it runs on. On a
laptop with a dual-core CPU, the real-time model is able to simulate 85 channels of basilar
membrane displacement, whereas a desktop with a faster dual-core CPU accommodates
twice as many channels. With mathematical optimisation enabled, there is a 13% and 8%
increase in the number of channels computed for the laptop and desktop respectively.
However, the RMS error between the real-time model with mathematical optimisation
enabled and the original model increases to 8% due to approximation errors.
Table of Contents
Acknowledgements ............................................................................................................... iv
Abstract.................................................................................................................................. i
Table of Contents ...................................................................................................................ii
Abbreviations ........................................................................................................................ v
List of Figures ...................................................................................................................... vii
List of Tables ...................................................................................................................... xiv
Code Listings ....................................................................................................................... xv
Chapter 1: Introduction ...................................................................................................... 1
1.1 Motivation ............................................................................................................... 1
1.2 Statement of the Problem ....................................................................................... 2
1.3 Objective of the Research ....................................................................................... 2
1.4 Thesis Outline ......................................................................................................... 3
Chapter 2: Literature Review ............................................................................................. 4
2.1 Auditory Pathway Models........................................................................................ 4
2.1.1 Cooke Periphery Auditory Model...................................................................... 4
2.1.2 Auditory Image Model (AIM2006) ..................................................................... 6
2.1.3 Multiple-Bandpass-Nonlinear (MBPNL) Filterbank ........................................... 9
2.1.4 The Model of Carney and Colleagues ............................................................ 10
2.1.5 Matlab Auditory Periphery (MAP) Model ........................................................ 12
2.1.6 Model Selection ............................................................................................. 13
2.2 The MAP Model of the Human Auditory Pathway ................................................. 15
2.2.1 Outer and Middle Ear ..................................................................................... 16
2.2.2 Basilar Membrane .......................................................................................... 20
2.2.3 Inner Hair Cell ................................................................................................ 26
2.2.4 Neurotransmitter Release .............................................................................. 31
2.2.5 Auditory Nerve ............................................................................................... 35
2.4 Summary .............................................................................................................. 36
Chapter 3: C Representation of MAP .............................................................................. 37
3.1 Buffer Management and Program Structure .......................................................... 38
3.1.1 Buffer Structure .............................................................................................. 38
3.1.2 Algorithm Structure ........................................................................................ 39
3.1.3 Program Structure ......................................................................................... 40
3.2 Parameters Setup ................................................................................................. 41
3.3 IIR Filter ................................................................................................................ 42
3.3.1 Background ................................................................................................... 42
3.3.2 Implementation .............................................................................................. 43
3.4 Outer and Middle Ear ............................................................................................ 46
3.5 Basilar Membrane ................................................................................................. 50
3.6 Inner Hair Cell Receptor Potential ......................................................................... 56
3.7 Neurotransmitter Release Rate ............................................................................. 57
3.8 Auditory Nerve Spiking Probability ........................................................................ 60
3.9 Characteristic Responses for Various Input Settings ............................................. 65
3.10 Summary .............................................................................................................. 69
Chapter 4: Real-time Auditory Periphery (RTAP) ............................................................ 70
4.1 User Interface (UI) ................................................................................................ 71
4.2 Process Priority ..................................................................................................... 75
4.3 Structure and Settings .......................................................................................... 75
4.3.1 Class Structure .............................................................................................. 75
4.3.2 Input Settings ................................................................................................. 77
4.4 Sine Tone Generator ............................................................................................ 78
4.5 Threading ............................................................................................................. 79
4.5.1 Background ................................................................................................... 79
4.5.2 Implementation .............................................................................................. 81
4.5.3 Results........................................................................................................... 85
4.6 Response Plots ..................................................................................................... 87
4.7 Recording Feature ................................................................................................ 89
4.7.1 File Write Command Selection ....................................................................... 89
4.7.2 Binary File Format ......................................................................................... 90
4.7.3 File Writer Thread .......................................................................................... 92
4.7.4 Binary File Recording ..................................................................................... 93
4.7.5 Offline Formatting and Text File Generation .................................................. 95
4.7.6 Results........................................................................................................... 97
4.8 Summary .............................................................................................................. 98
Chapter 5: Signals Display in RTAP .................................................................................. 99
5.1 Static Plot Display ................................................................................................. 99
5.1.1 Line Drawing .................................................................................................. 99
5.1.2 Resource Management ................................................................................ 101
5.1.3 Pixels Render and Image Display Threads .................................................. 102
5.1.4 ERB Scaled Plots ........................................................................................ 103
5.1.5 Spectrogram Plots ....................................................................................... 105
5.2 Scrolling Plot Display .......................................................................................... 112
5.2.1 Background ................................................................................................. 112
5.2.2 Implementation ............................................................................................ 113
5.2.3 Results......................................................................................................... 116
5.3 Summary ............................................................................................................ 117
Chapter 6: Optimisation and Load Profile .......................................................................... 120
6.1 Mathematical Optimisation .................................................................................. 120
6.1.1 Background ................................................................................................. 120
6.1.2 Implementation ............................................................................................ 123
6.1.3 Optimised RTAP Responses ....................................................................... 125
6.1.4. MAP and Optimised RTAP Responses Comparisons .................................. 132
6.2 Load Profile......................................................................................................... 134
6.2.1 Maximum Load ............................................................................................ 134
6.2.2 Thread Profile .............................................................................................. 141
6.3 Summary ............................................................................................................ 144
Chapter 7: Summary, Recommendations and Conclusion ................................................ 146
7.1 Summary ............................................................................................................ 146
7.2 Recommendations .............................................................................................. 147
7.3 Conclusion .......................................................................................................... 148
Bibliography ...................................................................................................................... 149
Appendix A ....................................................................................................................... 154
Abbreviations
AN Auditory Nerve
ANSP Auditory Nerve Spiking Probability
API Application Programming Interface
BM Basilar Membrane
BF Best Frequency
BPNL Bandpass-Nonlinear
CF Characteristic Frequency
CI Cochlear Implant
CPU Central Processing Unit
CV Condition Variable
dB Decibel
DP Double Precision
DRNL Dual Resonance Nonlinear
DSP Digital Signal Processor
ERB Equivalent Rectangular Bandwidth
ERBS Equivalent Rectangular Bandwidth Scale
GPOS General Purpose Operating System
GUI Graphical User Interface
HSR High Spontaneous Rate
Hz Hertz
IHC Inner Hair Cell
IIR Infinite Impulse Response
IO Input and Output
JUCE Jules’ Utility Class Extension
KHz Kilo-Hertz
LSR Low Spontaneous Rate
MAP Matlab Auditory Periphery
MBPNL Multiple-Bandpass-Nonlinear
MKL Math Kernel Library
MVS Microsoft Visual Studio
OME Outer and Middle Ear
OS Operating System
PZFC Pole-Zero Filter Cascade
RTAP Real-time Auditory Periphery
RTOS Real-time Operating System
SP Single Precision
SPL Sound Pressure Level
SPM State Partition Model
List of Figures
Figure 2.1: Cooke periphery auditory model .......................................................................... 4
Figure 2.3: AIM2006 model ................................................................................................... 6
Figure 2.4: Analogue electrical circuit model of the middle ear .............................................. 7
Figure 2.5: Transmission line filterbank model. ..................................................................... 8
Figure 2.6: Neurotransmitter flow in IHC ............................................................................... 8
Figure 2.7: Multiple-bandpass-nonlinear (MBPNL) filter ........................................................ 9
Figure 2.8: Time-domain (left) and iso-intensity frequency spectra (right) projection of click
response of the MBPNL filter at a BF site of 9 KHz ............................................................. 10
Figure 2.9: The model of Carney and colleagues ................................................................ 11
Figure 2.10: Meddis MAP model ......................................................................................... 12
Figure 2.11: MAP model structure ....................................................................................... 16
Figure 2.12: Outer ear frequency response with peak auditory sensitivity range from 1 KHz
to 4 KHz .............................................................................................................................. 17
Figure 2.13: Acoustic energy transmittance from outer ear to the basilar membrane (BM) in
the (uncoiled) cochlea via the three bones in the middle ear ............................................... 18
Figure 2.14: Outer and middle ear model structure in MAP. ................................................ 19
Figure 2.15: Travelling wave of the basilar membrane from its basal to the apical end ....... 20
Figure 2.16: Spatial response of the BM ............................................................................. 21
Figure 2.17: Single BF point in BM modelled by a dual-resonance nonlinear (DRNL) filter
implemented in MAP. .......................................................................................................... 22
Figure 2.18: Gammatone waveforms of (a) gamma distribution, (b) sinusoidal tone and (c)
the resulting waveform of the product of (a) and (b) ............................................................ 23
Figure 2.19: Gammatone filterbank frequency response with 10 filters ............................... 24
Figure 2.20: DRNL summed output response with (a) medium level input and (b) high level
input .................................................................................................................................... 26
Figure 2.21: IHC stereocilia motion effects on its electrical potential ................................... 27
Figure 2.22: Input-output relationship between auditory input stimulus and hair cell receptor
potential .............................................................................................................................. 28
Figure 2.23: Fluid-stereocilia coupling ................................................................................. 28
Figure 2.24: BM deflection causes changes in conductance in ion channels ....................... 29
Figure 2.25: Inner hair cells (IHC) membrane passive electrical circuit model. .................... 31
Figure 2.26: Neurotransmitter release from the IHC to the auditory nerve fibre ................... 32
Figure 2.27: Neurotransmitter discharge and retrieval flow ................................................. 34
Figure 3.1: Direct form type 2 IIR filter implemented by Matlab filter command. .................. 43
Figure 3.2: Induction of numerator and denominator coefficients at the initial phase of input
data sample streamed into IIR filter algorithm. .................................................................... 45
Figure 3.3: OME processing of multiple window frames. ..................................................... 49
Figure 3.4: Stapes displacement response in MAP and RTAP-numerical. .......................... 50
Figure 3.5: 3rd-order gammatone filter implementation in RTAP-numerical. ......................... 52
Figure 3.6: DRNL filter processing for 1 BF channel in multiple window frames. ................. 53
Figure 3.7: BM and IHC cilia displacement responses generated by MAP and RTAP-
numerical. ........................................................................................................................... 55
Figure 3.8: IHCRP algorithm processing for 1 BF channel in multiple window frames. ........ 56
Figure 3.9: IHC receptor potential response generated by MAP and RTAP-numerical. ....... 57
Figure 3.10: NRR algorithm processing for 1 AN channel in multiple window frames. ......... 58
Figure 3.11: LSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-
numerical. ........................................................................................................................... 59
Figure 3.12: HSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-
numerical. ........................................................................................................................... 60
Figure 3.13: ANSP processing for 1 AN channel in multiple window frames. ...................... 62
Figure 3.14: Probability of AN spiking in LSR fibre for MAP and RTAP-numerical. .............. 65
Figure 3.15: Probability of AN spiking in HSR fibre for MAP and RTAP-numerical. ............. 65
Figure 3.16: Normalised RMS errors between MAP and RTAP-numerical for a 500 Hz sine
tone input observed from a 250 Hz BF channel. .................................................................. 66
Figure 3.17: Normalised RMS errors between MAP and RTAP-numerical for a 1000 Hz sine
tone input observed from a 1039 Hz BF channel. ................................................................ 67
Figure 3.18: Normalised RMS errors between MAP and RTAP-numerical for a 3000 Hz sine
tone input observed from a 3109 Hz BF channel. ................................................................ 67
Figure 3.19: Normalised RMS errors between MAP and RTAP-numerical for a 5000 Hz sine
tone input observed from a 5377 Hz BF channel. ................................................................ 68
Figure 4.1: RTAP main user interface. ................................................................................ 73
Figure 4.2: RTAP user interface for setting parameters. ...................................................... 74
Figure 4.3: RTAP object oriented class layout. .................................................................... 76
Figure 4.4: Sequential execution in RTAP. .......................................................................... 80
Figure 4.5: Thread synchronisation pseudocode in RTAP. .................................................. 82
Figure 4.6: Thread utilisation structure in RTAP. ................................................................. 83
Figure 4.7: Thread synchronisation in RTAP. ...................................................................... 84
Figure 4.8: Intel thread checker analysis of RTAP usage of threads. .................................. 86
Figure 4.9: (a) MAP and (b) RTAP DRNL response for 30 BF channels.............................. 88
Figure 4.10: RTAP binary file format generated when the ‘Record’ or ‘Play+Record’ button is
clicked. ................................................................................................................................ 91
Figure 4.11: File write thread operation. .............................................................................. 93
Figure 4.12: Continuity between adjacent window frames for RTAP generated DRNL
response. ............................................................................................................................ 97
Figure 5.1: Line draw test. ................................................................................................. 100
Figure 5.2: ERBS representation of the first window frame of DRNL response in RTAP
based on 85 BF channels. ................................................................................................ 104
Figure 5.4: Spectrogram representation of the first window frame of the dual
resonance nonlinear (DRNL) filterbank response in RTAP for 180 BF channels. .............. 108
Figure 5.5: Spectrogram representation of the first window frame of the inner hair cell
receptor potential (IHCRP) response in RTAP for 123 BF channels. ................................. 109
Figure 5.6: Spectrogram representation of the first window frame of the neurotransmitter
release rate (NRR) response for AN LSR fibres in RTAP for 96 BF channels. .................. 110
Figure 5.7: Spectrogram representation of the first window frame of the neurotransmitter
release rate (NRR) response for AN HSR fibres in RTAP for 81 BF channels. .................. 110
Figure 5.8: Spectrogram representation of the first window frame of the auditory nerve
spiking probability (ANSP) response for LSR fibres in RTAP for 85 BF channels. ............. 111
Figure 5.9: Spectrogram representation of the first window frame of the auditory nerve
spiking probability (ANSP) response for HSR fibres in RTAP for 65 BF channels. ............ 111
Figure 5.10: Image buffer clipping and projection of the display window. .......................... 114
Figure 5.11: ANSP response in LSR fibres of real-time speech illustrated in RTAP. ........ 118
Figure 5.12: ANSP response in HSR fibres of real-time speech illustrated in RTAP. ......... 119
Figure 6.1: 64-bit floating point format divided into two halves for fast exponentiation. ...... 123
Figure 6.2: Dual resonance nonlinear (DRNL) response generated in RTAP based on
optimised exponential function for 192 BF channels. ........................................................ 126
Figure 6.3: Inner hair cell receptor potential (IHCRP) response generated in RTAP based on
optimised exponential function for 155 BF channels. ........................................................ 127
Figure 6.4: Unstable HSR ANSP response after refractory period upon the start. ............. 128
Figure 6.5: IHCRP response displayed in RTAP based on optimised exponential function for
155 BF channels. .............................................................................................................. 129
Figure 6.6: Neurotransmitter release rate (NRR) response for low spontaneous rate (LSR)
fibre displayed in RTAP based on optimised exponential function for 123 BF channels. ... 130
Figure 6.7: Neurotransmitter release rate (NRR) response for high spontaneous rate (HSR)
displayed in RTAP based on optimised exponential function for 104 BF channels. ........... 130
Figure 6.8: Auditory nerve spiking probability (ANSP) response for low spontaneous rate
(LSR) displayed in RTAP based on optimised exponential function for 107 BF channels. . 131
Figure 6.9: Auditory nerve spiking probability (ANSP) response for high spontaneous rate
(HSR) fibre displayed in RTAP based on optimised exponential function for 79 BF channels.
......................................................................................................................................... 131
Figure 6.10: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 500 Hz sine tone input observed from a 250 Hz BF channel. ............... 133
Figure 6.11: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 1000 Hz sine tone input observed from a 1039 Hz BF channel. ........... 133
Figure 6.12: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 3000 Hz sine tone input observed from a 3109 Hz BF channel. ........... 134
Figure 6.13: Normalised RMS errors for various responses between MAP and optimised
RTAP based on a 5000 Hz sine tone input observed from a 5377 Hz BF channel. ........... 134
Figure 6.14: Maximum load profile for non-optimised single precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 137
Figure 6.15: Maximum load profile for optimised single precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 138
Figure 6.16: Maximum load profile for non-optimised double precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 139
Figure 6.17: Maximum load profile for optimised double precision execution of RTAP on
machines 1 and 2. ............................................................................................................. 140
Figure 6.18: Pixel render thread profile of RTAP on machines 1 and 2. ............................ 142
Figure 6.19: Record thread profile of RTAP on machines 1 and 2. .................................... 143
Figure 6.20: Onscreen signal display profile for maximum load in RTAP for machines 1 and
2........................................................................................................................................ 144
Figure A.1: MAP and RTAP inner hair cell receptor potential (IHCRP) response for 30 BF
channels. .......................................................................................................................... 157
Figure A.2: MAP and RTAP low spontaneous rate (LSR) fibre neurotransmitter release rate
(NRR) response for 30 BF channels. ................................................................................ 158
Figure A.3: MAP and RTAP high spontaneous rate (HSR) fibre neurotransmitter release rate
(NRR) response for 30 BF channels. ................................................................................ 159
Figure A.4: MAP and RTAP low spontaneous rate (LSR) fibre auditory nerve spiking
probability (ANSP) response for 30 BF channels. ............................................................. 160
Figure A.5: MAP and RTAP high spontaneous rate (HSR) fibre auditory nerve spiking
probability (ANSP) response for 30 BF channels. ............................................................. 161
Figure A.6: Continuity between adjacent window frames for RTAP generated inner hair cell
receptor potential (IHCRP) response. ............................................................................... 162
Figure A.7: Continuity between adjacent window frames for RTAP generated
neurotransmitter release rate (NRR) in low spontaneous rate (LSR) fibres. ...................... 162
Figure A.8: Continuity between adjacent window frames for RTAP generated
neurotransmitter release rate (NRR) in high spontaneous rate (HSR) fibres. .................... 163
Figure A.9: Continuity between adjacent window frames for RTAP generated auditory nerve
spiking probability (ANSP) in low spontaneous rate (LSR) fibres. ............................... 163
Figure A.10: Continuity between adjacent window frames for RTAP generated auditory nerve
spiking probability (ANSP) in high spontaneous rate (HSR) fibres..................................... 164
Figure A.11: ERBS representation of the first window frame of inner hair cell receptor
potential (IHCRP) response in RTAP based on 65 BF channels. ...................................... 165
Figure A.12: ERBS representation of the first window frame of neurotransmitter release rate
(NRR) for low spontaneous rate (LSR) fibre response in RTAP based on 45 BF channels.
......................................................................................................................................... 165
Figure A.13: ERBS representation of the first window frame of neurotransmitter release rate
(NRR) for high spontaneous rate (HSR) fibre response in RTAP based on 38 BF channels.
......................................................................................................................................... 166
Figure A.14: ERBS representation of the first window frame of the auditory nerve spiking
probability (ANSP) for low spontaneous rate (LSR) fibre response in RTAP based on 38 BF
channels. .......................................................................................................................... 166
Figure A.15: ERBS representation of the first window frame of the auditory nerve spiking
probability (ANSP) for high spontaneous rate (HSR) fibre response in RTAP based on 30 BF
channels. .......................................................................................................................... 167
List of Tables
Table 2.1: Review of AP model selection for real-time implementation. .............................. 15
Table 3.1: Memory allocation for IO parameters and algorithm coefficients. ........................ 39
Table 3.2: Algorithm functions in RTAP-numerical. ............................................................. 40
Table 3.3: Input settings of MAP and RTAP-numerical. ....................................................... 42
Table 4.1: Computing system platform used for RTAP development and testing. ............... 71
Table 4.2: RTAP settings for acquiring various responses. ................................................. 78
Table 4.3: Threading API comparison. ................................................................................ 81
Table 4.4: C/C++ file write profile. ....................................................................................... 90
Table 5.1: Spectrogram colour hue significance to the various stages of RTAP. ............... 108
Table 6.1: Non-optimised mathematical functions utilised in RTAP. .................................. 122
Table 6.2: Performance comparison of exponential function in MVS and MKL math libraries
and Schraudolph algorithm on machine 1. ........................................................................ 125
Code Listings
Listing 3.1: RTAP-numerical program structure. .................................................................. 41
Listing 3.2: IIR filter. ............................................................................................................ 46
Listing 3.3: Input and output parameters save and load feature in the IIR filter.................... 48
Listing 3.4: DRNL computation. ........................................................................................... 51
Listing 3.5: AN spiking and non-spiking probabilities for LSR and HSR AN fibre types. ....... 64
Listing 4.1: Sine tone generator. ......................................................................................... 79
Listing 4.2: Data writes to binary file in file writer thread. ..................................................... 95
Listing 4.3: RTAP offline processing. ................................................................................... 96
Listing 5.1: Acquisition of maximum and minimum values. ................................................ 106
Listing 5.2: Static spectrogram display. ............................................................................. 107
Listing 5.3: Subsampling processed data in all algorithm functions. ................................... 113
Listing 5.4: ERBS and spectrogram plot scrolling. ............................................................. 115
Listing 6.1: Fast exponential computation. ........................................................................ 124
Listing 6.2: Code for comparing MKL & Schraudolph exponential function. ....................... 125
Chapter 1: Introduction
1.1 Motivation
Recent findings in the anatomy and physiology of humans and animals have provided
much information about the auditory pathway (AP), which includes the outer, middle and
inner ears. To integrate these research findings, an AP computer model is a useful analytical
tool for probing the intricacies of AP function. Such modelling is an emerging area of
computational neuroscience that simulates the known physiological and psychophysical
attributes of the AP [1].
A real-time implementation of the AP computational model processes a stream of
auditory stimuli ‘on-the-fly’ and outputs the corresponding stream of processed data for
analysis. AP computer models that simulate empirical data possess nonlinear
subcomponents: the basilar membrane (BM) response, inner hair cell (IHC) receptor
potential and auditory nerve (AN) spiking all change disproportionately with the input
stimulus [1]. Nonlinear algorithms are computationally more expensive than linear ones, so
code optimisation and speed-enhancing techniques are required in a real-time
implementation adds a new dimension for studying sound perception. One derivative of
sound perception studies is speech processing analysis based on auditory nerve spiking
events [2] [3] [4]. A digitised auditory stimulus streamed from a live audio source via a
microphone can be analysed for common speech signatures continuously and illustrated
graphically.
Another use of a real-time AP computational model is for feasibility studies of
algorithm portability on to embedded systems incorporating FPGA, DSP or ARM based
processors. Algorithms capable of portraying the characteristics of a subcomponent of the
AP in real-time on a computer gives a strong indication that it can be processed on an
embedded system where performance is either equivalent or superior due to hardware
processing acceleration features of an embedded processor. Embedded system
implementation of the real-time AP model includes speech processor of cochlear implants [5]
and enhanced automatic speech recognition devices utilised with conventional signal
processing algorithms [6]. Another utilisation is in telecommunication engineering where a
real-time AP model algorithm simulating the cochlea within the inner ear is implemented in
mobile phone devices to perform noise cancellation in real-time [7].
1.2 Statement of the Problem
An AP model comprises algorithms that simulate the responses of the various
stages of the AP, and a mathematical scripting environment such as Matlab is one platform
used for such characterisation. Consequently, changes to system parameters that alter the
responses of the various stages are typically made directly in the code. For a large code
base this is problematic, particularly if the user is unfamiliar with the code or not proficient in
programming. Furthermore, analysing real-world audio requires a recording, which is then
fed to the AP model to generate the corresponding responses. If the recording is long, the
simulation will run for a considerable time before producing interpretable output. In other
words, there is a significant delay between the audio signal becoming available for analysis
and the acquisition of the AP responses.
This is especially inefficient if the AP model is connected to a hybrid network
integrating other models that simulate higher echelons of the auditory pathway. These
upstream models, which depend on the auditory nerve (AN) firing response of the AP model,
must wait until the entire input stimulus has been processed and the response data are
available before further processing can be carried out. As a result of this bottleneck, the
response of the entire network is also delayed. A more efficient solution is a real-time AP
model integrated with a graphical user interface (GUI) that allows system parameters to be
modified. Computations are performed on-the-fly and the responses are projected visually
for analysis. In a hybrid network, such a real-time AP model generates immediate responses
that allow the upstream models to operate with negligible delay.
1.3 Objective of the Research
The main objective of this research is to implement a real-time computational model
of the AP that includes the outer and middle ear, the BM, IHC and AN spiking characterised
by a working non-real-time model. Given an identical input auditory stimulus, the real-time
implementation should replicate the characteristics of the original AP model as closely as
possible.
The primary objectives can be further broken down into the following essential aims:
1. Review AP computer models and select one for real-time implementation.
2. Translate the available code for the selected AP model into a transitory C language
interface consisting only of the AP algorithms.
3. Test and verify that the C-implemented AP algorithms closely match the behaviour of
the original AP model.
4. Develop C++ wrapper classes around the C-implemented algorithms of the AP model
to suit windows based GUI and audio input streaming features.
5. Test and verify that the results of the C++-implemented model closely match the
original AP model.
6. Optimise the performance of the real-time model so that more channels can be
simulated.
7. Profile performance of the implementation and determine the limits of the model in
terms of number of channels and stages.
1.4 Thesis Outline
The following chapter reviews five AP computational models. The criteria for
selecting a model for real-time implementation are then presented: recency of research
findings, AP completeness (defined as modelling the outer, middle and inner ear up to the
point of AN spiking), accessibility of existing simulation code, and ease of running that code.
The latter part of chapter 2 covers the detailed characteristics of the AP model selected for
real-time implementation.
Chapter 3 describes the transitory C implementation of the AP model from the
basilar membrane (BM) to the auditory nerve (AN) spiking stage. The development and
results of the C implementation are illustrated and compared with the responses of the
original AP model.
Chapter 4 describes the real-time C++ graphical user interface (GUI) based
implementation of the C-platform translation of the AP model. Descriptions of wrapper class
structures, thread implementation and recording features are given.
Chapter 5 covers the graphical displays of the real-time C++ GUI program, including
static and scrolling plots on an Equivalent Rectangular Bandwidth (ERB) scale as well as
spectrogram graphs. Projections of real-time auditory nerve spiking probabilities are also
presented.
Chapter 6 describes the mathematical optimisations used to speed up computation
in the real-time AP model, and presents its load profiles. Chapter 7 concludes the thesis with
recommendations for future work.
Chapter 2: Literature Review
2.1 Auditory Pathway Models
Five auditory pathway (AP) time domain computing models are reviewed in this
section. One model is selected for real-time implementation, as covered in section 2.1.6.
2.1.1 Cooke Periphery Auditory Model
The method adopted by Cooke [2] uses human perceptual models that account
for psychophysical scaling. A filterbank of parallel filters at various frequency intervals
represents the physiological response of the auditory pathway. Figure 2.1 illustrates Cooke's
computer model. The outer and middle ear (OME) is modelled using a 1st-order difference
equation that boosts higher frequencies, although this is not shown in figure 2.1. The
components of the inner ear are modelled by independent parallel channels differentiated by
spatial displacement along the cochlea. The basilar membrane (BM) is modelled on the
BandPass-NonLinear (BPNL) model proposed by Pfeiffer [8], which is composed of two
bandpass filters with a static compressive nonlinear component between them.
Figure 2.1: Cooke periphery auditory model [2].
The first block in figure 2.1 represents an asymmetric 4th-order bandpass filter with
poles corresponding to a Butterworth design and zeros embedded strategically within the
Butterworth difference equation so as to achieve a sharp high-frequency cut-off. The
nonlinear compressive component is a square-root function, and the subsequent block is a
symmetric 4th-order Butterworth bandpass filter. The filters are implemented in recursive
form, which speeds up computation. Figure 2.2 shows the responses of the parallel BPNL
filters, stacked vertically with
respect to the alignment of their centre frequencies along the spatial extent of the basilar
membrane. All centre frequencies in this plot have been converted to the Bark scale, which
maps the responses at the various spatial points along the BM to linear spacing over the
critical bands of human hearing [9].
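For reference, a common approximation for the Hz-to-Bark conversion is Traunmüller's formula. The thesis does not state which formula was used for figure 2.2, so this is indicative of the axis conversion only.

```c
/* Frequency (Hz) to critical-band rate (Bark), Traunmueller's approximation.
 * Around 1 kHz this gives roughly 8.5 Bark. */
static double hz_to_bark(double f_hz) {
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53;
}
```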
Figure 2.2: BPNL filterbank response indicating travelling waveforms in a BM [2].
The output of the envelope detection after the BM module is used as the input to the
State Partition Model (SPM), which simulates the behaviour of the inner hair cells (IHC) as
well as auditory nerve (AN) spiking based on neurotransmitter release. AN spike generation
is described by three distinct states, governed by a sensitivity threshold for neurotransmitter
release:
• A stimulus level above the sensitivity threshold triggers the depletion of
neurotransmitter from the IHC and, in turn, its replacement.
• A stimulus level below the sensitivity threshold triggers no neurotransmitter
release; a cell that has recently been depleted replenishes its neurotransmitter.
• The IHC remains inactive, with neither neurotransmitter depletion nor
replenishment occurring.
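The three-state logic above can be sketched as follows. The variable names, threshold semantics and replenishment quantum are assumptions for illustration, not Cooke's actual parameters.

```c
/* Minimal sketch of the three-state SPM dynamics described above. */
typedef struct {
    double transmitter;   /* neurotransmitter available in the cell */
    double threshold;     /* sensitivity threshold */
} SpmCell;

typedef enum { SPM_RELEASE, SPM_REPLENISH, SPM_INACTIVE } SpmState;

static SpmState spm_step(SpmCell *cell, double stimulus, double full_level) {
    if (stimulus > cell->threshold && cell->transmitter > 0.0) {
        cell->transmitter -= 1.0;          /* state 1: release depletes the cell */
        return SPM_RELEASE;
    }
    if (cell->transmitter < full_level) {
        cell->transmitter += 0.5;          /* state 2: recently depleted, stock up */
        if (cell->transmitter > full_level) cell->transmitter = full_level;
        return SPM_REPLENISH;
    }
    return SPM_INACTIVE;                   /* state 3: no depletion, no replenishment */
}
```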
The Cooke model is able to process 61 discrete channels that simulate the
behaviour of the cochlea within the range of 100 Hz to 5000 Hz, a frequency range well
suited to speech analysis [2]. Furthermore, the neurotransmitter release from the SPM block
simulates auditory nerve spiking, which also enables plausible speech data analysis.
2.1.2 Auditory Image Model (AIM2006)
AIM2006 was written in Matlab and developed by Patterson [3]. It supports two
different algorithms for the auditory pathway model, categorised as functional and
physiological. AIM2006 simulates 75 channels covering the speech spectral range of
100 Hz to 6000 Hz. Figure 2.3 illustrates the AIM2006 model.
Figure 2.3: AIM2006 model [3].
In both the functional and physiological models, the middle ear filtering is based on
the model developed by Lutman [10] using analogue electronic impedances. Figure 2.4
illustrates the electrical circuit modelling the middle ear, with the impedance of each
subcomponent of the middle ear bounded by a dotted box. The voltage input to the circuit
represents the acoustic pressure at the eardrum, and the current in the ‘stapes+cochlea’
branch denotes the stapes vibration velocity.
Figure 2.4: Analogue electrical circuit model of the middle ear [10].
The BM in the functional model is designed using a linear gammatone filterbank with
parameters derived by Patterson [11]. Stapes velocity from the middle ear is converted to
displacement and fed into the gammatone filterbank, whose output is a multi-channel
representation of the BM displacement. The nonlinear cochlear compression is added in the
neural encoding stage. The physiological model of the BM is based on a nonlinear
transmission line filterbank with feedback parameters simulating the outer hair cell (OHC)
mechanical amplification of BM motion [12]. The transmission line filterbank is based on
electroacoustic properties and comprises analogue electronic components, as illustrated in
figure 2.5.
Figure 2.5: Transmission line filterbank model. Branches 1 and n denote the basal and apical sites of the BM respectively [12].
The subsequent phase of the AIM2006 model is the transduction of the BM
displacement, which generates the neural activity pattern (NAP). In the functional model, the
NAP is computed using a two-dimensional adaptive threshold filterbank. The BM
displacement is rectified and compressed before adaptation in the time domain and
suppression in the frequency domain, which aids the sharpening of vowel formants. The
physiological model generates the NAP using the Meddis IHC neurotransmitter flow model
[13], illustrated in figure 2.6. The Meddis model characterises the release and replenishment
of neurotransmitter in a single IHC based on the corresponding BM displacement level.
Every BM site simulated by the transmission line filters is cascaded with an IHC
neurotransmitter model that corresponds to a single afferent nerve fibre.
Figure 2.6: Neurotransmitter flow in IHC [13].
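The transmitter flow in figure 2.6 corresponds to the reservoir equations of the Meddis model, which can be integrated with a simple Euler step: q is the free transmitter pool, c the cleft content and w the reprocessing store. The rate constants below are representative values in the spirit of the published model and should be treated as illustrative; k is the stimulus-dependent release permeability.

```c
/* One Euler step of the Meddis transmitter-flow equations. */
typedef struct { double q, c, w; } MeddisState;

static void meddis_step(MeddisState *s, double k, double dt) {
    const double y = 5.05;      /* factory replenishment rate (assumed) */
    const double l = 2500.0;    /* loss rate from the cleft (assumed) */
    const double r = 6580.0;    /* reuptake rate from the cleft (assumed) */
    const double x = 66.3;      /* reprocessing rate (assumed) */
    double dq = y * (1.0 - s->q) + x * s->w - k * s->q; /* factory + reprocessed - released */
    double dc = k * s->q - l * s->c - r * s->c;         /* released - lost - reuptaken */
    double dw = r * s->c - x * s->w;                    /* reuptaken - reprocessed */
    s->q += dt * dq;
    s->c += dt * dc;
    s->w += dt * dw;
}
```

Given the fast cleft rates, a small step (e.g. dt = 1e-5 s) is needed for stable integration.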
The auditory image display is the final phase of the model and provides a visual
representation of sound, obtained by applying temporal integration to the NAP data. This
phase is not essential here, as its scope lies beyond auditory nerve spiking, and it will not be
deliberated upon.
2.1.3 Multiple-Bandpass-Nonlinear (MBPNL) Filterbank
The MBPNL filterbank, developed by Goldstein [14], is a quantitative tool that models
the nonlinear behaviour of the BM subcomponent of the AP. Other subcomponents of the
AP are not addressed. The model is nevertheless important because other AP models
follow a similar structure for BM displacement computation. Each MBPNL filter is
implemented in parallel for discrete spatial points along the BM, known as best frequency
(BF) points, to form a filterbank. The filter consists of dual signal-processing paths with a
recursive implementation. Figure 2.7 depicts the block diagram of the MBPNL filter: H1, H2
and H3 represent linear bandpass filters. The lower branch, H1 to H2, denotes the
nonlinear, narrowband and compressive tuning of the BM; H3 to H2 denotes the linear
branch. The compression is generally achieved using either a square or cube root function.
Figure 2.7: Multiple-bandpass-nonlinear (MBPNL) filter [14].
Goldstein [14] hypothesises that the compressive behaviour of the MBPNL filter is
due to OHC transduction in empirical measurements of BM tuning curves. The expander
subcomponent of the filter is postulated to be an expansive excitatory and suppressive
feedback response to stimulus tones below BF; this expansive feature cannot be directly
observed in measured BM tuning curves. Lin and Goldstein [15] implemented a version of
the MBPNL model in C on the Linux operating system (OS) to simulate healthy and
damaged cochleae. The sampling frequency was set to 20 kHz and a 4,096
point window was chosen to represent the signals. A single BF point of 9 kHz was selected
for analysis, with click stimuli at intensities varying from 26 dB to 86 dB in steps of 10 dB.
Figure 2.8 illustrates the time domain and iso-intensity plots of the click response.
Figure 2.8: Time-domain (left) and iso-intensity frequency spectra (right) projection of click response of the MBPNL filter at a BF site of 9 KHz [15].
2.1.4 The Model of Carney and Colleagues
The model of Carney and Zhang [4] is a tool for studying the nonlinearities of auditory
nerve (AN) response encoding based on simple and complex sounds. The input to the
model is auditory stimuli pressure in Pascal. The outer and middle ear effects are not
modelled. Instead, the stimulus is fed directly to two paths: signal and control paths. The first
cascade in the signal path is a nonlinear 3rd-order gammatone filter. The nonlinearity is
achieved by a dynamic tuning of the filter time constant that affects the gain and bandwidth
as well as the input stimulus levels. The nonlinearity introduces dc component in the output
of the filter which is biophysically inappropriate. As a result, a 1st-order linear gammatone
filter is introduced to eliminate the dc component.
Figure 2.9: The model of Carney and colleagues [4].
The control path comprises a time-varying 3rd-order wideband gammatone filter. The
bandwidth of this filter is larger than that of the signal path in order to accommodate
two-tone suppression: the reduction of the auditory nerve firing rate when a second tone is
added to the original tone. The larger bandwidth of the control path allows a second tone
outside the signal path bandwidth to reduce the signal path gain. The second block of the
control path is a nonlinear function that dynamically compresses the control signal. The next
nonlinear function, coupled with a low-pass filter, regulates the range and compression
dynamics, and a final nonlinear function fine-tunes the total strength of compression. The
resulting output parameter is a time constant that varies with the level of the input stimulus.
The output of the IHC is a receptor potential, computed in two stages: a nonlinear
logarithmic compressive function followed by a 7th-order low-pass filter. The advantage of
using a signal processing model at the IHC stage instead of a biophysical model is that it is
easy to implement and fast to compute. The synapse model simulates the release rate of
the neurotransmitter. First, the immediate neurotransmitter permeability is computed from a
logarithmic function of the IHC receptor potential and a BF threshold. The neurotransmitter
discharge rate is then computed as the product of the permeability and the quantity of
neurotransmitter available for discharge from the IHC. The auditory nerve (AN) spike rate is
finally computed as the product of the neurotransmitter rate and a Poisson process that
accounts for the history of spike discharges.
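A Poisson spike generator of the kind described above is commonly approximated per time step as a Bernoulli trial, with the probability scaled down shortly after a spike to mimic refractoriness. The names and the linear recovery law below are assumptions for illustration, not the Carney model's exact expression.

```c
/* Returns 1 if a spike occurs in this time step. The random draw
 * (uniform in [0,1)) is passed in so the function stays deterministic
 * and testable. */
static int an_spike(double rate_hz, double dt, double t_since_spike,
                    double t_refractory, double uniform01) {
    /* Linear recovery: probability ramps back up over the refractory window. */
    double recovery = (t_since_spike < t_refractory)
                    ? t_since_spike / t_refractory : 1.0;
    return uniform01 < rate_hz * dt * recovery;
}
```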
2.1.5 Matlab Auditory Periphery (MAP) Model
The MAP model [6] is an auditory pathway model that simulates the pathway from
the outer ear to the cochlear nucleus and brainstem. Figure 2.10 illustrates the Meddis MAP
model. The outer ear is modelled by parallel 1st-order band-pass filters that enhance the
amplitude of the speech spectral range. The output is sound pressure in Pascal, which is
fed to a 1st-order low-pass filter to yield the tympanic membrane (TM, or eardrum)
displacement in metres. The TM displacement is passed through a 1st-order high-pass filter
to generate stapes displacement.
Figure 2.10: Meddis MAP model [16].
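The outer and middle ear chain described above (band-pass, then low-pass to TM displacement, then high-pass to stapes displacement) can be sketched with 1st-order IIR sections. The coefficients and the particular difference-equation form below are placeholders, not the MAP parameter values.

```c
/* First-order IIR section: y[n] = b*(x[n] +/- x[n-1]) + a*y[n-1].
 * The sign on the previous input selects low-pass or high-pass behaviour. */
typedef struct { double a, b; double x1, y1; } FirstOrder;

static double lp_step(FirstOrder *f, double x) {   /* e.g. TM displacement */
    double y = f->b * (x + f->x1) + f->a * f->y1;
    f->x1 = x; f->y1 = y;
    return y;
}

static double hp_step(FirstOrder *f, double x) {   /* e.g. stapes displacement */
    double y = f->b * (x - f->x1) + f->a * f->y1;
    f->x1 = x; f->y1 = y;
    return y;
}
```

A sample would then flow through the chain as `hp_step(&stapes, lp_step(&tm, bandpassed_input))`.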
The BM is modelled using a filterbank of dual resonance nonlinear (DRNL) filters
[17], [18]. The DRNL filter originates from the MBPNL filter, although its pathways are
modelled differently. The stapes displacement is fed to two parallel paths: a linear branch
and a nonlinear branch. The linear path contains a broadly tuned gammatone filter. The
nonlinear path comprises a cascade of a narrowly tuned gammatone filter, a memoryless
compressive function and another narrowly tuned gammatone filter. The two paths are
summed at the end, resulting in BM displacement in metres. To achieve a level-dependent
response from the DRNL filter, the gains of the two filter paths are set relative to each other.
Level-dependent BF shifts are also accounted for by setting different centre frequencies for
the broadband and narrowband filters. Two gammatone filters are used in the nonlinear
path to accommodate combination tones, including two-tone suppression.
A high-pass filter converts the BM displacement to stereocilia displacement. The
MAP model uses a biophysical model of the inner hair cell (IHC) that translates stereocilia
displacement into an intracellular potential, which indirectly invokes neurotransmitter
release. Stereocilia displacement is first converted to a basolateral conductance within the
respective IHC. From the conductance, the IHC voltage is derived using a Hodgkin-Huxley
model that incorporates a Boltzmann function. The IHC receptor potential influences the
flow of calcium into the IHC, and the quantity of calcium ions packed at the synapse of the
IHC dictates the probability of neurotransmitter vesicle release [19]. The computation of AN
spiking is a stochastic process: the release probability is calculated from the product of the
calcium ion concentration and the quantity of neurotransmitter available for release. A more
computationally intensive spike event computation is also available in the MAP model,
required for the subsequent brainstem neuron computing stage [13], though this lies outside
the scope of the present objectives.
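The stochastic release just described can be sketched as a per-step Bernoulli draw whose probability is proportional to the product of calcium concentration and available transmitter. The cube on the calcium term and the rate constant z are assumptions in the spirit of such synapse models, not MAP's exact expression.

```c
/* Returns 1 if a vesicle release (spike-triggering) event occurs this step.
 * ca_conc: presynaptic calcium concentration; transmitter: available quantity;
 * z: rate constant (assumed); uniform01: random draw in [0,1). */
static int vesicle_release(double ca_conc, double transmitter,
                           double z, double dt, double uniform01) {
    double p = z * ca_conc * ca_conc * ca_conc * transmitter * dt;
    if (p > 1.0) p = 1.0;    /* clamp to a valid probability */
    return uniform01 < p;
}
```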
2.1.6 Model Selection
The MAP model is selected for real-time implementation. Firstly, the code is readily
accessible and ready-to-run Matlab script files demonstrate the responses of the
subcomponents of the MAP model. Furthermore, as MAP version 14 is the latest version at
the time of writing, technical support is available from Ray Meddis. This is essential because
the real-time implementation should closely match the operational characteristics of the
original model, and any unexpected behaviour in the original model can be resolved in close
consultation with the author.
Of all the models reviewed here, the MAP model is the most recently updated, with
development still ongoing. A recent alternative is the AIM model. Although a real-time
implementation titled AIM-C has recently been developed in C++ [20], most modules within
AIM-C follow the functional branch of the AIM model structure in figure 2.3, with the
exception of a pole-zero filter cascade (PZFC) [21]. The PZFC is a more realistic and
complex auditory filter than the default gammatone filter in the AIM model and is one of the
selectable BM motion algorithms. Because AIM-C implements the functional model rather
than the more biophysically accurate physiological model, it can process a filterbank of up
to 200 channels and display the auditory nerve response in real-time on a dual-core CPU
running above 2 GHz [22]. However, the real-time responses of AIM-C slow in proportion to
the number of channels beyond 200. Auditory nerve spiking events computed in AIM-C use
a simplified account of neurotransmitter release, as opposed to the more biophysically
sophisticated approach of the AIM2006 and MAP models; AIM-C was developed to
complement AIM2006. Consequently, the MAP model was chosen over AIM-C, as the
former offers more sophisticated and biophysically faithful algorithms. In comparison with
AIM2006, the MAP model provides an alternative platform for studying the sophisticated
biophysical aspects of the various stages of the auditory pathway across species.
The AN spiking in the MAP model depends on a biophysical simulation of the IHC.
Cooke uses a discrete three-state model to generate AN spikes, whereas the MBPNL model
only simulates up to the basilar membrane. The Carney model uses a purely
signal-processing design for the IHC to drive AN spiking. Although the Carney model is fast
and easy to implement with few parameters, it is unable to reflect low-frequency sounds
from the IHC onwards to AN spiking. Furthermore, the IHC subcomponent of the Carney
model does not account for the physiological parameters influencing AN spiking for brief
and intense sounds [16]. The biophysically based IHC subcomponent of the MAP model
addresses the problems of the Carney model and is thus able to simulate AN spiking for a
larger variety of auditory stimuli. AIM2006 is the only other model that simulates the IHC
response using a biophysical model; however, as a real-time version (AIM-C) had already
been developed, AIM2006 was not considered. Table 2.1 summarises the AP model
selection.
Cooke PAM
• Advantages: simple design; fast on modern computers.
• Disadvantages: no biophysical attributes; not updated regularly; oldest of the five models.

AIM2006
• Advantages: possesses biophysical traits; updates available; technical support available.
• Disadvantages: real-time version already available.

MBPNL
• Advantages: biophysical simulation of the BM.
• Disadvantages: does not simulate from the IHC stage onwards.

Carney
• Advantages: easily implementable; fast processing capabilities.
• Disadvantages: no biophysical attributes from the IHC onwards; not updated regularly.

MAP
• Advantages: possesses biophysical traits; most recently updated of the models compared as of this writing; technical support available.
• Disadvantages: complex algorithm.
Table 2.1: Review of AP model selection for real-time implementation.
2.2 The MAP Model of the Human Auditory Pathway
The MAP model is coded entirely in Matlab and is modelled with the perspective of
providing an accurate simulation of the auditory pathway (AP) that allows a user to modify
and change the parameter settings for a diverse range of human and animal auditory
analysis. The model characterises the AP from the outer and middle ear, where the auditory
stimulus enters the model, up to the auditory nerve, where the corresponding action
potential spikes are generated. The MAP model can be broken
down into five cascaded segments. Figure 2.11 displays the block diagram of the MAP
model from the outer ear to the auditory nerve.
Figure 2.11: MAP model structure [16]. Shapes coloured in red denote segments that are
omitted from the real-time software application.
2.2.1 Outer and Middle Ear
Sound propagated via the medium of air reaches the outer ear, the primary
interface of the ear with the outside world. The outer ear forms a passage along which sound waves
are channelled until they reach the tympanic membrane (TM),
also known as the eardrum. The mechanical vibration of the TM is measured as mechanical
pressure in Pascal. In MAP14, the auditory stimulus, i.e. the input to the model, is
specified in dB SPL. To acquire the reading in Pascal, the auditory stimulus is multiplied
by a scalar, which is calculated as follows:
s = 28e-6 × 10^(L/20)    (Eqn. 2.1)
where L is the stimulus level in dB SPL and s is the resulting peak pressure scalar in Pascal.
MAP14 has introduced several new features in the outer and middle ear calculation
with respect to earlier versions. One feature is the implementation of external ear resonance
computation. The ear canal, or outer ear, is most responsive between 1 kHz and 4 kHz of
the auditory frequency range, with a peak near 3 kHz, a range that coincides with human
speech [23]. The outer ear behaves like a set of parallel band-pass filters that amplify the
auditory spectral range between 1 kHz and 4 kHz. Hence, the MAP model extracts the
auditory stimulus content in the range of 1 kHz to 4 kHz using a 1st-order Butterworth band-
pass filter and applies a 10 dB gain before summing it with the original auditory stimulus.
[Figure 2.11 block labels: Stimuli → Outer & Middle Ear → Basilar Membrane → Inner Hair
Cell → Auditory Nerve → Cochlear Nucleus → Brainstem Level 2, with Acoustic Reflex and
Medial Olivo-Cochlear effects as feedback paths.]
This parallel resonance enhances the amplitude of the ‘speech’ range of the sensitivity plot
as indicated in figure 2.12 [23].
Figure 2.12: Outer ear frequency response with peak auditory sensitivity range from 1 kHz to 4 kHz [23].
Figure 2.12 is a generalised outer ear frequency spectrum that does not account for
the direction of incidence of the auditory stimulus. Typically, the eardrum vibrates as a result of
the directional arrival of an acoustical stimulus [24]. Moreover, the magnitude of eardrum
vibration differs for every person. Ideally, the outer ear is modelled with a directional filter.
MAP14 disregards the effects of directional auditory stimuli upon the outer ear as its main
goal is to deliver a computational tool to establish a general theory of hearing [1]. However,
an outer ear filter model whose parameters account for the direction of the auditory stimulus can be
added to the external ear resonance module of the MAP model in future developments.
Past the TM is the middle ear consisting of three tiny bones, namely malleus, incus
and stapes. The malleus is attached to the internal portion of the TM and is connected to the
incus, which in turn is connected to the stapes. The stapes bone connects to an oval window
in the cochlea. The role of the middle ear is to relay the acoustic energy from the TM to the
cochlea via the mechanical vibration of the three bones. For stimuli up to approximately 2
kHz, the acoustic energy is translated by a one-dimensional, piston-like motion of the stapes
[25]. Hence, the middle ear is considered an acoustic impedance transformer. The outer
ear has low impedance properties while the oval window of the cochlea has high impedance;
due to this mismatch, acoustic energy arriving without the middle ear's transformation would be largely reflected.
The middle ear, therefore, attenuates the reflection and allows the acoustic energy to be
channelled to the oval window [26]. Figure 2.13 illustrates acoustic energy transfer from the
outer, to the middle ear and subsequently to the inner ear.
Figure 2.13: Acoustic energy transmittance from outer ear to the basilar membrane (BM) in
the (uncoiled) cochlea via the three bones in the middle ear [27].
MAP14 simulates stapes movement in live human subjects as described by Huber
[28] as opposed to cadaver measurements for earlier versions of MAP. This is because
stapes measurements in cadavers and live human patients do not match [29]. Hence, the
simulation of the stapes motion in MAP14 is adjusted to match the more applicable live
human recordings. Moreover, earlier versions of MAP simulated the TM and stapes output
as velocities. In order to accommodate live human data in MAP14, displacements instead of
velocities are used.
MAP14 models the middle ear as a linear system. The input to the filter is the time-
varying sound wave pressure from the outer ear module and its output is a time-varying stapes
displacement. In order to obtain stapes displacement, the relationship between the pressure
and TM velocity is sought and is given as follows:
p = c·v    (Eqn. 2.2)
where p is the sound wave pressure in Pascal, v is the TM velocity, and c a constant. An
alternative expression for the TM velocity is given by
v = 2πf·x    (Eqn. 2.3)
where f is the frequency of the auditory stimulus and x is the TM displacement. Substituting
equation 2.3 into 2.2 results in
x = p / (2πf·c)    (Eqn. 2.4)
It is observed from equation 2.4 that if the auditory stimulus frequency increases, TM
displacement decreases and vice-versa. This behaviour is implemented in MAP14 through
the use of a 1st-order Butterworth low-pass filter with a cut-off frequency at 50 Hz. Modelling
the behaviour in this way matches human data above 2 kHz [28]: the output of the
low-pass filter is a good fit to TM and stapes displacement data for frequencies beyond
2 kHz. However, a roll-off is also needed at very low frequencies. Hence, a 1st-order
high-pass filter with a 1 kHz cut-off is cascaded with the output of the low-pass filter, and its
output yields the stapes displacement. Figure 2.14 depicts the outer and middle ear
model structure.
Figure 2.14: Outer and middle ear model structure in MAP. [Block labels: auditory stimuli →
1st-order band-pass Butterworth filter (f_low-cutoff = 1 kHz, f_high-cutoff = 4 kHz) → sound
pressure in Pascal → 1st-order low-pass Butterworth filter (f_cutoff = 50 Hz) → unfiltered
stapes displacement in metres → 1st-order high-pass Butterworth filter (f_cutoff = 1 kHz) →
filtered stapes displacement in metres, fed to the basilar membrane.]
2.2.2 Basilar Membrane
The basilar membrane (BM) resides in the fluid filled area within the cochlea and it
responds mechanically to the sound stimuli diffused from the stapes. In the presence of a
stimulus, its effect is propagated from the outer ear to the stapes in the middle ear and
subsequently to the oval window of the cochlea, which in turn vibrates the fluid within.
Fluid motion influences the wave displacement on the BM from the basal to the apical end.
Because the BM near the base of the cochlea is much narrower and more rigid than at the
apex, the travelling wave starts off at high speed and its amplitude and phase gradually
increase as the wave propagates towards the apical end. However, at a specific point along
the way, the travelling wave starts slowing down and its amplitude is rapidly reduced,
though its phase continues to increase [30]. Hence, due to its rigid characteristic near the
basal end, the BM is more responsive to higher frequencies at this end than to lower
frequencies. Figure 2.15 shows a representation of
the travelling wave in the BM transmitted from the oval window to the base and finally to the
apex.
Figure 2.15: Travelling wave of the basilar membrane from its basal to the apical end [31].
Conversely, at the apical end, the BM response to lower frequencies is much stronger
than to higher frequencies. The BM responds tonotopically to auditory stimuli: there are
numerous points along the BM where the response peaks [27]. In other words,
for a given stimulus of a fixed frequency, there is a point along the BM that generates a peak
response. This point translates to a specific frequency called the best frequency (BF).
The BF for stimuli at threshold level is known as the characteristic frequency (CF). The
response of the BM gradually decays at sites moving away in either direction from the BF
point. Hence, the BM can be modelled using a filterbank that comprises multiple
overlapping filters with various peak BF responses [6]. Figure 2.16 illustrates the BM
response at three discrete BF points.
Figure 2.16: Spatial response of the BM [27].
BM filters are nonlinear and asymmetric. The asymmetry arises because, as the auditory
stimulus frequency shifts, the BM magnitude response decays faster for frequencies above a
BF point than for frequencies below it. The nonlinearity is attributed to the level-dependent
gain: at low stimulus intensities the gain is higher than at high intensities, where the
gain is compressed. This, however, applies to basal points; for apical points below 1 kHz, the
compressive gain applies over a larger range of stimulus intensities. Stimulus intensity also
shifts BF points and alters the bandwidths of CF points, which is another attribute leading to
the nonlinearity [6]. Figure 2.20(b) illustrates the BF shift and bandwidth increase for a high
intensity stimulus.
In MAP14, the lowest and highest BF parameters are specified by the user and the
BF points in between these boundaries are spaced equally on a log scale. Each CF is
modelled by a filter called the dual-resonance nonlinear (DRNL) filter [17]. The input to the
DRNL filter is the stapes displacement from the outer and middle ear module. The DRNL
filter consists of two parallel paths, one linear and the other nonlinear, whose results
are summed at the end of the filter. Figure 2.17
shows the different sub-filters that form the DRNL filter.
Figure 2.17: Single BF point in BM modelled by a dual-resonance nonlinear (DRNL) filter implemented in MAP. The top parallel branch is linear while the bottom branch is nonlinear.
The common sub-filter that makes up the DRNL is the gammatone filter, which
primarily performs spectral analysis on the stapes displacement and outputs BM
displacement in the time domain [11]. It is characterised by the product of a gamma
distribution and a sinusoidal tone. The equation and its resultant waveform are given
as follows:
g(t) = A·t^(N−1)·e^(−bt)·cos(ω_r·t + φ)    (Eqn. 2.5)
where
A: amplitude of the sinusoidal tone;
N: filter order;
ω_r: ringing frequency in rad/s;
φ: initial phase in rad;
b: one-sided pole bandwidth in rad/s.
Figure 2.18: Gammatone waveforms of (a) gamma distribution, (b) sinusoidal tone and (c) the resulting waveform of the product of (a) and (b) [32].
The filter parameter b determines the duration of the impulse response and hence
the bandwidth of the gammatone filter. The parameter N denotes the filter order and sets the
slopes of the filter response skirts. A gammatone filter order in the range of 3 to 5 shapes
its magnitude response to closely match the mechanics of the human cochlea [11]. To simulate the
entire BM, a bank of gammatone filters would suffice. However, as the gammatone filter
response is linear, the BM response would be linear as well, thereby deviating from empirical
physiological data. Figure 2.19 illustrates the frequency response of a gammatone filterbank
with ten filters for linear BM characterisation.
Figure 2.19: Gammatone filterbank frequency response with 10 filters [33].
In MAP14, the nonlinear branch of the DRNL filter consists of three identical 1st-order
gammatone filters, a broken-stick compression function, followed by another three identical
1st-order gammatone filters. Its centre frequency corresponds directly with the BF points and its
bandwidth is given by the empirical formula:
BW_nonlin = p × BF + q    (Eqn. 2.6)
where p and q are constants set at 0.2895 and 250 respectively [16].
Compression is applied in the nonlinear branch of the DRNL filter if the input stimulus
is above a specific threshold. This compression threshold is specified in decibels and
converted to a threshold displacement by multiplying the decibel-converted compression
threshold parameter with a reference value of 10e-9 m. Scaling the compression threshold
with this reference value ensures that the BM displacement will be within the boundary of
normal hearing [16]. The compression threshold displacement is determined by
CtBM = 10e-9 × 10^(CtdB/20)    (Eqn. 2.7)
where CtdB is the compression threshold in dB. The nonlinear function applied to input
stimuli larger than the compression threshold is as follows:
h(t) = sign(x(t)) · CtBM · exp(c · log(a·|x(t)| / CtBM))    (Eqn. 2.8)
where c is the exponent, set at 0.2 for the best fit to the plot in figure 2.18, and
a is a scalar whose default value is 50,000 [16]. For input stimulus levels below the
compression threshold, the characteristic of the nonlinear path after the first series of
cascaded gammatone filters is linear, with the following formula:
h(t) = a · x(t)    (Eqn. 2.9)
The linear pathway of the DRNL filter comprises an adjustable linear gain with a
default value of 50 and a cascade of three identical gammatone filters. Its CF depends
on the BF points, similar to the nonlinear pathway, except that the BF shifts with
a rise in the stimulus level. It is characterised by the following empirical formula:
CF_lin = minCF_lin + coeffCF_lin × BF    (Eqn. 2.10)
where the constants minCF_lin and coeffCF_lin are set at 153.13 and 0.7341 respectively [16].
Similarly, the bandwidth of the linear path also depends on the BF points and is given by
BW_lin = minBW_lin + coeffBW_lin × BF    (Eqn. 2.11)
where minBW_lin and coeffBW_lin are constants set at 100 and 0.6351 respectively [16].
For medium intensity audio input, the nonlinear parallel path with the compressive
function dominates the summed DRNL filter output. For very low and very high intensity
input, the summed output of the parallel paths is linear. These effects can be observed in
figure 2.20(a) for an intermediate input level of 30 dB SPL and in figure 2.20(b) for a high
input level of 85 dB SPL. Furthermore, the peak of the summed DRNL response in figure
2.20(b) is wider than that of figure 2.20(a), and the BF position for the higher level input
has shifted to a lower frequency [17] [18]. It should be noted that the plots in figure 2.20
are based on earlier MAP versions; hence the unit of measurement is metres per second
instead of metres.
Figure 2.20: DRNL summed output response with (a) medium level input and (b) high level input [18].
2.2.3 Inner Hair Cell
The IHCs are mechanoreceptors for vibration embedded on top of the BM. They
convert the mechanical energy of the acoustic vibration transmitted through the BM into
electrical energy. A bundle of hair-like structures called stereocilia resides on top of each
IHC. The stereocilia have varying lengths that increase from one side of the hair cell to the
opposite end. Deflection of the stereocilia bundle towards its longest strand causes the IHC
to increase its electrical potential from its resting state, a process known as depolarisation
[27]. This causes the IHC to release glutamate-filled vesicles of neurotransmitter [34]. When
the stereocilia bundle is deflected towards the shortest cilia strand, the IHC potential
decreases. This electrical action is known as hyperpolarisation and, as a result, fewer
neurotransmitters are released from the IHC [27]. Figure 2.21 illustrates the direction of
movement of the hair cell along with the corresponding receptor potential of the IHC and the
action potential spikes of the respective auditory neuron.
Figure 2.21: IHC stereocilia motion effects on its electrical potential [27]. (A) Stereocilia deflection towards its longest strand causes depolarisation. (B) Stereocilia deflection away
from its longest strand causes hyperpolarisation.
To elaborate further on the receptor potential of the inner hair cell, figure 2.22 shows
an input-output function that relates the input stimulus peak pressure in Pascal to the inner
and outer hair cell receptor potentials. The depolarising changes of the hair cell potential,
at positive potentials, are larger than the hyperpolarising changes at negative potentials,
although depolarisation develops at a slower rate than hyperpolarisation. The depolarisation
and hyperpolarisation of the IHC induce either excitation or inhibition of action potentials in
the AN fibres. For a sine tone input stimulus, the spikes in each AN fibre occur at a specific
point of the sinusoidal cycle. These spikes occur consistently at the same phase, appearing
with a phase lag, and although a spike may not occur in every cycle, subsequent spikes fall
at integer multiples of the stimulus period. This phase locking between input stimulus and
AN spikes is present for low frequencies up to 2 kHz, diminishes for frequencies beyond
2 kHz, and is essentially absent above 5 kHz [35].
Figure 2.22: Input-output relationship between auditory input stimulus and hair cell receptor potential [36].
The output of the BM model is BM displacement, which is the input parameter to the
IHC model. In MAP14, the IHC model can be broken down into three phases [19]. The first
phase includes the calculation of the IHC stereocilia displacement. As illustrated in figure
2.23, the rigid swaying motion of the IHC stereocilia is initiated by the fluid in the scala media
as the BM is deflected [19]. This fluid-stereocilia coupling is characterised by equation 2.12.
Figure 2.23: Fluid-stereocilia coupling [37].
τ_c · du(t)/dt + u(t) = τ_c · C_cilia · dv(t)/dt    (Eqn. 2.12)
where
C_cilia is the IHC cilia gain factor;
τ_c is the time constant;
v(t) is the BM displacement in metres;
u(t) is the IHC stereocilia displacement in metres.
Equation 2.12 has the characteristics of a high-pass filter. At high frequencies, the IHC
cilia movement is in phase with the BM displacement, while at low frequencies the stereocilia
movement is in phase with the BM velocity. This fluid-stereocilia coupling relationship is also
independent of the position along the BM, which allows the equation to be used at
any BF location along the BM.
The stereocilia displacement causes the opening and closing of ion channels at its
tip, as depicted in figure 2.24. When the stereocilia bundle deflects towards its longest strand,
potassium channels at the tips open, allowing potassium ions to flow in. As potassium ions
are positively charged, the influx of potassium ions increases the
intracellular potential of the IHC. Similarly, when the stereocilia bundle is deflected towards its
shortest strand, the potassium channels close and, as the potassium ions within the cell
disperse through a channel in the basolateral membrane [16], the IHC intracellular potential
drops. Another contributor to the intracellular potential is the capacitive effect of the IHC [6].
Figure 2.24: BM deflection causes changes in conductance in ion channels [16].
The magnitude of the IHC stereocilia displacement from equation 2.12 determines its
apical conductance. The degree to which the potassium channels open is modelled using a
three-state Boltzmann function. Hence, the relationship between the stereocilia displacement
and the IHC apical conductance is mathematically defined as follows:
G(u) = G_cilia^max · [(1 + exp(−(u(t) − u0)/s0)) × (1 + exp(−(u(t) − u1)/s1))]⁻¹ + G_a    (Eqn. 2.13)
where
G_cilia^max is the transduction conductance with all the ion channels open, in Siemens;
u(t) is the IHC stereocilia displacement in metres;
s0 and s1 are sensitivity constants that define the precise nonlinearity profile;
u0 and u1 are IHC displacement constants;
G_a is the passive conductance in the apical membrane, which is given by
G_a = G_0 − G_cilia^max · [(1 + exp(u0/s0)) × (1 + exp(u1/s1))]⁻¹    (Eqn. 2.14)
where G_0 is the resting conductance.
An electrical circuit model of the IHC membrane is introduced in [37];
figure 2.25 illustrates the IHC passive circuit. Using the aforementioned IHC apical
conductance and applying Kirchhoff's current law, which states that the sum of all currents at
the node of V_m is zero, the intracellular potential of the IHC, V_m, can be derived:
C_m · dV_m(t)/dt + G(u)·(V_m(t) − E_t) + G_k·(V_m(t) − E_k') = 0    (Eqn. 2.15)
where
V_m(t) is the intracellular IHC potential;
G_k is the potassium conductance, set as a constant of 20 nS;
C_m is the capacitance of the cell, set as a constant of 4 pF;
E_t is the endocochlear potential, set as a constant of 0.1 V;
E_k' is the reversal potential of the basal current for potassium ions, given by
E_k' = E_k + E_t · R_p / (R_t + R_p)    (Eqn. 2.16)
where R_p is the epithelium resistance and R_t is the endocochlear resistance in Ohms.
Figure 2.25: Inner hair cells (IHC) membrane passive electrical circuit model.
2.2.4 Neurotransmitter Release
Apart from the IHC receptor potential, the rate of neurotransmitter release is also
determined by the availability of neurotransmitter in the presynaptic area within the IHC.
A synapse is defined as the medium that connects two cells and is categorised as either
electrical or chemical [27]. Between the IHC and the auditory nerve fibres there is a small
gap of extracellular space called the synaptic cleft. The link between these two cells is made
by glutamate-filled vesicles of neurotransmitter. Figure 2.26 displays the
neurotransmitter release from the IHC.
Figure 2.26: Neurotransmitter release from the IHC to the auditory nerve fibre [13].
The release of the neurotransmitters requires the presence of calcium ions within the
presynaptic region of the IHC. In the presence of an auditory stimulus, the IHC membrane
potential is depolarised, which triggers the calcium channels to open and allows calcium ions to
flow into the IHC. When a sufficient quantity of calcium ions is present, the neurotransmitters are
released from the IHC to the auditory nerve fibres via the synaptic cleft.
With respect to the flow of calcium ions, the neurotransmitter release from the IHC is
divided into three phases. The first phase involves the opening of the calcium ion channels,
which is dependent on the IHC intracellular potential, also known as the receptor potential. The
calcium current is essential to neurotransmitter release and is derived from the IHC
receptor potential, calculated as follows:
I_Ca(t) = G_Ca^max · m³_ICa(t) · (V_m(t) − E_Ca)    (Eqn. 2.17)
where
E_Ca is the Nernst equilibrium potential for calcium in Volts;
V_m(t) is the IHC receptor potential in Volts;
G_Ca^max is the calcium conductance around the synaptic region of the IHC with all channels
open, in Siemens;
m_ICa(t) represents the fraction of the calcium channels that are open.
When a significant IHC receptor potential is present, the calcium channels on the
IHC open after a time delay [38]. This lag in the response of the calcium channel opening
with respect to the IHC receptor potential is modelled using a 1st-order differential equation
(low-pass filter) as follows:
τ_Ca · dm_ICa(t)/dt + m_ICa(t) = m_ICa,∞    (Eqn. 2.18)
where
τ_Ca is a time constant;
m_ICa(t) represents the fraction of the calcium channels that are open;
m_ICa,∞ is the steady-state value of m_ICa(t) when the rate of change of the calcium channel
opening, dm_ICa(t)/dt, is zero. It is defined by a Boltzmann function of the IHC
receptor potential:
m_ICa,∞ = [1 + β_Ca⁻¹ · exp(−γ_Ca · V_m(t))]⁻¹    (Eqn. 2.19)
where β_Ca and γ_Ca are constants fitted to calcium currents from published observations.
The second phase involves the entry of the calcium ions through the calcium
channels and the brief accumulation of these ions in the synaptic region of the IHC. The
effect of calcium is brief due to its rapid removal from the synaptic site, either through
dissipation or active chemical buffering [16]. As there is a synaptic delay involved with
chemical synapses, the calcium concentration in the synapse is modelled as a 1st-order low-
pass filter of the calcium current [38]:
τ_[Ca] · d[Ca²⁺](t)/dt + [Ca²⁺](t) = I_Ca(t)    (Eqn. 2.20)
where τ_[Ca] is a time constant. The final phase is the evaluation of the neurotransmitter
vesicle release rate, which is a stochastic process based on the calcium ion concentration:
k(t) = max( z·([Ca²⁺]³(t) − [Ca²⁺]³_thr), 0 )    (Eqn. 2.21)
where
k(t) is the neurotransmitter release rate;
z is a scalar for converting calcium concentration levels into a release rate;
[Ca²⁺]_thr is the threshold calcium ion concentration that determines the probability of
neurotransmitter release.
Neurotransmitter vesicle release is dependent on its availability in the IHC immediate
store. If there are sufficient neurotransmitters in the immediate store, they are released to
the synaptic cleft. The released neurotransmitters are expected to dock at the postsynaptic
afferent fibre, where they cause the auditory nerve to fire, before returning to a reprocessing
store within the IHC to be repackaged into vesicles. However, some neurotransmitters in
synaptic cleft are lost. To ensure sufficient neurotransmitters are present in the immediate
store, the IHC manufactures new neurotransmitters in a neurotransmitter factory [13]. Figure
2.27 illustrates neurotransmitter flow between IHC and AN fibre.
Figure 2.27: Neurotransmitter discharge and retrieval flow [16].
In the MAP model, a single neurotransmitter release is sufficient to trigger a voltage
spike in the AN fibre [39]. When there are insufficient neurotransmitter vesicles in the
immediate store for release, synaptic adaptation occurs where the AN fibres are unable to
fire as a result [6]. There are three equations that characterise the quantal process of
neurotransmitter release and replenishment in the IHC and AN [13], [19] and [16]. The first of
three equations is as follows:
dq(t)/dt = x·w(t) + y·(M − q(t)) − k(t)·q(t)    (Eqn. 2.22)
where
q(t) is the time-varying quantity of neurotransmitter in the immediate store;
x is the neurotransmitter transfer rate from the reprocessing store to the immediate store;
w(t) is the quantity of neurotransmitter in the reprocessing store;
y·(M − q(t)) is the transfer rate of new neurotransmitter from the neurotransmitter factory;
M is the maximum number of neurotransmitters;
k(t) is the neurotransmitter release rate as derived from equation 2.21.
Equation 2.22 accounts for the neurotransmitters in the immediate store that are
ready to be released from the IHC to the synapse at a rate of k(t). The equation also
accounts for newly manufactured neurotransmitters as well as those that are returned from
the reprocessing store. The second of the three equations is defined as
dc(t)/dt = k(t)·q(t) − l·c(t) − r·c(t)    (Eqn. 2.23)
where
c(t) is the time-varying quantity of neurotransmitter in the synaptic cleft;
l is the rate at which neurotransmitter is lost from the synaptic cleft;
r is the reuptake rate at which neurotransmitter returns from the synaptic cleft to the
reprocessing store.
Equation 2.23 defines the quantity of neurotransmitter in the cleft, accounting for
neurotransmitter released from the IHC into the synapse as well as neurotransmitter that is
lost or returned to the IHC. The final equation represents the quantity of neurotransmitter in
the reprocessing store within the IHC, which factors in the recycled neurotransmitter from
the synaptic cleft as well as the neurotransmitter that departs for the immediate store:
dw(t)/dt = r·c(t) − x·w(t)    (Eqn. 2.24)
2.2.5 Auditory Nerve
The neurotransmitter release events are used to determine AN spike events. MAP14
models this in two ways, using either a probability model or a quantal model. The quantal model is
a computationally intensive process in which every spike is computed in detail. This is essential
for upstream auditory periphery analysis of the cochlear nucleus and brain stem activities.
However, in order to meet the objective of this project, which is to implement the auditory
pathway up to the auditory nerve spiking, the quantal AN spiking and the brain stem
activity computations are excluded from this thesis. AN spiking based on probability is
discussed in this section. The AN firing probability is proportional to the quantity of neurotransmitter
residing in the synaptic cleft [16] and is given by
p_t = c(t)·dt    (Eqn. 2.25)
where c(t) is the quantity of neurotransmitter in the cleft and dt is the sampling interval.
With an absolute refractory period of 0.75 ms, a new spike is only permissible
0.75 ms after the previous spike occurrence. Equation 2.26 defines the probability that a
spike has occurred within this absolute refractory period:
p_fired = 1 − ∏_{i = t−t_refractory}^{t−1} (1 − p_i)    (Eqn. 2.26)
where ∏_{i = t−t_refractory}^{t−1} (1 − p_i) is the product of the probabilities of the AN not firing in the
absolute refractory period. The probability of firing at the present time t is moderated in
proportion to the probability that no firing occurred during the refractory period
[16]:
p'_t = p_t × (1 − p_fired)    (Eqn. 2.27)
2.4 Summary
Five auditory pathway models have been presented in this chapter. The MAP model
has been selected for real-time implementation primarily because of its capability to simulate
stages in the auditory pathway that adhere more closely to physiological findings than the
other models discussed. The stages of the model to be implemented in real-time and
their nonlinear characteristics have been described in detail. These stages range from the
outer and middle ear to the auditory nerve spiking in the cochlea.
Chapter 3: C Representation of MAP
A real-time system is defined as a system that satisfies response time constraints.
Real-time systems can be divided into three categories: hard, firm and soft. A
hard real-time system must satisfy the response time constraints, as failure to meet these
constraints results in absolute system failure. A firm real-time system allows a few
deadline misses, but anything more will result in complete system
failure. A soft real-time system allows deadline misses, although accumulated
deadline misses lead to functional degradation of the system [40].
The Microsoft Windows operating system (OS) is not designed as a real-time operating
system [41] [42]. A real-time program requires predictable responses from dependent
peripherals such as input and output (I/O) channels, and Windows does not respond in a
rigidly predictable manner. Hence, hard real-time systems are not feasible on
Windows, and the success of firm real-time systems running on Windows depends on the
deadline requirements, which vary for every system. However, Windows is able to
accommodate soft real-time systems, as intermittent deadline misses are tolerable [42].
Therefore, a real-time implementation of the Matlab Auditory Periphery (MAP) on the
Windows OS qualifies as a soft real-time system.
A real-time version of the MAP model must process auditory input signals of varying
durations. Since the end time of the input signal is not known in advance, computing a
single window frame of discrete audio samples spanning the entire duration of the input
signal is impractical. While the computation time for small window frames is insignificant,
computing a single window frame spanning a long input signal results in a delayed
response from the auditory periphery (AP) computer model. This delay varies with the type
of computer platform used and the length of the window frame.
A more practical way of reducing the response delay of the AP model is to process the
discrete audio samples in smaller window frames. Each window frame of audio samples is
delivered to the algorithm for computation in a synchronous, regularly time-triggered event.
Upon the availability of a window frame of data, the computer starts processing it while the
subsequent window frame of audio samples is streamed and stored in a separate buffer
segment, to be released once the computation of the current window frame is completed.
Through multi-tasking of this nature, real-time responses of the AP computer model are
achievable.
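The buffering scheme described above can be sketched as a ping-pong (double) buffer in C. This is an illustrative sketch only, not the RTAP code: the names on_audio_frame and process_window_frame are hypothetical, and in the real system the processing would run concurrently with the driver filling the other buffer.

```c
#include <string.h>

#define FRAME_SIZE 1280   /* samples per window frame (the default used in this chapter) */

static float buffer[2][FRAME_SIZE];   /* ping-pong pair: one filled, one processed */

static void process_window_frame(const float *frame)
{
    /* Placeholder for the AP model stages (OME -> BM -> IHC -> NRR -> ANSP). */
    (void)frame;
}

/* Hypothetical driver callback, invoked once per window frame when
 * FRAME_SIZE new samples are available. */
static void on_audio_frame(const float *samples, int *fill_idx)
{
    memcpy(buffer[*fill_idx], samples, sizeof(float) * FRAME_SIZE);
    /* In the real-time system this processing overlaps with the driver
     * filling the other buffer; here the two steps are shown sequentially. */
    process_window_frame(buffer[*fill_idx]);
    *fill_idx = 1 - *fill_idx;   /* swap the roles of the two buffers */
}
```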
At the start of this project, it was hypothesised that the MAP algorithm had to be
ported to a low-level programming language such as C, as C incurs less computing
overhead and can therefore compute identical responses to the MAP model in a shorter
time. We called our version of the MAP model on a C platform that runs from the Microsoft
Windows command prompt RTAP-numerical. It is a transitional software program that was
developed to accommodate and to ensure the operational integrity of the MAP algorithms
after translation to the C programming language.
RTAP-numerical is required to process a large block of input data samples by
breaking it down into multiple window frames, similar to the MAP model, so as to facilitate
the real-time computation described in chapter 4. The MAP model processes a 10 ms
window frame, whereas RTAP-numerical handles a 58 ms window frame (1280 samples at
a sampling frequency of 22.05 kHz). Though MAP version 1-14 handles multiple window
frames, its memory management in the Matlab environment is an obscure process [43].
Hence, this chapter presents the results of an investigation of memory utilisation based on
multiple window frames implemented in RTAP-numerical. Another essential task covered in
this chapter is the porting of the Matlab command functions to RTAP-numerical.
3.1 Buffer Management and Program Structure
3.1.1 Buffer Structure
Memory allocation in RTAP-numerical is dependent on the specific stage of interest
in the AP model that the response is set for. In every stage, the common memory allocation
structure caters for input and output (IO) parameters as well as coefficients. Table 3.1
summarises the utilisation of buffers in RTAP-numerical. A window frame size refers to the
audio block size acquired from an audio library in a cyclical period and similarly, it also refers
to the audio data block size that is processed by algorithms in the AP model. In order to
synchronise the aforementioned incoming and outgoing data to and from the audio buffer,
feedback and control parameters are required as part of buffer critical section protection.
Should the outgoing data window frame size for computation vary, redundancy increases as
additional variables are required to coordinate data transfer to and from the audio buffer.
This redundancy is eradicated by allowing the size of the outgoing window frame to be
identical to the incoming window frame size. Therefore, a constant of 1280 samples is
selected as the default size of one window frame for both incoming and outgoing data
blocks. It is based fundamentally on the incoming data block size at the maximum sampling
frequency of 22.05 kHz supported by the audio library in the real-time model covered in
chapter 4.
The number of auditory nerve (AN) channels is the product of the number of best
frequency (BF) channels and the number of AN fibre types used. At the start of RTAP-
numerical, the three aforementioned parameters are set and the buffers are allocated and
subsequently, the algorithms are computed. During the runtime of RTAP-numerical, these
buffers are reused at the instance of computation in every window frame regardless of tasks
execution in the AP model.
Tasks                                          Buffer Size
Outer and middle ear (OME)                     1280
Basilar membrane (BM)                          Number of BF channels * 1280
Inner hair cell (IHC)                          Number of BF channels * 1280
Neurotransmitter release rate (NRR)            Number of AN channels * 1280
Auditory Nerve Spiking Probability (ANSP)      Number of AN channels * 1280
Table 3.1: Memory allocation for IO parameters and algorithm coefficients.
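The allocations in table 3.1 can be sketched as follows. This is an illustrative sketch, not the actual RTAP-numerical code: the structure and function names (APBuffers, ap_buffers_alloc) are hypothetical, and double is assumed as the sample type.

```c
#include <stdlib.h>

#define FRAME_SIZE 1280   /* default window frame size in samples */

/* Hypothetical container for the stage buffers of table 3.1. */
typedef struct {
    double *ome;    /* OME:  FRAME_SIZE                 */
    double *bm;     /* BM:   numBF * FRAME_SIZE         */
    double *ihc;    /* IHC:  numBF * FRAME_SIZE         */
    double *nrr;    /* NRR:  numAN * FRAME_SIZE         */
    double *ansp;   /* ANSP: numAN * FRAME_SIZE         */
} APBuffers;

/* Allocate the buffers once at start-up; they are reused every window frame. */
int ap_buffers_alloc(APBuffers *b, int numBF, int numFibreTypes)
{
    /* AN channels = BF channels x AN fibre types, as described in the text. */
    int numAN = numBF * numFibreTypes;
    b->ome  = malloc(sizeof(double) * FRAME_SIZE);
    b->bm   = malloc(sizeof(double) * (size_t)numBF * FRAME_SIZE);
    b->ihc  = malloc(sizeof(double) * (size_t)numBF * FRAME_SIZE);
    b->nrr  = malloc(sizeof(double) * (size_t)numAN * FRAME_SIZE);
    b->ansp = malloc(sizeof(double) * (size_t)numAN * FRAME_SIZE);
    return (b->ome && b->bm && b->ihc && b->nrr && b->ansp) ? 0 : -1;
}
```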
3.1.2 Algorithm Structure
The structure of the functions in RTAP-numerical is designed to make as few function
calls as possible from the BM to ANSP stages. As an illustration, let each task in table 3.1
represent a function call in C, and assume that these function calls are made from a main
function within RTAP-numerical. Upon the availability of one window frame of sampled
audio data, the OME function is called once. The number of invocations of the BM
algorithm function, however, depends on the number of BF channels. With the IHC
algorithm involved, the number of function calls is twice the number of BF channels, which
includes the BM algorithm and excludes the OME invocation. The NRR and ANSP
algorithms introduce a maximum of four and six times the number of BF channel
invocations respectively, based on a maximum of two AN fibre types. A large number of
BF channels would therefore add to the processing time in terms of the number of function
calls should this setup be used. In order to alleviate this inefficiency, the functions are
instead implemented categorically, based on a specific response of interest. Table 3.2
summarises the implementation of functions in RTAP-numerical.
Tasks      DRNL            DRNL-to-IHCRP    DRNL-to-NRR               DRNL-to-ANSP
BM         ✓               ✓                ✓                         ✓
IHC                        ✓                ✓                         ✓
NRR                                         ✓                         ✓
ANSP                                                                  ✓
Function   BM              IHC receptor     Neurotransmitter          AN spiking
Output     displacement    potential        vesicle release rate      probability
Table 3.2: Algorithm functions in RTAP-numerical.
Each of the four functions in table 3.2, when invoked, computes the respective
algorithms assigned to it, denoted by a tick symbol. Each function needs to be invoked
only once per window frame and computes its responses for all BF channels. It outputs
only one type of response, as indicated by the 'Function Output' row of table 3.2. By
restricting data availability to one buffer type per function, code complexity in terms of
shared memory protection is minimised. This function structure also adds a dimension of
algorithm selectivity, which is advantageous in a graphical user interface (GUI) based
real-time implementation, as covered in chapter 4.
3.1.3 Program Structure
The pseudocode in listing 3.1 defines the general flow of RTAP-numerical. The
upcoming sections describe the capability of RTAP-numerical to manage blocks of
segmented windows from the OME to the ANSP stage. The flowcharts used in describing
these algorithms accommodate only one BF channel. This is done to ensure that the
responses of these stages comprehensively match the responses of the MAP model. For
multiple BF channels, these flowcharts can be implemented and diagrammatically projected
in parallel, though they are not presented in this thesis. Multiple BF channel algorithms,
which are implemented with for loops in C and C++, will have their responses
demonstrated in the next chapter.
// Generate the 500 Hz sine tone sample-by-sample
for (t = StartSampleData; t < (WindowFrameSizePerBF + StartSampleData); t++)
{
    x[t] = sin(2 * PI * ToneFreq[i] * t / fs) * PeakAmplitude;
}
// Offset to the next segment within the buffer
StartSampleData += WindowFrameSizePerBF;

// OME functions
ExternalEarResonances ( pEar );
TMdisplacement ( pEar );
StapesInertia ( pEar );

// Use pre-processor directives to select an algorithm function to run
#if MEASURE_DRNL_HIRESTIMING
    DRNL ( pEar );
#elif MEASURE_DRNL2IHCRP_HIRESTIMING
    DRNL_to_IHC_RP ( pEar );
#elif MEASURE_DRNL2IHCPRESYNAPSE_HIRESTIMING
    DRNL_to_NRR ( pEar );
#elif MEASURE_DRNL2AN_HIRESTIMING
    DRNL_to_ANSP ( pEar );
#endif
Listing 3.1: RTAP-numerical program structure.
3.2 Parameters Setup
The setup of MAP and RTAP-numerical for data acquisition is presented in table 3.3
while the settings of parameters used for the algorithms are projected in table A.1. For the
purpose of comparing responses between MAP and RTAP-numerical, a 500 Hz pure sine
tone input signal is used. One contrast in the settings of MAP and RTAP-numerical is the
number of window frames hosting the input stimulus. In MAP, the sine tone input is streamed
from a single window whereas RTAP-numerical relies on five equal sized segmented
windows. The five windows in RTAP-numerical are streamed into the algorithm functions
one after the other and serve as an indication of sampled audio data availability of unfixed
time duration. All results obtained from MAP and RTAP-numerical are stored in a numerical
format in a text file. Graphical illustrations of these results are done offline in a spread sheet
and are included within this chapter in the different sections below.
Settings                      MAP                RTAP-numerical
Stimulus frequency            500 Hz sine tone   500 Hz sine tone
Stimulus level                50 dB SPL          50 dB SPL
Number of window frames       1                  5
Size of window frame          220                44
Sampling rate                 22050 Hz           22050 Hz
Duration of signal acquired   10 ms              10 ms
Response signal BF            250 Hz             250 Hz
Number of BF channels         1                  1
Number of AN channels         2                  2
Table 3.3: Input settings of MAP and RTAP-numerical.
3.3 IIR Filter
3.3.1 Background
MAP uses a built-in filter function command in the Matlab environment. This
command function is implemented as an infinite impulse response (IIR) filter and is utilised in
the computations of external ear response, tympanic membrane and stapes displacements
of the OME stage as well as gammatone filter and IHC cilia displacement that are part of BM
displacement and IHCRP responses respectively. In the Matlab environment, the filter
command function is implemented in the form of a direct form 2 transposed filter type as
illustrated in figure 3.1 [44]. Equation 3.1 describes the mathematical characteristic of the
filter command function.
y[m] = ( b[0]∗x[m] + b[1]∗x[m-1] + b[2]∗x[m-2] + ... + b[nb]∗x[m-nb]
         - a[1]∗y[m-1] - a[2]∗y[m-2] - ... - a[na]∗y[m-na] ) / a[0] (Eqn. 3.1)
where nb is the feedforward (numerator) filter order;
na is the feedback (denominator) filter order.
The filter possesses several delay nodes, denoted by z-1, that can be treated as
boundaries when the various stages of the filter are analysed. Input data, x[m], is streamed
into all the coefficient nodes denoted by b[0], b[1], b[2] up to b[nb]. Assuming that all delay
registers are initialised to zero, the initial output, y[0], depends only on the first data
sample, x[0], scaled by b[0]. At the arrival of the second input data sample, x[1], the b[1]
and -a[1] coefficients along with the b[0] coefficient are required to generate y[1]. Hence,
the output y[1] is the result of the delayed manipulation of x[0] at the b[1] branch and y[0]
at the -a[1] branch, added to x[1] scaled by b[0]. In this manner, the mth output sample,
y[m], depends on the inputs x[m] to x[m-nb] and the outputs y[m-1] to y[m-na], scaled by
the coefficients b[0] to b[nb] and -a[1] to -a[na] respectively.
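The difference equation above can be written compactly in C. This is a minimal reference sketch of equation 3.1, not the RTAP-numerical implementation (which is developed in listing 3.2); the function name and signature are illustrative only.

```c
/* Minimal direct-form IIR filter implementing equation 3.1.
 * b[0..nb] are the feedforward coefficients, a[0..na] the feedback
 * coefficients, x the input and y the output, each of length len. */
void iir_filter(const double *b, int nb,
                const double *a, int na,
                const double *x, double *y, int len)
{
    for (int m = 0; m < len; m++) {
        double acc = 0.0;
        for (int j = 0; j <= nb && j <= m; j++)   /* sum of b[j] * x[m-j]   */
            acc += b[j] * x[m - j];
        for (int j = 1; j <= na && j <= m; j++)   /* minus a[j] * y[m-j]    */
            acc -= a[j] * y[m - j];
        y[m] = acc / a[0];                        /* scale by a[0]          */
    }
}
```

With an impulse input and a single feedback coefficient a[1] = -0.5, the output decays geometrically (1, 0.5, 0.25, ...), matching the behaviour described in the text.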
Figure 3.1: Direct form type 2 IIR filter implemented by Matlab filter command.
3.3.2 Implementation
The characteristic of the IIR filter is defined by a difference equation, and the
coefficients b and a form the numerator and denominator of the equation. As such, the
computation can be segregated into two phases based on the numerator and denominator
orders, and further broken down into four parts. The first part covers the initial phase,
where the number of input data samples, x, streamed into the filter is smaller than the
larger of the numerator and denominator orders. The second part covers the case where
the number of samples streamed into the filter is equal to or greater than the larger of the
two orders. The third and fourth parts have exactly the same properties as the first two
parts, with the exception that the input data, x, is replaced by the output data, y. The
breakdown of the IIR filter computation into four parts offers a prudent manner of
debugging the code to attain a response matching that of the filter command function in
the Matlab environment.
Figure 3.2 elaborates further on the expected IIR filter implementation in
RTAP-numerical based on the four parts described in the previous paragraph. In part (a) of
figure 3.2, the initial output value, y[0], is computed with only the b[0] and a[0] coefficients.
The subsequent output, y[1], in figure 3.2 (b) is calculated with the coefficients b[1] and
a[1] included in the computation along with b[0] and a[0], after a delay following the
initial computation of y[0]. Similarly, in figure 3.2 (c) the coefficients b[2] and a[2] are
included along with b[0], a[0], b[1] and a[1] to attain y[2]. When all the coefficients are
used in the computation of the output, as in figure 3.2 (d), the filter is in the second phase
of computation. The pseudocode depicting the general IIR filter characteristic is shown in
listing 3.2.
Figure 3.2: Induction of numerator and denominator coefficients at the initial phase of input
data samples streamed into the IIR filter algorithm (panels (a) to (d) show the computations
at m = 0, 1, 2 and the general mth sample respectively).
// (I) Initial input data stream computation for first time frame block
// 1) Calculate output, y[0]. The for loop below will take care of y[1] onwards
y[0] = (b[0] * x[0] / a[0]) + zi[0];
for (i = 1; i < FirstLoopCap; i++)
{
    y[i] = 0.0; // initialise y[i] to 0 first
    // 2) Numerator order increases for every iteration of the for loop above
    if (i < NumeratorOrder) // use the increment, i, to set the for loop boundary based on b coefficients
        temp = i + 1;
    else // if i has increased to length(b) or more, cap the upcoming for loop
         // boundary to the numerator order
        temp = NumeratorOrder;
    // 3) Output, y[i], computation based only on the numerator coefficients, b
    for (j = 0; j < temp; j++)
        y[i] = y[i] + (b[j] * x[i-j]);
    // 4) Denominator order increases for every iteration of the for loop above
    if (i < DenominatorOrder)
        temp = i;
    else
        temp = DenominatorOrder - 1;
    // 5) Follow up on the y[i] computation based only on the denominator coefficients, a
    for (j = 0; j < temp; j++)
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
    // 6) Account for the final denominator coefficient, a[0], before final output
    y[i] = y[i] / a[0];
}
// (II) End of initial part; the second phase handles the rest of the input stream
for (i = FirstLoopCap; i < WindowFrameSizePerBF; i++)
{
    y[i] = 0.0f;
    // 7) Compute y[i] with respect to all the numerator coefficients
    for (j = 0; j < NumeratorOrder; j++)
        y[i] = y[i] + (b[j] * x[i-j]);
    // 8) Compute y[i] with respect to all the denominator coefficients
    for (j = 0; j < (DenominatorOrder-1); j++)
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
    // 9) Ensure y[i] is scaled by a[0] at the end of the computation
    y[i] = y[i] / a[0];
}
Listing 3.2: IIR filter.
3.4 Outer and Middle Ear
The IIR filter discussed in the previous section is able to process a window frame of
data for one BF channel. In real-time processing, multiple window frames of data must be
streamed into the IIR filter sequentially. As the computation of a typical output response
with an IIR filter requires past inputs and outputs, RTAP-numerical must be able to handle
the transition between subsequent window frames. This means that RTAP-numerical must
store the most recent input and output data just before the end of a window frame and
load these parameters when computing the output response of the subsequent window
frame. In this way, though the signals are broken down and segmented, the processed
data from multiple window frames, when integrated, generate a continuous output signal.
The setup of the IIR filter in section 3.3 is insufficient to compute the transition
between window frames of data. Hence, the IIR filter is retrofitted with additional code to
save the input and output parameters at the conclusion of a window frame and load these
parameters at the start of the subsequent frame. The pseudocode for the load and save
features within the IIR filter, along with explanatory comments, is shown in listing 3.3. The
external ear resonance (EER), tympanic membrane (TM) and stapes displacement
computations in the outer and middle ear (OME) stage, as well as the inner hair cell (IHC)
cilia displacement computation, utilise the parameter save and load feature in listing 3.3 to
achieve continuity in responses between adjacent window frames.
// (I) Parameter save feature in IIR filter within preceding time frame.
// 1) Save (NumeratorOrder - 1) input parameters, x, towards the end of the time frame.
for (i = 0; i < (NumeratorOrder-1); i++)
{
    // 1a) Save in a global array, prevX[], for use later.
    prevX[i] = x[WindowFrameSizePerBF - 1 - i];
}
// 2) Save (DenominatorOrder - 1) output parameters, y, towards the end of the time frame.
for (i = 0; i < (DenominatorOrder-1); i++)
{
    // 2a) Save in a global array, prevY[], for use later.
    prevY[i] = y[WindowFrameSizePerBF - 1 - i];
}
// -----------------------------------------------------------------------------------
// (II) Parameter load feature in IIR filter within subsequent time frame.
// 3) This segment is similar to part (I) in listing 3.2 except that an additional if
//    condition is added to switch between these 2 segments.
for (i = 0; i < FirstLoopCap; i++)
{
    y[i] = 0.0f;
    // 4) Compute y[i] with numerator coefficients and past input parameters from the
    //    current time frame.
    for (j = 0; j < NumeratorCap1; j++)
    {
        y[i] = y[i] + (b[j] * x[i-j]);
        CarryForward = j;
    }
    // 5) Increase the cap for the number of coefficients and past input parameters
    if (NumeratorCap1 < NumeratorOrder)
        NumeratorCap1++;
    // 6) Compute y[i] by accounting for past input parameters from the preceding time frame.
    for (j = 0; j < NumeratorCap2; j++)
    {
        // Part (6a)
        y[i] = y[i] + (b[CarryForward+j+1] * prevX[j]);
        // Reduce the dependency on the past input parameters from the preceding time frame.
        NumeratorCap2--;
    }
    // 7) Compute y[i] with denominator coefficients and past output parameters from the
    //    current time frame.
    for (j = 0; j < DenominatorCap1; j++)
    {
        y[i] = y[i] - (a[j+1] * y[i-j-1]);
        CarryForward = j + 1;
    }
    // 8) Increase the cap for the number of coefficients and past output parameters
    if (DenominatorCap1 < DenominatorOrder-1)
        DenominatorCap1++;
    // 9) Compute y[i] by accounting for past output parameters from the preceding time frame.
    for (j = 0; j < DenominatorCap2; j++)
    {
        // Part (9a)
        y[i] = y[i] - (a[j+CarryForward+1] * prevY[j]);
        // Reduce the dependency on the past output parameters from the preceding time frame.
        DenominatorCap2--;
    }
}
Listing 3.3: Input and output parameters save and load feature in the IIR filter.
The OME stage consists of a serial cascade of three 1st-order filters. In
RTAP-numerical, each of these three filters invokes the IIR filter function, DFT2filter,
described in section 3.3, along with the window continuity features from listing 3.3, to
process a stream of auditory stimulus. The response of the third filter is then scaled to
produce the stapes displacement in metres. This scaling is performed in a for loop. Within
the same loop, the dual resonance nonlinear (DRNL) algorithm buffers for the linear and
nonlinear pathways are assigned an initial value in preparation for invoking the DRNL
function for the basilar membrane (BM) displacement computations. The initialisation of
parameters in the DRNL buffers is done within the OME function so as to reduce the
processing time incurred in the DRNL function. Since a loop is required to initialise the
buffers as well as compute the stapes displacement over the audio data block of a window
frame, it is computationally beneficial to place these tasks in one loop instead of two
separate loops in OME and DRNL respectively. Figure 3.3 illustrates the OME structure
for processing sampled audio data from three window frames.
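The serial cascade above can be sketched as three 1st-order sections applied back to back, each carrying its own prevX/prevY state across window frames. This is an illustrative sketch only: the structure Biquad1 and the functions run_stage and ome_cascade are hypothetical stand-ins, the coefficients are placeholders, and a small frame size is used for clarity.

```c
#define FRAME 8   /* small frame for illustration; RTAP-numerical uses 1280 */

/* Hypothetical 1st-order IIR section with carried state (prevX, prevY),
 * standing in for the BPF/LPF/HPF stages of the OME cascade. */
typedef struct { double b0, b1, a1; double prevX, prevY; } Biquad1;

static void run_stage(Biquad1 *f, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++) {
        y[i] = f->b0 * x[i] + f->b1 * f->prevX - f->a1 * f->prevY;
        f->prevX = x[i];   /* state carried into the next call (next frame) */
        f->prevY = y[i];
    }
}

/* Cascade of three 1st-order stages, as in the OME, followed by a final
 * scaling of the third response to stapes displacement in metres. */
static void ome_cascade(Biquad1 s[3], const double *in, double *out,
                        double *tmp, double scalar, int n)
{
    run_stage(&s[0], in, tmp, n);     /* external ear resonance  */
    run_stage(&s[1], tmp, out, n);    /* tympanic membrane       */
    run_stage(&s[2], out, tmp, n);    /* stapes inertia          */
    for (int i = 0; i < n; i++)
        out[i] = tmp[i] * scalar;     /* stapes displacement (m) */
}
```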
Figure 3.3: OME processing of multiple window frames.
The blue rings on the top and bottom ends of the filter blocks in window frames 2 and
3 represent the reliance on past inputs and outputs from preceding window frames. Hence,
the computation of the initial outputs in window frame 2 relies on the inputs and outputs of
window frame 1, and window frame 3 subsequently relies on the saved parameters of
window frame 2. Though the illustration ends at window frame 3, in the actual real-time
implementation the computation continues in the same way, storing the past input and
output parameters of the preceding window frame and loading them into the subsequent
window frame for the output response computation. Figure 3.4 displays the stapes
displacement generated in MAP as well as in RTAP-numerical with the auditory stimulus
settings given in table 3.3. The normalised root mean squared (RMS) error between the
MAP and RTAP-numerical generated stapes displacements stands at 4.8e-7, as computed
from equation 3.9. Section 3.9
provides a detailed description of the computation of the normalised RMS error for all stages
of the AP model.
Figure 3.4: Stapes displacement response in MAP and RTAP-numerical.
3.5 Basilar Membrane
The best frequency (BF) sites in MAP are calculated using the logspace command
function in the Matlab environment. Along with the minimum BF, BFmin, and the maximum
BF, BFmax, the number of BF channels must be specified when invoking logspace in
Matlab. In order to compute the logarithmic spacing of all of the BF components within the
range of BFmin and BFmax, it is essential to translate the logspace command function from
MAP to RTAP-numerical. The calculation of the BF components by the logspace command
function is given by equation 3.2 [45].
BFi = 10^( log10(BFmin) + (i-1) ∗ step ) (Eqn. 3.2)
where BFi is the logarithmically spaced BF component in the range of BFmin to BFmax;
i is the index of frequencies ranging from 1 to the number of BF channels;
step is a logarithmic increment from BFmin to BFmax defined by equation 3.3.
step = ( log10(BFmax) - log10(BFmin) ) / (n - 1) (Eqn. 3.3)
where n is the number of BF channels required in the computation. The BF components
are computed using the logspace function before the DRNL algorithm is processed. Upon
the invocation of the DRNL function in RTAP-numerical, the DRNL response computation
is done for every BF channel within a loop that spans the number of BF channels. The
pseudocode listing for the DRNL computation is as follows:
// 1) Logspace computation for BF components at the start, before DRNL() is called
RTAPlogspace ( BFlist, MinBF, MaxBF, NumBFchannels );
...
// 2) Compute DRNL response for the parallel number of BF channels
for (i = 0; i < NumBFchannels; i++)
{
    // 3) Compute gammatone response for the linear pathway
    Gammatone ( &Linear_Gammatone[i], Linear_Gammatone_Order );
    // 4) Compute the first pass of the gammatone filter in the nonlinear pathway
    Gammatone ( &Nonlinear_Gammatone[i], Nonlinear_Gammatone_Order );
    // 5) Compute the nonlinear compression based on the stimulus input level (dB SPL)
    DRNL_brokenstick_nl ( DRNL_nonlin_Input[i], DRNL_nonlin_Input[i] );
    // 6) Compute the second pass of the gammatone filter in the nonlinear pathway
    Gammatone ( &Nonlinear_Gammatone[i], Nonlinear_Gammatone_Order );
    // 7) Compute the DRNL response by summing the linear and nonlinear pathway responses
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        DRNL_response[i][j] = DRNL_linear_Output[i][j] + DRNL_nonlinear_Output[i][j];
    }
}
Listing 3.4: DRNL computation.
The linear path of the dual resonance nonlinear (DRNL) filter consists of one
3rd-order gammatone filter, while the nonlinear path contains two 3rd-order gammatone
filters with a nonlinear function cascaded in the middle. A 3rd-order gammatone filter
implemented in MAP involves three passes of the filter command function in the Matlab
environment. A similar arrangement is necessary to attain a 3rd-order gammatone filter in
RTAP-numerical. The major difference between MAP and RTAP-numerical is the
utilisation of buffers. In RTAP-numerical, all buffers are explicitly allocated and, to achieve
a 3rd-order IIR filter effect, the input and output buffers of the gammatone filter are
swapped at every repetition of a loop defined by the filter order. Figure 3.5 illustrates the
input and output buffer swaps within the gammatone filter function in C.
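The buffer-swapping scheme can be sketched as a pointer exchange between passes. This is illustrative only: first_order_pass is a placeholder for one pass of the 1st-order IIR section, and the function names are not from RTAP-numerical.

```c
/* Placeholder for one 1st-order IIR filter pass (here a simple halving,
 * so the effect of repeated passes is easy to follow). */
static void first_order_pass(const double *in, double *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 0.5 * in[i];
}

/* Sketch of figure 3.5: a higher-order gammatone filter realised as
 * repeated passes of a 1st-order section, exchanging the roles of the
 * two buffers between passes instead of allocating one per pass. */
static void gammatone(double *bufA, double *bufB, int n, int order)
{
    double *in = bufA, *out = bufB;
    for (int pass = 0; pass < order; pass++) {
        first_order_pass(in, out, n);
        double *t = in; in = out; out = t;   /* swap input/output buffers */
    }
    /* After an odd number of passes the result sits in bufB,
     * after an even number in bufA. */
}
```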
Figure 3.5: 3rd-order gammatone filter implementation in RTAP-numerical.
Unlike the 1st-order filters within the OME and IHC stages, where only one set of
input and output parameters requires buffering, the DRNL in real time requires multiple
sub-stage buffers, due largely to the number of filter passes required to generate the
basilar membrane (BM) displacement response. The setup of figure 3.5 ensures that only
a bare minimum of two buffer types is required to facilitate high filter orders, without
relying on additional buffers to store intermediate filter responses for each filter algorithm
pass. Figure 3.6 illustrates the structure of the DRNL filter in computing the BM response
of the first three window frames. The computation of any window frame beyond the third
follows the same setup as either window frame 2 or 3.
Figure 3.6: DRNL filter processing for 1 BF channel in multiple window frames.
Nodes ‘G1’, ‘G2’ and ‘G3’ in Figure 3.6 represent a 3rd-order gammatone filter in both
the linear and nonlinear pathway. Nodes ‘G4’, ‘G5’ and ‘G6’ represent a 3rd-order
gammatone filter in only the nonlinear pathway. Node ‘NL’ characterises the nonlinear
compressive function in the DRNL, which is computed based on the auditory stimulus level
setting. The use of additional buffers to store past inputs and outputs from the preceding
window frame is evident from the illustration. In the case of the gammatone filters denoted
by 'G1', 'G2' and 'G3' in both pathways, the past input and output buffers are differentiated
based on the linear and nonlinear pathways. Hence, two sets of identical buffers, from
prevS up to prevG3Y, exist for the linear and nonlinear pathways. Only one instance of
each buffer is allocated from prevG4X onwards in the nonlinear pathway, due to the
absence of a second 3rd-order gammatone filter in the linear pathway. Since the nonlinear
compressive algorithm in MAP is a memoryless implementation, it does not depend on any
past parameters; hence no buffers are allocated for it.
Figure 3.7 presents the DRNL linear, nonlinear and summed responses as well as
the IHC cilia displacement in MAP and RTAP-numerical, based on the stimulus settings
given in table 3.3. The linear and nonlinear DRNL responses are shown as references to
ensure that the DRNL summed response is computed without flaws. The normalised RMS
error of the BM displacement, represented by the DRNL summed responses of MAP and
RTAP-numerical, is 5.4e-6. The IHC cilia displacement is included in this section because
its amplitude is measured in the same units as the BM displacement, and it also provides
an indication of the integrity of the input signal to the IHC receptor potential algorithm.
Figure 3.7: BM and IHC cilia displacement responses generated by MAP and RTAP-numerical.
3.6 Inner Hair Cell Receptor Potential
The inner hair cell (IHC) stage starts with the computation of the IHC cilia
displacement using a 1st-order high-pass filter. The IIR filter is utilised, and its input is the
BM displacement response discussed in section 3.5, computed for the various BF
channels using the parallel DRNL structure. After the IHC cilia displacement is computed,
a loop is entered in which the apical conductance, G(u), and the IHC receptor potential
(IHCRP) are computed using equations 2.13 and 2.15. Figure 3.8 displays the computation
of the IHCRP response for the first three time frames upon the start of RTAP-numerical.
Figure 3.8: IHCRP algorithm processing for 1 BF channel in multiple window frames.
The computation of the very first IHCRP in the first window frame requires an initial
IHCRP value. This initial value is essentially the resting potential of the IHC within the BF
channel, defined by equation 3.4.
Vinitial = ( Gk ∗ Ek′ + G(u)0 ∗ Et ) / ( Gk + G(u)0 ) (Eqn. 3.4)
where Ek′ and G(u)0 are defined in equations 2.16 and 2.13 respectively. All computations
apart from the first IHCRP computation in the first window frame use the immediate past
IHCRP value. This is denoted by the feedback of the IHCRP to the IHCRP computation
node. At the end of a window frame, the last IHCRP value computed is fed forward to the
subsequent window frame as its initial value, after a 1-sample delay signified by a z-1
node. A loop ensures that the IHCRP computations are performed over the size of a
window frame for every BF channel before exiting. Figure 3.9 displays the IHCRP for
MAP and RTAP-numerical for the input stimulus settings given in table 3.3. The
normalised RMS error between the two signals in figure 3.9 is 5.1e-6.
Figure 3.9: IHC receptor potential response generated by MAP and RTAP-numerical.
3.7 Neurotransmitter Release Rate
The neurotransmitter release rate (NRR) stage introduces the low spontaneous rate
(LSR) and high spontaneous rate (HSR) AN fibres. An HSR AN fibre is more responsive
during the linear phase of the BM response than an LSR AN fibre [46]. The LSR AN fibre,
in contrast, is responsive during the compressive phase of the BM response [47]. This means that a change in
the auditory stimulus level results in subdued BM vibrational motion due to the
compressive effects, which in turn reduces the firing rate response of the LSR AN
fibre [48]. The LSR and HSR fibres make up two AN channels for each unique BF channel.
Figure 3.10: NRR algorithm processing for 1 AN channel in multiple window frames.
The NRR algorithm is fed directly with the data output from the IHCRP stage within the
same recursive loop described in section 3.6. The recursive loop takes the LSR
and HSR AN fibre types into account as well. In the first window frame, the first steady-state
fraction of open calcium channels, mICa,inf, is computed using equation 2.19 with Vinitial
calculated from equation 3.4. From mICa,inf and the calcium current, ICa0, the initial calcium
concentration, [Ca2+]0, is computed. During the runtime of RTAP-numerical, mICa and the
Ca2+ concentration are updated using their respective immediate past parameters, as denoted
by the feedback pathways in figure 3.10. The z-1 delay nodes represent the carry-over of the
same feedback parameters from one window frame to the next. Figures 3.11 and 3.12 demonstrate
the responses from MAP and RTAP-numerical for the LSR and HSR AN fibre types computed
based on the input parameters in table 3.1. The normalised RMS errors for the LSR and HSR
plots in figures 3.11 and 3.12 are 8.0e-6 and 7.8e-6 respectively.
Figure 3.11: LSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-numerical.
Figure 3.12: HSR AN fibre neurotransmitter vesicle release rate for MAP and RTAP-numerical.
3.8 Auditory Nerve Spiking Probability
The auditory nerve spiking probability (ANSP) computation is done based on the
number of auditory nerve (AN) channels, which is a product of the number of BF channels
and the number of AN fibre types. The algorithm can be segmented into four phases. The
first phase starts by entering a recursive for loop to compute the neurotransmitter vesicle
replenishment rate from the factory, the ejection rate from the IHC, and the reprocessing,
reuptake and loss rates defined by equations 2.22, 2.23 and 2.24. These parameters are
grouped together because they do not depend on their respective past parameters. The
second computing phase requires the use of immediate past parameters and includes the
computation of the quantity of neurotransmitter vesicles in the immediate and reprocessing
stores within the IHC, as well as in the synaptic cleft, provided by equations 2.22 to 2.24. As
the second phase requires immediate past values, an initial value is needed to compute the
first output data of the first window frame for the HSR and LSR fibre types. This initial
quantity of neurotransmitter vesicles is computed as follows:
"UC�&>t = #0�V�(�+})+#0� (Eqn. 3.5)
"F$<AC<�C�t = "UC�&>0(�+})#0 (Eqn. 3.6)
"}�"E?a�DDt = "UC�&>0}� (Eqn. 3.7)
where pCleft0 is the initial probability of neurotransmitter vesicles in the synaptic cleft;
pAvailable0 is the initial probability of neurotransmitter vesicles in the immediate store within
the IHC;
pReprocess0 is the initial probability of neurotransmitter vesicles in the reprocessing store
within the IHC;
Y is the depleted neurotransmitter vesicle replacement rate in the factory, set as a constant
of 6;
M is the maximum number of neurotransmitter vesicles at the synapse, set as a constant of 12;
L is the neurotransmitter vesicle loss rate in the synaptic cleft, set as a constant of 250;
R is the neurotransmitter vesicle reuptake rate from the synaptic cleft into the IHC, set as a
constant of 500;
X is the neurotransmitter vesicle replenishment rate from the reprocessing store, set as a
constant of 60;
k0 is the initial neurotransmitter vesicle release rate computed with equation 2.21.
Subsequent computation of the second phase variables after the initial computation involves
the use of the immediate past variables, pCleftn-1, pAvailablen-1 and pReprocessn-1.
Figure 3.13: ANSP processing for 1 AN channel in multiple window frames.
The third and fourth phases of the ANSP algorithm compute the probabilities of
AN spiking and of AN not spiking respectively. Because the recurrence of the AN spike rate
depends on the refractory period, the parameters from both the third and fourth phases
require buffering to hold past parameters from preceding window frames. To model the
refractory effect, whereby the likelihood of AN spiking diminishes immediately after a spike,
the number of iterations within the recursive for loop that are overlooked is computed as
follows:
irefractory = trefractory / ts (Eqn. 3.8)
where irefractory is the index accounted for in the for loop that defines the refractory period;
trefractory is the refractory period, defined as a constant of 0.75 ms;
ts is the sampling period of the auditory stimulus, set according to the default sampling
rate of 22.05 kHz. Listing 3.5 characterises the pseudocode for computing the probabilities of
AN spiking and not spiking for multiple window frames. Figures 3.14 and 3.15 illustrate
the probabilities of AN spiking for the LSR and HSR AN fibres respectively, for an input
stimulus defined by the settings in table 3.1. The normalised RMS errors between the MAP
and RTAP-numerical generated responses for the LSR and HSR plots in figures 3.14 and
3.15 stand at 3.1e-5 and 5.6e-6 respectively.
for (i = 0; i < NumBFchannels; i++)
{
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        for (k = 0; k < Number_of_AN_fibre_types; k++)
        {
            // 1) Compute the 1st and 2nd phases of the ANSP algorithm.
            ...

            // 2) Calculate the probability of AN spiking and apply the refractory effect.
            // Condition 2a: initial AN spiking probability computation in the 1st window frame.
            if ((!j) && (First_Window[i][k]))
            {
                // The probability of AN spiking depends on the past probability of
                // AN not spiking; as this is the very first computation, the
                // immediate past parameter is taken as 1.
                pANspiking[k][j] = prob_Ejected[k] / ts;
            }
            // Condition 2b: initial AN spiking probability computation from the
            // 2nd window frame onwards.
            else if ((!j) && (!First_Window[i][k]))
            {
                // The past probability of AN not spiking is held over from the
                // preceding window frame, so it is read from the buffer.
                pANspiking[k][j] = prob_Ejected[k] * Prev_pANnotFiring[i][k] / ts;
            }
            // Condition 2c: all other sample computations within any window frame.
            else
            {
                // Probability of AN spiking based on the neurotransmitter release
                // reflected in the prob_Ejected variable.
                pANspiking[k][j] = prob_Ejected[k] * pANnotFiring[k][j-1] / ts;
            }

            // 3) Update the probability of AN not firing. Recent probabilities are
            // added and distant ones removed so as to reduce the fraction of
            // spiking events during the refractory period.
            // Condition 3a: input index has moved past the refractory period
            // (all window frames).
            if (j > Refractory_index)
            {
                pANnotFiring[k][j] = pANnotFiring[k][j-1] * (1 - pANspiking[k][j] * ts)
                    / (1 - pANspiking[k][j - Refractory_index - 1] * ts);
            }
            // Condition 3b: 1st sample computation of any window frame apart from
            // the 1st window.
            else if ((!First_Window[i][k]) && (!j))
            {
                // Relies on past parameters held over from the preceding window frame.
                pANnotFiring[k][j] = Prev_pANnotFiring[i][k] * (1 - pANspiking[k][j] * ts)
                    / (1 - Prev_pANspiking[i][k][j] * ts);
            }
            // Condition 3c: computation within the refractory period for any window
            // frame apart from the 1st window.
            else if ((!First_Window[i][k]) && (j > 0) && (j <= Refractory_index))
            {
                pANnotFiring[k][j] = pANnotFiring[k][j-1] * (1 - pANspiking[k][j] * ts)
                    / (1 - Prev_pANspiking[i][k][j] * ts);
            }

            // 4) Save past parameters for use in subsequent window frames.
            if (j >= (WindowFrameSizePerBF - Refractory_index - 1))
                Prev_pANspiking[i][k][j - (WindowFrameSizePerBF - Refractory_index - 1)]
                    = pANspiking[k][j];
            if (j == (WindowFrameSizePerBF - 1))
                Prev_pANnotFiring[i][k] = pANnotFiring[k][j];
        }
    }
}
Listing 3.5: AN spiking and non-spiking probabilities for LSR and HSR AN fibre types.
Figure 3.14: Probability of AN spiking in LSR fibre for MAP and RTAP-numerical.
Figure 3.15: Probability of AN spiking in HSR fibre for MAP and RTAP-numerical.
3.9 Characteristic Responses for Various Input Settings
The responses for the various stages of MAP and RTAP-numerical have so far relied on
only one input stimulus setting. However, before integrating the algorithms in RTAP-
numerical into a real-time implementation, it had to be verified that RTAP-numerical is
able to generate the responses for various input settings. These input settings described
here refer largely to the sound pressure level (SPL) of the input stimulus measured in
decibels (dB). Responses of all stages within RTAP-numerical were recorded, from the OME to
the ANSP stages, for SPL sweeps of a sine tone input stimulus ranging from 10 dB SPL to 90 dB
SPL in increments of 10 dB SPL. The input sine tone frequency was also varied over four
discrete levels: 500 Hz, 1000 Hz, 3000 Hz and 5000 Hz. Only for the 500 Hz sine tone was the
BF set to 250 Hz, based on the default settings of MAP. For all other sine tones, the
respective BFs were selected to be close to the sine tone frequencies, calculated from 30
logarithmic intervals between 250 Hz and 6000 Hz. Hence, the BFs for the 1000 Hz, 3000 Hz
and 5000 Hz sine tone inputs are 1039 Hz, 3109 Hz and 5377 Hz respectively. The selection
of the BFs from logarithmic intervals over 30 BF channels was done to ensure that the
translated logspace function in RTAP-numerical portrayed the same behaviour as the
Matlab logspace command.
Figure 3.16: Normalised RMS errors between MAP and RTAP-numerical for a 500 Hz sine tone input observed from a 250 Hz BF channel.
Figure 3.17: Normalised RMS errors between MAP and RTAP-numerical for a 1000 Hz sine tone input observed from a 1039 Hz BF channel.
Figure 3.18: Normalised RMS errors between MAP and RTAP-numerical for a 3000 Hz sine tone input observed from a 3109 Hz BF channel.
Figure 3.19: Normalised RMS errors between MAP and RTAP-numerical for a 5000 Hz sine tone input observed from a 5377 Hz BF channel.
Figures 3.16 to 3.19 display the normalised RMS errors computed using
equation 3.9. As the stages of the MAP model generate responses of varying orders of
magnitude, a normalised form of the RMS error is presented for all stages of the auditory
pathway algorithms so that all errors can be projected within a single graph.
enorm RMSE = sqrt( (1/N) · Σk=1..N (yk - ŷk)² ) / (ymax - ymin) (Eqn. 3.9)
where enorm RMSE is the normalised RMS error;
yk is the MAP generated data point indexed by k within the algorithm defining a particular
stage of the auditory model;
ŷk is the response data point generated by RTAP-numerical, indexed by k, for the same
algorithm stage within the auditory pathway;
ymax is the maximum output parameter within the MAP generated data set;
ymin is the minimum output parameter within the MAP generated data set;
N is the total number of data points analysed.
All normalised errors are significantly below 1%, which projects the capability of the
translated algorithms in RTAP-numerical to match the responses of the MAP model very
closely. The highest errors are observed for the ANSP LSR, followed by the ANSP HSR
algorithms. However, these larger errors are symptomatic of the propagation of smaller
errors that are scaled at every stage from the OME stage onwards. The use of
single precision computation for all the stages of the AP model led to the
truncation of floating point values, especially in the OME and DRNL stages, which accounts
for the small errors present in these stages. Furthermore, while the magnitudes of the DRNL
responses are in the order of 10^-8, those of the IHCRP are more significant, in the vicinity
of 10^-1. These errors are amplified further at the NRR stages and finally at the ANSP stage,
where the responses lie in the range of 10^-1 to 10^2, giving rise to the highest errors among
all stages as observed in figures 3.16 to 3.19.
3.10 Summary
The algorithms used in MAP from the outer and middle ear (OME) stage to the auditory
nerve spiking probability (ANSP) stage are implemented in C in a program called
RTAP-numerical. This program is able to simulate any of the aforementioned stages of MAP
without running the Matlab scripts. For a sinusoidal input stimulus within the range of 10 dB
SPL to 90 dB SPL, and for frequencies from 500 Hz to 5000 Hz, the responses of
RTAP-numerical match the responses of MAP from the OME stage up to the ANSP stage for
the LSR and HSR fibres. As a result, the algorithms in RTAP-numerical can be integrated
satisfactorily into a graphical user interface with real-time processing capabilities.
Chapter 4: Real-time Auditory Periphery (RTAP)
A program that runs on a computer is defined as a process and a process has at
least one executing thread [49]. A thread is an object within a process whose execution can
be scheduled to run at a discrete time [49] and it also defines the basic unit of CPU
utilisation [50]. Microsoft Windows supports thread priority scheduling in which the attributes
of a process can be prioritised. This allows a thread with a ‘HIGH’ priority to be executed by
the CPU more often than a thread with ‘NORMAL’ priority. The highest priority in Windows is
‘REALTIME’, which indicates that a thread gets the undivided attention of the CPU [42].
Buffered I/O that stores the status of the keyboard and mouse relies on Windows kernel
threads. A CPU intensive real-time thread, which has a higher priority than a kernel thread,
consumes much of the CPU time and prevents buffered I/O from being engaged. Only when the
real-time thread execution is complete is the buffered I/O serviced. Disk I/O threads that
write bulk data to a file while running concurrently with a real-time thread have their
operations interleaved. This indicates that Windows supports time-sharing execution of the
disk I/O thread and the real-time thread [42].
A real-time application in Windows must not fully exploit the CPU for its own intensive
operation. Instead, I/O and disk access must also be accounted for so as to ensure
responsiveness of the mouse, the keyboard and the file saving features. To this end, it is
recommended that a heartbeat timer be implemented in a real-time application with a process
priority lower than 'REALTIME'. The heartbeat timer performs periodic execution of CPU
intensive real-time processing as well as I/O and disk operations and any applicable data
acquisition.
The advantage of using a heartbeat timer is that it allows a real-time thread to predictably
relinquish the CPU for other peripheral-based thread execution [42]. The heartbeat timer is
the central mechanism that synchronises threads of varying functions in real-time auditory
periphery (RTAP), which is a real-time Windows program that wraps a graphical user
interface (GUI) and an audio hybrid library called JUCE around the C translated MAP
algorithms described in chapter 3.
RTAP is able to accommodate multiple best frequency (BF) and auditory nerve (AN)
channels and is developed with single and double precision builds. Only results from single
precision execution are displayed in this thesis, as the double precision responses are
identical. The question remains as to the number of channels RTAP can accommodate when
generating the responses of the different stages within the computer model. This quantity
will vary with the hardware and software platform of the computing system. For this project,
development and testing of the RTAP code occurred on machine 1, though for acquiring the
load profiles of RTAP, machines 1 and 2 were used. The specifications of machines 1 and 2
are displayed in table 4.1.
Machine 1: Asus U45JC notebook; Intel Core i5-460M, 2.53 GHz; 4 GB RAM; 500 GB HDD;
nVidia GeForce 310 with 1 GB VRAM; Microsoft Windows 7 Home Premium 64-bit.
Machine 2: customised desktop; Intel Pentium Dual-Core E6500, 2.93 GHz; 4 GB RAM;
500 GB HDD; nVidia GeForce 8400GS with 512 MB VRAM; Microsoft Windows 7 Home
Premium 64-bit.
Table 4.1: Computing system platforms used for RTAP development and testing.
This chapter aims to describe most of the features of RTAP, excluding processed
data display, mathematical optimisation and load profiles, which will be covered in chapters 5
and 6. The user interfaces of RTAP are introduced in the next section, followed by the
process priority scheme used by RTAP. The general structure of RTAP is covered thereafter,
along with the sine tone generator feature. The utilisation of threading application
programming interfaces (APIs) is then elaborated, followed by a description of the plotting of
multiple signals on a single graph, which is essential when multiple best frequency (BF)
channels are introduced. Finally, data acquisition of processed data is presented with a
focus on window frame continuity of the recorded data as well as offline processing and
formatting.
4.1 User Interface (UI)
The graphical user interface (GUI) design of RTAP was carried out in an informal
manner, largely because the main emphasis of this project was to implement a real-time
model of the AP focusing on algorithm processing, parallel computing through
thread utilisation and graphical representation of the response of the AP model. Although
the GUI is necessary to hold control and feedback parameters and to display plots, its
design took a lower precedence than the abovementioned key features. It is nevertheless
briefly discussed in this section.
RTAP was designed to be launched directly from the Microsoft Windows OS by double
clicking the RTAP executable icon. This provides a rapid manner of starting up the real-
time simulator without launching any other complementary software. The main element of
RTAP is the display window in figure 4.1, where visual representations of the auditory
pathway (AP) model response are projected regardless of graph type and display type.
The majority of the space is dedicated to the plot display because RTAP is expected
to display graphs of AP model responses represented by numerous best frequency (BF)
channels. The height of the graph is proportional to the number of BF channels; hence, a
large number of BF channels implies a vertically expanded graph, which in turn takes up a
large plot display area.
Besides the plot display, adjustments of essential parameters are necessary so that
various degrees of auditory perception can be experimented with. The essential parameters
include the segment of the AP model selected for analysis, the auditory stimulus source, the
AP model response display type and the usage of math optimisation in algorithm processing.
These parameters, along with the statuses of threads and the processing times of
algorithms, provide an overview and basic control of the real-time AP model simulation. Due
to insufficient space on the main user interface (UI) page, additional essential parameters,
such as the sampling frequency at which the AP model operates, the number of BF and AN
channels it accommodates and the sine tone generator control parameters, are spilled over
to the secondary UI page in figure 4.2. Hence, RTAP has two UI pages that host the
majority of its main controls and feedback. Other parameters are held as constants based
on the original settings of the MAP model.
Figure 4.1: RTAP main user interface.
The four clickable buttons in figure 4.1a perform the main controls of the program.
The play button with the triangle icon starts and pauses algorithm processing. The 'Set'
button allocates memory for algorithm processing based on the number of BF channels and
AN fibre types, as well as buffers for recording and image display. It also initiates the
pre-computation of all coefficients and constants required by the algorithms and sets the
runtime process priority of RTAP to a predefined level. The advantage of these
functionalities is to provide user driven memory allocation and pre-computation of
coefficients; in other words, the algorithms in the auditory pathway (AP) model include
parameters that do not depend on the input data, and these parameters can be
pre-computed before the algorithms are executed. The 'Play+Record' button starts
algorithm processing and binary recording of processed data concurrently, while the 'Record'
button allows the recording of processed data into a binary file only once algorithm
processing has been initiated with the play button.
The two combo boxes in figure 4.1b select the algorithm type to run and record while
the one in figure 4.1c selects the priority that the algorithm runs on. Input source selection of
the audio stream, between a hardware microphone channel on board the computer and
RTAP's built-in software sine tone generator, is made using the radio button group in figure
4.1d. The radio button group in figure 4.1e selects either a static or scrolling response plot
display, while that in figure 4.1f selects the response plot display type, which is either
spectrogram or ERB based. The radio button in figure 4.1g toggles the use of the fast
exponential function in the algorithms. Once the 'Set' button is clicked, the priority that
RTAP runs on is projected in the space in figure 4.1g. The status texts in figure 4.1h display
the timings of the relevant functions executed during runtime and, finally, figure 4.1i feeds
back the thread execution status during algorithm runtime.
Figure 4.2: RTAP user interface for setting parameters.
The secondary UI page, reached via the 'Set RTAP Parameters' tab, opens an
interface that allows the user to configure the prime parameters of the algorithms. This UI is
illustrated in figure 4.2. In the group in figure 4.2a, the combo box and sliders control the
sampling rate and the input parameters for sine tone generation. The control inputs in figure
4.2b vary the number of BF channels and the dual resonance nonlinear (DRNL) filter
parameters. The combo box in figure 4.2c sets the number of AN fibre types. Other
parameters within the IHC, NRR and ANSP stages are set to their default settings, identical
to the MAP model, for the purpose of response verification.
4.2 Process Priority
As indicated in the introduction to this chapter, all programs on an OS run at a given
priority. Process priority defines the degree of CPU attention gained by a program. RTAP
adopts four different process priorities when running on the Windows OS. In descending
order, these are 'REALTIME', 'HIGH', 'ABOVE NORMAL' and 'NORMAL'. Though there are
further process categories below 'NORMAL' priority, these are omitted in the development of
RTAP as they lower the CPU attention that RTAP receives, thereby reducing its real-time
execution capabilities. By default, RTAP launches at 'NORMAL' priority, but as soon as the
'Set' button is clicked, the priority changes to one of the four aforementioned levels based
on the user setting.
In the case where RTAP runs at 'ABOVE NORMAL' priority, the CPU will execute its
algorithms more often than other 'NORMAL' priority processes in the CPU execution queue.
RTAP at 'HIGH' priority takes precedence over 'NORMAL' and 'ABOVE NORMAL'
processes, and the CPU services the algorithms exclusively, treating them as a time-critical
task. RTAP threads that run under 'HIGH' priority will pre-empt threads from other
processes running at either 'IDLE' or 'NORMAL' priority. Running at 'REALTIME' priority,
RTAP threads will pre-empt all other non-'REALTIME' threads, including scheduling threads
in the Windows OS [51]. In light of the various priority levels that RTAP is capable of running
at, it is hypothesised that the number of BF channels that can be loaded will increase with
each discrete escalation of priority. The results of optimisation through the alteration of
process priorities are presented in section 6.2.
4.3 Structure and Settings
4.3.1 Class Structure
The graphical user interface (GUI) of RTAP is implemented with JUCE, an open
source C++ GUI library. Besides the GUI library, the audio and graphics libraries within
JUCE are utilised to stream sampled audio data from the microphone channel and to display
the processed data response in RTAP respectively. Figure 4.3 demonstrates the layout of
RTAP through its object oriented classes. RTAP starts by drawing the GUI window in
MainWindow before invoking AuditoryPeripheryMaster to launch the main and parameter set
UI tabs. The control buttons and feedback statuses are painted through the
AuditoryPeripheryJUCEmain class. Within the same class, the LiveAudioInput class is invoked
to prepare the audio streaming from the microphone channel on board the computer.
Similarly, from AuditoryPeripheryJUCEmain, AuditoryPeripheryJUCEdisplay is invoked to
render the plot display window on to the RTAP GUI.
Figure 4.3: RTAP object oriented class layout.
The algorithms, along with the memory allocation for coefficients and input and output
(IO) buffers described in chapter 3, are incorporated in the AuditoryPeripheryCompute class.
This class can be invoked from AuditoryPeripheryJUCEmain, but a problem lies in the
buffer size allocation. Since the maximum number of BF channels that can be processed by
RTAP was unknown at the point of development, buffer allocation based on the maximum
number of BF channels could not be accounted for. Though an arbitrary maximum number of
BF channels could have been defined and then adjusted through trial and error, this approach
was deemed impractical, as RTAP would have to be re-compiled for every variation of the
maximum BF channels parameter. Instead, a more viable solution is to vary the number of BF
channels based on the user selection and then allocate the buffers after the 'Set' button is
clicked.
Additionally, in the initial design of RTAP, algorithm execution had to be
initiated by timer callback functions in the AuditoryPeripheryJUCEdisplay class in order to
synchronise algorithm execution with the plot display. Similarly, the
AuditoryPeripheryJUCEmain class is also required to invoke the functions within
AuditoryPeripheryCompute to deallocate and allocate buffers of specific sizes. As the two
aforementioned classes required access to the functions in AuditoryPeripheryCompute,
creating the AuditoryPeripheryCompute class from either of the two former classes would
have prevented the other from invoking functions from AuditoryPeripheryCompute. To
resolve this conflict, AuditoryPeripheryCompute is made the base class of
AuditoryPeripheryJUCEdisplay, so that when AuditoryPeripheryJUCEdisplay is invoked in
AuditoryPeripheryJUCEmain, both classes are able to access the algorithm functions of
AuditoryPeripheryCompute. Though the incorporation of threads in the later designs
of RTAP discarded the dependence on AuditoryPeripheryJUCEdisplay, the inheritance from
AuditoryPeripheryCompute remains in the implementation of RTAP.
In the JUCE library, the sampling rate of the audio data streamed from the
microphone channel is set at a default of 44.1 kHz. Up to 2560 samples of audio data are
acquired from the underlying Microsoft DirectSound base library and made available to the
audioDeviceIOCallback function via the LiveAudioInput class every 58 ms. As the primary
auditory range of importance lies within the spectral range of speech, the sampling rate was
reduced to 22.05 kHz through decimation, where only the even-numbered samples are
retained. Therefore, instead of 2560, 1280 samples of audio data are acquired every 58 ms.
On another note, the timing benchmark for executing multiple BF channels in RTAP is
58 ms, based on the highest sampling rate setting of 22.05 kHz; algorithm processing times
up to 58 ms are acceptable, but anything beyond results in probable degradation of the
output response. This 58 ms benchmark is computed by multiplying the quantity of sampled
audio data, 1280, by the inverse of the highest sampling rate, 22.05 kHz.
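The 2:1 decimation step described above can be sketched as follows; the function name is illustrative and, as in the text, the samples are simply dropped rather than low-pass filtered first (a production decimator would filter to avoid aliasing):

```cpp
#include <vector>
#include <cstddef>

// Sketch of the 2:1 decimation described above: a 44.1 kHz block of 2560
// samples is reduced to 22.05 kHz by retaining every even-numbered sample.
std::vector<float> decimateByTwo(const std::vector<float>& in)
{
    std::vector<float> out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) // keep even-numbered samples
        out.push_back(in[i]);
    return out;
}
```

For a 2560-sample input this yields 1280 samples, and 1280 x (1 / 22050 Hz) ≈ 58 ms, which is the timing benchmark stated above.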
4.3.2 Input Settings
Table 4.2 lists the RTAP settings used to acquire real-time results for two
stimulus types. The results for the sine tone stimulus response are presented throughout all
sections within chapters 4, 5 and 6 as well as the Appendices, with the exception of section
5.2, for which the input stimulus is streamed from the microphone channel. The number of
BF channels for the response plots in some sections may also vary accordingly to illustrate
the effects of maximum load in RTAP for machines 1 and 2. System parameter settings are
given in table A.1 of appendix A.
Settings                 Sine Tone      Microphone channel
Stimulus frequency       500 Hz         -
Stimulus level           50 dB SPL      50 dB SPL
Size of window frame     58 ms          58 ms
Sampling frequency       22.05 kHz      22.05 kHz
Minimum BF               250 Hz         250 Hz
Maximum BF               6000 Hz        6000 Hz
Number of BF channels    30             30

Table 4.2: RTAP settings for acquiring various responses.
4.4 Sine Tone Generator
A window frame contains up to 1280 sampled audio data points at the default
sampling rate of 22.05 kHz when audio is streamed from the microphone channel.
However, a window frame of a fixed size of 1280 samples may not hold only complete sine
wave cycles. A complete sine wave cycle is defined as one that consists of the positive and
negative halves of a cycle and ends, after traversing the negative half cycle, at a point just
before the start of the next sine wave. The last sine wave within a window frame may
therefore be truncated, depending on the sine tone frequency. As a result, when RTAP
processes two adjacent window frames filled with sine tone samples, it will encounter an
abrupt end to the sine wave cycle when shifting from one window frame to the next. Hence,
the size of the window frame must be altered to ensure a smooth transition of sine wave
cycles between adjacent frames.

The window frame size is altered based on the sine tone frequency that is set, and its size
is typically equal to or less than 1280 samples. The truncated sine wave cycle just before
the end of the 1280-sample window frame described in the preceding paragraph is thus
eliminated. This leaves behind a window frame that holds only complete sine wave cycles
and ensures a smooth transition between adjacent window frames. In RTAP, the sine tone
is computed for only one cycle when the ‘Set’ button is clicked. The sampled data for this
one cycle is replicated over the rest of the audio buffer, based on the number of sine wave
cycles needed to fill the entire window frame. Since the y-axis start and end points of the
sine wave within a window frame are one interval apart, a single window frame of a sine
tone signal can be used as a continuous input stream throughout the runtime of the
algorithms in RTAP. The pseudocode for the sine tone generator is given in listing 4.1.
// AudioGain is based on input stimulus SPL gain in dB SPL
AudioGain = 28e-6 * (10 ^ (dB_SPL / 20));

// pre-calculate a sine tone cycle here
// Default_window_size = 1280 for a sampling rate of 22.05 kHz
NumSampPerCycle    = SamplingRate / SineFrequency;
NumSampOmit        = Default_window_size % NumSampPerCycle;
NumCyclesPerWindow = Default_window_size / NumSampPerCycle;

for (i = 0; i < NumSampPerCycle; i++)
{
    Audio_buffer[i] = AudioGain * Amplitude *
                      sin(2 * PI * SineFrequency * i / SamplingRate);

    // this is where a sine tone cycle is repeated over to form a continuous sine wave
    for (j = 1; j < NumCyclesPerWindow; j++)
    {
        // copy the 1st cycle data over to the next cycle
        Audio_buffer[(j * NumSampPerCycle) + i] = Audio_buffer[i];
    }
}

// update the processed data per BF channel if the sine tone option is chosen
if (Audio_Source == AUDIO_SINE)
{
    WindowFrameSizePerBF = Default_window_size - NumSampOmit;
}

Listing 4.1: Sine tone generator.
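A compilable C++ rendering of listing 4.1 is sketched below, under the assumption that the Amplitude factor is unity; variable names mirror the pseudocode, but the function itself and the SineWindow struct are illustrative, not RTAP's:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// A compilable sketch of listing 4.1. The 28e-6 reference and 1280-sample
// default window come from the text; the Amplitude factor is assumed to be 1.
struct SineWindow {
    std::vector<float> samples; // whole sine cycles only
    int frameSize;              // Default_window_size minus truncated samples
};

SineWindow makeSineWindow(float sineFreq, float samplingRate,
                          float dB_SPL, int defaultWindowSize = 1280)
{
    const float PI = 3.14159265f;
    // AudioGain is based on the input stimulus SPL gain in dB SPL
    const float audioGain = 28e-6f * std::pow(10.0f, dB_SPL / 20.0f);

    const int numSampPerCycle    = static_cast<int>(samplingRate / sineFreq);
    const int numSampOmit        = defaultWindowSize % numSampPerCycle;
    const int numCyclesPerWindow = defaultWindowSize / numSampPerCycle;

    SineWindow w;
    w.frameSize = defaultWindowSize - numSampOmit; // drop the truncated cycle
    w.samples.resize(static_cast<std::size_t>(w.frameSize));

    // compute one cycle and replicate it across the window frame
    for (int i = 0; i < numSampPerCycle; ++i) {
        const float s = audioGain *
            std::sin(2.0f * PI * sineFreq * i / samplingRate);
        for (int j = 0; j < numCyclesPerWindow; ++j)
            w.samples[static_cast<std::size_t>(j * numSampPerCycle + i)] = s;
    }
    return w;
}
```

For 500 Hz at 22.05 kHz the cycle length truncates to 44 samples, so the frame shrinks from 1280 to 1276 samples (29 whole cycles).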
4.5 Threading
4.5.1 Background
Besides the primary function of RTAP, which is to compute MAP-based algorithms in
real-time, the application is expected to record as well as display real-time processed data.
Sampled audio data streamed from the microphone channel at a sampling rate of 44.1 kHz
are downsampled to 22.05 kHz in RTAP. The audio data is made available to RTAP
regularly, at 58 ms intervals, in blocks of 1280 samples. On the availability of a block of
data, RTAP is required to compute the respective response of the AP model stage. Within
the same window frame, and depending on the user settings, RTAP might also be expected
to record the processed data samples to a file as well as display them on its UI display
window.
In a conventional C++ class without any thread usage, the aforementioned three
tasks are executed in a sequential manner, one task after the other. However, the RTAP
computational load varies with the number of BF channels, which is a user-defined setting.
Setting this parameter to a low number might not pose a problem, as there are few
computationally intensive algorithms to execute. At larger numbers of BF channels,
however, failure to complete all tasks is imminent. Figure 4.4 illustrates sequential
execution of tasks for low and high numbers of BF channels. With a higher number of BF
channels loaded in sequential execution, there will not be sufficient time to execute the
display function within RTAP, leading to possibly incomplete display of the processed data.
At the point of arrival of the subsequent data block, the behaviour of RTAP may become
unpredictable, and thus sequential execution is not an ideal method of attaining optimum
performance.
Figure 4.4: Sequential execution in RTAP.
To overcome the constraints encountered in sequential execution, RTAP adopts
parallel execution techniques through the use of a threading application programming
interface (API). Three threading APIs were considered for implementation; table 4.3 offers
a summary. Since RTAP has been developed on Windows OS, ideally the best parallel
computing option would be the Windows threading API. However, in light of future
expansion to multiple OS platforms, a more beneficial option is to adopt a cross-platform
threading API. Between OpenMP and POSIX threads, OpenMP offers better performance
than POSIX [52] and hides thread management behind an abstraction layer. More
specifically, OpenMP offers parallelisation of sequential code segments with automatic
thread management, where thread creation, synchronisation and deletion are automatically
taken care of by the API.
[Figure 4.4 content: for a low number of BF channels, Algorithm(), Record() and Display()
all complete sequentially within the 58 ms between blocks of 1280 audio samples; for a
high number of BF channels, Display() cannot complete before the next block arrives,
resulting in incomplete function execution.]
In the case of RTAP, although runtime performance is an essential feature, freedom
in the management of threads is essential to achieving controlled execution of parallelised
tasks. For example, in RTAP, the record thread does not need to run until the ‘Record’
button event is registered. If the record thread were only created at the point when the
‘Record’ button is clicked, significant redundant processing time would be added. With
direct thread management, a record thread that initiates the creation of a binary file is
instead created when the ‘Set’ button is clicked and thereafter placed in a dormant state,
where it waits. Upon the clicking of the ‘Record’ button, the contents of the record thread
are processed almost instantly, as there is no thread creation overhead involved. Hence,
with respect to functionality, a larger degree of thread management control is necessary,
and the POSIX thread API was chosen for implementation in RTAP.
Threading API   Advantages                                    Disadvantages
Windows         Native applications run directly on the       Cannot be implemented across
                processor, unlike cross-platform threading    other OS platforms (can only be
                APIs that incur overhead [50].                used exclusively on Windows OS)
                                                              [50].
OpenMP          Cross-OS platform [50].                       Insufficient degree of direct
                Good performance [52].                        control of threading resources
                                                              [50].
POSIX           Cross-OS platform [50].                       Performance not as good as
                Larger degree of direct control of            OpenMP [52].
                threading resources [50].

Table 4.3: Threading API comparison.
4.5.2 Implementation
Three POSIX threads, known as pthreads, are allocated for the algorithm, file record
and pixel rendering functions. The three threads are created at the launch of RTAP, in the
constructors of the AuditoryPeripheryCompute and AuditoryPeripheryJUCE classes
respectively. Each of the three threads is then placed in a Ready-to-Run state as it waits
on a unique condition variable (CV) within a while loop that runs for the duration of the
RTAP runtime. These CVs are part of the POSIX thread synchronisation set and are used
for simple inter-thread communication. In RTAP, CVs are used to indicate to the respective
threads the availability of data for further processing. As a complement to CV usage in
POSIX, mutex locks must also be used. A mutex allows only a single thread at any time to
exclusively access a block of code [50]. To use CV signalling in POSIX, a mutex must be
acquired first and then released after the work on the CV is completed. Figure 4.5 shows
the usage of POSIX thread synchronisation in RTAP.
Figure 4.5: Thread synchronisation pseudocode in RTAP.
With respect to figure 4.5, two threads are created. As soon as thread 2 is created, it
enters a while loop and acquires a lock on a mutex. It then encounters a command to wait
for a CV; before it enters wait mode, thread 2 releases the mutex lock. Meanwhile, thread 1,
upon creation, starts to compute and stores the corresponding results in shared memory.
The thread then acquires the mutex lock, which grants it exclusive access to signal the CV.
After signalling, the mutex lock is released. Upon reception of the CV, thread 2 runs only
after the mutex lock is released by thread 1, and then continues to process the shared
memory contents. In this way, thread synchronisation is achieved. Figure 4.6 illustrates the
thread structure utilised in RTAP.
// Thread 2
while (RTAP_is_running)
{
    // allow only thread 2 to run
    pthread_mutex_lock( &Mutex );
    // wait for condition variable to be signalled
    pthread_cond_wait ( &Condition_variable, &Mutex );
    // allow other threads to run
    pthread_mutex_unlock( &Mutex );
    // carry out some task on data in shared memory
    ...
}

// Thread 1
// do some work on shared memory
...
// allow only thread 1 to run
pthread_mutex_lock( &Mutex );
// signal condition variable
pthread_cond_signal ( &Condition_variable );
// allow other threads to run
pthread_mutex_unlock( &Mutex );
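A minimal, compilable version of this handshake is sketched below (link with -pthread). One detail added here that the figure's pseudocode omits is a data_ready predicate re-checked in a while loop around pthread_cond_wait, which is the standard guard against spurious wakeups and against the signal arriving before the waiter blocks; all names and the value 42 are illustrative:

```cpp
#include <pthread.h>

// Minimal sketch of the figure 4.5 handshake with a predicate added.
static pthread_mutex_t Mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  Condition_variable = PTHREAD_COND_INITIALIZER;
static int  shared_result = 0;    // "shared memory"
static bool data_ready = false;   // predicate guarding the CV

static void* thread1(void*)       // compute, then signal
{
    pthread_mutex_lock(&Mutex);
    shared_result = 42;           // do some work on shared memory
    data_ready = true;
    pthread_cond_signal(&Condition_variable);
    pthread_mutex_unlock(&Mutex);
    return nullptr;
}

static int thread2_wait()         // wait, then use the result
{
    pthread_mutex_lock(&Mutex);
    while (!data_ready)           // predicate loop, not a bare wait
        pthread_cond_wait(&Condition_variable, &Mutex); // releases Mutex while blocked
    int r = shared_result;
    pthread_mutex_unlock(&Mutex);
    return r;
}
```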
Figure 4.6: Thread utilisation structure in RTAP.
In RTAP, upon the availability of audio data samples, either through the audio input
stream connected to the microphone channel or from the sine tone generator, the algorithm
thread is invoked to compute the response of a particular stage of the auditory periphery
model. Towards the end of the algorithm thread, after the computation, CVs are signalled
to prepare the pixel render and file record threads to run. As the processed data are cloned
to two other memory segments for display and file recording, the two threads are able to
run concurrently past the boundary of a window frame. A comprehensive thread
synchronisation network diagram as implemented in RTAP is depicted with pseudocode in
figure 4.7.
[Figure 4.6 content: with threads, Algorithm() (thread 1), Record() (thread 2) and Display()
(thread 3) run concurrently within the 58 ms between blocks of 1280 audio samples, for
both low and high numbers of BF channels.]
Figure 4.7: Thread synchronisation in RTAP.
// 58 ms timer
AuditoryPeripheryJUCEdisplay::paint()
{
    if (Play_button_clicked) {
        Signal_cvWaitOnPlayButton;
    }
    if (Audio_source != MIC_IN) {
        Signal_cvComputeAlgo;
    }
}

// Algorithm initiating thread
AuditoryPeripheryCompute::ProcessFunction()
{
    while (RTAP_is_running) {
        Wait_for_cvWaitOnPlayButton;
        if (Audio_Packets_to_be_processed) {
            if (Record_button_clicked) {
                Signal_cvFileWriteWaitonRecBtn;
            }
            Wait_for_cvComputeAlgo;
            // Process algorithm function
            AlgorithmFunction();
        }
    }
}

// File record thread
AuditoryPeripheryCompute::ProcDataRecordThread()
{
    ...
    Wait_for_cvFileWriteWaitonRecBtn;
    while (RTAP_is_running) {
        Wait_for_cvFileWriteWaitonSignal;
        ...
    }
}

AuditoryPeripheryCompute::AlgorithmFunction()
{
    // Process algorithms start
    ...
    // Process algorithms end
    Signal_cvDrawPlot;
    if (Record_button_clicked)
        Signal_cvFileWriteWaitonSignal;
}

// Audio input callback
LiveAudioInput::audioDeviceIOCallback()
{
    // Acquire data from Mic In stream
    ...
    Signal_cvComputeAlgo;
}

// Pixel render thread
AuditoryPeripheryJUCEdisplay::DrawPlotPixels()
{
    while (RTAP_is_running) {
        Wait_for_cvDrawPlot;
        // Draw image on display window
        ...
    }
}

At the clicking of either the ‘Play’ or ‘Play+Record’ button in RTAP, a timer callback
function that runs every 58 ms detects a change in a global Boolean variable that tracks the
‘Play’ button’s click status. A CV, cvWaitOnPlayButton, is transmitted that triggers the
algorithm initiating thread to prepare to call on the respective algorithm function to service
the audio data. If either the ‘Play+Record’ or ‘Record’ button is clicked, another global
Boolean variable is set and the algorithm initiating thread transmits a CV,
cvFileWriteWaitonRecBtn, which allows the file record thread to write a file header into a
newly created binary file. The file record thread then suspends its operation pending
availability of processed data from the algorithm function. The algorithm initiating thread,
thereafter, waits for the availability of audio data samples either from the live audio input
stream connected to the microphone channel or the sine tone generator. Once a block of
data is saved in a shared memory location, a CV, cvComputeAlgo, is received by the
algorithm initiating thread. The respective algorithm function is then called to process the
contents of the shared memory. Towards the end of the algorithm function, where the
computation is completed, a CV called cvDrawPlot is sent out to render an image buffer
based on the recently computed processed data. The algorithm function concludes by
sending a final CV, cvFileWriteWaitonSignal, out to the file record thread that saves the
processed data to the binary file.
4.5.3 Results
The Intel thread checker, part of the Intel VTune Amplifier XE 2011 software
application, is able to measure the performance of the thread utilisation in RTAP. One
requirement of this utility is that the program to be analysed must have a short and
deterministic duration. However, the RTAP runtime duration is non-deterministic, as it
depends on the intention of the user. Therefore, a separate, shorter abstraction of RTAP
was developed that runs for two time frames, in which the algorithm, file record and pixel
render code are replaced with time delays of 58 ms, 15 ms and 15 ms respectively. The
goal of this test program is to verify that thread execution follows the expected behaviour
observed in figure 4.6. Figure 4.8 illustrates the output response of the Intel thread checker.
Figure 4.8: Intel thread checker analysis of RTAP usage of threads.
The lower half of the image in figure 4.8 represents a magnified segment of a thread
execution transition in the RTAP thread simulator. In part (1) of figure 4.8, the algorithm
thread, after 58 ms, invokes the record thread through a CV specific to the record thread.
As a mutex is already locked when the CV is signalled by the algorithm thread in (1), the
record thread is unable to lock the mutex and is thus placed in Ready-to-Run mode. In part
(2), the algorithm thread unlocks the mutex, which then allows the record thread to lock the
mutex and start to run as a ‘Critical Section’ in which it services the CV. In the ‘Critical
Section’ domain, the algorithm thread momentarily relinquishes CPU control. After the
record thread has serviced the CV, it releases the mutex lock and the algorithm thread
resumes execution by immediately locking another mutex for the pixel render thread in part
(3). The algorithm thread immediately signals the pixel render thread with a CV, which puts
the latter thread in Ready-to-Run mode, waiting on the release of the pixel render mutex.
In part (4), the algorithm thread finally releases the pixel render mutex, which allows the
pixel render thread to service the CV and subsequently release the mutex. One essential
feature of the thread simulation test is the existence of concurrency from part (3) onwards.
Thread synchronisation through CV signalling is the only serialised dependency between
threads, and the processing time of CV signalling is negligible. The record and pixel render
threads therefore run concurrently, indicating that parallel execution is achieved in RTAP.
4.6 Response Plots
The responses of the various stages of RTAP result in the generation of multiple
signals, the number of which is directly proportional to the number of BF channels selected.
A 30 BF channel DRNL response, for instance, results in the generation of 30
logarithmically spaced BM displacement signals. Displaying 30 signals on 30 different
graphs does not make an ideal visual representation of an auditory model response.
Hence, it is advantageous to group the various signals into a single graph.
Equivalent rectangular bandwidth (ERB) is a method to plot signals from a multiple
channel auditory perceptual model [22]. Equation 4.1 defines the relationship between a
signal of a specific frequency and the ERB scale.
ERBS(f) = 21.4 log10(0.00437 f + 1)    (Eqn. 4.1)

where ERBS(f) is the translated signal of a specific frequency, offset from the point of origin
on the y-axis, and f is the frequency of the signal in Hz. Equation 4.1 is a modification of
the Greenwood function
that describes the variation of a critical bandwidth with a centre frequency. It has the same
form as the Greenwood function except that the coefficient constants vary. Therefore, ERB
plotting is able to associate a centre frequency within a critical bandwidth to a position along
the basilar membrane similar to the Greenwood function [53].
RTAP processed data are time domain representations of the various BF components.
To represent time domain data on the ERB scale, equation 4.1 has to be modified. This
modification depends on the magnitude of the processed data for every stage of the MAP
model. For example, the processed data from the various stages of RTAP are of the order
of 1e-1 for the IHCRP, NRR and ANSP stages and in the range of 1e-7 to 1e-14 for the
stapes and BM displacement stages. To cope with this large dynamic range of amplitudes,
equation 4.1 is retrofitted with an exponential term as follows:
y_ERB(t) = 21.4 log10(0.00437 f + 1) + x_BF(t) * 2^S    (Eqn. 4.2)

where y_ERB is the time domain processed sample data scaled and offset on the ERB
scale; x_BF is the time domain representation of the processed sample data generated
with respect to BF; f is the BF in Hertz; and S is a scaling factor of the time domain
processed signal, x_BF. In order to
normalise a large range of amplitudes for the various stages of the MAP model, this
exponent can be tuned with an integer value to amplify the signals to an ideal visual level of
representation. The base value of 2 is selected for speedup in computing [54]. Figure 4.9
illustrates the DRNL response for 30 BF channels represented in an ERB scale.
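Equations 4.1 and 4.2 can be sketched in C++ as follows; the function names are illustrative, and std::ldexp is used to exploit the base-2 scaling mentioned above:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Sketch of equations 4.1 and 4.2 as used for plotting: each BF channel's
// time domain signal x_BF is scaled by 2^S and offset by the ERB number of
// its BF so that all channels stack on a single graph.
double erbNumber(double bfHz)                     // Eqn. 4.1
{
    return 21.4 * std::log10(0.00437 * bfHz + 1.0);
}

std::vector<double> erbScaledTrace(const std::vector<double>& xBF,
                                   double bfHz, int S)
{
    // std::ldexp(v, S) computes v * 2^S cheaply, matching the base-2 choice
    const double offset = erbNumber(bfHz);
    std::vector<double> y(xBF.size());
    for (std::size_t n = 0; n < xBF.size(); ++n)
        y[n] = offset + std::ldexp(xBF[n], S);    // Eqn. 4.2
    return y;
}
```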
Figure 4.9: (a) MAP and (b) RTAP DRNL response for 30 BF channels.
The DRNL responses of MAP and RTAP for the first window frame upon the start of
the two models are illustrated in figure 4.9 and are based on the input parameters from
table 4.2. The scaling exponent from equation 4.2 used for sizing up the DRNL response is
set to 22.

[Figure 4.9 content: two panels titled ‘MAP DRNL Response (30 BFs)’ and ‘RTAP DRNL
Response (30 BFs)’; y-axis: BF sites along BM for BM displacement (Hz), with ticks at
250 Hz, 388 Hz, 670 Hz, 1159 Hz, 2005 Hz, 3469 Hz and 6000 Hz; x-axis: Time (seconds),
0 to 0.02.]

The signals with resonances of significant amplitudes are clearly present in the lower
frequencies, especially around the region of 500 Hz, which is the frequency of the input
stimulus. The increase in phase of the traveling wave from the base to the apical end is
also clearly depicted in the figure. The topmost signal in figure 4.9 is the highest BF,
situated at the basal end of the BM. From this rigid part of the BM, the wave propagates
along the BM, with its amplitude and phase increasing. The wave reaches maximum
amplitude at the 500 Hz site of the BM before decreasing rapidly as it approaches the
apical end, while its phase continues to increase until the apical end is reached. Hence, the
ERB scaled plot is able to present multiple signals in a condensed and well-structured
manner that describes the amplitude, phase and frequency responses within a single graph.
Furthermore, the plot in figure 4.9 (b) demonstrates the capability of RTAP to regenerate, in
real-time, a DRNL response identical to that of MAP in figure 4.9 (a), given identical input
parameters.
The ERBS representations of the IHCRP, NRR LSR, NRR HSR, ANSP LSR and
ANSP HSR responses for MAP and RTAP are depicted in figures A-1 to A-5 in appendix A,
with the scaling factors tuned to 5, -5, -9, -9 and -12 respectively. Every one of these
responses shares similar traits with the DRNL response in terms of amplitude response
and phase deviation across the entire BF range. It should be noted that, in all ERB scaled
plots in this thesis, frequency components are used interchangeably with ERB scaled
points to denote y-axis coordinates.
4.7 Recording Feature
4.7.1 File Write Command Selection
A vital feature required in RTAP is the storage of the processed data output from the
various stages of the model in a file, so that the processed data generated can be verified
against the MAP model response. For this purpose, the recording feature of RTAP is
required to store only a short segment of processed data rather than a log of continuously
acquired data. Though logging the entirety of the processed data would be beneficial,
especially in long-term usage of RTAP, this option was not pursued due to its complexity
and the surplus implementation time required. Instead, priority was given to ensuring that
an integrated real-time program of the MAP model existed at the end of this project. Hence,
the record feature of RTAP serves as a stepping stone towards sophisticated real-time
logging of processed data in future editions of RTAP. For the first version of RTAP, two
window frames of processed data are required to be stored, mainly to verify that the
continuity between subsequent window frames is intact for all algorithms, in accordance
with the MAP model response. The data stored in the file should therefore be in a
numerical format representing the exact data output from RTAP, regardless of the input
stimulus, and it should be organised in a structure based on the number of BF channels as
well as the AN channel fibre types for either the NRR or ANSP response. Several C/C++
file write commands were examined; they were tested by writing 1280 floating point
variables into text and binary files. Table 4.4 summarises the file write commands profiled
on machine 1.
File Write Command                  Generated Output   OS Platform          Average Processing Time
C++ ‘ostream_iterator’              Text file          Multiple platforms   18.95 ms
C++ ofstream ‘<<’ in a for loop     Text file          Multiple platforms   32.11 ms
C ‘sprintf’ and ‘write’             Text file          Multiple platforms   6.31 ms
C/C++ ‘WriteFile’                   Binary file        Windows              0.13 ms
C/C++ ‘fwrite’                      Binary file        Multiple platforms   0.14 ms

Table 4.4: C/C++ file write profile.
From table 4.4, it can be seen that the ofstream-based write command takes the
longest to execute. The text formatting and file writing commands sprintf and write, which
generate a text file with floating point numerals, seem an ideal choice for implementation.
However, WriteFile and fwrite, which are used for binary file generation, are approximately
45 times faster than the combination of the sprintf and write commands. WriteFile is a
Windows-based file writer and is only usable with C and C++ compilers on Windows OS,
whereas fwrite is a cross-platform command that can be used with C and C++ compilers
on other OSs as well. Due to its cross-platform versatility and relatively short processing
time, fwrite was implemented in RTAP for saving processed data into a binary file. To
interpret the contents of the binary file, formatting is required. This is achieved using an
offline program, separate from RTAP, that converts the contents of the binary file to
alphanumeric characters and stores them in a text file.
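A minimal sketch of the chosen approach is shown below: one bulk fwrite of raw 32-bit floats, paired with the read-back that an offline formatter would perform. The file path and helper names are illustrative, not RTAP's:

```cpp
#include <cstdio>
#include <vector>
#include <cstddef>

// One bulk write of raw 32-bit floats; no per-sample text formatting.
bool writeFrame(const char* path, const std::vector<float>& frame)
{
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written =
        std::fwrite(frame.data(), sizeof(float), frame.size(), f);
    std::fclose(f);
    return written == frame.size();
}

// Read-back step an offline formatter would perform before text conversion.
std::vector<float> readFrame(const char* path, std::size_t count)
{
    std::vector<float> frame(count);
    FILE* f = std::fopen(path, "rb");
    if (!f) return std::vector<float>();
    std::size_t got = std::fread(frame.data(), sizeof(float), count, f);
    std::fclose(f);
    frame.resize(got);
    return frame;
}
```

Writing raw bytes this way keeps the round trip bit-exact, which is why the offline formatter, rather than the real-time thread, pays the cost of text conversion.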
4.7.2 Binary File Format
The RTAP recording feature is divided into two primary stages. The first stage
involves data writes to a binary file that omit floating point (FP) formatting. The second
stage consists of offline processing to restore the FP formatting and store the formatted
numbers in a text file. The sole reason for dividing the data recording into two stages is to
avoid compromising the computational performance of the concurrent algorithm
computation and recording in RTAP. For the offline program to function, it needs the
properties of the recorded data. These properties are included in the header of the binary
file, which has a constant size of 50 bytes. Figure 4.10 depicts the structure of the binary
file format used to save raw processed data from RTAP.
Figure 4.10: RTAP binary file format generated when the ‘Record’ or ‘Play+Record’ button is clicked.

Byte offset   Field
0             ‘R’ ‘T’ ‘A’ ‘P’ (file identifier)
4             Number of BF
8             Number of AN fibre
12            Number of processed data per time frame
16            Minimum BF
20            Maximum BF
24            Number of AN
28            Algorithm function ID
32            Sine input frequency*
36            Unused (reserved for future expansion)
50            32-bit floating point processed data (based on IEEE 754-2008 [62])

*For non-sinusoidal input, this parameter is set to an all-high state, i.e. 0xFFFFFFFF.
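The 50-byte header of figure 4.10 could be modelled with a packed C structure along the following lines; the field names and types are a reconstruction from the figure, not RTAP's actual declaration:

```cpp
#include <cstdint>
#include <cstring>
#include <cstddef>

// Hypothetical packed struct mirroring the 50-byte header of figure 4.10.
#pragma pack(push, 1)
struct RTAPFileHeader {
    char     magic[4];        // offset 0:  'R','T','A','P'
    uint32_t numBF;           // offset 4:  number of BF channels
    uint32_t numANFibre;      // offset 8:  number of AN fibre types
    uint32_t numProcPerFrame; // offset 12: processed data per time frame
    uint32_t minBF;           // offset 16
    uint32_t maxBF;           // offset 20
    uint32_t numAN;           // offset 24
    uint32_t algorithmID;     // offset 28: algorithm function ID
    uint32_t sineFreq;        // offset 32: 0xFFFFFFFF for non-sinusoidal input
    uint8_t  reserved[14];    // offsets 36-49: unused, reserved
};                            // 32-bit FP processed data follows at offset 50
#pragma pack(pop)
static_assert(sizeof(RTAPFileHeader) == 50, "header must be exactly 50 bytes");
```

A struct of this shape could be written with a single `fwrite(&header, 1, 50, f)`, matching the SIZEOFHEADER write described in the next section.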
4.7.3 File Writer Thread
The RTAP recording feature is designed to cause minimal impact on the operation of
the algorithm computation function during runtime. This is achieved through the use of a
POSIX thread that performs the write operation in parallel with the execution of the
algorithms. The thread is created in the Reinitialise function of the
AuditoryPeripheryCompute class when the ‘Set’ button in the RTAP UI is clicked. As soon
as the thread starts, it changes from a running state to a Ready-to-Run state; that is, the
thread is temporarily halted when it first starts. This is achieved through a POSIX command
that allows the CPU to continue executing the thread once a unique condition variable (CV)
is received. This CV, cvFileWriteWaitonRecBtn, is broadcast from the timer-triggered
ProcessFunction when either the ‘Record’ or ‘Play+Record’ button is clicked.

As soon as either the ‘Play+Record’ or ‘Record’ button is clicked, the file writer thread
continues its processing, allowing the binary file header containing the metadata of the
processed data to be written into the binary file. This operation is done as follows:
fwrite ( FileHeader, 1, SIZEOFHEADER, pRTAPfile );
The C structure FileHeader that contains the metadata is computed and defined in
Reinitialise() after the ‘Set’ button is clicked. After the file header is written in the file writer
thread, the thread enters an endless while loop that persists for the duration of the RTAP
runtime. On first entry into the while loop, the thread is temporarily halted by another
CV-based POSIX command. This CV, cvFileWriteWaitonSignal, is signalled by the
algorithm thread that processes the response of the AP model stage of interest, once it has
finished processing one window frame of data. The operation of the recording feature in
RTAP is illustrated in figure 4.11.
RTAP File Writer Thread:

// 1) create & open a binary file for writing

// 2) wait for 'Record' or 'Play+Record' button to be clicked
pthread_mutex_lock( &CallWriteFuncMutex );
pthread_cond_wait ( &cvFileWriteWaitonRecBtn, &CallWriteFuncMutex );
pthread_mutex_unlock( &CallWriteFuncMutex );

// 3) write binary file header into binary file first
fwrite ( FileHeader, 1, SIZEOFHEADER, pRTAPfile );

while (bRTAPrunning)
{
    // 4) wait for algorithm function to signal that it has processed
    //    1 time frame of data packets
    pthread_mutex_lock ( &Write2FileMutex[0] );
    pthread_cond_wait ( &cvFileWriteWaitonSignal[0], &Write2FileMutex[0] );
    pthread_mutex_unlock ( &Write2FileMutex[0] );

    // 5) write 1 time frame worth of processed data into binary file
}

// 6) close binary file

The CVs above are signalled from two places: ProcessFunction() broadcasts
cvFileWriteWaitonRecBtn when the ‘Record’ or ‘Play+Record’ button is clicked, and the
algorithm function, i.e. DRNL() / DRNL-IHCRP() / DRNL-NRR() / DRNL-ANSP(), signals
cvFileWriteWaitonSignal[0] after each processed window frame.

Figure 4.11: File write thread operation.

4.7.4 Binary File Recording

The processed data that is saved into the binary file depends on two parameters: the
algorithm function to run and its corresponding processed data response to be saved. Both
of these options can be set in the RTAP UI, which provides two record buttons. The
‘Record’ button functions only once RTAP has started processing the algorithms, in other
words, after the ‘Play’ button is clicked; it can be clicked at any time thereafter to record two
window frames of processed data. The ‘Play+Record’ button starts both the RTAP
algorithm computation and the data recording concurrently. This button was implemented
for the purpose of analysing the processed data from the first two window frames, since
saving the first two window frames using the ‘Play’ and ‘Record’ buttons separately may
not be achievable due to the delay between clicking the two buttons.
Upon the clicking of either the ‘Play’ or ‘Play+Record’ button, a Boolean flag termed
bRecordProcData is set. Depending on the processed data intended to be recorded, the
respective function then clones the processed data from the algorithm buffer to a record
buffer. The record buffer is able to accommodate two window frames of data for a
maximum of 300 BF channels and 2 AN fibre types. Although the record and algorithm
buffers hold the same data, buffer segregation of this form is required to prevent the buffer
usage conflicts that could arise from asynchronous buffer access, owing to the parallel
execution of the two threads, if they shared the same buffer space. Furthermore, with the
abundance of memory available on modern computers, redundant buffer utilisation of this
degree is tolerable.
As soon as the first window frame of processed data is written into the algorithm and record buffers, the algorithm thread continues processing the subsequent window frame of audio data. When the second window frame of data becomes available, the algorithm thread stores the processed data into the record buffer only if the record thread is not reading the record buffer, which would signify that a file write is in progress. This is accomplished using another Boolean flag, bRecordThreadEngaged. Hence, the file writer thread must not be running when the data is copied. If the file writer thread were writing into a file while the ‘Record’ button was clicked again, processed data would overwrite the existing data in the record buffer and the data being saved into the file would be corrupted. As a means of prevention, bRecordThreadEngaged signals the running status of the file writer thread and ensures that buffer accesses by the algorithm and file writer threads are coordinated.
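A minimal sketch of this flag-guarded copy is shown below, assuming an atomic flag stands in for the thesis's Boolean; the names mirror the text, but the code is illustrative rather than RTAP source.

```cpp
#include <atomic>
#include <array>
#include <cstddef>

// Sketch of the record-buffer guard described above. kFrameSize,
// cloneFrameToRecordBuffer and the std::array buffers are illustrative.
constexpr std::size_t kFrameSize = 1280;

std::atomic<bool> bRecordThreadEngaged{false};

std::array<float, kFrameSize> algorithmBuffer{};
std::array<float, kFrameSize> recordBuffer{};

// Algorithm thread: clone the frame only when the file writer is idle.
// Returns true when the copy was performed.
bool cloneFrameToRecordBuffer()
{
    if (bRecordThreadEngaged.load(std::memory_order_acquire))
        return false;               // writer busy: skip, keep old contents
    recordBuffer = algorithmBuffer; // safe: writer is not reading
    return true;
}
```

The file writer side would set the flag before reading the record buffer and clear it afterwards, giving the coordination the paragraph above describes.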
As far as the ‘Play+Record’ button is concerned, processed data from the first two
window frames are certain to be recorded regardless of the number of BF channels.
However, this may not be the case for the ‘Record’ button as it may be clicked at any point in
time. When the algorithms are not being processed, clicking the ‘Record’ button has no
effect. However, while the algorithms are executing over multiple BF channels, the algorithm thread may already have begun computing beyond the first BF channel. Hence, at the end of the first window frame recorded after the ‘Record’ button is clicked, the record buffer may not be completely filled with processed data from all the BF channels. To prevent this, a Boolean flag, bRecordStatus, is set to signal the file writer thread to proceed with the window frame recording only when the ‘Record’ button has been clicked and the algorithm computation is starting from the first BF channel. With this strategy in place, if the ‘Record’ button is clicked midway through the algorithm thread's computation, bRecordStatus is only set once the algorithm thread starts to process the first BF channel of the subsequent window frame. This ensures that only full window frames of processed data are recorded in the binary file.
RTAP saves two types of data according to the option set in the variable uiFunction2Rec, which is linked to the ‘Function to record’ combo box on the RTAP UI. The first type is the input stimulus, which is stored in a separate two-window-frame record buffer and included in the binary file as a reference for the processed data. The second type is the processed data itself: the output of the algorithm function stage, i.e. the response of one of the auditory pathway stages DRNL, IHCRP, NRR or ANSP. After being signalled by the algorithm thread, the file writer thread first stores the contents of the auditory stimulus record buffer. Two nested for loops then transfer the contents of the processed data record buffer into the binary file. The outer loop covers the full range of BF channels and the inner loop covers the number of AN fibre types. When either DRNL or IHCRP data are to be stored, the number of AN fibre types is set to one, as these responses are independent of the AN fibre type. Listing 4.2 offers pseudocode as an insight into binary file recording.
// 1) set flag to indicate file writer thread running
Record_Thread_Engaged_flag = true;
// 2) save input stimulus
if (Function_To_Rec != AUDIO_IN)
fwrite ( Audio_Data, 4, Number_Samples_to_Record, RTAP_file );
// 3) save processed data for every BF channel and AN fibre type if applicable
for (i = 0; i < NumBFchannels; i++)
{
    for (j = 0; j < Number_of_AN_fibre_types; j++)
    {
        fwrite ( ProcessedData[i][j], 4, Number_Samples_to_Record, RTAP_file );
    }
}
// 4) reset flag to indicate file writer thread temporary halt
Record_Thread_Engaged_flag = false;
Listing 4.2: Data writes to binary file in file writer thread.
4.7.5 Offline Formatting and Text File Generation
RTAP-offline is a separate offline program that formats the binary file generated by RTAP and produces a text file containing the formatted information. Running RTAP-offline starts by reading and translating the first 50 bytes of the RTAP binary file, which contain the header. The metadata in this segment, which describes the processed data, is formatted into human-readable strings and stored as a header in a new text file. An iterative for loop is then executed in which every cycle reads one window frame worth of data into a temporary buffer. The loop ends when the end of the file is reached. After each window frame of data is acquired from the binary file, a second for loop is executed. This loop extends over the length of one window frame, which is equivalent to 1280 samples. Within each cycle of this loop, 4 consecutive bytes of binary data are transferred from the temporary buffer into a single floating point variable, which is also 4 bytes in size. This transfer reformats 4 individual bytes of binary data into a floating point value. Invoking the fprintf function thereafter writes the floating point value as a numeral into the text file. Listing 4.3 presents the pseudocode that performs the offline processing.
// 1) Open RTAP binary file ...
// 2) Create and open text file ...
// 3) Get RTAP binary file length ...
// 4) Read header (first 50 bytes), format it and store in text file
fread ( temp_Header_Buffer, 1, 50, RTAP_file );
FormatRTAPheader ( temp_Header_Buffer, Header_Buffer );
WriteHeader2TxtFile ( Header_Buffer );
for (i = 0; i < (BinaryFileLength - 50); i += (WindowFrameSizePerBF * 4))
{
    // 5) Read processed data from binary file
    fread ( Temp_Buffer, sizeof(float), WindowFrameSizePerBF, RTAP_file );
    // 6) Format every consecutive 4 bytes into a FP number
    Bytes_Offset = 0;
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        memcpy ( &FP_Data[j], &Temp_Buffer[Bytes_Offset], 4 );
        Bytes_Offset += 4;
    }
    // 7) Write FP numbers to text file
    WriteFPdata2TxtFile ( FP_Data );
}
// 8) Close binary file
// 9) Close text file
Listing 4.3: RTAP offline processing.
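The 4-byte-to-float reinterpretation in step 6 can be written safely with memcpy. The standalone sketch below (bytesToFloat is an illustrative name) shows the conversion; memcpy avoids the aliasing pitfalls of a pointer cast.

```cpp
#include <cstring>

// Reassemble one 32-bit float from 4 consecutive bytes, as the offline
// formatter does for each sample read out of the binary file.
float bytesToFloat(const unsigned char bytes[4])
{
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

The round trip is exact: writing a float's bytes and reading them back reproduces the original value on the same machine.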
The RTAP recording feature was initially designed and developed with an emphasis on real-time data logging, in which processed data are saved continuously as soon as they become available. This was attempted in the early stages of development, but the implementation remained incomplete. Real-time logging of 1 BF channel was attempted, and it was observed that the contents of a window frame saved in the binary file were overwritten by contents from the subsequent time frame. Firstly, this indicated that the algorithm function executes faster than the file write for one window frame. Secondly, record buffer protection was absent: the record buffer allocated for 300 BF channels and 2 AN fibre types was only a multiple of 1280 FP values in size. In the real-time logging attempt, the algorithm thread, upon invocation, began overwriting the record buffer with new data at the same time as the record buffer was being read by the file writer thread. Hence, the binary file contained a mixture of data from the two window frames. Evidence of this was the abrupt change in the recorded data pattern observed through offline analysis.
One solution for real-time data logging is to enlarge the record buffer beyond its original size of 1280 FP values. However, the question remains as to the limit of the buffer size and the type of real-time logging strategy to adopt. For the first edition of RTAP, this remains an open question, and real-time logging was therefore omitted from RTAP. In its place, the current two-frame recording is implemented and record buffer protection is achieved with a series of Boolean flags.
4.7.6 Results
Figure 4.12: Continuity between adjacent window frames for RTAP generated DRNL response.
Figure 4.12 displays the continuity of the DRNL response signals between the first and second window frames upon the start of RTAP. A black vertical dashed line on the figure separates the first window frame on the left from the second window frame on the right. Because a sine tone stimulus is used, the signals generated in all window frames are deterministic. Thus, the continuity of the DRNL response can be assessed by analysing the signals in the region of 500 Hz, where the signals possess significant amplitude gain relative to the input stimulus frequency. The transition of the signals from the end of the first window frame to the start of the second window frame is smooth, without any
unwanted jitters or spikes. The window frame continuity is demonstrated for the IHCRP, NRR LSR, NRR HSR, ANSP LSR and ANSP HSR responses in figures A.6 to A.10 in appendix A. These plots indicate that the algorithms translated into RTAP match the responses of the MAP model. They also indicate that the strategies implemented in chapter 3 for window frame continuity are attainable in real time.
4.8 Summary
RTAP, a C++ GUI-based real-time implementation of the MAP model, is described in this chapter. Along with the algorithm buffering for window frame continuity detailed in chapter 3, POSIX threads are used to attain real-time performance of the MAP model in a Windows OS environment. Responses of RTAP in numerical form for multiple BF channels can be acquired through the on-board recording features, followed by offline conversion of the binary file to an alphanumeric text file. The recorded data typically consist of two window frames of processed data. Also covered in this chapter is the ERB scaled plotting, which accommodates multiple signals in a single graph.
Chapter 5: Signals Display in RTAP
Visual representation of the processed signal responses of the algorithms is a necessary indicator of the real-time capability of RTAP. RTAP can display processed data in two formats: ERB scaled and spectrogram. Both graphing schemes are capable of displaying multiple BF channels as well as the intensities of the processed data. Processed data are displayed either as a static or a scrolling image. Although the static display projects only the first window frame of processed data on the screen, it was developed to verify the integrity of the processed data and the display mechanism. It also serves as the platform for implementing the scrolling image that visually represents processed data for every window frame. The scrolling display is therefore a derivative of the static window frame display.
This chapter is separated into two halves. The first half describes the approach taken
to implement static plot displays. The development of the display mechanism is illustrated
and responses from the various stages of the AP model are displayed in ERB scaled and
spectrogram plotting modes. Thereafter, the second half of this chapter details the horizontal
scrolling implementation of ERB scaled plots.
5.1 Static Plot Display
5.1.1 Line Drawing
The built-in software renderer within the JUCE library is used to visually represent the processed data in RTAP. The initial intention was to represent processed data on the ERB scaled graphs covered in section 4.6 using line rendering. One method of drawing lines on the RTAP UI is the built-in JUCE library function drawLine, which takes five arguments:
drawLine ( float startX, float startY, float endX, float endY, float lineThickness)
The five arguments are fixed as constants to display a horizontal line from left to right of the RTAP UI display window. Measuring the time taken to render a horizontal line onto the RTAP UI is a good indicator of RTAP's capability to project a number of lines proportional to the number of BF channels. Figure 5.1 illustrates five horizontal lines displayed on the RTAP UI along with the line rendering time.
Figure 5.1: Line draw test. Five horizontal lines displayed on screen with five drawLine
functions and the subsequent line rendering time duration tagged in a green rectangle box.
From five separate readings similar to figure 5.1, the average time to project five lines on the RTAP UI is 107.4 ms. This is much larger than the 58 ms cap, which must also include the algorithm computation within one window frame of sampled data acquisition. As this design was done in the initial development phase of RTAP, the threading API was not considered until a later stage. Hence, based on the profile, the maximum number of lines that can be accommodated alongside the algorithm computation in a serial computing format is two, which can be rendered in approximately 43 ms. If RTAP were loaded with more BF channels, the algorithm time would increase and the number of lines would have to be reduced so as not to breach the 58 ms time cap. It can be concluded that line drawing on the RTAP UI with the drawLine function cannot be implemented efficiently without compromising the 58 ms processing time benchmark in a serialised computing format.
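A timing harness of the kind used for these profiles can be sketched with std::chrono; measureMs and the callable are illustrative, not the actual profiling code used in RTAP.

```cpp
#include <chrono>

// Time a single render (or any other) call in milliseconds.
// The callable stands in for e.g. a batch of drawLine invocations.
template <typename F>
double measureMs(F&& renderCall)
{
    auto t0 = std::chrono::steady_clock::now();
    renderCall();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

steady_clock is preferred over system_clock here because it is monotonic, so the measured interval cannot go negative if the wall clock is adjusted.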
Additionally, the drawLine function is a high-level software rendering abstraction that renders pixels on the screen using low-level software rendering code. This low-level code is not available within the JUCE library for customisation; hence, there is no control over issues such as image buffer utilisation, pixel rendering into the image buffer and its subsequent projection onto the screen. The opaque implementation of the drawLine function adds considerable overhead that increases the processing time significantly. Low-level management of resources, namely image buffer allocation, its utilisation in terms of pixel rendering, and the projection of the image buffer to the screen, is crucial to reducing processing times. These three implementable tasks are discussed in detail over the coming sections.
5.1.2 Resource Management
The first task is to implement an image buffer and an auxiliary image buffer. Although both buffers hold processed data to be displayed, they differ in size and storage format. The auxiliary image buffer is a two-dimensional matrix with a dimension of 32768 by 300 FP values; its main role is to buffer processed data in numerical FP format from every BF channel, generated directly from the algorithm function class, AuditoryPeripheryCompute. The image buffer is also a two-dimensional matrix, but it stores pixel colour values translated from the contents of the auxiliary image buffer, representing an image extending in the direction of the positive x- and y-axes. The contents of the image buffer are mapped directly onto the RTAP UI display window.
The JUCE library contains an image class that allows an image buffer allocation. The image buffer allocation is invoked in the constructor of the AuditoryPeripheryJUCEdisplay class as follows:
Plot2Display = Image (Image::ARGB, IMGWIDTH, IMGHEIGHT, true);
The width and height of the image are set as constants of 65536 and 600 pixels respectively. The length is pre-set to accommodate as many pixels as possible, including off-screen pixels that are required for image plot scrolling. The height of the image buffer is fixed by the maximum number of BF channels to be projected on screen, which is capped at 300 channels. Each FP sampled response value generated from the algorithms in RTAP is represented by a 2-by-2 combination of pixels. Hence, the image buffer is twice the length and height of the auxiliary image buffer.
The second task involves setting pixels in the image buffer to a particular colour. This is defined as off-screen rendering, where pixels are set in the image buffer before being projected onto the display window. The JUCE library has a built-in function that renders a pixel in the image buffer as follows:
Plot2Display.setPixelAt(x, y, c[m]);
The setPixelAt function call sets the pixel denoted by the Cartesian coordinates x and y in the image buffer, Plot2Display, to a colour, c[m]. The last task requires the image buffer to be projected onto the RTAP UI display window, which is achieved through the following JUCE library function:
g.drawImage ( Plot2Display, destX, destY, destWidth, destHeight, sourceX, sourceY,
sourceWidth, sourceHeight );
The drawImage function which is a member of the graphics class, g, directly translates a
segment of the image buffer on to the display window using screen start coordinates as well
as the size of the segment to be displayed.
The major advantage of breaking down the tasks in this way is that each of the three tasks can be profiled individually. The memory allocation for the image buffers takes place at the start of RTAP, before the algorithms are computed, whereas the pixel setting and the projection of the image buffer are done during the runtime of the algorithms, typically after the processing of one window frame. As a result, memory allocation for the image buffer during the runtime of RTAP is not required, thereby eliminating redundant processing time. The task of projecting the image buffer onto the display window is presented as part of the maximum load profile in section 6.2.
5.1.3 Pixels Render and Image Display Threads
In the initial design of RTAP, the algorithm function was invoked from the timer-driven callback function paint of the AuditoryPeripheryJUCEdisplay class, which is invoked every 58 ms. This is the same function that renders the plot onto the RTAP UI display window. Hence, the algorithm and rendering tasks needed to be executed in series within a 58 ms time frame. This is problematic, as the algorithms are required to wait for the rendering task to conclude before executing the next window frame, which would have forced the algorithms to process fewer BF channels. To accommodate more BF channels, the algorithms and the graphics management had to be segregated into separate threads.
The graphics tasks of off-screen image buffer rendering and display are divided into two separate threads. The image buffer projection is an integral part of the UI display window implemented by paint(), which is part of the main RTAP thread. Hence, only the pixel render thread has to be created explicitly, and this is done in the constructor of the AuditoryPeripheryJUCE class. Once the pixel render thread is created, it enters a while loop that lasts for the duration of the RTAP application. In the while loop, the thread transits from an active to a ready-to-run state as it encounters a POSIX thread based instruction directing the thread to proceed only upon the arrival of a condition variable (CV) signal.
A CV signal is issued by the algorithm thread from the AuditoryPeripheryCompute class to indicate that processed data are available in the auxiliary image buffer for plotting. While the algorithm thread continues executing the subsequent window frame, the pixel render thread, upon receiving the CV signal, transits from the ready-to-run state to the active state. It then services the contents of the auxiliary image buffer, translating processed data into pixels that are rendered into the image buffer. Once the rendering operation is complete, the thread returns to the ready-to-run state within the while loop, where it waits for the arrival of new processed data from subsequent window frames via CV signalling.
5.1.4 ERB Scaled Plots
Plotting processed data on the ERB scale (ERBS) in RTAP uses equation 4.1 as the fundamental equation to convert frequency to ERBS. Equation 4.2, derived from equation 4.1 to define the characteristics of the recorded processed data for illustration, cannot be directly implemented in RTAP without modifying the scaling exponent, S. The ERBS plots in section 4.6 were obtained by adjusting the scaling exponent as a constant through trial and error. This was done to ensure that the output responses from the various stages of the AP model represented on the ERBS plots were visibly distinguishable from one another, with their peaks and troughs clearly observable while not so large as to overlap adjacent signals. For the DRNL responses, the scaling exponent is set to 22 due to the small amplitude range of the BM displacement, while for the ANSP LSR response the scaling exponent is set to -9.
In RTAP, this form of normalisation is achieved using the maximum and minimum
parameters in one window frame of processed data regardless of the number of BF
channels used. The formula used in RTAP is as follows:
y_ERB(t) = Y_start - 21.4*log10(0.00437*f + 1) + [(x(t) - x_min(t)) / (x_max(t) - x_min(t))] * 2*Y_spacing   (Eqn. 5.1)
where y_ERB(t) is the y-coordinate representation of the response signal to be displayed on the ERB scale; x(t) is the processed data sample, with x_min(t) and x_max(t) the minimum and maximum values within the window frame; Y_start is the offset from the point of origin of the RTAP UI display window; and Y_spacing is the distance between two adjacent BF processed signals, defined by equation 5.2.
Y_spacing = (Y_end - Y_start) / (2*N_BF)   (Eqn. 5.2)
where Y_end is the final vertical point on the RTAP UI display window where the pixels are plotted, set as a constant of 490; Y_start is the beginning vertical point on the RTAP display window where the pixels are plotted, set as a constant of 200; and N_BF is the number of BF channels. The interval between two adjacent signals represented in ERBS is needed to calculate the intermediate start points, Y_int-start, for all signals between the highest and lowest BF signals:
Y_int-start(k) = Y_end - (2k + 1)*Y_spacing   (Eqn. 5.3)
where k is an incremental index running from 0 to N_BF - 1. Equation 5.1 multiplies Y_spacing by 2 so as to scale the normalised processed sample data to within the boundaries of the positive and negative half cycles of a BF signal. Segmentation in this manner ensures that the boundaries between two adjacent BF signals are clearly defined and that signals do not overlap when projected on screen.
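For reference, the frequency-to-ERBS conversion of equation 4.1 that underlies these plots can be written directly; erbScale below is an illustrative name, not an RTAP identifier.

```cpp
#include <cmath>

// Equation 4.1: convert a frequency in Hz to its ERB-scale (ERBS) value.
double erbScale(double fHz)
{
    return 21.4 * std::log10(0.00437 * fHz + 1.0);
}
```

The function is monotonically increasing, so the ordering of BF sites along the basilar membrane is preserved on the ERBS axis.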
Figure 5.2: ERBS representation of the first window frame of DRNL response in RTAP based on 85 BF channels.
RTAP is able to generate and project 85 evenly spaced DRNL signals for the first window frame, based on the settings of table 4.2, on machine 1, as illustrated in figure 5.2. Amplitude and phase lag increase from higher to lower frequencies, although the amplitude of the signals below 500 Hz decreases again. As the number of BF channels increases, more signals are condensed into the UI display window. If the interval between adjacent signals becomes small, the spaces between individual signals on the ERBS graph become insignificant to the point that the results are incomprehensible. A better option is to harness the colour display capabilities of a spectrogram to differentiate signal intensities. Furthermore, the number of BF channels that can be displayed in the UI display window increases with a spectrogram implementation, given that a window frame of data for 1 BF channel can be represented by a single row of pixels, in contrast to the ERBS plot. Figures A.11 to
A.15 in appendix A contain the rest of the RTAP generated responses of the IHCRP, NRR
and ANSP stages.
5.1.5 Spectrogram Plots
An alternative plotting method for assessing the response of the auditory model is the spectrogram. A spectrogram displays the variation of the spectral density of a signal with respect to time [55]. The intensity of a frequency component, f, at a given time, t, of an input signal in a spectrogram, S, is depicted by the shade or colour of the resultant point S(f, t). Hence, in a black and white spectrogram, a signal with a larger amplitude variation within a discrete frequency is represented by pixels in a darker shade, while a smaller amplitude variation generates pixels in a lighter shade. Similarly, a multi-coloured spectrogram depicts the intensities of a signal in various colours. The colour contrast used here is unique to RTAP and is described in the paragraphs below.
In RTAP, the colours of a spectrogram are rendered using the setPixelAt JUCE function. For the ERBS plotting in RTAP, the first two parameters of setPixelAt were used to define the x and y coordinates of the pixel to be set on the RTAP UI display, and the third parameter was left constant. For spectrogram rendering, all three parameters are varied to represent the intensities with different colours. In particular, the third parameter of setPixelAt defines a specific colour through a C++ colour class. As a colour is represented in the JUCE library as a C++ class, memory resources would have to be allocated for the colour objects every time setPixelAt is called, and such memory allocation during runtime adds significant processing time. To remedy this, a finite number of colours are pre-allocated in the constructor of the AuditoryPeripheryJUCE class before being used in the pixel render thread. The number of discrete colour levels is fixed at thirty to coincide with the 30 BF channels that were the minimum number of parallel BF channels required to operate in real time as part of the goal of this project.
In the pixel render thread, the thirty pre-allocated colour classes are linearly distributed over the range between the minimum and maximum values of the processed data within a window frame. In other words, the two colours at the ends of the colour spectrum are defined by the minimum and maximum values found in the window frame of processed data, and all other colours in the spectrum represent equal-sized intervals. These intervals, summed over the length of the colour spectrum, equal the difference between the maximum and minimum parameters of the window frame. Equation 5.4 illustrates this relationship:
c_m = c_0 + m*(x_max(t) - x_min(t)) / N   (Eqn. 5.4)
where c_m is the discretised colour boundary level for comparison with the processed data;
x_min(t) is the minimum processed data value obtained in 1 window frame;
x_max(t) is the maximum processed data value obtained in 1 window frame;
c_0 is the lowest boundary level of the colour range, set at x_min(t);
N is the pre-allocated number of colour classes, set at 30 for RTAP.
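Equation 5.4 amounts to a uniform quantisation of the sample range into N bands; the sketch below (colourBandIndex and kNumColourBands are illustrative names) maps a sample directly to its band index rather than scanning all boundary levels.

```cpp
// Map one processed sample to a colour band index in [0, N-1],
// given the window-frame extrema, per the uniform intervals of Eqn. 5.4.
constexpr int kNumColourBands = 30;

int colourBandIndex(float x, float xMin, float xMax)
{
    if (xMax <= xMin)
        return 0;                                   // degenerate frame
    int m = static_cast<int>((x - xMin) * kNumColourBands / (xMax - xMin));
    if (m >= kNumColourBands)
        m = kNumColourBands - 1;                    // clamp x == xMax
    return m;
}
```

Computing the index arithmetically is O(1) per sample, whereas scanning the thirty boundary levels, as in the pseudocode that follows, is O(N); both yield the same band.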
The maximum and minimum values of the processed data in one window frame are retrieved in the algorithm thread, within the same loop that computes the processed data, as shown in the pseudocode listing below:
for (j = 0; j < WindowFrameSizePerBF; j++)
{
    // perform either DRNL, IHCRP, NRR or ANSP here ...
    // find local maxima & minima for 1st window frame
    if (dNumAlgoInvk < 1)
    {
        if (fMaxPixDisplayVal < ProcessedData[j])
            fMaxPixDisplayVal = ProcessedData[j];
        if (fMinPixDisplayVal > ProcessedData[j])
            fMinPixDisplayVal = ProcessedData[j];
    }
}
Listing 5.1: Acquisition of maximum and minimum values.
The maximum and minimum parameters, fMaxPixDisplayVal and fMinPixDisplayVal, are initialised to -65536 and 65535 respectively, so that the first processed data sample immediately replaces both. Within every cycle of the loop, each processed data sample is compared with the values stored in fMaxPixDisplayVal and fMinPixDisplayVal. If the sample is larger than the value in fMaxPixDisplayVal, it is stored in fMaxPixDisplayVal as the new maximum. Similarly, if the sample is below the value in fMinPixDisplayVal, it is stored in fMinPixDisplayVal as the new minimum. Once the loop exits, the maximum and minimum values for the window frame have been found. Thereafter, a POSIX based condition variable (CV) is signalled to enable the pixel render thread to run.
In the pixel render thread, every processed data sample in a window frame is compared with the thirty discrete levels defined by c_m in equation 5.4 within the range x_min(t) to x_max(t). Should a processed data sample fall between the floating point values representing two adjacent discrete colour levels, the setPixelAt function is called to set the pixels in the image buffer to the colour from the pre-allocated colour class indexed by m. The pseudocode for pixel rendering in the RTAP spectrogram is listed as follows:
// 1) Draw pixels from left to right of RTAP display window
for (x = 0; x < x_Display_Window_Width; x += PIXEL_WIDTH)
{
    // 2) loop to increment vertically from bottom to top of RTAP
    for (j = 0; j < NumBFchannels; j++)
    {
        // 3) Compute y-axis offset
        y = PIXEL_Y_OFFSET - (j * PIXEL_WIDTH);
        // 4) Scroll through all 30 colour bands and find the appropriate
        //    match for the processed data
        for (m = 1; m <= NUMBER_OF_COLOUR_BANDS; m++)
        {
            // 5) Check whether the processed data falls within the band
            if ((ProcessedData[j][x] >= Discrete_Colour_Level[m-1]) &&
                (ProcessedData[j][x] <  Discrete_Colour_Level[m]))
            {
                // 1:4 pixel rendering - 1 pixel rendered on the image buffer:
                Plot2Display.setPixelAt(x, y, c[m-1]);
                // 3 surrounding pixels also rendered by
                // replicating Plot2Display.setPixelAt 3 times
            }
        }
    }
}
Listing 5.2: Static spectrogram display.
Figure 5.3: Colour representation of signal intensity in spectrogram.
In figure 5.3, the dark blue colour at the leftmost end represents the highest signal intensity in the negative half of a signal, whereas the black colour at the right end defines the highest signal intensity in the positive half of a signal. The green block in the centre of figure 5.3 marks zero at the centre of the signal amplitude range. Table 5.1 summarises the significance of the colour contrast for the different responses within RTAP. Figure 5.4 demonstrates the DRNL response displayed in RTAP for 180 BF channels with the input settings of table 4.2. This DRNL response is the same signal output as its ERB scaled counterpart in figure 4.9, except that the spectrogram plot represents more BF channels than the ERBS plot.
RTAP stage | Dark Blue | Black
DRNL  | Lowest BM displacement | Highest BM displacement
IHCRP | Lowest voltage | Highest voltage
NRR   | Lowest release rate | Highest release rate
ANSP  | Lowest probability of spiking | Highest probability of spiking
Table 5.1: Spectrogram colour hue significance to the various stages of RTAP.
Figure 5.4: Spectrogram representation of the first frame window frame of the dual resonance nonlinear (DRNL) filterbank response in RTAP for 180 BF channels.
For the DRNL response in figure 5.4, the signal is most intense in the lower BF region, corresponding to the input stimulus frequency of 500 Hz. The travelling wave of the positive half of the first sinusoidal cycle begins at the high BF of 6 kHz, represented by a faint, thin yellow strip starting from the top left of the spectrogram and moving downwards in figure 5.4. As the signal amplitude and phase lag increase, the yellow strip thickens and starts leaning slightly to the right as it approaches the 500 Hz BF site and beyond. The negative half of the first sinusoidal cycle follows the same path and has the same phase lag as the travelling wave of the positive cycle, except that it is blue in colour and right-shifted from the yellow strip. From the second sinusoidal cycle onwards, the colour of the positive half of the travelling wave gradually changes from yellow to red when approaching the 500 Hz BF site, and then to black in the 500 Hz BF site region, signifying the
highest amplitude gain. Beyond the 500 Hz BF region, the signal returns to red and finally to
yellow, indicating the decay in amplitude of the travelling wave. The negative half of the
second sinusoidal wave has similar characteristics; at the least intense BF region it is
represented in dark blue before returning to a lighter shade of blue for dying amplitudes.
Figure 5.5: Spectrogram representation of the first window frame of the inner hair cell receptor potential (IHCRP) response in RTAP for 123 BF channels.
The colour hue of the RTAP-generated spectrogram of the IHCRP response is quite
different from that of the DRNL response. From figure 5.5, it can be concluded that the
IHCRP response possesses more distinguishable sinusoidal signals than the DRNL
response, though it shares the DRNL response's amplitude gain and phase
characteristics. The NRR and ANSP
responses in figures 5.6 to 5.9 indicate that the negative half of the signals from IHCRP
response have been filtered leaving the positive half to produce vesicle release and AN
spiking probabilities in both LSR and HSR fibres. The AN spiking probability in the HSR
fibres in figure 5.9 drops considerably as time progresses, which can be observed from the
transition of the colour hue from red to yellow to green at the 500 Hz BF site. This response
is consistent with a typical tone burst AN HSR response, in which there is a sharp reduction
in the likelihood of spiking in the first 10 ms to 20 ms after the onset of the stimulus, followed
by a slower rate of AN firing thereafter [56].
Figure 5.6: Spectrogram representation of the first window frame of the neurotransmitter release rate (NRR) response for AN LSR fibres in RTAP for 96 BF channels.
Figure 5.7: Spectrogram representation of the first window frame of the neurotransmitter release rate (NRR) response for AN HSR fibres in RTAP for 81 BF channels.
Figure 5.8: Spectrogram representation of the first window frame of the auditory nerve spiking probability (ANSP) response for LSR fibres in RTAP for 85 BF channels.
Figure 5.9: Spectrogram representation of the first window frame of the auditory nerve spiking probability (ANSP) response for HSR fibres in RTAP for 65 BF channels.
5.2 Scrolling Plot Display
5.2.1 Background
In RTAP, data is streamed to the CPU for computation in contiguous blocks at fixed
time intervals, regardless of the audio input source. As the end time in a real-time system is
non-deterministic, processed data has to be either logged or visually projected for analysis.
In the case of a real-time computational model, a feasible method of analysing processed
data is a continuous visual projection on the computer display screen. Accordingly, RTAP is
designed to display real-time processed data through a dynamic image buffer that is
rendered as a scrolling image on its UI display window. The image is scrolled from right to
left of the screen, with the latest processed data projected first on the right of the display
window.
The UI display window width for RTAP depends on the screen resolution settings of
the computer that RTAP runs on. The width of the RTAP UI display window on machine 1 is
1350 pixels, slightly larger than the 1280-pixel window frame size under the largest sampling
frequency setting of 22.05 KHz. Hence, machine 1 is capable of displaying one full window
frame of processed data and a small fraction of the subsequent window frame. However, if
the processed data from every window frame is to be displayed in its full extent, the display
window needs to be refreshed with a new image every 58 ms. A static display of complete
images refreshed at a rate of 17 Hz changes too quickly to be interpretable, so this option is
clearly not feasible for implementation in RTAP.
A scrolling image traversing from right to left at a reasonable scroll speed that renders
all the processed data as pixels also poses a potential problem of auxiliary image buffer
overrun, because the auxiliary image buffer is utilised as a circular buffer. As an example,
consider the scenario where audio data is streamed from the microphone channel and
processed by the algorithms. Pixel rendering on the image buffer is broken down into
segments of equal length based on the image scroll speed. While the image buffer is filled
with pixels from the segmented sub-frame of the first window frame, the auxiliary image
buffer is filled with processed data from subsequent window frames. Because the quantity of
processed data written to the auxiliary image buffer is larger than the quantity written to the
image buffer, unread data in the auxiliary image buffer will eventually be overwritten before it
can be rendered as pixels into the image buffer.
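The overrun condition can be quantified: if each window frame deposits more samples into the auxiliary buffer than are drained into the image buffer, a circular buffer of a given capacity is overwritten after roughly capacity / (fill − drain) frames. The sketch below is a hypothetical illustration; the function name and all numbers are invented for this discussion.

```cpp
#include <cassert>

// Number of whole window frames before unread data in a circular
// auxiliary buffer is overwritten, given the per-frame fill quantity
// and the per-frame drain quantity into the image buffer.
int framesUntilOverrun(int capacity, int frameSize, int pixelWrite)
{
    int netFillPerFrame = frameSize - pixelWrite;
    if (netFillPerFrame <= 0)
        return -1;                 // drained as fast as filled: no overrun
    return capacity / netFillPerFrame;
}
```

For example, a 12 800-sample buffer filled with 1280 samples per frame but drained at only 4 samples per frame survives just ten frames before overrun.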
5.2.2 Implementation
To avoid auxiliary image buffer overrun without compromising the scroll speed, the
processed data write size, i.e. the quantity of processed data transferred to the auxiliary
image buffer, has to be condensed. This write size has to be the same as the pixel write size
to the image buffer, which in turn is identical to the number of new pixels displayed on the
right of the UI display window. One way to reduce the quantity of processed data transferred
to the auxiliary image buffer is to average sub-frames of processed data within a window
frame. However, this causes the loss of vital information such as the amplitude of the
processed data signal. The alternative method that is adopted is subsampling, which
involves extracting one processed data sample per subsampling period. Listing 5.3
describes the pseudocode for subsampling.
for (i = 0; i < NumBFchannels; i++)
{
    // 1) Algorithm segment ...
    for (j = 0; j < WindowFrameSizePerBF; j++)
    {
        // 2) Algorithm segment ...
        // 3) Subsampling for scrolling plot
        if (Scroll_Plot_Option_Selected)
        {
            // 4a) Store processed data for the scroll display once the
            //     subsample counter has been decremented to 0
            if (SubSample == 0)
            {
                // 5) Store the processed data
                Image_Buffer[i][ImgBuff_Write_Track[i]] = ProcessedData[i][j];
                // 6) Reset the subsample counter to acquire the next
                //    available processed data sample
                SubSample = 100;
                // 7) Advance the image buffer tracker to point to the next
                //    image buffer segment
                ImgBuff_Write_Track[i] = (ImgBuff_Write_Track[i] + 1) % IMGTEMPSTORESIZE;
            }
            else // 4b) Decrement the subsample counter
                SubSample--;
        }
        ... // Continue with other tasks
    }
}
Listing 5.3 Subsampling processed data in all algorithm functions.
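A minimal, runnable version of the subsampling idea in listing 5.3 is sketched below; the counter behaviour and the period of 100 mirror the pseudocode, but the function itself is an illustration rather than RTAP's actual thread code.

```cpp
#include <vector>
#include <cassert>

// Extract one sample per subsampling period from a window frame of
// processed data, mimicking the decrement-and-reset counter in listing 5.3.
std::vector<double> subsample(const std::vector<double>& frame, int period)
{
    std::vector<double> out;
    int counter = 0;                  // 0 means "take this sample"
    for (double sample : frame) {
        if (counter == 0) {
            out.push_back(sample);    // store the processed data
            counter = period;         // skip the next 'period' samples
        } else {
            --counter;
        }
    }
    return out;
}
```

With a 1280-sample window frame and a period of 100, samples are kept at indices 0, 101, 202, and so on, yielding 13 subsamples per frame.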
Immediately after the subsampling stage in the algorithm function, the pixel render
thread is invoked via the POSIX condition variable (CV), cvDrawPlot. In the render pixel
thread, the subsamples from the auxiliary image buffer are translated on to the image buffer.
Though the size of the image buffer is larger than the UI display window in RTAP, the image
buffer is clipped according to the size of the UI display window. The scrolling effect in RTAP
is produced by the discrete shifting of the clipping region towards the right of the image
buffer at every invocation of the timer callback function. This effect is illustrated in figure
5.10. The auxiliary image and image buffers are implemented as circular buffers, which
means that once the end of a buffer is reached, processing immediately continues at its
start. This implementation gives a continuous scroll effect regardless of the algorithm
processing runtime.
Figure 5.10: Image buffer clipping and projection of the display window.
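The clipping-and-wrap behaviour sketched in figure 5.10 can be illustrated in a few lines of code. This is a hypothetical one-dimensional simplification (a single pixel row, invented names), not RTAP's implementation.

```cpp
#include <vector>
#include <cassert>

// Produce the visible window by clipping a circular image buffer:
// 'start' is the left edge of the clipping region, which shifts right
// by the scroll speed at every timer callback and wraps at the end,
// yielding a continuous scroll.
std::vector<int> clipWindow(const std::vector<int>& imageBuffer,
                            int start, int windowWidth)
{
    std::vector<int> window(windowWidth);
    for (int x = 0; x < windowWidth; ++x)
        window[x] = imageBuffer[(start + x) % imageBuffer.size()];
    return window;
}
```

When the clipping region reaches the end of the buffer, the modulo operation wraps the read position to the start, so the displayed window never shows a discontinuity.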
Since the plot is scrolled from right to left of the UI display window, the pixels are
required to be rendered on the right of the display window just beyond the clipping region. As
stated earlier, the size of the image buffer segment hosting the new pixels for display on the
right of the UI display window has to be identical to the processed data write size to the
image and auxiliary image buffers so as to project an instant response
from the image scroll. To meet this requirement, the subsampled processed data from the
auxiliary image buffer has to be translated to the right of the UI display window off-screen
and scrolled into the display window at the next display window refresh. This technique
generates a rapid and continuous image scroll effect. The pseudocode listing for this
technique is laid out below.
// 1) Ensure that the algorithm thread is not processing the same audio data packet
if (Pixel_Render_count_semaphore != Algorithm_Processing_count_semaphore)
{
    // 2a) Check if the scrolling image has reached the left end of the display window
    if (ImgBuff_Display_Start_X_Position > 0)
    {
        // 2b) Update the variable if the image has not reached the left end
        ImgBuff_Display_Start_X_Position -= Scroll_Speed;
    }
    else
    {
        // 2c) Maintain image display from left to right of the display window
        //     once the image has scrolled to the far left of the display window
        ImgBuff_Display_Start_X_Position = 0;
    }
    // 3) Prepare to render pixels along the x-axis
    for (x = 0; x <= Num_of_ProcessedData_to_display; x++)
    {
        // 4) Update the global image buffer read offset
        ImgBuff_Read_Track = (x + ImgBuff_Write_Track[0]) % sizeof(ImgBuff);
        // 5) Update the x-axis position for pixel rendering
        ImgBuff_X = (ImgBuff_Render_Offset + x) % sizeof(Aux_ImgBuff);
        // 6) Erase the column of the image buffer where pixels are to be drawn
        // 7) Skim through all BF channels
        for (j = 0; j < NumBFchannels; j++)
        {
            // 8) Compute the y-axis offset for either the ERBS or spectrogram plot
            // 9) Render pixels for either the ERBS or spectrogram plot
            Plot2Display.setPixelAt(ImgBuff_X, y, Pixel_colour);
        }
    }
    // 10) Update the global image buffer x-axis offset to point to the next
    //     location for rendering pixels
    ImgBuff_Render_Offset = (ImgBuff_Render_Offset + Scroll_Speed) % IMGBUFFSIZE;
    // 11) Reset the variable to indicate no more pixel rendering for the current frame
    Num_of_ProcessedData_to_display = 0;
    ... // Continue with other tasks
}
Listing 5.4: ERBS and spectrogram plot scrolling.
The scrolling image has to account for two events that are unaccounted for in a
static display. In static image display, the image to be drawn is written to a fixed location
within the image buffer. Scrolling image display, however, has to draw pixels constantly at
adjacent locations on the image buffer as the buffer is scrolled horizontally. The image buffer
is arranged as a circular buffer; in other words, the end of the image buffer is immediately
followed by the start of the same buffer. Hence, once the drawing of pixels reaches the end
of the image buffer, subsequent pixels are drawn back at its start. Before pixels are painted
on to the image buffer, the image buffer segment where the pixels are to be drawn needs to
be erased. This ensures that signals are not overlapped on top of one another as the pixel
render thread moves from the end of the image buffer back to the start. The erasure is
carried out in a loop where black pixels are drawn one pixel at a time along the entire height
of the image buffer for a width of two pixels. Hence, pixel erasure and rendering are carried
out on image buffer segments measuring 2-by-600 pixels.
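The column-erasure step described above can be sketched as follows; the names and the small test image are illustrative, with only the 2-pixel strip width taken from the text.

```cpp
#include <vector>
#include <cassert>

// Erase a 2-pixel-wide column strip of an image buffer by painting it
// black along its entire height before new pixels are drawn there.
const int BLACK = 0;

void eraseStrip(std::vector<std::vector<int>>& image, int x, int stripWidth = 2)
{
    int width = static_cast<int>(image[0].size());
    for (std::vector<int>& row : image)        // every row = one y position
        for (int dx = 0; dx < stripWidth; ++dx)
            row[(x + dx) % width] = BLACK;     // wrap at the buffer's end
}
```

The modulo on the x position mirrors the circular buffer: a strip starting at the last column wraps around and also erases the first column.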
5.2.3 Results
Subsampling preserves amplitude at regular intervals and hence allows a
distinguishable projection of the processed data for a given auditory stimulus. Its effects are
observable in figures 5.11 and 5.12, which depict the ANSP responses of LSR and HSR
fibres for speech. The input stimulus is the spoken sentence 'Door with no lock to lock'.
Three voices are generated from a text-to-speech website [57] and played at maximum
volume through the built-in speakers on machine 1, and the audio is acquired by RTAP
through the microphone input channel. The minimum and maximum BFs are set to 250 Hz
and 6 KHz respectively, with a sampling rate of 22.05 KHz and an input stimulus scale of
50 dB SPL. The number of BF channels is set to 30, though only 28 channels are displayed
in the figures below as no activity takes place beyond 5 KHz.
As the drawing of the ERBS plots on screen utilises pixel rather than line rendering,
the results for speech are presented in the form of a real-time scatter plot. In figure 5.11, the
concentration of rendered pixels for the male voice occurs at lower frequencies up to
1.8 KHz, and for the female voices up to 2.8 KHz, which translates to the likelihood of firings
within LSR fibres of the auditory nerve in those spectral ranges. The AN HSR fibres, in
contrast, generate a more significant AN firing probability. In the plot of the HSR fibre
responses in figure 5.12, ANSP is active at all frequencies up to 4 KHz, though significant
firings take place at 1.5 KHz and 1.8 KHz for the male and female voices respectively. This
can be observed from the larger concentration of pixel clusters in the range from 250 Hz to
the BF sites in the proximity of 1.5 KHz and 1.8 KHz.
5.3 Summary
It has been demonstrated that RTAP is capable of displaying static responses of
processed data in either ERB scaled or spectrogram graphs. The equations characterising
the ERB scaled plots covered in chapter 4 are modified so that RTAP automatically
generates evenly spaced signals that do not overlap when placed in a single graph. For
spectrogram plots, thirty linearly distributed colours are utilised to define the intensities of
processed data in a single window frame. RTAP is also capable of projecting a scrolling
display of ERB scaled plots. This was illustrated in the last section of this chapter, where
ANSP responses based on audio streamed from a laptop's built-in microphone channel were
presented for speech in different voice settings.
Figure 5.11: ANSP response in LSR fibres of real-time speech illustrated in RTAP.
[Figure 5.11 axes: BF sites from 250 Hz to 4819 Hz against time (sec). Legend: Joey – USA (male), Salli – USA (female), Nicole – Australia (female).]
Figure 5.12: ANSP response in HSR fibres of real-time speech illustrated in RTAP.
Chapter 6: Optimisation and Load Profile
One essential feature of a real-time auditory pathway model is that it has to be
capable of accommodating a large number of BF channels. A simulation of a stage within a
real-time auditory pathway model with a large range of BF channels is able to project a wide
spectral range of cochlear responses. This allows the study of the perception of a wide
variety of auditory stimuli. One method of achieving large loads in terms of the number of BF
channels is to study the algorithm in the model and identify the segment that takes the
longest time to process. Following this, investigations are carried out into replacing that
segment with faster code that generates output close to the original. By reducing the
processing time in this way, there is ample room to increase the number of BF channels in
the real-time auditory pathway model.
This chapter describes mathematical optimisation as a means of reducing the
algorithm processing runtime of RTAP and increasing the number of BF channels to
represent more discrete frequency points. It will be shown that, though the number of
exponential functions used in RTAP is small, they nonetheless take up significant processing
time. Hence, a faster version of the exponential function will be implemented in RTAP and
the responses of every stage within the auditory pathway model will be exhibited. The final
topic of this chapter deals with the maximum load profiles of RTAP executed on two
computers under single and double precision execution. Within the same section, thread
profiles of RTAP are covered.
6.1 Mathematical Optimisation
6.1.1 Background
The Microsoft Visual Studio (MVS) compiler is used as the platform for building
RTAP and it offers build optimisations to speed up the program. However, the optimisation
facilities of the MVS compiler are unused, because compiler-level optimisation potentially
brings about non-deterministic mathematical responses that are impractical to debug given
the large algorithm base used in RTAP. Build optimisations are therefore forgone in RTAP,
and an alternative form of optimisation is required.
Mathematical operators and functions are used extensively in RTAP. Mathematical
operators consist of basic operations such as addition, subtraction, multiplication and
division. With the exception of multiplication and division, these operators are mapped by
the C++ library directly to assembly language instructions that use a single CPU clock cycle
when executed. Multiplication and division generally use several CPU cycles and depend
both on the available hardware resources and on the software library implementation.
However, on a CPU with a clock rate in the gigahertz range, the clock cycles used up during
the execution of these operators are insignificant due to the small tick size, which ensures
that these operators are computed rapidly [58].
Mathematical functions in RTAP generally comprise exponential and logarithmic
functions as well as trigonometric entities such as the sine and cosine functions. These are
special functions that rely on a software math library providing an abstraction layer of code
that uses basic mathematical operators to bring about the desired outcome. One example is
the exponential function, which can be computed with a Maclaurin series using basic math
operators [59]. Hence, an exponential function requires more execution time than a single
mathematical operator in a program compiled with any C++ library. In terms of algorithm
computation, RTAP is broken into two parts. The first part contains the initialisation of
constants and the computation of coefficients in the non-real-time segment of RTAP when
the 'Set' button is clicked. As this segment is not time critical, the Visual Studio C++ math
library is used for computing all the parameters within it. In the real-time segment, though, it
is essential that the executed code takes as little time as possible so as to compute the
maximum number of tasks. Hence, code optimisation, especially in the real-time segment, is
essential.
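The Maclaurin-series evaluation of the exponential mentioned above can be sketched with basic operators only; `expMaclaurin` is an illustrative name, not code from RTAP, and the incremental term update avoids computing factorials explicitly.

```cpp
#include <cmath>
#include <cassert>

// Evaluate e^x from its Maclaurin series using only basic operators:
// e^x = 1 + x + x^2/2! + x^3/3! + ...
double expMaclaurin(double x, int terms)
{
    double sum = 1.0, term = 1.0;
    for (int n = 1; n < terms; ++n) {
        term *= x / n;   // builds x^n / n! incrementally
        sum += term;
    }
    return sum;
}
```

The series converges quickly for moderate |x|, which is why each added term costs only one multiplication, one division and one addition.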
Optimisation of the basic mathematical operators in the real-time segment is forgone
and only the special mathematical functions are considered for optimisation. Exponential
and natural logarithmic functions are the only two mathematical functions generally used
throughout the real-time segment of RTAP. Of these two, the exponential function is more
widely used than the natural logarithmic function. Table 6.1 lists the mathematical functions
used in the real-time segment of RTAP and the maximum number of invocations under the
highest load profile for each stage within RTAP. It is observed that the exponential function is
invoked as many as three times more often than the natural logarithmic function when the
IHCRP response is computed, regardless of the number of BF channels. This ratio increases
to five for the NRR and ANSP responses with high and low spontaneous rate AN fibre types
utilised during runtime.
RTAP functions running in real-time | exp() in code | log() in code | Max. BF / AN channels (machine 1) | exp() calls at max. load | log() calls at max. load
DRNL          | 1 | 1 | 85      | 85  | 85
DRNL-to-IHCRP | 3 | 1 | 56      | 168 | 56
DRNL-to-NRR*  | 5 | 1 | 36 / 72 | 180 | 36
DRNL-to-ANSP* | 5 | 1 | 30 / 60 | 150 | 30
* Based on two AN fibre type channels.
Table 6.1: Non-optimised mathematical functions utilised in RTAP.
It can be deduced from table 6.1 that the mathematical functions used in the real-
time segment of RTAP are significant for a large number of BF channels. Various forms of
optimisation need to be explored to reduce the impact of these mathematical functions. One
form of mathematical optimisation is the utilisation of a different math library that offers faster
math processing while producing the same deterministic mathematical response. MVS has
its own math library that is invoked during the compilation of RTAP. An alternative to the
MVS math library is the Intel Math Kernel Library (MKL). The MKL can be used within MVS
to substitute only the math library while the MVS compiler is still used for building RTAP.
RTAP compiled with MKL can be built with three separate settings. The parallel MKL setting
optimises algorithms in threaded applications, while the sequential setting optimises
algorithms in serialised applications. Finally, the cluster setting is used for cluster computing,
where various computers are connected to work as a single computing system [60]. The
profiles of the utilisation of these libraries are presented in the next section.
Another form of optimisation is to provide an alternative method for computing the
mathematical functions other than those provided by the available math libraries. One such
method is through the manipulation of bits and the manner of reading an 8-byte floating
point (FP) variable, as proposed by Schraudolph [61]. FP variable storage in MVS is based
on the IEEE-754 standard, which is available in either 4-byte or 8-byte format [62].
Schraudolph's exponential macro function code relies on the 8-byte FP format. The 8-byte
FP has three segments, namely a 1-bit sign, an 11-bit exponent and a 52-bit mantissa.
Equation 6.1 is used to reconstruct the 8-byte binary data into a FP value.

FP value = (−1)^s (1 + m) 2^(x − x0) (Eqn. 6.1)
where s is the sign bit; m is the mantissa; x is the exponent, which is shifted by a constant
bias, x0. The FP format is illustrated in figure 6.1. For the purpose of explaining the fast
exponential function, the 8-byte FP format in figure 6.1 is further divided into two halves
represented by i and j. The principle of the fast exponentiation of an input, x, is to exploit the
exponent term of equation 6.1 after dividing x by the natural logarithm of 2, since
2^(x/ln 2) = e^x. In other words, x is manipulated as an integer via the integer i, by adding
the bias, x0, and then left shifting it. Reading back the integer representation, i, in FP format
automatically produces the exponentiation effect. This algorithm is described in greater
detail in the following paragraph.
Figure 6.1: 64-bit floating point format divided into two halves for fast exponentiation.
Extracted from Schraudolph [61].
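Equation 6.1 can be verified directly by unpacking the bit fields of a double. The sketch below assumes IEEE-754 storage and uses `memcpy` for well-defined access; the names are illustrative, not from RTAP.

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>
#include <cassert>

// Reconstruct a (normal) double from its IEEE-754 fields per equation 6.1:
// value = (-1)^s * (1 + m) * 2^(x - 1023)
double reconstruct(double v)
{
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    int s = static_cast<int>((bits >> 63) & 1);        // 1-bit sign
    int x = static_cast<int>((bits >> 52) & 0x7FF);    // 11-bit biased exponent
    // 52-bit mantissa as a fraction in [0, 1): divide by 2^52
    double m = (bits & 0xFFFFFFFFFFFFFULL) / 4503599627370496.0;
    return (s ? -1.0 : 1.0) * (1.0 + m) * std::ldexp(1.0, x - 1023);
}
```

For any normal double the reconstruction is exact, confirming the field widths and the bias of 1023.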
The first step in Schraudolph's exponentiation of a variable x is to add the bias,
x0, which is the constant 1023, to x. Subsequently, the result is shifted to the left by 20 bits
through multiplication by the constant 2^20 so that the high order bits of x reside in the
exponent segment. If x is a floating point variable, the fractional part of its biased, 20-bit-
shifted value spills over into the highest order bits of the mantissa. This outcome effectively
provides a linear interpolation between adjacent integer exponents, behaving like a lookup
table of 2^11 linearly interpolated entries. Scaling the variable x by the 20-bit shift constant,
2^20, and dividing it by the natural logarithm of 2 before adding the 20-bit shifted bias results
in the exponentiation of x. The fast exponential function can be characterised by the
following equation.
i := a·x + (b − c) (Eqn. 6.2)

where i is the integer representation of e^x;
x is the input parameter to be exponentiated;
a is the left bit-shifted scalar, given by 2^20/ln(2);
b is the bias shifted left by 20 bits, given by 1023 · 2^20;
c is a control parameter that adjusts the approximation of the fast exponential function.
6.1.2 Implementation
The fast exponential function is defined as macro code and is shown in listing 6.1.
A C/C++ union declaration contains an 8-byte FP number as well as two 4-byte integers. For
a single precision computation with a 4-byte FP, the FP is type cast to an 8-byte FP
variable. In other words, the exponent and mantissa of the 4-byte FP are padded with an
extra 32 bits of data before being stored in an 8-byte memory location. A union declaration
allocates memory for its largest member, which in this case is 8 bytes due to the 8-byte
double variable. The two integer variables are laid out in little endian format and store the
8-byte FP variable in two halves; this is possible as the two integer variables, with a total
size of 8 bytes, share the same memory space as the double variable. Hence, the macro
EXP(x) first computes the integer equivalent of the exponentiation of the variable x by
manipulating the integer representation of the upper 4 bytes of the FP variable in the integer
i1. The lower 4 bytes in the integer j1 are ignored. The two integers are then read back as a
FP variable, which yields the approximation of the exponential of x.
static union {
    double d;
    struct {
#ifdef LITTLE_ENDIAN
        int j1, i1;
#else
        int i1, j1;
#endif
    } n;
} eco;

#define LN2   0.69314718056          // natural log of 2
#define EXP_A (1048576/LN2)          // 2^20 / ln(2)
#define EXP_B 1072693248             // 1023 * 2^20
#define EXP_C 60801                  // based on lowest RMS relative error
#define EXP(y) (eco.n.i1 = EXP_A*(y) + (EXP_B - EXP_C), eco.d)
Listing 6.1: Fast exponential computation.
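The macro in listing 6.1 can be exercised against the standard library exponential as a sanity check. The sketch below assumes a little-endian machine (hence the invented `LITTLE_ENDIAN_MACHINE` define) and uses a loose 5% relative error bound as an illustrative tolerance; it is a test harness written for this discussion, not part of RTAP.

```cpp
#include <cmath>
#include <cassert>

#define LITTLE_ENDIAN_MACHINE 1  // assumption: x86-style byte order

static union {
    double d;
    struct {
#if LITTLE_ENDIAN_MACHINE
        int j1, i1;   // low word first; the exponent half lands in i1
#else
        int i1, j1;
#endif
    } n;
} eco;

#define LN2   0.69314718056
#define EXP_A (1048576 / LN2)      // 2^20 / ln(2)
#define EXP_B 1072693248           // 1023 * 2^20
#define EXP_C 60801                // tuned for lowest RMS relative error
#define EXP(y) (eco.n.i1 = EXP_A * (y) + (EXP_B - EXP_C), eco.d)

// Relative error of the fast approximation against std::exp.
double relError(double x)
{
    return std::fabs(EXP(x) - std::exp(x)) / std::exp(x);
}
```

On a little-endian machine the approximation stays within a few percent of the true exponential over the moderate argument range used in RTAP.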
The execution time profiles for the exponential function based on the MVS and MKL
math libraries, as well as Schraudolph's fast implementation, measured on machine 1, are
shown in table 6.2. The timings are based on the exponential function used in an iterative for
loop that invokes the respective exponential function 1280 times. The number of invocations
is identical to the number of sampled audio data points acquired in one window frame at a
sampling frequency of 22.05 KHz. The code listing for the profile test is as follows:
x = -5; // constant exponent value

// Start time for math library based exponential function
for (i = 0; i < 1280; i++)
{
    y = exp(x); // MKL library based exponential function
}
// End time for math library based exponential function

// Start time for Schraudolph fast exponential function
for (i = 0; i < 1280; i++)
{
    y = EXP(x); // Schraudolph implementation of fast exponential function
}
// End time for Schraudolph fast exponential function
Listing 6.2: Code for comparing MKL & Schraudolph exponential function.
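One way to realise the "start time"/"end time" comments in listing 6.2 is with `std::chrono`; the helper below is a generic sketch (the name `timeInvocations` is invented), not the thesis's actual measurement code.

```cpp
#include <chrono>
#include <cmath>
#include <cassert>

// Time 'count' invocations of a callable, matching the 1280 samples of
// one window frame at 22.05 KHz, and return the elapsed milliseconds.
template <typename F>
double timeInvocations(F f, int count = 1280)
{
    auto start = std::chrono::steady_clock::now();
    volatile double sink = 0.0;   // keeps the calls from being optimised away
    for (int i = 0; i < count; ++i)
        sink = f(-5.0);
    (void)sink;
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

`steady_clock` is used rather than `system_clock` because it is monotonic, which matters when timing short loops.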
From table 6.2 it is observed that the performance of the exponential function from
the MVS and MKL math libraries is relatively similar. However, the Schraudolph exponential
implementation is more than two times faster than either C++ math library. Hence, due to
this significant performance enhancement, the macro code of Schraudolph's fast exponential
function is implemented in a global header file within the structure of RTAP.
Compiler Options | C++ math lib exp() (ms) | Schraudolph EXP() (ms) | Speedup with Schraudolph exponential function
Release Mode / No Intel IPP / No Intel MKL (MVS math library used) | 0.0609923 | 0.02724686 | 2.238507483
Release Mode / Intel IPP on / No Intel MKL (MVS math library used) | 0.0602093 | 0.02732512 | 2.203441376
Release Mode / Intel IPP on / Intel MKL on (Parallel) | 0.06052258 | 0.0271686 | 2.227666497
Release Mode / Intel IPP on / Intel MKL on (Sequential) | 0.06067916 | 0.02724688 | 2.227013148
Release Mode / Intel IPP on / Intel MKL on (Cluster) | 0.06130546 | 0.02732514 | 2.243555202
Table 6.2: Performance comparison of exponential function in MVS and MKL math libraries
and Schraudolph algorithm on machine 1.
6.1.3 Optimised RTAP Responses
The results exhibited in this subsection are acquired from RTAP with the settings of
table 4.2 and with Schraudolph fast exponential function enabled. Additionally, the version of
RTAP used is compiled on MVS with Intel MKL parallel settings enabled as well. The results
of all the stages within the AP model with fast exponential function enabled were acquired
126
from machine 2 with maximum number of BF channels used and algorithm processing time
approximately close to 58 ms. The results of machine 1 is identical to that of machine 2
except the maximum number of BF channels used in machine 1 is lower than machine 2.
The larger of the two maximum BF load between machines 1 and 2 is selected for projection
in this subsection.
The processing time for generating the responses from all stages within RTAP has
been lowered by the use of the fast exponential function. Hence, RTAP is able to load more
BF channels. This additional loading will be discussed in detail in the following section.
Figure 6.2 illustrates the effects of exponential function optimisation on the DRNL response
obtained from machine 2. The result in figure 6.2 is identical to the non-optimised result in
figure 5.4, with the exception that the results in figure 6.2 were generated with a larger
number of BF channels. With the fast exponential function disabled, the number of BF
channels that can be loaded on to RTAP running on machine 2 is 178. With the fast
exponential function utilised in the compressive function in the nonlinear branch of the DRNL
computation, RTAP on machine 2 is able to accommodate 192 BF channels to compute the
DRNL response.
Figure 6.2: Dual resonance nonlinear (DRNL) response generated in RTAP based on optimised exponential function for 192 BF channels.
The IHCRP response computed with the fast exponential function in RTAP differs
from the non-optimised original plot in figure 5.5. The positive half cycles of the signals
around the 500 Hz BF site at the stimulus frequency in the IHCRP response in figure 6.3 are
maintained, while the magnitudes of the negative half cycles of the signals are made
more negative than the original response in figure 5.5. This is evident from the appearance
of the IHCRP response in figure 6.3, which contains more blue coloured pixels than the plot
in figure 5.5. The cause of this is the quantisation error introduced in the approximation of
the two fast exponential functions used in equation 2.13. Thereafter, through equation 2.15,
the cumulative effect of the quantisation error residing in G(u) is amplified and the magnitude
of the IHCRP response evaluates to less than the non-optimised IHCRP response. In other
words, the reduced magnitude resulting from the use of the fast exponential function
increases the negativity of the overall response from the IHCRP stage.
Figure 6.3: Inner hair cell receptor potential (IHCRP) response generated in RTAP based on optimised exponential function for 155 BF channels.
The deviation in the apical conductance, G(u), caused by the multiplication of the two fast
exponential functions in equation 2.13 propagates adversely to all subsequent stages of the
auditory pathway. The negative DC offset of the negative half cycles in the IHCRP suppresses
the negative cycles of the signals in the neurotransmitter release rate (NRR) and auditory
nerve spiking probability (ANSP) stages. Both the low (LSR) and high spontaneous rate (HSR)
fibre responses are affected. The quantisation errors are especially detrimental to the
computation of the ANSP HSR fibre, destabilising the algorithm response after only 50 data
sets within the first BF channel. The instability introduced by the accumulated quantisation
errors, especially from the multiplication of the fast exponential functions in equation
2.13, scales with the amplitudes of the HSR ANSP fibres, which linger in the range of 10^1. Due
to the larger range of amplitudes generated by the HSR fibre algorithm compared with the
other stages and the LSR fibre, unstable behaviour sets in at the start of the ANSP HSR
computation. Figure 6.4 shows the unstable HSR ANSP response of the 250 Hz BF point computed
with the Schraudolph exponential function, which diverges indefinitely toward negative infinity.
Figure 6.4: Unstable HSR ANSP response after the refractory period at the start of computation.
To eradicate this computational inconsistency in the response from the IHCRP stage,
equation 2.13 had to be reviewed. Since revamping the equation was not an option, a basic
restructuring of the computation was required. Equation 2.13 can therefore be expanded as
follows:
G(u) = G_{cilia}^{max}\left\{1 + \exp\left(-\frac{u(t)}{s_0}\right)\exp\left(\frac{u_0}{s_0}\right)\left[1 + \exp\left(-\frac{u(t)}{s_1}\right)\exp\left(\frac{u_1}{s_1}\right)\right]\right\}^{-1} + G_a (Eqn. 6.3)
One observation that follows is that the parameters u0, s0, u1 and s1 are constants, so the
two exponential functions involving only these constants can be pre-computed. Moreover,
because there is no time constraint on pre-computation, the exponential function from the
Intel MKL can be used to compute them. The two other exponential functions in equation 6.3,
which handle u(t), the IHC cilia displacement varying over time, are implemented with the
Schraudolph fast exponential function in real time. As a result, instead of relying solely
on the fast exponential functions to compute the apical conductance, G(u), the computational
load is shared between the exponential functions from the Intel MKL and the Schraudolph fast
exponential approximation.
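The restructured computation can be sketched as below. Parameter names follow equation 6.3; the use of std::exp both for the pre-computation (Intel MKL in RTAP) and for the real-time part (Schraudolph's approximation in RTAP) is an assumption made to keep the sketch self-contained.

```cpp
#include <cmath>

// Sketch of the restructured apical conductance computation of equation 6.3.
// The constant-argument exponentials are pre-computed once with an accurate
// library exp; only the u(t)-dependent exponentials run in real time, where
// RTAP substitutes the Schraudolph approximation.
struct ApicalConductance {
    double Gmax, Ga;        // maximum mechanically gated and passive conductance
    double u0, s0, u1, s1;  // Boltzmann offsets and sensitivities
    double k0, k1;          // pre-computed exp(u0/s0) and exp(u1/s1)

    void precompute() {
        k0 = std::exp(u0 / s0);  // no real-time constraint here
        k1 = std::exp(u1 / s1);
    }

    double evaluate(double u) const {
        // In RTAP these two calls use the fast exponential approximation.
        const double e0 = std::exp(-u / s0);
        const double e1 = std::exp(-u / s1);
        return Gmax / (1.0 + k0 * e0 * (1.0 + k1 * e1)) + Ga;
    }
};
```

Only two of the four exponentials now carry the approximation error at runtime, which is what tames the accumulated quantisation error described above.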
Figure 6.5: IHCRP response displayed in RTAP based on optimised exponential function for 155 BF channels.
There is a vast difference between the IHCRP response in figure 6.5 and that in figure 6.3.
The blue pixels that dominated the spectrogram in figure 6.3, a result of the negative DC
offset in the negative cycles from the fast exponential function, have diminished in the
spectrogram of figure 6.5. In fact, the spectrogram in figure 6.5 matches that in figure
5.5, except that it accommodates more BF channels generated from machine 2, resulting in a
higher resolution spectrogram. The BF loading of the non-optimised RTAP running on machine 2
simulating the IHCRP stage is 125 BF channels, while it is 155 BF channels with the fast
exponential enabled.
The change in the computing structure of G(u) has a profound effect on all stages from
IHCRP to ANSP, as observed from the spectrograms in figures 6.6 to 6.9 generated with the
fast exponential function. Firstly, all of the plots resemble the non-optimised responses in
figures 5.5 to 5.9 generated with machine 1. Secondly, the modified G(u) computing approach
of equation 6.3 has eradicated the instability in the ANSP HSR fibre response recorded with
the fast exponential functions. For the NRR LSR, NRR HSR, ANSP LSR and ANSP HSR responses of
figures 6.6 to 6.9, machine 2 with the fast exponential function disabled sustains BF loads
of 100, 78, 89 and 65 respectively. With the fast exponential function enabled in RTAP on
machine 2, the BF loads for NRR LSR, NRR HSR, ANSP LSR and ANSP HSR stand at 123, 104, 107
and 79 respectively.
Figure 6.6: Neurotransmitter release rate (NRR) response for low spontaneous rate (LSR) fibre displayed in RTAP based on optimised exponential function for 123 BF channels.
Figure 6.7: Neurotransmitter release rate (NRR) response for high spontaneous rate (HSR) displayed in RTAP based on optimised exponential function for 104 BF channels.
Figure 6.8: Auditory nerve spiking probability (ANSP) response for low spontaneous rate (LSR) displayed in RTAP based on optimised exponential function for 107 BF channels.
Figure 6.9: Auditory nerve spiking probability (ANSP) response for high spontaneous rate (HSR) fibre displayed in RTAP based on optimised exponential function for 79 BF channels.
6.1.4. MAP and Optimised RTAP Responses Comparisons
The comparison of responses between MAP and the optimised RTAP using the Schraudolph
exponential function is discussed in this section. The responses are acquired from all
stages, from stapes displacement in the middle ear to auditory nerve spiking probability
(ANSP). As RTAP-numerical has an identical response to RTAP, the results in this section are
generated from RTAP-numerical. With settings identical to those in section 3.9, input sine
tones of 500 Hz, 1000 Hz, 3000 Hz and 5000 Hz were injected into the MAP and RTAP-numerical
models, and the BFs were selected as 250 Hz, 1039 Hz, 3109 Hz and 5377 Hz respectively. The
input sine tones were also varied in the range of 10 dB SPL to 90 dB SPL for every input
sine tone frequency and BF.
Figures 6.10 to 6.13 show that the lowest deviation between MAP and RTAP occurs in the
computation of stapes displacement, where the Schraudolph exponential function is not in
use. This error remains identical to the RMS error measured for stapes displacement in
figures 3.16 to 3.19. The RMS error increases for the BM displacement stage because each
pass of the DRNL filter comprises a Schraudolph fast exponential function as part of the
nonlinear computation. Two additional Schraudolph fast exponential functions implemented
in the IHCRP stage cause a further increase in the RMS errors. The largest deviations
between MAP and RTAP occur in the computation of responses for LSR and HSR fibres at
the NRR and ANSP stages where the errors due to Schraudolph fast exponential function
from all stages accumulate.
RMS errors between MAP and the optimised RTAP-numerical are observed in figures 6.10, 6.11
and 6.12 to be under 5% for responses corresponding to the 500 Hz, 1000 Hz and 3000 Hz sine
tone inputs, whereas the errors for the 5000 Hz sine tone input are higher, just under 8%,
in figure 6.13. In comparison, the RMS errors observed in figures 3.16 to 3.19 are below 1%
with the Schraudolph fast exponential function disabled. The approximations used in the
Schraudolph fast exponential introduce larger output quantisation errors than the
exponential computations in the MKL and MVS math libraries. Hence, the Schraudolph function
enhances the computing speed of an exponential function while reducing its accuracy.
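The error measure used in these comparisons can be computed as below; normalising by the RMS of the MAP reference is an assumption about the exact definition used here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalised RMS error between a reference response (e.g. MAP) and a test
// response (e.g. optimised RTAP-numerical) of equal length.
double normalised_rms_error(const std::vector<double>& ref,
                            const std::vector<double>& test) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = ref[i] - test[i];
        num += diff * diff;     // accumulate squared deviations
        den += ref[i] * ref[i]; // accumulate reference energy
    }
    return std::sqrt(num / den); // 0.05 corresponds to the 5% figure quoted above
}
```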
Figure 6.10: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 500 Hz sine tone input observed from a 250 Hz BF channel.
Figure 6.11: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 1000 Hz sine tone input observed from a 1039 Hz BF channel.
Figure 6.12: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 3000 Hz sine tone input observed from a 3109 Hz BF channel.
Figure 6.13: Normalised RMS errors for various responses between MAP and optimised RTAP based on a 5000 Hz sine tone input observed from a 5377 Hz BF channel.
6.2 Load Profile
6.2.1 Maximum Load
Load profile refers to the maximum number of BF channels that can be
accommodated by RTAP during its runtime where the algorithms are being processed with
the input signal set based on table 4.2. The algorithms comprise the processing of the OME
stage and any one of the four functions described in table 3.2. More specifically, the
number of BF channels is set to an arbitrary value the first time RTAP is executed on a
computer. As long as the processing time of the algorithms is less than 58 ms, the parameter
controlling the number of BF channels in the RTAP UI is increased. The maximum number of BF
channels is reached when the processing time of the algorithms approaches, but does not
exceed, 58 ms. The maximum load of RTAP depends on the hardware and software of the computer
it runs on and thus varies between computers.
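The search just described can be sketched as follows; measure_ms is a hypothetical stand-in for timing one processing window with a given number of BF channels.

```cpp
#include <functional>

// Sketch of the maximum-load search: grow the channel count while one
// processing window still completes within the 58 ms budget, then report the
// largest count that stayed under it.
int find_max_bf_channels(const std::function<double(int)>& measure_ms,
                         double budget_ms = 58.0) {
    int n = 1;
    while (measure_ms(n + 1) < budget_ms)
        ++n; // still under budget with one more channel, so keep growing
    return n; // largest channel count processed within the budget
}
```

In practice the starting point is the previously found maximum for that machine, so only a few trial measurements are needed per stage.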
Figures 6.14 and 6.15 distinguish the load profiles of the math optimised and non-optimised
single precision (SP) executions of RTAP. Similarly, figures 6.16 and 6.17 present load
profiles for the optimised and non-optimised double precision (DP) executions of RTAP. In
each figure, the load profile is further broken down by the computer RTAP is executed on as
well as the process priority of RTAP. Math optimised execution is defined as running the
algorithms in RTAP with the Schraudolph fast exponential functions enabled. Single precision
execution refers to the computation and projection of data in a 4-byte (32-bit) format,
while double precision execution refers to the use of 8-byte (64-bit) variables at the
runtime of the algorithms.
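The SP and DP executions can share one code path templated on the sample type, a sketch of the dual-precision build described above; the compressive broken-stick nonlinearity shown is illustrative rather than RTAP's exact DRNL code.

```cpp
#include <cmath>

// One templated code path serves both precisions: Real = float for the SP
// build, Real = double for the DP build. The function is a generic signed
// broken-stick compression, sign(x) * min(a*|x|, b*|x|^c).
template <typename Real>
Real broken_stick(Real x, Real a, Real b, Real c) {
    const Real lin = a * std::fabs(x);               // linear branch
    const Real cmp = b * std::pow(std::fabs(x), c);  // compressive branch
    const Real mag = lin < cmp ? lin : cmp;          // take the smaller response
    return x < Real(0) ? -mag : mag;                 // restore the sign
}
```

Because the two instantiations differ only in rounding, the SP and DP responses coincide to within single precision, consistent with the near-identical loads reported below.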
One clear distinction is that RTAP can accommodate more than 175% as many BF channels on
machine 2 as on machine 1 for both non-optimised and optimised algorithm processing, and for
both SP and DP execution. Hence, the additional 400 MHz of clock speed on the dual core CPU
in machine 2 provides a significant boost to the computations. It must be noted that
although the i5 processor in machine 1 is a dual core CPU, it has a different architecture
from the CPU in machine 2. Likewise, differences between machines 1 and 2 in terms of
motherboard layout and the access times of the random access memory (RAM), hard disk drive
(HDD) and graphics display contribute to the large load increase in RTAP on machine 2.
Another key observation is that the maximum loads on both machines depend on the priority
at which RTAP runs. RTAP running under real-time priority on machine 2 can run the most BF
channels compared with any other priority. Real-time priority, however, is unavailable on
machine 1 due to the manufacturer's setup of the machine. Nevertheless, increasing the
priority level on machines 1 and 2 from the lowest to the highest for the same stage within
RTAP leads to a gradual increase in the number of BF channels. The increase is most
significant when RTAP is executed at real-time priority on machine 2, for all stages,
because the CPU gives RTAP its undivided attention by processing it above all other
processes, including those of the OS. This is observed on machine 2 as lag in the mouse and
graphics responses on the display screen, indicating that the CPU is prioritising the
execution of RTAP over
the interleaved tasks of the mouse and graphics devices scheduled by the Windows OS.
On a stage-by-stage comparison, the maximum load for the DRNL stage on both machines far
exceeds that of any other stage owing to its computationally light algorithm, or more
specifically its use of a conventional signal processing algorithm, the IIR filter. The BF
load falls from the IHCRP computation onwards, largely because the processing times of the
algorithms in the preceding stages of the auditory pathway accumulate. At the ANSP stage,
the computation of the two AN fibre types produces the lowest loading on RTAP because all
the algorithms in the AP model are in full operation, resulting in an accumulated,
computationally intensive workload.
The maximum loads for the math optimised computations of all stages of RTAP on machines 1
and 2 exceed those of the non-optimised computations by ratios of approximately 115% and
120% respectively. This shows that the exponential functions from the C++ math library
contribute significantly to the processing time. The effect is most observable for a large
number of BF channels and when analysing the responses at the NRR and ANSP stages, where the
number of exponential calls accumulates from all earlier stages and scales with the number
of BF channels as well as the number of AN fibre types. The SP and DP versions of RTAP
produce identical responses for all stages in the AP model, and the difference in their
maximum loads fluctuates at approximately 0.3% across all stages.
Figure 6.14: Maximum load profile for non-optimised single precision execution of RTAP on machines 1 and 2.
Figure 6.15: Maximum load profile for optimised single precision execution of RTAP on machines 1 and 2.
Figure 6.16: Maximum load profile for non-optimised double precision execution of RTAP on machines 1 and 2.
Figure 6.17: Maximum load profile for optimised double precision execution of RTAP on machines 1 and 2.
6.2.2 Thread Profile
In section 4.5, the utilisation of threads was discussed. The main motivation for using
threads is to achieve parallelism in RTAP. Figures 6.18, 6.19 and 6.20 depict the profiles
of three threads, excluding the algorithm threads used in RTAP. One general observation from
these profiles is that the processing times of the threads do not depend on the process
priority at which RTAP runs. This characteristic is attributed to the non-deterministic task
scheduler of the Microsoft Windows OS kernel, as explained in section 2.3. It is especially
evident at the highest priority settings on machines 1 and 2, where the CPU attends more
closely to RTAP and yet the processing times of all thread profiles still fluctuate.
The pixel rendering thread, invoked by the algorithm thread just before its conclusion, has
the least impact on processing time. This is because it contains no computationally
intensive algorithms and spends the majority of its time reading from and writing to image
buffers. The draw image function wrapped around the main RTAP thread, by contrast, is
invoked every 58 ms to project the pixels in the image buffer onto the screen. This thread
has the highest average processing time of the three non-algorithm threads. Its significant
processing time is attributed to the tasks in the drawImage function that transfer the
contents of the image buffer from memory to the display hardware, which eventually draws the
pixels onto the computer screen. The record thread profile is based on the average of two
recording instances, triggered by the ‘Play+Record’ and ‘Record’ buttons respectively. Its
processing time is the second most significant after the main thread that draws pixels onto
the screen. This can be attributed to accesses of non-volatile memory such as the hard disk
drive (HDD) being slower than accesses of volatile memory such as the random access memory
(RAM).
On each machine, the sum of the average processing times of all threads, including the
algorithm thread, exceeds the 58 ms limit imposed by the 44.1 kHz sampling rate of the
underlying audio library, DirectSound, in RTAP. Even with an inactive record thread, the
processing times of the remaining threads still exceed the 58 ms benchmark. Without threads,
a single main thread would have to perform all the tasks in RTAP, namely algorithm
processing, processed data recording and processed data display, and would exceed 58 ms,
making complete task execution impossible. Therefore, the use of threads bears testament to
the task parallelism and operational integrity achieved in RTAP.
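The parallelism discussed above can be sketched as follows; the per-channel processing function is a hypothetical placeholder, and RTAP's own threads (render, record, display) divide the work differently, but the principle of keeping each thread's share within the 58 ms window is the same.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counter used to verify that every channel was processed exactly once.
std::atomic<int> channels_done{0};

// Placeholder for one BF channel's algorithm pass.
void process_channel(int /*channel*/) {
    channels_done.fetch_add(1, std::memory_order_relaxed);
}

// Divide the BF channels among worker threads so that no single thread has
// to fit all of the work into the 58 ms window.
void process_all_channels(int num_channels, int num_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([=] {
            // Each worker processes a strided subset of the channels.
            for (int i = t; i < num_channels; i += num_threads)
                process_channel(i);
        });
    }
    for (auto& w : workers) w.join(); // wait for the whole window's work
}
```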
Figure 6.18: Pixel render thread profile of RTAP on machines 1 and 2.
Figure 6.19: Record thread profile of RTAP on machines 1 and 2.
Figure 6.20: Onscreen signal display profile for maximum load in RTAP for machines 1 and 2.
6.3 Summary
A fast exponential function has been implemented in RTAP. Although it increases the load
profile of RTAP, it causes instability in the response of the ANSP HSR stage and renders
that stage unworkable. Segmenting the computing structure of the apical
conductance equation in the IHCRP into pre-computed and real-time segments allowed an even
utilisation of the exponential functions from the Intel MKL and the Schraudolph optimised
implementation. This eradicated the instability in the ANSP HSR fibre response computation
and enabled higher BF channel loading, at the expense of lower accuracy in the response of
RTAP regardless of the AP model stage. Load profiles of RTAP running on a desktop and a
laptop are provided. The desktop, with a faster-clocked CPU, can accommodate more BF
channels than the laptop. Running RTAP at higher process priorities results in an increased
BF load. Single and double precision executions of RTAP bear similar BF load results.
Chapter 7: Summary, Recommendations and Conclusion
7.1 Summary
Five auditory pathway (AP) computer models have been reviewed and the MAP model has been
selected for real-time implementation. The algorithms selected for real-time implementation
include basilar membrane (BM) displacement, inner hair cell receptor potential (IHCRP),
neurotransmitter release rate (NRR) and auditory nerve spiking probability (ANSP). A
transition program, RTAP-numerical, was developed in C to ensure that the algorithm
responses ported from MAP to C for real-time implementation match the MAP model. The RMS
errors between MAP and RTAP-numerical for all stages in the auditory model, from stapes
displacement to auditory nerve spiking probability (ANSP) responses, are below 1%. As a
result of the insignificant differences between the MAP and RTAP-numerical responses, the
algorithms were then incorporated into a C++ GUI library called JUCE, which is used for the
implementation of a real-time GUI based program on the Windows operating system. To achieve
real-time operation, the algorithms from RTAP-numerical are wrapped in a C++ class and
integrated with POSIX threading APIs and timer callback functions.
The real-time auditory pathway simulator, RTAP, is able to process built-in generated sine
tones as well as real-world audio data acquired from the microphone channel of the computer
it is running on. Static displays of real-time responses are available in ERB-scaled and
spectrogram formats that allow signals generated from multiple BF channels to be displayed
on a single graph. The mathematical optimisation implemented to approximate exponential
functions increases the number of discrete BF points, which differs with the computing
process priority of the real-time model and the stage of the auditory pathway being
simulated. However, owing to the quantisation errors inherent in its approximation, the
optimised exponential function reduces the accuracy of the responses generated in RTAP by as
much as 8% in exchange for the increased BF channel loading.
Load profiles indicate that the maximum number of BF channels is achieved at the BM
displacement stage and that this number reduces at every subsequent stage of the auditory
model. The optimised exponential function used in RTAP increases the load of the simulator
by approximately 10% on the laptop and the desktop with the dual core CPUs specified in
table 4.1, although these load profiles for non-optimised and optimised execution of RTAP
vary on different computers.
7.2 Recommendations
While this document presents the implementation of a real-time auditory pathway (AP) model
that resulted in the emergence of RTAP, more functionality could be accomplished by adopting
the following operational features in future editions of the model:
1) The current record feature in RTAP records only two window frames of data, due to the
overheads involved in disk access that increase processing time. A real-time logger
implemented in RTAP would be able to record multiple window frames of data indefinitely
for the duration of the algorithms' runtime.
2) Besides sine tones and the audio signal from the microphone channel of a computer,
various other input stimuli can be implemented, such as step, impulse and sawtooth
signals. An alternative input stimulus source is audio files of various formats such as
wav, mp3 and ogg.
3) Upstream stages of the MAP model excluded from this version of RTAP, including AN
spiking, cochlear nucleus and brainstem-level computations, can be introduced in future
versions. The effects of the acoustic reflex and medial olivocochlear (MOC) feedback
modules can also be added to the OME and BM stages respectively.
4) Directional audio filters can be added to the outer ear stage to tune the magnitude of
the input stimulus based on the bearing of the stimulus source.
5) Algorithms utilised in RTAP can be parallelised across separate threads for further
optimisation. Feasible stages for such an implementation are from the BM displacement
computation onwards, where the BF channels can be segmented into smaller groups and a
thread allocated to each group for computational speedup.
6) RTAP runs only on Microsoft Windows. Its versatility would be enhanced if it could run
across other operating system (OS) platforms. RTAP could also be run on real-time
operating systems (RTOS) such as QNX, VxWorks and RTLinux, and the resulting profiles
compared with the performance profiles of running RTAP on a general purpose operating
system (GPOS) such as Windows or Linux.
7) Faster graphics rendering libraries in future implementations of RTAP could accelerate
image drawing from buffers to the screen. As an alternative to the JUCE library, OpenGL,
a low-level graphics library, could be used to speed up the projection of processed data
onto the computer screen, though modifications to the RTAP UI may be required.
7.3 Conclusion
A real-time computational model of the auditory pathway has been developed for the
Microsoft Windows operating system (OS) using C++. It is based on the Matlab Auditory
Periphery (MAP) model and is able to simulate cochlear functions such as basilar membrane
displacement and auditory nerve firing on the fly, using either sine tones or streamed
real-world audio. The root-mean-squared (RMS) errors between the responses of the real-time
simulator and the MAP model are less than 1%. The output of the real-time simulator is
presented in logarithmically scaled channels on either an equivalent rectangular bandwidth
or a spectrogram graph. Through the use of POSIX threads, computing parallelism is achieved
that complements the real-time processing. An increase in channel loading is achieved
through mathematical optimisation, although the accuracy of the responses with respect to
the MAP model drops to an RMS error of 8%. Running the real-time auditory pathway simulator
on a faster multi-core processor will increase channel loading, though this has not been
verified on OS platforms other than Microsoft Windows.
A real-time auditory pathway simulator is an essential tool for research in the fields of
neuroscience and engineering. In neuroscience, a real-time AP model can interface with
models simulating upstream functions of the auditory pathway and the brain, providing a
large simulation model for studying the complexities of the separate regions of the brain
[63]. In engineering, research tools of this kind have led to real-time noise reduction in
cell phones through algorithms developed from the study and modelling of the cochlea [7].
Cochlear implant designers can assess the feasibility of porting algorithms of an AP model
by studying the behaviour of a real-time AP model [5]. Enhanced real-time speech and music
processing software and hardware tools can be developed with the aid of the real-time AP
model used as a perceptual model. Therefore, the real-time computer auditory model developed
in this research project has the potential to be utilised in a wide variety of applications.
Bibliography
[1] R. Meddis and E. A. Lopez-Poveda, “Overview,” in Computational Models of the Auditory System, no. 1954, R. Meddis, E. A. Lopez-Poveda, R. R. Fay, and A. N. Popper, Eds. New York Dordrecht Heidelberg London: Springer, 2010, pp. 1–6.
[2] M. P. Cooke, “A Computer Model of Peripheral Auditory Processing Incorporating Phase-Locking, Suppression and Adaptation Effects,” Speech Communication, vol. 5, pp. 261–281, 1986.
[3] R. D. Patterson, M. H. Allerhand, and G. Christian, “Time-domain Modelling of Peripheral Auditory Processing: A Modular Architecture and a Software Platform,” The Journal of the Acoustical Society of America, vol. 98, no. 4, pp. 1890–1894, 1995.
[4] X. Zhang, M. G. Heinz, I. C. Bruce, and L. H. Carney, “A Phenomenological Model for the Responses of Auditory-Nerve Fibers: I. Nonlinear Tuning,” The Journal of the Acoustical Society of America, vol. 109, no. 2, pp. 648–670, 2001.
[5] B. S. Wilson, E. A. Lopez-Poveda, and R. Schatzer, “Use of Auditory Models in Developing Coding Strategies for Cochlear Implants,” in Computational Models of the Auditory System, 2010, pp. 237–260.
[6] R. Meddis and E. A. Lopez-Poveda, “Auditory Periphery: From Pinna to Auditory Nerve,” in Computational Models of the Auditory System, R. Meddis, E. A. Lopez-Poveda, R. R. Fay, and A. N. Popper, Eds. New York Dordrecht Heidelberg London: Springer, 2010, pp. 7–38.
[7] L. Watts, “Real-time, High-Resolution Simulation of the Auditory Pathway, with Application to Cell-Phone Noise Reduction,” ISCAS, pp. 3821–3824, 2010.
[8] R. R. Pfeiffer, “A Model for Two-tone Inhibition of Single Cochlear Nerve Fibres,” The Journal of the Acoustical Society of America, vol. 48, no. 6B, pp. 1373 – 1378, 1970.
[9] J. O. Smith and J. S. Abel, “Bark and ERB Bilinear Transforms,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 697–708, 1999.
[10] M. E. Lutman and A. M. Martin, “Development of an Electroacoustic Analogue Model of the Middle Ear and Acoustic Reflex,” Journal of Sound And Vibration, vol. 64, no. 1, pp. 133–157, 1979.
[11] R. D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang, and M. Allerhand, “Complex Sounds And Auditory Images,” Auditory Physiology and Perception, Proc. 9th International Symposium on Hearing, no. 1992, 1992.
[12] C. Giguere and P. C. Woodland, “A Computational Model of the Auditory Periphery for Speech and Hearing Research. I. Ascending Path,” The Journal of the Acoustical Society of America, vol. 95, no. 1, pp. 331 – 342, 1994.
[13] R. Meddis, “Simulation of Mechanical to Neural Transduction in the Auditory Receptor,” The Journal of the Acoustical Society of America, vol. 79, no. 3, pp. 702–711, 1986.
[14] J. L. Goldstein, “Modeling Rapid Waveform Compression on the Basilar Membrane as Multiple-Bandpass-Nonlinearity Filtering,” Hearing Research, vol. 49, pp. 39–60, 1990.
[15] T. Lin and J. L. Goldstein, “Implementation of the MBPNL Nonlinear Cochlear I/O Model in the C Programming Language, and Applications for Modeling Impaired Auditory Function,” in Modeling Sensorineural Hearing Loss, W. Jesteadt, Ed. New Jersey: Lawrence Erlbaum Associates, Inc., 1997, pp. 67 – 78.
[16] R. Meddis, “Matlab Auditory Periphery (MAP) Model Technical Description.” Essex, pp. 1 –32, 2011.
[17] R. Meddis, L. P. O’Mard, and E. A. Lopez-Poveda, “A Computational Algorithm for Computing Nonlinear Auditory Frequency Selectivity,” The Journal of the Acoustical Society of America, vol. 109, no. 6, pp. 2852 – 2861, 2001.
[18] E. A. Lopez-Poveda and R. Meddis, “A Human Nonlinear Cochlear Filterbank,” The Journal of the Acoustical Society of America, vol. 110, no. 6, pp. 3107 – 3118, 2001.
[19] C. J. Sumner, E. A. Lopez-Poveda, L. P. O’Mard, and R. Meddis, “A Revised Model of the Inner-Hair Cell and Auditory-Nerve Complex,” The Journal of the Acoustical Society of America, vol. 111, no. 5, pp. 2178 – 2188, 2002.
[20] R. Patterson and T. Walter, “AIM-C,” 2009. [Online]. Available: http://code.soundsoftware.ac.uk/projects/aimc.
[21] R. F. Lyon, M. Rehn, S. Bengio, T. C. Walters, and G. Chechik, “Sound Retrieval and Ranking Using Sparse Auditory Representations,” Neural Computation, vol. 22, no. 9, pp. 2390–2416, Sep. 2010.
[22] T. C. Walters, “Auditory-Based Processing of Communication Sounds,” University of Cambridge, 2011.
[23] J. Whittaker, “The Physics of the Ear.” Colorado, pp. 1 – 71, 2006.
[24] A. Michelsen and O. N. Larsen, “Pressure Difference Receiving Ears,” Bioinspiration & Biomimetics, vol. 011001, no. 3, pp. 1 – 18, 2008.
[25] S. E. Voss, J. J. Rosowski, S. N. Merchant, and W. T. Peake, “Acoustic Responses of the Human Middle Ear,” Hearing Research, vol. 150, pp. 43–69, 2000.
[26] J. Pickles, “The Outer and Middle Ears,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 11 – 24.
[27] G. G. Matthews, “Hearing and Other Vibration Senses,” in Neurobiology Molecules, Cells and Systems, Blackwell Science, 1998, p. 25.
[28] A. Huber, M. Ferrazzini, S. Stoeckli, T. Linder, N. Dillier, S. Schmid, and U. Fisch, “Intraoperative Assessment of Stapes Movement,” The Annals of Otology, Rhinology & Laryngology, vol. 110, no. 1, pp. 31 – 35, 2001.
[29] M. A. Ruggero and A. N. Temchin, “The Roles of the External , Middle , and Inner Ears in Determining the Bandwidth of Hearing,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 20, pp. 13206 – 13210, 2002.
[30] J. Pickles, “The Cochlea,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 25 – 72.
[31] A. van Schaik, “Analogue VLSI Building Blocks for an Electronic Auditory Pathway,” École Polytechnique Fédérale de Lausanne, 1997.
[32] A. G. Katsiamis, E. M. Drakakis, and R. F. Lyon, “Practical Gammatone-Like Filters for Auditory Processing,” EURASIP Journal on Audio, Speech and Music Processing, vol. 2007, pp. 1 – 25, 2007.
[33] N. Ma, “An Efficient Implementation of Gammatone Filters.” [Online]. Available: http://www.dcs.shef.ac.uk/~ning/resources/gammatone/.
[34] C. Michel, R. Nouvian, C. Azevedo-Coste, J. L. Puel, and J. Bourien, “A Computational Model of the Primary Auditory Neuron Activity,” in Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2010, pp. 722–725.
[35] J. Pickles, “The Auditory Nerve,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 73 – 101.
[36] P. Dallos, “Neurobiology of Cochlear Inner and Outer Hair Cells: Intracellular Recordings.,” Hearing Research, vol. 22, pp. 185 – 198, Jan. 1986.
[37] S. A. Shamma, R. S. Chadwick, W. J. Wilbur, K. A. Morrish, and J. Rinzel, “A Biophysical Model of Cochlear Processing: Intensity Dependence of Pure Tone Responses,” The Journal of the Acoustical Society of America, vol. 80, no. 1, pp. 133–145, 1986.
[38] R. C. Kidd and T. F. Weiss, “Mechanisms that Degrade Timing Information in the Cochlea,” Hearing Research, vol. 49, pp. 181 – 208, 1990.
[39] R. Meddis, “Auditory-nerve First-spike Latency and Auditory Absolute Threshold: A Computer Model,” The Journal of the Acoustical Society of America, vol. 119, no. 1, pp. 406 – 417, 2006.
[40] P. A. Laplante, Real-time Systems Design and Analysis, 3rd ed. IEEE Press,Wiley-Interscience, 2004, pp. 1 – 505.
[41] Microsoft, “Operating System Versioning,” MSDN, 2012. [Online]. Available: http://msdn.microsoft.com/en-gb/library/dd371754(VS.85).aspx.
[42] K. Ramamritham, C. Shen, O. Gonzalez, S. Sen, and S. B. Shirgurkar, “Using Windows NT for Real-Time Applications: Experimental Observations and Recommendations,” IEEE Real-Time Technology And Applications Symposium, pp. 1 – 13, 1998.
[43] Essex Hearing Research Laboratory, “Auditory Modelling at Essex University,” 2012. [Online]. Available: http://www.essex.ac.uk/psychology/department/HearingLab/modelling.html. [Accessed: 01-Nov-2011].
[44] Mathworks, “Filter - 1D Digital Filter,” 2012. [Online]. Available: http://www.mathworks.com.au/help/techdoc/ref/filter.html. [Accessed: 01-Jul-2011].
[45] Mathworks, “Logspace,” 2012. [Online]. Available: http://www.mathworks.com.au/help/techdoc/ref/logspace.html. [Accessed: 05-Jul-2011].
[46] M. J. Hewitt and R. Meddis, “An Evaluation of Eight Computer Models of Mammalian Inner Hair-cell Function,” The Journal of the Acoustical Society of America, vol. 90, no. 2, pp. 904–917, 1991.
[47] R. Meddis, M. J. Hewitt, and T. M. Shackleton, “Implementation Details of a Computational Model of the Inner Hair-cell/Auditory-nerve Synapse,” The Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1813–1816, 1990.
[48] C. J. Plack, The Sense of Hearing. Lawrence Erlbaum Associates, Inc., 2005.
[49] Microsoft, “Processes and Threads,” Windows Dev Centre - Desktop, 2012. [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/ms684841(v=vs.85).aspx. [Accessed: 15-Mar-2012].
[50] S. Akhter and J. Roberts, Multi-Core Programming: Increasing Performance through Software Multi-threading, 1st ed. Hillsboro: Intel Press, 2006.
[51] Microsoft, “SetPriorityClass Function,” MSDN, 2012. [Online]. Available: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686219(v=vs.85).aspx. [Accessed: 01-Jun-2012].
[52] B. Kuhn, P. Petersen, and E. O’Toole, “OpenMP versus Threading in C/C++.”
[53] B. R. Glasberg and B. C. Moore, “Derivation of Auditory Filter Shapes from Notched-noise Data,” Hearing Research, vol. 47, no. 1–2, pp. 103–138, Aug. 1990.
[54] Intel, “Intel Architecture Software Developer’s Manual,” vol. 2. Intel, 1999.
[55] R. R. Mergu and S. K. Dixit, “Multi-Resolution Speech Spectrogram,” International Journal of Computer Applications, vol. 15, no. 4, pp. 28–32, Feb. 2011.
[56] J. Pickles, “The Auditory Nerve,” in An Introduction to the Physiology of Hearing, 3rd ed., Bingley: Emerald Group, 2008, pp. 73 – 102.
[57] IVONA Software, “Ivona Text-to-Speech.” [Online]. Available: http://www.ivona.com/us/. [Accessed: 10-Jul-2012].
[58] A. Fog, “Instruction Tables,” Copenhagen, 2011.
[59] E. W. Weisstein, “Exponential Function,” Mathworld - A Wolfram Web Resource, 2012. [Online]. Available: http://mathworld.wolfram.com/ExponentialFunction.html. [Accessed: 20-Jun-2012].
[60] Intel, “Intel® Math Kernel Library for Windows* OS.” Intel, pp. 1–114.
[61] N. N. Schraudolph, “A Fast, Compact Approximation of the Exponential Function,” Neural Computation, no. 11, pp. 853–862, 1999.
[62] Microprocessor Standards Committee and Floating Point Working Group, IEEE Std 754-2008 (Revision of IEEE Std 754-1985), IEEE Standard for Floating-Point Arithmetic. IEEE, Aug. 2008.
[63] N. V. Thakor, “In the Spotlight: Neuroengineering,” IEEE Reviews in Biomedical Engineering, vol. 3, pp. 19 – 22, Jan. 2010.
Appendix A
Outer and Middle Ear

External Ear Resonance (EER) filter
    Numerator, order 3: b[0] = 0.3195615, b[1] = 0.0, b[2] = -0.31295615
    Denominator, order 3: a[0] = 1.0, a[1] = -1.142727, a[2] = 0.37408769

Tympanic Membrane (TM) filter
    Numerator, order 1: b[0] = 0.014247596
    Denominator, order 2: a[0] = 1.0, a[1] = -0.957524

Stapes Inertia (SI) filter
    Numerator, order 2: b[0] = 0.87454802, b[1] = -0.87454802
    Denominator, order 2: a[0] = 1.0, a[1] = -0.74909604

Basilar Membrane

Linear gammatone filter
    Numerator order 2, denominator order 3; filter coefficients vary with the number of BF channels used.
    minLinCF = 153.13 Hz, coeffLinCF = 0.7341: coefficients for calculating minimum linear characteristic frequencies.
    minLinBW = 100, coeffLinBW = 0.6531: coefficients for calculating minimum linear bandwidths.

Nonlinear gammatone filter
    Numerator order 2, denominator order 3; filter coefficients vary with the number of BF channels used.
    p = 0.2895, q = 250: coefficients for calculating bandwidths in the nonlinear pathway based on human auditory pathway settings [16].

Memoryless compression threshold
    compThreshdB = 10, a = 50,000, c = 0.2: coefficients for calculating compressive effects in the nonlinear pathway.

Inner Hair Cell

IHC cilia displacement filter
    TC = 0.00012: filter time constant.
    Numerator, order 2: b[0] = 1.0, b[1] = -0.66207103
    Denominator, order 1: a[0] = 0.37792897
    C = 0.08: scaling coefficient.

Receptor potential
    u0 = 5e-9, u1 = 1e-9: cilia displacement constants set based on the nonlinear characteristic.
    s0 = 1e-9, s1 = 1e-9: dimensionless longitudinal constants based on the nonlinear characteristic scaled by BM length.
    Gmax = 6e-9, Ga = 0.8e-9, Gk = 2e-8: maximum IHC, apical and potassium conductances in siemens.
    Et = 0.1, Ek = -0.08: endocochlear and potassium equilibrium potentials in volts.
    RPC = 0.04: combined resistances in ohms.
    Cap = 4e-12: IHC capacitance in farads.

Inner Hair Cell Presynaptic Region

Neurotransmitter release rate
    Gmax-Ca = 14e-9: maximum calcium conductance in siemens.
    ECa = 0.066: calcium equilibrium potential in volts.
    β = 400, γ = 100: constants determining calcium channel opening.
    τM = 5e-5, τCa[0] = 30e-6, τCa[1] = 80e-6: membrane and calcium low and high spontaneous rate time constants in seconds.
    Z = 2e42: vesicle release rate scalar.

Auditory Nerve

AN spiking probability
    trefractory = 0.75e-3: refractory period in seconds.
    M = 12: maximum number of neurotransmitter vesicles at the synapse.
    Y = 6: depleted neurotransmitter vesicle replacement rate.
    X = 60: replenishment rate from the re-uptake store.
    L = 250: neurotransmitter vesicle loss rate from the cleft.
    R = 500: neurotransmitter re-uptake rate from the cleft into the IHC.

Table A.1: Algorithm parameter settings
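The outer and middle-ear stages in Table A.1 are standard IIR difference equations, where the b coefficients weight past inputs and the a coefficients weight past outputs. The following sketch shows how such tabulated coefficients would be applied sample by sample; it is a plain direct-form illustration in Python, and the TM-then-SI cascade order and the filter routine itself are assumptions for demonstration, not the RTAP source code.

```python
# Sketch: applying the Table A.1 middle-ear IIR coefficients sample by sample.
# Difference equation (direct form I):
#   a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
# Coefficient values are from Table A.1; the cascade order is an assumption.

def iir_filter(b, a, x):
    """Filter sequence x with numerator b and denominator a (a[0] normalises)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

# Tympanic Membrane (TM) filter coefficients from Table A.1
TM_B, TM_A = [0.014247596], [1.0, -0.957524]
# Stapes Inertia (SI) filter coefficients from Table A.1
SI_B, SI_A = [0.87454802, -0.87454802], [1.0, -0.74909604]

def middle_ear(x):
    """Cascade the TM and SI stages (assumed order, for illustration only)."""
    return iir_filter(SI_B, SI_A, iir_filter(TM_B, TM_A, x))

# Example: impulse response of the assumed two-stage cascade
impulse_response = middle_ear([1.0] + [0.0] * 7)
```

With an impulse input, the first TM output sample is simply b[0], which makes the coefficient roles easy to verify against the table.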
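The auditory-nerve rates in Table A.1 (M, Y, X, L and R) parameterise the transmitter-pool equations of Meddis [13], [47]: transmitter released from the free pool enters the synaptic cleft, is lost or taken back up, and is recycled through a re-uptake store. The sketch below is a minimal forward-Euler integration of those pool equations using the tabulated rates; the constant permeability drive k and the step size are illustrative assumptions, not values from this thesis.

```python
# Minimal Euler integration of the Meddis [13] transmitter-pool equations
# using the auditory-nerve rates listed in Table A.1. The constant drive k
# (release permeability) and the step size dt are illustrative assumptions.

M, Y, X, L, R = 12.0, 6.0, 60.0, 250.0, 500.0  # rates from Table A.1

def simulate_pools(k, steps, dt=1e-4):
    """Integrate free (q), cleft (c) and re-uptake (w) pools under drive k."""
    q, c, w = M, 0.0, 0.0  # start with a full free-transmitter pool
    for _ in range(steps):
        dq = Y * (M - q) + X * w - k * q  # replenishment + recycling - release
        dc = k * q - (L + R) * c          # release into cleft - loss - re-uptake
        dw = R * c - X * w                # re-uptake store filling and emptying
        q += dq * dt
        c += dc * dt
        w += dw * dt
    return q, c, w

# Run to steady state under a constant (assumed) drive
q, c, w = simulate_pools(k=50.0, steps=20000)
```

With a constant drive the three pools settle to a steady state in which release into the cleft balances loss and re-uptake, which is the quantity the AN spiking probability stage samples.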
Figure A.1: MAP and RTAP inner hair cell receptor potential (IHCRP) response for 30 BF channels.
[Panels: “MAP IHC RP Response (30 BFs)” and “RTAP IHC RP Response (30 BFs)”. Y-axis: BF sites along BM for IHC receptor potential (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.2: MAP and RTAP low spontaneous rate (LSR) fibre neurotransmitter release rate (NRR) response for 30 BF channels.
[Panels: “MAP NRR LSR Response (30 BFs)” and “RTAP NRR LSR Response (30 BFs)”. Y-axis: BF sites along BM for NRR to LSR AN fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.3: MAP and RTAP high spontaneous rate (HSR) fibre neurotransmitter release rate (NRR) response for 30 BF channels.
[Panels: “MAP NRR HSR Response (30 BFs)” and “RTAP NRR HSR Response (30 BFs)”. Y-axis: BF sites along BM for NRR to AN HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.4: MAP and RTAP low spontaneous rate (LSR) fibre auditory nerve spiking probability (ANSP) response for 30 BF channels.
[Panels: “MAP ANSP LSR Response (30 BFs)” and “RTAP ANSP LSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on LSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.5: MAP and RTAP high spontaneous rate (HSR) fibre auditory nerve spiking probability (ANSP) response for 30 BF channels.
[Panels: “MAP ANSP HSR Response (30 BFs)” and “RTAP ANSP HSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0 to 0.02.]
Figure A.6: Continuity between adjacent window frames for RTAP generated inner hair cell receptor potential (IHCRP) response.
Figure A.7: Continuity between adjacent window frames for RTAP generated neurotransmitter release rate (NRR) in low spontaneous rate (LSR) fibres.
[Panels: “RTAP IHC RP Response (30 BFs)” (Figure A.6) and “RTAP NRR LSR Response (30 BFs)” (Figure A.7). Y-axes: BF sites along BM (Hz), 250 Hz to 6000 Hz; x-axes: time (seconds), 0.052 to 0.068.]
Figure A.8: Continuity between adjacent window frames for RTAP generated neurotransmitter release rate (NRR) in high spontaneous rate (HSR) fibres.
Figure A.9: Continuity between adjacent window frames for RTAP generated auditory nerve spiking probability (ANSP) in low spontaneous rate (LSR) fibres.
[Panels: “RTAP NRR HSR Response (30 BFs)” (Figure A.8) and “RTAP ANSP LSR Response (30 BFs)” (Figure A.9). Y-axes: BF sites along BM (Hz), 250 Hz to 6000 Hz; x-axes: time (seconds), 0.052 to 0.068.]
Figure A.10: Continuity between adjacent window frames for RTAP generated auditory nerve spiking probability (ANSP) in high spontaneous rate (HSR) fibres.
[Panel: “RTAP ANSP HSR Response (30 BFs)”. Y-axis: BF sites along BM for AN spiking on HSR fibres (Hz), 250 Hz to 6000 Hz; x-axis: time (seconds), 0.052 to 0.068.]
Figure A.11: ERBS representation of the first window frame of inner hair cell receptor potential (IHCRP) response in RTAP based on 65 BF channels.
Figure A.12: ERBS representation of the first window frame of neurotransmitter release rate (NRR) for low spontaneous rate (LSR) fibre response in RTAP based on 45 BF channels.
[Panels: “Maximum Load on Machine 1 (IHCRP stage)” (Figure A.11) and “Maximum Load on Machine 1 (NRR LSR stage)” (Figure A.12). Y-axes: BF sites (Hz), 250 Hz to 6000 Hz; x-axes: time, 0 to 0.06.]
Figure A.13: ERBS representation of the first window frame of neurotransmitter release rate (NRR) for high spontaneous rate (HSR) fibre response in RTAP based on 38 BF channels.
Figure A.14: ERBS representation of the first window frame of the auditory nerve spiking probability (ANSP) for low spontaneous rate (LSR) fibre response in RTAP based on 38 BF channels.
[Panels: “Maximum Load on Machine 1 (NRR HSR stage)” (Figure A.13) and “Maximum Load on Machine 1 (ANSP LSR stage)” (Figure A.14). Y-axes: BF sites (Hz), 250 Hz to 6000 Hz; x-axes: time, 0 to 0.06.]
Figure A.15: ERBS representation of the first window frame of the auditory nerve spiking probability (ANSP) for high spontaneous rate (HSR) fibre response in RTAP based on 30 BF channels.
[Panel: “Maximum Load on Machine 1 (ANSP HSR stage)”. Y-axis: BF sites (Hz), 250 Hz to 6000 Hz; x-axis: time, 0 to 0.06.]