
Open Speech Platform (OSP)

Project Sponsor: National Institutes of Health, NIDCD. This work is supported by National Institutes of Health grant NIH/NIDCD R01DC015436, "A Real-time, Open, Portable, Extensible Speech Lab" to the University of California, San Diego.

Table of Contents

License
Overview
    Summary
    Recommended Usage
        For Electrical Engineering and Computer Science Practitioners
        For Audiologists and Speech Scientists
Real-Time Master Hearing Aid (RT-MHA) Software
    Subband decomposition
    WDRC
    AFC
    RT-MHA Performance
User Device Software
OSP Installation
    Basic Installation
    (Optional) Zoom TAC-8 Installation for Use with OSP
    Running the system
        Initial Test - 1
        (Optional) Initial Test – 2 with Zoom TAC-8
    Changing the filter coefficients
    Using the Tablet Mode:
        Setting Up Mac's Wifi
        Setting Up Android
        Working with the Android app
OSP – MATLAB Documentation
References

License


Copyright 2017 Regents of the University of California

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Overview

Summary

This document provides an overview of the basic functionalities of the hearing aid (HA) software, the Real-Time Master Hearing Aid (RT-MHA), completed in our Open Speech Platform (OSP). The basic functions include: (i) subband decomposition, (ii) wide dynamic range compression (WDRC), and (iii) adaptive feedback cancellation (AFC). We provide brief descriptions of these functions with recommended references.

Recommended Usage

The software modules in Release 2017a are depicted in Figure 1. The system can be reconfigured at both runtime and compile time. It runs in real time on a MacBook with an overall latency of 7.98 ms. The software works with off-the-shelf microphones and speakers for real-time input and output, which is adequate for contributing to improving the technology. For some clinical investigations, form-factor-accurate ear-level assemblies are required.


Figure 1: Software modules in Openspeechplatform_2017a_V1_0

For Electrical Engineering and Computer Science Practitioners

The reference design is provided in the files osp_process.c and osp_process.h. If you are working on alternate implementations of basic HA functions, we suggest you clone a given function and call the clone in the reference design. Keeping the interfaces the same will minimize code changes – do not change the .h files unless you have a good reason. If you are adding additional functionality, implement it in libosp and modify the reference design accordingly.
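
As an illustration of this workflow, the sketch below clones a hypothetical WDRC entry point and swaps it into the call site while keeping the interface unchanged. The names and signature (band_params_t, osp_wdrc_process) are placeholders for whatever is actually declared in osp_process.h; they are not the OSP API.

    /* Hypothetical prototype standing in for a declaration in osp_process.h.
     * The clone keeps exactly the same signature as the reference function. */
    typedef struct { float cr, at, rt, k_low, k_up; } band_params_t;

    void osp_wdrc_process(const float *in, float *out, int n, const band_params_t *p);

    /* Clone with an alternate implementation behind the same interface. */
    void my_wdrc_process(const float *in, float *out, int n, const band_params_t *p)
    {
        (void)p;                    /* an alternate compression rule would use p */
        for (int i = 0; i < n; i++)
            out[i] = in[i];         /* placeholder: pass-through */
    }

    /* In osp_process.c, the only change is at the call site, e.g.
     *     osp_wdrc_process(buf_in, buf_out, n, &params);
     * becomes
     *     my_wdrc_process(buf_in, buf_out, n, &params);                      */

Because no header files change, the rest of the reference design builds exactly as before.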

For Audiologists and Speech Scientists

For runtime reconfiguration of the RT-MHA, the user device provides for real-time changes to WDRC parameters. If you do not have local help to extend the Android software to suit your needs, contact us with details of the clinical study and features you would like.

Real-Time Master Hearing Aid (RT-MHA) Software


We will discuss the open-source libraries developed for the OSP. The basic functionality of the OSP software is the RT-MHA. Figure 2 shows the block diagram of the RT-MHA software with signal flows. This architecture, with different sampling rates (96 kHz for I/O and 32 kHz for main processing), has the benefit of minimizing hardware latency and improving the spatial resolution of beamforming with multiple microphones. The basic HA functions are implemented in the 32 kHz domain, including subband decomposition, WDRC, and AFC. These algorithms are provided in source code and compiled libraries and will be briefly discussed.

Figure 2: RT-MHA software block diagram with signal flows.

Subband decomposition

In the RT-MHA, subband decomposition is provided to divide the full frequency spectrum into multiple frequency regions called "subbands". This decomposition enables independent gain control of the HA system in each frequency region using different WDRC parameters. The decomposition is implemented as a bank of 6 finite impulse response (FIR) filters (Subband-1 to 6 in Figure 2) whose frequency responses are shown in Figure 3. Each of the 6 FIR filters has 193 taps, equivalent to a latency of ((193-1)/2)/32 kHz = 3 ms. The filters are designed in MATLAB and the resulting filters are saved in .flt files for inclusion with the RT-MHA software. Bandwidths and upper and lower cut-off frequencies of these filters are determined according to a set of critical frequency values. Users can modify the MATLAB scripts to use FIR filters of a different length and a different number of subbands. This change requires recompiling the library.


Figure 3: Frequency response of the 6-channel filter bank.
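
For readers implementing their own decomposition, the following sketch shows the core operation each subband performs: a direct-form convolution with one 193-tap FIR filter. It is a generic illustration, not the optimized libosp code; the coefficient and history buffers stand in for data loaded from the .flt files.

    #include <stddef.h>

    #define FIR_TAPS 193   /* linear phase: delay = (193 - 1) / 2 = 96 samples = 3 ms at 32 kHz */

    /* Filter one block of samples through a single subband filter.
     * 'hist' keeps the most recent inputs so consecutive blocks chain correctly. */
    static void fir_subband(const float h[FIR_TAPS], float hist[FIR_TAPS],
                            const float *x, float *y, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            /* shift the history and insert the newest sample (simple, unoptimized) */
            for (int k = FIR_TAPS - 1; k > 0; k--)
                hist[k] = hist[k - 1];
            hist[0] = x[i];

            float acc = 0.0f;
            for (int k = 0; k < FIR_TAPS; k++)   /* y[i] = sum_k h[k] * x[i - k] */
                acc += h[k] * hist[k];
            y[i] = acc;
        }
    }

The RT-MHA runs six such filters in parallel (Subband-1 to 6 in Figure 2), each followed by independent WDRC gain control as described in the next section.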

WDRC

The WDRC technique is one of the essential building blocks of a HA [1]. The purpose of a WDRC system is to modify the volume of the audio signal arriving at the user's ear so as to make the output sound as audible, comfortable, and intelligible as possible for the user. At a high level, the WDRC amplifies quiet sounds (40-50 dB SPL), attenuates loud sounds (85-100 dB SPL), and applies a variable gain for everything in between [2]. In the RT-MHA, the WDRC algorithm is based on a version by Prof. James Kates [3], developed in MATLAB. We ported this module to ANSI C and provide it as a library in source code. The implemented WDRC is a multi-channel system in which the gain control is realized independently in each subband. The amount of gain in each subband is a frequency-dependent, non-linear function of the input signal power. The overall WDRC algorithm is based on envelope detection (Peak Detector) and non-linear amplification (Compression Rule), as illustrated in Figure 4, with primary control parameters: compression ratio (CR), attack time (AT), release time (RT), and upper and lower kneepoints (K_{up} and K_{low}) [3]. In each subband, a peak detector tracks the envelope variations of the input subband signal and estimates the signal power accordingly. The amount of gain to apply is then determined by a compression rule based on the input power level estimated by the peak detector. The AT and RT, which affect the tracking ability of the peak detector, are illustrated in Figure 5. The CR, K_{up}, and K_{low} are the control parameters that determine the compression rule, as shown in Figure 6. These WDRC parameters for each subband can be specified at compile time and changed at run time using the user device.

Figure 4: The WDRC system.


Figure 5: Attack time and release time illustration.

Figure 6: Input-output curve of the compression rule.
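
To make the structure of Figures 4-6 concrete, here is a single-band sketch of the two stages: a peak detector with separate attack and release smoothing, followed by a static compression rule defined by CR, K_{low}, and K_{up}. This is a simplified, generic illustration, not the ported Kates module; the actual gain rule and time-constant handling in libosp differ.

    #include <math.h>

    typedef struct {
        float env;          /* current envelope estimate (linear amplitude)                  */
        float a_att, a_rel; /* smoothing constants derived from AT, RT, and the sample rate  */
    } peak_det_t;

    /* Peak detector: fast attack when the input rises above the envelope, slow release otherwise. */
    static float peak_detect(peak_det_t *pd, float x)
    {
        float ax = fabsf(x);
        float a  = (ax > pd->env) ? pd->a_att : pd->a_rel;
        pd->env  = a * pd->env + (1.0f - a) * ax;
        return pd->env;
    }

    /* Static compression rule (input level in dB -> gain in dB):
     * constant gain below K_low, slope 1/CR between the kneepoints,
     * constant output level (limiting) above K_up.                   */
    static float compression_gain_db(float level_db, float gain_low_db,
                                     float cr, float k_low, float k_up)
    {
        if (level_db <= k_low)
            return gain_low_db;
        if (level_db <= k_up)
            return gain_low_db + (1.0f / cr - 1.0f) * (level_db - k_low);
        return gain_low_db + (1.0f / cr - 1.0f) * (k_up - k_low) - (level_db - k_up);
    }

The multi-channel WDRC applies such a per-band gain independently in each of the 6 subbands, using the parameters set at compile time or changed at run time from the user device.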

AFC

Physical placement of the microphone and receiver in a HA device poses a major problem known as acoustic feedback [3]. This results from the acoustic coupling between the receiver and the microphone, in which the amplified signal from the receiver is picked up by the microphone and re-amplified, causing severe distortion in the desired signal [4]. Consequently, it limits the available amount of amplification in a HA and disturbs the user with the resulting howling or whistling sounds. To overcome this problem, AFC has become the most common technique in modern HAs because of its ability to track variations in the acoustic feedback path and cancel the feedback signal accordingly. Figure 7 depicts the AFC framework implemented in our RT-MHA system, which is mainly based on the Filtered-X Least Mean Square (FXLMS) method [5,6]. In this framework, the AFC filter W(z) is a finite impulse response (FIR) filter placed in parallel with the HA processing G(z) that continuously adjusts its coefficients to emulate the impulse response of the feedback path F(z). x(n) is the desired input signal and d(n) is the actual input to the microphone, which contains x(n) and the feedback signal y(n) generated by the HA output s(n) passing through F(z). \hat{y}(n) is the estimate of y(n) given by the output of W(z). e(n) = d(n) − \hat{y}(n) is the feedback-compensated signal. H(z) and A(z) are respectively the band-limited filter and the pre-whitening filter (both FIR and fixed), with u(n), e_f(n), and u_f(n) being the output signals of these filters. Besides the basic FXLMS, the proportionate normalized LMS (PNLMS) [7-9] is also provided, which improves the feedback tracking ability over the FXLMS. The AFC module is provided as a library in source code.

Figure 7: The AFC framework.
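
As a rough illustration of the adaptation described above, the sketch below performs one sample of a normalized, FXLMS-style update of the W(z) coefficients using the pre-whitened signals u_f(n) and e_f(n). It is a textbook-style sketch under generic assumptions (step size, regularization), not the libosp AFC implementation, and it omits the fixed H(z) and A(z) filtering stages that produce u_f(n) and e_f(n).

    /* One normalized LMS update for the adaptive AFC filter W(z).
     * w        : adaptive FIR coefficients (length 'taps')
     * u_f_hist : the most recent 'taps' samples of the filtered reference u_f(n), newest first
     * e_f      : pre-whitened error sample e_f(n)
     * mu       : adaptation step size; eps avoids division by zero            */
    static void afc_nlms_update(float *w, const float *u_f_hist, int taps,
                                float e_f, float mu, float eps)
    {
        float energy = eps;
        for (int k = 0; k < taps; k++)     /* ||u_f||^2 over the filter span */
            energy += u_f_hist[k] * u_f_hist[k];

        float step = mu / energy;
        for (int k = 0; k < taps; k++)     /* w_k += (mu / ||u_f||^2) * u_f(n - k) * e_f(n) */
            w[k] += step * u_f_hist[k] * e_f;
    }

The PNLMS variant mentioned above differs mainly in giving each coefficient its own proportionate step-size factor, which improves tracking when the feedback path impulse response is sparse [7-9].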

RT-MHA Performance

In Table 1, we present the performance of the RT-MHA compared with 3 commercial HAs from Oticon: Opn ("open sound experience"), Alta2, and Sensei. These are, respectively, Oticon's most advanced, previous high-end, and mid-level HAs. Based on the results, we conclude that the RT-MHA software is comparable to commercial HAs.


Table 1: Performance of RT-MHA, compared with commercial HAs.

User Device Software

The user device software is implemented above the TCP/IP layer in a software stack called OSPLayer. Figure 8 shows the runtime parameters enabled in the current implementation. Future extensions will include uploading audio and other internal states of the RT-MHA to the user device for post-processing. This modular structure enables investigations into self-fitting and auto-fitting algorithms.

Figure 8: Screen shots of examples of the user device software. The left panel shows RT-MHA parameters that can be controlled in real time. The middle panel shows an example of self-fit software. The right panel shows an example of a word recognition test to evaluate HA performance.

OSP Installation


Basic Installation

1. Xcode

1. Open App Store
2. Make sure you have the system up to date
3. Search for Xcode
4. Install Xcode
5. Open Xcode and agree to the licensing
6. Wait for Xcode to finish installing

2. Install brew

1. Go to the website: https://brew.sh
2. Follow the instructions on the website to install brew

3. Install PortAudio

1. Run the following command: “brew install portaudio”

4. Install doxygen

1. Run the following command: "brew install doxygen"
2. Next, install graphviz using this command: "brew install graphviz"

5. Extract the tar file
6. Open Xcode and open the r01-osp-xcode project file
7. Build the project
8. Run the code
9. If the prompt "Enable developer mode for this Mac" appears, click Enable.

(Optional) Zoom TAC-8 Installation for Use with OSP

1. Zoom Tac-8 configuration

1. Download the Zoom TAC-8 drivers and install the software on your Mac: https://www.zoom-na.com/products/production-recording/audio-interfaces/zoom-tac-8-thunderbolt-audio-converter#downloads
2. Restart the computer.
3. Now download the MixEfx software from the same link and install the software.
4. Connect the Zoom TAC-8 to the Mac using Thunderbolt. If the TAC-8 software does not open, then open the application from the Launchpad.
5. Make sure the TAC-8 software detects the hardware. If not, follow the troubleshooting tips from the website.
6. Go to File > Open > MixSetting.tac8 file from the tar ball. NOTE: This needs to be done every time the app is closed and opened.

2. Open Audio MIDI Setup app from the Launchpad.

3. Select Zoom Tac-8 on the left hand side.

4. Under Format, change the sampling frequency to 96 kHz.

5. Open System Preferences from Launchpad


6. Go to sound and then choose ZOOM TAC-8 in the input and output sections.

Running the system

1. Open Xcode
2. Open the project file: R01-OSP-xcode
3. Go to Product > Scheme > Edit Scheme
4. Go to Run > Arguments
5. The system supports the following arguments for enabling its various functionalities:

-a   Starts the application with the AFC turned off by default.

-r   Starts the application with the rear microphones turned off by default. That is, it disables channels 3 and 4 on the Zoom TAC-8 box, since they correspond to the rear left and rear right microphones on the hearing aids. If you are not using these channels and are only using channels 1 and 2, this is recommended, although it does not hurt if these channels are left on (the beamforming algorithm is relatively light and does not take much processing power).

-d   Starts the application with a specified built-in gain. This is mostly used to set an attenuation factor, in which case a negative gain is supplied. If the user wants a -20 dB gain (attenuation), then he will start the application with ./osp-exe -d-100 or ./osp-exe -d -100. The space between the -d flag and the gain is ignored, so either form will work.

-T   Starts the application with a predefined AFC type. The types of AFC supported are 0 = FXLMS and 1 = PNLMS. So if the user wants to start the application with PNLMS, he will start the program with ./osp-exe -T1. If the -T flag is not supplied, the program defaults to FXLMS (-T0).

-l   Starts the application in loopback-only mode. Live audio is not enabled; instead, audio data is read from a file into the application, the OSP algorithm is applied to it, and the result is written to an output file. This is useful for verifying against benchmark data, such as from MATLAB. This mode is mostly used for internal debugging.

-t   Starts the application in TCP daemon mode. In this mode, the application waits for a connection on port 8001 from an OSP client. Once a client is connected, all parameters (such as g50 and g80 gains, attack/release times, kneepoints, etc.) come from the client and cannot be changed manually from the command line. To stop the application once it has started in TCP mode, the user must hit CTRL+C.

-M<MPO on/off>   Setting this to 0 disables MPO and sets the attenuation factor to 0 dB; setting it to 1 enables MPO. To disable MPO, execute ./osp-exe -M0; similarly, to enable MPO, execute ./osp-exe -M1. This functionality is mainly for disabling MPO when testing the hearing aid; for example, MPO is not needed when performing the ANSI test.

-h   Displays all of these options on the command line. -h is short for "help", in case the operator needs a quick cheat sheet for the supported command-line arguments.

If the -t (TCP daemon mode) flag is NOT supplied, the application starts in keyboard-input mode. In this mode, many parameters can be changed on the fly from keyboard inputs while the program is running. To change a parameter, simply enter the corresponding value (one at a time, followed by the Enter key) in the same terminal the osp-exe program was run from. For example, while the program is running, if the operator chooses to adjust the gain to 10 dB, he can simply enter g 10 followed by Enter on the command line. If the operator wants to turn AFC on or off, he can simply enter s followed by Enter to toggle AFC mode.

Initial Test - 1

For this test, the Zoom TAC-8 does not need to be connected to the Mac.

1. Make sure that in Audio MIDI Setup, the built-in input and output are set to 96 kHz, 2 ch, 24-bit integer.
2. Run the project with the following arguments: -a, -M0, -r
3. Run your finger across the internal microphone; you should hear a brushing noise (sometimes maybe even a squealing noise).

(Optional) Initial Test – 2 with Zoom TAC-8

1. Set up the Zoom TAC-8 as previously described.
2. Make sure the input and output ports are aligned. Currently, input 1 goes to output 1 and input 2 goes to output 2.

Changing the filter coefficients

1. The filter coefficients can be changed by replacing the filterbank_coeffs*.flt files in the following directory: r01-osp-xcode/osp/filter_coefficients
2. By default, the filterbank coefficients are set to the Kates filters. You can access the Boothroyd filterbanks in r01-osp-xcode/osp/Boothroyd Filters.
3. You can also define your own filterbanks in .flt format and save them in r01-osp-xcode/osp/filter_coefficients to test them.

Using the Tablet Mode:

Setting Up Mac's Wifi

1. Wifi dongle – EDUP – DB 1607

1. Get drivers from the website: http://www.szedup.com/support/driver-download/ep-db1607-driver/

2. Get the proper drivers based on your macOS version
3. Go to Downloads, then the driver folder
4. Run Installer.pkg
5. Follow the instructions
6. Restart the Mac
7. Click the USB wifi icon in the menu bar (Mac)
8. Connect to your preferred wifi network using the wifi dongle
9. Go to the Apple icon and open System Preferences
10. Go to Sharing in System Preferences
11. Go to Internet Sharing
12. Select "Share your connection from" => USB dongle (in our case, 802.11ac WLAN adapter)
13. In "To computers using": check Wifi.
14. Select Wifi Options -> change the network name to your desired network name, and create a password for this network. This network will be used to connect the tablet to the MacBook.
15. Make sure your Mac's wifi is turned on.
16. In the left-side column, select Internet Sharing and hit "Start" in the prompt that pops up.

Setting Up Android

2. The Android app can be installed on the device in 2 ways:

1. We have provided the Android app project in the tar ball. You can build the app and port it to the Android device.
2. We have also included the APK for the R01 Android app. You just have to copy the app to the device and install it.

Working with the Android app

1. Run the code with the -t parameter enabled.
2. Connect the Android device to the wifi network that you defined while installing the wifi dongle.
3. The app should automatically fill in the IP address of the Mac; if it fails, enter "192.168.2.1".
4. Enter the researcher initials and the user initials.
5. Change the parameters on the researcher page.
6. Hit Transmit; the values will be loaded onto the Mac.
7. The logs will be created in the same project folder.

Changing the location of logs

1. Open the Xcode project.
2. Go to Product > Scheme > Edit Scheme.
3. Under Environment Variables, edit the LOG_PATH variable to change the location of the logs directory.

NOTE: The default log directory is the project path itself.

Format of log files

Raw data from the Android app has the following format:
[date, pageName, ButtonName, CrispnessMultipliers, FullnessStepSize, VolumeStepSize, CrispnessStepSize, FullnessLevel, VolumeLevel, CrispnessLevel, FullnessValue, VolumeValue, CrispnessValue, CompressionRatio, g50, g65, g80, KneeLow, MPOLimit, AttackTime, ReleaseTime]

OSP – MATLAB Documentation

The following files have been included with the R01 OSP-MATLAB release:

1. Filterbank Coefficients: Boothroyd Filters and Kates filter banks have been included in this folder as .flt files. MHA_GenerateFbank.m is the MATLAB function used to produce the filterbanks.

2. MHA parameters: This folder contains the MHA parameters for 4 different standard audiograms: NH, N2, N4, and S2. NH is normal hearing, and N2, N4, and S2 are 3 types of hearing loss. In each file, there are 7 columns, each with 6 numbers:


Column 1: hearing losses of the 6 bands.
Column 2: gain values at 50 dB SPL for the 6 bands.
Column 3: gain values at 80 dB SPL for the 6 bands.
Column 4: attack times for the 6 bands.
Column 5: release times for the 6 bands.
Column 6: lower kneepoints for the 6 bands.
Column 7: upper kneepoints for the 6 bands.
(One possible in-memory layout of these per-band values is sketched after this list.)

3. Feedback Paths: 26 feedback paths provided by Prof. James Kates of the University of Colorado Boulder have been included. The feedback path impulse responses were recorded using real hearing aid devices placed on an "EDDI" acoustic manikin. The sampling frequency was 15.625 kHz. Each feedback path was measured for a particular scenario and can be identified by its file name:

Earmold: 0 = leaky unvented, 1 = tight fit unvented, 2 = vented
Handset: 0 = on ear, 1-5 = cm separation, 6 = removed
For example, EDDI26.DAT means "vented" with "handset removed"; EDDI0.DAT means "leaky unvented" with "handset on ear".

4. HASPI_HASQI: HASPI and HASQI are performance metrics that measure the intelligibility and quality of audio files, respectively. The HASPI and HASQI codes have been provided to us by Prof. James Kates. The description of how to use the functions is documented in the functions themselves. The files haspi_v1.m and hasqi_v2.m are the functions used to evaluate the speech files.

5. Input Files: All the input sound files are sampled at 32 kHz. Add your custom input files at that sampling rate in the same folder.
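
As a small aid for item 2 above, the sketch below shows one possible in-memory layout of the 7 per-band parameter columns. The type and field names are illustrative only and are not taken from the OSP release.

    #define NUM_BANDS 6   /* one value per subband in each column */

    /* Hypothetical layout mirroring the 7 columns of an MHA parameter file. */
    typedef struct {
        float hearing_loss[NUM_BANDS]; /* column 1: hearing losses            */
        float g50[NUM_BANDS];          /* column 2: gains at 50 dB SPL input  */
        float g80[NUM_BANDS];          /* column 3: gains at 80 dB SPL input  */
        float attack[NUM_BANDS];       /* column 4: attack times              */
        float release[NUM_BANDS];      /* column 5: release times             */
        float knee_low[NUM_BANDS];     /* column 6: lower kneepoints          */
        float knee_up[NUM_BANDS];      /* column 7: upper kneepoints          */
    } mha_params_t;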

References

1. James M. Kates, "Principles of digital dynamic-range compression," Trends in Amplification, vol. 9, no. 2, pp. 45-76, 2005.

2. Shilpi Banerjee, The compression handbook: An overview of the characteristics and applications of compression amplification, 4th Edition, Starkey Laboratories, 2017.

3. James M. Kates, Digital hearing aids, Plural publishing, 2008.

4. Toon van Waterschoot and Marc Moonen, "Fifty years of acoustic feedback control: State of the art and future challenges," Proceedings of the IEEE, vol. 99, no. 2, pp. 288-327, 2011.

5. Johan Hellgren, "Analysis of feedback cancellation in hearing aids with Filtered-x LMS and the direct method of closed loop identification," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 2, pp. 119-131, 2002.

6. Hsiang-Feng Chi, Shawn X. Gao, Sigfrid D. Soli, and Abeer Alwan, "Band-limited feedback cancellation with a modified filtered-X LMS algorithm for hearing aids," Speech Communication, vol. 39, no. 1-2, pp. 147-161, 2003.

7. Donald L. Duttweiler, "Proportionate normalized least-mean-squares adaptation in echo cancelers," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 5, pp. 508-518, 2000.


8. Jacob Benesty and Steven L. Gay, "An improved PNLMS algorithm," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1881-1884, 2002.

9. Constantin Paleologu, Jacob Benesty, and Silviu Ciochină, "An improved proportionate NLMS algorithm based on the l0 norm," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 309-312, 2010.