Neuromorphic Audition Group

Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma



Outline

• Field Programmable Analog Array (Dave)
• Speaker Identification (Malcolm, Nima and Max)
• Speech Recognition (Hynek, Misha, Jordon)
• STRF Noise Suppression (Nima, Shihab, Dave)
• Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
• Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
• Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
• Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)


FPAA


Speaker ID

[Diagram labels: four Features/Model pairs and a Winner Take All stage. Features: MFCC, STRF. Models: GMM, ART.]

Speaker ID - STRF


Speaker ID – ART

Malcolm Slaney – Heather Ames – Max Versace

Supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories

First test: three synthesized vowels (large clusters) spoken by three speakers (different colors) represented in 2D feature space.
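As a rough illustration of how an ART-style classifier could separate speakers in a 2D feature space, the sketch below implements a heavily simplified fuzzy ARTMAP. It is not the group's implementation: the class name, the parameter values (`vigilance`, `alpha`, `beta`) and the toy data are all assumptions, and real ARTMAP match tracking (raising vigilance after a wrong prediction) is replaced here by simply skipping categories whose label disagrees.

```python
import numpy as np


def complement_code(x):
    """Fuzzy ART complement coding: [x, 1 - x] for features scaled to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class SimplifiedFuzzyARTMAP:
    """Hypothetical, minimal supervised fuzzy ART sketch (not full ARTMAP)."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance   # minimum match between input and category
        self.alpha = alpha     # choice parameter
        self.beta = beta       # learning rate (1.0 = fast learning)
        self.weights = []      # one weight vector per learned category
        self.labels = []       # speaker label attached to each category

    def _ranked(self, i_vec):
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), best first.
        scores = [np.minimum(i_vec, w).sum() / (self.alpha + w.sum())
                  for w in self.weights]
        return np.argsort(scores)[::-1]

    def train_one(self, x, label):
        i_vec = complement_code(x)
        for j in self._ranked(i_vec):
            w = self.weights[j]
            # Vigilance test: does the top-down expectation w match the input?
            match = np.minimum(i_vec, w).sum() / i_vec.sum()
            if match >= self.rho and self.labels[j] == label:
                self.weights[j] = (self.beta * np.minimum(i_vec, w)
                                   + (1.0 - self.beta) * w)
                return
        # No acceptable category: commit a new one for this input.
        self.weights.append(i_vec.copy())
        self.labels.append(label)

    def predict(self, x):
        i_vec = complement_code(x)
        return self.labels[self._ranked(i_vec)[0]]


# Toy 2D data for two made-up speakers.
model = SimplifiedFuzzyARTMAP()
for point, speaker in [((0.10, 0.10), "a"), ((0.15, 0.12), "a"),
                       ((0.80, 0.80), "b"), ((0.85, 0.82), "b")]:
    model.train_one(point, speaker)
```

Because categories are added one input at a time, a model of this kind can keep learning online, which is one reason the approach is of interest here.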


Speaker ID - ART Results

[Flowchart labels, shared by the training and testing paths]
Speech input (.wav)
Feature extraction: 12 MFCC + E, first and second derivatives
Vowel extraction: ½-wave rectify, lowpass filter, choice of high-energy time slices
Features: feature vectors for “vowel” data
Utterance-independent transformation: TBD
Transformed features
ARTMAP: acoustic model of speaker identity
Predicted speaker identity

50% correct after 100 cross-validations (number of ARTMAP instances run) on 10-speaker identification

Continued work:
1. Improved vowel extraction
2. Utterance-independent transformation of feature space

Why we care? Top-down, online
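The two front-end steps named on this slide can be sketched as follows. This is only an illustration under stated assumptions: the function names, the ±2-frame regression window, the moving-average lowpass and the `keep_fraction` value are hypothetical choices, and the MFCC computation itself is taken as given. Stacking 12 MFCC + E (13 static values) with first and second derivatives yields 39 values per frame.

```python
import numpy as np


def deltas(features, width=2):
    """Regression-based time derivative over +/- width frames (rows = frames)."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k: width + k + n]
                   - padded[width - k: width - k + n])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))


def stack_with_derivatives(static):
    """Append first and second derivatives to the static features."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])


def high_energy_frames(signal, frame_len, keep_fraction=0.25, smooth_len=5):
    """Pick the highest-energy frames as a crude stand-in for vowel extraction."""
    rectified = np.maximum(np.asarray(signal, dtype=float), 0.0)  # half-wave rectify
    kernel = np.ones(smooth_len) / smooth_len
    envelope = np.convolve(rectified, kernel, mode="same")  # moving-average lowpass
    n_frames = len(envelope) // frame_len
    energy = envelope[: n_frames * frame_len].reshape(n_frames, frame_len).mean(axis=1)
    n_keep = max(1, int(round(keep_fraction * n_frames)))
    return np.sort(np.argsort(energy)[-n_keep:])  # indices of kept frames
```

A real system would use a proper lowpass filter and a tuned energy threshold; the moving average only keeps the sketch dependency-free.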


Speaker ID - Results

Test              % Correct    % Correct in 5 dB noise
MFCC (Baseline)   81.3%        81.0%
STRF              79.8%
ART               ~60%

Very preliminary work! We are comparing to technology (MFCC+GMM) that has been perfected over decades.


ASR - Phoneme Posteriors


ASR - Combining Information

[Diagram labels: Training, Context; inputs X and C]

Q{X, C} = Pr{Correct | X, C}

Machines: P(word|sound) P(word|context)

Humans: [1-P(word|sound)] [1-P(word|context)]

Maximize?


Inverse model: from neural responses to sound


Reconstruction of speech in white noise

• Reconstructed speech is “cleaner” than the original noisy

[Figure panels: Original Spectrograms; Reconstructed Spectrograms]

Psychoacoustically-motivated Speech Enhancement

• Perceptual loudness: L = (b*e(t))^a

• By mapping loudness using the same type of function, noise can be decreased

• Results from STRF processing
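A minimal sketch of the power-law mapping L = (b*e(t))^a is given below. The slide does not state values for a or b, so the exponent used in the example is purely illustrative, as is the function name. The example only shows the general effect: on an envelope normalized to a peak of 1, an exponent above 1 pushes low-level components (such as a noise floor) further down relative to the peaks.

```python
import numpy as np


def power_law(envelope, a, b=1.0):
    """Apply L = (b * e(t)) ** a to a non-negative envelope e(t)."""
    e = np.asarray(envelope, dtype=float)
    return (b * e) ** a


# Illustrative envelope: a noise floor at 0.1 and a speech peak at 1.0.
envelope = np.array([0.1, 0.1, 1.0, 0.1])
expanded = power_law(envelope, a=2.0)  # a = 2.0 is an assumed value

# Floor-to-peak ratio before and after the mapping.
ratio_before = envelope.min() / envelope.max()
ratio_after = expanded.min() / expanded.max()
```

Here the floor-to-peak ratio drops from about 0.1 to about 0.01; an exponent below 1 would compress the range instead.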


Noise suppression using inverse model

• Train G-filters on reconstructing clean stimuli from corresponding noisy responses. Apply the trained filters to new noisy responses
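The training idea in the bullet above can be illustrated with an ordinary regularized least-squares fit. This is a generic sketch, not the G-filters used in the work: the linear forward model, the synthetic data, the `ridge` value and all function names are assumptions made for the example.

```python
import numpy as np


def train_inverse_filter(noisy_responses, clean_stimuli, ridge=1e-3):
    """Solve min_G ||R G - S||^2 + ridge * ||G||^2 for the inverse filter G.

    R has shape (time, n_responses); S has shape (time, n_stimulus_channels).
    """
    r = np.asarray(noisy_responses, dtype=float)
    s = np.asarray(clean_stimuli, dtype=float)
    gram = r.T @ r + ridge * np.eye(r.shape[1])
    return np.linalg.solve(gram, r.T @ s)


def reconstruct(noisy_responses, g):
    """Apply a trained inverse filter to new noisy responses."""
    return np.asarray(noisy_responses, dtype=float) @ g


# Synthetic stand-in data: a random linear "forward model" plus additive noise.
rng = np.random.default_rng(0)
forward = rng.standard_normal((4, 12))


def simulate(n_samples):
    clean = rng.standard_normal((n_samples, 4))
    noisy = clean @ forward + 0.1 * rng.standard_normal((n_samples, 12))
    return clean, noisy


train_clean, train_noisy = simulate(500)
g = train_inverse_filter(train_noisy, train_clean)

# Apply the trained filter to responses it has not seen.
test_clean, test_noisy = simulate(200)
mse = np.mean((reconstruct(test_noisy, g) - test_clean) ** 2)
```

Training on pairs of clean stimuli and noisy responses is what lets the filter suppress noise: it learns to map toward the clean target rather than simply inverting the forward model.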

[Figure panels: Cortical decomposition; “Trained” inverse filters]

Noise Suppression for White, Jet and City Noise


RS Media

Linux version 2.4.18-rmk5-mx1ads-p3 ([email protected]) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing


Cochlear - ITD Detector

[Figure axes: Time, Position]

Cochlear - JAER Demo


Cochlear - Periodicity detector

[Figure panels: Response to “hiss”; Response to “coo”]


When both channels are conditionally independent:
• p_C p_A: probability of correct recognition in both channels
• p_C (1 - p_A): correct in ch1 but not in ch2
• p_A (1 - p_C): correct in ch2 but not in ch1

These three cases are mutually exclusive, thus the probability of correct recognition is

p = p_C p_A + p_C (1 - p_A) + p_A (1 - p_C) = p_C + p_A - p_C p_A

Probability of error, where e_C = 1 - p_C and e_A = 1 - p_A are the per-channel error probabilities:

e = (1 - p) = 1 - p_C - p_A + p_C p_A = (1 - p_C)(1 - p_A) = e_C e_A
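The identity above is easy to check numerically. The short sketch below (function names are illustrative, not from the slides) computes the combined probability of correct recognition from the three mutually exclusive cases and the combined error as a product of per-channel errors, so the two can be confirmed to sum to 1.

```python
def combined_correct(p_c, p_a):
    """Sum of the three mutually exclusive 'at least one channel correct' cases."""
    return p_c * p_a + p_c * (1 - p_a) + p_a * (1 - p_c)


def combined_error(p_c, p_a):
    """Product of the per-channel error probabilities, e_C * e_A."""
    return (1 - p_c) * (1 - p_a)


# Example values (chosen for illustration): context 0.6, acoustic 0.7.
p = combined_correct(0.6, 0.7)   # roughly 0.88
e = combined_error(0.6, 0.7)     # roughly 0.12
```

With these example values, the combined error of about 0.12 is lower than either channel's own error (0.4 and 0.3), which is the point of the product-of-errors rule.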

[Diagram: the stimulus feeds two channels, context (top-down, p_C) and acoustic (bottom-up, p_A), which lead to the decision]