Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition. Mark D. Skowronski, Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA. December 1, 2004.

Page 1

Statistical automatic identification of microchiroptera from echolocation calls

Lessons learned from human automatic speech recognition

Mark D. Skowronski
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida
Gainesville, FL, USA
December 1, 2004

Page 2

Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
  – Machine learning vs. expert knowledge
• Experiments
• Conclusions and future work

Page 3

Bat research motivations
• Bats are among:
  – the most diverse (25% of all mammal species),
  – the most endangered,
  – and the least studied mammals.
• Close relationship with insects
  – agricultural impact
  – disease vectors
• Acoustical research
  – non-invasive (compared to netting)
  – significant domain (echolocation)

Page 4

More motivations
• Calls simple compared to human speech
• Same goals as human ASR
  – Detection
  – Feature extraction
  – Classification
  – Noise-robust performance
• Easier to design/develop models
• Domain between toy problems and ASR

Page 5

Bat echolocation
• Ultrasonic, brief chirps (~active sonar)
• Determine range, velocity of nearby objects (clutter, prey, other bats)
• Tailored for task, environment

[Figure: Tadarida brasiliensis (Mexican free-tailed bat)]

[Audio: 10x time-expanded search calls]

Page 6

Echolocation calls
• Two characteristics
  – Frequency modulated (range information)
  – Constant frequency (velocity information)
• Features (holistic)
  – Freq. extrema
  – Duration
  – Shape
  – # harmonics
  – Call interval

[Figure: Mexican free-tailed calls, concatenated]

Page 7

Current classification methods
• Expert sonogram readers
  – Manual or automatic feature extraction
    • Griffin 1958, Fenton and Bell 1981
  – Comparison with exemplar sonograms
  – Decision trees
• Automatic classification
  – Discriminant function analysis
    • By far the most popular method in literature
    • Available in statistical software packages (SAS, SPSS)
  – Others
    • Artificial neural networks, Parsons 2001
    • Spectrogram correlation, Pettersson Elektronik AB

Parallels the 1970s acoustic-phonetic approach to human ASR.

Page 8

Acoustic phonetics
• Bottom up paradigm
  – Frames, boundaries, groups, phonemes, words
  – Mimics techniques of expert spectrogram readers
• Manual or automatic feature extraction
  – Formants, voicing, duration, intensity, transitions
• Classification
  – Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

[Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]

Page 9

Acoustic phonetics limitations
• Variability of conversational speech
  – Complex rules, difficult to train
• Boundaries difficult to define
  – Coarticulation, reduction
• Feature estimates brittle
  – Variable noise robustness
• Hard decisions, errors accumulate

Shifted to machine learning paradigm of human ASR by 1980s: better able to account for variability of speech, noise.

Page 10

Machine learning ASR
• Data-driven models
  – Non-parametric: dynamic time warp (DTW)
  – Parametric: hidden Markov model (HMM)
• Frame-based
  – Identical features from every frame
  – Expert information in feature extraction
  – Models account for feature, temporal variabilities

Machine learning dominates state-of-the-art ASR.
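The non-parametric, frame-based idea can be illustrated with a minimal dynamic time warping sketch. This is not the implementation used in the talk: the Euclidean local distance, the unconstrained step pattern, and the helper names `dtw_distance` and `classify_nearest_template` are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats, e.g. (frequency, log energy).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # only sequence a advances
                cost[i][j - 1],      # only sequence b advances
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify_nearest_template(query, templates):
    """Return the label of the stored template closest to the query.

    templates is a list of (label, sequence) pairs.
    """
    return min(templates, key=lambda item: dtw_distance(query, item[1]))[0]
```

Every stored template must be compared with each new call, which is consistent with the later remark that DTW trains quickly but classifies slowly.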

Page 11

Data collection
• UF Bat House, home to 60,000 bats
  – Mexican free-tailed bat (vast majority)
  – Evening bat
  – Southeastern myotis
• Continuous recording
  – 90 minutes around sunset
  – ~20,000 calls
• Equipment:
  – B&K mic (4939), 100 kHz
  – B&K preamp (2670)
  – Custom amp/AA filter
  – NI 6036E 200 kS/s A/D card
  – Laptop, Matlab
  – Portable

Page 12

Experiment design
• Hand labels as ground truth
  – Narrowband spectrogram
  – 436 calls (2% of data) in 3 hours (80x real time)
  – Four classes, a priori: 34%, 40%, 20%, 6%
  – All experiments on hand-labeled data only
  – No hand-labeled calls excluded from experiments

[Figure: four panels labeled 1, 2, 3, 4]

Page 13

Methods
• Baseline, from the literature
  – Features
    • Duration
    • Zero crossing: Fmin, Fmax, Fmax_energy
    • MUSIC super resolution frequency estimator
  – Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  – Features
    • Frequency (MUSIC), log energy, Δs (HMM only)
  – HMM
    • 5 states/model
    • 4 Gaussian mixtures/state, diagonal covariances
• Tests
  – Leave one out
  – Repeated trials: 25% test data, 1000 trials
  – Test on train data (HMM only)
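The zero-crossing baseline features can be sketched as follows. The slides do not give the estimator's details, so this is only one plausible form: the period between interpolated positive-going zero crossings gives one frequency estimate per cycle, from which Fmin, Fmax, and duration follow. The function names and the synthetic linear chirp are hypothetical, and Fmax_energy is omitted.

```python
import math


def linear_chirp(f_start, f_end, duration, fs):
    """Synthetic test signal: unit-amplitude linear sweep from f_start to f_end (Hz)."""
    n_samples = int(round(duration * fs))
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    return [
        math.sin(2.0 * math.pi * (f_start * (n / fs) + 0.5 * rate * (n / fs) ** 2))
        for n in range(n_samples)
    ]


def positive_zero_crossings(x, fs):
    """Times in seconds of positive-going zero crossings, linearly interpolated."""
    times = []
    for n in range(1, len(x)):
        if x[n - 1] < 0.0 <= x[n]:
            frac = -x[n - 1] / (x[n] - x[n - 1])  # crossing position within the sample gap
            times.append((n - 1 + frac) / fs)
    return times


def call_features(x, fs):
    """Fmin, Fmax (Hz) and duration (s) from per-cycle zero-crossing periods."""
    t = positive_zero_crossings(x, fs)
    freqs = [1.0 / (t1 - t0) for t0, t1 in zip(t, t[1:])]
    return {"fmin": min(freqs), "fmax": max(freqs), "duration": t[-1] - t[0]}
```

For example, `call_features(linear_chirp(50_000.0, 25_000.0, 0.005, 200_000), 200_000)` analyses a 5 ms downward sweep at the 200 kS/s rate of the A/D card listed under Equipment. Because each estimate rests on two time points, noise near a crossing moves it directly, which is one reason such features can be brittle.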

Page 14

Results
• Baseline, zero crossing
  – Leave one out: 72.5% correct
  – Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  – Leave one out: 79.1%
  – Repeated trials: 77.5 ± 4%
• DTW
  – Leave one out: 74.5%
  – Repeated trials: 74.1 ± 4%
• HMM
  – Test on train: 85.3%
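The two test protocols behind these numbers, leave one out and repeated random trials with 25% held out, can be sketched as below. The nearest-mean classifier on a scalar feature is only a hypothetical stand-in so the example runs; the talk's classifiers were discriminant function analysis, DTW, and HMM.

```python
import random
import statistics


def nearest_mean_fit(train):
    """Toy classifier (an assumption): store the mean feature value per class."""
    sums = {}
    for x, label in train:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def nearest_mean_predict(model, x):
    """Return the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def leave_one_out_accuracy(data):
    """Train on all items but one, test on the held-out item, repeat for every item."""
    correct = 0
    for i, (x, label) in enumerate(data):
        model = nearest_mean_fit(data[:i] + data[i + 1:])
        correct += nearest_mean_predict(model, x) == label
    return correct / len(data)


def repeated_holdout_accuracy(data, test_fraction=0.25, trials=1000, seed=0):
    """Mean and standard deviation of accuracy over random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(data)))
    scores = []
    for _ in range(trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = nearest_mean_fit(train)
        hits = sum(nearest_mean_predict(model, x) == label for x, label in test)
        scores.append(hits / n_test)
    return statistics.mean(scores), statistics.stdev(scores)
```

Both protocols keep the test items out of training. The "test on train" figure for the HMM does not, so it is not directly comparable with the other three results.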

Page 15

Confusion matrices

Baseline, zero crossing
       1    2    3    4
  1  107   38    1    2   72.3%
  2   21  134   16    4   76.6%
  3    2   29   57    0   64.8%
  4    4    3    0   18   72.0%
  Overall: 72.5%

Baseline, MUSIC
       1    2    3    4
  1  110   36    1    1   74.3%
  2   12  149   12    2   85.1%
  3    4   18   66    0   75.0%
  4    3    2    0   20   80.0%
  Overall: 79.1%

DTW
       1    2    3    4
  1  115   29    0    4   77.7%
  2   32  131   11    1   74.9%
  3    5   20   63    0   71.6%
  4    5    4    0   16   64.0%
  Overall: 74.5%

HMM
       1    2    3    4
  1  118   25    0    5   79.7%
  2   10  154    5    6   88.0%
  3    1   12   75    0   85.2%
  4    0    0    0   25  100%
  Overall: 85.3%
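The percentages on this slide can be recomputed from the counts. The sketch below assumes each row holds the calls of one hand-labeled class, which fits the row totals (148, 175, 88, 25 of 436 calls, roughly the stated 34%, 40%, 20%, 6% priors). The dictionary and function names are illustrative.

```python
MATRICES = {
    "Baseline, zero crossing": [[107, 38, 1, 2], [21, 134, 16, 4], [2, 29, 57, 0], [4, 3, 0, 18]],
    "Baseline, MUSIC": [[110, 36, 1, 1], [12, 149, 12, 2], [4, 18, 66, 0], [3, 2, 0, 20]],
    "DTW": [[115, 29, 0, 4], [32, 131, 11, 1], [5, 20, 63, 0], [5, 4, 0, 16]],
    "HMM": [[118, 25, 0, 5], [10, 154, 5, 6], [1, 12, 75, 0], [0, 0, 0, 25]],
}


def row_accuracies(matrix):
    """Percent of each row's calls that fall on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percent of all calls that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


for name, matrix in MATRICES.items():
    per_class = ", ".join(f"{value:.1f}%" for value in row_accuracies(matrix))
    print(f"{name}: overall {overall_accuracy(matrix):.1f}% (per class: {per_class})")
```

Run as written, this reproduces the overall figures of 72.5%, 79.1%, 74.5%, and 85.3%. In all four matrices the largest off-diagonal counts sit between classes 1 and 2.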

Page 16

Comments
• Experiments
  – Weakness: accuracy of class labels
  – No labeled calls excluded, realistic
  – HMM most accurate, but undertrained
  – MUSIC frequency estimate robust, but 1000x slower than ZCA (20x real time)
• Machine learning
  – Expert information still necessary
    • Feature extraction (dimensionality reduction)
    • Model parameters
  – DTW: fast training, slow classification
  – HMM: slow training, fast classification (real time)
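Why HMM classification is fast can be seen from a sketch of the scoring step: once trained, each class model scores a call with a single forward pass, and the highest log-likelihood wins. This is a simplified illustration, not the talk's system. It uses one diagonal-covariance Gaussian per state instead of a four-component mixture, the function names are hypothetical, and training (e.g. Baum-Welch) is not shown.

```python
import math

NEG_INF = float("-inf")


def log_or_neg_inf(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(frames, log_init, log_trans, means, variances):
    """log p(frames | model) computed with the forward algorithm in the log domain."""
    states = range(len(means))
    alpha = [log_init[s] + log_gauss_diag(frames[0], means[s], variances[s]) for s in states]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in states])
            + log_gauss_diag(x, means[s], variances[s])
            for s in states
        ]
    return logsumexp(alpha)


def classify(frames, models):
    """Pick the model name with the highest forward log-likelihood.

    models maps a name to (log_init, log_trans, means, variances).
    """
    return max(models, key=lambda name: forward_log_likelihood(frames, *models[name]))
```

The cost per call grows with the number of frames times the square of the number of states, independent of how many training calls were used. DTW template matching, by contrast, compares each new call against stored examples.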

Page 17

Future work
• Ultimate goal
  – Real-time portable system for species ID
  – Commercial product possibilities
• Feature extraction
  – Robust
    • Broadband noise
    • Echoes
    • Unknown distance between bat and microphone
  – Chirp model, echo model
  – Faster frequency estimates
  – Match assumptions of classifiers

Page 18

More future work
• Detection
  – Replace energy-based method with principled statistical methods using frame-based features
• Classification
  – Accurate class labels for training
    • Netting
    • Record from known bat roosts (preferred)
  – Pseudo-sinusoidal input
    • Oscillator network
    • Echo state network
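The slides do not describe the energy-based detector that the Detection item proposes to replace. The sketch below shows one common form, given purely as an assumption: frame-wise log energy is compared with a threshold set a fixed margin above the median (taken as the noise floor), and consecutive active frames are merged into segments. The frame length, hop, and margin values are arbitrary.

```python
import math
import statistics


def frame_log_energy(x, frame_len, hop):
    """Log energy in dB of each frame; the small constant avoids log of zero."""
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        energy = sum(v * v for v in x[start:start + frame_len])
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def detect_calls(x, frame_len=64, hop=32, margin_db=10.0):
    """Return (start_sample, end_sample) pairs for runs of high-energy frames."""
    energies = frame_log_energy(x, frame_len, hop)
    floor = statistics.median(energies)  # assumes most frames contain only noise
    active = [e > floor + margin_db for e in energies]
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:  # signal ended while a call was still active
        segments.append((start * hop, (len(active) - 1) * hop + frame_len))
    return segments
```

A fixed margin above a median floor breaks down when noise levels change or when calls fill most of the recording, which is the kind of weakness a statistical detector on frame-based features would be meant to address.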

Page 19

Information
• [email protected]
• http://www.cnel.ufl.edu/~markskow