Trends dio and oustic Signal...

Trends

Audio and Acoustic Signal Processing

ICASSP 2011

Malcolm Slaney, Yahoo! Research Silicon Valley, USA

Patrick A. Naylor, Imperial College London, UK

What do we mean by ‘Trends’?

ACTIVITY# papers submitted

ACHIEVEMENTS

Things we can do now

that we couldn’t beforeACHIEVEMENTS

Things we can do now

that we couldn’t before

CHALLENGES

Things we want to do

but can’t do yetCHALLENGES

Things we want to do

but can’t do yet

OPPORTUNITIES

Things we didn’t

know we

wanted to do

OPPORTUNITIES

Things we didn’t

know we

wanted to do

CURIOSITIES

Things that might be

interesting but we don’t

know what to do with them

CURIOSITIES

Things that might be

interesting but we don’t

know what to do with them

ResearchResearch

Historical Trends – ICASSP Submissions

Historical Trends – ICASSP Accepted Papers

Music at ICASSP

• Three sessions

– SS-L5: Music Signal Processing Exploiting Musical Knowledge

– AE-L3: Music Signal Processing

– AE-P7: Music Signal Processing

• Reasons

– New EDICS

– More content

– Commercially relevant

Big Datasets

• Easy to rip CDs

– Copyright issues

• Million-Song Dataset

– Distribute features

– Columbia and EchoNest

MIREX Competition

Musical Separation

• Sound separation

– Uses

• Understanding (key, melody)

• Transcriptions

• Multipitch estimation

– With better models

• HMM

• Scores

– Techniques

• NMF

• Matching Pursuit

• PLCA

From: POLYPHONIC AUDIO-TO-SCORE ALIGNMENT BASED ON BAYESIAN LATENT HARMONIC ALLOCATION HIDDEN MARKOV MODEL. Akira Maezawa,

Hiroshi G. Okuno, Tetsuya Ogata, Kyoto University, Japan; Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan

Music Research

• Tagging

– Genre

– Emotion

• Miscellaneous

– Morphing

– Similarity

AASP-P2.1: SOUND MORPHING BY FEATURE INTERPOLATION, Marcelo Caetano, Xavier Rodet, Institut de Recherche

et Coordination Acoustique/Musique, France

Applications vs. Algorithms?

If I’m going to be Queen, I suppose I

will not have much time left for

If I’m going to be Queen, I suppose I

will not have much time left for

My Expectation

Maximization

algorithm has converged !

My Expectation

Maximization

algorithm has converged !

If there is a trend towards things that look nice (applications), let’s not

lose sight of the fundamental power behind them (algorithms).

Microphone Array Signal Processing

• Hearing aids

• TV / Entertainment

• Linear, planar/cylindrical,

spherical, distributed

• Spacing and orientation

• Localization of sources

• Tracking

• Extraction/Separation

• Inference of room geometry

APPLICATIONS

GEOMETRY and DISTRIBUTION

the Eigenmike®

– mh Acoustics

32 elements

8.4 cm rigid sphere

2 - 64 elements, 0.5 m, linear ‘wing’ array

Planar Array Geometry Directivity Index

measured vs theoretical

Source Separation

• ‘Cocktail party problem’

– Colin Cherry in 1950s

– Audio signals from multi-talker

distant talking scenarios

– Behavior of a listener presented

with two speech signals

simultaneously

Colin Cherry

‘TRENDSETTER’

Source Separation

• Determined and underdetermined scenarios

– Clustering based blind source separation

– Permutation problem (EM)

– Reverberation times of, say, 100 - 500 ms

Speech Enhancement

• Dereverberation technology

real-world applications

– Single and multichannel

– Acoustic channel inversion

– Speech and Music

Speech Enhancement

• Dereverberation1 technology

real-world applications

– Single and multichannel

– Acoustic channel inversion

– Speech and Music

[1] P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. Springer, 2010.

Synergies

•• Joint dereverberation and blind source separationJoint dereverberation and blind source separation

•• Speech recognition of reverberant speechSpeech recognition of reverberant speech

CleanSpeechHMM

Reverb-erationModel

Recurring Theme

S P A R S I T Y!

History of Sparsity at ICASSP

Matching PursuitMatching Pursuit

Compressed SensingCompressed Sensing

Deep Belief NetworksDeep Belief Networks

L1 RegularizationL1 Regularization

I thought:

Spatial-Temporal Receptive Fields

• Original sparse representation (spikes!)

SS-L7.1: SPEECH PROCESSING WITH A CORTICAL REPRESENTATION OF AUDIO, Nima Mesgarani, Shihab Shamma,

University of Maryland, United States

Deep Belief Network

From: SS-L7.4: LEARNING A BETTER REPRESENTATION OF SPEECH SOUND WAVES USING RESTRICTED BOLTZMANN

MACHINES, Navdeep Jaitly, Geoffrey Hinton, University of Toronto, Canada

These are NOT your average wavelet/Gabor response!!

Nonlinear Modeling via Sparsity

Basis 1

Basis 2

All-Possible Combinations Over-complete Basis

Industrial Perspectives

“Remaining challenges [in source separation] could include BSS for

unknown/dynamic number of sources.”Tomohiro Nakatani NTT Communication Science Laboratories

Mixed-signal ICs for mobile phones

“Moore’s Law is driving DSP speed and memory capacity … enabling

implementation of sophisticated DSP functions that have resulted from years of

research in acoustic signal processing. The end-user experience is one of natural

wideband voice communication, devoid of acoustic background noise and

unwanted artefacts.”

Anthony Magrath Director of DSP Technology, Wolfson Microelectronics

“The applications of sound capture, speech enhancement, and audio processing

technologies shift gradually from communications mostly, towards speech

recognition and building natural human-machine interfaces for mobile devices, in

cars, and in our living rooms.”Ivan Tashev Microsoft Research

iTunes Screen Shot

After-thought

• Trend

– Origin: Old English trendan ‘revolve, rotate’, of Germanic origin

• “What goes around, comes around” (?)

Texture

Sparsity

• Better representations

– Sparse

• Matching pursuit

• DBNs

• features for recognition

– New angles (cortical and textures)

– Subspaces (latent and otherwise)

Trends dio and oustic Signal...

Documents

Transcript of Trends dio and oustic Signal...

Marketing - 1.Dio

CC2650 LaunchPad Development Kit Quick Start Guide · CC2650 LaunchPad Development Kit Part Number: LAUNCHXL-CC2650 BoosterPack Ecosystem ... DIO 23 DIO 22 DIO 24 DIO 10 DIO 21 DIO

Dio navi introduction

Bondallian dio discussion4

DIO 15 - DIO, The International Journal of Scientific History

DIO Fizz Quizz

Knjiga - 1. Dio

Letras Dio

Kuharica Prvi Dio

DIO User Guide 1.0 - Intuicom Wireless Solutions · DIO transceiver is linking with an unlimited number of remote or slave DIO transceivers. While the master DIO transceiver is configured

ZEBA 1 DIO

Data brief - STEVAL-IDB009V1 - Evaluation platform based ...DIO 2 4 DIO 1 2 DIO 1 DIO 7 DIO 5 DIO 0 R E S E TN DIO 2 DIO 3 DIO 8 DIO 4 R E S E TN DIO 6 DIO 2 5 DIO 2 3 DIO 1 1 ...

Dio oper domestication

5. dio izjave

DIARY JOURNAL CATALOGUE 2020 - fmiast.co.za · Assorted DIO-4000-18 Black DIO-4000-01 Lime DIO-4000-06 Light Blue DIO-4000-10 Orange DIO-4000-12 Lilac DIO-4000-23 Pink DIO-4000-30

Matematica e-dio

Oncology DIO

DIO-24 DIO-96/144 User Manual Nov. 2011 Version 2 · DIO-24 24-bit Compatible DIO OPTO-22 Board. 1.1. Introduction . The DIO-24 provides 24 TTL digital I/O lines. The DIO-24 emulates

Dio Brochure

100 and 300 Series DIO Modules 1. Intent & Scope 2 ... · 100 and 300 Series DIO Modules Document IM-DIO-100-2.7 Page 7 5.1 DIO-110 Card Cage DIO The card cage mounted DIO-100 has