Progressive Filtering and Its Application for Query-by-Singing/Humming

Page 1: Progressive Filtering and Its Application for Query-by-Singing/Humming

Progressive Filtering and Its Application for Query-by-Singing/Humming

J.-S. Roger Jang (張智星 )

Multimedia Information Retrieval Lab

CS Dept., Tsing Hua Univ., Taiwan

http://www.cs.nthu.edu.tw/~jang

Page 2: Progressive Filtering and Its Application for Query-by-Singing/Humming

-2-

Recent Publications

Journals

Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008.

J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008.

Conferences

Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training For Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008.

Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.

Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.

Page 3: Progressive Filtering and Its Application for Query-by-Singing/Humming

-3-

Outline

Problem definition of QBSH
Methods for QBSH
Progressive Filtering
Conclusions

Page 4: Progressive Filtering and Its Application for Query-by-Singing/Humming

-4-

Introduction to QBSH

QBSH: Query by Singing/Humming
  Input: Singing or humming from a microphone
  Output: A ranking list retrieved from the song database

Overview
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX

Page 5: Progressive Filtering and Its Application for Query-by-Singing/Humming

-5-

Challenges in QBSH Systems

Reliable pitch tracking for acoustic input
  Input from mobile devices
  Input at noisy karaoke box

Song database preparation
  Audio music vs. MIDIs

Efficient/effective retrieval
  Karaoke machine: ~10,000 songs
  Internet music search engine: ~500,000,000 songs

Page 6: Progressive Filtering and Its Application for Query-by-Singing/Humming

-6-

Page 7: Progressive Filtering and Its Application for Query-by-Singing/Humming

-7-

Goal and Approach

Goal: To retrieve songs effectively within a given response time, say 5 seconds or so

Our strategy
  Multi-stage progressive filtering
  Data-driven design methodology based on DP (dynamic programming)

Page 8: Progressive Filtering and Its Application for Query-by-Singing/Humming

-8-

Approaches to QBSH

Pitch Tracking
Methods for QBSH

Page 9: Progressive Filtering and Its Application for Query-by-Singing/Humming

-9-

A Quick Demo of QBSH

Demo page of MIR lab: http://mirlab.org/mir_main/demo.htm

Demo of QBSH: http://mirlab.org/Demo/MusicSearch/index.htm

Page 10: Progressive Filtering and Its Application for Query-by-Singing/Humming

-10-

Progressive Filtering

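Multi-stage representation
  Each stage is a method for QBSH

Pipeline: $n_0 \to$ stage 1 $\to n_1 \to$ stage 2 $\to n_2 \to \cdots \to n_{i-1} \to$ stage $i$ $\to n_i$

Survival rates: $s_1, s_2, \ldots, s_i$

Delays: $d_1 = n_0\,t_1(s_1)$, $d_2 = n_1\,t_2(s_2)$, ..., $d_i = n_{i-1}\,t_i(s_i)$

$s_i$: survival rate for stage $i$
$d_i$: delay for stage $i$
$n_{i-1}$: no. of input songs to stage $i$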
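As a worked example of the bookkeeping above, take two stages with made-up numbers (none of these values come from the slides; they only illustrate how the song count shrinks and the delay accumulates, using $n_i = n_{i-1} s_i$):

$$
\begin{aligned}
n_0 &= 10{,}000, & s_1 &= 0.1, & t_1(s_1) &= 0.01\ \text{ms} &&\Rightarrow\ d_1 = n_0\,t_1(s_1) = 100\ \text{ms}, & n_1 &= n_0 s_1 = 1{,}000 \\
 & & s_2 &= 0.01, & t_2(s_2) &= 1\ \text{ms} &&\Rightarrow\ d_2 = n_1\,t_2(s_2) = 1{,}000\ \text{ms}, & n_2 &= n_1 s_2 = 10
\end{aligned}
$$

The total delay in this hypothetical case is $d_1 + d_2 = 1{,}100$ ms, and 10 songs survive to form the ranking list.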

Page 11: Progressive Filtering and Its Application for Query-by-Singing/Humming

-11-

Stage Characteristics for Effectiveness

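RS curve for stage i: recognition rate = $r_i(s)$

[Figure: RS curves. Horizontal axis: survival rate s (%); vertical axis: recognition rate (%). Curves run from (0, 0) to (100, 100). A more effective method lies above a less effective one, with random guess as the baseline. Marked example: at s = 10% the recognition rate is 65%, i.e., the top-10% recognition rate is 65%.]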
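One plausible way to estimate an RS curve from a design set is sketched below. It assumes that, for each design query, the rank of the correct song under the stage's method is already known; the function name `rs_curve` and the sample ranks are hypothetical and not taken from the slides.

```python
def rs_curve(ranks, n_songs, survival_rates):
    """Estimate recognition rate r(s) for each survival rate s.

    ranks: 1-based rank of the correct song for each design query
           (assumed to be precomputed by the stage's comparison method).
    n_songs: size of the song list fed to this stage.
    survival_rates: fractions in (0, 1] at which to sample the curve.
    """
    curve = []
    for s in survival_rates:
        keep = round(s * n_songs)  # number of songs that survive the stage
        hits = sum(1 for r in ranks if r <= keep)  # correct song survived
        curve.append(hits / len(ranks))
    return curve


# Hypothetical ranks for five design queries against 100 songs.
example = rs_curve([1, 3, 12, 50, 7], n_songs=100, survival_rates=[0.1, 0.5, 1.0])
print(example)
```

At a survival rate of 100% every correct song survives, so the curve always ends at a recognition rate of 1.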

Page 12: Progressive Filtering and Its Application for Query-by-Singing/Humming

-12-

Stage Characteristics for Efficiency

TS curve for stage i: average time = $t_i(s)$

[Figure: TS curves. Horizontal axis: survival rate s (%), from 0 to 100; vertical axis: average time (ms). A more efficient method lies below a less efficient one. Marked example: when s = 10%, the average one-to-one comparison time is 5 ms.]

Page 13: Progressive Filtering and Its Application for Query-by-Singing/Humming

-13-

Formulation as an Optim. Problem

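Maximize

$$R(s_1, s_2, \ldots, s_m) = r_1(s_1)\, r_2(s_2) \cdots r_m(s_m)$$

subject to the constraints

$$n\,t_1(s_1) + n s_1\,t_2(s_2) + n s_1 s_2\,t_3(s_3) + \cdots + n s_1 s_2 \cdots s_{m-1}\,t_m(s_m) \le T_{\max}$$

$$n s_1 s_2 s_3 \cdots s_m \le 10$$

$n$ (= $n_0$): size of the song database

$T_{\max}$: maximum allowable response time, say, 5 sec.

10: the size of the retrieved ranking list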
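The objective and the two constraints can be checked for any candidate set of survival rates. The sketch below is only an illustration: the function name `evaluate`, the constant RS/TS functions, and all numeric values are assumptions, not data from the study.

```python
import math


def evaluate(n, s, r, t, t_max, list_size=10):
    """Evaluate one candidate design (s_1, ..., s_m).

    n: size of the song database (n_0).
    s: list of per-stage survival rates.
    r, t: lists of per-stage functions r_i(s_i) and t_i(s_i).
    Returns (overall recognition rate, total time, songs left, feasible).
    """
    recog = math.prod(r_i(s_i) for r_i, s_i in zip(r, s))  # r_1(s_1)...r_m(s_m)
    total_time = 0.0
    n_in = n
    for t_i, s_i in zip(t, s):
        total_time += n_in * t_i(s_i)  # d_i = n_{i-1} * t_i(s_i)
        n_in *= s_i                    # n_i = n_{i-1} * s_i
    feasible = total_time <= t_max and n_in <= list_size
    return recog, total_time, n_in, feasible


# Hypothetical two-stage design; times are in milliseconds.
rate, total_ms, n_left, ok = evaluate(
    n=10000,
    s=[0.1, 0.005],
    r=[lambda x: 0.9, lambda x: 0.8],
    t=[lambda x: 0.01, lambda x: 1.0],
    t_max=5000,
)
print(rate, total_ms, n_left, ok)
```

Searching over all survival-rate combinations with a function like this is exhaustive, which is what motivates a more structured search.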

Page 14: Progressive Filtering and Its Application for Query-by-Singing/Humming

-14-

DP-based Approach

The original optimization task can be cast into DP:

Optimum-value function: $R_i(s, t)$ is the optimum recognition rate at stage $i$, with a cumulated survival rate $s$ and a cumulated computation time $t$.

The recurrent formula for $R_i(s, t)$ can be derived by varying the survival rate $x$ of stage $i$, as follows.

$$R_i(s, t) = \max_{x,\ s \le x \le 1}\ r_i(x)\, R_{i-1}\!\left(\frac{s}{x},\ t - n\,\frac{s}{x}\,t_i(x)\right)$$

Page 15: Progressive Filtering and Its Application for Query-by-Singing/Humming

-15-

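Recurrent formula for $R_i(s, t)$

$$R_i(s, t) = \max_{x,\ s \le x \le 1}\ r_i(x)\, R_{i-1}\!\left(\frac{s}{x},\ t - n\,\frac{s}{x}\,t_i(x)\right)$$

[Figure: pipeline $n_0 \to$ stage 1 $\to n_1 \to \cdots \to n_{i-2} \to$ stage $i-1$ $\to n_{i-1} \to$ stage $i$ $\to n_i$, with survival rates $s_1, \ldots, s_{i-1}, s_i = x$ and delays $d_1, \ldots, d_{i-1}, d_i$. The whole pipeline up to stage $i$ corresponds to $R_i(s, t)$; stage $i$ contributes $r_i(x)$, and the stages before it contribute $R_{i-1}(s/x,\ t - d_i)$.]

$d_i$: delay of stage $i$, where $d_i = n_{i-1}\,t_i(x) = n\,(s/x)\,t_i(x)$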

Page 16: Progressive Filtering and Its Application for Query-by-Singing/Humming

-16-

DP-based Approach

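Boundary conditions for $R_i(s, t)$:

$$R_i(s, t) = 0 \quad \text{if } s \le 0,\ \forall i,\ \forall t$$

$$R_i(s, t) = 0 \quad \text{if } t \le 0,\ \forall i,\ \forall s$$

Optimum recognition rate:

$$R_m\!\left(\frac{10}{n},\ T_{\max}\right)$$

We can then backtrack to find the optimum $s_1, s_2, \ldots, s_m$.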
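A minimal sketch of the recurrence and backtracking is given below. It is not the authors' implementation and rests on several assumptions that do not come from the slides: survival rates are restricted to powers of two ($x = 2^{-j}$) so that $s/x$ stays on a grid, delays are rounded up to integer time units, the base case is $R_0(1, t) = 1$ for $t > 0$ (before any filtering the correct song is certainly present), and the RS/TS functions in the example are invented.

```python
import math
from functools import lru_cache


def optimize(n, r, t, t_max, k_target):
    """Maximise r_1(s_1)...r_m(s_m) under the time budget t_max.

    n: database size; r, t: per-stage functions of the stage survival rate x.
    The cumulated survival rate is s = 2**-k (grid assumption of this sketch);
    k_target is chosen so that n * 2**-k_target fits the ranking list.
    Returns (best recognition rate, tuple of per-stage exponents j).
    """
    m = len(r)

    @lru_cache(maxsize=None)
    def best(i, k, budget):
        if budget <= 0:                    # boundary condition: no time left
            return 0.0, ()
        if i == 0:                         # assumed base case: nothing filtered yet
            return (1.0, ()) if k == 0 else (0.0, ())
        top = (0.0, ())
        for j in range(k + 1):             # stage i keeps a fraction x = 2**-j
            x = 2.0 ** -j
            n_in = n * 2.0 ** -(k - j)     # n_{i-1} = n * s / x
            delay = math.ceil(n_in * t[i - 1](x))  # d_i in integer time units
            prev, plan = best(i - 1, k - j, budget - delay)
            cand = r[i - 1](x) * prev
            if cand > top[0]:
                top = (cand, plan + (j,))  # remember the choice for backtracking
        return top

    return best(m, k_target, t_max)


# Hypothetical stages: stage 1 is fast but crude, stage 2 slow but accurate.
r_funcs = [lambda x: x ** 0.2, lambda x: x ** 0.05]
t_funcs = [lambda x: 1.0, lambda x: 10.0]
rate, plan = optimize(1024, r_funcs, t_funcs, t_max=4000, k_target=7)
print(rate, [2.0 ** -j for j in plan])
```

Storing the chosen exponent alongside each optimum value plays the role of the backtracking step: the returned plan lists the survival rate selected for each stage, in stage order.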

Page 17: Progressive Filtering and Its Application for Query-by-Singing/Humming

-17-

Five Stages for Our Study

We chose 5 stages for the DP-based design method:
  Range comparison
  Modified edit distance
  LS (linear scaling)
  DTW (dynamic time warping) with down-sampled inputs
  DTW
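To make the last two stages concrete, here is a textbook DTW distance between two pitch vectors. The slides do not specify the local path constraints, the distance measure, or the down-sampling factor, so the choices below (absolute pitch difference, unit steps, every second sample) are assumptions for illustration only.

```python
def dtw_distance(query, ref):
    """Textbook DTW between two pitch sequences (e.g. in semitones).

    Both sequences are aligned from their first to their last element.
    """
    inf = float("inf")
    n, m = len(query), len(ref)
    # cost[i][j]: best accumulated cost aligning query[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - ref[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query advances, ref repeats
                cost[i][j - 1],      # ref advances, query repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical pitch vectors; the second holds the first note longer.
query = [60, 62, 64]
ref = [60, 60, 62, 64]
print(dtw_distance(query, ref))

# A cheaper, coarser variant: compare down-sampled inputs.
print(dtw_distance(query[::2], ref[::2]))
```

Down-sampling shrinks the cost table roughly in proportion to the square of the factor, which is why it can serve as an earlier, faster stage ahead of full DTW.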

Page 18: Progressive Filtering and Its Application for Query-by-Singing/Humming

-18-

Corpora

QBSH corpus
  2797 8-second recordings (8 kHz, 8 bits) of 48 children's songs, by 118 subjects
  500 for the design set, the others for test

Song database
  13320 songs

Comparison mode
  Anchored beginning

Page 19: Progressive Filtering and Its Application for Query-by-Singing/Humming

-19-

RS curves

Page 20: Progressive Filtering and Its Application for Query-by-Singing/Humming

-20-

TS Curves

Page 21: Progressive Filtering and Its Application for Query-by-Singing/Humming

-21-

Optimum RR wrt Response Time

Page 22: Progressive Filtering and Its Application for Query-by-Singing/Humming

-22-

Survival Rates wrt Response Time

Page 23: Progressive Filtering and Its Application for Query-by-Singing/Humming

-23-

Conclusions & Future Work

Conclusions
  Advantages:
    A scalable meta-method
    Feasible for optimizing QBSH systems
    Applicable (?) to other multimedia retrieval systems
  Disadvantages:
    Derivation of RS and TS curves is time-consuming

Future work
  More effective/efficient method for each stage