Progressive Filtering and Its Application for Query-by-Singing/Humming
112/04/20 1
J.-S. Roger Jang (張智星 )
Multimedia Information Retrieval Lab
CS Dept., Tsing Hua Univ., Taiwan
http://www.cs.nthu.edu.tw/~jang
-2-
Recent Publications
Journals
  Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, 2008.
  J.-S. Roger Jang and Hong-Ru Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008.
Conferences
  Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang, "Minimum Phone Error Discriminative Training For Mandarin Chinese Speaker Adaptation", Proceedings of INTERSPEECH 2008, Brisbane, Australia, Sept. 2008.
  Chao-Ling Hsu, Jyh-Shing Roger Jang, and Te-Lu Tsai, "Separation of Singing Voice from Music Accompaniment with Unvoiced Sounds Reconstruction for Monaural Recordings", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.
  Zhi-Sheng Chen, Jia-Min Zen, Jyh-Shing Roger Jang, "Music Annotation and Retrieval System Using Anti-Models", Proceedings of 125th AES Convention, San Francisco, USA, Oct. 2008.
-3-
Outline
Problem definition of QBSH
Methods for QBSH
Progressive Filtering
Conclusions
-4-
Introduction to QBSH
QBSH: Query by Singing/Humming
  Input: Singing or humming from microphone
  Output: A ranking list retrieved from the song database
Overview
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX
-5-
Challenges in QBSH Systems
Reliable pitch tracking for acoustic input
  Input from mobile devices
  Input in noisy karaoke boxes
Song database preparation
  Audio music vs. MIDIs
Efficient/effective retrieval
  Karaoke machine: ~10,000 songs
  Internet music search engine: ~500,000,000 songs
-6-
-7-
Goal and Approach
Goal: To retrieve songs effectively within a given response time, say 5 seconds or so
Our strategy
  Multi-stage progressive filtering
  Data-driven design methodology based on dynamic programming (DP)
-8-
Approaches to QBSH
Pitch Tracking
Methods for QBSH
-9-
A Quick Demo of QBSH
Demo page of MIR lab: http://mirlab.org/mir_main/demo.htm
Demo of QBSH http://mirlab.org/Demo/MusicSearch/index.htm
-10-
Progressive Filtering
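Multi-stage representation
  Each stage is a method for QBSH
[Figure: cascade of stage 1, stage 2, ..., stage i. Stage i receives n_{i-1} songs, applies survival rate s_i, and outputs n_i songs. Stage delays: d_1 = n_0 t_1(s_1), d_2 = n_1 t_2(s_2), ..., d_i = n_{i-1} t_i(s_i).]
  s_i: survival rate for stage i
  d_i: delay for stage i
  n_{i-1}: no. of input songs to stage i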
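The bookkeeping of the cascade can be sketched in a few lines of Python. This is only an illustration: it assumes each stage keeps a fraction s_i of its inputs (so n_i = n_{i-1} s_i) and that the delay of a stage is its input count times the average one-to-one comparison time. The function name `stage_loads` and all numbers are hypothetical, not taken from the slides.

```python
def stage_loads(n0, survival_rates, comparison_times):
    """Return (inputs, delays, survivors) for a cascade of filtering stages.

    inputs[i]  : number of songs entering stage i+1, i.e. n_{i}
    delays[i]  : delay of stage i+1, i.e. d = n_{i} * t(s)
    survivors  : number of songs left after the last stage
    """
    inputs, delays = [], []
    n = float(n0)
    for s, t in zip(survival_rates, comparison_times):
        inputs.append(n)
        delays.append(n * t)  # delay = songs compared * time per comparison
        n *= s                # only a fraction s survives to the next stage
    return inputs, delays, n


# Hypothetical two-stage cascade over 10,000 songs:
# stage 1 keeps 10% at 1 ms per comparison, stage 2 keeps 1% at 50 ms.
inputs, delays, survivors = stage_loads(10000, [0.1, 0.01], [0.001, 0.05])
print(inputs, delays, survivors)
```

The cheap first stage touches every song, while the expensive second stage only sees the survivors; that is the point of ordering the stages this way.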
-11-
Stage Characteristics for Effectiveness
RS curve for stage i: recog. rate = r_i(s)
[Figure: RS curves. Horizontal axis: survival rate s (%); vertical axis: recog. rate (%). Curves shown: more effective method, less effective method, random guess, running from (0, 0) to (100, 100). Example point: at s = 10% the recog. rate is 65%, i.e., the top-10% recog. rate is 65%.]
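One plausible way to estimate such an RS curve from a design set is sketched below. It assumes that, for every design query, we already know the rank of the correct song under the method being characterized; the recognition rate at survival rate s is then the fraction of queries whose correct song falls within the top s of the database. The function name `rs_curve` and the sample ranks are hypothetical, and the slides do not state that the curves were derived exactly this way.

```python
def rs_curve(target_ranks, n_songs, survival_rates):
    """Estimate r(s) for each s: share of queries whose target survives.

    target_ranks   : 1-based rank of the correct song for each query
    n_songs        : size of the song database
    survival_rates : fractions in (0, 1] at which to evaluate the curve
    """
    curve = []
    for s in survival_rates:
        cutoff = s * n_songs  # number of songs that survive this stage
        hits = sum(1 for rank in target_ranks if rank <= cutoff)
        curve.append(hits / len(target_ranks))
    return curve


# Four hypothetical queries against a 1,000-song database.
print(rs_curve([1, 40, 90, 500], 1000, [0.1, 0.5, 1.0]))
```

By construction the estimate is non-decreasing in s and reaches 1 at s = 100%, matching the (100, 100) end point of the curves.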
-12-
Stage Characteristics for Efficiency
TS curve for stage i: average time = t_i(s)
[Figure: TS curves. Horizontal axis: survival rate s (%); vertical axis: average time (ms). Curves shown: less efficient method, more efficient method. Marked points: (0, 0) and (100, 0). Example point: when s = 10%, the average one-to-one comparison time is 5 ms.]
-13-
Formulation as an Optim. Problem
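Max:
  R(s_1, s_2, \dots, s_m) = r_1(s_1)\, r_2(s_2) \cdots r_m(s_m)
subject to the constraints
  n\,t_1(s_1) + n s_1\,t_2(s_2) + n s_1 s_2\,t_3(s_3) + \cdots + n s_1 s_2 \cdots s_{m-1}\,t_m(s_m) \le T_{\max}
  n s_1 s_2 s_3 \cdots s_m \le 10
where
  n (= n_0): size of the song database
  T_max: maximum allowable response time, say, 5 sec.
  10: the size of the retrieved ranking list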
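To make the optimization problem concrete, here is a brute-force sketch that tries every combination of survival rates on a small grid. It reads the objective as the product of the per-stage recognition rates r_i(s_i), the time constraint as the sum of stage delays (songs entering the stage times t_i(s_i)) bounded by T_max, and the size constraint as at most 10 surviving songs. The RS/TS curves below are toy functions invented for the example, and `brute_force_plan` is a hypothetical name.

```python
from itertools import product


def brute_force_plan(n, r_funcs, t_funcs, t_max, grid, list_size=10):
    """Exhaustively search survival rates on `grid`; return (best_rate, plan)."""
    best = (0.0, None)
    for rates in product(grid, repeat=len(r_funcs)):
        recog, elapsed, remaining = 1.0, 0.0, float(n)
        for s, r, t in zip(rates, r_funcs, t_funcs):
            recog *= r(s)                # overall rate is the product of r_i(s_i)
            elapsed += remaining * t(s)  # stage delay: inputs * time per comparison
            remaining *= s               # songs passed on to the next stage
        feasible = elapsed <= t_max and remaining <= list_size + 1e-9
        if feasible and recog > best[0]:
            best = (recog, rates)
    return best


# Toy curves (assumptions): stage 1 is cheap but weak, stage 2 costly but strong.
r_funcs = [lambda s: s ** 0.5, lambda s: s ** 0.1]
t_funcs = [lambda s: 0.0001 * (1 - s), lambda s: 0.005 * (1 - s)]
best_rate, best_plan = brute_force_plan(
    10000, r_funcs, t_funcs, t_max=5.0, grid=[0.001, 0.01, 0.1, 1.0])
print(best_rate, best_plan)
```

The number of combinations grows exponentially with the number of stages and the grid resolution, which is why a dynamic-programming formulation is attractive.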
-14-
DP-based Approach
The original optimization task can be cast as dynamic programming (DP):
  Optimum-value function R_i(s, t) is the optimum recog. rate at stage i, with a cumulated survival rate s and a cumulated computation time t.
  Recurrent formula for R_i(s, t) can be derived based on changing the survival rate x of stage i, as follows.
    R_i(s, t) = \max_{x,\; s \le x \le 1} \; r_i(x)\, R_{i-1}\!\left(\frac{s}{x},\; t - n\,\frac{s}{x}\,t_i(x)\right)
-15-
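Recurrent formula for R_i(s, t)
    R_i(s, t) = \max_{x,\; s \le x \le 1} \; r_i(x)\, R_{i-1}\!\left(\frac{s}{x},\; t - d_i\right), \quad d_i = n\,\frac{s}{x}\,t_i(x)
[Figure: cascade of stage 1, ..., stage i-1, stage i with song counts n_0, n_1, ..., n_{i-2}, n_{i-1}, n_i, survival rates s_1, ..., s_{i-1}, s_i = x, and delays d_1, ..., d_{i-1}, d_i. Stages 1 to i-1 account for R_{i-1}(s/x, t - d_i), stage i accounts for r_i(x), and the whole cascade yields R_i(s, t).]
d_i: delay of stage i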
-16-
DP-based Approach
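Boundary conditions for R_i(s, t):
    R_i(s, t) = 0, \text{ if } t \le 0,\; \forall i,\; \forall s
    R_i(s, t) = 0, \text{ if } s \le 0,\; \forall i,\; \forall t
Optim. recog. rate:
    R_m\!\left(\frac{10}{n},\; T_{\max}\right)
We can then backtrack to find the optimum s1, s2, …, sm.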
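The DP recursion can be sketched as a memoized function, under several simplifying assumptions that are not in the slides: survival rates are restricted to powers of a fixed `base` (so the cumulated rate s/x stays on the grid), time is quantized into ticks of length `tick`, stage 0 is a do-nothing stage with R_0 = 1 when nothing has been filtered yet, and a remaining budget of exactly zero is still treated as feasible. The name `optimal_plan` and the toy RS/TS curves are hypothetical. Each call returns the optimum rate together with the per-stage survival rates that achieve it, which plays the role of backtracking.

```python
import math
from functools import lru_cache


def optimal_plan(n, r_funcs, t_funcs, t_max, list_size=10, base=0.1, tick=0.01):
    """Return (optimum recog. rate, per-stage survival rates) via memoized DP."""
    # Smallest exponent j such that n * base**j <= list_size.
    j_target = math.ceil(math.log(list_size / n) / math.log(base) - 1e-9)

    @lru_cache(maxsize=None)
    def R(i, j, ticks):
        # State: i stages used, cumulated survival rate base**j, time budget in ticks.
        if ticks < 0:
            return 0.0, ()                        # over budget
        if i == 0:
            return (1.0, ()) if j == 0 else (0.0, ())
        best = (0.0, ())
        for k in range(j + 1):                    # stage i survival rate x = base**k
            x = base ** k
            cost = n * base ** (j - k) * t_funcs[i - 1](x)  # inputs * t_i(x)
            used = math.ceil(cost / tick - 1e-9)            # round cost up to ticks
            prev, plan = R(i - 1, j - k, ticks - used)
            value = r_funcs[i - 1](x) * prev
            if value > best[0]:
                best = (value, plan + (x,))
        return best

    return R(len(r_funcs), j_target, int(t_max / tick + 1e-9))


# Toy curves (assumptions): stage 1 is cheap but weak, stage 2 costly but strong.
r_funcs = [lambda s: s ** 0.5, lambda s: s ** 0.1]
t_funcs = [lambda s: 0.0001 * (1 - s), lambda s: 0.005 * (1 - s)]
rate, plan = optimal_plan(10000, r_funcs, t_funcs, t_max=5.0)
print(rate, plan)
```

Rounding each stage cost up to whole ticks keeps the time constraint conservative; a finer `tick` or a `base` closer to 1 gives a finer search at the price of more states.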
-17-
Five Stages for Our Study
We chose 5 stages for the DP-based design method:
  Range comparison
  Modified edit distance
  LS (linear scaling)
  DTW (dynamic time warping) with down-sampled inputs
  DTW
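For readers unfamiliar with the last two stages, a textbook DTW distance between two pitch sequences is sketched below. It is a generic formulation with absolute pitch difference as the local cost, not necessarily the variant, path constraints, or down-sampling factor used in this study; the function name `dtw_distance` and the semitone values are illustrative.

```python
def dtw_distance(query, reference):
    """Textbook dynamic time warping distance between two pitch sequences."""
    inf = float("inf")
    rows, cols = len(query), len(reference)
    # cost[i][j]: best accumulated cost aligning query[:i] with reference[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat reference frame
                                    cost[i][j - 1],      # repeat query frame
                                    cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]


# A query sung at a different tempo still matches the reference exactly.
print(dtw_distance([60, 62, 64], [60, 60, 62, 64, 64]))
# Down-sampling both inputs (here by 2) shrinks the cost table and thus the time.
print(dtw_distance([60, 60, 62, 62][::2], [60, 60, 60, 62, 62, 62][::2]))
```

Because the cost table has len(query) x len(reference) cells, halving both sequence lengths cuts the work to roughly a quarter, which is why a down-sampled DTW can serve as a cheaper stage ahead of the full DTW.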
-18-
Corpora
QBSH corpus
  2797 8-second recordings (8 kHz, 8 bits) of 48 children's songs, by 118 subjects
  500 for design set, the others for test
Song database
  13320 songs
Comparison mode
  Anchored beginning
-19-
RS curves
-20-
TS Curves
-21-
Optimum RR wrt Response Time
-22-
Survival Rates wrt Response Time
-23-
Conclusions & Future Work
Conclusions
  Advantages
    A scalable meta-method
    Feasible for optimizing QBSH systems
    Applicable (?) to other multimedia retrieval systems
  Disadvantages
    Derivation of RS and TS curves is time-consuming
Future work
  More effective/efficient method for each stage