Retrieval Methods for QBSH (Query By Singing/Humming)

J.-S. Roger Jang (張智星)

jang@mirlab.org

http://mirlab.org/jang

Multimedia Information Retrieval Lab

CSIE Dept, National Taiwan University

Retrieval Methods for QBSH

Goal
• Find the most similar melody in the database

Challenges
• Robust pitch tracking for various acoustic inputs
    • Input from mobile devices
    • Input at a noisy karaoke box
• Comparison methods should be able to deal with…
    • Key variations in users' input (for instance, due to gender difference)
    • Tempo variations in users' input
    • Reasonable response time, e.g., 5 seconds

Evaluation of QBSH Methods

Two categories for evaluating QBSH methods
• Efficiency: How fast is the system?
    • Can it deal with a music database of size 100K?
• Effectiveness: How accurate is the system?
    • Top-10 recognition rate for n queries: (1+0+0+1+1…)/n
    • Top-10 mean reciprocal rank for n queries: (1/3+1/inf+1/4+1/2+1/5…)/n
    • True positive and true negative rates to deal with the out-of-vocabulary (OOV) problem
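
The two effectiveness measures above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes each query is summarized by the rank of its correct song, with `math.inf` standing for a song that was not retrieved (matching the `1/inf` term on the slide).

```python
import math

def top_k_recognition_rate(ranks, k=10):
    """Fraction of queries whose correct song is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def top_k_mrr(ranks, k=10):
    """Mean reciprocal rank; a query missed within the top k contributes 0."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

# Ranks of the correct song for five hypothetical queries.
ranks = [3, math.inf, 4, 2, 5]
print(top_k_recognition_rate(ranks))  # 4 of the 5 queries are hits
print(top_k_mrr(ranks))               # (1/3 + 0 + 1/4 + 1/2 + 1/5) / 5
```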

Types of QBSH Approaches

Categories of approaches to QBSH
• Histogram/statistics-based
• Note vs. note
    • Edit distance
• Frame vs. note
    • HMM
• Frame vs. frame
    • Linear scaling, DTW, recursive alignment

Linear Scaling (LS)

Concept
• Scale the query linearly to match the candidates

Assumption
• Uniform tempo variation

Rest handling
• Cut leading and trailing zeros (silence)
• All the other zeros (rests) are replaced with the previous non-zero pitch
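
A minimal sketch of the rest-handling rule, assuming the pitch vector is a Python list in which 0 marks silence or a rest. The function name `handle_rests` is hypothetical.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill inner zeros with the previous pitch."""
    start = 0
    while start < len(pitch) and pitch[start] == 0:
        start += 1                      # skip leading silence
    end = len(pitch)
    while end > start and pitch[end - 1] == 0:
        end -= 1                        # skip trailing silence
    filled = []
    for p in pitch[start:end]:
        # After trimming, the first element is non-zero, so filled[-1] exists
        # whenever a rest (0) is encountered.
        filled.append(p if p != 0 else filled[-1])
    return filled

print(handle_rests([0, 0, 60, 60, 0, 62, 0, 0, 64, 0]))
# [60, 60, 60, 62, 62, 62, 64]
```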

Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: the original input pitch is compressed by factors of 0.5 and 0.75 and stretched by factors of 1.25 and 1.5. Each version is compared with the target pitch in the database, and the best match is selected.]

Strength and Weakness of LS

Strength
• One-shot for dealing with key transposition
• Efficient and effective
• Indexing methods available

Weakness
• Cannot deal with non-uniform tempo variations

[Figure: typical mapping path]

Shorten or Lengthen a Pitch Vector

Given a pitch vector x of length m, how can it be shortened or lengthened to length n?

    x2 = interp1(1:m, x, linspace(1, m, n));

[Figures: examples with m=7, n=13 and with m=7, n=9]
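
For readers without MATLAB, the same linear interpolation can be sketched in plain Python. This is an illustrative equivalent, not part of the original deck: the function name `resample` is hypothetical, and the sketch assumes m >= 2 and n >= 2.

```python
def resample(x, n):
    """Linearly interpolate list x (length m >= 2) to n >= 2 evenly spaced points."""
    m = len(x)
    if m < 2 or n < 2:
        raise ValueError("both the input and output lengths must be at least 2")
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)     # 0-based position, like linspace(1, m, n) - 1
        i = min(int(pos), m - 2)        # left neighbor, clamped so i + 1 is valid
        frac = pos - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out

x = [60, 62, 64, 65, 67, 69, 71]        # m = 7
print(len(resample(x, 13)), len(resample(x, 9)))  # 13 9
```

The endpoints are preserved, so the first and last pitch of the resampled vector equal those of the input.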

Distance Function for LS

Commonly used distance function for LS
• Normalized Lp-norm

Characteristics
• Usually p=1 or 2 for LS
• Normalization to get rid of length variations

$$L_p(\mathbf{x}) = \left( \frac{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}{n} \right)^{1/p}$$
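
A short sketch of the normalized Lp-norm used as a distance, assuming it is applied to the element-wise difference of two equal-length pitch vectors. The function name `normalized_lp_distance` is hypothetical.

```python
def normalized_lp_distance(x, y, p=1):
    """Normalized Lp-norm of the element-wise difference x - y."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    # Dividing by n before taking the p-th root removes the dependence on length.
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)

print(normalized_lp_distance([1, 2, 3], [2, 4, 6], p=1))  # (1 + 2 + 3) / 3 = 2.0
```

Because of the division by n, two vectors that differ by a constant 3 semitones are at distance 3 regardless of how many frames they contain.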

Key Transposition in LS

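How to find the best transposed query, i.e., the one with the smallest distance from the database item (here q is the query pitch vector, r is the database item, and s is a scalar key shift):

Best transposition

$$\hat{s} = \arg\min_s L_p(\mathbf{q} - s - \mathbf{r})$$

In practice…

$$p = 1: \quad \hat{s} = \mathrm{median}(\mathbf{q} - \mathbf{r}), \text{ which in general differs from } \mathrm{median}(\mathbf{q}) - \mathrm{median}(\mathbf{r})$$

$$p = 2: \quad \hat{s} = \mathrm{mean}(\mathbf{q} - \mathbf{r}) = \mathrm{mean}(\mathbf{q}) - \mathrm{mean}(\mathbf{r})$$

[Figure: query, database item, and transposed query]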
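
The closed-form key shifts can be sketched as follows, assuming the query has already been scaled to the same length as the database item and that the transposed query is obtained by subtracting the shift. The function name `best_key_shift` is hypothetical.

```python
from statistics import mean, median

def best_key_shift(query, ref, p=1):
    """Shift s minimizing the Lp distance between (query - s) and ref, for p in {1, 2}."""
    if len(query) != len(ref) or not query:
        raise ValueError("query and ref must be non-empty and of equal length")
    diff = [q - r for q, r in zip(query, ref)]
    if p == 1:
        return median(diff)   # the median minimizes the sum of absolute deviations
    if p == 2:
        return mean(diff)     # the mean minimizes the sum of squared deviations
    raise ValueError("only p=1 and p=2 have the closed forms used here")

query = [62, 64, 66]
ref = [60, 62, 64]
s = best_key_shift(query, ref, p=1)
print(s, [q - s for q in query])  # 2 [60, 62, 64]
```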

Example of Linear Scaling via L1 Norm

linScaling01.m

[Figure with three panels: (1) database and input pitch vectors, in semitones; (2) database and scaled pitch vectors, in semitones; (3) normalized distance versus scaling factor, for scaling factors from 0.5 to 1.5.]
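
The search illustrated on this slide can be sketched end to end in Python. This is an independent sketch, not the contents of linScaling01.m, and it makes several assumptions: the query is matched against the beginning of the database item, the key shift is the median (p=1) or mean (p=2) of the difference, and the candidate scaling factors are passed in by the caller. All function names are hypothetical.

```python
from statistics import mean, median

def resample(x, n):
    """Linearly interpolate list x (length >= 2) to n >= 2 points."""
    m = len(x)
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)
        i = min(int(pos), m - 2)
        frac = pos - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out

def linear_scaling(query, target, factors, p=1):
    """Return (best distance, best factor) over the candidate scaling factors."""
    best_dist, best_factor = float("inf"), None
    for f in factors:
        n = round(f * len(query))
        if n < 2 or n > len(target):
            continue                      # scaled query would not fit the target
        scaled = resample(query, n)
        # Assumption: compare against the first n frames of the database item.
        diff = [s - r for s, r in zip(scaled, target[:n])]
        shift = median(diff) if p == 1 else mean(diff)   # key transposition
        dist = (sum(abs(d - shift) ** p for d in diff) / n) ** (1.0 / p)
        if dist < best_dist:
            best_dist, best_factor = dist, f
    return best_dist, best_factor

# Toy data: the query is the first 10 target frames sung twice as slowly, 5 semitones higher.
target = [60 + i for i in range(30)]
query = [v + 5 for v in resample(target[:10], 20)]
print(linear_scaling(query, target, (0.5, 0.75, 1.0, 1.25, 1.5), p=1))
```

Compressing the query by 0.5 undoes the slow singing, and the key shift removes the 5-semitone offset, so that factor yields a distance of essentially zero.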

Linear Scaling via L1 and L2 Norm

linScaling02.m

[Figure with three panels: (1) database and input pitch vectors, in semitones; (2) database pitch together with the pitch scaled via the L1 norm and via the L2 norm, in semitones; (3) normalized distances via the L1 and L2 norms versus scaling factor, for scaling factors from 0.5 to 1.5.]

DTW (Dynamic Time Warping)

About DTW
• DTW introduction
• DTW for QBSH
• #1 method for task 2 in QBSH/MIREX 2006
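
The deck does not spell out the DTW recurrence, so the following is only a textbook sketch for orientation. It assumes an absolute-difference local cost, the three basic steps (insertion, deletion, match), and paths anchored at both ends; the variant used for QBSH may use different local constraints and end conditions. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Textbook DTW distance between two non-empty pitch sequences."""
    m, n = len(x), len(y)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # D[i][j] = cost of the best path aligning x[:i] with y[:j].
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]

# A held note (non-uniform tempo) is absorbed by the warping path.
print(dtw_distance([60, 62, 64], [60, 62, 62, 64]))  # 0.0
```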

RA (Recursive Alignment)

Characteristics
• Combine characteristics of LS & DTW
• #1 method for task 1 in QBSH/MIREX 2006

[Figure: a typical mapping path]

Modified Edit Distance

Note segmentation

Modified edit distance

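$$
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \varnothing) & \text{(deletion)} \\
d_{i,j-1} + w(\varnothing, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
\{\, d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\ 2 \le k \le i \,\} & \text{(consolidation)} \\
\{\, d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\ 2 \le k \le j \,\} & \text{(fragmentation)}
\end{cases}
$$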
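
The recurrence can be turned into a small dynamic program. The sketch below is illustrative only: the slide does not define the weight function w or the boundary conditions, so `toy_weight` (notes as `(pitch, duration)` pairs, cost based on pitch difference and duration mismatch) and the initialization of the first row and column by pure insertions and deletions are assumptions made here, and all names are hypothetical.

```python
def toy_weight(xs, ys):
    """Hypothetical weight: xs and ys are tuples of (pitch, duration) notes."""
    if not xs:                                   # insertion
        return sum(dur for _, dur in ys)
    if not ys:                                   # deletion
        return sum(dur for _, dur in xs)
    # One side always holds a single note; the other holds one or more.
    many, one = (xs, ys[0]) if len(xs) >= len(ys) else (ys, xs[0])
    pitch_cost = sum(abs(p - one[0]) * dur for p, dur in many)
    duration_cost = abs(sum(dur for _, dur in many) - one[1])
    return pitch_cost + duration_cost

def modified_edit_distance(a, b, w=toy_weight):
    """Edit distance with consolidation and fragmentation between note lists a and b."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                    # assumed boundary: delete all of a
        d[i][0] = d[i - 1][0] + w((a[i - 1],), ())
    for j in range(1, n + 1):                    # assumed boundary: insert all of b
        d[0][j] = d[0][j - 1] + w((), (b[j - 1],))
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                d[i - 1][j] + w((a[i - 1],), ()),                # deletion
                d[i][j - 1] + w((), (b[j - 1],)),                # insertion
                d[i - 1][j - 1] + w((a[i - 1],), (b[j - 1],)),   # replacement
            ]
            for k in range(2, i + 1):                            # consolidation
                candidates.append(d[i - k][j - 1] + w(tuple(a[i - k:i]), (b[j - 1],)))
            for k in range(2, j + 1):                            # fragmentation
                candidates.append(d[i - 1][j - k] + w((a[i - 1],), tuple(b[j - k:j])))
            d[i][j] = min(candidates)
    return d[m][n]

# One long note in a is sung as two shorter notes of the same pitch in b.
a = [(60, 2), (62, 1)]
b = [(60, 1), (60, 1), (62, 1)]
print(modified_edit_distance(a, b))  # 0.0
```

With plain edit distance the split note would cost an insertion; the fragmentation term lets it align at no cost under this toy weight.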