Retrieval Methods for QBSH (Query By Singing/Humming) J.-S. Roger Jang ( 張智星 ) jang@mirlab.org...

Post on 17-Dec-2015

Retrieval Methods for QBSH (Query By Singing/Humming)

J.-S. Roger Jang (張智星)

jang@mirlab.org

http://mirlab.org/jang

Multimedia Information Retrieval Lab

CSIE Dept, National Taiwan University

Retrieval Methods for QBSH

Goal: Find the most similar melody in the database

Challenges:

• Robust pitch tracking for various acoustic inputs
  • Input from mobile devices
  • Input at a noisy karaoke box
• Comparison methods should be able to deal with…
  • Key variations in users' input (for instance, due to gender difference)
  • Tempo variations in users' input
  • Reasonable response time, e.g., 5 seconds

Evaluation of QBSH Methods

Two categories for evaluating QBSH methods:

• Efficiency: How fast is the system?
  • Can it deal with a music database of size 100K?
• Effectiveness: How accurate is the system?
  • Top-10 recognition rate for n queries: (1+0+0+1+1+…)/n
  • Top-10 mean reciprocal rank for n queries: (1/3+1/inf+1/4+1/2+1/5+…)/n
  • True positives and true negatives to deal with the out-of-vocabulary (OOV) problem
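The two effectiveness measures above can be sketched in a few lines of Python. This is only an illustration: the function names and the example ranks are hypothetical, and it assumes that a correct song ranked beyond the top 10 (or not retrieved at all) contributes 0, matching the 1/inf term in the example.

```python
import math


def top_n_recognition_rate(ranks, n=10):
    """Fraction of queries whose correct song is ranked within the top n."""
    hits = [1 if r <= n else 0 for r in ranks]
    return sum(hits) / len(ranks)


def top_n_mean_reciprocal_rank(ranks, n=10):
    """Mean of 1/rank; a rank beyond n counts as 1/inf = 0 (assumption)."""
    reciprocal = [1.0 / r if r <= n else 0.0 for r in ranks]
    return sum(reciprocal) / len(ranks)


# Hypothetical ranks of the correct song for five queries;
# math.inf marks a query whose target was not retrieved.
example_ranks = [3, math.inf, 4, 2, 5]
rate = top_n_recognition_rate(example_ranks)
mrr = top_n_mean_reciprocal_rank(example_ranks)
```

With these ranks, four of the five queries count as hits, while the reciprocal-rank measure additionally rewards answers that appear near the top of the list.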

Types of QBSH Approaches

Categories of approaches to QBSH:

• Histogram/statistics-based
• Note vs. note: edit distance
• Frame vs. note: HMM
• Frame vs. frame: linear scaling, DTW, recursive alignment

Linear Scaling (LS)

Concept: Scale the query linearly to match the candidates

Assumption: Uniform tempo variation

Rest handling:

• Cut leading and trailing zeros (silence)
• All the other zeros (rests) are replaced with the previous non-zero pitch
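The rest-handling rule can be sketched as follows, assuming the pitch vector is a plain list in which 0 marks silence or a rest. The function name `handle_rests` is hypothetical.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill inner zeros with the previous non-zero pitch."""
    nonzero = [i for i, p in enumerate(pitch) if p != 0]
    if not nonzero:
        return []  # the whole vector is silence
    trimmed = pitch[nonzero[0]:nonzero[-1] + 1]
    filled = []
    for p in trimmed:
        # trimmed[0] is non-zero, so filled[-1] always exists when p == 0
        filled.append(p if p != 0 else filled[-1])
    return filled


cleaned = handle_rests([0, 0, 60, 0, 62, 62, 0, 0, 64, 0])
```

Here the two leading zeros and the trailing zero are dropped, and each inner rest inherits the pitch of the note before it.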

Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: the original input pitch is stretched by 1.25 and 1.5 and compressed by 0.75 and 0.5; each version is compared with the target pitch in the database to find the best match.]

Strength and Weakness of LS

Strength:

• One-shot for dealing with key transposition
• Efficient and effective
• Indexing methods available

Weakness:

• Cannot deal with non-uniform tempo variations

[Figure: typical mapping path]

Shorten or Lengthen a Pitch Vector

Given a pitch vector x of length m, how to shorten or lengthen it to length n?

    x2=interp1(1:m, x, linspace(1, m, n));

Examples: m=7, n=13 and m=7, n=9
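For readers without MATLAB, the same resampling can be sketched in plain Python. This assumes the default linear interpolation of interp1; the function name `resample_linear` is hypothetical, and the sketch requires at least two input and two output points.

```python
def resample_linear(x, n):
    """Linearly interpolate the sequence x at n evenly spaced positions."""
    m = len(x)
    if m < 2 or n < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)   # 0-based position inside x
        lo = min(int(pos), m - 2)     # index of the left neighbour
        frac = pos - lo               # distance from the left neighbour
        out.append(x[lo] + frac * (x[lo + 1] - x[lo]))
    return out


x = [0, 1, 2, 3, 4, 5, 6]            # m = 7
longer = resample_linear(x, 13)      # lengthened to n = 13
shorter = resample_linear(x, 9)      # lengthened to n = 9
```

Both outputs keep the first and last values of x and only change how densely the segment between them is sampled.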

Distance Function for LS

Commonly used distance function for LS: normalized Lp-norm

$$L_p(\mathbf{x}) = \left( \frac{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}{n} \right)^{1/p}$$

Characteristics:

• Usually p=1 or 2 for LS
• Normalization to get rid of length variations
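A minimal sketch of the normalized Lp-norm, assuming the division by the length n is applied inside the p-th root (so the value is the p-th root of the mean p-th power). The function names are hypothetical.

```python
def normalized_lp_norm(x, p=1):
    """p-th root of the mean of |x_i|^p (assumed form of the normalization)."""
    n = len(x)
    return (sum(abs(v) ** p for v in x) / n) ** (1.0 / p)


def normalized_lp_distance(a, b, p=1):
    """Normalized Lp-norm of the element-wise difference of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return normalized_lp_norm([u - v for u, v in zip(a, b)], p)


d1 = normalized_lp_distance([63, 60], [60, 64], p=1)
d2 = normalized_lp_distance([63, 60], [60, 64], p=2)
```

Because of the normalization, a constant error of one semitone gives the same distance whether the vectors have 3 frames or 300, which is what makes distances at different scaling factors comparable.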

Key Transposition in LS

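How to find the best transposed query that has the smallest distance from the database items?

Best transposition, where q is the query, r is the database item, and q − ŝ is the transposed query:

$$\hat{s} = \arg\min_s L_p(\mathbf{q} - s - \mathbf{r})$$

In practice…

$$p = 1: \quad \hat{s} = \mathrm{median}(\mathbf{q} - \mathbf{r}) \approx \mathrm{median}(\mathbf{q}) - \mathrm{median}(\mathbf{r})$$

$$p = 2: \quad \hat{s} = \mathrm{mean}(\mathbf{q} - \mathbf{r}) = \mathrm{mean}(\mathbf{q}) - \mathrm{mean}(\mathbf{r})$$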
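The closed-form shifts can be sketched as below, assuming the transposed query is the query minus a scalar shift s and that query and database item already have the same length. The names `best_shift` and `transpose` are hypothetical.

```python
from statistics import mean, median


def best_shift(query, reference, p=1):
    """Scalar shift s that minimises the Lp norm of (query - s - reference)."""
    diff = [q - r for q, r in zip(query, reference)]
    if p == 1:
        return median(diff)   # the median minimises the sum of absolute deviations
    if p == 2:
        return mean(diff)     # the mean minimises the sum of squared deviations
    raise ValueError("only p = 1 and p = 2 have these closed forms")


def transpose(query, reference, p=1):
    """Query shifted by the best scalar offset."""
    s = best_shift(query, reference, p)
    return [q - s for q in query]


reference = [60, 62, 64, 62]
query = [65, 67, 69, 67]      # same melody sung 5 semitones higher
shifted = transpose(query, reference, p=1)
```

This is why key transposition is "one-shot" in linear scaling: no search over candidate keys is needed once the two vectors are aligned frame by frame.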

Example of Linear Scaling via L1 Norm

linScaling01.m

[Figure, three panels: (1) database and input pitch vectors, in semitones; (2) database and scaled pitch vectors, in semitones; (3) normalized distance versus scaling factor, for scaling factors from 0.5 to 1.5.]
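The script linScaling01.m is not reproduced here. The following is a rough Python sketch of the same idea, not a port of that script: it tries several scaling factors, compares each scaled query with the beginning of the database pitch vector, removes the key difference with a median shift, and keeps the factor with the smallest normalized L1 distance. Anchoring the match at the beginning of the database item, the function names, and the toy data are all assumptions.

```python
from statistics import median


def resample_linear(x, n):
    """Linearly interpolate x at n evenly spaced positions (m, n >= 2)."""
    m = len(x)
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)
        lo = min(int(pos), m - 2)
        out.append(x[lo] + (pos - lo) * (x[lo + 1] - x[lo]))
    return out


def linear_scaling(query, target, factors):
    """Return (distance, factor, scaled_and_shifted_query) of the best match."""
    best = None
    for f in factors:
        n = round(f * len(query))
        if n < 2 or n > len(target):
            continue                      # scaled query must fit inside the target
        scaled = resample_linear(query, n)
        diff = [s - r for s, r in zip(scaled, target[:n])]
        shift = median(diff)              # best key shift for the L1 norm
        dist = sum(abs(d - shift) for d in diff) / n
        if best is None or dist < best[0]:
            best = (dist, f, [s - shift for s in scaled])
    return best


# Toy data: the query is the first 10 target frames, sung at half speed
# and 5 semitones higher.
target = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67, 69, 69, 71, 71, 72, 72]
query = [65] * 4 + [67] * 4 + [69] * 4 + [70] * 4 + [72] * 4
best_dist, best_factor, best_query = linear_scaling(
    query, target, [0.5, 0.75, 1.0, 1.25, 1.5])
```

In this toy case compressing the query by 0.5 recovers the database melody, so the distance curve has its minimum at that factor.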

Linear Scaling via L1 and L2 Norm

linScaling02.m

[Figure, three panels: (1) database and input pitch vectors, in semitones; (2) database pitch with the pitch scaled via L1 norm and via L2 norm, in semitones; (3) normalized distances via L1 and L2 norm versus scaling factor, for scaling factors from 0.5 to 1.5.]

DTW (Dynamic Time Warping)

About DTW:

• DTW introduction
• DTW for QBSH
• #1 method for task 2 in QBSH/MIREX 2006
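As a reminder of the basic technique, here is a minimal textbook DTW between two pitch vectors. It is a generic sketch, not the MIREX 2006 variant mentioned above: it assumes both ends of the two sequences are aligned, uses the three standard local steps, and takes the absolute pitch difference as the local cost. The function name is hypothetical.

```python
def dtw_distance(query, reference):
    """Accumulated cost of the best warping path between two pitch vectors."""
    inf = float("inf")
    m, n = len(query), len(reference)
    # table[i][j] = best cost of aligning query[:i] with reference[:j]
    table = [[inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            table[i][j] = cost + min(table[i - 1][j],      # query frame repeated on the path
                                     table[i][j - 1],      # reference frame repeated on the path
                                     table[i - 1][j - 1])  # both advance
    return table[m][n]


# Non-uniform tempo: the first note is held longer in the query,
# the second note is held longer in the reference.
d = dtw_distance([60, 60, 60, 62, 64], [60, 62, 62, 62, 64])
```

Unlike linear scaling, the warping path can bend, so this kind of non-uniform tempo variation costs nothing.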

RA (Recursive Alignment)

Characteristics:

• Combines characteristics of LS and DTW
• #1 method for task 1 in QBSH/MIREX 2006

[Figure: a typical mapping path]

Modified Edit Distance

Note segmentation

Modified edit distance

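$$
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \emptyset) & \text{(deletion)} \\
d_{i,j-1} + w(\emptyset, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
\{ d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\ 2 \le k \le i \} & \text{(consolidation)} \\
\{ d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\ 2 \le k \le j \} & \text{(fragmentation)}
\end{cases}
$$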
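The recurrence can be sketched with dynamic programming as below. The slides do not define the weight function w, so the weights here are purely hypothetical: notes are (pitch, duration) pairs, deleting or inserting a note costs its duration, and matching notes costs the pitch difference plus the duration difference. The cap `max_k` on the number of merged notes is also an assumption, added to keep the loops short.

```python
def replace_cost(a, b):
    """Hypothetical w(a_i, b_j): pitch difference plus duration difference."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def merge_cost(group, single):
    """Hypothetical weight for matching several notes against one note."""
    pitch = sum(abs(p - single[0]) for p, _ in group)
    duration = abs(sum(d for _, d in group) - single[1])
    return pitch + duration


def modified_edit_distance(a, b, max_k=3):
    """Edit distance with consolidation and fragmentation over note sequences."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + a[i - 1][1]          # delete all of a[:i]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + b[j - 1][1]          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                d[i - 1][j] + a[i - 1][1],                           # deletion
                d[i][j - 1] + b[j - 1][1],                           # insertion
                d[i - 1][j - 1] + replace_cost(a[i - 1], b[j - 1]),  # replacement
            ]
            for k in range(2, min(i, max_k) + 1):                    # consolidation
                candidates.append(d[i - k][j - 1] + merge_cost(a[i - k:i], b[j - 1]))
            for k in range(2, min(j, max_k) + 1):                    # fragmentation
                candidates.append(d[i - 1][j - k] + merge_cost(b[j - k:j], a[i - 1]))
            d[i][j] = min(candidates)
    return d[m][n]


sung = [(60, 2), (62, 1), (62, 1), (64, 2)]   # middle note split in two
score = [(60, 2), (62, 2), (64, 2)]
dist = modified_edit_distance(sung, score)
```

With these weights, a note that the segmentation step splits into two shorter notes of the same pitch is absorbed by consolidation at no cost, whereas a plain edit distance would charge an extra deletion.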