DTW for QBSH
J.-S. Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Dynamic Time Warping (DTW)
Goal: Allow comparison with high tolerance to tempo variation
Characteristics:
- Robust to irregular tempo variations
- Key transposition handled by trial and error
- Computationally expensive
- Does not satisfy the triangle inequality
- Some indexing algorithms do exist
Type-1 DTW
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 27-45-63 degrees

3-step formula for type-1 DTW (with anchored beginning):

1. $D(i,j)$: DTW distance between $t(1:i)$ and $r(1:j)$
2. Recurrent formula for $D(i,j)$:

$$D(i,j) = |t(i)-r(j)| + \min\{D(i-1,j-2),\ D(i-1,j-1),\ D(i-2,j-1)\}, \qquad D(1,1) = |t(1)-r(1)|$$

3. Answer: $\min_j D(m,j)$

[Figure: DTW table over i (index into t) and j (index into r), showing the three local paths into D(i, j)]
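The three-step formula for type-1 DTW can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function name `dtw_type1` is hypothetical, indices are 0-based, m is taken to be the number of frames in t, and cells that no local path can reach are treated as infinite.

```python
import math


def dtw_type1(t, r):
    """Type-1 DTW with anchored beginning and free end.

    D[i][j] here corresponds to D(i+1, j+1) in the slide notation.
    Returns the minimum of the last row, i.e. min over j of D(m, j).
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = abs(t[0] - r[0])  # anchored beginning
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]  # one step in t, one step in r
            if j >= 2:
                best = min(best, D[i - 1][j - 2])  # one step in t, two in r
            if i >= 2:
                best = min(best, D[i - 2][j - 1])  # two steps in t, one in r
            D[i][j] = abs(t[i] - r[j]) + best
    return min(D[m - 1])  # free end: best over all j


print(dtw_type1([60, 62, 64], [60, 62, 64, 65]))
```

Because the end position is free, the reference may be longer than the part that the query actually matches.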
Type-2 DTW
t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 0-45-90 degrees

3-step formula for type-2 DTW (with anchored beginning):

1. $D(i,j)$: DTW distance between $t(1:i)$ and $r(1:j)$
2. Recurrent formula for $D(i,j)$:

$$D(i,j) = |t(i)-r(j)| + \min\{D(i,j-1),\ D(i-1,j-1),\ D(i-1,j)\}, \qquad D(1,1) = |t(1)-r(1)|$$

3. Answer: $\min_j D(m,j)$

[Figure: DTW table over i (index into t) and j (index into r), showing the three local paths into D(i, j)]
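A corresponding sketch for type-2 DTW is given below. As before, this is an illustration under stated assumptions: `dtw_type2` is a hypothetical name, indices are 0-based, and m is taken to be the number of frames in t.

```python
import math


def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 degree local paths), anchored beginning, free end."""
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = abs(t[0] - r[0])  # anchored beginning
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if j >= 1:
                best = min(best, D[i][j - 1])      # D(i, j-1)
            if i >= 1 and j >= 1:
                best = min(best, D[i - 1][j - 1])  # D(i-1, j-1)
            if i >= 1:
                best = min(best, D[i - 1][j])      # D(i-1, j)
            D[i][j] = abs(t[i] - r[j]) + best
    return min(D[m - 1])  # free end: best over all j


print(dtw_type2([1, 2, 2, 3], [1, 2, 3, 9]))
```

Unlike type 1, the 0- and 90-degree paths let one frame map to arbitrarily many frames of the other sequence, so the query can stretch or compress without a slope limit.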
Local Path Constraints
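Type 1: 27-45-63 local paths

$$D(i,j) = |t(i)-r(j)| + \min\{D(i-1,j-2),\ D(i-1,j-1),\ D(i-2,j-1)\}$$

Type 2: 0-45-90 local paths

$$D(i,j) = |t(i)-r(j)| + \min\{D(i,j-1),\ D(i-1,j-1),\ D(i-1,j)\}$$

[Figure: predecessor cells of D(i, j) for each type: D(i-1, j-2), D(i-1, j-1), D(i-2, j-1) for type 1; D(i, j-1), D(i-1, j-1), D(i-1, j) for type 2]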
Path Penalty
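Goal: Avoid paths that deviate from 45 degrees
Path penalty:
- Small or no penalty for the 45-degree path
- Large penalty for paths that deviate from 45 degrees

$$D(i,j) = |t(i)-r(j)| + \min\{D(i-1,j-2)+\eta,\ D(i-1,j-1),\ D(i-2,j-1)+\eta\}$$

Writing the penalty as $\eta \ge 0$: it is added on the two off-diagonal local paths, while the 45-degree path from $D(i-1,j-1)$ carries penalty 0.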
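One way to read the path penalty is sketched below for type-1 DTW. The parameter name `eta`, its default value, and the choice to apply the same penalty to both off-diagonal paths are assumptions made for illustration; the slides only state that the 45-degree path gets little or no penalty.

```python
import math


def dtw_type1_penalty(t, r, eta=1.0):
    """Type-1 DTW where off-diagonal local paths pay an extra cost eta."""
    m, n = len(t), len(r)
    # (step in i, step in j, penalty) for each allowed local path
    steps = [(1, 1, 0.0), (1, 2, eta), (2, 1, eta)]
    D = [[math.inf] * n for _ in range(m)]
    D[0][0] = abs(t[0] - r[0])  # anchored beginning
    for i in range(1, m):
        for j in range(1, n):
            best = math.inf
            for di, dj, pen in steps:
                if i - di >= 0 and j - dj >= 0:
                    best = min(best, D[i - di][j - dj] + pen)
            D[i][j] = abs(t[i] - r[j]) + best
    return min(D[m - 1])  # free end


# With eta = 0 this reduces to plain type-1 DTW.
print(dtw_type1_penalty([1, 2, 3], [1, 3], eta=0))
print(dtw_type1_penalty([1, 2, 3], [1, 3], eta=5))
```

In the second call the only path that reaches the last row uses an off-diagonal step, so the penalty shows up directly in the returned distance.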
DTW Paths of “Anchored Beginning”
Anchored beginning: the end position is free to move.
Assumption: The speed of a user’s acoustic input falls between 1/2 and 2 times that of the intended song.
DTW table size for an 8-sec query = 250 x 375, where 250 = 31.25*8 and 375 = 250*1.5
[Figure: DTW paths in the i-j table with anchored beginning]
DTW Paths of “Anchored Anywhere”
Anchored anywhere: both ends are free to move.
DTW table size for an 8-sec query against a 3-min song = 250 x 5625, where 250 = 31.25*8 and 5625 = 31.25*180
[Figure: DTW paths in the i-j table with both ends free]
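The difference between the two anchoring modes shows up only in how the first row of the table is initialized. The sketch below assumes type-1 local paths and a hypothetical function name `dtw_anchored_anywhere`; every cell in the first row is allowed as a starting point, and the end is free as before.

```python
import math


def dtw_anchored_anywhere(t, r):
    """Type-1 DTW where the query t may start and end anywhere in r.

    Returns (distance, end_index), with end_index a 0-based position in r.
    """
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    for j in range(n):
        D[0][j] = abs(t[0] - r[j])  # free start: any j can begin a path
    for i in range(1, m):
        for j in range(1, n):
            best = D[i - 1][j - 1]
            if j >= 2:
                best = min(best, D[i - 1][j - 2])
            if i >= 2:
                best = min(best, D[i - 2][j - 1])
            D[i][j] = abs(t[i] - r[j]) + best
    end = min(range(n), key=lambda j: D[m - 1][j])  # free end
    return D[m - 1][end], end


# The query matches r[2:5] exactly, so the distance is 0 and the path ends at index 4.
print(dtw_anchored_anywhere([5, 6, 7], [1, 2, 5, 6, 7, 3]))
```

The cost of this flexibility is the table size: every frame of the song becomes a column, instead of only the opening segment.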
[Figure: worked numeric example of a DTW table]
[Figure: worked numeric example of a DTW table, continued]
Implementation Issues
To save memory:
- Use a 2-column table for type-1 DTW
- Use a 1-column table for type-2 DTW
To avoid too many if-then statements:
- Pad type-1 DTW with two-layer padding
- Pad type-2 DTW with one-layer padding
To find a suitable path:
- Minimize total distance
- Minimize average distance
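The memory and padding ideas can be combined for type-2 DTW, as in the sketch below. It is one possible arrangement, not necessarily the one used by the author: the name `dtw_type2_rolling` is hypothetical, a single column is updated in place, and one padding cell fixed at infinity removes the boundary checks from the inner loop. Both inputs are assumed to be non-empty.

```python
import math


def dtw_type2_rolling(t, r):
    """Type-2 DTW (anchored beginning, free end) using one padded column."""
    n = len(r)
    # col[j + 1] holds D(i, j); col[0] is a padding cell that stays infinite.
    col = [math.inf] * (n + 1)
    for i, ti in enumerate(t):
        # A virtual zero-cost cell before the first frame pair anchors the start.
        diag = 0.0 if i == 0 else math.inf
        for j, rj in enumerate(r):
            up = col[j + 1]   # D(i-1, j): value from the previous pass
            left = col[j]     # D(i, j-1): already updated in this pass
            col[j + 1] = abs(ti - rj) + min(up, diag, left)
            diag = up         # becomes D(i-1, j-1) for the next j
    return min(col[1:])  # free end: minimum total distance over all j


print(dtw_type2_rolling([1, 2, 2, 3], [1, 2, 3, 9]))
```

Type-1 DTW needs values from two earlier columns, D(i-1, .) and D(i-2, .), which is why it requires a 2-column table and two layers of padding.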
Other Variants
Local constraints
Flexible start/end positions
DTW Path of “Anchored Beginning”
DTW Path of “Anchored Anywhere”
Another Two Views of DTW Path of “Anchored Anywhere”
Key Transposition (1/2)
Goal: Allow user input in different keys
Method 1: Mean shift and heuristic modification
5 DTW computations when compared to each song
[Figure: pitch-shift axis centered on the mean (ticks -4 to 4), marking candidate shifts t-2, t, t+2 and then t’-1, t’, t’+1]
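One plausible reading of Method 1 is sketched below; the exact search schedule is an assumption inferred from the count of 5 DTW computations. The query is first shifted to the mean of the reference, then shifts of 0, -2, and +2 semitones are tried, and finally the two neighbors of the best coarse candidate. The names `best_key_shift` and `dtw`, and the use of a compact type-2 DTW as the distance, are hypothetical.

```python
import math


def dtw(t, r):
    """Compact type-2 DTW (anchored beginning, free end)."""
    prev = []
    for i, ti in enumerate(t):
        cur = []
        for j, rj in enumerate(r):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev[j] if i > 0 else math.inf,
                    prev[j - 1] if i > 0 and j > 0 else math.inf,
                    cur[j - 1] if j > 0 else math.inf,
                )
            cur.append(abs(ti - rj) + best)
        prev = cur
    return min(prev)


def best_key_shift(t, r, dtw_fn=dtw):
    """Return (shift, distance, number of DTW computations)."""
    base = sum(r) / len(r) - sum(t) / len(t)  # mean shift
    cache = {}

    def cost(delta):
        if delta not in cache:
            cache[delta] = dtw_fn([x + base + delta for x in t], r)
        return cache[delta]

    coarse = min((0, -2, 2), key=cost)                      # 3 computations
    fine = min((coarse, coarse - 1, coarse + 1), key=cost)  # 2 more
    return base + fine, cost(fine), len(cache)


print(best_key_shift([60, 62, 64, 62], [63, 65, 67, 65]))
```

Here the mean of the whole reference is used for the initial shift; in practice only the segment expected to match the query would be a more sensible choice.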
Key Transposition (2/2)
Method 2: Fixed-point iteration
Step 1: DTW alignment
Step 2: Stop if the mapping path is fixed
Step 3: Shift to the same mean based on the alignment
Step 4: Go back to step 1.
Characteristics:
- DTW distance is monotonically non-increasing, which guarantees convergence
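The fixed-point iteration can be sketched as follows. Several details are assumptions: the names `dtw_with_path` and `fixed_point_transpose`, the use of type-2 local paths, the `max_iter` guard, and in particular the squared local distance. The squared distance is chosen here because the mean shift minimizes the squared error along a fixed path, which is what makes the total distance non-increasing from one alignment to the next.

```python
import math


def dtw_with_path(t, r):
    """Type-2 DTW (anchored beginning, free end); returns (distance, path)."""
    m, n = len(t), len(r)
    D = [[math.inf] * n for _ in range(m)]
    back = [[None] * n for _ in range(m)]
    D[0][0] = (t[0] - r[0]) ** 2
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((D[i - 1][j], (i - 1, j)))
            if i > 0 and j > 0:
                cands.append((D[i - 1][j - 1], (i - 1, j - 1)))
            if j > 0:
                cands.append((D[i][j - 1], (i, j - 1)))
            best, back[i][j] = min(cands)
            D[i][j] = (t[i] - r[j]) ** 2 + best
    end = min(range(n), key=lambda j: D[m - 1][j])
    cell, path = (m - 1, end), []
    while cell is not None:  # backtrack to the anchored start
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    return D[m - 1][end], path[::-1]


def fixed_point_transpose(t, r, max_iter=20):
    """Return (total shift applied to t, DTW distance after each alignment)."""
    t = list(t)
    total_shift, prev_path, history = 0.0, None, []
    for _ in range(max_iter):
        dist, path = dtw_with_path(t, r)  # Step 1: DTW alignment
        history.append(dist)
        if path == prev_path:             # Step 2: mapping path is fixed
            break
        # Step 3: shift so the aligned frames have the same mean
        shift = sum(r[j] - t[i] for i, j in path) / len(path)
        t = [x + shift for x in t]
        total_shift += shift
        prev_path = path                  # Step 4: repeat
    return total_shift, history


print(fixed_point_transpose([60, 62, 64, 62, 60], [63, 65, 67, 65, 63, 61]))
```

With the absolute difference used elsewhere in these slides, the median of the aligned differences would play the role that the mean plays here. The iteration converges to a fixed point of the alignment, which need not be the globally best transposition.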
Type-3 DTW: Frame-to-Note Alignment
DP-based method for filling the table:
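Recurrent formula:

$$D(i,j) = |t(i)-r(j)| + \min\{D(i-1,j),\ D(i-1,j-1)\}$$

Local constraint: $D(i,j)$ is reached from $D(i-1,j)$ or $D(i-1,j-1)$.

[Figure: frame-level pitch vector aligned to notes; note pitches shown include 62, 64, 65, and 67]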
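The frame-to-note table can be filled one row at a time, as in this sketch. The function name `dtw_type3`, the anchored beginning on the first note, and the free end over notes are assumptions carried over from the type-1 and type-2 setups; the slides specify only the recurrence.

```python
import math


def dtw_type3(frames, notes):
    """Type-3 DTW: align a frame-level pitch vector to a note sequence.

    Each frame either stays on the current note, D(i-1, j), or advances to
    the next one, D(i-1, j-1). Note durations are not used.
    """
    prev = [math.inf] * len(notes)
    prev[0] = abs(frames[0] - notes[0])  # anchored beginning
    for ti in frames[1:]:
        cur = []
        for j, rj in enumerate(notes):
            stay = prev[j]                             # same note continues
            move = prev[j - 1] if j > 0 else math.inf  # advance one note
            cur.append(abs(ti - rj) + min(stay, move))
        prev = cur
    return min(prev)  # free end: the query may stop at any note


# Frames sing 67, 64, 65; the fourth note (62) is never reached.
print(dtw_type3([67, 67, 64, 64, 64, 65], [67, 64, 65, 62]))
```

The table has one column per note rather than per frame, which is where the efficiency gain over the frame-to-frame variants comes from.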
Type-3 DTW
Characteristics:
- Frame-based query input vs. note-based music database
- Note duration unused
- More efficient, less effective
- Heuristics for key transposition
[Figure: mapping path]
Type-3 DTW: Effects of Key Transposition
[Figure panels: rough key transposition; fine key transposition]
Please refer to the online tutorial page for playback.