Similarity Matrix Processing for Music Structure Analysis
![Page 1: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/1.jpg)
Similarity Matrix Processing for Music Structure Analysis
Yu Shiu, Hong Jeng
C.-C. Jay Kuo
ACM Multimedia 2006
![Page 2: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/2.jpg)
System Framework
![Page 3: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/3.jpg)
Pitch Class Profile (PCP)
• The PCP vector is a 12-dimensional vector that gives the relative intensities of the 12 pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
• Normalized to a unit vector
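As a rough illustration of the two bullets above, the sketch below folds spectral peaks into 12 pitch classes and normalizes the result to unit length. The equal-tempered mapping with A4 = 440 Hz, the `(frequency, magnitude)` peak format, and the function names are assumptions for illustration; the slides do not state how the PCP is computed.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to a pitch-class index 0..11 (0 = C), assuming equal temperament."""
    midi = round(69 + 12 * math.log2(freq_hz / ref_a4))
    return midi % 12

def pcp_vector(peaks):
    """peaks: iterable of (frequency_hz, magnitude). Returns a unit-length 12-vector."""
    pcp = [0.0] * 12
    for freq, mag in peaks:
        pcp[pitch_class(freq)] += mag  # accumulate energy per pitch class (assumed rule)
    norm = math.sqrt(sum(v * v for v in pcp))
    return [v / norm for v in pcp] if norm > 0 else pcp
```

For example, `pcp_vector([(440.0, 3.0), (261.63, 4.0)])` puts all energy into the A and C bins and scales the vector to unit norm.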
![Page 4: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/4.jpg)
Pitch Class Profile (PCP)
![Page 5: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/5.jpg)
Measure-based Similarity Matrix
• Previous similarity matrix
– Pre-defined window size
– Results in a large similarity matrix that makes further processing more expensive
• In this paper
– Use the measure as the element of the similarity matrix
![Page 6: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/6.jpg)
Measure-based Similarity Matrix
• PCP vector generation
– Choose a window size that is equal to the duration of one half beat
– Detect the onset signal
  • Compute the change of the spectral content between two adjacent shifting windows that are 20 ms long with 50% overlap
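The onset-detection step can be sketched as follows. The 20 ms window and 50% overlap come from the slide; measuring "change of the spectral content" as the sum of positive magnitude increases (a half-wave rectified spectral flux) is an assumption, as are the function names. The magnitude spectra are taken as given, for example from an FFT of each frame.

```python
def frame_starts(n_samples, sample_rate, win_ms=20.0, overlap=0.5):
    """Return (window length, hop size, list of frame start indices) in samples."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = max(1, int(win * (1.0 - overlap)))
    return win, hop, list(range(0, n_samples - win + 1, hop))

def spectral_flux(spectra):
    """spectra: list of magnitude spectra, one per frame.

    Returns one onset-strength value per frame: the summed positive change
    relative to the previous frame (assumed measure of spectral change).
    """
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux
```

At a 22,050 Hz sampling rate this gives 441-sample windows with a 220-sample hop.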
![Page 7: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/7.jpg)
Measure-based Similarity Matrix
– the autocorrelation function (ACF) of the onset signal is calculated to determine the beat period
– Example:
  • 100 BPM → the length of a half beat is 300 ms
  • Longer than the window size commonly used in previous work
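A minimal sketch of the beat-period estimate, assuming the beat period is taken as the lag with the largest autocorrelation inside a plausible lag range. The lag bounds and function names are illustrative assumptions, not details from the slides.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation of the mean-removed signal for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def beat_period_frames(onset, min_lag, max_lag):
    """Pick the lag (in frames) with the strongest autocorrelation."""
    acf = autocorrelation(onset, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])

def half_beat_ms(bpm):
    """Half-beat duration in milliseconds for a given tempo."""
    return 60000.0 / bpm / 2.0
```

With an onset signal that pulses every 6 frames, `beat_period_frames(onset, 2, 20)` returns 6, and `half_beat_ms(100)` reproduces the 300 ms figure in the example above.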
![Page 8: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/8.jpg)
Measure-based Similarity Matrix
• Grouping N successive PCP vectors
• Since PCP vectors are non-negative unit vectors, 0 ≤ sij ≤ 1
• Dynamic time warping (DTW) can be used to enhance the sij value
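The formula for sij is not preserved in this transcript. One reading that is consistent with the bullets above is to average the inner products of the N corresponding PCP vectors in measures i and j, which stays within [0, 1] for non-negative unit vectors. The sketch below uses that assumed definition; the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def measure_similarity(measure_i, measure_j):
    """Each measure is a list of N unit PCP vectors (assumed layout).

    Returns the mean inner product of position-matched vectors.
    """
    return sum(dot(a, b) for a, b in zip(measure_i, measure_j)) / len(measure_i)

def similarity_matrix(measures):
    """Build the full measure-by-measure similarity matrix."""
    return [[measure_similarity(a, b) for b in measures] for a in measures]
```

Two measures that share one of two half-beat chords score 0.5 under this definition, and a measure compared with itself scores 1.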
![Page 9: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/9.jpg)
Dynamic Time Warping
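The slide content for DTW is an image and is not recoverable here. For orientation, a textbook DTW over two PCP sequences is sketched below, with local cost 1 minus the inner product. How the paper folds the DTW result back into sij is not stated in the transcript, so this shows only the generic alignment cost.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def dtw_cost(seq_a, seq_b):
    """Minimum cumulative alignment cost between two sequences of unit vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # D[i][j] = best cost of aligning the first i items of seq_a with the first j of seq_b
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - dot(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A chord change that arrives one half beat early or late costs nothing under this alignment, whereas a position-by-position comparison would penalize it.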
![Page 10: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/10.jpg)
Measure-based Similarity Matrix
• After the simplification, a 3-minute song with a tempo of 100 BPM (300 beats, or 75 measures assuming four beats per measure) forms a 75 × 75 similarity matrix
• The measure-based similarity matrix (MSM) reveals chord similarity more than melody similarity
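The 75 × 75 figure follows from simple arithmetic, sketched here. The four-beats-per-measure default is an assumption that matches the quoted number; the slides do not state the meter.

```python
def measure_count(duration_s, bpm, beats_per_measure=4):
    """Number of whole measures in a song of the given duration and tempo."""
    total_beats = duration_s / 60.0 * bpm
    return int(total_beats / beats_per_measure)

# A 3-minute song at 100 BPM: 300 beats, hence 75 measures in 4/4 time.
side = measure_count(180, 100)
matrix_entries = side * side
```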
![Page 11: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/11.jpg)
Two MSM Examples
• Johnny Cash’s Hurt repeatedly uses the chord succession {Am, Am, C, D} in the 1st and 3rd sections and {G, A, F, C} in the 2nd and 4th sections.
• Beatles’ Yesterday does not have chord successions of short periods. Its music form structure is P = {I V V C V C V O}
![Page 12: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/12.jpg)
Detection of Local Similarity
• Using a 2D moving window
![Page 13: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/13.jpg)
Detection of Local Similarity
• Move the 2D moving window along the diagonal line of the MSM
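The transcript does not say what is computed inside the window. One plausible reading, given that this module supplies chord-succession periods, is to look inside a square window on the diagonal for the lag p whose off-diagonal stripe has the highest mean similarity. The sketch below implements that assumed criterion; the function name and parameters are hypothetical.

```python
def local_period(S, start, size, max_period):
    """Estimate the repetition period inside a square window on the diagonal.

    S: similarity matrix (list of rows); the window covers measures
    start .. start + size - 1 and requires max_period < size.
    Returns (best period, mean similarity along that off-diagonal stripe).
    """
    best_p, best_score = None, float("-inf")
    for p in range(1, max_period + 1):
        vals = [S[i][i + p] for i in range(start, start + size - p)]
        score = sum(vals) / len(vals)
        if score > best_score:
            best_p, best_score = p, score
    return best_p, best_score
```

Sliding `start` along the diagonal yields one period estimate per window position, which could then be tagged to the measures it covers.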
![Page 14: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/14.jpg)
Detection of Long Range Similarity
• The Viterbi algorithm is used to find segments with consecutive large similarity values along the 45-degree direction
• The output of the second module, which provides chord succession similarity, can be exploited to enhance long-range similarity detection.
![Page 15: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/15.jpg)
Detection of Long Range Similarity
• Interpret the x-axis as “time” and the y-axis as “state”
![Page 16: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/16.jpg)
Detection of Long Range Similarity
• Use “scores” instead of “probabilities”
• The score of a path is defined as the product of the similarity values of all states and the scores of all state transitions
![Page 17: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/17.jpg)
Detection of Long Range Similarity
• PT0 > PT1 to guarantee the preference along the 45-degree direction.
– The larger the ratio, the more strongly the path is favored to proceed along the 45-degree direction.
– In our experiment, the ratio PT0/PT1 is chosen to be 1.5
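The path search described on the last few slides can be sketched as a score-based Viterbi pass. Here `S[t][j]` is the similarity at time t and state j, a step from state j-1 to j is the 45-degree move scored PT0, and staying in place or skipping one state is scored PT1. The set of allowed non-diagonal moves is an assumption; the slides give only the ratio PT0/PT1 = 1.5.

```python
def viterbi_diagonal(S, pt0=1.5, pt1=1.0):
    """Return (best state path, its score) under multiplicative scores."""
    T, n = len(S), len(S[0])
    Q = [list(S[0])]       # Q[t][j]: best score of any path ending in state j at time t
    back = [[None] * n]    # back[t][j]: predecessor state on that best path
    for t in range(1, T):
        row, ptr = [], []
        for j in range(n):
            cands = []
            # (previous state, transition score); only j-1 -> j is the 45-degree move
            for prev, w in ((j - 1, pt0), (j, pt1), (j - 2, pt1)):
                if 0 <= prev < n:
                    cands.append((Q[t - 1][prev] * w, prev))
            score, prev = max(cands)
            row.append(S[t][j] * score)
            ptr.append(prev)
        Q.append(row)
        back.append(ptr)
    # Back-track from the state with the highest final score
    j = max(range(n), key=lambda k: Q[-1][k])
    best = Q[-1][j]
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, best
```

Because scores rather than probabilities are multiplied, the values need not sum to one; only their relative size matters when the best predecessor is chosen.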
![Page 18: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/18.jpg)
Detection of Long Range Similarity
• Pruning with Chord Succession Information
– Sections with repetitive chord successions of a certain period should be similar to sections of the same period
– A period value p is tagged to a measure
![Page 19: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/19.jpg)
Detection of Long Range Similarity
![Page 20: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/20.jpg)
Post-processing
• We begin with the state j that gives the highest Q(L, j) at time L, and perform a back-tracking process.
• Segments with length smaller than φ measures are removed
– In our implementation, φ = 8.
• Segments whose mean similarity value is less than a threshold, τ, are removed
– τ = mean + standard deviation (for all sij)
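The two removal rules above can be sketched as a simple filter. Each segment is represented here as the list of similarity values along its path, which is an assumed layout, and the population standard deviation is used because the slide does not say which variant is meant.

```python
import statistics

def similarity_threshold(S):
    """tau = mean + standard deviation over all entries of the similarity matrix."""
    values = [v for row in S for v in row]
    return statistics.mean(values) + statistics.pstdev(values)

def filter_segments(segments, tau, phi=8):
    """Keep segments that are at least phi measures long with mean similarity >= tau."""
    return [
        seg for seg in segments
        if len(seg) >= phi and sum(seg) / len(seg) >= tau
    ]
```

Segments shorter than 8 measures are dropped regardless of how similar they are, and long but weak segments are dropped by the τ test.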
![Page 21: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/21.jpg)
Post-processing
• Each segment should be divided
– if their two corresponding sections in the song overlap with each other
– if there is a significant difference between similarity values before and after a certain point in the segment.
• If there are conflicts on sections, the one with a higher similarity value has the priority to keep the boundaries
• For those songs in verse-chorus form, similarity values are clustered into two classes
– High similarity values are claimed to be the chorus
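The clustering method is not named in the transcript. A one-dimensional two-means split is one simple way to separate segment similarity values into a high and a low class; the sketch below uses it as an assumption, and the label "other" for the low class is a placeholder.

```python
def label_chorus(values, iters=50):
    """Split similarity values into two classes and label the high one 'chorus'."""
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to separate
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return ["chorus" if abs(v - lo) > abs(v - hi) else "other" for v in values]
```

When every segment has the same similarity value, no chorus is claimed.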
![Page 22: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/22.jpg)
![Page 23: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/23.jpg)
Experiment
• A collection of 120 pop, country, and rock songs from the 1960s onward.
• 100 of them are in verse-chorus form and 20 are in AAA or another form
• Mono audio sampled at 22,050 Hz, with 16 bits per sample.
![Page 24: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/24.jpg)
Experimental Results
• The pattern extraction of a song is claimed to be correct if all patterns in the song are extracted without distinguishing between verse and chorus
• The correct detection rate is 112/120 = 93.33%.
![Page 25: Similarity Matrix Processing for Music Structure Analysis](https://reader035.fdocuments.in/reader035/viewer/2022062315/56815290550346895dc0b26f/html5/thumbnails/25.jpg)
Experimental Results