Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition...

19
Chord Recognition - Sheh & Ellis 2003-10-29 - p. /19 1 Chord Recognition EM-trained HMMs Experiments & Results Chord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu LabROSA, Columbia University

Transcript of Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition...

Page 1: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /191

• Chord Recognition• EM-trained HMMs• Experiments & Results

Chord Recognition and Segmentation using EM-trained

Hidden Markov Models

Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.eduLabROSA, Columbia University

Page 2: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /192

• Basic problem:Recover chord sequence labels from audio

Easier than note transcription ?More relevant to listener perception ?

Chord Transcription

Page 3: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /193

• Enharmonicity:Chord labels can be ambiguousC# vs Db

• Many different chord classesmajor, minor, 6th, 9th, ...fold into 7 ‘main’ classes:maj, min, maj7, min7, dom7, aug, dim

• Acoustic variabilitychords are the same regardless of instrumentation

Difficulties with Chord Transcription

Page 4: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /194

• Note transcription, then note→chord ruleslike labeling chords in MIDI transcripts

• Spectrum→chord rulesi.e. find harmonic peaks, use knowledge of likely notes in each chord

• Trained classifierdon’t use any “expert knowledge”instead, learn patterns from labeled examples

Approaches to Chord Transcription

Page 5: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /195

• Use labeled training examples to estimate for features and chord class

•• Use Bayes Rule to get posterior probabilities for each class given features:

Statistical-Pattern-Recognition Chord Recognizer

p(x|wi) wix

Labeledtraining

examples{xn,ωxn}

Estimateconditional pdf

for class ω1

p(x|ω1)

Sortaccordingto class

p(wi|x) =p(x|wi) · p(wi)

 j p(x|w j) · p(w j)

Page 6: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /196

• We need (lots of) examples of audio segments and the appropriate chord labelsNot widely available!Even when you can get it, there is very little

• We could hand-mark a training setpainfully time-consuming!

• Can we generate it automatically?we could, if we already had the chord transcriptions system working...

Training Data Sources

Page 7: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /197

• Chord recognition is like word recognitionwe are trying to recover a sequence of ‘exclusive’ labels associated with an audio streamwe have a lot of potential training audio, but no time labels

• Can we do what they do in ASR?.. i.e. iterative re-estimation of models and labels using Expectation-MaximizationNeed only label sequence, not timings(i.e. words or chords in order, no times)

Speech Recognition Analogy

Page 8: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /198

• Estimate ‘soft’ labels using current models• Update model parameters from new labels• Repeat until convergence to local maximum

EM HMM Re-Estimation

ae1 ae2 ae3

dh1 dh2

Model inventory

Uniforminitializationalignments

Initializationparameters

Repeat until convergence

E-step:probabilitiesof unknowns

M-step:maximize viaparameters

Labelled training datadh ax k ae ts ae t aa n

dh

dh

s ae t aa nax

ax

k

k

ae

ae

tΘinit

p(qn|X1,Θold)i N

Θ : max E[log p(X,Q | Θ)]

Page 9: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /199

• All we need are the chord sequences for our training examples

• OLGA Tab archives (http://www.olga.net/)?multiple authors, unreliable quality

• Hal Leonard “Paperback Song Series”many Beatles songs, consistent detailmanually retyped for 20 songs:“Beatles for Sale”, “Help”, “Hard Day’s Night”

• Issues: repeats, intros, weird bits in the middle

Chord Sequence Data Sources

Page 10: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1910

• Preliminary investigations to see if this workssmall database to get startedcompare different feature setsdifferent ways to evaluate resultscan we reintroduce a little high-level knowledge?

Experiments

Page 11: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1911

• Two training sets:Train on 18 songs, test on 2 held-out songs(fair test)Train on all 20 songs, test on 2 songs from training set (cheating, rewards over-fitting, upper bound)

• Two evaluation measuresRecognition: transcribe unknown chordsForced alignment: find time boundaries given correct chord sequenceScore frame-level accuracy against hand-markings

Training/Test Conditions

Page 12: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1912

• Can try any features with EM training...• Use MFCCs as baseline

“the” feature in ASR, also useful in music IRalso try deltas, double deltas as in ASRcapture ‘formants’, not pitch - straw man

• “Pitch Class Profile” features (Fujishima’99)collapse FFT bin energies into (24) chroma binsa/k/a Chroma Spectrum (Bartsch’01) ...

Features

FFT bin (up to 2000 Hz)

Chro

ma

bin

(qua

rter t

ones

)

Spectrum-to-chroma mapping weights

20 40 60 80 100

5

10

15

20

Page 13: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1913

• Statistical system learns separate models for each of (7 chord types) x (21 roots)models are means, variances of feature vectorsonly 32 actually appear in our training seteven so, many have few training instances

• Expect similarity between e.g. Amaj & Bmajsame chord, just shifted in frequencyshift = rotation of 24-bin chroma space

• Can align & average all transpositions of same chord after each training iterationthen rotate back to starting chroma, continue...

Averaging Rotated PCP Models

Page 14: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1914

• Models are not adequately discriminant:

Results: Recognition

(random ~3%)

MFCCs are poor(can overtrain)

PCPs a little better(ROT helps

generalization)

Feature Recogtrain18 train20

MFCC 5.9 16.77.7 19.6

MFCC D 15.8 7.61.5 6.9

PCP 10.0 23.618.2 26.4

PCP ROT 23.3 23.120.1 13.1

Percent Frame Accuracy: Recognition

Eight Days a WeekEvery Little Thing

Page 15: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1915

• Some glimmers of hope!

Results: Alignment

Feature Aligntrain18 train20

MFCC 27.0 20.914.5 23.0

MFCC D 24.1 13.119.9 19.7

PCP 26.3 41.046.2 53.7

PCP ROT 68.8 68.383.3 83.8

Percent Frame Accuracy: Forced Alignment

PCP_ROT best accuracy

& best generalization

Eight Days a WeekEvery Little Thing

Page 16: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1916

• Major/minor confusions• C# is relative minor (shared notes)

Chord ConfusionsXIRTAMNOISUFNOC"ROJAME"

keeWasyaDthgiE

jaM niM 7jaM 7niM 7mDo Aug Dim

bF/E 851 511 . 9 . . .

F/#E 9 . . . . . .

bG/#F . . . 11 . . .

G 3 . . . . . .

bA/#G . . . . . . .

A 9 . . . 1 . .

bB/#A . . . . . . .

bC/B 8 . . . , . .

C/#B . . . . . . .

bD/#C . 02 . . . . .

D . 41 . . . . .

bE/#D . . . . . . .

Page 17: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1917

• Chord model centers (means) indicate chord ‘templates’:

What did the models learn?

0 5 10 15 20 250

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4PCP_ROT family model means (train18)

DIMDOM7MAJMINMIN7

C D E F G A B C (for C-root chords)

Page 18: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1918

• A flavor of the features & results:

Recognition/Alignment Example

true E G D Bm G

align E G DBm G

recog E G Bm Am Em7 Bm Em7

16.27 24.84time / sec

intensity

A

#

B

C

#

D

#

E

F

#

G

#

pitc

h cla

ssBeatles - Beatles For Sale - Eight Days a Week (4096pt)

20

0

40

60

80

100

120

Page 19: Chord Recognition and Segmentation using EM-trained …dpwe/talks/ismir-2003-10.pdfChord Recognition and Segmentation using EM-trained Hidden Markov Models Alex Sheh & Dan Ellis {asheh79,dpwe}@ee.columbia.edu

Chord Recognition - Sheh & Ellis 2003-10-29 - p. /1919

• ASR-style EM is a viable approach for learning chord modelsmore training data needed

• Better featurescapture ‘global’ properties of chordsrobust to fine tuning issues?

• Better representation for training labelsi.e. allow for extra repeats?

• More ways to reintroduce music knowledge

Conclusions & Future Work