Research Article
Guitar Chords Classification Using Uncertainty Measurements of Frequency Bins
Jesus Guerrero-Turrubiates,1 Sergio Ledesma,1 Sheila Gonzalez-Reyna,2 and Gabriel Avina-Cervantes1

1 Division de Ingenierias, Universidad de Guanajuato, Campus Irapuato-Salamanca, Carretera Salamanca-Valle de Santiago km 3.5 + 1.8 km, Comunidad de Palo Blanco, 36885 Salamanca, GTO, Mexico
2 Universidad Politecnica de Juventino Rosas, Hidalgo 102, Comunidad de Valencia, 38253 Santa Cruz de Juventino Rosas, GTO, Mexico
Correspondence should be addressed to Jesus Guerrero-Turrubiates; [email protected]
Received 28 May 2015; Accepted 2 September 2015
Academic Editor: Matteo Gaeta
Copyright © 2015 Jesus Guerrero-Turrubiates et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper presents a method to perform chord classification from recorded audio. The signal harmonics are obtained by using the Fast Fourier Transform, and timbral information is suppressed by spectral whitening. A multiple fundamental frequency estimation of whitened data is achieved by adding harmonics attenuated by a weighting function. This paper proposes a method that performs feature selection by thresholding the uncertainty of all frequency bins. The measurements under the threshold are removed from the signal in the frequency domain. This allows a reduction of 95.53% of the signal characteristics, and the remaining 4.47% of frequency bins are used as enhanced information for the classifier. An Artificial Neural Network was utilized to classify four types of chords: major, minor, major 7th, and minor 7th. Those, played in the twelve musical notes, give a total of 48 different chords. Two reference methods (based on Hidden Markov Models) were compared with the method proposed in this paper using the same database for the evaluation test. In most of the performed tests, the proposed method achieved a reasonably high performance, with an accuracy of 93%.
1. Introduction
A chord, by definition, is a harmonic set of two or more musical notes that are heard as if they were sounding simultaneously [1]. A musical note refers to the pitch class set of C, C♯/D♭, D, D♯/E♭, E, F, F♯/G♭, G, G♯/A♭, A, A♯/B♭, B, and the interval between adjacent notes is known as a half-note or semitone interval. Thus, chords can be seen as musical features, and they are the principal harmonic content that describes a musical piece [2, 3].
A chord has a basic construction known as a triad that includes notes identified as a fundamental (the root), a third, and a fifth [4]. The root can be any note chosen from the pitch class set, and it is used as the first note to construct the chord; besides, this note gives the name to the chord. The third has the function of making the chord minor or major. For a minor chord, the third is located at 3 half-note intervals from the root. On the other hand, a major chord has the third placed at 4 half-note intervals from the root. The perfect fifth, which completes the triad, is located at 7 half-note intervals from the root. If a note is added to the triad at 11 half-note intervals from the root, then the chord will become a 7th chord. For instance, a C major chord (CMaj) will be composed of a root C note, a major third E note, and a perfect fifth G note; the C major with a 7th (CMaj7) is composed of the same triad of C major plus the 7th note B.
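The interval rules above can be sketched programmatically. This is an illustrative snippet, not part of the authors' method; the helper names are our own, and the 10-semitone offset used for the minor 7th follows common practice rather than an explicit statement in the text.

```python
# Pitch class set from the Introduction, in semitone order starting at C.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

# Semitone offsets from the root for each chord type. The triad offsets
# (3/4 for the third, 7 for the fifth, 11 for the major 7th) follow the text;
# the minor-7th offset of 10 is an assumption based on common practice.
CHORD_INTERVALS = {
    "maj":  [0, 4, 7],
    "min":  [0, 3, 7],
    "maj7": [0, 4, 7, 11],
    "min7": [0, 3, 7, 10],
}

def chord_notes(root, chord_type):
    """Return the note names of a chord given its root and type."""
    r = PITCH_CLASSES.index(root)
    return [PITCH_CLASSES[(r + i) % 12] for i in CHORD_INTERVALS[chord_type]]
```

For example, `chord_notes("C", "maj7")` reproduces the CMaj7 spelling given in the text: C, E, G, B.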
Chord arrangements, melody, and lyrics can be grouped in written summaries known as lead sheets [5]. All kinds of musicians, from professionals to amateurs, make use of these sheets because they provide additional information about when and how to play the chords or some other arrangement of a melody.
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 205369, 9 pages. http://dx.doi.org/10.1155/2015/205369
Writing lead sheets of chords by hand is a task known as chord transcription. It can only be performed by an expert; however, this is a time-consuming and expensive process. In engineering, the automatization of chord transcription has been considered a high-level task and has some applications such as key detection [6, 7], cover song identification [8], and audio-to-lyrics alignment [9].
Chord transcription requires recognizing or estimating the chord from an audio file by applying some signal preprocessing. The most common method for chord recognition is based on templates [10, 11]; in this case, a template is a vector of numbers. This method suggests that only the chord definition is necessary to achieve recognition. The simplest chord template has a binary structure: the notes that belong to the chord have unit amplitude, and the remaining ones have null amplitude. This template is described by a 12-dimensional vector; each number in the vector represents a semitone in the chromatic scale or pitch class set. As an illustration, the C major chord template will be [1 0 0 0 1 0 0 1 0 0 0 0]. The 12-dimensional vectors obtained from an audio frame signal are known as chroma vectors, and they were proposed by Fujishima [12] for chord recognition using templates. In his work, chroma vectors are obtained from the Discrete Fourier Transform (DFT) of the input signal. Fujishima's method (Pitch Class Profile, PCP) is based on an intensity map on the Simple Auditory Model (SAM) of Leman [13]. This allows the chroma vector to be formed by the energy of the twelve semitones of the chromatic scale. In order to perform chord recognition, two matching methods were tested: the Nearest Neighbors [14] (Euclidean distances between the template vectors and the chroma vectors) and the Weighted Sum Method (dot product between chroma vectors and templates).
Lee [11] applied the Harmonic Product Spectrum (HPS) [15] to propose the Enhanced Pitch Class Profile (EPCP). In his study, chord recognition is performed by maximizing the correlation between chord templates and chroma vectors.
Template matching models have poor recognition performance on real-life songs, because chords change with time, and consequently chroma vectors will contain semitones of two different chords. Therefore, statistical models became popular methods for chord recognition [16–18]. Hidden Markov Models [19, 20] (HMMs) are probabilistic models for a sequence of observed variables assumed to be independent of each other, and it is supposed that there is a sequence of hidden variables that are related to the observed variables.
Barbancho et al. [21] proposed a method using an HMM to perform a transcription of guitar chords. The chord types used in their study are major, minor, major 7th, and minor 7th of each root of the pitch class set; that is a total of 48 chord types. All of them can be played in many different forms; thus, to play the same chord, several finger positions can be used. In their work, 330 different forms for the 48 chord types are proposed (for details see the reference); in this case, every single form is a hidden state. Feature extraction is achieved by the algorithm presented by Klapuri [22], and a model that constrains the transitions between consecutive forms is proposed. Additionally, a cost function that measures the physical difficulty of moving from one chord form to another is developed. Their method was evaluated using recordings from three musical instruments: an acoustic guitar, an electric guitar, and a Spanish guitar.
Ryynänen and Klapuri [23] proposed a method using an HMM to perform melody transcription and classification of bass line and chords in polyphonic music. In this case, fundamental frequencies (F0's) are found using the estimator in [21]; after that, these are passed through a PCP algorithm in order to enhance them. An HMM of 24 states (12 states for major chords and 12 states for minor ones) is defined. The transition probabilities between states are found using the Viterbi algorithm [24]. The method does not detect silent segments; however, it provides chord labeling for each analyzed frame.
The aforementioned methods achieve low accuracies, and the most recent cited one, the method from Barbancho et al., achieves high accuracy by combining probabilistic models. However, the use of an HMM and the probabilistic models in their work makes such a method somewhat complex.
In this paper, we propose a method based on Artificial Neural Networks (ANNs) to classify chords from recorded audio. This method classifies chords from any octave for a six-string standard guitar. The chord types are major, minor, major 7th, and minor 7th, that is, the same variants for the chords used by Barbancho et al. [21]. First, time signals are converted to the frequency domain, and timbral information is suppressed by spectral whitening. For feature selection, we propose an algorithm that measures the uncertainty of the frequency bins. This allows reducing the dimensionality of the input signal and enhances the relevant components to improve the accuracy of the classifier. Finally, the extracted information is sent to an ANN to be classified. Our method avoids the calculation of transition probabilities and probabilistic models working in combination; nevertheless, the accuracy achieved in this study has superior performance over most of the aforementioned methods.
The rest of this paper is organized as follows. In Section 2, fundamental concepts related to this study are presented. Section 3 details the theoretical aspects of the proposed method. Section 4 presents experimental results that validate the proposed method, and Section 5 includes our conclusions and directions for future work.
2. General Concepts
For clarity purposes, this section presents two important concepts widely used in Digital Signal Processing (DSP). These concepts are the Fourier Transform and spectral whitening.
2.1. Transformation to Frequency Domain. The human hearing system is capable of performing a transformation from the time domain to the frequency domain. There is evidence that humans are more sensitive to magnitude than to phase information [25]; as a consequence, humans can perceive harmonic information. This is the main idea behind the classification of guitar audio signals in this work. Therefore, a frequency domain representation of the original signal has to be calculated.
Figure 1: Example of a spectrogram (frequency, Hz, versus time, s).
The time-to-frequency-domain transformation is obtained by applying the Fast Fourier Transform (FFT) to the input signal x[n] and is represented by

X = F{x[n]}.  (1)
Equation (1) describes the transformation of x[n] at all times. However, this is not convenient because songs, or signals in general, are not stationary. For this reason, a window function, w[n], is applied to the time signal as

z[n] = x[n] w[n],  (2)
where đ€[đ], for this study, is the Hamming window functionaccording to
đ€ [đ] = đ â (1 â đ) cos( 2đđđ â 1
) , (3)
where đ = 0.54, đ = [0,đ â 1], and đ is the number ofsamples in the frame analysis. A study about the use ofdifferentwindow types can be found inHarris [26]. Equations(2) and (3) divide the signal in different frames that allowingthe analysis of the signal in the frequency domain by
Z_w = F{z[n]}.  (4)
For this work, windowing functions will have 50% overlap to analyze the entire signal and thus obtain a set of frames z_m[n] (for simplicity in the notation, z_m will be used). Those frames can be concatenated to construct a matrix Z = [z_1 z_2 ⋯ z_m], and then the FFT is computed for every column. The result is a representation in the frequency domain as in Figure 1; this representation is known as a spectrogram [27]. This is the format in which the signals will be presented to the classifier for training.
2.2. Spectral Whitening. This process allows obtaining a uniform spectrum of the input signal, and it is achieved by boosting the frequency bins of the FFT. There exist different methods to perform spectral whitening [28–31]. Inverse filtering [22] is the whitening method used in our experiments, and it is described next.
First, the original windowed signal is zero-padded to twice its length as

y_m = [z_m 0 0 ⋯ 0]ᵀ,  (5)
Figure 2: Responses H_b(f) applied in spectral whitening (magnitude versus frequency, Hz).
and its FFT, represented by Γ_m, is calculated. The resulting frequency spectrum will have an improved amplitude estimation because of the zero-padding. Next, a filter bank is applied to Γ_m; the central frequencies of this bank are given by

c_b = 229 (10^((b+1)/21.4) − 1),  (6)

where b = 0, . . . , 30. In this case, each filter in the bank has a triangular response H_b; in fact, this bank tries to simulate the inner ear basilar membrane. The band-pass frequencies for each filter run from c_(b−1) to c_(b+1). Because there is no more relevant information at frequencies higher than 7000 Hz, the maximum value for the parameter b was 30.
Subsequently, the standard deviations σ_b are calculated as

σ_b = ((1/K) Σ_f H_b(f) |Γ_m(f)|²)^(1/2),   for f = 0, 1, . . . , K − 1,  (7)
where uppercase K is the length of the FFT series. Later on, the compression coefficients for the central frequencies c_1, c_2, . . . , c_b are calculated as γ_b = σ_b^(ν−1), where ν = [0, 1] is the amount of spectral whitening applied to the signal. The coefficients γ_b are those that belong to the frequency bin of the "peak" of each triangular response; observe Figure 2. The rest of the coefficients γ(f) for the remaining frequency bins are obtained by performing a linear interpolation between the central frequency coefficients γ_b.
Finally, the white spectrum is obtained with a pointwise multiplication of all compression coefficients with Γ_m as

I_m(f) = γ(f) Γ_m(f).  (8)
Figure 3: Overview of the proposed system for training with f_L frequency bins and m samples of audio (Z → zero-padding → Fourier Transform → spectral whitening → removal of half of the spectrum → multiple F0 estimation → feature selection → classifier, with intermediate matrices Y, I, Γ, Φ, and T).
3. Proposed Method
Our proposed method is described in the block diagram shown in Figure 3. The method begins by defining the columns of matrix Z as

Z = | z_1[0]    z_2[0]    ⋯  z_m[0]   |
    | z_1[1]    z_2[1]    ⋯  z_m[1]   |
    |   ⋮         ⋮       ⋱     ⋮    |
    | z_1[f_L]  z_2[f_L]  ⋯  z_m[f_L] | ,  (9)
where a single column vector [z_i[0] z_i[1] ⋯ z_i[f_L]]ᵀ represents the ith Hamming-windowed audio sample. These columns are zero-padded to twice their length f_L as

Y = [Z | 0]ᵀ = [y_1 y_2 ⋯ y_m],  (10)

where 0 is a zero matrix of the same size as Z. Then, (10) indicates an augmented matrix.
After that, the signal spectrum for every column of Y is calculated by applying the FFT; then these columns are passed through a spectral whitening step, and the output matrix is represented as I. Furthermore, by taking advantage of the symmetrical shape of the FFT, only the first half of the frequency spectrum (represented by Γ) is taken in order to perform the analysis.

A multiple fundamental frequency estimation algorithm and a weighting function are applied to the whitened audio signals. These algorithms enhance the fundamental frequencies by adding their harmonics attenuated by the weighting function. The output matrix of this step is denoted as Φ.
The training set includes all data in a matrix of f_L frequency bins and m audio samples, where each row (frequency bin) will be an input to the classifier. The number of inputs can be reduced from f_L to f_P (matrix T) by applying a method based on the uncertainty of the frequency bins, thus enhancing the pertinent information to perform a classification. Finally, the enhanced data are used to train the classifier and then to validate its performance.
3.1. Multiple Fundamental Frequency Estimation. The fundamental frequencies of the semitones in the guitar are defined by

f_n = 2^(n/12) f_min,  (11)

where n ∈ Z and f_min is the minimum frequency to be known; for example, in a standard six-string guitar, the lowest note is E, having a frequency of 82 Hz.
Signal theory establishes that the harmonic partials (or just harmonics) of a fundamental frequency are defined by

f_(h_n) = h_n f_n,  (12)

where h_n = 2, 3, 4, . . . , H + 1. In this study, H represents the number of harmonics to be considered. As an illustration, for a fundamental frequency f_n = 131 Hz of a C note, the first three harmonics will be the set {262, 393, 524}.

In this work, if a frequency is located within ±3% of the semitone frequencies, then this frequency is considered to be correct. This approach was proposed in [22].
In an mth frame under analysis, fundamental frequencies can be raised if harmonics are added to their fundamentals [22], by applying

Γ(f_n, m) = Γ(f_n, m) + Σ_{h_n=2}^{H+1} Γ(h_n f_n, m),  (13)

and then all harmonics Γ(h_n f_n, m) and their fundamental frequencies Γ(f_n, m), described in (13), are removed from the frequency spectrum. When the resulting signal is analyzed again with the described method, a different fundamental frequency will be raised.
A common issue with (13) arises when two or more fundamentals share the same harmonic. For instance, the fundamental frequency of 65.5 Hz of a C note has a harmonic located at 196.5 Hz. When the Euclidean distances [32] between the analysis frequency and the frequencies of the semitones are computed, the minimum distance or nearest frequency will correspond to the G note. This implies that if those two notes are present in the same analysis frame, then the harmonic of G will be summed and eliminated with the harmonics of the C note. This is because the 196.5 Hz harmonic is located in the range of ±3% of the frequency of a G note.
There are some methods that deal with this problem. In [33], a technique that makes use of a moving average filter is proposed. In that work, the fundamental frequency takes its original amplitude, and a moving average filter modifies the amplitude of its harmonics. Then, only part of their amplitude is removed from the original frequency spectrum.
In [22], a weighting function that modifies the amplitude of the harmonics is proposed. Also, an algorithm to find multiple fundamental frequencies is suggested. The weighting function is given by

w_(n,h_n) = (f_s/τ_max + α) / (h_n f_s/τ + β),  (14)

where f_s/τ_max represents the low-limit frequency (e.g., 82 Hz), f_s/τ is the fundamental frequency f_n under analysis, and f_s is the sampling frequency. The parameters α and β are used to optimize the function and minimize the amplitude estimation error (see [22] for details). In [22], the analyzed f_n in a whitened signal Γ(f, m) is used to find its harmonics with
s(τ) = Σ_{h_n=1}^{H} w(τ, h_n) max_{f∈K} |Γ(f, m)|,  (15)

where K is a range of frequency bins in the vicinity of the analyzed f_n. The parameter τ indicates that the signal spectrum is divided into analysis blocks to find the fundamental frequencies. Thus, s(τ) becomes a linear function of the magnitude spectrum Γ(f, m). Then, a residual spectrum Γ_R(f, m) is initialized to Γ(f, m), and a fundamental period τ is estimated using Γ_R(f, m). The harmonics of τ are found at h_n f_s/τ, and then they are added to a vector Γ_D(f, m) in their corresponding positions of the spectrum. The new residual spectrum is calculated as

Γ_R(f, m) ← max(0, Γ_R(f, m) − d Γ_D(f, m)),  (16)

where d = [0, 1] is the amount of subtraction. This process iteratively computes a different fundamental frequency using the methodology described above. The algorithm finishes when there are no more harmonics in Γ_R(f, m) to be analyzed. Equation (15) was adapted to keep the notation of our work; refer to [22] for further analysis.
In this study, we propose a modification of Klapuri's algorithm, in an attempt to achieve a better estimate of the multiple fundamental frequencies. Using (14) and the mth whitened signal Γ(f, m), the multiple fundamental frequencies can be found by using

Φ(f, m) = Γ(f, m) + Σ_{h_n} w(f, h_n) Γ(h_n f, m),  (17)

where h_n = {n | n ∈ Z, n > 1, nf < K/2} for f = 0, 1, . . . , K/2. Equation (17) analyzes all frequency bins and their harmonics in the signal spectrum. This equation adds to the fth frequency bin all its harmonics at h_n f over the entire spectrum. Besides, the weighting function performs an estimation of the harmonic amplitude that must be added to the fth frequency bin. Observe that the weighting function does not modify the original amplitude of the harmonics.

Finally, when all frequency bins have been analyzed, the resulting signal has all its fundamental frequencies with high amplitude. This will help the classifier to achieve an accurate performance.
3.2. Feature Selection. The objective of this paper is to classify frequencies. Then, the inputs of the classifier are all frequency bins that come from the FFT. However, not all frequency bins will have relevant information. Therefore, a method to remove unnecessary data and enhance the relevant data has to be performed. This results in a reduction of the number of inputs to the classifier.

We propose a method based on the uncertainty of the frequency bins. This method discriminates all those bins that are not relevant for the classifier in order to improve its performance.

In Wei [34], it is stated that, similarly to the entropy, the variance can be considered a measure of the uncertainty of a random variable if and only if the distribution has one central tendency. The histograms for all frequency bins of the 48 chord types were calculated. This can be used to verify whether the distribution can be approximated by a distribution with only one central tendency. For simplicity, Figure 4 represents one frequency bin distribution of a C major and a C minor chord, respectively; it can be seen that the distribution fits a Gaussian distribution. The same behavior was observed in the other samples of the 48 different chords. This demonstrates that the variance can be used in this study as an uncertainty measure of the frequency bins.
In order to perform the feature selection using the uncertainty of the frequency bins, first consider a matrix Φ defined by

Φ = [h_(f_1)  h_(f_2)  ⋯  h_(f_L)]ᵀ,  (18)

where h_(f_i) is a vector formed by the magnitudes of the ith frequency bin over all audio samples. The variances of each h_(f_i) can be computed with
σ_i² = (1/m) Σ_{j=1}^{m} (h_(f_i)(j) − μ)²,   for i = 1, 2, . . . , f_L,  (19)

where

μ = E{h_(f_i)}.  (20)
If σ_i² → 0, then it means that, for that particular frequency bin, the input is quasi-constant; consequently, this frequency bin can be eliminated from all audio samples. This can be achieved if we consider

σ_max² = max_i {σ_i²},  (21)

and a vector v_ind formed with the indexes i of the σ_i² that are defined by

v_ind = {σ_i² | σ_i² ≥ θ σ_max²},   where 0 ≤ θ ≤ 1.  (22)
Once feature selection has been performed, the remaining frequency bins will form the input to the classifier.
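The selection rule of Eqs. (19)–(22) amounts to a variance threshold. A compact sketch (illustrative names; NumPy's population variance matches Eq. (19)):

```python
import numpy as np

def select_bins(Phi, theta):
    """Phi: (f_L bins) x (m samples) matrix. Returns indexes of the kept bins."""
    variances = Phi.var(axis=1)               # sigma_i^2, Eq. (19)
    sigma_max = variances.max()               # Eq. (21)
    return np.flatnonzero(variances >= theta * sigma_max)  # Eq. (22)
```

A constant (quasi-constant) bin has zero variance and is dropped for any θ > 0, which is exactly the elimination criterion described above.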
Figure 4: Central tendency of the fundamental frequency of a C chord (distribution and Gaussian fitted curve of P(|Γ(f, m)|) for the fundamental frequency, f = 131 Hz, of C major and C minor).
Figure 5: Multilayer perceptron (input layer, hidden layer, and output layer).
3.3. Classifier. Classification is an important part of chord transcription. In order to perform a good classification, important data will be generated from the original information. Then, a classification algorithm will be able to label the chords. Artificial Neural Networks [35] (ANNs) can be considered "massively parallel computing systems consisting of an extremely large number of simple processors with many interconnections", according to Jain et al. [36]. ANNs have been used in chord recognition as a preprocessing method or as a classification method. Gagnon et al. [37] proposed a method with an ANN to preclassify the number of strings plucked in a chord. Humphrey and Bello [38] used labeled data to train a convolutional neural network. In this study, an Artificial Neural Network was used to perform the classification. Figure 5 represents the configuration of the ANN used in this work. The ANN was trained using the Back Propagation algorithm [39].
4. Experimental Results
Computer simulations were performed to quantitatively evaluate the proposed method. The performance of two state-of-the-art references [21, 23] was compared with the present work.
Databases for training and testing containing four chord types (major, minor, major 7th, and minor 7th) with different versions of the same chord are considered. Electric and acoustic guitar recordings were used to construct the training data set. A total of 25 minutes were recorded from an electric guitar, and a total of 30 minutes were recorded from an acoustic guitar. Recordings include sets of chords played consecutively (e.g., C-C♯-D-D♯ . . .), as well as some parts of songs. The database used for evaluation was provided by Barbancho et al. [21]. This database has 14 recordings: 11 recordings from two different Spanish guitars played by two different guitar players, 2 recordings from an electric guitar, and 1 recording from an acoustic guitar, making a total duration of 21 minutes and 50 seconds. The sampling frequency f_s is 44100 Hz for all audio recordings.
The training data set was divided into frames of 93 ms, leading to an FFT of 4096 frequency bins. In the spectral whitening, the signal was zero-padded to twice its length before applying the frequency domain transform, so an FFT of 8192 data points was obtained. For the spectral whitening, the K parameter takes the original length of the FFT, but the length of the whitened signals remains at 4096 frequency bins. For the multiple fundamental frequency estimation, the α and β parameters are constant and set to 52 and 320, respectively, as in Klapuri [22], while the parameter d was adjusted to improve performance. An optimum value of 0.99 was found. This parameter differs from the value in [22] because, in our method, the signal is modified in every cycle in which h_n in (15) increases; on the other hand, Klapuri [22] modifies the signal after h_n increases to its highest value.
Table 1: Frequency variances and threshold θσ_max² = 0.050.

Frequency bin | 130   | 131   | ⋯ | 163   | 164   | ⋯ | 195   | 196   | 197
Variance      | 0.012 | 0.052 | ⋯ | 0.037 | 0.055 | ⋯ | 0.048 | 0.060 | 0.010

Table 2: Classifier inputs with threshold θσ_max² = 0.050.

Classifier input | ⋯ | ith | ith + 1 | ith + 2 | ⋯
Frequency bin    | ⋯ | 131 | 164     | 196     | ⋯
Figure 6: Training set before feature extraction (frequency, Hz, versus audio samples).
These processes were applied to all audio samples to build a training data set. In this case, the data set is a matrix of 4096 rows (frequency bins) by 5000 columns (audio samples). In (21), the maximum variance over all frequency bins in the audio samples is computed. Equation (22) proposes a threshold to remove all those frequency bins that remain quasi-constant. For instance, suppose that a threshold of 0.05 is set and some frequency bin variances (shown in Table 1) are evaluated. Only those above the threshold will be taken as inputs to the classifier, as is shown in Table 2.
Performance tests were made to find the optimal value for θ. This parameter was varied; then the ANN was trained and evaluated. The process was repeated until the best result was obtained. The θ parameter was found to be optimal at 0.01326. This allows a 95.6% reduction of the total number of frequency bins, while keeping the relevant information. Therefore, we concluded that, for a θ value greater than 0.01326, some information required for a correct classification is lost. Figure 6 shows part of the training data set; in fact, only frequency bins in the range [0, 500] and 3000 audio samples are depicted. Figure 7 shows the same data set of Figure 6 after the feature extraction algorithm was applied. It can be observed that the algorithm maintains sufficient information to train the classifier.
An ANN was used as a classification method with 183 inputs and 48 outputs. The applied performance metric was the ratio of the number of correctly classified chords to the total number of frames analyzed.
Figure 7: Training set after feature extraction (classifier inputs versus audio samples).

The validation test had the same structure as the one presented in Figure 3. First, audio data was loaded. Second,
Table 3: Comparison between methods.

Reference method [21] (48 possibilities: major, minor, major 7th, and minor 7th): PM 95%, PHY 83%, MUS 86%, PC 75%.
Reference method [23] (24 major/minor and 48 major, minor, major 7th, and minor 7th possibilities evaluated separately): MM 91%, MMC 80%, CC 70%.
Proposed method (48 possibilities: major, minor, major 7th, and minor 7th): VTH 93%.
a frequency domain transformation and spectral whitening are applied to the signal. Finally, the multiple fundamental frequency estimation algorithm is used. At this point, the signal has 4096 frequency bins. To reduce the number of frequency bins, only those that meet (22) are taken from the signal and then passed through the classifier.
The results of the proposed method, VTH (Variance Threshold), were compared with two state-of-the-art methods. The best results are shown in Table 3; specifically, 48 chord types with different variants of the same chord were evaluated. For the reference method proposed by Barbancho et al. [21], experiments with different algorithms were performed. This method is denoted by PM and includes all models described next. The PHY model describes the probability of the physical transition between chords. These probabilities are computed by measuring the cost of moving the fingers from one position to another. The MUS model is based on musical transition probabilities, that is, the probabilities of switching between chords. These were estimated from the first eight albums of The Beatles. The PC model is equal to the proposed method but without the transition probabilities; instead, uniform transition probabilities are used. All models were separately tested; an accuracy of 86% was achieved at most. The best result was obtained from the combination of all methods; a 95% accuracy was achieved in this case.
For the reference method proposed by Ryynänen and Klapuri [23], the evaluation results were taken from [21]; in this case, three tests were performed. First, MM tests (only major and minor chords) were carried out; of the three tests, this was the one with the highest accuracy (91%). Second, MMC tests were executed, in which all chords were taken into account; however, 7th major/minor chords labeled as major/minor were considered correctly classified; that is, a CMaj7 labeled as CMaj was correct. Finally, CC tests were set with the 48 possibilities; that is, 7th major/minor chords labeled as major/minor were incorrect; this results in an accuracy of 70%.
The method proposed in this paper achieves an accuracy of 93% in the evaluation test. This classification performance was achieved with a 95% confidence interval of [91.4, 94.6]. The results are competitive with the two reference methods. Even though Barbancho et al. [21] report a 95% accuracy, it is only achieved when all the algorithms PHY, MUS, and PC are combined. Besides, the HMM needs the calculation of transition probabilities between the states of the model (48 chord types). This makes their method more complex than the one presented in this work. This paper focuses only on chord recognition, so the comparison with [21] does not take into consideration the finger configuration.
5. Conclusions
A method to classify chords from an audio signal is proposed in this work. It is based on a frequency-domain transformation, where harmonics are the key to finding the fundamental frequencies that compose the input signal. It was found that all frequency bins remaining after feature extraction were in the range from 40 Hz to 800 Hz. This means that the information relevant to the classifier is located at the low-frequency end.
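The 40 Hz to 800 Hz observation can be related to the analysis parameters stated in the paper (44100 Hz sampling frequency, 4096-sample windows). The following minimal NumPy sketch, an illustration rather than the paper's implementation, simply counts how many FFT bins fall in that band:

```python
import numpy as np

FS = 44100  # sampling frequency (Hz), as used in the paper
N = 4096    # FFT window length (samples), as used in the paper

# Frequency of FFT bin k is k * FS / N; rfftfreq gives the
# non-negative half of the spectrum (N/2 + 1 bins).
freqs = np.fft.rfftfreq(N, d=1.0 / FS)

# Select the bins lying in the 40-800 Hz band where the
# selected features were found to be located.
band = (freqs >= 40.0) & (freqs <= 800.0)
print(band.sum(), "of", freqs.size, "bins lie in 40-800 Hz")  # 71 of 2049
```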
The chords considered were major, minor, major 7th, and minor 7th. Two state-of-the-art methods, which used the same chords, were chosen for comparison. All computer simulations were performed using the same database. The reference method of Ryynänen and Klapuri [23] had its best performance when only 24 chord types were considered. Our method outperforms the method of Ryynänen and Klapuri by 2%, even though, in our work, 48 chord types were classified. The reference method of Barbancho et al. [21] had an accuracy of 95%; however, they performed a signal analysis to propose two statistical models and a third one that does not consider transition probabilities between states. Their best performance is achieved with all models working together; if they are tested separately, the performance is at most 86%. Also, their classification method is based on a Hidden Markov Model that needs interconnected states.
The method presented in this work avoids designing statistical models and interconnected states for an HMM. The Artificial Neural Network used as a classifier works with high precision when the data presented to it have been processed with an appropriate algorithm. The proposed feature selection method achieves high accuracy because the data presented to the classifier contain the pertinent information for training.
The sampling frequency of 44100 Hz and the window length of 4096 samples result in a frequency resolution of approximately 10.77 Hz. With this frequency resolution it is not possible to distinguish the lowest notes of the guitar, for example, an E at 82 Hz from an F at 87 Hz. However, the original signal has six sources (strings), where three of them are octaves of the other three (except for 7th chords). Then, because the proposed method for multiple fundamental frequency estimation adds the harmonics of every candidate bin, the higher octaves can be raised. For example, for an E of 82 Hz, the octave at 164 Hz will also be raised; this octave, together with the other fundamentals, yields a correct classification of the chord. In the case of an F, the fundamental at 87 Hz cannot be distinguished from the frequency of 82 Hz. Nevertheless, the octave at 174 Hz will be clearly raised, so, with the other fundamental frequencies of F, the ANN performs a correct classification.
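The bin arithmetic behind this argument can be checked directly. This is a simple sanity check under the paper's stated parameters, not the paper's implementation: E (82 Hz) and F (87 Hz) round to the same FFT bin, while their octaves fall into distinct bins.

```python
FS = 44100            # sampling frequency (Hz)
N = 4096              # window length (samples)
DF = FS / N           # frequency resolution, about 10.77 Hz per bin

def nearest_bin(freq_hz):
    """Index of the FFT bin closest to a given frequency."""
    return round(freq_hz / DF)

# Low E (82 Hz) and F (87 Hz) collapse into the same bin ...
print(nearest_bin(82), nearest_bin(87))    # both map to bin 8
# ... but their octaves (164 Hz and 174 Hz) are separable.
print(nearest_bin(164), nearest_bin(174))  # bins 15 and 16
```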
Due to its simplicity, the present work can be applied to chord recognition on devices such as a Field-Programmable Gate Array (FPGA) or a microcontroller. This study leaves for future work the source separation of each string of the guitar. Once a played chord is known, some assumptions can be made about where the hand playing the chord is; thus, blind source separation methods can be applied to obtain the audio of each guitar string. Besides, with the information from separated strings, the classifier can be extended to a wider set of chord families, because the classification can be performed on a single string instead of the mixture of six strings. This can lead to the complete transcription of guitar chords and the identification of the strings being played.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research has been supported by the "National Council on Science and Technology" of Mexico (CONACYT) under Grant no. 429450/265881 and by the Universidad de Guanajuato through DAIP. The authors would like to thank A. M. Barbancho et al. for providing the database used for comparison.
References
[1] O. Karolyi, Introducing Music, Penguin Books, 1965.
[2] Hal Leonard, The Real Book, Hal Leonard, Milwaukee, Wis, USA, 2004.
[3] J. Brent and S. Barkley, Modalogy: Scales, Modes and Chords, Hal Leonard Corporation, 2011.
[4] D. Latarski, An Introduction to Chord Theory, DoLa Publisher, 1982.
[5] J. Weil, T. Sikora, J. Durrieu, and G. Richard, "Automatic generation of lead sheets from polyphonic music signals," in Proceedings of the 10th International Society for Music Information Retrieval Conference, pp. 603–608, 2009.
[6] A. Shenoy and Y. Wang, "Key, chord, and rhythm tracking of popular music recordings," Computer Music Journal, vol. 29, no. 3, pp. 75–86, 2005.
[7] K. Lee and M. Slaney, "Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio," IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 2, pp. 291–301, 2008.
[8] D. P. W. Ellis and G. E. Poliner, "Identifying 'cover songs' with chroma features and dynamic programming beat tracking," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), pp. IV-1429–IV-1432, Honolulu, Hawaii, USA, April 2007.
[9] M. Mauch, H. Fujihara, and M. Goto, "Integrating additional chord information into HMM-based lyrics-to-audio alignment," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 200–210, 2012.
[10] C. Harte and M. Sandler, "Automatic chord identification using quantized chromagram," in Proceedings of the Audio Engineering Society Convention, pp. 28–31, 2005.
[11] K. Lee, "Automatic chord recognition from audio using enhanced pitch class profile," in Proceedings of the International Computer Music Conference (ICMC '06), New Orleans, La, USA, 2006.
[12] T. Fujishima, "Real time chord recognition of musical sound: a system using Common Lisp Music," in Proceedings of the International Computer Music Conference (ICMC '99), pp. 464–467, 1999.
[13] M. Leman, Music and Schema Theory, Springer, 1995.
[14] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
[15] M. R. Schroeder, "Period histogram and product spectrum: new methods for fundamental-frequency measurement," The Journal of the Acoustical Society of America, vol. 43, no. 4, pp. 829–834, 1968.
[16] A. Sheh and D. Ellis, "Chord segmentation and recognition using EM-trained hidden Markov models," in Proceedings of the 4th International Society for Music Information Retrieval Conference, pp. 183–189, 2003.
[17] T. Cho and J. P. Bello, "Real-time implementation of HMM-based chord estimation in musical audio," in Proceedings of the International Computer Music Conference (ICMC '09), pp. 117–120, August 2009.
[18] K. Martin, "A blackboard system for automatic transcription of simple polyphonic music," Tech. Rep. 385, Massachusetts Institute of Technology Media Laboratory Perceptual Computing Section, 1996.
[19] L. R. Rabiner and B.-H. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4–16, 1986.
[20] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[21] A. M. Barbancho, A. Klapuri, L. J. Tardon, and I. Barbancho, "Automatic transcription of guitar chords and fingering from audio," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 915–921, 2012.
[22] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR '06), pp. 1–6, Victoria, Canada, 2006.
[23] M. P. Ryynänen and A. P. Klapuri, "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music Journal, vol. 32, no. 3, pp. 72–76, 2008.
[24] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, 1973.
[25] D. Deutsch, The Psychology of Music, Academic Press, New York, NY, USA, 1999.
[26] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proceedings of the IEEE, vol. 66, no. 1, pp. 51–83, 1978.
[27] M. J. Bastiaans, "A sampling theorem for the complex spectrogram, and Gabor's expansion of a signal in Gaussian elementary signals," Optical Engineering, vol. 20, no. 4, Article ID 204597, 1981.
[28] Y. C. Eldar and A. V. Oppenheim, "MMSE whitening and subspace whitening," IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1846–1851, 2003.
[29] C.-Y. Chi and D. Wang, "An improved inverse filtering method for parametric spectral estimation," IEEE Transactions on Signal Processing, vol. 40, no. 7, pp. 1807–1811, 1992.
[30] F. M. Hsu and A. A. Giordano, "Digital whitening techniques for improving spread spectrum communications performance in the presence of narrowband jamming and interference," IEEE Transactions on Communications, vol. 26, no. 2, pp. 209–216, 1978.
[31] T. Tolonen and M. Karjalainen, "A computationally efficient multipitch analysis model," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 6, pp. 708–716, 2000.
[32] P.-E. Danielsson, "Euclidean distance mapping," Computer Graphics and Image Processing, vol. 14, no. 3, pp. 227–248, 1980.
[33] A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 804–816, 2003.
[34] Y. Wei, Variance, entropy, and uncertainty measure [Ph.D. thesis], Department of Statistics, People's University of China, 1987.
[35] J. J. Hopfield, "Artificial neural networks," IEEE Circuits and Devices Magazine, vol. 4, no. 5, pp. 3–10, 1988.
[36] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4–37, 2000.
[37] T. Gagnon, S. Larouche, and R. Lefebvre, "A neural network approach for pre-classification in musical chord recognition," in Conference Record of the 37th Asilomar Conference on Signals, Systems, and Computers, pp. 2106–2109, Pacific Grove, Calif, USA, November 2003.
[38] E. J. Humphrey and J. P. Bello, "Rethinking automatic chord recognition with convolutional neural networks," in Proceedings of the 11th IEEE International Conference on Machine Learning and Applications (ICMLA '12), pp. 357–362, December 2012.
[39] A. T. C. Goh, "Back-propagation neural networks for modeling complex systems," Artificial Intelligence in Engineering, vol. 9, no. 3, pp. 143–151, 1995.