Source: web.science.mq.edu.au/~achilver/project/THESIS.pdf

Chorale Harmonisation in the Style of J.S. Bach

A Machine Learning Approach

Alex Chilvers

2006

Page 2: Chorale Harmonisation in the Style of J.S. Bach A Machine ...web.science.mq.edu.au/~achilver/project/THESIS.pdf · Chorale Harmonisation in the Style of J.S. Bach A Machine Learning

Contents

1 Introduction
2 Project Background
3 Previous Work
  3.1 Music Representation
  3.2 Harmonisation
    3.2.1 Manual Harmonisation
    3.2.2 Automatic Harmonisation
  3.3 Machine Learning
    3.3.1 Features and Feature Space
    3.3.2 Concept Learning
    3.3.3 Bayesian Learning
    3.3.4 Decision Tree Learning
    3.3.5 Instance-based Learning
4 Representations
  4.1 Classifications
  4.2 Features
    4.2.1 Local Features
    4.2.2 Global Features
5 System Architecture
  5.1 High Level
    5.1.1 Training
    5.1.2 Testing
  5.2 Low Level
  5.3 Implementation
6 Evaluation
  6.1 Empirical
  6.2 Qualitative
7 Results
  7.1 Experiment Settings
    7.1.1 ML Classifier
    7.1.2 Data Collection
  7.2 Baseline
  7.3 ML Settings
  7.4 Feature Encodings
    7.4.1 Context Size
    7.4.2 Relative Context Pitch
    7.4.3 Contours
    7.4.4 Previous Length
    7.4.5 Previous Classifications
    7.4.6 Future Context
    7.4.7 Future Context / Previous Classifications
    7.4.8 Location and Metre
    7.4.9 Pitch Features
    7.4.10 Tonality
  7.5 Classification Approaches
  7.6 Summary
8 Discussion
  8.1 Conclusion
  8.2 Future Work
A Chorale Datasets


Chapter 1

Introduction

Music is a part of every human culture. It predates the written word, and may well predate any spoken language. The history of music is a complex one, and can be studied from a number of different perspectives — theological, sociological, political, and so on. Understanding the music that was produced by a certain group of people, during a certain period of time, can help one to further understand their culture and way of life. Music is used to enhance celebrations, and is often an important part of religious tradition.

In western music, melody is often the most important aspect of a piece of music. The melody is the main sequence of notes that can be heard throughout the piece and is the most immediately distinguishing feature of a composition. It may be performed by a voice, an instrument, or a combination of both. Often, the melody will be louder than the other parts. However, melody alone does not necessarily contain much detail by which we can determine a genre, era or composer. It is features found in the way that a piece of music is accompanied, or harmonised, that often say the most about music stylistically — particularly classical western music. It is worth noting that we are focusing on the way music is composed, not performed. If we were focusing on performances, we would need to consider how the performing artist's interpretation can also have an impact on a piece's style (even if the music is monophonic — that is, without harmony).

Different genres of music, often identified by a period in history and/or the part of the world in which they were predominantly created, can be recognised by a trained listener. There may not necessarily be anything unique about the melody, but the combination and interaction of the melody with the harmonising notes can often make a piece of music immediately classifiable in terms of genre, or even composer. For example, there are certain chords (that is, combinations of notes played simultaneously) that are considered typical of jazz music. These stylistic features in music may not be hard rules followed by composers. Often, the best way to detect them is to have spent a great deal of time listening to examples, training your ear, and developing a “feel” for a style. The chorales of J.S. Bach provide a large number of pieces that can be studied in order to acquire this feel. Though they are relatively short compositions, there are many of them, and they are all harmonised by Bach in his quite recognisable Baroque style.

Being able to create new music that seems indistinguishable from music of a particular style, or composer, is a step further than simply recognising it. One can imagine composing some music in a classical style and having a trained listener believe that it was written by someone who lived a few centuries ago. Attempting to assign, to a computer, the task of learning and mimicking a musical style is the focus of our research.


Chapter 2

Project Background

Computer music generally refers to music that has been generated or composed with the aid of computers — beyond their use simply as a recording tool. People in the academic field of computer music seek ways of applying modern computing techniques and technologies to applications that aim at automating some aspect of music composition or analysis.

Automatic harmonisation is a topic that has, in the past, been explored by numerous computer music researchers. Harmonisation refers to the implementation of harmony, usually by using chords. Harmony is sometimes referred to as the “vertical” aspect of music, with melody being the “horizontal” (with respect to time). It essentially involves multiple notes being played simultaneously, built around, and in support of, the note being played in the main melody line. Generating harmony automatically would require a computer system to make a decision as to which chords (or single notes) to use at a given time.

Bach chorales have usually been the focus of automatic harmonisation research. The reason for this is their abundance, as explained below, along with the fact that they all began as monophonic compositions — with harmonisation later added by a second party (namely Bach).

A chorale was originally a hymn of the Lutheran church, sung by the entire congregation. Although chorales were traditionally sung in unison (that is, everybody sang the same melody simultaneously), several composers harmonised the tunes. Johann Sebastian Bach (a Baroque composer, 1685–1750) is the most notable, having harmonised hundreds of chorales to be performed by four-part choirs. Even though Bach did not actually write any chorale melodies, his name is virtually synonymous with the term chorale.

The fact that Bach applied the same basic process (that of building three parts to be performed in harmony with the main melody) repeatedly to so many chorales suggests that it may be possible to simulate the process computationally. There are arguments both for and against this idea. Although exact simulation may not necessarily be possible, subtle (and learnable) patterns in Bach's work may be apparent, thus making simulation possible. On the other hand, any music composition is a difficult task and many would argue that Bach was a genius. Thus, it could be argued that his work cannot be generalised so that a computer can learn to simulate it — that there are irreproducible indicators of his mastery. Additionally, there are many different harmonisations that can accompany a melody. While it may be possible to simulate “good” harmonisation, hoping to choose the harmonisation chosen by Bach may be overambitious.

Despite this, people have in the past built systems that attempt to generate chorale harmonisations in Bach's general style. We later summarise a number of the alternative approaches taken by others in the field.

The purpose of this project is to build a system that, having been trained on a set of four-part Bach chorales, can generate chords (combinations of notes in the Bass, Tenor and Alto parts) to harmonise a given Soprano melody. The system will analyse hundreds of entire four-part chorales and develop a model for harmonisation. When given the Soprano part as input, the system will use this model to generate the remaining parts.

The ultimate aim is to finish with a system that produces harmony, for previously unseen melodies, identical to the harmony that we know Bach to have created for those same melodies. We would like to accomplish this ambitious goal, but there are a number of different ways to measure the success of the system. At the very least, the system should produce harmony that is deemed to be both musically acceptable (that is, it breaks no fundamental rules and produces no inappropriate or dissonant note combinations) and a close resemblance of Bach's own work. Of course, the system should be able to do this for any given melody.

On top of this, it would be desirable to finish with a framework that, given another data set on which to train (for example, 300 jazz pieces), and perhaps after some alterations, can perform an adequate harmonisation of melodies in the new genre.


Chapter 3

Previous Work

Due to the interdisciplinary nature of this project, some research into, and understanding of, a number of different academic fields is required before attempting any implementation. It is also best to have an understanding of some key concepts before trying to understand the work we are presenting. Firstly, it is best to have a reasonable familiarity with music and music theory, including notation. Next, it is important to consider the different ways of approaching harmonisation, including previous attempts at automatic chorale harmonisation. Since we are taking a Machine Learning approach to this task, an overview of Machine Learning is also beneficial.

3.1 Music Representation

Music can be defined as combinations and sequences of notes. Musical notes are named after the first seven letters of the alphabet (A, B, . . ., G). These notes correspond to the white keys on a piano's keyboard. The black keys are called ‘flats’ (♭) or ‘sharps’ (♯) — for example, the black key between A and B can be referred to as either A♯ or B♭. One set of all twelve notes (7 white keys and 5 black keys) is called an octave, and this covers all of the distinct notes used in western music. The same note can, however, be voiced in a number of different octaves (e.g. low C, middle C, high C, etc.).

Music is conventionally transcribed as a ‘score’ on manuscript paper. A symbolic language is used, and a musician is expected to read the notes that are to be played.

When music is transcribed as a score, each note is represented by a symbol (different symbols indicate different durations) written on, or between, lines (this set of lines is called a ‘stave’). The vertical position in relation to these lines determines the note's pitch (that is, its name, e.g. C, and its octave). There are also two different types of stave — treble and bass. Their function is essentially the same. Figure 3.1a provides an example of some notes on a treble stave.

The data used by researchers to approach computational music tasks (including that of automatic harmonisation) has in the past usually come in MIDI format. This is a digital representation of music as it is performed. Each note event is represented as a combination of start time, pitch, duration and dynamic. MIDI players are readily available, making listening to such files their primary use. The nature of MIDI can, however, make it difficult to handle. Most significantly, the timing of notes is, in a way, too precise, due in large part to slight human errors at performance time. This can make it difficult to map the music to a score transcription.

[Figure 3.1: Score (a.) vs **kern (b.) transcription. The score image (a.) is not reproduced here; the **kern spine (b.) reads:]

**kern*clefG2*k[b-]*M2/2=-2d/2a/=2f/2d/=2c]/4d/4e/=2f/*-

However, there are other music representation languages that are a more direct transcription of an actual score. One way of digitally representing a score is to use the Humdrum **kern format. A full description of **kern can be found at http://dactyl.som.ohio-state.edu/Humdrum/representations/kern.html.

Figure 3.1 shows the conversion of a short line of music from its score transcription to **kern. In the case of polyphony (multiple lines played simultaneously), simultaneous events appear on the same horizontal line, as each part is represented by a separate column. Integers are used to represent a note's duration, alongside the note name. Different octaves are expressed by the case and number of letters used. For example, c, cc, C and CC all represent the note C, but each is in a different octave.
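To make the octave convention concrete, the letter-case rule can be sketched in a few lines of Python. This is our own illustrative decoder for a simplified subset of **kern note tokens (a duration integer followed by a run of note letters), not part of any system described here:

```python
# Minimal sketch: decode pitch and octave from a simplified **kern note
# token such as "4cc" or "2C". Assumes tokens contain only a duration
# integer and a note-letter run (no accidentals, ties or beams).
import re

def parse_kern_note(token):
    """Return (duration, letter, octave) for a simplified **kern token.

    Lowercase letters sit in the octave starting at middle C: 'c' is
    middle C (octave 4), 'cc' one octave higher; 'C' is one octave
    below middle C, 'CC' two below.
    """
    m = re.match(r"(\d+)([a-gA-G]+)", token)
    if not m:
        raise ValueError("unrecognised token: " + token)
    duration = int(m.group(1))      # 4 = quarter note, 2 = half note, ...
    letters = m.group(2)
    if letters[0].islower():
        octave = 3 + len(letters)   # c -> 4 (middle C), cc -> 5
    else:
        octave = 4 - len(letters)   # C -> 3, CC -> 2
    return duration, letters[0].upper(), octave

print(parse_kern_note("4cc"))  # (4, 'C', 5)
print(parse_kern_note("2C"))   # (2, 'C', 3)
```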

3.2 Harmonisation

Here, we consider the ways in which manual harmonisation can be approached. We also look at some previous attempts at automatic chorale harmonisation.


3.2.1 Manual Harmonisation

The act of harmonising a melody is an important part of composition that often dictates the final style of a piece of music. It is not simply a matter of filling out musical space (that is, creating a ‘fuller’ sound). Scruton [1997] distinguishes two different ways of considering harmony: chords, in which separate tones are combined to form new musical entities; and polyphony, in which the component parts are melodies themselves. The second approach tends to be more detailed and ‘note specific’, since there are a number of different ways of producing the same chord, while the chord alphabet (major, minor, diminished, seventh, ninth etc.) is finite. Thus, there are two separate ways of going about manually harmonising a melody — choosing chords to accompany the melody notes and/or writing separate counter-melodies (which consequently produce a musical movement through chords).

Of course, if there are two ways of manually approaching the harmonisation task, there may be two parallel ways of automating the process. That is, either harmonise each melody note with a complete chord, or sequentially build three separate and complete lines (in the case of chorale harmonisation) over the entire melody.

It is also important to note that harmonisation cannot be satisfactorily accomplished by choosing random chords or notes. There are certain restrictions that pertain to what will sound ‘good’ and what will sound ‘bad’. Usually, a person can hear for themselves when a note combination does not work. This is explained by the concept of beating — the result of interference patterns created between sound waves [Scruton, 1997] (namely, the sound waves produced by two distinct notes). In music theory, the notes that can be performed within a piece tend to be restricted to those that appear in the scale around which the piece is centred, as indicated by the key signature (although there are usually exceptions to this restriction found throughout a piece, called ‘accidentals’, marked by ♯, ♭ or ♮).

3.2.2 Automatic Harmonisation

As mentioned previously, there have been other researchers in the areas of computer music and artificial intelligence who have tackled this same task of automatically harmonising chorale melodies in the style of Johann Sebastian Bach. These attempts have been made using a variety of computational techniques. Outlined below are some of the approaches that are relevant to this work.

Markov Models

Kaan M. Biyikoglu's [2003] submission to ESCOM5 (the European Society for the Cognitive Sciences of Music's 5th Conference) explores the implementation of a Markov model for the harmonisation of chorales. A Markov model uses the assumption that the probability of an event is conditional on a finite number of preceding events.

The Maximum Likelihood Estimate (MLE) is used to estimate transition probabilities from the corpus. If w_1 . . . w_t is a sequence of random variables taking values from a finite alphabet {c_1, . . ., c_n}, then a 2nd-order Markov model defines the MLE as follows:

P(w_t = c_k | w_{t-2} = c_i, w_{t-1} = c_j) = C(c_i c_j c_k) / C(c_i c_j)

where C(c_i c_j c_k) and C(c_i c_j) are, respectively, the counts of the occurrences of the sequences (c_i c_j c_k) and (c_i c_j) in the corpus.

In the specific case of Biyikoglu's system, the alphabet consists of chord symbols (such as major, minor, diminished, sevenths etc.) built on all twelve pitches (C, C♯, D etc.). The entire corpus is transposed to the same key prior to training (thus resulting in fewer zero-counts — that is, sequences of chords that never occur in the corpus, though they may still be valid), and the transition probabilities are determined using the MLE.
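The MLE above amounts to dividing trigram counts by the counts of their two-chord contexts. A minimal sketch, using invented Roman-numeral chord symbols rather than Biyikoglu's actual alphabet:

```python
# Sketch of the 2nd-order MLE: P(c_k | c_i, c_j) = C(c_i c_j c_k) / C(c_i c_j).
# The chord symbols below are invented for illustration.
from collections import Counter

def mle_trigram_model(sequences):
    tri, bi = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq) - 2):
            tri[tuple(seq[i:i + 3])] += 1   # count of (c_i, c_j, c_k)
            bi[tuple(seq[i:i + 2])] += 1    # count of contexts (c_i, c_j)

    def prob(ci, cj, ck):
        return tri[(ci, cj, ck)] / bi[(ci, cj)] if bi[(ci, cj)] else 0.0

    return prob

corpus = [["I", "IV", "V", "I"], ["I", "IV", "V", "vi"]]
p = mle_trigram_model(corpus)
print(p("I", "IV", "V"))   # 1.0: V always follows the context (I, IV)
print(p("IV", "V", "I"))   # 0.5: (IV, V) continues to I in one of two cases
```

Unseen trigrams receive probability zero here, which is exactly the zero-count problem that transposing the corpus to a common key is meant to reduce.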

In the testing phase, candidate chords are chosen based on the requirement that the current melody note occurs within the chord. For example, if the melody note is an E, then the chord consisting of the notes C, E and G is a candidate chord, while the chord consisting of the notes C, E♭ and G is not. Finally, the chord progressions are determined using the Viterbi algorithm (see http://viterbi.usc.edu/about/viterbi/viterbi_algorithm.htm). An additional stage uses voice-leading rules to assign each note in the chord to one particular part (i.e. Alto, Tenor, Bass), thus producing the required 4-part texture.
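The shape of this decoding step can be sketched with a first-order Viterbi search (Biyikoglu's model is second-order; first-order keeps the sketch short). All chord sets and probabilities below are invented for illustration:

```python
# First-order Viterbi sketch: pick the most probable chord sequence,
# restricting each step's candidates to chords containing the melody note.
import math

def viterbi(melody, chords, start_p, trans_p):
    """chords: {name: set of notes}; start_p: {name: prob};
    trans_p: {(prev, cur): prob}. Returns a chord-name sequence."""
    def cands(note):
        return [c for c, notes in chords.items() if note in notes]

    # score[c]: best log-probability of any chord path ending in c
    score = {c: math.log(start_p.get(c, 1e-9)) for c in cands(melody[0])}
    backpointers = []
    for note in melody[1:]:
        new_score, ptr = {}, {}
        for c in cands(note):
            prev, s = max(
                ((p, score[p] + math.log(trans_p.get((p, c), 1e-9)))
                 for p in score),
                key=lambda pair: pair[1])
            new_score[c], ptr[c] = s, prev
        score = new_score
        backpointers.append(ptr)

    # trace the best path backwards from the best final chord
    path = [max(score, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))

chords = {"C": {"C", "E", "G"}, "Am": {"A", "C", "E"}, "F": {"F", "A", "C"}}
start = {"C": 0.6, "Am": 0.3, "F": 0.1}
trans = {("C", "Am"): 0.4, ("C", "F"): 0.4, ("Am", "Am"): 0.2,
         ("Am", "F"): 0.3, ("Am", "C"): 0.3, ("F", "C"): 0.6,
         ("F", "Am"): 0.2, ("F", "F"): 0.1}
print(viterbi(["E", "A", "C"], chords, start, trans))  # ['C', 'F', 'C']
```

Unseen transitions are given a tiny floor probability (1e-9) rather than zero, purely to keep the log well-defined in the sketch.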

Unfortunately, this system has not offered any results that can be used to measure the success of the approach as a means of effectively recreating Bach's work.

Probabilistic Inference

Another approach to automating chorale harmonisation, similar to that of Biyikoglu, is that taken by Moray Allan and Christopher K. I. Williams [2005]. Their NIPS (Neural Information Processing Systems) Conference paper describes a system which uses Hidden Markov Models (HMMs) as a means for composing new harmonisations. Instead of using only the observed states, as in the Markov case, HMMs work around the assumption that the observations occur due to some hidden states.

In the case of Allan and Williams' system, the observed states are the melody notes, and the hidden states are the chords. So, rather than only using the melody notes to restrict the possible chords in a sequence (as in the Markov model approach), the melody note is incorporated into a first-order model. In fact, the system described makes two first-order assumptions: firstly, that the probability of a chord occurring depends only on the immediately preceding chord; secondly, that the probability of a particular observation being made (that is, a particular note) depends only on the current state (i.e. chord).
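Written out, these two assumptions factorise the joint probability of a chord sequence c_1 . . . c_T and melody n_1 . . . n_T (our notation, not the paper's) as:

P(c_1 . . . c_T, n_1 . . . n_T) = P(c_1) P(n_1 | c_1) ∏_{t=2}^{T} P(c_t | c_{t-1}) P(n_t | c_t)

The first factor inside the product is the chord-to-chord transition; the second ties each melody note to the chord sounding at that moment.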

Again, the Viterbi algorithm is used to determine harmonic sequences (or chord progressions). However, an additional HMM is introduced in order to add complexity and realism to the harmonisation by way of ornamentation. Ornamentation involves the insertion of short notes as a means of making the music more interesting. This additional stage smoothes out the transitions between the notes in a line of music.

This system does not produce particularly good results. That is, the generated harmonisations rarely match Bach's own composition (although the authors did describe some attempts as “reasonable harmonisations”). A possible reason for this, cited by the authors, is the sparseness of the data: in the HMM used, there are 5,046 hidden chord states and 58 visible melody states. Additionally, ignoring context could be another shortcoming of the system. Perhaps using a bigram model, or even a trigram model, would improve on the unigram model used, albeit at the expense of increased data sparseness.

Others

Although less closely related to the approach we are taking, a number of other approaches to automatic harmonisation and composition have been taken by computer music researchers.

Early work on constraint-based systems shows how rules may be used to assign a score to a particular choice of harmonisation [Pachet and Roy, 2001]. Search algorithms may then be used to produce the best possible harmonisation. Such a system does depend on the assumption that music is governed entirely by rules, and that such rules can be used to quantify the validity of a chosen harmonisation.

Probabilistic finite state grammars have also been used to harmonise new chorale melodies. Conklin and Witten [1995] produced a system that combined different models of properties of a musical sequence. These properties, such as pitch, duration and position in bar, are not dissimilar to the features we use to represent music within our system. In their system, a large number of models are thus used in parallel.

Chorale harmonisation has also been attempted using neural networks. HARMONET [Hild et al., 1992] is a system that uses such an approach. Here, the network is trained by being shown, at time t, the Soprano voice at t−1, t and t+1, the harmonies from t−3 to t, and the location of t in the musical phrase. The neural nets are used first to determine the Bass note to harmonise the current Soprano note. Rules are then used to determine how the remainder of the chord should be constructed.

Discussion

With particular regard to the work using probabilistic inference, we have found some of the issues faced to be quite indicative of potential issues that we too will face. The main problem will be finding a way of encoding the music so that the data does not become too sparse, as well as finding an optimal amount of context to use, again without resulting in data sparseness.

Where other researchers have treated the melody as though it were a result of the chord progressions (or, as they call it, the harmonic motion), our approach will treat the chords as arising from the melody (since we are using features of the melody to classify it with a chord at any given time). This is a more logical approach, since Bach himself completed the composition of each chorale by building harmony to match the already complete melody.

3.3 Machine Learning

For many years now, Machine Learning (ML) has been applied to a large number of computational tasks. The idea that a computer system may automatically improve with experience has led to a number of successful applications. Tom Mitchell's whitepaper on ML [Mitchell, 2006] provides examples such as Speech Recognition, Computer Vision, Bio-surveillance, and Robot Control. There are a great number of existing ML algorithms that can be used to map observations of an event to a resulting classification of that event. We will consider some of these algorithms, particularly those most relevant to our project.

3.3.1 Features and Feature Space

Firstly, most ML classification algorithms and approaches hinge on the idea that we are trying to use a feature vector, representing some scenario or a description of an entity (known as an ‘instance’), to reach some conclusion about, or classification of, the instance being represented.

A feature is a specific property of an instance that can be observed and assigned a value (often within a specific domain). A feature vector is then an n-dimensional vector containing the values of these features.

As an example, if we are trying to use ML to predict whether it is going to rain, we may choose two features: Maximum Temperature and Humidity. Then, for every day we have on record, we can build a 2-dimensional feature vector with that day's maximum temperature and humidity. We will then be left with a collection of vectors and the classification each produced (Rain or No-Rain); e.g. 〈21°, 35%〉 → No-Rain, 〈17°, 70%〉 → Rain, 〈19°, 75%〉 → Rain, etc. With this, one of the algorithms below can be used to build a model that, given a new 2-dimensional feature vector, can decide whether that vector corresponds to the Rain or No-Rain classification.

A feature space is an abstract space in which a feature vector can be represented by a point in n-dimensional space (n being determined by the number of features used to represent an instance). In the above example, it is not difficult to imagine how each of these vectors would be represented as a point on a 2-dimensional graph — the points (21, 35), (17, 70) and (19, 75) respectively. ML aims to determine (n−1)-dimensional classification boundaries in an n-dimensional space. This way, a new point in space (representing a previously unseen instance) can be classified based on where it appears in relation to these boundaries. In the 2-dimensional case, a line may be drawn to separate all points that receive the Rain classification from those classified with No-Rain.
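The 2-dimensional case can be sketched directly. The boundary below is picked by hand purely for illustration; in practice, placing it is exactly the learning algorithm's job:

```python
# The weather example as points in a 2-D feature space (temperature, humidity),
# with a hand-picked linear boundary standing in for a learned one.
training = [((21, 35), "No-Rain"), ((17, 70), "Rain"), ((19, 75), "Rain")]

def classify(temperature, humidity):
    # Illustrative boundary: the horizontal line humidity = 50.
    return "Rain" if humidity > 50 else "No-Rain"

# This hand-picked boundary happens to separate all three training points.
for (temp, hum), label in training:
    assert classify(temp, hum) == label
```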

3.3.2 Concept Learning

Tom Mitchell [1997] defines Concept Learning as “inferring a boolean-valued function from training examples of its input and output”. A requirement of this approach is that each of the attributes has a small set of possible values. If the scenario explained above were simplified so that the only possible values for Maximum Temperature were ‘Hot’, ‘Warm’, ‘Cold’ or ‘Cool’ and the only possible values for Humidity were ‘High’ or ‘Low’, then it would be an example of such a boolean-valued function, where each instance is labelled as a member or non-member of the concept “It will rain”.

A Concept Learning system can determine the possible subsets of values for each feature that are required in order for an instance to belong to the target concept.


3.3.3 Bayesian Learning

Bayesian learning algorithms are based on Bayesian reasoning — a probabilistic approach to inference. The assumption is that optimal decisions can be made by reasoning about the probabilities observed in a corpus, and the observations made about a new instance.

Bayesian algorithms, such as naive Bayes, that calculate explicit probabilities for the hypotheses are among the most practical approaches for certain tasks, such as that of automatically classifying text documents.

Bayes' theorem, the principle behind Bayesian methods, provides a way of calculating the posterior probability P(h|D): the probability of a hypothesis holding, given the training examples. The theorem is formalised as follows:

P(h|D) = P(D|h) P(h) / P(D)

where D is the training data, h is the hypothesis in question, P(D|h) is the probability of observing the training data given that the hypothesis holds, and P(h) is the prior probability that h holds.

Since P(D) is a constant independent of h, it can be ignored when comparing hypotheses: the hypothesis that maximises P(h|D) is the one that maximises P(D|h)P(h).

As a simple example, if we are trying to determine whether it will rain on a day that is Hot with High humidity, we need to determine (from the training examples) the probability of it being Hot when it has rained, the probability of the humidity being High when it has rained, and the overall probability of it raining on any day.
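That calculation can be sketched with a naive Bayes classifier over an invented four-day record (the counts and feature values are ours, purely for illustration):

```python
# Naive Bayes sketch: score each hypothesis h by P(h) * prod_i P(f_i | h),
# estimating every probability from raw counts in the training data.
from collections import Counter

days = [
    (("Hot", "High"), "Rain"),
    (("Hot", "Low"), "No-Rain"),
    (("Cool", "High"), "Rain"),
    (("Warm", "Low"), "No-Rain"),
]

def naive_bayes(features, data):
    label_counts = Counter(label for _, label in data)
    scores = {}
    for h, n_h in label_counts.items():
        score = n_h / len(data)                       # prior P(h)
        for i, value in enumerate(features):
            n_match = sum(1 for f, label in data
                          if label == h and f[i] == value)
            score *= n_match / n_h                    # likelihood P(f_i | h)
        scores[h] = score
    return max(scores, key=scores.get)

print(naive_bayes(("Hot", "High"), days))  # Rain
```

Note the raw-count estimate gives zero probability to any feature value never seen with a hypothesis; practical implementations smooth these counts.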

In an ML application of Bayesian learning, the goal is usually to find the hypothesis h (in the example of text classification, the options may be the various genres of news articles — sport, world etc.) that maximises this posterior probability.

3.3.4 Decision Tree Learning

Decision Tree Learning is a method for approximating discrete-valued target functions in which the learned function is represented by a decision tree. Previous applications of this method have included medical diagnosis and risk assessment of loan applications. Classifications are made by moving down a tree, branching in different directions based on the value of a specific feature. According to Mitchell [1997], the problems most appropriate for this approach meet the following criteria:

• Instances are represented by attribute-value pairs (i.e. feature values).

• The target function has discrete output values.

• Disjunctive descriptions may be required.

• The training data may contain errors.

• The training data may contain missing attribute values.

Essentially, a Decision Tree is a (usually quite large) network of if-else state-ments. Branching occurs based on certain conditions, defined by feature values.When a Decision Tree model is constructed, a hierarchy of the features is learned.


In some cases, it may be possible to classify a new feature vector based on the value of only a single feature. If not, then the classifier will search down the tree until a final decision (classification) can be reached.

Decision trees are constructed top-down. The attribute tested at the top node of the tree is the one deemed to best classify the training examples when used on its own. A statistical test (such as information gain) is used to evaluate the relative importance of each instance attribute. From here, a child node is constructed for each possible branching from that node (that is, each of the different possible values). This process is then repeated at each child, and continues until each path reaches a classification.

As a simple example (in the context of our project), if it happens that every note that occurs at the beginning of a piece in our training data is harmonised using the tonic chord, then knowing that the note we are trying to classify is at the beginning of a piece will be enough to determine the classification. However, if it happens that the note is not at the piece's beginning, more features will need to be checked (and our classifier will move down the tree).
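The traversal just described can be illustrated with a tiny hand-built tree. The feature names and chord labels here are hypothetical; a real learner such as C4.5 would induce the tests and their ordering from training data rather than have them written by hand.

```python
def classify_note(features):
    """A hand-built sketch of decision-tree traversal for chord
    classification. Feature names and chord labels are illustrative only."""
    if features["at_piece_start"]:
        return "tonic"  # a single feature suffices for this path
    # Otherwise move down the tree and test further features.
    if features["pitch_from_tonic"] in (7, 2):  # hypothetical dominant-leaning notes
        return "dominant"
    return "subdominant"

print(classify_note({"at_piece_start": True}))                          # prints tonic
print(classify_note({"at_piece_start": False, "pitch_from_tonic": 7}))  # prints dominant
```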

Ross Quinlan's C4.5 [1993] describes one implementation of a Decision Tree learning algorithm.

3.3.5 Instance-based Learning

In contrast to attempting to construct a general target function based on training examples, Instance-based (or Memory-based) learning simply stores the training examples. Any generalising occurs only when an unseen example needs to be classified. This is when a more complex comparison to the training data, beyond simply finding an exact match, is used. This postponement of complex calculations is the reason such methods are often referred to as "lazy" learning methods. The main issue in Instance-based learning is the way in which prior instances are used to classify new, unseen, instances.

The most basic method is the k-Nearest Neighbour algorithm. The standard Euclidean distance is used to determine the nearest neighbours of an instance. The Euclidean distance formula is a simple way of measuring distance in an n-dimensional space. This formula, for finding the distance between the two points P = (p1, p2, . . . , pn) and Q = (q1, q2, . . . , qn), is formalised as:

Distance = √((p1 − q1)² + (p2 − q2)² + . . . + (pn − qn)²)

From here, the most common classification amongst these neighbours is chosen as the classification for the new instance. A similar approach is the Distance-Weighted Nearest Neighbour algorithm. The main difference here is that neighbours that are closer to the new instance are given more importance than those further away.
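Both variants can be sketched in a few lines. The two-dimensional training data below is invented, and the inverse-square distance weighting is one common choice rather than one prescribed by the text.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(instance, training, k=3, weighted=False):
    """k-Nearest Neighbour over numeric feature vectors.
    `training` is a list of (vector, classification) pairs."""
    neighbours = sorted(training, key=lambda ex: euclidean(instance, ex[0]))[:k]
    votes = Counter()
    for vector, label in neighbours:
        d = euclidean(instance, vector)
        # Distance-weighted variant: closer neighbours count for more.
        votes[label] += 1.0 / (d * d) if weighted and d > 0 else 1.0
    return votes.most_common(1)[0][0]

# Invented data: two clusters labelled A and B.
training = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify((1, 1), training, k=3))  # prints A
```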

Locally weighted regression is a generalisation of these approaches. The same idea applies, except some other function, be it linear, quadratic, or other, is used to determine the classification.

Conversely to the Decision Tree approach, when using Instance-based learning, the entire vector is immediately taken into consideration. There is no chance of a classification being made after the observation of only a single feature value.

There are multiple algorithms that use this approach, and TiMBL (the Tilburg Memory-Based Learner) is an implementation of a number of such algorithms. This is the ML software that has been incorporated into our final system.


Chapter 4

Representations

An important part of any computational music task is to clearly define the ways in which music can be encoded. Music exists purely as sound. It does, however, have a well-defined representation language that allows visual analysis. Similarly, we need to determine how we will represent our music in a way that will allow machine analysis. Possible representations are now discussed.

4.1 Classifications

Before considering the possible ways of encoding the input for the classification phase of the system (that is, the feature vectors), it is equally important to consider the different ways of encoding the output (that is, the classifications themselves).

There are a number of factors that need to be considered. One consideration is the overall architecture of the system. The main question is whether we intend to classify entire chords in one go, or rather attempt to handle each part (those parts being the Alto, Tenor and Bass lines) separately, using a separately trained classifier for each. Even once this decision is made, there are different ways to approach each option.

These various approaches are further explained below.

Full Chord

Architecturally speaking, the simplest approach to classifying harmony is to use entire chords. With such a system, only one classifier needs to be trained, and that same classifier needs to be consulted once only for each event being classified. However, encoding these chords is not a trivial matter.

One option is to simply use note name combinations (for example, CEG or AC♯E). However, even if octave is ignored (i.e. a low G is treated the same as a G two octaves above), this would clearly result in very sparse data (that is, more sparse than is necessary to properly distinguish between different harmonisations). The problems caused by encoding features in this way are also discussed later. We have thus focused on alternate methods that avoid such problems in our implementation.


A sensible alternative to using note names, and the one that would most suit an approach that normalises each piece so that notes are represented as semitonal distances from the tonic, is to normalise the classifications to be relative to the tonic. Additionally, since encodings in this task are essentially being treated symbolically (rather than mathematically), the chosen symbols for these encodings are relatively unimportant as long as they remain consistent. Concerns about this approach are discussed later.

A simple way of converting these classes to symbols is to build a string of 12 bits where the first bit is the tonic, the second bit is the next semitone up, and so on. Then, for each note in the chromatic scale (moving up in semitones), the corresponding index in the string can be assigned a value of 0 or 1 depending on whether or not that note is found in the harmonisation (1 if that note is 'on', 0 otherwise). Each class will therefore consist of between zero and three 1s, and the remainder 0s. For example, the chord consisting only of the tonic would be represented by 100000000000. Using this approach, there are 1,464 possible such strings (thus a maximum of 1,464 different classes) — although, clearly, not all of these will be musically acceptable.
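This encoding can be sketched in a few lines. The function name is ours, and input chords are assumed to already be sets of semitonal distances from the tonic, as described above.

```python
def chord_to_bits(semitones_from_tonic):
    """Encode a chord as a 12-character bit string relative to the tonic.
    E.g. a normalised major triad on the tonic is the set {0, 4, 7}."""
    bits = ["0"] * 12
    for s in semitones_from_tonic:
        bits[s % 12] = "1"  # switch 'on' the bit for this pitch class
    return "".join(bits)

print(chord_to_bits({0}))        # prints 100000000000 (tonic alone)
print(chord_to_bits({0, 4, 7}))  # prints 100010010000 (tonic major triad)
```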

Clearly, this approach fails to distinguish between the voicing of notes by different parts. That is, a tonic chord in which the Bass part voices the tonic itself, the Tenor part voices the third, and the Alto voices the fifth would be considered equal to any inversion of that chord (provided that it is those same three notes being used) after both have been normalised.

Individual Parts

Attempting to assign an entire chord to a melodic event results in a large number of different choices (even after key normalisation, although probably not the figure of 1,464 mentioned above). The reduction in possible classes achieved by treating each of the three harmonising parts separately should therefore be considered as a potentially more intelligent approach.

Using this approach, there will only be 13 possible classes for each part (one for each note in the chromatic scale beginning on the tonic, plus a class representing that no new note is performed by that part — whether it be a rest, or a sustained note from earlier). Of course, when the three parts are combined there are significantly more combinations of classifications (2,197 in fact). However, each individual model will only need to consider 13 possibilities at a time.

This is quite a different task to that of choosing full chord classifications for each event. Perhaps this approach is more closely related to the one that Bach himself would have taken in harmonising a chorale. Rather than determining the harmonic motion of the piece, three separate lines of music will be found, which will hopefully combine to create the most appropriate harmonic sequence.

One issue that needs to be addressed in implementing a system that follows this approach is determining the order in which the parts should be found. Clearly, each of the three harmonising parts is not independent of the other two. So, using the classifications that have been produced for the other parts would be beneficial. This would, consequently, result in the introduction of new features aimed at capturing the relationship between the part in question and the other harmonising parts.


Again, there are different ways of encoding the aforementioned 13 classes that can be used by each classifier to classify an event in the melody. And, much like the options for Full Chord classifications, the chosen representation itself will have no impact on the results. Consistency is the only requirement.

4.2 Features

The nature of Machine Learning requires observations about a particular event to be made, so that a decision can then be made about a resulting classification for the event. Just as a human would first consider certain features of an event, those features must be encoded for the ML software. Clearly, the effectiveness of the system will hinge on the ability to capture as much information as possible with the features chosen.

In order to categorise the types of features that can be used to describe an event in a musical melody, we have defined two types of features. These are Local (or Melody) features and Global (or Piece) features. Within these two categories, further subdivisions can be made to distinguish between pitch-related features and timing-related features. However, we have not made this division explicit.

Below, we have considered the different possible encodings of these features that may be chosen.

4.2.1 Local Features

The features deemed Local features are those that are specific to the point in the melody for which we are attempting to determine a chord. These features need to be recalculated for every event in the piece, and may hold different values for each event.

When determining harmonisation, the most important aspect that needs to be considered is Pitch (that is, the name of the note, C or E♭ for example). In the context of this project, note duration is, perhaps, less significant, as we are focused on determining a set of notes to harmonise the melody — with the duration of those harmonising notes being ignored. However, as mentioned, time-domain features of the melody will be captured since, for example, it may well be that a short note in the melody is unlikely to be harmonised by a new chord.

The melody features that need to be considered are as follows:

Current Pitch

This is the pitch of the note being performed in the melody (that is, the event we are trying to classify). There are a number of different ways of encoding this feature.

In order to avoid the problem of sparse training data, it seems essential that this feature is measured relatively, rather than using the actual real pitch (such as C or E♭). The obvious solution is to somehow normalise all of the pieces, so that relationships between the notes within a piece are captured, as opposed to the relationships between notes in general. Additionally, it seems reasonable (for greatly simplifying the task) to ignore the octave in which the note is being


voiced (as including this results in sparse data). So, if the piece's key can be determined, the pitch of each note can be measured as a semitonal distance from the tonic. For example, if the piece is found to be in C (or, for that matter, C minor), then a C note within that piece can be assigned the value 0, and an E♭ the value 3, since they are 0 semitones and 3 semitones (respectively) from the tonic.
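A sketch of this normalisation, assuming note names have already been parsed into pitch classes (the mapping table and function name here are ours):

```python
# Map note names to pitch classes (semitones above C); a minimal sketch.
PITCH_CLASS = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
               "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9,
               "A#": 10, "Bb": 10, "B": 11}

def pitch_from_tonic(note, tonic):
    """Semitonal distance of `note` above the key's tonic, octave ignored."""
    return (PITCH_CLASS[note] - PITCH_CLASS[tonic]) % 12

print(pitch_from_tonic("C", "C"))   # prints 0
print(pitch_from_tonic("Eb", "C"))  # prints 3
```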

One immediate problem that does arise from this solution, and a problem that leads into an entirely different computational music task, is that of determining the key when it is not explicitly given in the score. Provided that the key is explicitly stated, there is no need to address this issue.

Pitch of Previous/Next n Notes

These features allow the note being classified to be considered within the context of the melody. There are a number of decisions that need to be made with regard to this set of features.

There are many different ways of encoding these features. One immediately obvious way is to simply reuse the values of the Current Pitch feature for those notes to which we are referring. This is reasonably straightforward to implement; however, it may not necessarily be the most effective approach.

An alternative is, rather than measuring the pitch of surrounding notes as a value relative to the piece's tonic, to measure their pitches relative to the current Soprano note's pitch. So, for example, we would explicitly capture that the pitch of the next note in the melody is an absolute distance of 1 semitone away, and the pitch in two notes' time is 3 semitones away, etc.

Another way of encoding these features is to somehow indicate the direction of the melodic contour. That is, to assign a value of 'up' or 'down', followed by a number of semitones. This more accurately captures the movement of the melody, and would also likely result in fewer value possibilities than the first approach mentioned above.
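A sketch of this contour encoding; the 'same'/'up'/'down' symbols and the use of MIDI-style semitone numbers for pitches are assumptions of the illustration, not the thesis's chosen encoding:

```python
def contour(current_pitch, next_pitch):
    """Encode melodic movement as a direction plus a semitone distance.
    Pitches are assumed to be semitone numbers (e.g. MIDI note values)."""
    diff = next_pitch - current_pitch
    if diff == 0:
        return "same"
    return ("up" if diff > 0 else "down") + str(abs(diff))

print(contour(60, 62))  # prints up2
print(contour(60, 57))  # prints down3
```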

One general underlying concern is the way in which all of these encodings will be using a symbolic approach to representing features. That is, even where numbers are being used to represent the features, these numbers will be treated symbolically by the ML system, rather than numerically. Mathematics is an important part of music. However, while it would make sense to approach this task in a more mathematical way, and it is possible to have ML software treat features mathematically, the mathematics of music is not so simple. For this reason, a more mathematical approach is left for future work.

Encoding aside, using different values for n may also have a significant impact on the success of this system. Taking larger context windows (higher n) will give more information that helps a chord classification to be chosen. However, having too many context features may lead to sparsity of data within the model created by the ML software — thus making it difficult to find closely matching feature vectors. The impact of this decision will be determined by running experiments using different n-values.
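Building such a context window might look like the following sketch, where '?' is an assumed padding symbol for positions that fall outside the melody:

```python
def context_window(pitches, i, n, pad="?"):
    """Previous/next n pitches around event i, padded with `pad` at the
    edges of the melody. The padding symbol is an assumption of this sketch."""
    prev = [pitches[j] if j >= 0 else pad for j in range(i - n, i)]
    nxt = [pitches[j] if j < len(pitches) else pad for j in range(i + 1, i + 1 + n)]
    return prev + nxt

melody = [0, 2, 4, 5, 7]  # pitches as semitonal distances from the tonic
print(context_window(melody, 0, 2))  # prints ['?', '?', 2, 4]
print(context_window(melody, 2, 2))  # prints [0, 2, 5, 7]
```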

Current Length

This feature should represent the duration of the melody note currently being


Name             Symbol
Semibreve        1
Minim            2
Crotchet         4
Quaver           8
Semiquaver       16
Demisemiquaver   32

Table 4.1: Note Length key

analysed. Simply having a well-defined (and sensible) mapping between each possible duration and a symbol should be adequate, since the mathematical relationships between feature values are being ignored. Table 4.1 contains the symbols used.
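Since the symbols in Table 4.1 are treated purely symbolically, the mapping can be a simple lookup (the function and dictionary names are ours):

```python
# Symbols from Table 4.1: reciprocal note lengths, used as labels only.
LENGTH_SYMBOL = {"semibreve": 1, "minim": 2, "crotchet": 4,
                 "quaver": 8, "semiquaver": 16, "demisemiquaver": 32}

def length_feature(name):
    """Return the symbolic length value as a string; the number carries
    no arithmetic meaning to a symbolic learner."""
    return str(LENGTH_SYMBOL[name])

print(length_feature("crotchet"))  # prints 4
```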

Of course, there are alternatives to this encoding of the length feature. For example, in the same way that a tonal centre is chosen for classifying pitch as a relative value, length could be measured as a relative value. However, as mentioned previously, this will intuitively have a less significant impact on how the melody is to be harmonised.

Length of Previous/Next n Notes

In the same way that the pitch of the surrounding notes in the melody can be represented by features, so too can the length of these notes. Whatever decisions are made in regard to both the chosen window size (as previously mentioned) and the chosen encoding of length (also previously mentioned) will have to be followed by this set of features.

Distance to Previous/Next Bar

Although it may not seem particularly relevant to harmonisation, the location of a note within a bar (also referred to as a 'measure') of music can have an impact on its harmonisation. Bar lines often provide a basic (though, by no means comprehensive) partitioning of musical phrases, and a note at the beginning of a phrase may well be treated differently to a note later in the phrase.

The encoding of these features should be quite simple. We can hope that the ML software used is able to determine the ways in which various features work together. So, it should be adequate to count the number of notes (that is, melody events) that occur between the current note and the bar line in question. From this, if the length of those notes is needed, it can be taken from the context length features described above, and a more precise placing of the note within the bar can be made.

Location within Piece

While the fact that the location of a note within a bar may have an impact on its harmonisation has already been mentioned, we may also consider the location of that note within the piece as a whole. For example, if the note occurs at


either extreme of the piece (beginning or end), it may well be that the chord is more likely to be the tonic chord.

Encoding this feature can easily be done by counting the number of bars in the piece (say, totalBars), then noting the number of the bar in which the note in question occurs (say, currentBar), and working out currentBar/totalBars to find a value between 0 and 1. This number would not give a particularly precise location of the note in question (i.e. the value of this feature would be the same for each note within the same bar). Also, this feature should not be taken as symbolic.
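The calculation itself is trivial; as a sketch (bars numbered from 1, as the description above implies):

```python
def piece_location(current_bar, total_bars):
    """Coarse location of a note within the piece as a value in (0, 1];
    every note in the same bar receives the same value."""
    return current_bar / total_bars

print(piece_location(1, 8))  # prints 0.125
print(piece_location(8, 8))  # prints 1.0
```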

4.2.2 Global Features

While the features that are specific to each individual event may be more useful in helping to classify that event with a chord, it may be necessary to append some more global information about the piece to that event's feature vector.

Of course, the need for such features depends greatly on how the above melody features are represented.

The global features that we have considered are as follows:

Metre

The metre of a piece (indicated explicitly by the piece's Time Signature) states the piece's underlying rhythm and defines what constitutes a bar. The majority of pieces in the chorale corpus used (close to 90%, in fact) are in 4/4 time, meaning that each bar is comprised of 4 crotchet beats. The remaining pieces are in Simple Triple time (3/4). In such cases, there are 3 crotchet beats per bar.

Fortunately, a piece's metre is explicitly stated at the beginning of the score. So, one of the two values mentioned can be assigned to this feature for each melodic event within the piece.

Key Signature

Obviously, unless the pitch of a note is represented as a raw note name (such as C or E♭), the entire key signature of a piece need not be represented by a feature (since all pieces will be normalised to the same tonal centre, which is represented by 0). However, it is worth noting that the key signature does need to be extracted from each piece in order to deduce the values for all pitch features. On top of this, it seems wise to use the piece's tonality (major/minor classification) as a feature, since this tends to define a scale of notes which can be used. Although other notes may be used, they tend to occur less frequently.

In the past, this issue of chorales being major or minor has resulted in researchers simply building two separate models for harmonisation — a major model and a minor model. The result of this is a reduction in the amount of training data that can be used to build each model. Our approach is to build one model, and simply rely on a feature to capture this partition in the corpus. The use of a tonality (major/minor) feature will be tested. However, the piece's tonic will be used implicitly in our system.


Piece Length

Once again, this is a feature that should be made redundant by other features. In this case, the length of the piece will be used in determining a value for the event's Location within Piece feature; however, the length itself will not add any useful information to assist in making classifications.

The use of the piece's length as an actual feature will not be tested in our system.

Previous n Classifications

Since we are aiming to find the optimal progression of chords to harmonise a melody, it seems ideal to incorporate the context of the harmony as well as that of the melody. So, our goal is to implement a system that is able to produce classifications 'on-the-fly', thus allowing these classifications to be used as features for the following events.

The representation of these features, of course, depends on the encoding of classifications (discussed earlier). Also, the choice of an appropriate value for n is as much a concern as for the local features mentioned above.


Chapter 5

System Architecture

In this chapter, we will consider the architecture of our system on three levels. We begin with the most abstract description, and conclude with a specific description of the scripts that comprise our final implementation.

5.1 High Level

The system developed completes three main tasks. The first task involves training a Machine Learning classifier that is capable of harmonising a chorale melody. The second is the classification stage, in which this classifier is tested with new melodies. The third task we need to consider is that of evaluating our system's accuracy. The various ways of completing this latter task are further explored in Chapter 6.

Below, the first two tasks are further broken down into more specific subtasks.

5.1.1 Training

The training phase comprises every step that is required to take a corpus of chorales (in whatever format is available to us), prepare it for processing, convert each event that we wish to classify into a feature vector (and determine its classification), arrange these vectors/classifications so that they are in the format required by our chosen ML classification package, and feed this data to the classifier.

These phases are explained conceptually below.

Normalise the corpus

Beginning with a collection of complete chorale scores, some pre-processing of the data will take place. This will ensure that all useful information (information that will contribute to the determination of feature values) is kept, while unnecessary information (such as code that is used purely for formatting and aesthetics) is discarded.

The result of this phase is a collection of simplified chorales, stored in a more easily read file structure, ready for further processing.


Create vectors

This is a conversion phase. Rather than simply removing redundant data, each chorale is encoded in a completely new, machine-readable format. Each Soprano note is replaced by a vector (an ordered sequence of values) capturing elements such as the note's pitch, length, the pitch and length of surrounding notes, and so on.

Additionally, the harmonisation that accompanies each event is encoded and stored.

After this phase, we have a collection of chorales — each represented as a sequence of vector/classification pairs. This is the 'internal representation' of the chorales.

Format vectors

While the chorales have been formally encoded to be more easily read by our system, the vectors and chorales need to be ordered and arranged so that the ML software can read them in and subsequently develop a model for classification (that is, harmonisation).

The formatting required depends on the chosen ML package. After formatting, the data is ready for processing. This is the 'external representation' of the chorales.

Train classifier

By feeding the newly formatted vector/classification pairs to the chosen ML package, a model is built. In our case, this happens as a black box (that is, we use a ready-made package), although we have input as to the type of model and algorithms used.

Once this model has been built, we can begin testing our classifier.

5.1.2 Testing

The testing phase does contain a number of steps that are identical to those taken in the training phase. However, there are some differences in the way testing is implemented.

Normalise melody

This is similar to the normalisation process in training. Again, only useful information is extracted from the melody score. The only difference is that we are, conceptually, only going to be dealing with a melody as input — rather than an entire four-part chorale. However, if we are using a portion of our chorale collection for testing — and evaluating — the normalising is, in fact, exactly the same.

The result is a simplified representation of the melody, with no redundant data.


Create vector

Taking the simplified melody, a vector using the same features as in training is built to encode each note event in the melody.

The only difference between this process in training and testing is that there are no accompanying classifications in the testing phase. The classification is the unknown that we seek to determine.

Format vector

In order for each feature vector to be read by the ML system we use, some formatting is required. Again, no classifications are known. However, the same feature formatting as that used in the training phase is applied.

Apply classifier

It is in this phase that the classification model, and the classifier itself, is consulted.

The vector is fed to the ML package, and the model built at the end of the training task is used to determine the best classification (harmonisation) to match this vector.

5.2 Low Level

The system developed comprises multiple programs. These programs begin with the Bach chorales in **kern format, and finish with suitable input for the chosen Machine Learning software.

Since the training and testing phases contain a number of overlapping (some identical) tasks, the implementation of the two phases can be combined into a set of programs that can be run in two different modes, 'training' and 'testing'.

Below is a summary of each step to be taken.

Normaliser

• Input: **kern files.

• Output: Stored events (Soprano-note/classification pairs) and their initial feature vectors.

• Description: Essentially, this stage extracts each event and its classification (i.e. the chord). It also initialises the feature vectors for each event in each song, as there is some global information that needs to be captured before it is discarded. For example, the piece's metre is used as a feature, and this remains constant for every event in the chorale. Additionally, the piece's key signature needs to be stored, as certain features are measured relative to the key's tonic.

If testing is being done on an unseen melody, with no known accompaniment, then the same process is applied, though without chords. Essentially, all the information stored is the same. It simply is not necessary, or possible, to extract information about the events' classifications.


!!!COM: Bach, Johann Sebastian
!!!OPR: Chorale Harmonizations
!!!ONR: Nr. 125
!!!OTL: Mein' Augen schliess' ich jetzt
!!!SCT: BWV 378
!!!YEC: Copyright 1996, Center for Computer Assisted Research in the Humanities
!!!YEM: Rights to all derivative electronic formats reserved.
!!!YEM: Refer to licensing agreement for further details.

**kern    **kern    **kern    **kern
*ICvox    *ICvox    *ICvox    *ICvox
*Ibass    *Itenor   *Icalto   *Isoprn
*         *         *Ialto    *
*k[f]]    *k[f]]    *k[f]]    *k[f]]
*M4/4     *M4/4     *M4/4     *M4/4
*clefF4   *clefGv2  *clefG2   *clefG2
8GG/L     8B/L      4d/       4g/
8AA/J     8A/J      .         .
=1        =1        =1        =1
8BB/L     4G/       4d/       4g/
8C/J      .         .         .
8D/L      4F]/      4A/       4d/
8C/J      .         .         .
4BB/      4G/       4d/       4g/
4AA/      4c        8e/L      4a/
.         .         8f]/J     .
=2        =2        =2        =2
etc.

Figure 5.1: First 2 bars of a Chorale in **kern

NOTE: 4g/ CHORD: (G,B,D) VECTOR: (4, g, 4/4, Gmajor, ??, ??, ??, ??)

Figure 5.2: Sample Normaliser output of first note

The data collected by this phase is a list of songs, each mapping to a list of events, chords (if training) and vectors.

Figure 5.1 shows the type of data we begin with — (a portion of) a four-part chorale in **kern format. Figure 5.2 gives an example of how each Soprano event (in this case, the first) in the chorale may be represented. Note that the vector is, at this stage, incomplete (unknowns represented by '??') and possibly incorrectly encoded. For example, the Key (4th) feature may eventually lose the tonic, and simply say 'major'.

Vector Builder

• Input: The output of the Normaliser.

• Output: The final feature vectors.


• Description: This stage analyses the melody and determines all of the values for the features defined.

Some feature values are altered to better represent the observations they are aimed at making (as mentioned with regard to the Key feature). All features are now in the correct format, and most features now have their values determined, although there are exceptions where values are unknown and indeterminable. If training, the chord extracted by the Normaliser is also converted to the final format that is used as the event's classification.

Additionally, the original representation of the note, found in the score, is now unnecessary and can be discarded. This is because the features comprehensively describe the event.

Vector Formatter

• Input: The vectors (and, if training, the classifications) for each note.

• Output: A representation of the vectors and classifications that is suitable to be fed into the Machine Learning software.

• Description: This stage prepares the data for the ML software. Doing this separately from the other stages allows flexibility if different Machine Learners are used.

The data needs to be arranged in a single line, and a special character is needed to represent that a feature value is unknown. In the case of training, all vectors/classifications are stored in a single file to be used by the classifier to build a model. If testing, one line of data (representing one event's vector) is used at a time to make a new classification.

It is also at this stage that we can choose which features are used in our vectors. So, if we want to run experiments using different feature combinations, this will all be controlled at this final formatting stage.
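The single-line arrangement described above can be sketched as follows. The function name and argument layout are hypothetical; only the behaviour (comma-separated fields, a marker for unknown values, the classification appended during training) comes from the text, and the ‘=’ marker follows the TiMBL convention mentioned in Section 7.3.

```python
# Sketch of the Vector Formatter stage (hypothetical function name).

def format_vector(features, classification=None, unknown="="):
    """Arrange one event's feature values on a single line, comma-separated.

    None values become the unknown marker; when a classification is supplied
    (i.e. during training), it is appended as the final field.
    """
    fields = [unknown if v is None else str(v) for v in features]
    if classification is not None:
        fields.append(classification)
    return ",".join(fields)
```

For example, `format_vector([0, 4, None, "major"], "100010010000")` yields `"0,4,=,major,100010010000"`.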

Figure 5.3 presents a diagram of the basic architecture.

We have not yet discussed the methods of evaluation. This will be undertaken in more detail in Chapter 6. If we are using complete chorales from our corpus as test data, then we will need to ensure that the chord is extracted by the Normaliser (as when training). In fact, everything except the formatting of the ML input is identical to the training process.

When it does come time to get classifications, the actual classification extracted from the complete chorale is stored so that it can be compared to the classification chosen by the classifier for some empirical evaluation.

Additionally, if evaluation by listening to the output is going to occur, then any data that may have been removed from the original **kern chorale, but that is needed to convert the events and their chosen classifications back to **kern, needs to be stored. If a **kern file is reconstructed, complete with our system’s harmonisation, there is software available that can convert **kern to MIDI — thus allowing us to listen to our results.


[Diagram: **kern files feed the Normaliser, which passes events and vectors to the Vector Builder; the Vector Formatter then produces the vectors used as ML input.]

Figure 5.3: System Architecture


5.3 Implementation

The implementation of our system consists of a number of Python scripts. However, there are three main scripts that need to be executed to complete a full round of training and testing (using, for example, tenfold cross-validation).

These three scripts essentially perform the three tasks defined by our low-level system architecture. The first acts as a normaliser. The second, as the name below suggests, acts as a vector builder that also consults the classifier when testing. Furthermore, provided a complete chorale is being used, it can compare the harmonisation used by Bach to that produced by the classifier. Similarly, the third acts as a vector formatter (but only for the training phase).

Below is a summary of each script.

1. chordExtractor.py

• Input: Directory of **kern files → bwv001.krn, bwv002.krn, . . .

• Output: Pickled dictionary mapping each song’s filename to its list of events (Soprano-note/classification pairs) → songs.pkl, and the initial feature vectors → vectors.pkl.

• Description: Essentially, it extracts each event (every note performed by the Soprano part) and its classification (i.e. the chord). It also initialises the feature vectors for each event in each song, as there is some global information that needs to be captured before it is discarded.

2. vectorBuilder.py

• Input: songs.pkl and vectors.pkl

• Output: If training, vectors.pkl is completed. If testing, accuracy results, and a textual representation of the results, are produced.

• Extra requirements: This program calls on a number of feature value extractors defined as separate classes. If testing, it also connects to an ML system, trained on the output of the training phase, for instant classifications. The ML system, ideally, is waiting in a server mode. It is fed a single formatted feature vector, and returns a classification derived from the model that was built in training.

• Description: This program is used for both training and testing, but the output and processes differ. If training, the classifications are obtained from songs.pkl (and converted to the desired values). If testing, classifications are made by sending a single formatted vector to the ML system. Then comparisons are made to the classifications stored in songs.pkl (again, after they’re converted) to check accuracy. This is the simplest way of evaluating the system’s accuracy — checking for exact matches.

3. reformatC45.py

• Input: songs.pkl and vectors.pkl

• Output: A single file containing every vector and classification from the training data, formatted to be fed into the ML system for learning (the C4.5 format is accepted by TiMBL) → chords.train


• Description: Used only for training, this program consults songs.pkl for the actual classifications, and vectors.pkl for the corresponding vectors.

In order to convert the results to a set of output that can be aurally assessed, additional scripts are required — as well as external pieces of conversion software. These are not, however, integral to the actual architecture of our harmonisation classifying system.
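The extraction step performed by chordExtractor.py can be sketched as follows. Here `parse_events` is a hypothetical stand-in for the actual **kern parsing; only the output shape (a pickled dictionary mapping filenames to event lists) is taken from the description above.

```python
import pickle
from pathlib import Path

def extract_chorales(kern_dir, parse_events, out_path="songs.pkl"):
    """Map each **kern filename to its list of (soprano_note, chord) events
    and pickle the result. `parse_events` stands in for the real **kern
    parser, which is not shown here."""
    songs = {}
    for path in sorted(Path(kern_dir).glob("*.krn")):
        songs[path.name] = parse_events(path.read_text())
    with open(out_path, "wb") as f:
        pickle.dump(songs, f)
    return songs
```

The pickled dictionary lets the later stages re-load every song’s events without re-parsing the **kern sources.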


Chapter 6

Evaluation

With the task of automatic harmonisation, there are a number of different ways to measure the success of the working system. One way is to listen to the resulting harmonisations — even to have a number of trained Baroque musicians listen to them — and rate them purely on how they sound. This project is, however, a computational task. Thus, we cannot neglect to treat the results empirically. In past research, evaluation has often only been conducted using human judgment to rate the system’s success subjectively. It is important to find some way of empirically evaluating our system.

Possible evaluation methods are outlined below.

6.1 Empirical

Since the task of automatic harmonisation is being treated as a classification task, evaluation must be carried out in a similar way to methods used by researchers working on other classification tasks such as part-of-speech tagging, named entity recognition, and similar areas that utilise ML techniques.

A tenfold cross-validation can be used to ensure a comprehensive testing of the system. This involves partitioning the data into ten subsets, and running the system ten times. On each run, a different subset is used for testing, while the remaining nine subsets are used for training.

The main problem faced is determining how to partition the data. There are two obvious options in the case of this task. Firstly, all of the vectors could be concatenated in one file, and that file then randomised so that 1/10 of the data can be removed at a time. However, this assumes that the vectors can all be built prior to any interaction with the ML classifier. So, if we intend to obtain classifications ‘on-the-fly’ and use them as features in the following vectors, the ordering of events will have to remain the same. Additionally, it may well be the case that a Soprano note’s location within the piece has an impact on the system’s ability to correctly harmonise it — particularly if context is being used, since the first and last notes have no prior or future context respectively. So, randomising the data could result in certain datasets having an irregular number of notes that occur at the extremes of a piece. This could lead to significant differences in the results produced by certain datasets.

The second option is to partition at the song level; that is, take an entire piece at a time when splitting the data. Now, certain pieces might be more difficult than others for predicting their harmonisation. For example, perhaps Bach was in an unusual mood when he was working on a particular piece. The problem here is that an entire piece will make up a significant portion of the dataset being used for testing. Thus, the results for this set may stand out as significantly worse than for the rest.

Despite its potential problems, the latter option is the only one that lets us run our system with full flexibility — in particular, using previous classifications as features — so it is the option we adopt.
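A song-level split of this kind can be sketched as follows (hypothetical function; the seed and the interleaved slicing are implementation choices, not taken from the text):

```python
import random

def song_level_folds(filenames, k=10, seed=0):
    """Partition whole chorales into k folds, so that every event of a
    piece stays in the same fold (the song-level split chosen above)."""
    names = list(filenames)
    random.Random(seed).shuffle(names)       # deterministic shuffle
    return [names[i::k] for i in range(k)]   # interleaved, near-equal folds
```

With the 230-file corpus described in Chapter 7, this yields ten folds of 23 chorales each.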

The basic steps involved in performing the tenfold cross-validation are asfollows:

1. Split up the data into training (9/10) and test data (1/10)

2. Run chordExtractor.py on the training data

3. Run vectorBuilder.py in training mode

4. Run reformatC45.py

5. Run chordExtractor.py on the test data

6. Train the ML system on the reformatted training data, and leave it waitingfor connections

7. Run vectorBuilder.py in test mode

This is then repeated nine times so that each subset of data is tested once. Empirical results are then obtained, and the results are also ready for further analysis, including conversion to MIDI for aural analysis.
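The empirical measure itself is the exact-match check described in Chapter 5, which can be expressed directly (a sketch; the function name is ours):

```python
def exact_match_accuracy(predicted, actual):
    """Fraction of events whose predicted chord class exactly matches the
    class extracted from Bach's own harmonisation."""
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

For example, two matches out of three predictions gives an accuracy of 2/3.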

Statistical Comparisons

As statistical results are collected, it is difficult to determine the significance of changes made to the system without the use of some statistical test. The t-test can be used to determine whether the means of two groups are statistically significantly different from each other (a normal distribution is assumed). So, using the results obtained from one run of the system, together with the results obtained from a second run using different settings, we can check whether the results have become significantly better or worse. The importance of this is that we are then able to infer that the changes made to the system have actually improved, or impaired, its performance, as opposed to the results being ‘essentially’ the same.

The formula for the t-test is a ratio. The numerator is the difference between the two means (in our case, the average accuracy across all ten test sets), while the denominator is a measure of the variability of the scores. If the t-value is positive, the second set of results is an improvement on the first. However, a threshold needs to be set for deciding what indicates statistical significance. Conventionally, if the p-value of the calculated t-statistic is less than 0.05, then significance is concluded.
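The ratio described above can be computed directly for paired fold accuracies. This is a sketch of the paired form of the test (pairing folds between the two runs is our assumption); in practice the p-value would still be read from t tables or a statistics library.

```python
import math

def paired_t(run_a, run_b):
    """t statistic over paired fold accuracies: the numerator is the mean
    difference between runs, the denominator its standard error.
    Positive values mean the second run scored higher."""
    diffs = [b - a for a, b in zip(run_a, run_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

The resulting t is then compared against the critical value for n − 1 degrees of freedom at p = 0.05.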


6.2 Qualitative

The purpose, and ultimate goal, of our system is to produce harmonisations that closely match the harmonisations produced by Bach himself. It seems unduly scientific to only assess the results by a statistical comparison, particularly since there can be multiple harmonisations that can still be deemed ‘correct’, though not used by Bach in the given instance. While statistical evaluation is vital, it is also necessary to listen to the results, if possible, and perform further evaluation based on the aural appeal of the resulting harmonisations.

Clearly, using the human ear to evaluate is more complicated than measuring results empirically. However, it is important to determine how successfully the system has learnt Bach’s style. This can be done by listening to the music, provided the listener has some experience with the music of Bach, and the Baroque style in general. A possible scenario could involve a listener hearing a randomised group of 4-part chorales, and having to classify each harmonisation as ‘Bach’ or ‘Computer’.

It is also important to note that there are many different possible harmonisations that are equally valid and, perhaps, even equally similar to the style in which Bach composed his own harmonisations. It is quite feasible to imagine that Bach may have often been faced with multiple options from which he chose one at random. Perhaps he even made decisions that avoided one composition bearing too much similarity to another. Clearly, if this is the case, the learning task becomes significantly more difficult, especially if exact note/chord predictions are expected.


Chapter 7

Results

Here, we present results produced by the system, and analyse them in a way that will allow us to reach conclusions regarding the success of our approach, and the impact of various settings, as well as ways in which we can improve on our implementation.

We need to begin by describing the external software and data that will be used. Then, we proceed with a comprehensive analysis of all experiment results.

7.1 Experiment Settings

All of the outlined experiments have been executed using the same Machine Learning package. A single corpus has also been used. These are both outlined below.

7.1.1 ML Classifier

The ML package we have chosen to use is the Tilburg Memory-Based Learner (TiMBL). This software package is freely available from http://ilk.uvt.nl/software.html, and was developed at Tilburg University in the Netherlands. The appeal of this system is the flexibility offered with regard to the algorithm used to make classifications. Options include the IB1, IB2, TRIBL, TRIBL2, and IGTREE algorithms. IGTREE is a fast heuristic approximation of IB1, while TRIBL is a hybrid combination of the two. The software also offers various weighting metrics, such as the modified value difference metric (mvdm), in which each pair of values for a particular feature is assigned a value difference.

Another attractive feature of TiMBL is its ability to operate as a server. Once the classifier has been trained, it will wait for clients to make connections and send feature vectors through for classification. Then, it simply returns the chosen classification to the client. This allows classifications to be made ‘on-the-fly’ — working nicely with the architecture of our system.

7.1.2 Data Collection

The corpus we have chosen to work with is a collection of Bach chorales, extracted from the official Humdrum website (http://kern.humdrum.org/). We have taken 230 **kern files, and randomly split them into 10 subsets (split at the song level, i.e. 23 full chorales in each set). See Appendix A for a list of the files in each subset.

The chosen class encoding we have used is the 12-bit string discussed in Chapter 4. Having extracted all of the chords (represented as 12-bit vectors) from the entire corpus, 171 such classes were found (significantly fewer than the maximum figure of 1,464 mentioned in Chapter 4).
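The 12-bit encoding can be sketched as follows (the function name is hypothetical; the example output is the class reported as most common in Section 7.2):

```python
def chord_to_bits(pitch_classes):
    """Encode a chord as the 12-bit string from Chapter 4: bit i is 1 when
    pitch class i (in semitones above the tonic) sounds in the chord."""
    present = set(pitch_classes)
    return "".join("1" if i in present else "0" for i in range(12))
```

The tonic major triad {0, 4, 7} encodes to '100010010000'.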

The issue of determining each piece’s key has been mentioned already. Fortunately, most of the pieces in this corpus do explicitly state the key. However, a number of the pieces simply give the key signature — that is, a list of flats or sharps. With this information, it is possible to narrow down the key to two possibilities — a major key, or the relative minor key. There is no fail-safe algorithm for determining which of these two possibilities is correct. For this reason, we have settled for assuming it to be the major option (which seems to be the case more frequently).
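The assumption made here can be sketched as follows; the sketch walks the circle of fifths from C, and uses sharp spellings throughout for simplicity (so two flats prints as A# major, the enharmonic equivalent of B-flat major):

```python
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def assumed_major_key(accidentals):
    """Resolve a bare key signature to its major option, as the text settles
    for. Positive counts are sharps, negative are flats; each sharp moves
    the tonic up a fifth from C, each flat up a fourth."""
    if accidentals >= 0:
        tonic = (7 * accidentals) % 12
    else:
        tonic = (5 * -accidentals) % 12
    return PITCH_NAMES[tonic] + " major"
```

So one sharp resolves to G major, and one flat to F major, even though the piece may in fact be in the relative minor.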

7.2 Baseline

In order to get a general idea of the difficulty of this task, and to be able to compare the improvements of our system’s output, it is important to define a baseline. This is an accuracy rate that we can aim (and expect) to improve on with our system. Achieving statistically significantly better results suggests that we have implemented an intelligent approach to the task.

A common baseline used for classification tasks is to take the most common class for every classification. The most common class in our corpus is 100010010000. This is to be expected, as this chord is a major triad (notes 1, 3 and 5 of the major scale) beginning on the Tonic. The baseline appears in Table 7.2 as Base.
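This baseline can be computed directly (a sketch with a hypothetical function name):

```python
from collections import Counter

def majority_baseline(train_classes, test_classes):
    """Predict the most common training class for every test event and
    report that class together with the resulting accuracy."""
    most_common = Counter(train_classes).most_common(1)[0][0]
    hits = sum(c == most_common for c in test_classes)
    return most_common, hits / len(test_classes)
```

In our corpus this always predicts 100010010000, the tonic major triad.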

7.3 ML Settings

Our initial implementation has used the following features:

NotePitch (0, 1, . . . , 11) Pitch of the note, as an interval distance from the Tonic.

NoteValue (1, 2, 4, 8, 16, 32) Length of the note (in **kern’s default representation).

LastBar (0, 1, . . . , 7) The number of note events since the bar began.

NextBar (0, 1, . . . , 7) The number of note events remaining before the bar ends.

Metre (3/4, 4/4) The song’s metre (time signature).

MajorOrMinor (Major, Minor) Whether the song’s key is major or minor.

SongLocation (Continuous) The location of the bar in the song (float between 0 and 1).

Note−nValue (1, 2, 4, 8, 16, 32) Lengths of previous melody events.


Settings          Results %
IB1 (k=1)         32.30 (6.94)
IB1 (k=3)         26.68 (3.45)
IB1 (k=5)         25.69 (2.40)
IB1 (k=1) + mvdm  31.67 (5.44)
IB1 (k=3) + mvdm  27.01 (2.00)
IB1 (k=5) + mvdm  27.05 (2.58)
IGTREE            21.91 (0.10)

Table 7.1: Initial results (Mean (Std Dev)) with various TiMBL settings

Note−nPitch (0, 1, . . . , 11) Pitches of previous melody events.

Class−n (string of 12 bits) The classification of the previous events.

Note: In all features using n, the feature is constructed for n = 1, 2 and 3. If a feature value is not determinable (for example, the current note is the first in the piece — thus no prior context is available) then the character ‘=’ is used to represent that the value is unknown (in accordance with TiMBL’s conventions).
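Building the context portion of such a vector, with ‘=’ for out-of-range positions, can be sketched as follows (hypothetical function; `melody` is assumed to be a list of (pitch, value) pairs):

```python
def context_features(melody, i, n=3, unknown="="):
    """Note-kPitch / Note-kValue features for the n previous melody events.

    Positions before the start of the piece take the '=' unknown marker,
    per TiMBL's convention mentioned in the text."""
    feats = {}
    for k in range(1, n + 1):
        pitch, value = melody[i - k] if i - k >= 0 else (unknown, unknown)
        feats[f"Note-{k}Pitch"] = pitch
        feats[f"Note-{k}Value"] = value
    return feats
```

For the second note of a piece, only Note-1Pitch/Value are known; Note-2 and Note-3 fall back to ‘=’.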

We have run the system on the data sets seven times with different machine learning settings — IB1, IB1 with mvdm (both using k = 1, 3 and 5), and IGTREE.

The aim of these tests is to determine how big an impact the different TiMBL settings have on our results and, if possible, which settings will produce the best results overall.

The mean and standard deviation of the percentage accuracies obtained (from tenfold cross-validation) are shown in Table 7.1.

It is difficult to confidently determine the best ML settings. In general, it seems that using mvdm has produced slightly less accurate results with the IB1 algorithm, compared to the experiments run without mvdm (although the difference is insignificant).

This pattern of mvdm making no improvement on the results continues throughout all of the experiments below. Additionally, using k = 1 for the IB1 algorithm outperforms higher values of k. Consequently, Table 7.2 shows only the results obtained using IB1 (k = 1) and IGTREE.

For comparison, the results obtained in this section are labelled Init in the results table.

7.4 Feature Encodings

We have next altered some of the features in an attempt to determine the impact that certain features (and their encodings) have on the results.

7.4.1 Context Size

The results below correspond to the testing of two new context window sizes for the preceding melody note features.


• First, we have chosen n = 1 (one note to the left of the current note is taken as context). These results appear in Table 7.2 as f1a.

• Second, we have chosen n = 5 (five notes to the left of the current note are taken as context). These results appear in Table 7.2 as f1b.

One immediately interesting observation is the impact these changes have had on the performance of the IGTREE algorithm. Both the n = 1 and n = 5 encodings produced exactly the same results, whereas the original n = 3 results were different (marginally worse, in fact).

It is interesting to note that using a context window of one note has produced the best results — being marginally more accurate than the original n = 3 results. These improvements are not statistically significant (according to the t-test). On the other hand, extending the window to 5 has had a significantly detrimental effect.

We have thus chosen to keep three notes as context for future experiments (as per the initial settings).

7.4.2 Relative Context Pitch

These results appear in Table 7.2 as f2. The changes made for these results are that all previous pitch features have been assigned values relative to the current soprano note (as a semitonal distance), rather than relative to the tonic as in all previous experiments.
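A sketch of the re-encoding (assuming a signed semitonal distance; the text says only that values become relative to the current soprano note, so the sign handling is our assumption):

```python
def note_relative(context_pitch, current_pitch):
    """Re-encode a context pitch from tonic-relative to note-relative form,
    as the signed semitonal distance from the current soprano note."""
    return context_pitch - current_pitch
```

So a context note a fifth above the tonic, heard while the soprano sings the third, encodes as +3.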

These results make it difficult to reach many strong conclusions. There was a statistically significant drop in the accuracy of the IB1 ML algorithm with the new encoding method. However, curiously, there was an extremely significant improvement in the performance of the new system when the IGTREE algorithm was used.

So far, encoding relative to the tonic and using the IB1 algorithm has been most successful.

7.4.3 Contours

These results appear in Table 7.2 as f3. For these results, the only context features used are one feature to say whether the melody has moved up, down, or stayed flat from the previous note, and one to say the same for the next note.

This is the first time we have evaluated the use of any melodic ‘look-ahead’. Previous experiments used only the context of preceding notes in the melody.

This has not led to a significant improvement on the initial IB1 results.

7.4.4 Previous Length

These results appear in Table 7.2 as f4. For these results, the three features Note−nValue (for n = 1, 2, 3), which encode the durations of the previous three melody notes, have been removed. That is, only the pitches of previous notes are observed. No future context is observed.

While the IGTREE results have again significantly improved on the initial results, the IB1 results are yet to improve significantly.


7.4.5 Previous Classifications

These results appear in Table 7.2 as f5. For these results, we return to our initial features (Init), and remove the Class−n features (for n = 1, 2, 3), which encode the classifications given to the previous three melody notes.

It seems that the previous chords’ inclusion had a weakening effect on the system’s accuracy. The comparison between these IB1 results and the initial IB1 results shows the improvement to be insignificant. However, the IGTREE results, when compared with those from the initial experiments, are an even more significant improvement (though not significant when compared to f2 with IGTREE).

A likely reason for this ‘harmonic motion’ context not improving the results is that so many of the prior classifications are wrong. So, we are in effect putting incorrect values in our feature vectors around 65% of the time.

7.4.6 Future Context

These results appear in Table 7.2 as f6. As explained previously, all experiments so far have used only previous melody notes as context (aside from the contour encoding experiment). This essentially models the improvisation process, in which one is unaware of what to expect next. For the following results, some ‘look-ahead’ is introduced. Not only are the Length (Value) and Pitch of the previous three notes taken as features; those of the next three notes are also included (Note+nPitch/Value, with n = 1, 2, 3). All other features are the same as in Init.

The results for IB1 are statistically significantly better than any results achieved so far. It is interesting to see that this is the impact of looking ahead at the melody. Again, the IGTREE results are significantly better than the initial, though not in comparison to f2.

As with the experiments using only previous context, changing the window size to 1 or 5 had an insignificant impact on these results.

7.4.7 Future Context / Previous Classifications

These results appear in Table 7.2 as f7. This experiment is a combination of the two changes made in f5 and f6, with the use of future melody context features (Note+nPitch/Value, with n = 1, 2, 3), and the removal of previous classifications (as they had a negative impact).

The IB1 results have not significantly improved on the best so far (f6), although they are slightly better. The IGTREE results have now reached a significant improvement on those from the f2 experiments.

7.4.8 Location and Metre

These results appear in Table 7.2 as f8. The features used for these results are the same as f7, with the removal of the location features (including the length-value of context notes, which were designed to capture information about the location of the note within a bar) and the piece’s metre. The idea here is to try a more ‘localised’ approach to classification — that is, not so much considering the larger context of the event, and using only the tonality global feature.

It is quite interesting to note the improvement in the IGTREE results, suggesting that this ML algorithm is best suited to fewer features. However, this improvement is not significant.

Intriguingly, the results for IB1 are not significantly different to our other best results so far (neither better, nor worse).

7.4.9 Pitch Features

These results appear in Table 7.2 as f9. For this experiment, even fewer features have been used than in f8, with the additional removal of all length features (those of the note being classified, and those of the context notes).

The motivation for this experiment is to determine whether IGTREE simply works better when fewer features are used, and to determine the impact of using a vector that captures only information related to pitch — that is, all time-domain features have been removed.

Clearly, since these results are not as high for either algorithm, it is not simply a case of reducing the number of features. Also, using note length features obviously does have an impact on the way chords are determined (contrary to our hypothesis in Chapter 4).

7.4.10 Tonality

These results appear in Table 7.2 as f10. Since the work of other researchers attempting this task has seen a different treatment of the pieces’ tonality (usually, by building separate classification models for major and minor pieces), this experiment is aimed purely at determining the impact of the majorOrMinor feature (about which we have already expressed concerns).

The fact that the removal of the majorOrMinor feature had an insignificant impact on the results obtained by the f9 experiments (which focused only on melody pitch features) suggests that this feature is not being used effectively by the system, and perhaps the ‘key’ issue needs to be considered in different ways.

7.5 Classification Approaches

It is worth considering whether or not the results would improve if three separate classifiers were run (one for each voice: Alto, Tenor and Bass). We have used the initial features to test a classifier that builds a Bass part to match the given melody, a classifier that builds an Alto part, and one that builds a Tenor part. The results are shown in Table 7.3.

Judging from the results, using individual part classifiers would not lead to a more accurate system if all parts were handled separately. Clearly, if we were to attempt a system of this nature, the Bass part would have to be determined first — and using the result as a feature would perhaps help in determining the other parts. However, with approximately 60% of these Bass classifications


Label  NP NV LB NB M  MoM SL  n  C        Pr  IB1           IGTREE
Base   -  -  -  -  -  -   -   -  -        -    8.71 (0.86)   8.71 (0.86)
Init   y  y  y  y  y  y   y   3  -,PV,T   y   32.30 (6.94)  21.91 (0.10)
f1a    y  y  y  y  y  y   y   1  -,PV,T   y   33.82 (5.69)  22.11 (1.88)
f1b    y  y  y  y  y  y   y   5  -,PV,T   y   26.42 (6.47)  22.11 (1.88)
f2     y  y  y  y  y  y   y   3  -,PV,R   y   23.77 (4.94)  30.19 (6.00)
f3     y  y  y  y  y  y   y   1  +-,P,C   y   33.45 (5.28)  29.43 (5.93)
f4     y  y  y  y  y  y   y   3  -,P,T    y   32.35 (6.35)  22.11 (1.88)
f5     y  y  y  y  y  y   y   3  -,PV,T   n   37.14 (6.18)  34.57 (7.44)
f6     y  y  y  y  y  y   y   3  +-,PV,T  y   39.39 (6.07)  31.33 (6.00)
f7     y  y  y  y  y  y   y   3  +-,PV,T  n   41.71 (5.80)  36.77 (6.20)
f8     y  y  n  n  n  y   n   3  +-,P,T   n   40.34 (5.74)  40.98 (5.14)
f9     y  n  n  n  n  y   n   3  +-,P,T   n   36.65 (5.10)  37.39 (5.05)
f10    y  n  n  n  n  n   n   3  +-,P,T   n   36.47 (5.15)  37.29 (5.05)

Key: NP = NotePitch, NV = NoteValue, LB/NB = Last/NextBar, M = Metre, MoM = MajorOrMinor, SL = SongLocation, n = size of context window, C = melody context features [+ = future, - = past, P = pitch, V = value, T = relative to tonic, R = relative to current pitch, C = contour], Pr = Class-1/2/3.

Table 7.2: Results — Mean and Standard Deviation for accuracy percentages

Part   IB1           IGTREE
Bass   42.26 (5.99)  30.61 (3.21)
Tenor  19.36 (1.39)  20.89 (1.30)
Alto   18.70 (0.94)  20.57 (1.11)

Table 7.3: Results (Mean (Std Dev)) for individual part classifiers


proving inaccurate (and approximately 80% of the Tenor and Alto parts), it isdifficult to justify this approach.

On the other hand, it is quite possible that the results would greatly improve when all three chosen classifications are finally combined to make the chord for each event (when compared to our current chord results). Looking at each part individually, as we have here, takes into account the line that voices each note in the chord. It may be that, once combined, the Bass part is in fact performing the note that Bach assigned to the Tenor, the Tenor is performing Bach’s Alto note, and the Alto is performing Bach’s Bass note (for example). In that case, though each part individually was wrong, the chord is completely correct.
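The voice-independent comparison suggested here can be sketched as follows (hypothetical function; parts are assumed to be given as pitch classes):

```python
def same_chord(predicted_parts, bach_parts):
    """True when the predicted Alto/Tenor/Bass pitch classes form the same
    chord as Bach's, even if the notes are distributed to different voices."""
    return set(predicted_parts) == set(bach_parts)
```

Under this measure, a prediction that swaps Bach’s Tenor and Alto notes still counts as a correct chord.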

7.6 Summary

In analysing all of the results we have obtained, there are definitely some strong conclusions that can be made regarding the impact of certain features.

We began with what we hypothesised would best capture the information required to choose the correct harmonisation. However, it is clear that the addition of features that look at where the melody is going, and not simply where it has come from, gives the system a much better chance of picking the right chord for that point in the melody. Since Bach did in fact have the melody in its entirety when he built a chorale harmonisation, he too would presumably have paid attention to such details. A conclusion that can be drawn from this is that real-time harmonisation is more difficult, not due to time constraints, but due to not knowing where the melody is heading.

Another extremely interesting observation is the way in which the system’s accuracy improved when the previous harmonisation chords were left out of the feature vector. It is reasonable to expect that paying attention to the way in which a chord progression is forming would benefit a harmonisation system (and perhaps, if we analysed our results by listening to them, we would discover that the harmonisations produced using previous chord features are a lot ‘smoother’). However, when more than half of the chords we are choosing are, in fact, wrong, it is understandable that such mistakes in the feature vectors would make it difficult to use our classification model accurately.

Some interesting observations can also be made that give some insight into the nature of the two algorithms used — IB1 and IGTREE. Two feature sets produced the best results — f7 and f8. The former produced the best results for IB1, and the latter produced the best results for IGTREE (and the best combined results). The difference between these two sets is that f7 includes some extra features that capture more time-domain information and context. It seems that the IB1 algorithm may be more capable of picking which features to use, since its performance was very high for both sets, while IGTREE gained accuracy with the removal of those features unique to f7.



Chapter 8

Discussion

8.1 Conclusion

The ultimate aim of this project, as stated in Chapter 2, was to build a system that produces harmony identical to the harmony Bach created for the same melody.

Though our exact-match percentages were not particularly high (by the standards of most common classification tasks), we did considerably outperform the baseline we set. We were also able to gain statistically significant improvements using different feature encodings and combinations. Achieving a best result of approximately 40% accuracy is reasonable, especially when using the most frequent chord alone only yielded an accuracy of around 9%. This suggests that our system was successful in learning something about harmonisation.
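The comparison against a most-frequent-chord baseline can be made concrete with a toy sketch; the chord labels and counts below are invented for illustration, not the thesis's data.

```python
# Illustrative sketch of the baseline used for comparison: always
# predicting the single most frequent chord label seen in training.

from collections import Counter

def majority_baseline(train_labels, test_labels):
    """Accuracy obtained by always guessing the most common training label."""
    guess = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == guess)
    return correct / len(test_labels)

train = ["I", "V", "I", "IV", "I", "vi", "V"]   # invented chord labels
test = ["I", "V", "ii", "I"]
print(majority_baseline(train, test))           # guesses "I" -> 0.5
```

Any learned classifier has to beat this number before it can be said to have learned anything about harmonisation.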

It was also interesting to note the impact of certain features. In particular, looking ahead at the coming melody notes influenced how a note should be harmonised, whereas considering preceding chord progressions made predicting the next chord more difficult. We also observed the positive impact of considering the duration of notes when harmonising, and the importance of choosing how best to encode musical features so that the desired information is accurately represented.

While it was infeasible to perform a qualitative, aural evaluation of all of the harmonisations produced by the system, converting some of the harmonisations produced by the initial experiments to MIDI did allow for some informal listening. It was certainly apparent that the produced harmonisations were in a Baroque style, and that, in most cases, they sounded 'good'. However, the concern regarding determining a piece's tonality did seem to have a negative impact on some pieces (those that were, in fact, built on a minor scale), and it is difficult to say how much our system (and our empirical results) would have improved if the piece's key had been treated differently (or if only major pieces had been used). Since this method of evaluation was informal, and not carried out empirically, little more can be said about these observations.

Another aim mentioned in Chapter 2 was to finish with a framework that could easily be adapted to another corpus of data, such as jazz pieces. While we have not actually attempted to use another corpus, this system is ready to be used on any set of **kern files.

Though with somewhat less certainty, it can also be said that we have learnt some things about the way in which Bach approached harmonisation many years ago — or, at least, about the nature of harmonisation. In Section 3.2.1, we explained the two alternative approaches to harmonisation. Since our classifier had so little success in choosing one of 13 possible classes for the middle individual parts (namely Alto and Tenor), compared with choosing the entire chord in one go, it seems that the Polyphony approach (whereby the harmonising parts are treated as melodies themselves) is inappropriate for simulating Bach harmonisation and, perhaps, not the approach adopted by the man himself.

8.2 Future Work

Different methods of evaluation were proposed in Chapter 6; however, not all have been adequately attempted. It would be interesting to see how the harmonisations produced by our system would be evaluated by a group of trained listeners — that is, people with a strong knowledge of Bach's composition.

Additionally, while we considered implementing a polyphonic approach to harmonisation, and conducted some related experiments, we have discussed the shortfalls of those experiments and the potential of the approach. Our conclusion stands that it is not easy to generate an entire Tenor or Alto part to harmonise a Soprano melody; we have shown that the results are far worse than those for choosing a chord directly. However, it is worth exploring different approaches that treat the three harmonising parts separately (and different methods of evaluating and combining the results).

As mentioned above, the framework of our system could easily be applied to attempt the automatic harmonisation of melodies in a different style. It would be interesting to see how successful our approach would be when applied to another genre of music. This, perhaps, would also give an insight into how original Bach was with each chorale harmonisation, in comparison to other composers or collections of work.

In Section 4.2.1, concerns regarding the failure to consider the mathematics of music were discussed. This failure to recognise the importance of mathematics in music seems unavoidable given the task at hand and the ML software available. Treating the current features as numerical values would oversimplify the relationships between values. A far more complex encoding, however, may be more successful. For example, encoding pitch as the frequency of the sound wave that produces the note may be an option.

The issue of determining, and properly using, a piece's key has been mentioned a number of times. All of our system's harmonisations were built relative to the piece's tonic, and the tonic used was wrong for every piece mistakenly judged to be in its actual key's relative major. This suggests that the issue should be addressed.

It seems that this task of automatic Key Induction (that is, determining a piece's tonal centre by observing the music itself) has scarcely been approached by computer music researchers. It could well be worthy of a year-long project in its own right. A possible solution would be to approach this as a classification task also. If, as was the case with some of our chorale corpus, the key is narrowed down to being either a major key or its relative minor (possible if we are given the actual key signature — that is, the number of ♯s or ♭s), then simply using note counts, and note combination observations, would perhaps be enough to confidently make a decision.

The more difficult task is determining a piece's tonal centre based only on the notes used, with no other score information. This could be approached in a similar manner, broken down into two subtasks: first finding the tonic, then determining the major/minor tonality in a secondary phase. Alternatively, the first phase could determine the sharps and flats used, followed by a phase to determine the tonic and, hence, the key.
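A minimal sketch of the easier case (choosing between a major key and its relative minor from note counts) might look as follows. The decision rule, the use of the raised leading note as evidence, and all names here are illustrative assumptions, not a validated method.

```python
# Sketch of key induction as classification by note counts: given that the
# key signature narrows the choice to a major key or its relative minor,
# compare simple pitch-class statistics. Pitch classes are 0-11 with C = 0.

from collections import Counter

def major_or_relative_minor(pitch_classes, major_tonic):
    """Decide between a major key and its relative minor (whose tonic
    lies 9 semitones above the major tonic)."""
    minor_tonic = (major_tonic + 9) % 12
    counts = Counter(pitch_classes)
    # The raised leading note of the minor key (a semitone below its
    # tonic) lies outside the major scale, so any occurrence of it is
    # taken here as strong evidence for the minor key.
    raised_leading = (minor_tonic - 1) % 12
    if counts[raised_leading] > 0:
        return "minor"
    # Otherwise, fall back to whichever candidate tonic occurs more often.
    return "major" if counts[major_tonic] >= counts[minor_tonic] else "minor"

# C major vs A minor: a G sharp (pitch class 8) suggests A minor.
print(major_or_relative_minor([0, 4, 7, 9, 8, 9], major_tonic=0))  # minor
print(major_or_relative_minor([0, 4, 7, 0, 2, 4], major_tonic=0))  # major
```

A real system would want weighted evidence over the whole piece rather than this single-feature rule, but the classification framing is the same.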



Appendix A

Chorale Datasets

Below are the ten data sets used for our tenfold cross-validation.

sub1: bwv0263.krn bwv0294.krn bwv0324.krn bwv0354.krn bwv0384.krn bwv0414.krn chor032.krn chor272.krn bwv0273.krn bwv0304.krn bwv0334.krn bwv0364.krn bwv0394.krn bwv0424.krn chor117.krn chor350.krn bwv0284.krn bwv0314.krn bwv0344.krn bwv0374.krn bwv0404.krn bwv0434.krn chor187.krn

sub2: bwv0255.krn bwv0296.krn bwv0336.krn bwv0376.krn bwv0416.krn chor136.krn bwv0265.krn bwv0306.krn bwv0346.krn bwv0386.krn bwv0426.krn chor201.krn bwv0275.krn bwv0316.krn bwv0356.krn bwv0396.krn bwv0436.krn chor276.krn bwv0286.krn bwv0326.krn bwv0366.krn bwv0406.krn chor048.krn

sub3: bwv0256.krn bwv0297.krn bwv0337.krn bwv0377.krn bwv0417.krn chor153.krn bwv0266.krn bwv0307.krn bwv0347.krn bwv0387.krn bwv0427.krn chor217.krn bwv0276.krn bwv0317.krn bwv0357.krn bwv0397.krn bwv0437.krn chor282.krn bwv0287.krn bwv0327.krn bwv0367.krn bwv0407.krn chor054.krn

sub4: bwv0257.krn bwv0298.krn bwv0338.krn bwv0378.krn bwv0418.krn chor157.krn bwv0267.krn bwv0308.krn bwv0348.krn bwv0388.krn bwv0428.krn chor223.krn bwv0277.krn bwv0318.krn bwv0358.krn bwv0398.krn bwv0438.krn chor290.krn bwv0288.krn bwv0328.krn bwv0368.krn bwv0408.krn chor068.krn

sub5: bwv0258.krn bwv0299.krn bwv0339.krn bwv0379.krn bwv0419.krn chor158.krn bwv0268.krn bwv0309.krn bwv0349.krn bwv0389.krn bwv0429.krn chor224.krn bwv0278.krn bwv0319.krn bwv0359.krn bwv0399.krn chor009.krn chor299.krn bwv0289.krn bwv0329.krn bwv0369.krn bwv0409.krn chor069.krn



sub6: bwv0259.krn bwv0300.krn bwv0340.krn bwv0380.krn bwv0420.krn chor165.krn bwv0269.krn bwv0310.krn bwv0350.krn bwv0390.krn bwv0430.krn chor248.krn bwv0280.krn bwv0320.krn bwv0360.krn bwv0400.krn chor019.krn chor303.krn bwv0290.krn bwv0330.krn bwv0370.krn bwv0410.krn chor088.krn

sub7: bwv0260.krn bwv0301.krn bwv0341.krn bwv0381.krn bwv0421.krn chor176.krn bwv0270.krn bwv0311.krn bwv0351.krn bwv0391.krn bwv0431.krn chor255.krn bwv0281.krn bwv0321.krn bwv0361.krn bwv0401.krn chor024.krn chor306.krn bwv0291.krn bwv0331.krn bwv0371.krn bwv0411.krn chor098.krn

sub8: bwv0261.krn bwv0302.krn bwv0342.krn bwv0382.krn bwv0422.krn chor177.krn bwv0271.krn bwv0312.krn bwv0352.krn bwv0392.krn bwv0432.krn chor258.krn bwv0282.krn bwv0322.krn bwv0362.krn bwv0402.krn chor028.krn chor323.krn bwv0292.krn bwv0332.krn bwv0372.krn bwv0412.krn chor101.krn

sub9: bwv0262.krn bwv0303.krn bwv0343.krn bwv0383.krn bwv0423.krn chor183.krn bwv0272.krn bwv0313.krn bwv0353.krn bwv0393.krn bwv0433.krn chor268.krn bwv0283.krn bwv0323.krn bwv0363.krn bwv0403.krn chor030.krn chor328.krn bwv0293.krn bwv0333.krn bwv0373.krn bwv0413.krn chor110.krn

sub10: bwv0254.krn bwv0295.krn bwv0335.krn bwv0375.krn bwv0415.krn chor124.krn bwv0264.krn bwv0305.krn bwv0345.krn bwv0385.krn bwv0425.krn chor200.krn bwv0274.krn bwv0315.krn bwv0355.krn bwv0395.krn bwv0435.krn chor273.krn bwv0285.krn bwv0325.krn bwv0365.krn bwv0405.krn chor046.krn


