A Statistical Analysis of Tonal Harmony



By David Temperley

2009

------------

Overview

It is generally believed that harmony in common-practice music (i.e. 18th and 19th century Western art music) is characterized by certain basic principles. Dominant harmonies (V and vii) go to tonics (I), predominants (IV and ii) go to dominants, root motion by descending fifth is especially favored, and so on. But to what extent are these principles actually followed in common-practice composition? There has been surprisingly little empirical study of this question. [1]

This page presents a statistical analysis of harmonic progressions in a corpus of common-practice music. The data files and programs used can be downloaded at the bottom of the page.

The data comes from the workbook accompanying Stefan Kostka and Dorothy Payne's theory textbook Tonal Harmony, 3rd edition (McGraw-Hill, 1995). The workbook contains a number of excerpts of common-practice pieces, to be analyzed by the student; an accompanying instructor's manual contains "correct" analyses done by the textbook authors, in conventional Roman numeral notation. The analyses also show modulations, and represent each chord in relation to the local key.

I created a corpus consisting of all of the analyzed excerpts in the workbook of 8 measures or more in length; there were 46 such excerpts. I call this the "Kostka-Payne corpus." (A list of the excerpts is shown here.) I created midifiles and "notefiles" (textfiles listing the notes with pitches and on/off times) of all the excerpts. (This was done in connection with the testing of the Melisma music analysis system; the notefiles and midifiles are available at the Melisma ftp site.) The harmonic analyses of the excerpts were computationally encoded by Bryan Pardo, and added to the midifiles (these midifiles are available at Pardo's website). I then converted Pardo's analyses into another format, which I call "chord-list" format. The beginning of a chord-list (for the opening of the Minuet in G major from the Notebook for Anna Magdalena Bach) is shown here:

0.000 2.608 - 0 1 7 7
2.608 3.913 - 5 4 7 0
3.913 5.217 - 0 1 7 7
5.217 6.521 - 11 7 7 6

Each line represents a chord segment. The first number indicates the beginning of the segment, in seconds. (For each excerpt, I chose a tempo that I thought was reasonable, and then generated times for the chord segments using this tempo.) The second number represents the end time of the segment. Following this are four integers. The first is the "chromatic relative root": the chromatic interval from the root to the tonic. I use the usual pitch-class notation for intervals: I = 0, bII (or #I) = 1, II = 2, etc. The second integer indicates the "diatonic relative root" - the Roman numeral number (I = 1, bII = 2, II = 2, etc.). The third number indicates the tonic (assuming the usual pitch-class notation: C = 0, Db/C# = 1, etc.), and the fourth number indicates the _absolute_ root (again assuming the usual pitch-class notation). So the first chord statement above indicates I in the key of G major - a G major chord, in absolute terms. (Applied chords were relabeled in relation to the local key: for example, V/V was converted to II.)
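As an illustration of the format, here is a minimal Python sketch that parses chord-list lines into named fields. The class and function names are hypothetical and are not part of the distributed programs. The "-" in the third position is simply skipped, since its meaning is not described here. The final check relies on the definitions above: the absolute root should equal the tonic plus the chromatic relative root, modulo 12.

```python
from typing import NamedTuple


class ChordSegment(NamedTuple):
    start: float          # segment start time, in seconds
    end: float            # segment end time, in seconds
    chromatic_root: int   # root relative to the tonic (I = 0, bII = 1, II = 2, ...)
    diatonic_root: int    # Roman numeral number (I = 1, II = 2, ...)
    tonic: int            # pitch class of the local tonic (C = 0)
    absolute_root: int    # pitch class of the chord root (C = 0)


def parse_chord_line(line: str) -> ChordSegment:
    """Parse one chord-list line (hypothetical helper)."""
    fields = line.split()
    # fields[2] is the "-" seen in the sample lines; it is skipped here
    # because its meaning is not documented (an assumption).
    start, end = float(fields[0]), float(fields[1])
    chromatic, diatonic, tonic, absolute = (int(f) for f in fields[3:7])
    return ChordSegment(start, end, chromatic, diatonic, tonic, absolute)


# The four sample lines from the Minuet in G major.
sample = """\
0.000 2.608 - 0 1 7 7
2.608 3.913 - 5 4 7 0
3.913 5.217 - 0 1 7 7
5.217 6.521 - 11 7 7 6"""

segments = [parse_chord_line(line) for line in sample.splitlines()]

for seg in segments:
    # Consistency check implied by the field definitions.
    assert (seg.tonic + seg.chromatic_root) % 12 == seg.absolute_root
```

For the second line, for example, tonic 7 (G) plus chromatic root 5 (IV) gives 12, which is pitch class 0 (C), matching the absolute root in the file.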

Note that this format contains no information about the quality of chords (major/minor/diminished) or extensions (e.g. sevenths, ninths). This information is available in Pardo's midifiles, but I did not encode it. [2]


The file kp-chord-list contains the chord-lists for the complete KP corpus. The title of each excerpt (using the short names shown in the corpus list) is indicated at the beginning of the excerpt. Dotted lines "---" separate one key section from another. ("Pivot chords" - chords at key boundaries that function in both the previous key and the following one - are represented in both key sections.) I also separated the corpus into major-key and minor-key key sections; the file kp-chord-list-ma includes just the major-key ones, and kp-chord-list-mi includes just the minor-key ones.

A few chords in the corpus were given chord symbols for which there is no widely accepted root, such as "German 6th". For such chords, the label -1 is used for the chromatic, diatonic, and absolute roots.
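The layout just described can be read with a short Python sketch along the following lines. Several details are assumptions rather than facts about the files: that a separator line consists only of dashes, that chord lines begin with a number, and that any other non-blank line is an excerpt title. The title "minuet-example" and the second key section in the sample are invented for illustration.

```python
def is_chord_line(line):
    """Assume a chord line starts with a numeric start time."""
    try:
        float(line.split()[0])
        return True
    except ValueError:
        return False


def split_key_sections(text):
    """Return {title: [section, ...]}, each section a list of split chord lines."""
    excerpts = {}
    title = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # Dotted line: start a new key section in the current excerpt.
            excerpts.setdefault(title, []).append([])
        elif is_chord_line(line):
            sections = excerpts.setdefault(title, [])
            if not sections:
                sections.append([])
            sections[-1].append(line.split())
        else:
            # Anything else is treated as an excerpt title (an assumption).
            title = line
            excerpts[title] = [[]]
    # Drop sections that ended up empty.
    return {t: [s for s in secs if s] for t, secs in excerpts.items()}


def has_root(fields):
    """Rootless chords (e.g. "German 6th") carry -1 as chromatic root."""
    return fields[3] != "-1"


# Invented sample: first section from the Minuet, second section made up.
sample = """\
minuet-example
0.000 2.608 - 0 1 7 7
2.608 3.913 - 5 4 7 0
---
3.913 5.217 - 7 5 0 7
5.217 6.521 - -1 -1 0 -1
"""

excerpts = split_key_sections(sample)
rooted = [f for f in excerpts["minuet-example"][1] if has_root(f)]
```

Keeping key sections as separate lists makes it straightforward to avoid treating the last chord of one key section and the first chord of the next as a progression.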

Some Aggregate Statistics

Once I had the KP corpus in "chord-list" form, I then wrote a perl script, tally.pl, which extracts various kinds of aggregate statistics.

The corpus contains 919 chords, and a total time of 1354.116 seconds.

First I extracted the total count of each chromatic relative root, and the total amount of time spent on that root.

                         proportion  total
                         excluding   time
Root  count  proportion  tonic       (secs)    proportion
I     318    0.346       ---         553.792   0.409
bII   17     0.018       0.029       29.805    0.022
II    104    0.113       0.180       118.766   0.088
bIII  10     0.011       0.017       16.668    0.012
III   21     0.023       0.036       25.104    0.019
IV    70     0.076       0.121       91.622    0.068
#IV   17     0.018       0.029       18.652    0.014
V     214    0.233       0.370       302.102   0.223
bVI   34     0.037       0.059       44.383    0.033
VI    50     0.054       0.087       76.706    0.057
bVII  6      0.007       0.010       8.301     0.006
VII   35     0.038       0.061       37.552    0.028

(The first "proportion" column shows the count of the chord as a proportion of the total count; the second "proportion" column shows the time spent on the chord as a proportion of the total time.)

There were also 23 "miscellaneous" chords, not assigned any explicit root (such as augmented-sixth chords), taking a total time of 30.663 seconds. (These are assigned chromatic root of -1 in the chord list; diatonic root and absolute root are also -1.)
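The proportion columns can be recomputed from the counts and times in the table above. The following Python sketch is not tally.pl; it only redoes the arithmetic on the published figures. One observation from that arithmetic: the "excluding tonic" values match a denominator of 578, that is, all rooted chords other than I, with the 23 miscellaneous chords left out as well.

```python
# Counts and times (seconds) copied from the table above.
counts = {"I": 318, "bII": 17, "II": 104, "bIII": 10, "III": 21, "IV": 70,
          "#IV": 17, "V": 214, "bVI": 34, "VI": 50, "bVII": 6, "VII": 35}
times = {"I": 553.792, "bII": 29.805, "II": 118.766, "bIII": 16.668,
         "III": 25.104, "IV": 91.622, "#IV": 18.652, "V": 302.102,
         "bVI": 44.383, "VI": 76.706, "bVII": 8.301, "VII": 37.552}
MISC_COUNT, MISC_TIME = 23, 30.663  # chords with no explicit root

total_count = sum(counts.values()) + MISC_COUNT   # all chords in the corpus
total_time = sum(times.values()) + MISC_TIME      # total duration in seconds
non_tonic = sum(c for root, c in counts.items() if root != "I")

# Count of each root as a proportion of all chords.
count_prop = {root: c / total_count for root, c in counts.items()}
# Count of each non-tonic root as a proportion of rooted, non-tonic chords.
excl_tonic_prop = {root: c / non_tonic
                   for root, c in counts.items() if root != "I"}
# Time spent on each root as a proportion of the total time.
time_prop = {root: t / total_time for root, t in times.items()}

for root in counts:
    excl = f"{excl_tonic_prop[root]:.3f}" if root != "I" else "---"
    print(f"{root:<5}{count_prop[root]:.3f}  {excl}  {time_prop[root]:.3f}")
```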

Then I looked at the "chord transitions" -- the number of times each chord moves to each other chord. "Antecedent" chords are shown on the vertical axis, "consequent" chords on the horizontal; for example, the number of occurrences of I moving to II is 31. (The data only reflects transitions within a single key section; no transition is recorded for moves from one key section to another.)

CHROMATIC ROOT TRANSITION COUNTS

     Cons  I     bII   II    bIII  III   IV    #IV   V     bVI   VI    bVII  VII
Ant  I     0     7     31    1     4     45    2     116   11    17    3     19
     bII   3     0     8     0     0     0     1     2     0     0     0     1
     II    22    3     0     1     4     1     7     45    2     8     0     6
     bIII  1     1     0     0     0     0     0     4     4     0     0     0
     III   1     0     2     0     0     7     0     1     0     7     0     1
     IV    32    2     10    0     4     0     3     11    0     1     1     4
     #IV   7     0     0     0     0     0     0     9     0     0     0     0
     V     167   0     8     1     2     4     0     0     7     6     0     2
     bVI   5     2     8     0     1     3     0     2     0     3     2     0
     VI    4     2     28    0     1     4     2     1     0     0     0     1
     bVII  0     0     0     5     0     0     0     1     0     0     0     0
     VII   27    0     0     0     3     0     1     1     1     0     0     0
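A transition tally of this kind can be sketched in Python as follows. This is not the tally.pl code. It assumes that each key section has already been reduced to a list of chromatic relative roots, and it assumes that any pair involving a rootless chord (-1) is skipped; how tally.pl treats such pairs is not stated here. The second key section in the example is made up.

```python
from collections import Counter


def count_transitions(key_sections):
    """Count (antecedent, consequent) root pairs within each key section."""
    counts = Counter()
    for roots in key_sections:
        # Pair each chord with its successor; pairs never span two sections.
        for antecedent, consequent in zip(roots, roots[1:]):
            if antecedent == -1 or consequent == -1:
                continue  # assumption: rootless chords are left out
            counts[(antecedent, consequent)] += 1
    return counts


# First section: the Minuet sample (I, IV, I, VII). Second section: invented.
sections = [[0, 5, 0, 11], [7, 0]]
transitions = count_transitions(sections)
```

Because the pairs are formed section by section, the move from the last chord of the first section (11) to the first chord of the second (7) is not counted, in line with the rule stated above.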

It is useful to represent this data in two other ways. First, we represent chromatic root transitions as a proportion of the total count for the consequent chord. The values in each column sum to 1; thus one can see, for example, that I is approached by V 62.1% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR CONSEQUENT CHORD

     Cons  I      bII    II     bIII   III    IV     #IV    V      bVI    VI     bVII   VII
Ant  I     0.000  0.412  0.326  0.125  0.211  0.703  0.125  0.601  0.440  0.405  0.500  0.559
     bII   0.011  0.000  0.084  0.000  0.000  0.000  0.062  0.010  0.000  0.000  0.000  0.029
     II    0.082  0.176  0.000  0.125  0.211  0.016  0.438  0.233  0.080  0.190  0.000  0.176
     bIII  0.004  0.059  0.000  0.000  0.000  0.000  0.000  0.021  0.160  0.000  0.000  0.000
     III   0.004  0.000  0.021  0.000  0.000  0.109  0.000  0.005  0.000  0.167  0.000  0.029
     IV    0.119  0.118  0.105  0.000  0.211  0.000  0.188  0.057  0.000  0.024  0.167  0.118
     #IV   0.026  0.000  0.000  0.000  0.000  0.000  0.000  0.047  0.000  0.000  0.000  0.000
     V     0.621  0.000  0.084  0.125  0.105  0.062  0.000  0.000  0.280  0.143  0.000  0.059
     bVI   0.019  0.118  0.084  0.000  0.053  0.047  0.000  0.010  0.000  0.071  0.333  0.000
     VI    0.015  0.118  0.295  0.000  0.053  0.062  0.125  0.005  0.000  0.000  0.000  0.029
     bVII  0.000  0.000  0.000  0.625  0.000  0.000  0.000  0.005  0.000  0.000  0.000  0.000
     VII   0.100  0.000  0.000  0.000  0.158  0.000  0.062  0.005  0.040  0.000  0.000  0.000

Next, the same data is shown as a proportion of the total count for the antecedent chord, so that each row sums to 1. For example, I moves to V 45.3% of the time.

CHROMATIC ROOT TRANSITIONS AS PROPORTION OF COUNT FOR ANTECEDENT CHORD

     Cons  I      bII    II     bIII   III    IV     #IV    V      bVI    VI     bVII   VII
Ant  I     0.000  0.027  0.121  0.004  0.016  0.176  0.008  0.453  0.043  0.066  0.012  0.074
     bII   0.200  0.000  0.533  0.000  0.000  0.000  0.067  0.133  0.000  0.000  0.000  0.067
     II    0.222  0.030  0.000  0.010  0.040  0.010  0.071  0.455  0.020  0.081  0.000  0.061
     bIII  0.100  0.100  0.000  0.000  0.000  0.000  0.000  0.400  0.400  0.000  0.000  0.000
     III   0.053  0.000  0.105  0.000  0.000  0.368  0.000  0.053  0.000  0.368  0.000  0.053
     IV    0.471  0.029  0.147  0.000  0.059  0.000  0.044  0.162  0.000  0.015  0.015  0.059
     #IV   0.438  0.000  0.000  0.000  0.000  0.000  0.000  0.562  0.000  0.000  0.000  0.000
     V     0.848  0.000  0.041  0.005  0.010  0.020  0.000  0.000  0.036  0.030  0.000  0.010
     bVI   0.192  0.077  0.308  0.000  0.038  0.115  0.000  0.077  0.000  0.115  0.077  0.000
     VI    0.093  0.047  0.651  0.000  0.023  0.093  0.047  0.023  0.000  0.000  0.000  0.023
     bVII  0.000  0.000  0.000  0.833  0.000  0.000  0.000  0.167  0.000  0.000  0.000  0.000
     VII   0.818  0.000  0.000  0.000  0.091  0.000  0.030  0.030  0.030  0.000  0.000  0.000
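Both normalized tables follow from the table of transition counts: divide each cell by its column total for the consequent view, or by its row total for the antecedent view. The Python sketch below redoes that arithmetic on the published counts. The function names are hypothetical, and this is not the tally.pl implementation.

```python
ROOTS = "I bII II bIII III IV #IV V bVI VI bVII VII".split()

# Transition counts from the table above (rows = antecedent, columns = consequent).
COUNTS = """\
0 7 31 1 4 45 2 116 11 17 3 19
3 0 8 0 0 0 1 2 0 0 0 1
22 3 0 1 4 1 7 45 2 8 0 6
1 1 0 0 0 0 0 4 4 0 0 0
1 0 2 0 0 7 0 1 0 7 0 1
32 2 10 0 4 0 3 11 0 1 1 4
7 0 0 0 0 0 0 9 0 0 0 0
167 0 8 1 2 4 0 0 7 6 0 2
5 2 8 0 1 3 0 2 0 3 2 0
4 2 28 0 1 4 2 1 0 0 0 1
0 0 0 5 0 0 0 1 0 0 0 0
27 0 0 0 3 0 1 1 1 0 0 0
"""
matrix = [[int(x) for x in row.split()] for row in COUNTS.splitlines()]


def by_antecedent(m):
    """Divide each cell by its row total, so each row sums to 1."""
    out = []
    for row in m:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def by_consequent(m):
    """Divide each cell by its column total, so each column sums to 1."""
    col_totals = [sum(col) for col in zip(*m)]
    return [[c / t if t else 0.0 for c, t in zip(row, col_totals)]
            for row in m]


row_prop = by_antecedent(matrix)
col_prop = by_consequent(matrix)

i, v = ROOTS.index("I"), ROOTS.index("V")
approached_by_v = round(col_prop[v][i], 3)  # share of arrivals on I that come from V
i_moves_to_v = round(row_prop[i][v], 3)     # share of departures from I that go to V
```

The two views answer different questions: the column view shows how a chord is typically approached, the row view shows where it typically goes.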

As a final analysis, we consider the counts of different root interval motions. The left column below shows each chromatic interval (+m2 = ascending minor second, +M2 = ascending major second, etc.) along with its count. The right column groups these into diatonic intervals. (Each interval is represented by its smallest possible form; so a descending fifth is represented as an ascending fourth, +P4.)

INTERVAL COUNTS

Chromatic     Diatonic
+m2   72      +M/m2  127
+M2   55
+m3   7       +M/m3  32
+M3   25
+P4   308     +P4    308
-TT   25      TT     25
-P4   167     -P4    167
-M3   21      -M/m3  64
-m3   43
-M2   34      -M/m2  65
-m2   31
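The interval counts can be recovered from the original transition-count table by taking the difference between consequent and antecedent roots modulo 12 and naming it in its smallest form. The Python sketch below does this; it is a reconstruction of the arithmetic, not the tally.pl code, and the dictionary names are hypothetical.

```python
from collections import Counter

# Original transition counts (rows = antecedent, columns = consequent),
# in the root order I bII II bIII III IV #IV V bVI VI bVII VII.
COUNTS = """\
0 7 31 1 4 45 2 116 11 17 3 19
3 0 8 0 0 0 1 2 0 0 0 1
22 3 0 1 4 1 7 45 2 8 0 6
1 1 0 0 0 0 0 4 4 0 0 0
1 0 2 0 0 7 0 1 0 7 0 1
32 2 10 0 4 0 3 11 0 1 1 4
7 0 0 0 0 0 0 9 0 0 0 0
167 0 8 1 2 4 0 0 7 6 0 2
5 2 8 0 1 3 0 2 0 3 2 0
4 2 28 0 1 4 2 1 0 0 0 1
0 0 0 5 0 0 0 1 0 0 0 0
27 0 0 0 3 0 1 1 1 0 0 0
"""
matrix = [[int(x) for x in row.split()] for row in COUNTS.splitlines()]

# Ascending semitone distance -> interval name in its smallest form.
# Seven semitones up (an ascending fifth) is written as a descending fourth.
CHROMATIC = {1: "+m2", 2: "+M2", 3: "+m3", 4: "+M3", 5: "+P4", 6: "TT",
             7: "-P4", 8: "-M3", 9: "-m3", 10: "-M2", 11: "-m2"}
DIATONIC = {"+m2": "+M/m2", "+M2": "+M/m2", "+m3": "+M/m3", "+M3": "+M/m3",
            "+P4": "+P4", "TT": "TT", "-P4": "-P4", "-M3": "-M/m3",
            "-m3": "-M/m3", "-M2": "-M/m2", "-m2": "-M/m2"}

chromatic_counts = Counter()
diatonic_counts = Counter()
for ante, row in enumerate(matrix):
    for cons, n in enumerate(row):
        if ante == cons:
            continue  # the diagonal is empty: no chord moves to itself
        name = CHROMATIC[(cons - ante) % 12]
        chromatic_counts[name] += n
        diatonic_counts[DIATONIC[name]] += n
```

For instance, V to I is root 7 to root 0, a difference of 5 semitones modulo 12, so it is counted under +P4 (a descending fifth written as an ascending fourth).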

Discussion

To a considerable extent, the conventional rules of harmony are supported by this data. This is perhaps most clearly seen in the table of root transition counts. The most common root motions, in order, are V-I, I-V, ii-V, and I-IV (the last two are equally common). All of these are standard, "correct" progressions of tonal harmony. "Incorrect" progressions such as V-IV are generally less common.

A few things are surprising. In particular, the frequencies of ii-I and IV-I are surprisingly high. Both of these represent "predominant-to-tonic" motions and are generally considered undesirable. IV-I progressions do occur in certain circumstances (such as plagal cadences and I-IV-I motions expanding an opening I) but their frequency here seems high. This appears to be largely due to cadential 6/4 chords; this is discussed further below.

The interval counts are also of interest. Traditional theory holds that certain intervallic root motions are preferred over others: descending fifths are most preferred (strongly favored over ascending fifths), descending thirds over ascending thirds, and ascending seconds over descending seconds. This data clearly shows all three of these preferences: descending fifths (+P4, 308) are much more common than ascending fifths (-P4, 167), descending thirds (64) are more common than ascending (32), and ascending seconds (127) are more common than descending (65). Overall, fourths are by far the most common (475); seconds (192) are much more common than thirds (96), and tritones least common of all (25).

Aggregate Statistics (with Cadential 6/4's Reanalyzed)

A close inspection of the data revealed that the oddities noted above -- the high frequency of ii-I and IV-I -- were largely due to cadential 6/4 chords. Cadential 6/4's, which are extremely common in the KP corpus (and in common-practice music generally), are analyzed in the Kostka-Payne text in a "two-level" fashion: A I6/4-V is placed inside a larger V. (This is in fact a common convention; under this convention, the cadential 6/4 is labeled as V6/4.) The encoding of the data by Pardo reflected the lower level (I6/4-V), and the data presented above reflects that as well. However, cadential 6/4's are frequently (indeed normally) preceded by II or IV; thus it seemed likely that this largely accounted for the high frequency of II-I and IV-I motions. I thought that using the "V6/4" analysis might permit the conventional principles of tonal harmony to emerge more strongly. (This is surely one reason why many people prefer the V6/4 analysis.)

The data was therefore recoded, using the higher-level (V) analysis of cadential 6/4's. That is, every two chord statements representing a cadential I6/4 followed by a V were replaced by a single statement representing V. The modified chord-list is kp-chord-list-2. Consider just the transition table:

     Cons  I     bII   II    bIII  III   IV    #IV   V     bVI   VI    bVII  VII
Ant  I     0     7     31    1     4     45    2     84    11    17    3     19
     bII   2     0     8     0     0     0     1     3     0     0     0     1
     II    5     3     0     1     4     1     7     62    2     8     0     6
     bIII  1     1     0     0     0     0     0     4     4     0     0     0
     III   1     0     2     0     0     7     0     1     0     7     0     1
     IV    27    2     10    0     4     0     3     16    0     1     1     4
     #IV   3     0     0     0     0     0     0     13    0     0     0     0
     V     166   0     8     1     2     4     0     0     7     6     0     2
     bVI   3     2     8     0     1     3     0     4     0     3     2     0
     VI    4     2     28    0     1     4     2     1     0     0     0     1
     bVII  0     0     0     5     0     0     0     1     0     0     0     0
     VII   26    0     0     0     3     0     1     2     1     0     0     0
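The recoding step can be pictured with the following Python sketch. It is an illustration only. The chord-list format carries no inversion information, so the sketch assumes each segment comes with a hypothetical flag marking it as a cadential 6/4; how the actual recoding identified these chords is not described here. Fusing the result with an immediately preceding V is also an assumption, suggested by the empty diagonal of the table. The input segments are invented.

```python
V = 7  # chromatic relative root of the dominant


def merge_cadential_64(segments):
    """Replace each flagged I6/4 followed by V with a single V segment.

    Each input segment is (start, end, chromatic_root, is_cadential_64);
    the flag is hypothetical. Output segments are (start, end, chromatic_root).
    """
    merged = []
    i = 0
    while i < len(segments):
        start, end, root, is_cad64 = segments[i]
        followed_by_v = i + 1 < len(segments) and segments[i + 1][2] == V
        if is_cad64 and followed_by_v:
            end = segments[i + 1][1]          # span the 6/4 and the V together
            i += 2                            # consume both statements
            if merged and merged[-1][2] == V:
                # Assumption: fuse with a directly preceding V.
                merged[-1] = (merged[-1][0], end, V)
            else:
                merged.append((start, end, V))
        else:
            merged.append((start, end, root))
            i += 1
    return merged


# Invented example: II, cadential I6/4, V, I.
example = [(0.0, 1.0, 2, False), (1.0, 2.0, 0, True),
           (2.0, 3.0, 7, False), (3.0, 4.0, 0, False)]
recoded = merge_cadential_64(example)
```

In this example the II-I and I-V transitions of the lower-level reading become a single II-V transition, which is the kind of shift visible between the two transition tables.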


The recoding of cadential 6/4's has a significant effect. The count of II-I is reduced from 22 to 5; the count of IV-I is reduced from 32 to 27. The top 10 transitions are now V-I; I-V; II-V; I-IV; I-II; VI-II; IV-I; VII-I; I-VII; I-VI.
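The ranking can be checked directly against the recoded transition table. The Python sketch below sorts the cells of that table by count; it is a verification of the published figures rather than part of the original tooling.

```python
ROOTS = "I bII II bIII III IV #IV V bVI VI bVII VII".split()

# Recoded transition counts (rows = antecedent, columns = consequent).
RECODED = """\
0 7 31 1 4 45 2 84 11 17 3 19
2 0 8 0 0 0 1 3 0 0 0 1
5 3 0 1 4 1 7 62 2 8 0 6
1 1 0 0 0 0 0 4 4 0 0 0
1 0 2 0 0 7 0 1 0 7 0 1
27 2 10 0 4 0 3 16 0 1 1 4
3 0 0 0 0 0 0 13 0 0 0 0
166 0 8 1 2 4 0 0 7 6 0 2
3 2 8 0 1 3 0 4 0 3 2 0
4 2 28 0 1 4 2 1 0 0 0 1
0 0 0 5 0 0 0 1 0 0 0 0
26 0 0 0 3 0 1 2 1 0 0 0
"""
matrix = [[int(x) for x in row.split()] for row in RECODED.splitlines()]

# Flatten the table into (count, "antecedent-consequent") pairs.
pairs = [(n, f"{ROOTS[a]}-{ROOTS[c]}")
         for a, row in enumerate(matrix)
         for c, n in enumerate(row)]

# Sort by count, largest first, and keep the ten most frequent transitions.
top10 = [label for n, label in sorted(pairs, key=lambda p: -p[0])[:10]]
print("; ".join(top10))
```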

Once the "V6/4" analysis of cadential 6/4's is assumed, the conventional principles of tonal harmony appear to be very strongly confirmed. Not a very earth-shattering conclusion (which is why I decided to put this in a web page rather than trying to publish it!) but I think it's good to know.

A number of other comments could be made about this data. For example, compare the transitional frequency of IV-II (10) to II-IV (1); IV-II is much more common, again confirming a conventional rule. But I will leave further explorations to the reader. The reader could also use tally.pl to reproduce these statistics, and to gather further statistics from the chord lists provided -- for example, analyzing major and minor key sections separately. (In fact, the differences between the major and minor key distributions are fairly modest. Perhaps this should not surprise us, since the primary tonic/dominant/predominant harmonies - I, V, II, IV - are the same in both modes, and function similarly.)

Notes

1. A few sources deserve mention. Helen Budge's (1943) dissertation, "A Study of Chord Frequencies Based on the Music of Representative Composers of the Eighteenth and Nineteenth Centuries," presents an interesting statistical analysis of tonal harmony, systematically gathered from analyses by experts. But only data on the frequency of individual (diatonic) chords is provided; there is no data about transitions (motions from chord to chord). Allen Irvine McHose's (1947) study "The Contrapuntal Harmonic Technique of the 18th Century" offers occasional statistics about the frequency of various chords and progressions, but presents no complete data (such as tables of chord or progression frequencies). Philip Norman's 1945 study "A Quantitative Study of Harmonic Similarities in Certain Specified Works of Bach, Beethoven, and Wagner" has statistics about chord progressions, but he assumes a new chord on every note - that is, he makes no allowance for non-chord-tones; this goes against the modern practice of harmonic analysis. Dmitri Tymoczko's paper "Root Motion, Function, Scale Degree" (Musurgia 2005, available in English at Tymoczko's website) analyzes a set of progressions from major-key Bach chorales. Finally, David Huron, in his book Sweet Anticipation (2006), presents data about chord transitions for "a sample of Baroque music" (pp. 250-1; no further information is given about the sample).

2. The mftext program (available at the Melisma website) can be used to extract the chord labels from Pardo's midifiles. While I have not analyzed the labels in detail with regard to mode and inversion, I did extract a few basic statistics. There are 949 chord labels total (this is slightly greater than my count, since in Pardo's annotations, there may be two chords of the same root and key in succession). Chords built on major triads (including seventh chords that contain major triads, e.g. dominant sevenths) are 68.3% of the total; those built on minor triads, 21.2%; those built on diminished triads, 9.9%. Root-position chords are 60.7% of the total; first-inversion, 23.3%; second inversion, 12.9%; third inversion, 3.1%.

Downloads

List of excerpts in the Kostka-Payne corpus

kp-nbck This directory contains "note-beat-chord-key" files for all excerpts in the corpus: A list of notes ("Note [ontime] [offtime] [pitch]"), beats ("Beat [time] [level]"), chords ("Chord [ontime] [offtime] [root]") and key sections ("Key [start time] [end time] [tonic] [mode:ma=0,mi=1]"). I made these as an intermediate step towards making the "chord-lists" below. These files bring together the "beat list" and "note list" formats that I used with the Melisma system (see the Melisma website for explanation) with the harmonic and key information from the Kostka-Payne analyses.

Chord list (list of chord statements) for the KP corpus


Chord list for the KP corpus, major key sections only

Chord list for the KP corpus, minor key sections only

Chord list for the KP corpus with the "V6/4" analysis of cadential 6/4 chords

The "V6/4" chord-list, major-key sections only

The "V6/4" chord-list, minor-key sections only

tally.pl, a perl script for extracting aggregate data from chord lists. (The tables presented above are all outputs of tally.pl.)
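For readers who want to work with the "note-beat-chord-key" files listed above, the four line types could be read with a Python sketch like the one below. The field names follow the description of the kp-nbck directory; everything else is an assumption, including the numeric types, the time units, and the sample values, which are invented. The function name is hypothetical.

```python
# Field names and assumed types for each line type in a note-beat-chord-key file.
FIELDS = {
    "Note": (("ontime", float), ("offtime", float), ("pitch", int)),
    "Beat": (("time", float), ("level", int)),
    "Chord": (("ontime", float), ("offtime", float), ("root", int)),
    "Key": (("start", float), ("end", float), ("tonic", int), ("mode", int)),
}


def parse_nbck_line(line):
    """Parse one line into a dict, or return None for unrecognized lines."""
    parts = line.split()
    if not parts or parts[0] not in FIELDS:
        return None
    kind, values = parts[0], parts[1:]
    spec = FIELDS[kind]
    if len(values) != len(spec):
        raise ValueError(f"unexpected field count in {kind} line: {line!r}")
    record = {"kind": kind}
    for (name, convert), value in zip(spec, values):
        record[name] = convert(value)
    if kind == "Key":
        # mode: ma = 0, mi = 1, as stated in the file description
        record["mode"] = "major" if record["mode"] == 0 else "minor"
    return record


# Invented sample lines, for illustration only.
sample = ["Note 0 500 67", "Beat 0 2", "Chord 0 2608 7", "Key 0 6521 7 0"]
records = [parse_nbck_line(line) for line in sample]
```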