Journal of New Music Research TempEst: Harpsichord ...simond/pub/2010/Tidhar-Fazekas-Mauc… ·...

11
PLEASE SCROLL DOWN FOR ARTICLE This article was downloaded by: [Dixon, Simon] On: 25 February 2011 Access details: Access Details: [subscription number 933744037] Publisher Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37- 41 Mortimer Street, London W1T 3JH, UK Journal of New Music Research Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713817838 TempEst: Harpsichord Temperament Estimation in a Semantic Web Environment Dan Tidhar a ; György Fazekas a ; Matthias Mauch b ; Simon Dixon a a Queen Mary University of London, UK b National Institute for Advanced Industrial Science and Technology, Japan Online publication date: 19 February 2011 To cite this Article Tidhar, Dan , Fazekas, György , Mauch, Matthias and Dixon, Simon(2010) 'TempEst: Harpsichord Temperament Estimation in a Semantic Web Environment', Journal of New Music Research, 39: 4, 327 — 336 To link to this Article: DOI: 10.1080/09298215.2010.520720 URL: http://dx.doi.org/10.1080/09298215.2010.520720 Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Transcript of Journal of New Music Research TempEst: Harpsichord ...simond/pub/2010/Tidhar-Fazekas-Mauc… ·...

PLEASE SCROLL DOWN FOR ARTICLE

This article was downloaded by: [Dixon, Simon]On: 25 February 2011Access details: Access Details: [subscription number 933744037]Publisher RoutledgeInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of New Music ResearchPublication details, including instructions for authors and subscription information:http://www.informaworld.com/smpp/title~content=t713817838

TempEst: Harpsichord Temperament Estimation in a Semantic WebEnvironmentDan Tidhara; György Fazekasa; Matthias Mauchb; Simon Dixona

a Queen Mary University of London, UK b National Institute for Advanced Industrial Science andTechnology, Japan

Online publication date: 19 February 2011

To cite this Article Tidhar, Dan , Fazekas, György , Mauch, Matthias and Dixon, Simon(2010) 'TempEst: HarpsichordTemperament Estimation in a Semantic Web Environment', Journal of New Music Research, 39: 4, 327 — 336To link to this Article: DOI: 10.1080/09298215.2010.520720URL: http://dx.doi.org/10.1080/09298215.2010.520720

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial orsystematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply ordistribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contentswill be complete or accurate or up to date. The accuracy of any instructions, formulae and drug dosesshould be independently verified with primary sources. The publisher shall not be liable for any loss,actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directlyor indirectly in connection with or arising out of the use of this material.

TempEst: Harpsichord Temperament Estimation in a Semantic Web

Environment

Dan Tidhar1, Gyorgy Fazekas1, Matthias Mauch2, and Simon Dixon1

1Queen Mary University of London, UK; 2National Institute for Advanced Industrial Science and Technology, Japan

Abstract

Issues concerning tuning and temperament bear relevanceto music research in areas such as historical musicology,performance and recording studies, and music perception.We have recently demonstrated that it is possible toclassify keyboard temperament automatically from audiorecordings of standard musical works to the extent ofaccurately distinguishing between six different tempera-ments often used in harpsichord recordings.

The current paper extends this work by combiningdigital signal processing with semantic computing anddemonstrates the use of the temperament classifier in aSemantic Web environment. We present the Tempera-ment Ontology which models the main concepts,relationships, and parameters of musical temperament,and facilitates the description and inference of variouscharacteristics of specific temperaments. We then des-cribe TempEst, a web application for temperamentestimation. TempEst integrates the classifier with ontol-ogy-based information processing in order to provide anextensible online service, which reports the class andproperties of both known and unknown temperaments.TempEst allows users to upload harpsichord recordings,and provides them with an estimated temperament aswell as other inferred characteristics of the instrument’stuning.

1. Introduction

In recent years, historical performance practice of earlymusic on period instruments has become well-establishedas part of mainstream scholarship, musicianship, as wellas public music consumption. As part of this process, the

hegemony of equal temperament as the main paradigmfor tuning keyboard instruments has been weakened, andincreasing attention is being directed towards historical,unequal temperaments (e.g. Lehman, 2005). Algorithmsfor music visualization and synthesis using arbitrarytemperaments are available in software such as Scala,1

however, the idea of analysing and extracting tempera-ment from audio recordings has not yet been explored insuch applications.

There are many ways in which automatic tempera-ment estimation can be useful to musicians, musicolo-gists, and listeners. For educational purposes such as eartraining and studying different temperaments in thecontext of real recordings, it would be useful to have amusic database which can perform retrieval according totemperament. Professional users such as keyboard tunersand performers would be assisted by a system providingfeedback about tuning accuracy and stability, as well ashelping to classify creative temperaments (i.e. thosewhich do not strictly adhere to any known recipe) anddetermine some of their properties. Furthermore, musi-cologists are likely to find a temperament estimator to bea useful tool when studying performance practice fromrecordings. Advances in music signal processing haveenabled the automation of many aspects of the analysisof music recordings. In particular, our recent work onkeyboard temperament estimation (Tidhar, Mauch, &Dixon, 2010) shows that it is possible to design anautomatic system to recognize temperament directlyfrom audio recordings of unknown works. We presenteda classifier which is capable of distinguishing, withhigh accuracy, between six different temperaments

Correspondence: Dan Tidhar, Centre for Digital Music, Queen Mary University of London, London, E1 4NS, UK.E-mail: [email protected]

1See Scala home page: http://www.huygens-fokker.org/scala

Journal of New Music Research2010, Vol. 39, No. 4, pp. 327–335

DOI: 10.1080/09298215.2010.520720 � 2010 Taylor & Francis

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

commonly used in harpsichord recordings. Although theclassification task did not rely on any prior knowledge ofthe score, it did assume that the harpsichord was tuned toone of a pre-defined set of six temperaments. Thislimitation is reasonable in an experimental setting whichis, to the best of our knowledge, the first of its kind.However, in order to obtain an open-ended system whichcan be used for a public temperament estimation webservice, a more general classifier is required, one which isable to cope with both standard and non-standard (i.e.unknown to the system) temperaments. The OMRAS2project2 provides a framework for describing andinterlinking music related concepts and resources onthe Semantic Web, including applications for web-basedmusic signal processing. This framework offers an idealcontext for designing and implementing such a general-ized, open-ended classifier. In the OMRAS2 framework,we employ Semantic Web technologies. We representinformation about music using standardized languagessuch as Resource Description Framework (RDF)3 andstructures having standardized semantics. For a thor-ough review of these technologies, languages and theirapplications in the music domain see Fazekas, Raimond,Jacobson, and Sandler (2010).

Central to our framework is the Music Ontology(Raimond, Abdallah, Sandler, & Frederick, 2007), whichprovides a basis for communicating musical informationon the Web. It is designed to be extensible in order todescribe specific sub-domains of musical knowledge suchas acoustical features of audio recordings or musicolo-gical terms. As part of the work reported here, weprovide one such extension, namely, the TemperamentOntology.4 This ontology provides means for describingthe properties of temperaments and the relationshipsbetween temperaments and temperament families. Theontology is accompanied by a web service which providesdescriptions of an increasing set of individual tempera-ments. For example, the RDF description of the Vallottitemperament may be accessed at the following URL:http://purl.org/temperament/symbol/Vallotti.

Based on our previous work on temperament estima-tion, the Temperament Ontology, and various otherOMRAS2 components, we developed a web applicationcalled TempEst. The application enables users to uploada recording for classification, and receive results in RDFformat describing the identified temperament accordingto the Temperament Ontology. The scope of the classifier

is dynamically determined by the set of temperamentdescriptions provided by the data service using theontology at any given moment. TempEst includes asimple rule-based inference module which enables it toaccommodate temperaments not yet described using theontology, and to produce useful statements abouttemperaments even if results are inconclusive due topartial or missing data. A set of inference rules which aredescribed in the section TempEst inference below arecurrently in use.5 The temperament classification algo-rithm in TempEst is implemented using Vampy,6 aPython7 wrapper for the Vamp audio analysis pluginsystem (Cannam, 2009), and exposed on the Web as partof the Sonic Annotator Web Application (SAWA). Adetailed description of the SAWA system is provided inFazekas, Cannam, and Sandler (2009).

The remainder of this article is organized as follows.The rest of this section provides a brief introduction tothe subject matter of temperament and temperamentestimation. The following section presents the Tempera-ment Ontology, describing its design and giving examplesof how specific temperaments may be expressed accord-ing to the ontology. We then describe TempEst and itsvarious components, including the methods used to inferthe properties of arbitrary temperaments. The finalsection contains conclusions from the work and describesdirections for future work.

1.1 Temperament

Theoretical and practical aspects of temperament arecovered thoroughly elsewhere (Barbour, 2004/1951;Rasch, 2002; Di Veroli, 2009), so we shall provide onlya brief formulation of the problem here. Musicalconsonance has been explained in different theoreticalframeworks throughout the years (Helmholtz, 1863;Lundin, 1947; Terhardt, 1977; Sethares, 2004; Palisca &Moore, 2010), but at least since Pythagoras it has beengenerally accepted that for musical sounds with harmo-nic spectra the sensation of consonance correlates tosmall integer frequency ratios, and specifically super-particular ratios of the form (nþ 1)/n where n� 5.Continuous-pitch instruments as well as singers candynamically adapt their intonation to form perfectlyconsonant intervals if required. Fixed-pitch instrumentssuch as keyboard, some fretted, and some percussioninstruments, need to commit to a certain tuning schemefor the duration of a piece, if not an entire concert. Atleast in the Western musical tradition, this gives riseto the need for temperament, because one cannot2OMRAS2 web site: http://www.omras2.org

3RDF is a data model primarily used on the Semantic Web for

expressing simple statements in the form of subject-predicate-object. It has several different serialization formats such asXML/RDF and N3. See http://www.w3.org/TR/rdf-primer/for an introduction.4http://purl.org/ontology/temperament

5Due to the dynamic nature of the system, further rules may beadded to reflect accumulated experience.6VamPy is available from http://vamp-plugins.org/vampy. html7See www.python.org for details on Python, an increasingly

popular language in the scientific community.

328 Dan Tidhar et al.

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

accommodate all pure intervals within an integer numberof octaves (see e.g. Schroeder, 2009).

Consider 12 fifth steps (each being seven semitones; apure fifth has a frequency ratio of 3/2) and seven octavesteps (each 12 semitones; frequency ratio 2/1). On akeyboard instrument both of these sequences of intervalslead to the same key, despite the fact that (3/2)12 4 27.The ratio between the two sides of this inequality isreferred to as the Pythagorean comma, and is roughlyequal to 53/52. (Another relevant comma is the ratiobetween four consecutive fifths and two octaves and amajor third, i.e. between (3/2)4 and (226 (5/4)), which iscalled the Syntonic comma, and is equal to 81/80. Forpractical tuning purposes, these commas are oftenconflated and are just referred to as ‘the comma’.)

One way of defining particular temperaments isaccording to the distribution of the comma along thecircle of fifths. Equal temperament, for example, can bedefined as such where each fifth on the circle isdiminished by exactly the same amount, i.e. 1/12 of a(Pythagorean) comma. Determining a temperament canbe regarded as an optimization problem, wherebykeeping the octaves pure (i.e. ‘closing’ the circle of fifths)is a major constraint, and various considerations lead todifferent compromises between pureness of fifths andpureness of major thirds. Among these considerations,are the key, or set of keys which should work well in thegiven temperament; a temperament is considered to‘work well’ for a given key if the most frequent harmonicintervals in the key (major thirds and to some extentfifths, most notably in tonic and dominant positions) arenot too far from their pure underlying frequency ratios,and are therefore perceived as consonant. Temperamentswhich work well for most keys, and are bearable in allkeys, are normally referred to as ‘well’ temperaments.Temperaments in which all fifths but one or two areequal to each other, are called regular temperaments.Equal temperament is the only temperament which isboth regular and well.

Next, we briefly describe the six temperamentsspecified in Table 1, namely, twelve-tone equal tempera-ment, Vallotti, fifth-comma, quarter-comma meantone,sixth-comma meantone, and just intonation. In equaltemperament, all 12 fifths are equally diminished by 1/12of a (Pythagorean) comma, resulting in 12 equalsemitones, each of frequency ratio 12

ffiffiffi

2p

. In a Vallottitemperament, six of the fifths are diminished by 1/6 of acomma each, and the other six fifths are left pure. Equaltemperament and Vallotti are graphically described inFigure 1. In the fifth-comma temperament we used, fiveof the fifths are diminished by a 1/5 comma each, and theremaining seven are pure. In a quarter-comma meantonetemperament, 11 of the fifths are shrunk by 1/4 of acomma, and the one remaining fifth is 7/4 of a commalarger than pure. In sixth-comma meantone, 11 fifths areshrunk by 1/6 of a comma, and the one remaining fifth is

5/6 comma larger than pure. In the just-intonationtemperament we used all tones are calculated as integerratios, given by the following vector which representstwelve chromatic tones above the reference A: (16/15,9/8, 6/5, 5/4, 4/3, 45/32, 3/2, 8/5, 5/3, 9/5, 15/8, 2/1). Thedeviations in cents8 from equal temperament of the fiveother temperaments we use are given in Table 1. Apartfrom being relatively common, this set of six tempera-ments represents different categories: equal temperamentis both well and regular, just intonation is neither wellnor regular, Vallotti and fifth-comma are well andirregular, and the two variants of meantone (quarterand sixth comma) are regular but not well.

1.2 Temperament estimation

Temperament estimation was first formulated as amusic signal processing task by Tidhar et al. (2010).Building a system to classify musical recordings bytemperament presents particular signal processing chal-lenges. First, the differences between temperaments aresmall, of the order of a few cents. For example, ifA¼ 415 Hz is used as the reference pitch, then middle Cmight have a frequency of 246.76, 247.46, 247.60,247.93, 248.23 or 248.99 Hz, based on the six exampletemperaments described above (see Table 1). To resolvethese frequencies in a spectrum, a window of severalseconds duration would be required, but this introducesother problems, since musical notes are not stationaryand generally do not last this long.

Table 1. Deviations (in cents) from equal temperament for the

six target temperaments we used in Tidhar et al. (2010). Theabbreviated temperament names below stand for: equaltemperament, Vallotti, fifth comma, quarter-comma meantone,

sixth-comma meantone, and just intonation.

Note Equal Vallotti 1/5 C 1/4 CMT 1/6 CMT Just

C 0 5.9 8.2 10.3 4.9 15.6C#/D[ 0 0 71.6 27.4 13.0 713.7D 0 2 2.7 3.4 1.6 72

D#/E[ 0 3.9 2.3 20.5 9.8 79.8E 0 72.0 2.0 73.4 71.6 2.0F 0 7.8 6.3 13.7 6.5 13.7

F#/G[ 0 72.0 73.5 710.3 74.9 715.6G 0 3.9 5.5 6.8 3.3 17.6G#/A[ 0 2.0 0.4 24.0 11.4 711.7

A 0 0 0 0 0 0A#/B[ 0 5.9 4.3 17.1 8.1 11.7B 0 73.9 70.8 76.8 73.3 3.9

8A cent is one hundredth of a semitone, i.e. one twelve-

hundredth of an octave.

TempEst: harpsichord temperament estimation in a Semantic Web environment 329

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

The second problem is that in musical recordings,notes rarely occur in isolation. There are almost alwaysmultiple notes sounding simultaneously, and this has thepotential to bias any frequency estimates. To makematters worse, the intervals which are often favoured inmusic are those where the partials coincide. In particular,it can be difficult to discern whether a spectral peak is afundamental frequency or a partial of another funda-mental. The ability to distinguish between these cases iscrucial to accurate pitch estimation and thus also tosuccessful temperament classification.

For example, if an A2 at 110 Hz is played, thespectrum will also have a peak at 330 Hz, which is thethird harmonic of the note A2, but also (if the fifth ispure) the fundamental frequency of an E4. In manytemperaments however, the actual note E4 will have afrequency different from 330 Hz (e.g. 329.6 Hz in equaltemperament), so the estimation of E4 would be biased ifthis partial was used in its estimation. We thereforerequire a method for distinguishing peaks correspondingto fundamental frequencies from those which are causedby higher harmonics. This would not be the case if we didassume knowledge of the score, but this was deliberatelyavoided in our original formulation of the problem, inorder to increase generality, which is important forpractical applications. In the context of a web servicesuch as TempEst, the benefits of not requiring anyknowledge of the score are apparent.

To avoid the bias caused by overlapping partials,while circumventing the problem of full polyphonictranscription,9 we introduced the concept of conservativetranscription (Tidhar et al., 2010), which entails ignoringfrequencies which could potentially be harmonics oflower co-occurring frequencies. We showed that con-servative transcription is applicable in practical situa-tions, and that it can improve temperament estimation inrecordings of typical harpsichord music. Figure 2demonstrates the result of conservative transcription on

the first two bars of J.S. Bach’s prelude in C Major fromthe first volume of the Well-Tempered Clavier, BWV 846.

Using various pitch estimation algorithms, weobtained satisfactory results (23 out of 24 real harpsi-chord recordings and 24 out of 24 sythesized harpsichordrecordings were classified correctly). When appliedwith suitable parameters and enhanced by methodssuch as conservative transcription, several differentalgorithms performed well with only minor differences.For our current implementation of TempEst we use theQuadratically Interpolated Fast Fourier Transform(QIFFT) (Smith, 2008).

2. The temperament ontology

In order to formally express individual temperaments, aswell as temperament classes and properties, and in orderto facilitate reasoning and inference, we developed theTemperament Ontology, which extends and comple-ments the Music Ontology. The ontology is publiclyavailable online.10

The Temperament Ontology is designed to accom-modate extensions and modifications by the user com-munity, so that additional individual temperaments aswell as richer details and properties can be added as theontology develops. The current ontology is admittedlybiased toward historical temperaments, but this bias maychange according to the user community’s interests.

2.1 Ontology design

Temperaments can be classified according to differentcriteria. In order to account for all of these, and toaccommodate additional criteria in the future, our modelconsists of a shallow hierarchy, in which all classes aresubclasses of Temperament (see Figure 3). Whilehierarchical relations are kept to a minimum due topossible ambiguities and in order to facilitate extensi-bility, more complex relations can be expressed, forexample, using multiple class memberships.

Since there is more than one way to associate tuningsystems with their properties, we treat temperamentdescriptions as concepts as well, and use reification11

which keeps the model open and extensible. TheTemperament Description class currently allowstwo alternative temperament specification methods.Individual temperaments can be assigned with either orwith both, and additional specification methods can beeasily accommodated.

This design allows a high degree of expressiveflexibility, and is easily extensible. However, this

121

121

121

121

121

121

121

121

121

121

121

121

A

D

GC

F

Bb

Eb

G#

C#F#

B

E

61

61

61

61

61

61

A

D

GC

F

Bb

Eb

G#

C#F#

B

E

Fig. 1. Circle of fifths representations for Equal Temperament(left) and Vallotti (right). The fractions specify the distribution

of the Pythagorean comma between the fifths (fifths without afraction are pure).

9Polyphonic transcription is still regarded an unsolved

problem (Casey et al., 2008; Klapuri, 2009).

10http://purl.org/ontology/temperament11In this context: representing a relation by an object in order to

reason about the relation itself.

330 Dan Tidhar et al.

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

structure does not enforce semantic coherence, and it isup to the modellers to avoid contradictions such as asingle individual being defined as a Well Temperament(i.e. a temperament that works well for all keys) and as aRestricted Temperament (i.e. a temperament that doesnot) at the same time.

2.2 Temperament descriptions

We model two forms of temperament specificationswhich are subclasses of the Temperament Descriptionclass, reflecting two of the commonly used methods.Other methods can easily be accommodated as addi-tional subclasses (see Figure 3). The two methods aredeviations (in cents) from equal temperament (seeTable 1), and deviations from pure fifths (frequencyratio 3:2) which are specified as signed fractions of acomma (see Figures 3 and 4). Various commas arecurrently modelled including the Pythagorean and theSyntonic ones, and others can be added.

Figure 4 shows a segment of an RDF graph describingVallotti using the temperament description typeCircleOfFifths. The graph expresses the fact thatin this particular temperament, the fifth from C to Gis diminished by one sixth of the Pythagorean comma.The description type CircleOfFifths encodesspecific temperaments in terms of the deviations ofeach fifth from a pure fifth, expressed as signed fractionsof the Pythagorean comma. In Listing 1 (in Section 3.3

below), we also provide a part of a serialized RDFdocument showing a more detailed example of describinga temperament using both supported methods.

2.3 Individual temperaments

Temperaments are modelled as individual instances of atleast one class of the Temperament class hierarchy(see Figure 3), and would normally inherit from morethan one subclass according to their properties. The

Fig. 2. The first two bars of J.S. Bach’s Prelude in C Major are shown in piano-roll notation (the horizontal axis represents time and

the vertical axis represents pitch). The actual notes are rendered in light grey, with the Conservative Transcription superimposed indark borders.

Fig. 3. The main classes of the Temperament Ontology, including some of the Temperament subclasses (right) and the parallel

Temperament Description class (left).

Fig. 4. A segment of a temperament description of type

CircleOfFifths for Vallotti.

TempEst: harpsichord temperament estimation in a Semantic Web environment 331

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

ontology currently includes individuals for many of thecommonly-used historical temperaments and others cannaturally be added as required.

3. TempEst

TempEst is a web application built on Semantic Webtechnologies. It enables users to upload harpsichordrecordings and retrieve information about the instru-ment’s tuning. TempEst integrates temperament estima-tion methods with temperament profiles expressed usingthe Temperament Ontology, and with some degree ofinference over the measurement data expressed in RDF.The system will be developed further as the usercommunity grows, as a result of further ontologydevelopment (e.g. additional temperaments) as well asfurther development of the inference rules.

3.1 System components

TempEst is constructed from a number of modularcomponents, combining software developed in theOMRAS2 project with some third-party libraries. Inthis section, we briefly describe the most important partsof the system, which allow easy exposure of the musicprocessing algorithms as a web application.

3.1.1 Vamp plugins and Vampy

Vamp is a plugin system for audio feature extractionusing a dedicated API (Cannam, 2009) developed inOMRAS2 for this purpose. Vamp plugins accept audioinput and produce structured data output as described inthe Cþþ SDKs provided for plugin and host develop-ment. A large number of Vamp plugins are available12

for audio analysis, ranging from low-level featureextractors to plugins computing high-level annotationssuch as structural segmentation.

Vampy provides an easy to use interface between thePython programming language and Vamp plugin hosts.Vampy itself is implemented as a Vamp plugin whichmay be installed in the usual manner.13 When it isinstalled, any appropriately structured Python scriptsfound in its script directory will be presented as ifthey were individual Vamp plugins for any Vamp host touse. Vampy allows for taking advantage of Python’shigh-level and efficient software libraries such asNumPy and SciPy for numerical computation and signalprocessing.

Vamp and Vampy plugins require a host environmentto run, for example Sonic Visualiser14 and SonicAnnotator.15 In the TempEst system, we implementedthe previously described temperament estimation algo-rithm as a Vampy plugin. Besides the ability to use thenumerical libraries available for Python, this dynamicenvironment enables us to generate an RDF descriptionfrom numerical results, and feed it into an inferenceengine such as Cwm16 (Closed World Machine), whichwas developed as part of the Semantic Web ApplicationPlatform (Berners-Lee et al., 2006).

3.1.2 Web application

TempEst is implemented as part of the SAWA17 frame-work. This framework uses Semantic Web technologies tooffer services to audio researchers and interested end-users. SAWA can be used to upload a set of audio files andperform feature extraction in batch mode using SonicAnnotator, its efficient underlying Vamp plugin host.SAWA uses its own web server based on the Cherrypy18

library. This allows for writing HTTP request handlers asordinary methods defined within a web application class,making it straightforward to accept audio files or dataentered into HTML forms, as well as serving dynamicallygenerated web pages and publishing data received fromother system components. A detailed description of thecomponents of SAWA is available in Fazekas et al. (2009).

The main SAWA application was designed for generalpurpose web-based audio feature extraction. However,this has been extended to incorporate other applicationswith similar needs such as SAWA-Recommender,19 aquery-by-example music retrieval application. TempEst20

is the latest addition to this framework. Using TempEst,one may upload harpsichord recordings for analysis, andreceive a detailed description of the temperament(s) usedin the recordings. This includes the temperamentidentified by the classifier, some known properties ofthe temperament resulting from inference on measure-ment data,21 and the measurements themselves on whichthe classification and inference are based. These resultsare expressed using the Temperament Ontology andpublished as RDF documents.

12Vamp plugins: http://vamp-plugins.org/download.html13See http://vamp-plugins.org/download.html#install

14Available from: http://sonicvisualiser.org/15Available from: http://omras2.org/SonicAnnotator16Available from: http://www.w3.org/2000/10/swap/doc/cwm.html17Available at http://www.isophonics.net/sawa/18Available at: http://www.cherrypy.org/19Available at: http://www.isophonics.net/sawa/rec20Available at: http://www.isophonics.net/sawa/tempest21The signal processing component of TempEst producesdeviations from equal temperament. This description format isconverted to the circle of fifths before inferring additional

properties of the temperament, such as regularity.

332 Dan Tidhar et al.

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

3.2 TempEst inference

In order to deal with a potentially unlimited number ofdifferent temperaments, the classification frameworkneeds to be extended beyond a particular set of pre-defined temperaments. By linking the classifier directlyto the ontology, its target set becomes extensible, sinceany adjustments or extensions to the ontology takeimmediate effect on the classification scope. At anygiven moment, however, this scope is inevitably limitedby the coverage of the ontology. The TempEst inferencemodule attempts to generate useful information aboutany given temperament, even if it is not covered by theontology. This is done in the form of a set ofimplication rules which deduce information fromheuristic measurements as described below. The inputto the rules, as well as the rules themselves, arecurrently regarded as proof of concept. One of ourresearch goals is to explore inference algorithms whichclosely reflect the open-ended structure of the ontology,such that for a given temperament, temperament classesare determined with each inference step as theidentification becomes more specific. It remains anopen question whether it is easier to identify a specificknown temperament and then derive its properties, orto determine these properties first and use them as abasis for identification.

For example, in the former approach, upon encoun-tering a Vallotti temperament, prior to identifying it asVallotti, the inference system would deduce its irregu-larity, its wellness, and the fact that it is a sixth-commatemperament, and only then determine its identity. Inthe latter approach (as implemented in the currentsystem), however, the identification is based on nearestneighbour classification, independently of any deducedcharacteristics. We currently implement two sets ofrules, based on heuristically defined thresholds asdescribed below.

3.2.1 Tolerance thresholds for pureness and equality

To facilitate the inference rules described below, wedetermine how close to 3:2 a frequency ratio needs to beto be regarded as a pure fifth, and how close twofrequency ratios need to be in order for them to beconsidered ‘equal’. We derive these heuristically from ouroriginal classification experiment, by checking which arethe most distant fifths still classified as equal to oneanother in any of the temperaments, and which fifths arefarthest from 3:2 but still classified as pure. Although theclassifier does not directly provide any information aboutfifths, this information is implied by the estimatedtemperament, e.g. in Vallotti, C#7G# is pure, F7Cand C7G are equal to one another, and so on. Thethresholds determined this way are used as defaults(currently set to 0.02 Pythagorean Comma or approxi-

mately 0.47 cents), which can be overridden by the userissuing the query.

3.2.2 Rules based on pure fifths

The following rules deduce temperament characteristicsfrom the number of fifths which are considered pure,i.e. fall within the vicinity of 3:2 defined by the above-mentioned threshold. The total number of pure fifthscan theoretically vary between 0 and 11 (12 is obviouslynot possible within the circle of fifths). Note that inorder for the output to be expressed in RDF, theconsequents have to be positively formulated, e.g.rather than concluding that a temperament is not awell temperament (in the third rule below), we use theclass of restricted temperaments modelled by theontology.

1. If any fifth is pure, then the temperament isunequal

2. If more than two fifths are pure, then the tempera-ment is irregular

3. If more than 8 fifths are pure, then the temperamentis restricted

4. If 11 fifths are pure, then the temperament isPythagorean

5. If exactly 6 fifths are pure, then the temperament is6th-comma-like

3.2.3 Rules based on sets of equal fifths

The following rules deduce temperament characteristicsfrom the largest set of fifths that are equal to each other,where equality is defined by the above-mentionedthreshold. The number of equal fifths can vary between0 and 12.

1. If all fifths are equal, then the temperament is equaltemperament

2. If at least 10 fifths are equal, then the temperament isregular

3. If less then 10 fifths are equal, then the temperamentis irregular

3.3 Query interface and system output

TempEst queries are based on audio material which isuploaded via the SAWA interface. The user is required tospecify an approximate tuning reference frequency.22

22The system is capable of fine-tuning the reference

frequency within a vicinity of + 40 cents, but requires areference to avoid ambiguities between, for example, a B[ inbaroque pitch and an A in modern pitch, as both canpotentially be associated with a frequency in the vicinity of

440 Hz.

TempEst: harpsichord temperament estimation in a Semantic Web environment 333

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

Users may optionally also specify alternative tolerancesfor determining which fifths are considered pure andwhich are considered equal by the system. Theseparameters then override the system’s defaults. The userinterface for configuring TempEst’s parameters is shownin Figure 5. The output, in RDF format, consists of anestimated temperament, the estimated pitch-class frequen-cies, as well as any output produced by the inferencemodule.

As the output of the system, we may obtain the RDFdocument using N3 notation23 shown in Listing 1. Thisdescribes Vallotti temperament using both temperament

Listing 1. RDF description of Vallotti temperament (abbreviated).

23For an explanation of RDF syntax and serialization formatsincluding N3 see http://www.w3.org/TeamSubmission/turtle/and http://www.w3.org/DesignIssues/Notation3.html. See alsoFazekas et al. (2010) in the present issue for examples and

applications in the music domain.

334 Dan Tidhar et al.

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

description methods. Note that the actual documentincludes 12 elements of type PitchClassDeviationfor each description type DeviationsFromEqual, and12 elements of type FifthInterval for each descrip-tion type CircleOfFifths; due to space limitations weonly show two of each.

4. Conclusion and future directions

TempEst combines signal processing techniques withsemantic computing and Semantic Web technology, andis motivated by musicological research goals. It illus-trates the use of OMRAS2 technology in the type of end-user application which was targeted by the project.TempEst makes extensive use of OMRAS2 componentssuch as the Music Ontology and SAWA. In thispaper, we reviewed our previous work in temperamentclassification, and have extended it with the Tempera-ment Ontology, an inference module, and a webapplication. The resulting system incorporates many ofthe OMRAS2 goals, most notably full integration ofdigital signal processing and semantic informationprocessing to perform an MIR task in a Semantic Webenvironment.

We plan to facilitate collection of usage data, andalthough copyright issues dictate that we cannot main-tain the uploaded recordings, we will be able to collectand publish some statistics describing the recognizedtemperaments. These may include, for example, informa-tion about the most popular temperaments used with the

system, as well as co-occurrence analysis of individualtemperaments. At a later stage, we plan to implement anoptional log-in mechanism, which would enable users tomaintain their data across sessions, and would enable usto collect more detailed information about the users andtheir queries.

The music information retrieval task of temperamentestimation can be extended to include keyboard instru-ments other than the harpsichord, as well as recordingsinvolving additional instruments. Temperament, ratherthan intonation, is a potentially relevant characteristic ofany recording which involves at least one fixed-pitchinstrument, such as recordings of early music ensembleswith keyboard continuo.

Further research will focus on the development ofthe semantic inference module based on analysis ofaccumulated data. In particular, we are interested ininference rules which reflect the structure of the ontology,by corresponding to particular temperament classes.Since TempEst is directly linked to the TemperamentOntology, it has the potential to expand as more data isgathered and analysed. The ontology and the tempera-ment data service are designed to grow and toaccommodate further individual temperaments as wellas temperament classes, and will thus be adjusted toreflect the interests of its user community. As TempEstmatures, with richer ontology, data service, and inferencecomponents, it will be suitable for analysing large audiocollections, and facilitate in-depth studies of tempera-ment including, for example, diachronic analysis ofrecordings.

Fig. 5. Temperament Estimator Configuration Interface in SAWA.

TempEst: harpsichord temperament estimation in a Semantic Web environment 335

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011

Acknowledgements

The authors acknowledge the support of the School ofElectronic Engineering and Computer Science, QueenMary University of London, and the EPSRC-fundedICT project OMRAS2 (EP/E017614/1).

References

Barbour, J.M. (2004/1951). Tuning and Temperament, AHistorical Survey. New York: Dover.

Berners-Lee, T., Kolovski, V., Connolly, D., Hendler, J.,Kolovski, V., & Scharf, Y. (2006). A reasoner for theweb. Retrieved January 2010 from http://www.w3.org/2000/10/swap/doc/paper/

Cannam, C. (2009). The Vamp audio analysis plugin API: Aprogrammer’s guide. Retrieved from http://vamp-plugins.org/guide.pdf

Casey, M., Veltkamp, R., Goto, M., Leman, M., Rhodes,C., & Slaney, M. (2008). Content-based music informa-tion retrieval: Current directions and future challenges.Proceedings of the IEEE, 96(4), 668–696.

Di Veroli, C. (2009). Unequal Temperaments: Theory,History, and Practice. Bray, Ireland: Bray Baroque.

Fazekas, G., Cannam, C., & Sandler, M. (2009). Reusablemetadata and software components for automatic audioanalysis. In Proceedings of the IEEE/ACM Joint Con-ference on Digital Libraries (JCDL’09) Workshop onIntegrating Digital Library Content with ComputationalTools and Services, Austin, TX, USA.

Fazekas, G., Raimond, Y., Jacobson, K., & Sandler, M.(2010). An overview of Semantic Web activities in theOMRAS2 project. Journal of New Music Research 39(4),295–311.

Helmholtz, H. v. (1863). Die lehre von den tonempfindungenals physiologische grundlage fur die theorie der musik (Onthe Sensations of Tone as a Physiological Basis for theTheory of Music). Richmond, VA: Brunswick.

Klapuri, A.P. (2009). A method for visualizing the pitchcontent of polyphonic music signals. In Proceedings ofthe 10th International Society for Music InformationRetrieval Conference (ISMIR 2009), Kobe, Japan, pp.615–620.

Lehman, B. (2005). Bach’s extraordinary temperament: OurRosetta Stone—1. Early Music, 33(1), 3–23.

Lundin, R.W. (1947). Towards a cultural theory ofconsonance. Journal of Psychology, 23, 45–49.

Palisca, C.V., & Moore, B.C.J. (2010). Consonance.Retrieved from http://www.grovemusic.com/

Raimond, Y., Abdallah, S., Sandler, M., & Frederick, G.(2007). The music ontology. In Proceedings of the 7thInternational Conference on Music Information Retrieval(ISMIR 2007), Vienna, Austria, pp. 417–422.

Rasch, R. (2002). Tuning and temperament. In T.Christensen (Ed.), The Cambridge History of WesternMusic Theory (pp. 193–222). Cambridge: CambridgeUniversity Press.

Schroeder, M.R. (2009). Number Theory in Science andCommunication: With Applications in Cryptography, Phy-sics, Digital Information, Computing, and Self-similarity.Berlin: Springer.

Sethares, W. (2004). Tuning, Timbre, Spectrum, Scale.Berlin: Springer.

Smith, J.O. (2008). Spectral Audio Signal Processing,October 2008 Draft [Online book]. Retrieved fromhttps://ccrma. stanford.edu/sasp/

Terhardt, E. (1977). The two-component theory of musicalconsonance. In E. Evans & J. Wilson (Eds.), Psychophy-sics and Physiology of Hearing, pp. 381–390. London:Academic.

Tidhar, D., Mauch, M., & Dixon, S. (2010). High precisionfrequency estimation for harpsichord tuning classifica-tion. In IEEE International Conference on Acoustics,Speech, and Signal Processing (ICASSP2010), Dallas,TX, USA, pp. 61–64.

336 Dan Tidhar et al.

Downloaded By: [Dixon, Simon] At: 20:56 25 February 2011